Re: SparkSQL + Tableau Connector

2015-02-19 Thread Silvio Fiorito
Great, glad it worked out!

From: Todd Nist
Date: Thursday, February 19, 2015 at 9:19 AM
To: Silvio Fiorito
Cc: user@spark.apache.org
Subject: Re: SparkSQL + Tableau Connector

Hi Silvio,

I got this working today using your suggestion with the Initial SQL and a 
Custom Query.  See here for details:

http://stackoverflow.com/questions/28403664/connect-to-existing-hive-in-intellij-using-sbt-as-build/28608608#28608608

It is not ideal as I need to write a custom query, but it does work for now.  I 
also have it working by doing a saveAsTable on the ingested data, which stores 
the reference in the metastore for access via the thrift server.
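
For anyone following the thread, a minimal sketch of that saveAsTable approach 
(Spark 1.2-era API; the path and table name below are placeholders, not taken 
from the thread):

import org.apache.spark.sql.hive.HiveContext

// Ingest with a HiveContext so the saved table is registered in the Hive
// metastore and is therefore visible to the thrift server / Tableau.
val hiveContext = new HiveContext(sc)
val events = hiveContext.jsonFile("/data/json/")
events.saveAsTable("test")   // persists the data and records the table in the metastore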

Thanks for the help.

-Todd

On Wed, Feb 11, 2015 at 8:41 PM, Silvio Fiorito 
silvio.fior...@granturing.com wrote:
Hey Todd,

I don’t have an app to test against the thrift server; are you able to define 
custom SQL without using Tableau’s schema query? I guess it’s not possible to 
just use SparkSQL temp tables, so you may have to use permanent Hive tables that 
are actually in the metastore so Tableau can discover them in the schema. In 
that case you will either have to generate the Hive tables externally from 
Spark, or use Spark to process the data and save them using a HiveContext.


From: Todd Nist
Date: Wednesday, February 11, 2015 at 7:53 PM
To: Andrew Lee
Cc: Arush Kharbanda, user@spark.apache.org
Subject: Re: SparkSQL + Tableau Connector

First, sorry for the long post.  So, back to Tableau and Spark SQL: I'm still 
missing something.

TL;DR

To get the Spark SQL temp table associated with the metastore, are there 
additional steps required beyond doing the below?

Initial SQL on connection:

create temporary table test
using org.apache.spark.sql.json
options (path '/data/json/*');

cache table test;

I feel like I'm missing a step of associating the Spark SQL table with the 
metastore; do I need to actually save it in some fashion?  I'm trying to avoid 
saving to Hive if possible.

Details:

I configured the hive-site.xml and placed it in $SPARK_HOME/conf.  It looks 
like this (thanks Andrew and Arush for the assistance):

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>hive.semantic.analyzer.factory.impl</name>
    <value>org.apache.hcatalog.cli.HCatSemanticAnalyzerFactory</value>
  </property>

  <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
  </property>

  <property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
  </property>

  <!--
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property>
  -->

  <property>
    <name>hive.warehouse.subdir.inherit.perms</name>
    <value>true</value>
  </property>

  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hiveuser</value>
  </property>

</configuration>

When I start the server it looks fine:

$ ./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001 --hiveconf hive.server2.thrift.bind.host radtech.io --master spark://radtech.io:7077 --driver-class-path /usr/local/spark/lib/mysql-connector-java-5.1.34-bin.jar
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to 
/usr/local/spark-1.2.1-bin-hadoop2.4/logs/spark-tnist-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-radtech.io.out
radtech:spark tnist$ tail -f 
logs/spark-tnist-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-radtech.io.out
15/02/11 19:15:24 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150211191524-0008/1 on hostPort 192.168.1.2:50851 with 2 cores, 512.0 MB RAM
15/02/11 19:15:24 INFO AppClient$ClientActor: Executor updated: 
app-20150211191524-0008/0 is now LOADING
15/02/11 19:15:24 INFO AppClient$ClientActor: Executor updated: 
app-20150211191524-0008/1 is now LOADING
15/02/11 19:15:24 INFO AppClient$ClientActor: Executor updated: 
app-20150211191524-0008/0 is now RUNNING
15/02/11 19:15:24 INFO AppClient$ClientActor: Executor updated: 
app-20150211191524-0008/1 is now RUNNING
15/02/11 19:15:24 INFO NettyBlockTransferService: Server created on 50938
15/02/11 19

Re: SparkSQL + Tableau Connector

2015-02-19 Thread Todd Nist
Hi Silvio,

I got this working today using your suggestion with the Initial SQL and a
Custom Query.  See here for details:

http://stackoverflow.com/questions/28403664/connect-to-existing-hive-in-intellij-using-sbt-as-build/28608608#28608608

It is not ideal as I need to write a custom query, but does work for now.
I also have it working by doing a SaveAsTable on the ingested data which
stores the reference into the metastore for access via the thrift server.

Thanks for the help.

-Todd

On Wed, Feb 11, 2015 at 8:41 PM, Silvio Fiorito 
silvio.fior...@granturing.com wrote:

   Hey Todd,

  I don’t have an app to test against the thrift server, are you able to
 define custom SQL without using Tableau’s schema query? I guess it’s not
 possible to just use SparkSQL temp tables, you may have to use permanent
 Hive tables that are actually in the metastore so Tableau can discover them
 in the schema. In that case you will either have to generate the Hive
 tables externally from Spark or use Spark to process the data and save them
 using a HiveContext.


From: Todd Nist
 Date: Wednesday, February 11, 2015 at 7:53 PM
 To: Andrew Lee
 Cc: Arush Kharbanda, user@spark.apache.org
 Subject: Re: SparkSQL + Tableau Connector

   First sorry for the long post.  So back to tableau and Spark SQL, I'm
 still missing something.

  TL;DR

  To get the Spark SQL Temp table associated with the metastore are there
 additional steps required beyond doing the below?

 Initial SQL on connection:

  create temporary table test
 using org.apache.spark.sql.json
 options (path '/data/json/*');

 cache table test;

  I feel like I'm missing a step of associating the Spark SQL table with
 the metastore, do I need to actually save it in some fashion?   I'm trying
 to avoid saving to hive if possible.

  *Details:*

  I configured the hive-site.xml and placed it in the $SPARK_HOME/conf.
 It looks like this, thanks Andrew and Arush for the assistance:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>hive.semantic.analyzer.factory.impl</name>
    <value>org.apache.hcatalog.cli.HCatSemanticAnalyzerFactory</value>
  </property>

  <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
  </property>

  <property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
  </property>

  <!--
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property>
  -->

  <property>
    <name>hive.warehouse.subdir.inherit.perms</name>
    <value>true</value>
  </property>

  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hiveuser</value>
  </property>

</configuration>

 When I start the server it looks fine:

  $ ./sbin/start-thriftserver.sh --hiveconf
 hive.server2.thrift.port=10001 --hiveconf hive.server2.thrift.bind.host
 radtech.io --master spark://radtech.io:7077 --driver-class-path
 /usr/local/spark/lib/mysql-connector-java-5.1.34-bin.jar
 starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging
 to
 /usr/local/spark-1.2.1-bin-hadoop2.4/logs/spark-tnist-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-radtech.io.out
 radtech:spark tnist$ tail -f
 logs/spark-tnist-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-radtech.io.out
 15/02/11 19:15:24 INFO SparkDeploySchedulerBackend: Granted executor ID
 app-20150211191524-0008/1 on hostPort 192.168.1.2:50851 with 2 cores,
 512.0 MB RAM
 15/02/11 19:15:24 INFO AppClient$ClientActor: Executor updated:
 app-20150211191524-0008/0 is now LOADING
 15/02/11 19:15:24 INFO AppClient$ClientActor: Executor updated:
 app-20150211191524-0008/1 is now LOADING
 15/02/11 19:15:24 INFO AppClient$ClientActor: Executor updated:
 app-20150211191524-0008/0 is now RUNNING
 15/02/11 19:15:24 INFO AppClient$ClientActor: Executor updated:
 app-20150211191524-0008/1 is now RUNNING
 15/02/11 19:15:24 INFO NettyBlockTransferService: Server created on 50938
 15/02/11 19:15:24 INFO BlockManagerMaster: Trying to register BlockManager
 15/02/11 19:15:24 INFO BlockManagerMasterActor: Registering block manager
 192.168.1.2:50938 with 265.1 MB RAM, BlockManagerId(driver,
 192.168.1.2, 50938)
 15/02

RE: SparkSQL + Tableau Connector

2015-02-17 Thread Andrew Lee
Hi Todd,
When I see /data/json appear in your log, I have a feeling that it is the 
default hive.metastore.warehouse.dir from hive-site.xml, where the value is 
/data/.  Could you check that property and see if you can point it to the 
correct Hive table HDFS directory?
Another thing to look at is the Hive metastore MySQL database, if you are using 
MySQL as the DB for the metastore.
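For reference, a hedged sketch of what that property looks like in hive-site.xml 
(the value below is just the common default, not taken from this thread):

<property>
  <name>hive.metastore.warehouse.dir</name>
  <!-- placeholder: point this at the HDFS directory that actually holds your Hive tables -->
  <value>/user/hive/warehouse</value>
</property>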
Date: Wed, 11 Feb 2015 19:53:35 -0500
Subject: Re: SparkSQL + Tableau Connector
From: tsind...@gmail.com
To: alee...@hotmail.com
CC: ar...@sigmoidanalytics.com; user@spark.apache.org

First sorry for the long post.  So back to tableau and Spark SQL, I'm still 
missing something.  
TL;DR
To get the Spark SQL Temp table associated with the metastore are there 
additional steps required beyond doing the below?

Initial SQL on connection:








create temporary table test 
using org.apache.spark.sql.json  
options (path '/data/json/*');
cache table test;

I feel like I'm missing a step of associating the Spark SQL table with the 
metastore, do I need to actually save it in some fashion?   I'm trying to avoid 
saving to hive if possible.
Details:
I configured the hive-site.xml and placed it in the $SPARK_HOME/conf.  It looks 
like this, thanks Andrew and Arush for the assistance:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>hive.semantic.analyzer.factory.impl</name>
    <value>org.apache.hcatalog.cli.HCatSemanticAnalyzerFactory</value>
  </property>

  <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
  </property>

  <property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
  </property>

  <!--
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property>
  -->

  <property>
    <name>hive.warehouse.subdir.inherit.perms</name>
    <value>true</value>
  </property>

  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hiveuser</value>
  </property>

</configuration>
When I start the server it looks fine:
$ ./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001 --hiveconf hive.server2.thrift.bind.host radtech.io --master spark://radtech.io:7077 --driver-class-path /usr/local/spark/lib/mysql-connector-java-5.1.34-bin.jar
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /usr/local/spark-1.2.1-bin-hadoop2.4/logs/spark-tnist-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-radtech.io.out
radtech:spark tnist$ tail -f logs/spark-tnist-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-radtech.io.out
15/02/11 19:15:24 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150211191524-0008/1 on hostPort 192.168.1.2:50851 with 2 cores, 512.0 MB RAM
15/02/11 19:15:24 INFO AppClient$ClientActor: Executor updated: app-20150211191524-0008/0 is now LOADING
15/02/11 19:15:24 INFO AppClient$ClientActor: Executor updated: app-20150211191524-0008/1 is now LOADING
15/02/11 19:15:24 INFO AppClient$ClientActor: Executor updated: app-20150211191524-0008/0 is now RUNNING
15/02/11 19:15:24 INFO AppClient$ClientActor: Executor updated: app-20150211191524-0008/1 is now RUNNING
15/02/11 19:15:24 INFO NettyBlockTransferService: Server created on 50938
15/02/11 19:15:24 INFO BlockManagerMaster: Trying to register BlockManager
15/02/11 19:15:24 INFO BlockManagerMasterActor: Registering block manager 192.168.1.2:50938 with 265.1 MB RAM, BlockManagerId(driver, 192.168.1.2, 50938)
15/02/11 19:15:24 INFO BlockManagerMaster: Registered BlockManager
15/02/11 19:15:25 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/02/11 19:15:25 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/02/11 19:15:25 INFO ObjectStore: ObjectStore, initialize called
15/02/11 19:15:26 INFO Persistence: Property hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/02/11 19:15:26 INFO Persistence: Property datanucleus.cache.level2 unknown - will be ignored
15/02/11 19:15:26 WARN Connection: BoneCP specified but not present in CLASSPATH (or one of dependencies)
15/02/11 19:15:26 WARN Connection: BoneCP

RE: SparkSQL + Tableau Connector

2015-02-11 Thread Andrew Lee
Sorry folks, it is executing Spark jobs instead of Hive jobs. I mis-read the 
logs since there were other activities going on on the cluster.

From: alee...@hotmail.com
To: ar...@sigmoidanalytics.com; tsind...@gmail.com
CC: user@spark.apache.org
Subject: RE: SparkSQL + Tableau Connector
Date: Wed, 11 Feb 2015 11:56:44 -0800




I'm using MySQL as the metastore DB with Spark 1.2. I simply copied the 
hive-site.xml to /etc/spark/ and added the MySQL JDBC JAR to spark-env.sh in 
/etc/spark/; everything works fine now.
My setup looks like this:
Tableau => Spark ThriftServer2 => HiveServer2
It's talking to Tableau Desktop 8.3. Interestingly, when I query a Hive table, 
it still invokes Hive queries to HiveServer2, which is running the MR or Tez engine. 
Is this expected?
I thought it should at least use the Catalyst engine and talk to the underlying 
HDFS like the HiveContext API does to pull the data into an RDD.  Did I 
misunderstand the purpose of Spark ThriftServer2?


Date: Wed, 11 Feb 2015 16:07:40 +0530
Subject: Re: SparkSQL + Tableau Connector
From: ar...@sigmoidanalytics.com
To: tsind...@gmail.com
CC: user@spark.apache.org

Hi
I used this, though it's using an embedded driver and is not a good approach. It 
works. You can configure some other metastore type also. I have not tried 
the metastore URIs.








<configuration>

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/opt/bigdata/spark-1.2.0/metastore_db;create=true</value>
    <description>URL for the DB</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>org.apache.derby.jdbc.EmbeddedDriver</value>
  </property>

  <!-- <property>
    <name>hive.metastore.uris</name>
    <value>thrift://x.x.x.x:1</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property> -->

</configuration>

On Wed, Feb 11, 2015 at 3:59 PM, Todd Nist tsind...@gmail.com wrote:
Hi Arush,
So yes, I want to create the tables through Spark SQL.  I have placed the 
hive-site.xml file inside the $SPARK_HOME/conf directory; I thought that was 
all I should need to do to have the thriftserver use it.  Perhaps my 
hive-site.xml is wrong; it currently looks like this:

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- Ensure that the following statement points to the Hive Metastore URI in your cluster -->
    <value>thrift://sandbox.hortonworks.com:9083</value>
    <description>URI for client to contact metastore server</description>
  </property>
</configuration>
Which leads me to believe it is going to pull from the thriftserver from 
Horton?  I will go look at the docs to see if this is right; it is what Horton 
says to do.  Do you have an example hive-site.xml by chance that works with 
Spark SQL?
I am using Tableau 8.3 with the SparkSQL connector.
Thanks for the assistance.
-Todd
On Wed, Feb 11, 2015 at 2:34 AM, Arush Kharbanda ar...@sigmoidanalytics.com 
wrote:
BTW what tableau connector are you using?
On Wed, Feb 11, 2015 at 12:55 PM, Arush Kharbanda ar...@sigmoidanalytics.com 
wrote:
 I am a little confused here: why do you want to create the tables in Hive? You 
want to create the tables in spark-sql, right?
If you are not able to find the same tables through Tableau, then thrift is 
connecting to a different metastore than your spark-shell.
One way to specify a metastore to thrift is to provide the path to hive-site.xml 
while starting thrift, using --files hive-site.xml.
Similarly, you can specify the same metastore to your spark-submit or 
spark-shell using the same option.



On Wed, Feb 11, 2015 at 5:23 AM, Todd Nist tsind...@gmail.com wrote:
Arush,
As for #2 do you mean something like this from the docs:

// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")

// Queries are expressed in HiveQL
sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)

Or did you have something else in mind?
-Todd

On Tue, Feb 10, 2015 at 6:35 PM, Todd Nist tsind...@gmail.com wrote:
Arush,
Thank you will take a look at that approach in the morning.  I sort of figured 
the answer to #1 was NO and that I would need to do 2 and 3 thanks for 
clarifying it for me.
-Todd
On Tue, Feb 10, 2015 at 5:24 PM, Arush Kharbanda ar...@sigmoidanalytics.com 
wrote:
1.  Can the connector fetch or query schemaRDD's saved to Parquet or JSON 
files? NO

2.  Do I need to do something to expose these via hive / metastore other than 
creating a table in hive? Create a table in spark sql to expose via spark sql

3.  Does the thriftserver need to be configured to expose these in some 
fashion, sort of related to question 2 you would need to configure thrift to 
read from the metastore you expect it read from - by default it reads from 
metastore_db directory present in the directory used to launch

Re: SparkSQL + Tableau Connector

2015-02-11 Thread Silvio Fiorito
Hey Todd,

I don’t have an app to test against the thrift server, are you able to define 
custom SQL without using Tableau’s schema query? I guess it’s not possible to 
just use SparkSQL temp tables, you may have to use permanent Hive tables that 
are actually in the metastore so Tableau can discover them in the schema. In 
that case you will either have to generate the Hive tables externally from 
Spark or use Spark to process the data and save them using a HiveContext.


From: Todd Nist
Date: Wednesday, February 11, 2015 at 7:53 PM
To: Andrew Lee
Cc: Arush Kharbanda, user@spark.apache.org
Subject: Re: SparkSQL + Tableau Connector

First sorry for the long post.  So back to tableau and Spark SQL, I'm still 
missing something.

TL;DR

To get the Spark SQL Temp table associated with the metastore are there 
additional steps required beyond doing the below?

Initial SQL on connection:

create temporary table test
using org.apache.spark.sql.json
options (path '/data/json/*');

cache table test;

I feel like I'm missing a step of associating the Spark SQL table with the 
metastore, do I need to actually save it in some fashion?   I'm trying to avoid 
saving to hive if possible.

Details:

I configured the hive-site.xml and placed it in the $SPARK_HOME/conf.  It looks 
like this, thanks Andrew and Arush for the assistance:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>hive.semantic.analyzer.factory.impl</name>
    <value>org.apache.hcatalog.cli.HCatSemanticAnalyzerFactory</value>
  </property>

  <property>
    <name>hive.metastore.sasl.enabled</name>
    <value>false</value>
  </property>

  <property>
    <name>hive.server2.authentication</name>
    <value>NONE</value>
  </property>

  <property>
    <name>hive.server2.enable.doAs</name>
    <value>true</value>
  </property>

  <!--
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
    <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
  </property>
  -->

  <property>
    <name>hive.warehouse.subdir.inherit.perms</name>
    <value>true</value>
  </property>

  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/metastore_db?createDatabaseIfNotExist=true</value>
    <description>metadata is stored in a MySQL server</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>MySQL JDBC driver class</description>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>

  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hiveuser</value>
  </property>

</configuration>

When I start the server it looks fine:

$ ./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10001 --hiveconf hive.server2.thrift.bind.host radtech.io --master spark://radtech.io:7077 --driver-class-path /usr/local/spark/lib/mysql-connector-java-5.1.34-bin.jar
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to 
/usr/local/spark-1.2.1-bin-hadoop2.4/logs/spark-tnist-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-radtech.io.out
radtech:spark tnist$ tail -f 
logs/spark-tnist-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-radtech.io.out
15/02/11 19:15:24 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150211191524-0008/1 on hostPort 192.168.1.2:50851 with 2 cores, 512.0 MB RAM
15/02/11 19:15:24 INFO AppClient$ClientActor: Executor updated: 
app-20150211191524-0008/0 is now LOADING
15/02/11 19:15:24 INFO AppClient$ClientActor: Executor updated: 
app-20150211191524-0008/1 is now LOADING
15/02/11 19:15:24 INFO AppClient$ClientActor: Executor updated: 
app-20150211191524-0008/0 is now RUNNING
15/02/11 19:15:24 INFO AppClient$ClientActor: Executor updated: 
app-20150211191524-0008/1 is now RUNNING
15/02/11 19:15:24 INFO NettyBlockTransferService: Server created on 50938
15/02/11 19:15:24 INFO BlockManagerMaster: Trying to register BlockManager
15/02/11 19:15:24 INFO BlockManagerMasterActor: Registering block manager 192.168.1.2:50938 with 265.1 MB RAM, BlockManagerId(driver, 192.168.1.2, 50938)
15/02/11 19:15:24 INFO BlockManagerMaster: Registered BlockManager
15/02/11 19:15:25 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready 
for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
15/02/11 19:15:25 INFO HiveMetaStore: 0: Opening raw store with implemenation 
class:org.apache.hadoop.hive.metastore.ObjectStore
15/02/11 19:15:25 INFO ObjectStore: ObjectStore, initialize called
15/02/11 19:15:26 INFO Persistence: Property 
hive.metastore.integral.jdo.pushdown unknown - will be ignored
15/02/11 19:15:26 INFO Persistence: Property

Re: SparkSQL + Tableau Connector

2015-02-11 Thread Todd Nist
 class:org.apache.hadoop.hive.metastore.ObjectStore
15/02/11 19:25:35 INFO ObjectStore: ObjectStore, initialize called
15/02/11 19:25:36 INFO Query: Reading in results for query
org.datanucleus.store.rdbms.query.SQLQuery@0 since the connection used is
closing
15/02/11 19:25:36 INFO ObjectStore: Initialized ObjectStore
15/02/11 19:25:36 INFO HiveMetaStore: 3: get_database: default
15/02/11 19:25:36 INFO audit: ugi=anonymous ip=unknown-ip-addr
cmd=get_database: default
15/02/11 19:25:36 INFO Driver: OK

15/02/11 19:25:36 INFO SparkExecuteStatementOperation: Running query
'create temporary table test
using org.apache.spark.sql.json
options (path '/data/json/*')'



15/02/11 19:25:38 INFO Driver: Starting command: use `default`
15/02/11 19:25:38 INFO HiveMetaStore: 4: get_database: default
15/02/11 19:25:38 INFO audit: ugi=anonymous ip=unknown-ip-addr
cmd=get_database: default
15/02/11 19:25:38 INFO HiveMetaStore: 4: Opening raw store with
implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/02/11 19:25:38 INFO ObjectStore: ObjectStore, initialize called
15/02/11 19:25:38 INFO Query: Reading in results for query
org.datanucleus.store.rdbms.query.SQLQuery@0 since the connection used is
closing
15/02/11 19:25:38 INFO ObjectStore: Initialized ObjectStore
15/02/11 19:25:38 INFO HiveMetaStore: 4: get_database: default
15/02/11 19:25:38 INFO audit: ugi=anonymous ip=unknown-ip-addr
cmd=get_database: default
15/02/11 19:25:38 INFO Driver: OK
15/02/11 19:25:38 INFO SparkExecuteStatementOperation: Running query 'cache table test'
15/02/11 19:25:38 INFO MemoryStore: ensureFreeSpace(211383) called with
curMem=101514, maxMem=278019440
15/02/11 19:25:38 INFO MemoryStore: Block broadcast_2 stored as values in
memory (estimated size 206.4 KB, free 264.8 MB)

I see no way in Tableau to see the cached table test.  I think I am
missing a step of associating the generated temp table from Spark SQL with
the metastore.  Any guidance or insights on what I'm missing here?

Thanks for the assistance.

-Todd

On Wed, Feb 11, 2015 at 3:20 PM, Andrew Lee alee...@hotmail.com wrote:

 Sorry folks, it is executing Spark jobs instead of Hive jobs. I mis-read
 the logs since there were other activities going on on the cluster.

 --
 From: alee...@hotmail.com
 To: ar...@sigmoidanalytics.com; tsind...@gmail.com
 CC: user@spark.apache.org
 Subject: RE: SparkSQL + Tableau Connector
 Date: Wed, 11 Feb 2015 11:56:44 -0800


 I'm using mysql as the metastore DB with Spark 1.2.
 I simply copy the hive-site.xml to /etc/spark/ and added the mysql JDBC
 JAR to spark-env.sh in /etc/spark/, everything works fine now.

 My setup looks like this.

 Tableau = Spark ThriftServer2 = HiveServer2

 It's talking to Tableau Desktop 8.3. Interestingly, when I query a Hive
 table, it still invokes Hive queries to HiveServer2 which is running MR or
 Tez engine.  Is this expected?

 I thought it should at least use the catalyst engine and talk to the
 underlying HDFS like what HiveContext API does to pull in the data into
 RDD.  Did I misunderstood the purpose of Spark ThriftServer2?



 --
 Date: Wed, 11 Feb 2015 16:07:40 +0530
 Subject: Re: SparkSQL + Tableau Connector
 From: ar...@sigmoidanalytics.com
 To: tsind...@gmail.com
 CC: user@spark.apache.org

 Hi

 I used this, though its using a embedded driver and is not a good
 approch.It works. You can configure for some other metastore type also. I
 have not tried the metastore uri's.

 <configuration>

 <property>
   <name>javax.jdo.option.ConnectionURL</name>
   <value>jdbc:derby:;databaseName=/opt/bigdata/spark-1.2.0/metastore_db;create=true</value>
   <description>URL for the DB</description>
 </property>

 <property>
   <name>javax.jdo.option.ConnectionDriverName</name>
   <value>org.apache.derby.jdbc.EmbeddedDriver</value>
 </property>

 <!-- <property>
   <name>hive.metastore.uris</name>
   <value>thrift://x.x.x.x:1</value>
   <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
 </property> -->
 </configuration>

 On Wed, Feb 11, 2015 at 3:59 PM, Todd Nist tsind...@gmail.com wrote:

 Hi Arush,

 So yes I want to create the tables through Spark SQL.  I have placed the
 hive-site.xml file inside of the $SPARK_HOME/conf directory I thought that
 was all I should need to do to have the thriftserver use it.  Perhaps my
 hive-site.xml is worng, it currently looks like this:

 <configuration>
 <property>
   <name>hive.metastore.uris</name>
   <!-- Ensure that the following statement points to the Hive Metastore URI in your cluster -->
   <value>thrift://sandbox.hortonworks.com:9083</value>
   <description>URI for client to contact metastore server</description>
 </property>
 </configuration>

 Which leads me to believe it is going to pull form the thriftserver from
 Horton?  I will go look at the docs to see if this is right, it is what
 Horton says to do.  Do you have an example hive-site.xml by chance that
 works with Spark SQL?

 I

Re: SparkSQL + Tableau Connector

2015-02-11 Thread Todd Nist
Hi Arush,

So yes, I want to create the tables through Spark SQL.  I have placed the
hive-site.xml file inside the $SPARK_HOME/conf directory; I thought that
was all I should need to do to have the thriftserver use it.  Perhaps my
hive-site.xml is wrong; it currently looks like this:

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- Ensure that the following statement points to the Hive Metastore URI in your cluster -->
    <value>thrift://sandbox.hortonworks.com:9083</value>
    <description>URI for client to contact metastore server</description>
  </property>
</configuration>

Which leads me to believe it is going to pull from the thriftserver from
Horton?  I will go look at the docs to see if this is right; it is what
Horton says to do.  Do you have an example hive-site.xml by chance that
works with Spark SQL?

I am using Tableau 8.3 with the SparkSQL connector.

Thanks for the assistance.

-Todd

On Wed, Feb 11, 2015 at 2:34 AM, Arush Kharbanda ar...@sigmoidanalytics.com
 wrote:

 BTW what tableau connector are you using?

 On Wed, Feb 11, 2015 at 12:55 PM, Arush Kharbanda 
 ar...@sigmoidanalytics.com wrote:

  I am a little confused here, why do you want to create the tables in
 hive. You want to create the tables in spark-sql, right?

 If you are not able to find the same tables through tableau then thrift
 is connecting to a diffrent metastore than your spark-shell.

 One way to specify a metstore to thrift is to provide the path to
 hive-site.xml while starting thrift using --files hive-site.xml.

 similarly you can specify the same metastore to your spark-submit or
 sharp-shell using the same option.



 On Wed, Feb 11, 2015 at 5:23 AM, Todd Nist tsind...@gmail.com wrote:

 Arush,

 As for #2 do you mean something like this from the docs:

 // sc is an existing SparkContext.
 val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
 sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
 sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
 // Queries are expressed in HiveQL
 sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)

 Or did you have something else in mind?

 -Todd


 On Tue, Feb 10, 2015 at 6:35 PM, Todd Nist tsind...@gmail.com wrote:

 Arush,

 Thank you will take a look at that approach in the morning.  I sort of
 figured the answer to #1 was NO and that I would need to do 2 and 3 thanks
 for clarifying it for me.

 -Todd

 On Tue, Feb 10, 2015 at 5:24 PM, Arush Kharbanda 
 ar...@sigmoidanalytics.com wrote:

 1.  Can the connector fetch or query schemaRDD's saved to Parquet or
 JSON files? NO
 2.  Do I need to do something to expose these via hive / metastore
 other than creating a table in hive? Create a table in spark sql to expose
 via spark sql
 3.  Does the thriftserver need to be configured to expose these in
 some fashion, sort of related to question 2 you would need to configure
 thrift to read from the metastore you expect it read from - by default it
 reads from metastore_db directory present in the directory used to launch
 the thrift server.
  On 11 Feb 2015 01:35, Todd Nist tsind...@gmail.com wrote:

 Hi,

 I'm trying to understand how and what the Tableau connector to
 SparkSQL is able to access.  My understanding is it needs to connect to 
 the
 thriftserver and I am not sure how or if it exposes parquet, json,
 schemaRDDs, or does it only expose schemas defined in the metastore / 
 hive.


 For example, I do the following from the spark-shell which generates
 a schemaRDD from a csv file and saves it as a JSON file as well as a
 parquet file.

 import org.apache.spark.sql.SQLContext
 import com.databricks.spark.csv._

 val sqlContext = new SQLContext(sc)
 val test = sqlContext.csvFile("/data/test.csv")
 test.toJSON.saveAsTextFile("/data/out")
 test.saveAsParquetFile("/data/out")

 When I connect from Tableau, the only thing I see is the default
 schema and nothing in the tables section.

 So my questions are:

 1.  Can the connector fetch or query schemaRDD's saved to Parquet or
 JSON files?
 2.  Do I need to do something to expose these via hive / metastore
 other than creating a table in hive?
 3.  Does the thriftserver need to be configured to expose these in
 some fashion, sort of related to question 2.

 TIA for the assistance.

 -Todd






 --


 *Arush Kharbanda* || Technical Teamlead

 ar...@sigmoidanalytics.com || www.sigmoidanalytics.com




 --


 *Arush Kharbanda* || Technical Teamlead

 ar...@sigmoidanalytics.com || www.sigmoidanalytics.com



Re: SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
Hi Silvio,

So the Initial SQL is executing now; I did not have the *, I added that
and it worked fine. FWIW, the * is not needed for the parquet files:

create temporary table test
using org.apache.spark.sql.json
options (path '/data/out/*')
;

cache table test;

select count(1) from test;
Unfortunately, while the table is created and cached (I can see the
statements being executed in the Spark log file), it is not associated
with any schema, at least not one that is being picked up by the Tableau Connector.
So unless there is some way to associate it with a given schema, I think I'm
at a dead end on this one.  Anything I may be missing here?

Thanks for the help, it is much appreciated.  I will give Arush's suggestion
a try tomorrow.

-Todd

On Tue, Feb 10, 2015 at 7:24 PM, Silvio Fiorito 
silvio.fior...@granturing.com wrote:

  Todd,

  I just tried it in bin/spark-sql shell. I created a folder *json *and
 just put 2 copies of the same people.json file

  This is what I ran:

  spark-sql create temporary table people
   using org.apache.spark.sql.json
   options (path 'examples/src/main/resources/json/*')
   ;
 Time taken: 0.34 seconds

 spark-sql select * from people;
 NULLMichael
 30  Andy
 19  Justin
 NULLMichael
 30  Andy
 19  Justin
 Time taken: 0.576 seconds

   From: Todd Nist
 Date: Tuesday, February 10, 2015 at 6:49 PM
 To: Silvio Fiorito
 Cc: user@spark.apache.org
 Subject: Re: SparkSQL + Tableau Connector

   Hi Silvio,

  Ah, I like that, there is a section in Tableau for Initial SQL to be
 executed upon connecting this would fit well there.  I guess I will need to
 issue a collect(), coalesce(1,true).saveAsTextFile(...) or use
 repartition(1), as the file currently is being broken into multiple parts.
   While this works in the spark-shell:

  val test = sqlContext.jsonFile("/data/out/")  // returns all parts back as one
  It seems to fail in just spark-sql:

  create temporary table test
 using org.apache.spark.sql.json
 options (path '/data/out/')
 cache table test

  with:

 [Simba][SparkODBC] (35) Error from Spark: error code: '0' error message:
 'org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: create
 temporary table test using
 org.apache.spark.sql.json
 options (path '/data/out/')
 cache table test'.

  Initial SQL Error. Check that the syntax is correct and that you have
 access privileges to the requested database.

  Thanks again for the suggestion and I will give work with it a bit more
 tomorrow.

  -Todd



 On Tue, Feb 10, 2015 at 5:48 PM, Silvio Fiorito 
 silvio.fior...@granturing.com wrote:

   Hi Todd,

  What you could do is run some SparkSQL commands immediately after the
 Thrift server starts up. Or does Tableau have some init SQL commands you
 could run?


  You can actually load data using SQL, such as:

  create temporary table people using org.apache.spark.sql.json options
 (path 'examples/src/main/resources/people.json’)
 cache table people

  create temporary table users using org.apache.spark.sql.parquet options
 (path 'examples/src/main/resources/users.parquet’)
 cache table users

   From: Todd Nist
 Date: Tuesday, February 10, 2015 at 3:03 PM
 To: user@spark.apache.org
 Subject: SparkSQL + Tableau Connector

   Hi,

  I'm trying to understand how and what the Tableau connector to SparkSQL
 is able to access.  My understanding is it needs to connect to the
 thriftserver and I am not sure how or if it exposes parquet, json,
 schemaRDDs, or does it only expose schemas defined in the metastore / hive.


  For example, I do the following from the spark-shell which generates a
 schemaRDD from a csv file and saves it as a JSON file as well as a parquet
 file.

 import org.apache.spark.sql.SQLContext
 import com.databricks.spark.csv._

 val sqlContext = new SQLContext(sc)
 val test = sqlContext.csvFile("/data/test.csv")
 test.toJSON.saveAsTextFile("/data/out")
 test.saveAsParquetFile("/data/out")

   When I connect from Tableau, the only thing I see is the default
 schema and nothing in the tables section.

 So my questions are:

 1.  Can the connector fetch or query schemaRDD's saved to Parquet or JSON
 files?
 2.  Do I need to do something to expose these via hive / metastore other
 than creating a table in hive?
 3.  Does the thriftserver need to be configured to expose these in some
 fashion, sort of related to question 2.

 TIA for the assistance.

 -Todd





Re: SparkSQL + Tableau Connector

2015-02-10 Thread Arush Kharbanda
 I am a little confused here: why do you want to create the tables in Hive?
You want to create the tables in spark-sql, right?

If you are not able to find the same tables through Tableau, then thrift is
connecting to a different metastore than your spark-shell.

One way to specify a metastore to thrift is to provide the path to
hive-site.xml while starting thrift, using --files hive-site.xml.

Similarly, you can specify the same metastore to your spark-submit or
spark-shell using the same option (see the sketch below).
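
For illustration, a hedged sketch of passing the same hive-site.xml to both
(paths, host, and port are placeholders, not from this thread):

# start the thrift server against an explicit hive-site.xml
./sbin/start-thriftserver.sh --files /path/to/hive-site.xml --master spark://host:7077

# point spark-shell (or spark-submit) at the same metastore configuration
./bin/spark-shell --files /path/to/hive-site.xml --master spark://host:7077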



On Wed, Feb 11, 2015 at 5:23 AM, Todd Nist tsind...@gmail.com wrote:

 Arush,

 As for #2 do you mean something like this from the docs:

 // sc is an existing SparkContext.
 val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
 sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
 sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
 // Queries are expressed in HiveQL
 sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)

 Or did you have something else in mind?

 -Todd


 On Tue, Feb 10, 2015 at 6:35 PM, Todd Nist tsind...@gmail.com wrote:

 Arush,

 Thank you will take a look at that approach in the morning.  I sort of
 figured the answer to #1 was NO and that I would need to do 2 and 3 thanks
 for clarifying it for me.

 -Todd

 On Tue, Feb 10, 2015 at 5:24 PM, Arush Kharbanda 
 ar...@sigmoidanalytics.com wrote:

 1.  Can the connector fetch or query schemaRDD's saved to Parquet or
 JSON files? NO
 2.  Do I need to do something to expose these via hive / metastore other
 than creating a table in hive? Create a table in spark sql to expose via
 spark sql
 3.  Does the thriftserver need to be configured to expose these in some
 fashion, sort of related to question 2 you would need to configure thrift
 to read from the metastore you expect it read from - by default it reads
 from metastore_db directory present in the directory used to launch the
 thrift server.
  On 11 Feb 2015 01:35, Todd Nist tsind...@gmail.com wrote:

 Hi,

 I'm trying to understand how and what the Tableau connector to SparkSQL
 is able to access.  My understanding is it needs to connect to the
 thriftserver and I am not sure how or if it exposes parquet, json,
 schemaRDDs, or does it only expose schemas defined in the metastore / hive.


 For example, I do the following from the spark-shell which generates a
 schemaRDD from a csv file and saves it as a JSON file as well as a parquet
 file.

 import org.apache.spark.sql.SQLContext
 import com.databricks.spark.csv._

 val sqlContext = new SQLContext(sc)
 val test = sqlContext.csvFile("/data/test.csv")
 test.toJSON.saveAsTextFile("/data/out")
 test.saveAsParquetFile("/data/out")

 When I connect from Tableau, the only thing I see is the default
 schema and nothing in the tables section.

 So my questions are:

 1.  Can the connector fetch or query schemaRDD's saved to Parquet or
 JSON files?
 2.  Do I need to do something to expose these via hive / metastore
 other than creating a table in hive?
 3.  Does the thriftserver need to be configured to expose these in some
 fashion, sort of related to question 2.

 TIA for the assistance.

 -Todd






-- 


*Arush Kharbanda* || Technical Teamlead

ar...@sigmoidanalytics.com || www.sigmoidanalytics.com


Re: SparkSQL + Tableau Connector

2015-02-10 Thread Arush Kharbanda
BTW what tableau connector are you using?

On Wed, Feb 11, 2015 at 12:55 PM, Arush Kharbanda 
ar...@sigmoidanalytics.com wrote:

  I am a little confused here, why do you want to create the tables in
 hive. You want to create the tables in spark-sql, right?

 If you are not able to find the same tables through tableau then thrift is
 connecting to a diffrent metastore than your spark-shell.

 One way to specify a metstore to thrift is to provide the path to
 hive-site.xml while starting thrift using --files hive-site.xml.

 similarly you can specify the same metastore to your spark-submit or
 sharp-shell using the same option.



 On Wed, Feb 11, 2015 at 5:23 AM, Todd Nist tsind...@gmail.com wrote:

 Arush,

 As for #2 do you mean something like this from the docs:

 // sc is an existing SparkContext.
 val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
 sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
 sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
 // Queries are expressed in HiveQL
 sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)

 Or did you have something else in mind?

 -Todd


 On Tue, Feb 10, 2015 at 6:35 PM, Todd Nist tsind...@gmail.com wrote:

 Arush,

 Thank you will take a look at that approach in the morning.  I sort of
 figured the answer to #1 was NO and that I would need to do 2 and 3 thanks
 for clarifying it for me.

 -Todd

 On Tue, Feb 10, 2015 at 5:24 PM, Arush Kharbanda 
 ar...@sigmoidanalytics.com wrote:

 1.  Can the connector fetch or query schemaRDD's saved to Parquet or
 JSON files? NO
 2.  Do I need to do something to expose these via hive / metastore
 other than creating a table in hive? Create a table in spark sql to expose
 via spark sql
 3.  Does the thriftserver need to be configured to expose these in some
 fashion, sort of related to question 2 you would need to configure thrift
 to read from the metastore you expect it read from - by default it reads
 from metastore_db directory present in the directory used to launch the
 thrift server.
  On 11 Feb 2015 01:35, Todd Nist tsind...@gmail.com wrote:

 Hi,

 I'm trying to understand how and what the Tableau connector to
 SparkSQL is able to access.  My understanding is it needs to connect to 
 the
 thriftserver and I am not sure how or if it exposes parquet, json,
 schemaRDDs, or does it only expose schemas defined in the metastore / 
 hive.


 For example, I do the following from the spark-shell which generates a
 schemaRDD from a csv file and saves it as a JSON file as well as a parquet
 file.

 import org.apache.spark.sql.SQLContext
 import com.databricks.spark.csv._

 val sqlContext = new SQLContext(sc)
 val test = sqlContext.csvFile("/data/test.csv")
 test.toJSON.saveAsTextFile("/data/out")
 test.saveAsParquetFile("/data/out")

 When I connect from Tableau, the only thing I see is the default
 schema and nothing in the tables section.

 So my questions are:

 1.  Can the connector fetch or query schemaRDD's saved to Parquet or
 JSON files?
 2.  Do I need to do something to expose these via hive / metastore
 other than creating a table in hive?
 3.  Does the thriftserver need to be configured to expose these in
 some fashion, sort of related to question 2.

 TIA for the assistance.

 -Todd






 --


 *Arush Kharbanda* || Technical Teamlead

 ar...@sigmoidanalytics.com || www.sigmoidanalytics.com




-- 


*Arush Kharbanda* || Technical Teamlead

ar...@sigmoidanalytics.com || www.sigmoidanalytics.com


Re: SparkSQL + Tableau Connector

2015-02-10 Thread Arush Kharbanda
1.  Can the connector fetch or query schemaRDDs saved to Parquet or JSON
files? NO.
2.  Do I need to do something to expose these via hive / metastore other
than creating a table in hive? Create a table in Spark SQL to expose it via
Spark SQL.
3.  Does the thriftserver need to be configured to expose these in some
fashion, sort of related to question 2? You would need to configure thrift
to read from the metastore you expect it to read from - by default it reads
from the metastore_db directory present in the directory used to launch the
thrift server.
 On 11 Feb 2015 01:35, Todd Nist tsind...@gmail.com wrote:

 Hi,

 I'm trying to understand how and what the Tableau connector to SparkSQL is
 able to access.  My understanding is it needs to connect to the
 thriftserver and I am not sure how or if it exposes parquet, json,
 schemaRDDs, or does it only expose schemas defined in the metastore / hive.


 For example, I do the following from the spark-shell which generates a
 schemaRDD from a csv file and saves it as a JSON file as well as a parquet
 file.

 import org.apache.spark.sql.SQLContext
 import com.databricks.spark.csv._

 val sqlContext = new SQLContext(sc)
 val test = sqlContext.csvFile("/data/test.csv")
 test.toJSON.saveAsTextFile("/data/out")
 test.saveAsParquetFile("/data/out")

 When I connect from Tableau, the only thing I see is the default schema
 and nothing in the tables section.

 So my questions are:

 1.  Can the connector fetch or query schemaRDD's saved to Parquet or JSON
 files?
 2.  Do I need to do something to expose these via hive / metastore other
 than creating a table in hive?
 3.  Does the thriftserver need to be configured to expose these in some
 fashion, sort of related to question 2.

 TIA for the assistance.

 -Todd



Re: SparkSQL + Tableau Connector

2015-02-10 Thread Silvio Fiorito
Hi Todd,

What you could do is run some SparkSQL commands immediately after the Thrift 
server starts up. Or does Tableau have some init SQL commands you could run?


You can actually load data using SQL, such as:

create temporary table people using org.apache.spark.sql.json options (path 'examples/src/main/resources/people.json')
cache table people

create temporary table users using org.apache.spark.sql.parquet options (path 'examples/src/main/resources/users.parquet')
cache table users
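
As Todd's later replies in this thread confirm, statements like these can go
straight into Tableau's Initial SQL, with a Custom SQL query then reading from
the cached table; a hedged example, reusing the table name above:

SELECT * FROM people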

From: Todd Nist
Date: Tuesday, February 10, 2015 at 3:03 PM
To: user@spark.apache.org
Subject: SparkSQL + Tableau Connector

Hi,

I'm trying to understand how and what the Tableau connector to SparkSQL is able 
to access.  My understanding is it needs to connect to the thriftserver and I 
am not sure how or if it exposes parquet, json, schemaRDDs, or does it only 
expose schemas defined in the metastore / hive.

For example, I do the following from the spark-shell which generates a 
schemaRDD from a csv file and saves it as a JSON file as well as a parquet file.


import org.apache.spark.sql.SQLContext
import com.databricks.spark.csv._

val sqlContext = new SQLContext(sc)
val test = sqlContext.csvFile("/data/test.csv")
test.toJSON.saveAsTextFile("/data/out")
test.saveAsParquetFile("/data/out")

When I connect from Tableau, the only thing I see is the default schema and 
nothing in the tables section.

So my questions are:

1.  Can the connector fetch or query schemaRDD's saved to Parquet or JSON files?
2.  Do I need to do something to expose these via hive / metastore other than 
creating a table in hive?
3.  Does the thriftserver need to be configured to expose these in some 
fashion, sort of related to question 2.

TIA for the assistance.

-Todd


Re: SparkSQL + Tableau Connector

2015-02-10 Thread Silvio Fiorito
Todd,

I just tried it in the bin/spark-sql shell. I created a folder json and just put 2 
copies of the same people.json file.

This is what I ran:

spark-sql create temporary table people
  using org.apache.spark.sql.json
  options (path 'examples/src/main/resources/json/*')
  ;
Time taken: 0.34 seconds
spark-sql select * from people;
NULL    Michael
30      Andy
19      Justin
NULL    Michael
30      Andy
19      Justin
Time taken: 0.576 seconds

From: Todd Nist
Date: Tuesday, February 10, 2015 at 6:49 PM
To: Silvio Fiorito
Cc: user@spark.apache.org
Subject: Re: SparkSQL + Tableau Connector

Hi Silvio,

Ah, I like that, there is a section in Tableau for Initial SQL to be executed 
upon connecting this would fit well there.  I guess I will need to issue a 
collect(), coalesce(1,true).saveAsTextFile(...) or use repartition(1), as the 
file currently is being broken into multiple parts.   While this works in the 
spark-shell:


val test = sqlContext.jsonFile("/data/out/")  // returns all parts back as one

It seems to fail in just spark-sql:

create temporary table test
using org.apache.spark.sql.json
options (path '/data/out/')
cache table test

with:

[Simba][SparkODBC] (35) Error from Spark: error code: '0' error message: 
'org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: create 
temporary table test using
org.apache.spark.sql.json
options (path '/data/out/')
cache table test'.

Initial SQL Error. Check that the syntax is correct and that you have access 
privileges to the requested database.

Thanks again for the suggestion and I will give work with it a bit more 
tomorrow.

-Todd



On Tue, Feb 10, 2015 at 5:48 PM, Silvio Fiorito 
silvio.fior...@granturing.com wrote:
Hi Todd,

What you could do is run some SparkSQL commands immediately after the Thrift 
server starts up. Or does Tableau have some init SQL commands you could run?


You can actually load data using SQL, such as:

create temporary table people using org.apache.spark.sql.json options (path 
'examples/src/main/resources/people.json’)
cache table people

create temporary table users using org.apache.spark.sql.parquet options (path 
'examples/src/main/resources/users.parquet’)
cache table users

From: Todd Nist
Date: Tuesday, February 10, 2015 at 3:03 PM
To: user@spark.apache.org
Subject: SparkSQL + Tableau Connector

Hi,

I'm trying to understand how and what the Tableau connector to SparkSQL is able 
to access.  My understanding is it needs to connect to the thriftserver and I 
am not sure how or if it exposes parquet, json, schemaRDDs, or does it only 
expose schemas defined in the metastore / hive.

For example, I do the following from the spark-shell which generates a 
schemaRDD from a csv file and saves it as a JSON file as well as a parquet file.


import org.apache.spark.sql.SQLContext
import com.databricks.spark.csv._

val sqlContext = new SQLContext(sc)
val test = sqlContext.csvFile("/data/test.csv")
test.toJSON.saveAsTextFile("/data/out")
test.saveAsParquetFile("/data/out")

When I connect from Tableau, the only thing I see is the default schema and 
nothing in the tables section.

So my questions are:

1.  Can the connector fetch or query schemaRDD's saved to Parquet or JSON files?
2.  Do I need to do something to expose these via hive / metastore other than 
creating a table in hive?
3.  Does the thriftserver need to be configured to expose these in some 
fashion, sort of related to question 2.

TIA for the assistance.

-Todd



Re: SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
Arush,

As for #2 do you mean something like this from the docs:

// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
// Queries are expressed in HiveQL
sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)

Or did you have something else in mind?

-Todd


On Tue, Feb 10, 2015 at 6:35 PM, Todd Nist tsind...@gmail.com wrote:

 Arush,

 Thank you will take a look at that approach in the morning.  I sort of
 figured the answer to #1 was NO and that I would need to do 2 and 3 thanks
 for clarifying it for me.

 -Todd

 On Tue, Feb 10, 2015 at 5:24 PM, Arush Kharbanda 
 ar...@sigmoidanalytics.com wrote:

 1.  Can the connector fetch or query schemaRDD's saved to Parquet or JSON
 files? NO
 2.  Do I need to do something to expose these via hive / metastore other
 than creating a table in hive? Create a table in spark sql to expose via
 spark sql
 3.  Does the thriftserver need to be configured to expose these in some
 fashion, sort of related to question 2 you would need to configure thrift
 to read from the metastore you expect it read from - by default it reads
 from metastore_db directory present in the directory used to launch the
 thrift server.
  On 11 Feb 2015 01:35, Todd Nist tsind...@gmail.com wrote:

 Hi,

 I'm trying to understand how and what the Tableau connector to SparkSQL
 is able to access.  My understanding is it needs to connect to the
 thriftserver and I am not sure how or if it exposes parquet, json,
 schemaRDDs, or does it only expose schemas defined in the metastore / hive.


 For example, I do the following from the spark-shell which generates a
 schemaRDD from a csv file and saves it as a JSON file as well as a parquet
 file.

 import org.apache.spark.sql.SQLContext
 import com.databricks.spark.csv._

 val sqlContext = new SQLContext(sc)
 val test = sqlContext.csvFile("/data/test.csv")
 test.toJSON.saveAsTextFile("/data/out")
 test.saveAsParquetFile("/data/out")

 When I connect from Tableau, the only thing I see is the default
 schema and nothing in the tables section.

 So my questions are:

 1.  Can the connector fetch or query schemaRDD's saved to Parquet or
 JSON files?
 2.  Do I need to do something to expose these via hive / metastore other
 than creating a table in hive?
 3.  Does the thriftserver need to be configured to expose these in some
 fashion, sort of related to question 2.

 TIA for the assistance.

 -Todd





Re: SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
Arush,

Thank you, I will take a look at that approach in the morning.  I sort of
figured the answer to #1 was NO and that I would need to do 2 and 3; thanks
for clarifying it for me.

-Todd

On Tue, Feb 10, 2015 at 5:24 PM, Arush Kharbanda ar...@sigmoidanalytics.com
 wrote:

 1.  Can the connector fetch or query schemaRDD's saved to Parquet or JSON
 files? NO
 2.  Do I need to do something to expose these via hive / metastore other
 than creating a table in hive? Create a table in spark sql to expose via
 spark sql
 3.  Does the thriftserver need to be configured to expose these in some
 fashion, sort of related to question 2 you would need to configure thrift
 to read from the metastore you expect it read from - by default it reads
 from metastore_db directory present in the directory used to launch the
 thrift server.
  On 11 Feb 2015 01:35, Todd Nist tsind...@gmail.com wrote:

 Hi,

 I'm trying to understand how and what the Tableau connector to SparkSQL
 is able to access.  My understanding is it needs to connect to the
 thriftserver and I am not sure how or if it exposes parquet, json,
 schemaRDDs, or does it only expose schemas defined in the metastore / hive.


 For example, I do the following from the spark-shell which generates a
 schemaRDD from a csv file and saves it as a JSON file as well as a parquet
 file.

 import org.apache.spark.sql.SQLContext
 import com.databricks.spark.csv._

 val sqlContext = new SQLContext(sc)
 val test = sqlContext.csvFile("/data/test.csv")
 test.toJSON.saveAsTextFile("/data/out")
 test.saveAsParquetFile("/data/out")

 When I connect from Tableau, the only thing I see is the default schema
 and nothing in the tables section.

 So my questions are:

 1.  Can the connector fetch or query schemaRDD's saved to Parquet or JSON
 files?
 2.  Do I need to do something to expose these via hive / metastore other
 than creating a table in hive?
 3.  Does the thriftserver need to be configured to expose these in some
 fashion, sort of related to question 2.

 TIA for the assistance.

 -Todd




Re: SparkSQL + Tableau Connector

2015-02-10 Thread Todd Nist
Hi Silvio,

Ah, I like that: there is a section in Tableau for Initial SQL to be
executed upon connecting, and this would fit well there.  I guess I will need to
issue a collect(), coalesce(1,true).saveAsTextFile(...) or use
repartition(1), as the file is currently being broken into multiple parts
(see the sketch at the end of this message).  While this works in the spark-shell:

val test = sqlContext.jsonFile("/data/out/")  // returns all parts back as one
It seems to fail in just spark-sql:

create temporary table test
using org.apache.spark.sql.json
options (path '/data/out/')
cache table test

with:

[Simba][SparkODBC] (35) Error from Spark: error code: '0' error message:
'org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: create
temporary table test using
org.apache.spark.sql.json
options (path '/data/out/')
cache table test'.

Initial SQL Error. Check that the syntax is correct and that you have
access privileges to the requested database.

Thanks again for the suggestion; I will work with it a bit more tomorrow.

-Todd
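
For reference, a hedged sketch of the repartition(1) route mentioned above
(Spark 1.2-era API; the output path is a placeholder, not from this thread):

// read back all the part files, then rewrite them as a single part file so
// the json data source used in the Initial SQL only has one file to pick up
val combined = sqlContext.jsonFile("/data/out/")
combined.toJSON.repartition(1).saveAsTextFile("/data/out-single")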



On Tue, Feb 10, 2015 at 5:48 PM, Silvio Fiorito 
silvio.fior...@granturing.com wrote:

   Hi Todd,

  What you could do is run some SparkSQL commands immediately after the
 Thrift server starts up. Or does Tableau have some init SQL commands you
 could run?


  You can actually load data using SQL, such as:

  create temporary table people using org.apache.spark.sql.json options
 (path 'examples/src/main/resources/people.json’)
 cache table people

  create temporary table users using org.apache.spark.sql.parquet options
 (path 'examples/src/main/resources/users.parquet’)
 cache table users

   From: Todd Nist
 Date: Tuesday, February 10, 2015 at 3:03 PM
 To: user@spark.apache.org
 Subject: SparkSQL + Tableau Connector

   Hi,

  I'm trying to understand how and what the Tableau connector to SparkSQL
 is able to access.  My understanding is it needs to connect to the
 thriftserver and I am not sure how or if it exposes parquet, json,
 schemaRDDs, or does it only expose schemas defined in the metastore / hive.


  For example, I do the following from the spark-shell which generates a
 schemaRDD from a csv file and saves it as a JSON file as well as a parquet
 file.

 import org.apache.spark.sql.SQLContext
 import com.databricks.spark.csv._

 val sqlContext = new SQLContext(sc)
 val test = sqlContext.csvFile("/data/test.csv")
 test.toJSON.saveAsTextFile("/data/out")
 test.saveAsParquetFile("/data/out")

  When I connect from Tableau, the only thing I see is the default
 schema and nothing in the tables section.

 So my questions are:

 1.  Can the connector fetch or query schemaRDD's saved to Parquet or JSON
 files?
 2.  Do I need to do something to expose these via hive / metastore other
 than creating a table in hive?
 3.  Does the thriftserver need to be configured to expose these in some
 fashion, sort of related to question 2.

 TIA for the assistance.

 -Todd