I'm using MySQL as the metastore DB with Spark 1.2. I simply copied 
hive-site.xml to /etc/spark/ and added the MySQL JDBC JAR to spark-env.sh in 
/etc/spark/, and everything works fine now.
My setup looks like this.
Tableau => Spark ThriftServer2 => HiveServer2
It's talking to Tableau Desktop 8.3. Interestingly, when I query a Hive table, 
it still sends Hive queries to HiveServer2, which runs them on the MR or Tez 
engine. Is this expected?
I thought it would at least use the Catalyst engine and read from the 
underlying HDFS directly, the way the HiveContext API does when it pulls data 
into an RDD. Did I misunderstand the purpose of Spark ThriftServer2?
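
One way to check which engine actually plans a query is to send an EXPLAIN 
through the same JDBC endpoint Tableau uses and read the plan text. The 
following is a minimal sketch, not a definitive diagnostic. It assumes the 
Hive JDBC driver is on the classpath; the object name, host, port, user, and 
the `src` table in the usage note are placeholders.

```scala
import java.sql.DriverManager
import scala.collection.mutable.ArrayBuffer

object ThriftPlanCheck {
  // Builds a HiveServer2-protocol JDBC URL; host, port and database are placeholders.
  def thriftJdbcUrl(host: String, port: Int, database: String): String =
    s"jdbc:hive2://$host:$port/$database"

  // Runs EXPLAIN for the given query and returns the plan text line by line.
  // Assumes the Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) is on the classpath.
  def explain(url: String, user: String, query: String): Seq[String] = {
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection(url, user, "")
    try {
      val stmt = conn.createStatement()
      try {
        val rs = stmt.executeQuery(s"EXPLAIN $query")
        val lines = ArrayBuffer.empty[String]
        while (rs.next()) lines += rs.getString(1)
        lines.toList
      } finally stmt.close()
    } finally conn.close()
  }
}
```

For example, `ThriftPlanCheck.explain(ThriftPlanCheck.thriftJdbcUrl("localhost", 
10000, "default"), "hive", "SELECT key, value FROM src").foreach(println)`. 
A plan produced by Spark SQL normally lists Spark physical operators, while 
HiveServer2 prints Hive stage information, so the output should hint at which 
server the connection really reaches.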

Date: Wed, 11 Feb 2015 16:07:40 +0530
Subject: Re: SparkSQL + Tableau Connector
From: ar...@sigmoidanalytics.com
To: tsind...@gmail.com
CC: user@spark.apache.org

I used this. It uses an embedded driver, which is not a good approach, but it 
works. You can also configure it for some other metastore type. I have not 
tried the metastore URIs.

  <description>URL for the DB</description>

<!-- <property>
  <description>IP address (or fully-qualified domain name) and port of the 
metastore host</description>
</property> -->


On Wed, Feb 11, 2015 at 3:59 PM, Todd Nist <tsind...@gmail.com> wrote:
Hi Arush,
So yes, I want to create the tables through Spark SQL. I have placed the 
hive-site.xml file inside the $SPARK_HOME/conf directory; I thought that was 
all I should need to do to have the thriftserver use it. Perhaps my 
hive-site.xml is wrong. It currently looks like this:

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- Ensure that the following statement points to the Hive Metastore URI in your cluster -->
    <value>thrift://sandbox.hortonworks.com:9083</value>
    <description>URI for client to contact metastore server</description>
  </property>
</configuration>

Which leads me to believe it is going to pull from the thriftserver from 
Horton? I will go look at the docs to see if this is right; it is what Horton 
says to do. Do you have an example hive-site.xml by chance that works with 
Spark SQL?
I am using Tableau 8.3 with the SparkSQL Connector.
Thanks for the assistance.
On Wed, Feb 11, 2015 at 2:34 AM, Arush Kharbanda <ar...@sigmoidanalytics.com> 
wrote:
BTW, what Tableau connector are you using?
On Wed, Feb 11, 2015 at 12:55 PM, Arush Kharbanda <ar...@sigmoidanalytics.com> 
wrote:
I am a little confused here. Why do you want to create the tables in Hive? You 
want to create the tables in Spark SQL, right?
If you are not able to find the same tables through Tableau, then thrift is 
connecting to a different metastore than your spark-shell.
One way to specify a metastore to thrift is to provide the path to 
hive-site.xml while starting thrift, using --files hive-site.xml.
Similarly, you can specify the same metastore to your spark-submit or 
spark-shell using the same option.

On Wed, Feb 11, 2015 at 5:23 AM, Todd Nist <tsind...@gmail.com> wrote:
As for #2, do you mean something like this from the docs:

// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")

// Queries are expressed in HiveQL
sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)

Or did you have something else in mind?

On Tue, Feb 10, 2015 at 6:35 PM, Todd Nist <tsind...@gmail.com> wrote:
Thank you, I will take a look at that approach in the morning. I sort of 
figured the answer to #1 was NO and that I would need to do 2 and 3. Thanks 
for clarifying it for me.
On Tue, Feb 10, 2015 at 5:24 PM, Arush Kharbanda <ar...@sigmoidanalytics.com> 
wrote:
1.  Can the connector fetch or query schemaRDDs saved to Parquet or JSON 
files? NO

2.  Do I need to do something to expose these via hive / metastore other than 
creating a table in hive? Create a table in Spark SQL to expose it via Spark 
SQL.

3.  Does the thriftserver need to be configured to expose these in some 
fashion, sort of related to question 2? You would need to configure thrift to 
read from the metastore you expect it to read from. By default it reads from 
the metastore_db directory present in the directory used to launch the thrift 
server.
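
The answer to question 2 can be sketched as a small Spark 1.2 job that loads 
an existing Parquet file through a HiveContext and persists it as a metastore 
table. This is an illustration under assumptions, not a tested recipe: the 
object name, the Parquet path, and the table name `test` are hypothetical, and 
`saveAsTable` is an experimental method in 1.2 that applies to SchemaRDDs 
created from a HiveContext.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

object ExposeParquetAsTable {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("expose-parquet-as-table"))

    // HiveContext picks up hive-site.xml from the classpath (for example
    // $SPARK_HOME/conf), so it should use the same metastore as the Thrift server.
    val hiveContext = new HiveContext(sc)

    // Hypothetical path to a Parquet file written earlier.
    val test = hiveContext.parquetFile("/data/test.parquet")

    // Persist the rows as a metastore table named "test"; fails if it already exists.
    test.saveAsTable("test")

    sc.stop()
  }
}
```

If the Thrift server reads the same hive-site.xml, the `test` table should 
then appear under the "default" schema for JDBC/ODBC clients.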

On 11 Feb 2015 01:35, "Todd Nist" <tsind...@gmail.com> wrote:
I'm trying to understand how and what the Tableau connector to SparkSQL is able 
to access. My understanding is that it needs to connect to the thriftserver, 
and I am not sure how, or if, it exposes Parquet, JSON, or schemaRDDs, or 
whether it only exposes schemas defined in the metastore / Hive.
For example, I do the following from the spark-shell, which generates a 
schemaRDD from a CSV file and saves it as a JSON file as well as a Parquet file.
import org.apache.spark.sql.SQLContext
import com.databricks.spark.csv._

// sc is the SparkContext provided by spark-shell.
val sqlContext = new SQLContext(sc)
val test = sqlContext.csvFile("/data/test.csv")
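
The save step described above does not appear in the message. A minimal 
spark-shell sketch of it, assuming the Spark 1.2 SchemaRDD API and the 
spark-csv package, could look like this; both output paths are hypothetical.

```scala
import org.apache.spark.sql.SQLContext
import com.databricks.spark.csv._

// sc is the SparkContext that spark-shell provides.
val sqlContext = new SQLContext(sc)
val test = sqlContext.csvFile("/data/test.csv")

// Write one JSON document per row as text files (hypothetical output directory).
test.toJSON.saveAsTextFile("/data/test_json")

// Write the same rows as a Parquet file (hypothetical output directory).
test.saveAsParquetFile("/data/test.parquet")
```

Neither call registers anything in the metastore; both only write files.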

When I connect from Tableau, the only thing I see is the "default" schema and 
nothing in the tables section.
So my questions are:

1.  Can the connector fetch or query schemaRDDs saved to Parquet or JSON files?
2.  Do I need to do something to expose these via hive / metastore other than 
creating a table in hive?
3.  Does the thriftserver need to be configured to expose these in some 
fashion? This is sort of related to question 2.
TIA for the assistance.
