Re: SparkSQL + Tableau Connector

Arush Kharbanda Tue, 10 Feb 2015 23:29:16 -0800

I am a little confused here. Why do you want to create the tables in Hive?
You want to create the tables in spark-sql, right?


If you are not able to find the same tables through Tableau, then thrift is
connecting to a different metastore than your spark-shell.

One way to specify a metastore to thrift is to provide the path to
hive-site.xml while starting thrift using --files hive-site.xml.

Similarly, you can specify the same metastore to your spark-submit or
spark-shell using the same option.
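
As a rough illustration of creating the table from Spark SQL rather than from Hive, here is a minimal sketch for spark-shell. It assumes a Spark 1.2-era API (HiveContext, parquetFile, saveAsTable), that /data/out holds only the Parquet output from earlier in the thread, and that the shell uses the same hive-site.xml as thrift. The table name "test" is hypothetical.

```scala
import org.apache.spark.sql.hive.HiveContext

// sc is the SparkContext that spark-shell provides.
val hiveContext = new HiveContext(sc)

// Load the Parquet output written earlier in the thread
// (assumption: /data/out contains only Parquet files).
val test = hiveContext.parquetFile("/data/out")

// Persist it as a table in the metastore; "test" is a hypothetical name.
// This fails if a table with that name already exists.
test.saveAsTable("test")

// List the tables known to this metastore to confirm the new one is there.
hiveContext.sql("SHOW TABLES").collect().foreach(println)
```

If the table shows up here but not in Tableau, that again points to thrift reading a different metastore.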



On Wed, Feb 11, 2015 at 5:23 AM, Todd Nist <tsind...@gmail.com> wrote:

> Arush,
>
> As for #2 do you mean something like this from the docs:
>
> // sc is an existing SparkContext.
> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
> sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
> sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
>
> // Queries are expressed in HiveQL
> sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)
>
> Or did you have something else in mind?
>
> -Todd
>
>
> On Tue, Feb 10, 2015 at 6:35 PM, Todd Nist <tsind...@gmail.com> wrote:
>
>> Arush,
>>
>> Thank you, I will take a look at that approach in the morning.  I sort of
>> figured the answer to #1 was NO and that I would need to do 2 and 3. Thanks
>> for clarifying it for me.
>>
>> -Todd
>>
>> On Tue, Feb 10, 2015 at 5:24 PM, Arush Kharbanda <
>> ar...@sigmoidanalytics.com> wrote:
>>
>>> 1.  Can the connector fetch or query schemaRDD's saved to Parquet or
>>> JSON files? NO
>>> 2.  Do I need to do something to expose these via hive / metastore other
>>> than creating a table in hive? Create a table in Spark SQL to expose it
>>> via Spark SQL.
>>> 3.  Does the thriftserver need to be configured to expose these in some
>>> fashion, sort of related to question 2? You would need to configure thrift
>>> to read from the metastore you expect it to read from. By default it reads
>>> from the metastore_db directory present in the directory used to launch the
>>> thrift server.
>>>  On 11 Feb 2015 01:35, "Todd Nist" <tsind...@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm trying to understand how and what the Tableau connector to SparkSQL
>>>> is able to access.  My understanding is it needs to connect to the
>>>> thriftserver and I am not sure how or if it exposes parquet, json,
>>>> schemaRDDs, or does it only expose schemas defined in the metastore / hive.
>>>>
>>>>
>>>> For example, I do the following from the spark-shell which generates a
>>>> schemaRDD from a csv file and saves it as a JSON file as well as a parquet
>>>> file.
>>>>
>>>> import org.apache.spark.sql.SQLContext
>>>> import com.databricks.spark.csv._
>>>> val sqlContext = new SQLContext(sc)
>>>> val test = sqlContext.csvFile("/data/test.csv")
>>>> test.toJSON().saveAsTextFile("/data/out")
>>>> test.saveAsParquetFile("/data/out")
>>>>
>>>> When I connect from Tableau, the only thing I see is the "default"
>>>> schema and nothing in the tables section.
>>>>
>>>> So my questions are:
>>>>
>>>> 1.  Can the connector fetch or query schemaRDD's saved to Parquet or
>>>> JSON files?
>>>> 2.  Do I need to do something to expose these via hive / metastore
>>>> other than creating a table in hive?
>>>> 3.  Does the thriftserver need to be configured to expose these in some
>>>> fashion, sort of related to question 2?
>>>>
>>>> TIA for the assistance.
>>>>
>>>> -Todd
>>>>
>>>
>>
>


-- 

*Arush Kharbanda* || Technical Teamlead

ar...@sigmoidanalytics.com || www.sigmoidanalytics.com
