Arush,

As for #2, do you mean something like this, from the docs:
// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")

// Queries are expressed in HiveQL
sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)

Or did you have something else in mind?

-Todd

On Tue, Feb 10, 2015 at 6:35 PM, Todd Nist <tsind...@gmail.com> wrote:

> Arush,
>
> Thank you, will take a look at that approach in the morning. I sort of
> figured the answer to #1 was NO and that I would need to do 2 and 3;
> thanks for clarifying it for me.
>
> -Todd
>
> On Tue, Feb 10, 2015 at 5:24 PM, Arush Kharbanda <
> ar...@sigmoidanalytics.com> wrote:
>
>> 1. Can the connector fetch or query schemaRDD's saved to Parquet or JSON
>> files? NO.
>>
>> 2. Do I need to do something to expose these via hive / metastore other
>> than creating a table in hive? Create a table in Spark SQL to expose it
>> via Spark SQL.
>>
>> 3. Does the thriftserver need to be configured to expose these in some
>> fashion? Sort of related to question 2: you would need to configure
>> thrift to read from the metastore you expect it to read from. By default
>> it reads from the metastore_db directory present in the directory used
>> to launch the thrift server.
>>
>> On 11 Feb 2015 01:35, "Todd Nist" <tsind...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to understand how and what the Tableau connector to SparkSQL
>>> is able to access. My understanding is that it needs to connect to the
>>> thriftserver, and I am not sure how or whether it exposes parquet, json,
>>> and schemaRDDs, or if it only exposes schemas defined in the metastore /
>>> hive.
>>>
>>> For example, I do the following from the spark-shell, which generates a
>>> schemaRDD from a csv file and saves it as a JSON file as well as a
>>> parquet file.
>>>
>>> import org.apache.spark.sql.SQLContext
>>> import com.databricks.spark.csv._
>>>
>>> val sqlContext = new SQLContext(sc)
>>> val test = sqlContext.csvFile("/data/test.csv")
>>> test.toJSON.saveAsTextFile("/data/out")
>>> test.saveAsParquetFile("/data/out")
>>>
>>> When I connect from Tableau, the only thing I see is the "default"
>>> schema and nothing in the tables section.
>>>
>>> So my questions are:
>>>
>>> 1. Can the connector fetch or query schemaRDD's saved to Parquet or
>>> JSON files?
>>> 2. Do I need to do something to expose these via hive / metastore other
>>> than creating a table in hive?
>>> 3. Does the thriftserver need to be configured to expose these in some
>>> fashion, sort of related to question 2.
>>>
>>> TIA for the assistance.
>>>
>>> -Todd
>>>
>>
>
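
A minimal sketch of what Arush's answers to #2 and #3 suggest, for reference: load the parquet output above back as a schemaRDD through a HiveContext and persist it into the Hive metastore, so the thrift server (and therefore Tableau) can see it. The table name "test_parquet" is just a placeholder, and this assumes the Spark 1.2-era SchemaRDD API and a working Hive metastore; treat it as a sketch rather than a verified recipe.

// A plain SQLContext only gives in-memory temp tables, which a separately
// running thrift server cannot see, so use a HiveContext instead.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Re-load the parquet output from the example above as a schemaRDD.
val test = hiveContext.parquetFile("/data/out")

// Persist it into the Hive metastore; "test_parquet" is an arbitrary name.
test.saveAsTable("test_parquet")

// The table should now be listed for any client using the same metastore.
hiveContext.sql("SHOW TABLES").collect().foreach(println)

Per Arush's point on #3, the thrift server then needs to point at that same metastore: either start sbin/start-thriftserver.sh from the same working directory (so it picks up the same metastore_db), or give both the shell and the thrift server a shared hive-site.xml.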