Re: SparkSQL + Tableau Connector

Todd Nist Tue, 10 Feb 2015 18:24:58 -0800

Hi Silvio,

So the "Initial SQL" is executing now, I did not have the "*" added that
and it worked fine. FWIW the "*" is not needed for the parquet files:


create temporary table test
using org.apache.spark.sql.json
options (path '/data/out/*')
;

cache table test;

select count(1) from test;
Unfortunately while the table is created and cached, i can see the
statements being executed in the log file of spark, it is not associated
with any schema at least that is being picked up by the Tableau Connector.
So unless there is someway to associate it with a given schema I think I'm
at a dead end on this one.  Anything I may be missing here?

Thanks for the help, it is much appreciated.  I will give Arush suggestion
a try tomorrow.

-Todd

On Tue, Feb 10, 2015 at 7:24 PM, Silvio Fiorito <
silvio.fior...@granturing.com> wrote:

>  Todd,
>
>  I just tried it in bin/spark-sql shell. I created a folder *json *and
> just put 2 copies of the same people.json file
>
>  This is what I ran:
>
>  spark-sql> create temporary table people
>          > using org.apache.spark.sql.json
>          > options (path 'examples/src/main/resources/json/*')
>          > ;
> Time taken: 0.34 seconds
>
> spark-sql> select * from people;
> NULL    Michael
> 30  Andy
> 19  Justin
> NULL    Michael
> 30  Andy
> 19  Justin
> Time taken: 0.576 seconds
>
>   From: Todd Nist
> Date: Tuesday, February 10, 2015 at 6:49 PM
> To: Silvio Fiorito
> Cc: "user@spark.apache.org"
> Subject: Re: SparkSQL + Tableau Connector
>
>   Hi Silvio,
>
>  Ah, I like that, there is a section in Tableau for "Initial SQL" to be
> executed upon connecting this would fit well there.  I guess I will need to
> issue a collect(), coalesce(1,true).saveAsTextFile(...) or use
> repartition(1), as the file currently is being broken into multiple parts.
>   While this works in the spark-shell:
>
>  val test = sqlContext.jsonFile("/data/out/“)  // returs all parts back
> as one
>  It seems to fail in just spark-sql:
>
>  create temporary table test
> using org.apache.spark.sql.json
> options (path '/data/out/')
> cache table test
>
>  with:
>
> [Simba][SparkODBC] (35) Error from Spark: error code: '0' error message:
> 'org.apache.spark.sql.hive.HiveQl$ParseException: Failed to parse: create
> temporary table test using
> org.apache.spark.sql.json
> options (path '/data/out/')
> cache table test'.
>
>  Initial SQL Error. Check that the syntax is correct and that you have
> access privileges to the requested database.
>
>  Thanks again for the suggestion and I will give work with it a bit more
> tomorrow.
>
>  -Todd
>
>
>
> On Tue, Feb 10, 2015 at 5:48 PM, Silvio Fiorito <
> silvio.fior...@granturing.com> wrote:
>
>>   Hi Todd,
>>
>>  What you could do is run some SparkSQL commands immediately after the
>> Thrift server starts up. Or does Tableau have some init SQL commands you
>> could run?
>>
>>
>>  You can actually load data using SQL, such as:
>>
>>  create temporary table people using org.apache.spark.sql.json options
>> (path 'examples/src/main/resources/people.json’)
>> cache table people
>>
>>  create temporary table users using org.apache.spark.sql.parquet options
>> (path 'examples/src/main/resources/users.parquet’)
>> cache table users
>>
>>   From: Todd Nist
>> Date: Tuesday, February 10, 2015 at 3:03 PM
>> To: "user@spark.apache.org"
>> Subject: SparkSQL + Tableau Connector
>>
>>   Hi,
>>
>>  I'm trying to understand how and what the Tableau connector to SparkSQL
>> is able to access.  My understanding is it needs to connect to the
>> thriftserver and I am not sure how or if it exposes parquet, json,
>> schemaRDDs, or does it only expose schemas defined in the metastore / hive.
>>
>>
>>  For example, I do the following from the spark-shell which generates a
>> schemaRDD from a csv file and saves it as a JSON file as well as a parquet
>> file.
>>
>>   import *org.apache.sql.SQLContext
>> *import com.databricks.spark.csv._
>> val sqlContext = new SQLContext(sc)
>> val test = 
>> sqlContext.csfFile("/data/test.csv")test.toJSON().saveAsTextFile("/data/out")
>> test.saveAsParquetFile("/data/out")
>>
>>   When I connect from Tableau, the only thing I see is the "default"
>> schema and nothing in the tables section.
>>
>> So my questions are:
>>
>> 1.  Can the connector fetch or query schemaRDD's saved to Parquet or JSON
>> files?
>> 2.  Do I need to do something to expose these via hive / metastore other
>> than creating a table in hive?
>> 3.  Does the thriftserver need to be configured to expose these in some
>> fashion, sort of related to question 2.
>>
>> TIA for the assistance.
>>
>> -Todd
>>
>
>

Re: SparkSQL + Tableau Connector

Reply via email to