Hi Todd,

What you could do is run some SparkSQL commands immediately after the Thrift 
server starts up. Or does Tableau have some init SQL commands you could run?
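
If you control how the Thrift server is launched, another option is to start it from your own driver program after registering the tables you want Tableau to see. Here's a minimal sketch, assuming Spark 1.2+'s HiveThriftServer2.startWithContext; the app name, path, and table name are just illustrative:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

object ThriftWithTables {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("thrift-with-tables"))
    val hive = new HiveContext(sc)

    // Register and cache the data before the server accepts JDBC clients.
    hive.sql("""create temporary table people
               |using org.apache.spark.sql.json
               |options (path 'examples/src/main/resources/people.json')""".stripMargin)
    hive.sql("cache table people")

    // Anything registered on this context is visible to JDBC/ODBC clients
    // such as Tableau.
    HiveThriftServer2.startWithContext(hive)
  }
}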


You can actually load data using SQL, such as:

create temporary table people using org.apache.spark.sql.json options (path 'examples/src/main/resources/people.json')
cache table people

create temporary table users using org.apache.spark.sql.parquet options (path 'examples/src/main/resources/users.parquet')
cache table users
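
If your Tableau version has an initial-SQL option, that would be the natural place for these statements; otherwise you can issue the same statements yourself over JDBC right after the server comes up, which is effectively what an init script would do. A rough sketch, assuming the Thrift server is listening on the default localhost:10000 and the Hive JDBC driver is on the classpath:

import java.sql.DriverManager

// Connect to the running Thrift server the same way Tableau would.
Class.forName("org.apache.hive.jdbc.HiveDriver")
val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
val stmt = conn.createStatement()

// Register and cache the table; since the Thrift server shares one context,
// other JDBC sessions (such as Tableau's) should then see it.
stmt.execute("""create temporary table users
               |using org.apache.spark.sql.parquet
               |options (path 'examples/src/main/resources/users.parquet')""".stripMargin)
stmt.execute("cache table users")
conn.close()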

From: Todd Nist
Date: Tuesday, February 10, 2015 at 3:03 PM
To: "user@spark.apache.org<mailto:user@spark.apache.org>"
Subject: SparkSQL + Tableau Connector

Hi,

I'm trying to understand what the Tableau connector to SparkSQL is able to
access. My understanding is that it needs to connect to the thriftserver, but
I'm not sure whether it can expose Parquet files, JSON files, and schemaRDDs,
or whether it only exposes schemas defined in the Hive metastore.

For example, I do the following from the spark-shell, which generates a
schemaRDD from a CSV file and saves it both as a JSON file and as a Parquet file.


import org.apache.spark.sql.SQLContext
import com.databricks.spark.csv._

val sqlContext = new SQLContext(sc)
// Load the CSV into a schemaRDD, then write it out as JSON and as Parquet
// (two separate output directories, since a save fails if the path exists).
val test = sqlContext.csvFile("/data/test.csv")
test.toJSON.saveAsTextFile("/data/out-json")
test.saveAsParquetFile("/data/out-parquet")

When I connect from Tableau, the only thing I see is the "default" schema and 
nothing in the tables section.

So my questions are:

1.  Can the connector fetch or query schemaRDDs saved to Parquet or JSON files?
2.  Do I need to do something to expose these via Hive / the metastore, other
than creating a table in Hive?
3.  Does the thriftserver need to be configured to expose these in some
fashion? This is sort of related to question 2.

TIA for the assistance.

-Todd
