Arush,

As for #2, do you mean something like this, from the docs:
// sc is an existing SparkContext.
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")

// Queries are expressed in HiveQL
sqlContext.sql("FROM src SELECT key, value").collect().foreach(println)

Or did you have something else in mind?

-Todd

On Tue, Feb 10, 2015 at 6:35 PM, Todd Nist <tsind...@gmail.com> wrote:

> Arush,
>
> Thank you, will take a look at that approach in the morning. I sort of
> figured the answer to #1 was NO and that I would need to do 2 and 3;
> thanks for clarifying it for me.
>
> -Todd
>
> On Tue, Feb 10, 2015 at 5:24 PM, Arush Kharbanda <
> ar...@sigmoidanalytics.com> wrote:
>
>> 1. Can the connector fetch or query schemaRDD's saved to Parquet or JSON
>> files? NO.
>>
>> 2. Do I need to do something to expose these via hive / metastore other
>> than creating a table in hive? Create a table in Spark SQL to expose it
>> via Spark SQL.
>>
>> 3. Does the thriftserver need to be configured to expose these in some
>> fashion? Sort of related to question 2: you would need to configure
>> thrift to read from the metastore you expect it to read from. By default
>> it reads from the metastore_db directory present in the directory used
>> to launch the thrift server.
>>
>> On 11 Feb 2015 01:35, "Todd Nist" <tsind...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to understand how and what the Tableau connector to SparkSQL
>>> is able to access. My understanding is that it needs to connect to the
>>> thriftserver, and I am not sure how or whether it exposes parquet, json,
>>> and schemaRDDs, or if it only exposes schemas defined in the metastore /
>>> hive.
>>>
>>> For example, I do the following from the spark-shell, which generates a
>>> schemaRDD from a csv file and saves it as a JSON file as well as a
>>> parquet file.
>>>
>>> import org.apache.spark.sql.SQLContext
>>> import com.databricks.spark.csv._
>>>
>>> val sqlContext = new SQLContext(sc)
>>> val test = sqlContext.csvFile("/data/test.csv")
>>> test.toJSON.saveAsTextFile("/data/out")
>>> test.saveAsParquetFile("/data/out")
>>>
>>> When I connect from Tableau, the only thing I see is the "default"
>>> schema and nothing in the tables section.
>>>
>>> So my questions are:
>>>
>>> 1. Can the connector fetch or query schemaRDD's saved to Parquet or
>>> JSON files?
>>> 2. Do I need to do something to expose these via hive / metastore other
>>> than creating a table in hive?
>>> 3. Does the thriftserver need to be configured to expose these in some
>>> fashion, sort of related to question 2.
>>>
>>> TIA for the assistance.
>>>
>>> -Todd
>>>
>>
>
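
A minimal sketch of what Arush's answers to #2 and #3 suggest, for reference: load the parquet output above back as a schemaRDD through a HiveContext and persist it into the Hive metastore, so the thrift server (and therefore Tableau) can see it. The table name "test_parquet" is just a placeholder, and this assumes the Spark 1.2-era SchemaRDD API and a working Hive metastore; treat it as a sketch rather than a verified recipe.

// A plain SQLContext only gives in-memory temp tables, which a separately
// running thrift server cannot see, so use a HiveContext instead.
import org.apache.spark.sql.hive.HiveContext

val hiveContext = new HiveContext(sc)

// Re-load the parquet output from the example above as a schemaRDD.
val test = hiveContext.parquetFile("/data/out")

// Persist it into the Hive metastore; "test_parquet" is an arbitrary name.
test.saveAsTable("test_parquet")

// The table should now be listed for any client using the same metastore.
hiveContext.sql("SHOW TABLES").collect().foreach(println)

Per Arush's point on #3, the thrift server then needs to point at that same metastore: either start sbin/start-thriftserver.sh from the same working directory (so it picks up the same metastore_db), or give both the shell and the thrift server a shared hive-site.xml.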