Re: Spark SQL and running parquet tables?

2014-09-12 Thread DanteSama
So, after toying around a bit, here's what I ended up with. First off, there's no function registerTempTable -- registerAsTable seems to be enough (it's the same whether called directly on a SchemaRDD or on a SQLContext being passed an RDD). The problem I ran into after that was reloading a table in…
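For reference, here's roughly what that registration looks like on the 1.0.x API. This is a minimal sketch only; the app name, path, and table name are made up:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Hypothetical setup; path and table name are placeholders.
    val sc = new SparkContext(new SparkConf().setAppName("parquet-tables"))
    val sqlContext = new SQLContext(sc)

    // Load a Parquet file as a SchemaRDD and register it under a table name.
    val events = sqlContext.parquetFile("hdfs:///data/events.parquet")
    events.registerAsTable("events")

    // Equivalent: register through the context, passing the RDD explicitly.
    sqlContext.registerRDDAsTable(events, "events")

    // The registered table is now queryable by name.
    sqlContext.sql("SELECT COUNT(*) FROM events").collect().foreach(println)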

Re: Spark SQL and running parquet tables?

2014-09-12 Thread DanteSama
Turns out it was Spray with a bad route -- the results weren't updating despite the table being reloaded. This thread can be ignored.

Re: Spark SQL and running parquet tables?

2014-09-11 Thread DanteSama
Michael Armbrust wrote:
> You'll need to run parquetFile(path).registerTempTable(name) to refresh the table.

I'm not seeing that function on SchemaRDD in 1.0.2 -- is there something I'm missing? (See the SchemaRDD Scaladoc.)
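If I recall correctly, the difference is just a rename between releases: registerTempTable appears in 1.1.0, while the 1.0.2 SchemaRDD exposes the same thing as registerAsTable. A sketch of the refresh Michael describes, with a hypothetical path and table name:

    // Spark 1.1.0+: re-register to pick up newly written Parquet files.
    sqlContext.parquetFile("hdfs:///data/events.parquet").registerTempTable("events")

    // Spark 1.0.2 equivalent -- same effect, older method name.
    sqlContext.parquetFile("hdfs:///data/events.parquet").registerAsTable("events")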

SchemaRDD - Parquet - insertInto makes many files

2014-09-04 Thread DanteSama
It seems that running insertInto on a SchemaRDD with a ParquetRelation creates an individual file for each partition of the RDD. Sometimes a file has multiple rows, and sometimes it only contains the column metadata. My question is: is it possible to have it write the entire RDD as one file, but…
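The usual remedy (and, judging by the follow-up below, presumably what the elided reply suggested) is to collapse the RDD to a single partition before inserting, since insertInto writes one file per partition. A hedged sketch on the 1.0.x API; the case class, path, and table name are invented:

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit RDD[Product] -> SchemaRDD conversion

    case class Record(key: String, value: Int)

    // Create an empty Parquet-backed table and register it (hypothetical path/name).
    sqlContext.createParquetFile[Record]("hdfs:///data/records.parquet")
      .registerAsTable("records")

    // coalesce(1) collapses everything into one partition, so insertInto
    // produces a single Parquet data file instead of one per partition.
    val rdd = sc.parallelize(Seq(Record("a", 1), Record("b", 2)))
    rdd.coalesce(1).insertInto("records")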

Re: SchemaRDD - Parquet - insertInto makes many files

2014-09-04 Thread DanteSama
Yep, that worked out. Does this solution have any performance implications beyond all the work being done on (probably) one node?
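A general note, not from the thread: coalesce(1) without a shuffle can pull the upstream computation onto that single task as well. Passing shuffle = true (or using repartition(1)) keeps the upstream stages parallel and funnels only the final write through one task, at the cost of a shuffle. Reusing the hypothetical names from the sketch above:

    // Upstream stages stay parallel; only the write runs on one task.
    rdd.coalesce(1, shuffle = true).insertInto("records")  // same as repartition(1)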