Re: Merging two Spark SQL tables?

2014-08-25 Thread Michael Armbrust
> So I tried the above (why doesn't union or ++ have the same behavior btw?)
I don't think there is a good reason for this. I'd open a JIRA.
> and it works, but is slow because the original RDDs are not cached and files must be read from disk. I also discovered you can recover the

Merging two Spark SQL tables?

2014-08-21 Thread Evan Chan
Is it possible to merge two cached Spark SQL tables into a single table so it can be queried with one SQL statement? I.e., can you do schemaRdd1.union(schemaRdd2), then register the new schemaRdd and run a query over it? Ideally, both schemaRdd1 and schemaRdd2 would be cached, so the union should run

Re: Merging two Spark SQL tables?

2014-08-21 Thread Michael Armbrust
I believe this should work if you run srdd1.unionAll(srdd2). Both RDDs must have the same schema.
On Wed, Aug 20, 2014 at 11:30 PM, Evan Chan velvia.git...@gmail.com wrote:
> Is it possible to merge two cached Spark SQL tables into a single table so it can be queried with one SQL statement? ie,
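
For readers following along, the unionAll suggestion above might look roughly like the sketch below. It assumes the Spark 1.1-era SchemaRDD API (on 1.0.x, registerTempTable was called registerAsTable). The Event case class, the table name "merged", and the sample rows are made up for illustration and are not from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SchemaRDD}

// Hypothetical record type; both inputs must share this schema.
case class Event(id: Int, name: String)

object UnionTables {
  // Unions two SchemaRDDs, registers the result as one table, and
  // returns the number of rows a single SQL query sees over it.
  def mergedCount(sc: SparkContext): Long = {
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD // implicit RDD[case class] => SchemaRDD

    val srdd1: SchemaRDD = sc.parallelize(Seq(Event(1, "a"), Event(2, "b")))
    val srdd2: SchemaRDD = sc.parallelize(Seq(Event(3, "c")))

    // unionAll keeps the result a SchemaRDD, so it can be registered.
    val merged = srdd1.unionAll(srdd2)
    merged.registerTempTable("merged") // assumed table name

    sqlContext.sql("SELECT id, name FROM merged").count()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("union-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(mergedCount(sc)) finally sc.stop()
  }
}
```

Nothing here caches either input, so each query over "merged" recomputes both sides from their sources.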

Re: Merging two Spark SQL tables?

2014-08-21 Thread Evan Chan
So I tried the above (why doesn't union or ++ have the same behavior btw?) and it works, but it is slow because the original RDDs are not cached and the files must be read from disk. I also discovered you can recover the InMemoryCached versions of the RDDs using sqlContext.table(table1). Thus you can
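
The message is cut off here, so the following is only a guess at the workflow it describes: cache each table, look the cached versions up again with sqlContext.table, and union those. It assumes the Spark 1.1-era API (cacheTable, table, registerTempTable). The LogLine case class, the sample rows, and the table names are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SchemaRDD}

// Hypothetical record type shared by both tables.
case class LogLine(id: Int, msg: String)

object CachedUnion {
  // Caches two registered tables, unions the looked-up versions, and
  // returns the row count of a query over the merged table.
  def mergedCount(sc: SparkContext): Long = {
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD // implicit RDD[case class] => SchemaRDD

    val first: SchemaRDD = sc.parallelize(Seq(LogLine(1, "a"), LogLine(2, "b")))
    val second: SchemaRDD = sc.parallelize(Seq(LogLine(3, "c")))

    first.registerTempTable("table1")
    second.registerTempTable("table2")

    // Mark both tables for in-memory columnar caching.
    sqlContext.cacheTable("table1")
    sqlContext.cacheTable("table2")

    // Look the tables up by name; per the message above, this is
    // what returns the cached versions rather than the originals.
    val cached1 = sqlContext.table("table1")
    val cached2 = sqlContext.table("table2")

    cached1.unionAll(cached2).registerTempTable("merged")
    sqlContext.sql("SELECT id FROM merged").count()
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("cached-union-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try println(mergedCount(sc)) finally sc.stop()
  }
}
```

Whether the union actually reads from the cache depends on the Spark version, so it is worth confirming in the query plan or the storage tab of the web UI.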