> So I tried the above (why doesn't union or ++ have the same behavior btw?)
I don't think there is a good reason for this. I'd open a JIRA.
> and it works, but is slow because the original Rdds are not
> cached and files must be read from disk.
> I also discovered you can recover the InMemoryCached versions of the
> Rdds using sqlContext.table(table1).
> Is it possible to merge two cached Spark SQL tables into a single
> table so it can be queried with one SQL statement?
>
> ie, can you do schemaRdd1.union(schemaRdd2), then register the new
> schemaRdd and run a query over it?
>
> Ideally, both schemaRdd1 and schemaRdd2 would be cached, so the union
> should run
I believe this should work if you run srdd1.unionAll(srdd2). Both RDDs
must have the same schema.
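A minimal sketch of the suggested approach, using the Spark 1.x SchemaRDD API from this era (`unionAll`, `cacheTable`, and SQL registration are the real calls; the case class, data, and table names are illustrative, and an existing SparkContext `sc` is assumed):

```scala
import org.apache.spark.sql.SQLContext

// Assumes an existing SparkContext `sc`; data and names are illustrative.
val sqlContext = new SQLContext(sc)
import sqlContext.createSchemaRDD  // implicit RDD[Product] -> SchemaRDD

case class Record(key: Int, value: String)

val srdd1 = sc.parallelize(Seq(Record(1, "a"))).toSchemaRDD
val srdd2 = sc.parallelize(Seq(Record(2, "b"))).toSchemaRDD

// unionAll requires both SchemaRDDs to have the same schema.
val merged = srdd1.unionAll(srdd2)

// Register the union under its own name so a single SQL statement
// can query it, and cache it in the in-memory columnar store.
merged.registerAsTable("merged")  // registerTempTable in Spark 1.1+
sqlContext.cacheTable("merged")

sqlContext.sql("SELECT key, value FROM merged").collect()
```

Note that `unionAll` here concatenates the rows without deduplication; since both inputs share a schema, the result is itself a SchemaRDD and can be registered and queried like any other table.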
On Wed, Aug 20, 2014 at 11:30 PM, Evan Chan velvia.git...@gmail.com wrote:
> Is it possible to merge two cached Spark SQL tables into a single
> table so it can be queried with one SQL statement?
>
> ie,
So I tried the above (why doesn't union or ++ have the same behavior
btw?) and it works, but is slow because the original Rdds are not
cached and files must be read from disk.
I also discovered you can recover the InMemoryCached versions of the
Rdds using sqlContext.table(table1).
Thus you can
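The recovery trick described above can be sketched as follows, under the same Spark 1.x assumptions (table names are illustrative; `sqlContext.table` returns the SchemaRDD registered under a name, which is the in-memory columnar version once `cacheTable` has been called):

```scala
// Assumes srdd1/srdd2 were registered and cached earlier, e.g.:
//   srdd1.registerAsTable("table1"); sqlContext.cacheTable("table1")
//   srdd2.registerAsTable("table2"); sqlContext.cacheTable("table2")

// Recover the cached (in-memory columnar) versions by name instead of
// unioning the original SchemaRDDs, which would re-read files from disk.
val cached1 = sqlContext.table("table1")
val cached2 = sqlContext.table("table2")

// Union the cached versions and register the result for SQL queries.
val merged = cached1.unionAll(cached2)
merged.registerAsTable("merged")
```

This avoids the slow path complained about above: the union's inputs are the cached tables rather than the uncached source RDDs.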