Very cool, thank you!
On Wed, Nov 19, 2014 at 11:15 AM, Marius Soutier <mps....@gmail.com> wrote:

> You can also insert into existing tables via .insertInto(tableName,
> overwrite). You just have to import sqlContext._
>
> On 19.11.2014, at 09:41, Daniel Haviv <danielru...@gmail.com> wrote:
>
> Hello,
> I'm writing a process that ingests JSON files and saves them as parquet
> files.
> The process is as such:
>
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> val jsonRequests = sqlContext.jsonFile("/requests")
> val parquetRequests = sqlContext.parquetFile("/requests_parquet")
>
> jsonRequests.registerTempTable("jsonRequests")
> parquetRequests.registerTempTable("parquetRequests")
>
> val unified_requests = sqlContext.sql("select * from jsonRequests union
> select * from parquetRequests")
>
> unified_requests.saveAsParquetFile("/tempdir")
>
> and then I delete /requests_parquet and rename /tempdir to
> /requests_parquet.
>
> Is there a better way to achieve that?
>
> Another problem I have is that I get a lot of small JSON files, and as a
> result a lot of small parquet files. I'd like to merge the JSON files into
> a few parquet files. How do I do that?
>
> Thank you,
> Daniel
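For anyone following along, Marius's `.insertInto` suggestion can be sketched roughly as below. This is a sketch against the Spark 1.1-era SchemaRDD API, not a tested recipe: the table name "requests" is made up, and whether appending to a parquet-backed temp table works this way depends on your Spark version.

```scala
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext._  // brings in the implicits needed for insertInto

// Register the existing parquet data as a table once.
sqlContext.parquetFile("/requests_parquet").registerTempTable("requests")

// Append the newly ingested JSON into that table directly,
// instead of unioning and rewriting everything to a temp dir.
// insertInto("requests", true) would overwrite instead of append.
sqlContext.jsonFile("/requests").insertInto("requests")
```

This avoids the delete-and-rename dance, since new rows are appended in place rather than rewritten alongside the old data.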
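On the small-files question: each output partition becomes one parquet file, so the usual approach is to collapse the many small input partitions before writing. A minimal sketch, assuming your Spark version's SchemaRDD preserves the schema through coalesce (the partition count 4 and the output path are arbitrary):

```scala
val jsonRequests = sqlContext.jsonFile("/requests")

// Collapse the many small input partitions into a few larger ones;
// coalesce avoids a full shuffle, unlike repartition.
jsonRequests.coalesce(4).saveAsParquetFile("/requests_parquet_merged")
```

Pick the partition count based on the total data size you expect per output file, not the number of input files.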