Re: JsonRDD to parquet -- data loss

2015-02-18 Thread Michael Armbrust
Concurrent inserts into the same table are not supported. I can try to make this clearer in the documentation. On Tue, Feb 17, 2015 at 8:01 PM, Vasu C vasuc.bigd...@gmail.com wrote: Hi, I am running spark batch processing job using spark-submit command. And below is my code snippet.
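Since concurrent inserts into the same table are unsupported, one application-level workaround is to serialize the writes so only one job touches the shared location at a time. The sketch below is a hypothetical illustration in plain Python (not Spark code from the thread): a lock stands in for whatever coordination mechanism the jobs would actually share.

```python
import threading

# Illustrative only: serialize "inserts" behind a lock so overlapping
# jobs cannot interleave their writes to the shared table/location.
_write_lock = threading.Lock()
results = []

def insert_into_table(rows):
    # Stand-in for the real Spark insert; the lock guarantees that
    # only one job's write is in flight at a time.
    with _write_lock:
        results.extend(rows)

threads = [threading.Thread(target=insert_into_table, args=([i],))
           for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In a real deployment the same effect is usually achieved by scheduling the jobs so their writes to a given table never overlap, rather than by an in-process lock.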

JsonRDD to parquet -- data loss

2015-02-17 Thread Vasu C
Hi, I am running a Spark batch processing job using the spark-submit command, and below is my code snippet. Basically it converts a JsonRDD to Parquet and stores it in an HDFS location. The problem I am facing is that if multiple jobs are triggered in parallel, even though each job executes properly (as I can see
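The data loss described here is the classic symptom of several jobs writing Parquet output to the same HDFS directory at once. A common avoidance strategy, sketched below with hypothetical names and using a local temp directory in place of HDFS, is to give each job run its own unique output path so no two runs ever collide:

```python
import os
import tempfile
import uuid

def unique_output_path(base_dir, job_name):
    # Each run gets a directory like <base>/<job>/run-<hex>; the
    # path layout is illustrative, not from the original thread.
    return os.path.join(base_dir, job_name, "run-%s" % uuid.uuid4().hex)

# Simulate five "parallel" job runs; a local temp dir stands in for HDFS.
base = tempfile.mkdtemp()
paths = {unique_output_path(base, "json-to-parquet") for _ in range(5)}
```

Downstream readers can then be pointed at the latest run directory, or the runs can be compacted into the final table by a single, non-concurrent job.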

Re: JsonRDD to parquet -- data loss

2015-02-17 Thread Arush Kharbanda
I am not sure, if this the easiest way to solve your problem. But you can connect to the HIVE metastore(through derby) and find the HDFS path from there. On Wed, Feb 18, 2015 at 9:31 AM, Vasu C vasuc.bigd...@gmail.com wrote: Hi, I am running spark batch processing job using spark-submit