Re: Safe to write to parquet at the same time?

2015-08-04 Thread Cheng Lian
It should be safe for Spark 1.4.1 and later versions. Spark SQL now adds a job-wise UUID to output file names to distinguish files written by different write jobs, so the two write jobs you gave should play well with each other. And the job committed later will generate a summary file for
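The naming scheme described above can be illustrated with a minimal stdlib sketch. This is not Spark's actual code; the `part_file_name` helper and its format string are hypothetical stand-ins showing why embedding a per-job UUID keeps concurrent jobs from colliding:

```python
import uuid

def part_file_name(job_uuid: str, task_id: int) -> str:
    # Hypothetical illustration of the idea: each write job embeds
    # its own UUID in every part-file name, so two jobs writing to
    # the same directory can never produce the same file name.
    return f"part-r-{task_id:05d}-{job_uuid}.gz.parquet"

job_a = str(uuid.uuid4())
job_b = str(uuid.uuid4())

# Both jobs emit tasks 0..2 into the same output directory.
files_a = {part_file_name(job_a, i) for i in range(3)}
files_b = {part_file_name(job_b, i) for i in range(3)}

# Identical task ids, yet no name collisions between the jobs.
assert files_a.isdisjoint(files_b)
```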

Safe to write to parquet at the same time?

2015-08-03 Thread Philip Weaver
I think this question applies regardless of whether I have two completely separate Spark jobs or tasks on different machines, or two cores that are part of the same task on the same machine. If two jobs/tasks/cores/stages both save to the same parquet directory in parallel like this:
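The concurrent-writer scenario being asked about can be simulated with plain Python threads and a temp directory. This is a sketch under assumptions, not Spark behavior: each "job" here is just a function that writes UUID-tagged part files, mimicking two writers sharing one output directory:

```python
import os
import tempfile
import threading
import uuid

def write_job(out_dir: str, num_parts: int) -> None:
    # Each simulated job tags its part files with its own UUID,
    # so concurrent jobs writing to the same directory never
    # overwrite each other's output.
    job_uuid = str(uuid.uuid4())
    for task_id in range(num_parts):
        name = f"part-r-{task_id:05d}-{job_uuid}.parquet"
        with open(os.path.join(out_dir, name), "wb") as f:
            f.write(b"placeholder bytes, not real parquet")

out_dir = tempfile.mkdtemp()
jobs = [threading.Thread(target=write_job, args=(out_dir, 3))
        for _ in range(2)]
for t in jobs:
    t.start()
for t in jobs:
    t.join()

# Two jobs x three parts each, and nothing was clobbered.
assert len(os.listdir(out_dir)) == 6
```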