Re: parquet repartitions and parquet.enable.summary-metadata does not work

2016-01-12 Thread Cheng Lian
I see. So there are actually 3000 tasks instead of 3000 jobs, right? Would you mind providing the full stack trace of the GC issue? At first I thought it was identical to the _metadata one in the mail thread you mentioned. Cheng On 1/11/16 5:30 PM, Gavin Yue wrote: Here is how I set the conf:

Re: parquet repartitions and parquet.enable.summary-metadata does not work

2016-01-11 Thread Cheng Lian
Hey Gavin, Could you please provide a snippet of your code showing how you disabled "parquet.enable.summary-metadata" and wrote the files? In particular, you mentioned that you saw "3000 jobs" fail. Were you writing each Parquet file with an individual job? (Usually people use

parquet repartitions and parquet.enable.summary-metadata does not work

2016-01-10 Thread Gavin Yue
Hey, I am trying to convert a bunch of JSON files into Parquet, which would output over 7000 Parquet files. But there are too many files, so I want to repartition based on id to 3000. But I got a GC error like this one: