RE: Huge partitioning job takes longer to close after all tasks finished

2017-03-09 Thread PSwain
To: user@spark.apache.org
Subject: Re: Huge partitioning job takes longer to close after all tasks finished
Thank you liu. Can you please explain what you mean by enabling Spark's fault-tolerance mechanism? I observed that after all tasks finish, Spark is working on concatenating the same partitions from all

Re: Huge partitioning job takes longer to close after all tasks finished

2017-03-09 Thread Gourav Sengupta
Hi, you are definitely not using Spark 2.1 the way it should be used. Try using sessions, and follow their guidelines; this issue was specifically resolved as part of the Spark 2.1 release. Regards, Gourav
On Wed, Mar 8, 2017 at 8:00 PM, Swapnil Shinde wrote:

Re: Huge partitioning job takes longer to close after all tasks finished

2017-03-08 Thread Swapnil Shinde
Thank you liu. Can you please explain what you mean by enabling Spark's fault-tolerance mechanism? I observed that after all tasks finish, Spark works on concatenating the same partitions from all tasks on the file system, e.g. task1 - partition1, partition2, partition3; task2 - partition1,
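[The concatenation phase described above matches the behaviour of Hadoop's `FileOutputCommitter` with algorithm version 1: each task writes into its own attempt directory, and at job commit the driver moves every task's partition directories into the final output, merging directories that share a partition value. A rough, stdlib-only simulation of that merge step; the directory names are illustrative, not Spark's actual layout:]

```python
import os
import shutil
import tempfile

def commit_job(staging_dir, output_dir):
    """Merge per-task attempt directories into the final output,
    the way a v1-style committer does at job commit."""
    for task in sorted(os.listdir(staging_dir)):
        task_dir = os.path.join(staging_dir, task)
        for partition in os.listdir(task_dir):
            src = os.path.join(task_dir, partition)
            dst = os.path.join(output_dir, partition)
            os.makedirs(dst, exist_ok=True)
            for f in os.listdir(src):
                shutil.move(os.path.join(src, f), os.path.join(dst, f))

# Two tasks that both produced output for partition1.
staging = tempfile.mkdtemp()
out = tempfile.mkdtemp()
os.makedirs(os.path.join(staging, "task1", "partition1"))
os.makedirs(os.path.join(staging, "task2", "partition1"))
open(os.path.join(staging, "task1", "partition1", "part-0"), "w").close()
open(os.path.join(staging, "task2", "partition1", "part-1"), "w").close()

commit_job(staging, out)
print(sorted(os.listdir(os.path.join(out, "partition1"))))  # -> ['part-0', 'part-1']
```

With many tasks times many partitions, this single-threaded merge on the driver can dominate the time spent closing the job; setting `mapreduce.fileoutputcommitter.algorithm.version=2` moves the rename work into task commit instead.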

Re: Huge partitioning job takes longer to close after all tasks finished

2017-03-07 Thread cht liu
Do you have Spark's fault-tolerance mechanism enabled? With RDD checkpointing, a separate job is started at the end of the main job to write the checkpoint data to the file system, persisting it for high availability. 2017-03-08 2:45 GMT+08:00 Swapnil Shinde : > Hello all >I have