Re: [SHUFFLE]FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
We haven't seen many of these, but we have seen it a couple of times. There is ongoing work under SPARK-26089 to address the issue we know about, namely that we don't detect corruption in large shuffle blocks.

Do you believe your cases match that? Does it appear to be corruption in large shuffle blocks? Or do you not have compression or encryption enabled? Both the prior solution and the work under SPARK-26089 only work if at least one of those is enabled.

On Tue, Mar 12, 2019 at 9:36 AM Vadim Semenov wrote:

> I/We have seen this error before on 1.6 but ever since we upgraded to 2.1
> two years ago we haven't seen it
>
> On Tue, Mar 12, 2019 at 2:19 AM wangfei wrote:
>
>> Hi all,
>> Non-deterministic FAILED_TO_UNCOMPRESS(5) or 'Stream is corrupted'
>> errors may occur during shuffle read, described as this JIRA(
>> https://issues.apache.org/jira/browse/SPARK-4105).
>> There is not new comment for a long time in this JIRA. So, Is
>> there anyone seen these errors in latest version, such as spark-2.3?
>> Can anyone provide a reproducible case or analyze the cause of
>> these errors?
>> Thanks.
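To check the precondition raised in the reply above (corruption detection only helps when compression or encryption is on), a minimal Scala sketch follows. It assumes a Spark 2.2+ classpath and reads three standard Spark properties: `spark.shuffle.compress`, `spark.io.encryption.enabled`, and `spark.shuffle.detectCorrupt`. The object name and the printed messages are illustrative only and do not come from the thread.

```scala
import org.apache.spark.SparkConf

// Hypothetical helper: prints whether the shuffle corruption check can take effect.
object ShuffleCorruptionSettings {
  def main(args: Array[String]): Unit = {
    // new SparkConf() picks up any spark.* Java system properties.
    val conf = new SparkConf()

    // The second argument is the fallback used when the key is not set;
    // the fallbacks here mirror Spark's documented defaults.
    val compress = conf.getBoolean("spark.shuffle.compress", true)
    val encrypt  = conf.getBoolean("spark.io.encryption.enabled", false)
    val detect   = conf.getBoolean("spark.shuffle.detectCorrupt", true)

    println(s"shuffle compression enabled: $compress")
    println(s"I/O encryption enabled:      $encrypt")
    println(s"corruption detection on:     $detect")

    if (detect && !(compress || encrypt)) {
      println("Neither compression nor encryption is enabled, " +
        "so the corruption check described above would not apply.")
    }
  }
}
```

Run inside an application, the same three `getBoolean` calls can be made against `sc.getConf` to see the values the job is actually using.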
Re: [SHUFFLE]FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
We have seen this error before on 1.6, but we haven't seen it since we upgraded to 2.1 two years ago.

On Tue, Mar 12, 2019 at 2:19 AM wangfei wrote:

> Hi all,
> Non-deterministic FAILED_TO_UNCOMPRESS(5) or 'Stream is corrupted'
> errors may occur during shuffle read, described as this JIRA(
> https://issues.apache.org/jira/browse/SPARK-4105).
> There is not new comment for a long time in this JIRA. So, Is there
> anyone seen these errors in latest version, such as spark-2.3?
> Can anyone provide a reproducible case or analyze the cause of these
> errors?
> Thanks.
[SHUFFLE]FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
Hi all,

Non-deterministic FAILED_TO_UNCOMPRESS(5) or 'Stream is corrupted' errors may occur during shuffle read, as described in this JIRA (https://issues.apache.org/jira/browse/SPARK-4105).

There have been no new comments on this JIRA for a long time. Has anyone seen these errors in a recent version, such as Spark 2.3? Can anyone provide a reproducible case or analyze the cause of these errors?

Thanks.
Re: FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
I am seeing the same issue with Spark 1.3.1.

I see it when reading a file stored in SequenceFile format. The file header shows Text keys, Text values, and the Gzip codec:

    SEQ org.apache.hadoop.io.Text org.apache.hadoop.io.Text org.apache.hadoop.io.compress.GzipCodec

All I do is:

    sc.sequenceFile(dwTable, classOf[Text], classOf[Text])
      .partitionBy(new org.apache.spark.HashPartitioner(2053))

The SparkConf is set up with these chained calls:

    .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .set("spark.kryoserializer.buffer.mb", arguments.get("buffersize").get)
    .set("spark.kryoserializer.buffer.max.mb", arguments.get("maxbuffersize").get)
    .set("spark.driver.maxResultSize", arguments.get("maxResultSize").get)
    .set("spark.yarn.maxAppAttempts", "0")
    //.set("spark.akka.askTimeout", arguments.get("askTimeout").get)
    //.set("spark.akka.timeout", arguments.get("akkaTimeout").get)
    //.set("spark.worker.timeout", arguments.get("workerTimeout").get)
    .registerKryoClasses(Array(classOf[com.ebay.ep.poc.spark.reporting.process.model.dw.SpsLevelMetricSum]))

The values are buffersize=128, maxbuffersize=1068, maxResultSize=200G.

On Thu, May 7, 2015 at 8:04 AM, Jianshi Huang jianshi.hu...@gmail.com wrote:

> I'm using the default settings.
>
> Jianshi
>
> On Wed, May 6, 2015 at 7:05 PM, twinkle sachdeva twinkle.sachd...@gmail.com wrote:
>
>> Hi,
>> Can you please share your compression etc settings, which you are using.
>>
>> Thanks,
>> Twinkle
>>
>> On Wed, May 6, 2015 at 4:15 PM, Jianshi Huang jianshi.hu...@gmail.com wrote:
>>
>>> I'm facing this error in Spark 1.3.1
>>> https://issues.apache.org/jira/browse/SPARK-4105
>>>
>>> Anyone knows what's the workaround? Change the compression codec for
>>> shuffle output?
>>>
>>> --
>>> Jianshi Huang
>>> LinkedIn: jianshi
>>> Twitter: @jshuang
>>> Github Blog: http://huangjs.github.com/

--
Deepak
Re: FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
Hi,

Can you please share the compression settings you are using?

Thanks,
Twinkle

On Wed, May 6, 2015 at 4:15 PM, Jianshi Huang jianshi.hu...@gmail.com wrote:

> I'm facing this error in Spark 1.3.1
> https://issues.apache.org/jira/browse/SPARK-4105
>
> Anyone knows what's the workaround? Change the compression codec for
> shuffle output?
>
> --
> Jianshi Huang
> LinkedIn: jianshi
> Twitter: @jshuang
> Github Blog: http://huangjs.github.com/
Re: FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
I'm using the default settings.

Jianshi

On Wed, May 6, 2015 at 7:05 PM, twinkle sachdeva twinkle.sachd...@gmail.com wrote:

> Hi,
> Can you please share your compression etc settings, which you are using.
>
> Thanks,
> Twinkle
>
> On Wed, May 6, 2015 at 4:15 PM, Jianshi Huang jianshi.hu...@gmail.com wrote:
>
>> I'm facing this error in Spark 1.3.1
>> https://issues.apache.org/jira/browse/SPARK-4105
>>
>> Anyone knows what's the workaround? Change the compression codec for
>> shuffle output?

--
Jianshi Huang
LinkedIn: jianshi
Twitter: @jshuang
Github Blog: http://huangjs.github.com/
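On the workaround asked about in this thread (changing the compression codec for shuffle output), here is a minimal Scala sketch. It rests on two assumptions: that the job runs with the Snappy codec, which the FAILED_TO_UNCOMPRESS(5) message suggests, and that switching codecs sidesteps the failure, which nobody in this thread confirms. `spark.io.compression.codec` is the standard Spark property for internal compression, including shuffle output; the application name is hypothetical.

```scala
import org.apache.spark.SparkConf

// Hypothetical example: select a non-Snappy codec for Spark's internal data.
object CodecWorkaround {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("codec-workaround-sketch")     // hypothetical name
      // "lz4" and "lzf" are the other short codec names Spark accepts here.
      .set("spark.io.compression.codec", "lz4")

    // Print the value back to confirm which codec the job would use.
    println("codec: " + conf.get("spark.io.compression.codec"))
  }
}
```

The same setting can be passed on the command line with `--conf spark.io.compression.codec=lz4`, which avoids a code change when testing whether the error goes away.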