[jira] [Commented] (SPARK-21065) Spark Streaming concurrentJobs + StreamingJobProgressListener conflict
[ https://issues.apache.org/jira/browse/SPARK-21065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040456#comment-17040456 ] Artur Sukhenko commented on SPARK-21065: [~zsxwing] Is `spark.streaming.concurrentJobs` still risky in 2.2/2.3/2.4? > Spark Streaming concurrentJobs + StreamingJobProgressListener conflict > -- > > Key: SPARK-21065 > URL: https://issues.apache.org/jira/browse/SPARK-21065 > Project: Spark > Issue Type: Bug > Components: DStreams, Scheduler, Web UI > Affects Versions: 2.1.0 > Reporter: Dan Dutrow > Priority: Major > > My streaming application has 200+ output operations, many of them stateful > and several of them windowed. In an attempt to reduce the processing times, I > set "spark.streaming.concurrentJobs" to 2+. Initial results are very > positive, cutting our processing time from ~3 minutes to ~1 minute, but > eventually we encounter an exception as follows: > (Note that 149697756 ms is 2017-06-09 03:06:00, so it's trying to get a > batch from 45 minutes before the exception is thrown.) 
> 2017-06-09 03:50:28,259 [Spark Listener Bus] ERROR > org.apache.spark.streaming.scheduler.StreamingListenerBus - Listener > StreamingJobProgressListener threw an exception > java.util.NoSuchElementException: key not found: 149697756 ms > at scala.collection.MapLike$class.default(MapLike.scala:228) > at scala.collection.AbstractMap.default(Map.scala:59) > at scala.collection.mutable.HashMap.apply(HashMap.scala:65) > at > org.apache.spark.streaming.ui.StreamingJobProgressListener.onOutputOperationCompleted(StreamingJobProgressListener.scala:128) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.doPostEvent(StreamingListenerBus.scala:67) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.doPostEvent(StreamingListenerBus.scala:29) > at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.postToAll(StreamingListenerBus.scala:29) > at > org.apache.spark.streaming.scheduler.StreamingListenerBus.onOtherEvent(StreamingListenerBus.scala:43) > ... > The Spark code causing the exception is here: > https://github.com/apache/spark/blob/branch-2.1/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala#LC125 > override def onOutputOperationCompleted( > outputOperationCompleted: StreamingListenerOutputOperationCompleted): > Unit = synchronized { > // This method is called before onBatchCompleted > {color:red}runningBatchUIData(outputOperationCompleted.outputOperationInfo.batchTime).{color} > updateOutputOperationInfo(outputOperationCompleted.outputOperationInfo) > } > It seems to me that it may be caused by that batch being removed earlier. 
> https://github.com/apache/spark/blob/branch-2.1/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala#LC102 > override def onBatchCompleted(batchCompleted: > StreamingListenerBatchCompleted): Unit = { > synchronized { > waitingBatchUIData.remove(batchCompleted.batchInfo.batchTime) > > {color:red}runningBatchUIData.remove(batchCompleted.batchInfo.batchTime){color} > val batchUIData = BatchUIData(batchCompleted.batchInfo) > completedBatchUIData.enqueue(batchUIData) > if (completedBatchUIData.size > batchUIDataLimit) { > val removedBatch = completedBatchUIData.dequeue() > batchTimeToOutputOpIdSparkJobIdPair.remove(removedBatch.batchTime) > } > totalCompletedBatches += 1L > totalProcessedRecords += batchUIData.numRecords > } > } > What is the solution here? Should I make my spark streaming context remember > duration a lot longer? ssc.remember(batchDuration * rememberMultiple) > Otherwise, it seems like there should be some kind of existence check on > runningBatchUIData before dereferencing it. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
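The race described above comes down to `onBatchCompleted` removing the batch before `onOutputOperationCompleted` dereferences it with `apply`, which throws `NoSuchElementException` on a missing key. A minimal, self-contained sketch of the existence check the reporter suggests, using a plain mutable map as a stand-in for the listener's state (this is an illustration, not Spark's actual `StreamingJobProgressListener`):

```scala
import scala.collection.mutable

// Illustrative model only: batchTime (ms) -> a counter standing in for BatchUIData.
object ListenerSketch {
  val runningBatchUIData = mutable.HashMap[Long, Long]()

  // Unsafe pattern from the report: apply() throws NoSuchElementException
  // if onBatchCompleted already removed this batch time.
  def updateUnsafe(batchTime: Long): Unit =
    runningBatchUIData(batchTime) += 1L

  // Safe pattern: check existence first; an already-removed batch is
  // skipped instead of crashing the listener bus. Returns whether it updated.
  def updateSafe(batchTime: Long): Boolean =
    runningBatchUIData.get(batchTime) match {
      case Some(n) => runningBatchUIData(batchTime) = n + 1L; true
      case None    => false
    }
}
```

With this shape, a late-arriving output-operation event for an evicted batch becomes a no-op rather than an exception on the listener bus.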
[jira] [Closed] (SPARK-16709) Task with failed commit will retry infinitely when speculation is set to true
[ https://issues.apache.org/jira/browse/SPARK-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko closed SPARK-16709. -- Resolution: Duplicate Fix Version/s: 1.6.2 > Task with failed commit will retry infinitely when speculation is set to true > > > Key: SPARK-16709 > URL: https://issues.apache.org/jira/browse/SPARK-16709 > Project: Spark > Issue Type: Bug > Affects Versions: 1.6.0 > Reporter: Hong Shen > Fix For: 1.6.2 > > Attachments: commit failed.png > > > In our cluster, we set spark.speculation=true, but when a task throws an > exception at SparkHadoopMapRedUtil.performCommit(), this task can retry > infinitely. > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83
[jira] [Commented] (SPARK-21593) Fix broken configuration page
[ https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109101#comment-16109101 ] Artur Sukhenko commented on SPARK-21593: [~srowen] Yes, {code} spark.storage.replication.proactive {code} is breaking the page. > Fix broken configuration page > - > > Key: SPARK-21593 > URL: https://issues.apache.org/jira/browse/SPARK-21593 > Project: Spark > Issue Type: Bug > Components: Documentation > Affects Versions: 2.2.0 > Environment: Chrome/Firefox > Reporter: Artur Sukhenko > Priority: Minor > Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg > > > Latest configuration page for Spark 2.2.0 has broken menu list and named > anchors. > Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] > with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] > Or try this link [Configuration # Dynamic > Allocation|https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation] > which should open the Dynamic Allocation part of the page, but doesn't. > !dyn_latest.jpg! > > !dyn_211.jpg!
[jira] [Updated] (SPARK-21593) Fix broken configuration page
[ https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-21593: --- Description: Latest configuration page for Spark 2.2.0 has broken menu list and named anchors. Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] Or try this link [Configuration # Dynamic Allocation|https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation] which should open the Dynamic Allocation part of the page, but doesn't. !dyn_latest.jpg! !dyn_211.jpg! was: Latest configuration page for Spark 2.2.0 has broken menu list and named anchors. Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] Or try this link [Configuration # Dynamic Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] !dyn_latest.jpg! !dyn_211.jpg! > Fix broken configuration page > - > > Key: SPARK-21593 > URL: https://issues.apache.org/jira/browse/SPARK-21593 > Project: Spark > Issue Type: Bug > Components: Documentation > Affects Versions: 2.2.0 > Environment: Chrome/Firefox > Reporter: Artur Sukhenko > Priority: Minor > Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg > > > Latest configuration page for Spark 2.2.0 has broken menu list and named > anchors. > Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] > with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] > Or try this link [Configuration # Dynamic > Allocation|https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation] > which should open the Dynamic Allocation part of the page, but doesn't. > !dyn_latest.jpg! > > !dyn_211.jpg! 
[jira] [Commented] (SPARK-21593) Fix broken configuration page
[ https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108966#comment-16108966 ] Artur Sukhenko commented on SPARK-21593: Yes, as well as having {code}### Dynamic Allocation{code} instead of {code}Dynamic Allocation{code} > Fix broken configuration page > - > > Key: SPARK-21593 > URL: https://issues.apache.org/jira/browse/SPARK-21593 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.2.0 > Environment: Chrome/Firefox >Reporter: Artur Sukhenko >Priority: Minor > Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg > > > Latest configuration page for Spark 2.2.0 has broken menu list and named > anchors. > Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] > with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] > Or try this link [Configuration # Dynamic > Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] > !dyn_latest.jpg! > > !dyn_211.jpg! -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
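For context on why the literal {code}### Dynamic Allocation{code} text breaks the link: when the heading is not rendered as a heading, the generator never emits an element with the `dynamic-allocation` id for the `#dynamic-allocation` fragment to target. A small illustrative sketch of the kramdown-style slug rule the docs build relies on (the exact rule is an assumption here, not taken from Spark's build code):

```scala
// Hypothetical sketch of a Markdown heading -> anchor id slug, assuming the
// common "lowercase, drop punctuation, spaces to hyphens" convention.
object AnchorSketch {
  def slug(heading: String): String =
    heading.trim
      .toLowerCase
      .replaceAll("[^a-z0-9\\s-]", "") // drop punctuation
      .replaceAll("\\s+", "-")         // collapse whitespace to hyphens
}
```

Under this rule a rendered "Dynamic Allocation" heading yields the id "dynamic-allocation"; literal `###` text yields no heading element and therefore no anchor at all.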
[jira] [Updated] (SPARK-21593) Fix broken configuration page
[ https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-21593: --- Description: Latest configuration page for Spark 2.2.0 has broken menu list and named anchors. Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] Or try this link [Configuration # Dynamic Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] !dyn_latest.jpg! !dyn_211.jpg! was: Latest configuration page for Spark 2.2.0 has broken menu list and named anchors. Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] Or try this link [Configuration # Dynamic Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] !dyn_latest.jpg|thumbnail! !dyn_211.jpg|thumbnail! !doc_latest.jpg|thumbnail! !doc_211.jpg|thumbnail! > Fix broken configuration page > - > > Key: SPARK-21593 > URL: https://issues.apache.org/jira/browse/SPARK-21593 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.2.0 > Environment: Chrome/Firefox >Reporter: Artur Sukhenko >Priority: Minor > Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg > > > Latest configuration page for Spark 2.2.0 has broken menu list and named > anchors. > Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] > with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] > Or try this link [Configuration # Dynamic > Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] > !dyn_latest.jpg! > > !dyn_211.jpg! -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21593) Fix broken configuration page
[ https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-21593: --- Description: Latest configuration page for Spark 2.2.0 has broken menu list and named anchors. Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] Or try this link [Configuration # Dynamic Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] !dyn_latest.jpg|thumbnail! !dyn_211.jpg|thumbnail! !doc_latest.jpg|thumbnail! !doc_211.jpg|thumbnail! was: Latest configuration page for Spark 2.2.0 has broken menu list and named anchors. Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] Or try this link [Configuration # Dynamic Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] > Fix broken configuration page > - > > Key: SPARK-21593 > URL: https://issues.apache.org/jira/browse/SPARK-21593 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.2.0 > Environment: Chrome/Firefox >Reporter: Artur Sukhenko >Priority: Minor > Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg > > > Latest configuration page for Spark 2.2.0 has broken menu list and named > anchors. > Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] > with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] > Or try this link [Configuration # Dynamic > Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] > !dyn_latest.jpg|thumbnail! > !dyn_211.jpg|thumbnail! > !doc_latest.jpg|thumbnail! > !doc_211.jpg|thumbnail! 
[jira] [Updated] (SPARK-21593) Fix broken configuration page
[ https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-21593: --- Attachment: (was: doc_211.png) > Fix broken configuration page > - > > Key: SPARK-21593 > URL: https://issues.apache.org/jira/browse/SPARK-21593 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.2.0 > Environment: Chrome/Firefox >Reporter: Artur Sukhenko >Priority: Minor > Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg > > > Latest configuration page for Spark 2.2.0 has broken menu list and named > anchors. > Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] > with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] > Or try this link [Configuration # Dynamic > Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21593) Fix broken configuration page
[ https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-21593: --- Attachment: doc_latest.jpg doc_211.jpg dyn_211.jpg dyn_latest.jpg Broken anchors and menu > Fix broken configuration page > - > > Key: SPARK-21593 > URL: https://issues.apache.org/jira/browse/SPARK-21593 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.2.0 > Environment: Chrome/Firefox >Reporter: Artur Sukhenko >Priority: Minor > Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg > > > Latest configuration page for Spark 2.2.0 has broken menu list and named > anchors. > Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] > with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] > Or try this link [Configuration # Dynamic > Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21593) Fix broken configuration page
[ https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-21593: --- Attachment: (was: doc_latest.png) > Fix broken configuration page > - > > Key: SPARK-21593 > URL: https://issues.apache.org/jira/browse/SPARK-21593 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.2.0 > Environment: Chrome/Firefox >Reporter: Artur Sukhenko >Priority: Minor > Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg > > > Latest configuration page for Spark 2.2.0 has broken menu list and named > anchors. > Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] > with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] > Or try this link [Configuration # Dynamic > Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21593) Fix broken configuration page
[ https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-21593: --- Description: Latest configuration page for Spark 2.2.0 has broken menu list and named anchors. Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] Or try this link [Configuration # Dynamic Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] was: Latest configuration page for Spark 2.2.0 has broken menu list and named anchors. Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] Or try this link [Configuration # Dynamic Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] !doc_latest.jpg|thumbnail! > Fix broken configuration page > - > > Key: SPARK-21593 > URL: https://issues.apache.org/jira/browse/SPARK-21593 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.2.0 > Environment: Chrome/Firefox >Reporter: Artur Sukhenko >Priority: Minor > Attachments: doc_211.png, doc_latest.png > > > Latest configuration page for Spark 2.2.0 has broken menu list and named > anchors. > Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] > with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] > Or try this link [Configuration # Dynamic > Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21593) Fix broken configuration page
[ https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-21593: --- Description: Latest configuration page for Spark 2.2.0 has broken menu list and named anchors. Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] Or try this link [Configuration # Dynamic Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] !doc_latest.jpg|thumbnail! was: Latest configuration page for Spark 2.2.0 has broken menu list and named anchors. Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] Or try this link [Configuration # Dynamic Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] > Fix broken configuration page > - > > Key: SPARK-21593 > URL: https://issues.apache.org/jira/browse/SPARK-21593 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.2.0 > Environment: Chrome/Firefox >Reporter: Artur Sukhenko >Priority: Minor > Attachments: doc_211.png, doc_latest.png > > > Latest configuration page for Spark 2.2.0 has broken menu list and named > anchors. > Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] > with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] > Or try this link [Configuration # Dynamic > Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] > !doc_latest.jpg|thumbnail! -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-21593) Fix broken configuration page
[ https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-21593: --- Attachment: doc_latest.png doc_211.png Latest documentation page with broken menu and working 2.1.1 > Fix broken configuration page > - > > Key: SPARK-21593 > URL: https://issues.apache.org/jira/browse/SPARK-21593 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 2.2.0 > Environment: Chrome/Firefox >Reporter: Artur Sukhenko >Priority: Minor > Attachments: doc_211.png, doc_latest.png > > > Latest configuration page for Spark 2.2.0 has broken menu list and named > anchors. > Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] > with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] > Or try this link [Configuration # Dynamic > Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-21593) Fix broken configuration page
Artur Sukhenko created SPARK-21593: -- Summary: Fix broken configuration page Key: SPARK-21593 URL: https://issues.apache.org/jira/browse/SPARK-21593 Project: Spark Issue Type: Bug Components: Documentation Affects Versions: 2.2.0 Environment: Chrome/Firefox Reporter: Artur Sukhenko Priority: Minor Latest configuration page for Spark 2.2.0 has broken menu list and named anchors. Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] with [Latest docs |https://spark.apache.org/docs/latest/configuration.html] Or try this link [Configuration # Dynamic Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation] -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-18191) Port RDD API to use commit protocol
[ https://issues.apache.org/jira/browse/SPARK-18191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16073860#comment-16073860 ] Artur Sukhenko commented on SPARK-18191: [~shridharama] Yes, updated 'Fix Version' from 2.1 to 2.2 > Port RDD API to use commit protocol > --- > > Key: SPARK-18191 > URL: https://issues.apache.org/jira/browse/SPARK-18191 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: Reynold Xin >Assignee: Jiang Xingbo > Fix For: 2.2.0 > > > Commit protocol is actually not specific to SQL. We can move it over to core > so the RDD API can use it too. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-18191) Port RDD API to use commit protocol
[ https://issues.apache.org/jira/browse/SPARK-18191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-18191: --- Fix Version/s: (was: 2.1.0) 2.2.0 > Port RDD API to use commit protocol > --- > > Key: SPARK-18191 > URL: https://issues.apache.org/jira/browse/SPARK-18191 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Reporter: Reynold Xin >Assignee: Jiang Xingbo > Fix For: 2.2.0 > > > Commit protocol is actually not specific to SQL. We can move it over to core > so the RDD API can use it too. -- This message was sent by Atlassian JIRA (v6.4.14#64029) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-19360) Spark 2.X does not support stored by clause
[ https://issues.apache.org/jira/browse/SPARK-19360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-19360: --- Summary: Spark 2.X does not support stored by clause (was: Spark 2.X does not support stored by cluase) > Spark 2.X does not support stored by clause > --- > > Key: SPARK-19360 > URL: https://issues.apache.org/jira/browse/SPARK-19360 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0 >Reporter: Ran Haim >Priority: Minor > > Spark 1.6 and below versions support HiveContext which supports Hive storage > handler with "stored by" clause. However, Spark 2.x does not support "stored > by". -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-4906) Spark master OOMs with exception stack trace stored in JobProgressListener
[ https://issues.apache.org/jira/browse/SPARK-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-4906: -- Affects Version/s: 1.6.1 > Spark master OOMs with exception stack trace stored in JobProgressListener > -- > > Key: SPARK-4906 > URL: https://issues.apache.org/jira/browse/SPARK-4906 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 1.1.1, 1.6.1 >Reporter: Mingyu Kim > Attachments: LeakingJobProgressListener2OOM.docx, Screen Shot > 2016-06-26 at 10.43.57 AM.png > > > Spark master was OOMing with a lot of stack traces retained in > JobProgressListener. The object dependency goes like the following. > JobProgressListener.stageIdToData => StageUIData.taskData => > TaskUIData.errorMessage > Each error message is ~10kb since it has the entire stack trace. As we have a > lot of tasks, when all of the tasks across multiple stages go bad, these > error messages accounted for 0.5GB of heap at some point. > Please correct me if I'm wrong, but it looks like all the task info for > running applications are kept in memory, which means it's almost always bound > to OOM for long-running applications. Would it make sense to fix this, for > example, by spilling some UI states to disk? -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
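The report above traces the OOM to retaining a full ~10 KB stack trace per task in `TaskUIData.errorMessage`. Beyond the real retention knobs (`spark.ui.retainedJobs`, `spark.ui.retainedStages`), one mitigation in the spirit of the report would be truncating what the listener stores per task. A hypothetical sketch only, not Spark's actual `TaskUIData`:

```scala
// Illustrative sketch: keep only the first line of an error, capped at
// maxChars, instead of retaining the whole multi-kilobyte stack trace.
object RetentionSketch {
  def truncateError(errorMessage: String, maxChars: Int = 256): String =
    errorMessage.split("\n", 2)(0).take(maxChars)
}
```

With many tasks failing identically, storing one bounded line per task instead of ~10 KB would shrink the listener's share of the heap by orders of magnitude, at the cost of losing the full trace in the UI.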
[jira] [Updated] (SPARK-17922) ClassCastException java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator cannot be cast to org.apache.spark.sql.cataly
[ https://issues.apache.org/jira/browse/SPARK-17922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-17922: --- Affects Version/s: 2.0.1 > ClassCastException java.lang.ClassCastException: > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator > cannot be cast to org.apache.spark.sql.catalyst.expressions.UnsafeProjection > - > > Key: SPARK-17922 > URL: https://issues.apache.org/jira/browse/SPARK-17922 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.0, 2.0.1 >Reporter: kanika dhuria > Attachments: spark_17922.tar.gz > > > I am using spark 2.0 > Seeing class loading issue because the whole stage code gen is generating > multiple classes with same name as > "org.apache.spark.sql.catalyst.expressions.GeneratedClass" > I am using dataframe transform. and within transform i use Osgi. > Osgi replaces the thread context class loader to ContextFinder which looks at > all the class loaders in the stack to find out the new generated class and > finds the GeneratedClass with inner class GeneratedIterator byteclass > loader(instead of falling back to the byte class loader created by janino > compiler), since the class name is same that byte class loader loads the > class and returns GeneratedClass$GeneratedIterator instead of expected > GeneratedClass$UnsafeProjection. > Can we generate different classes with different names or is it expected to > generate one class only? 
> This is roughly what I am trying to do: > {noformat} > import org.apache.spark.sql._ > import org.apache.spark.sql.types._ > import com.databricks.spark.avro._ > def exePart(out:StructType): ((Iterator[Row]) => Iterator[Row]) = { > //Initialize osgi > (rows:Iterator[Row]) => { > var outi = Iterator[Row]() > while(rows.hasNext) { > val r = rows.next > outi = outi.++(Iterator(Row(r.get(0)))) > } > //val ors = Row("abc") > //outi =outi.++( Iterator(ors)) > outi > } > } > def transform1( outType:StructType) :((DataFrame) => DataFrame) = { > (d:DataFrame) => { > val inType = d.schema > val rdd = d.rdd.mapPartitions(exePart(outType)) > d.sqlContext.createDataFrame(rdd, outType) > } > > } > val df = spark.read.avro("file:///data/builds/a1.avro") > val df1 = df.select($"id2").filter(false) > val df2 = df1.transform(transform1(StructType(StructField("p1", IntegerType, > true)::Nil))).createOrReplaceTempView("tbl0") > spark.sql("insert overwrite table testtable select p1 from tbl0") > {noformat}
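One workaround direction implied by the analysis above: run the third-party (OSGi) code with a pinned thread context class loader and restore it afterwards, so lookups do not resolve the identically named generated class through the wrong byte class loader. A generic, self-contained sketch of the save/set/restore pattern (plain JVM API only; the loader you pin to is your choice, and nothing here is Spark or OSGi API):

```scala
// Run `body` with `cl` as the thread context class loader, restoring the
// previous loader afterwards even if `body` throws.
object ClassLoaderSketch {
  def withContextClassLoader[T](cl: ClassLoader)(body: => T): T = {
    val thread = Thread.currentThread()
    val saved  = thread.getContextClassLoader
    thread.setContextClassLoader(cl)
    try body
    finally thread.setContextClassLoader(saved) // always restore
  }
}
```

Inside `mapPartitions`, wrapping the OSGi calls this way would keep OSGi's `ContextFinder` from seeing (and caching) the generated class through the iterator's byte class loader.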
[jira] [Commented] (SPARK-19578) Poor pyspark performance
[ https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994617#comment-15994617 ] Artur Sukhenko commented on SPARK-19578: [~nchammas] I've opened separate issue for Incorrect UI input-size - https://issues.apache.org/jira/browse/SPARK-20244 > Poor pyspark performance > > > Key: SPARK-19578 > URL: https://issues.apache.org/jira/browse/SPARK-19578 > Project: Spark > Issue Type: Bug > Components: PySpark, Web UI >Affects Versions: 1.6.1, 1.6.2, 2.0.1 > Environment: Spark 1.6.2 Hortonworks > Spark 2.0.1 MapR > Spark 1.6.1 MapR >Reporter: Artur Sukhenko > Attachments: reproduce_log > > > Simple job in pyspark takes 14 minutes to complete. > The text file used to reproduce contains multiple millions lines of one word > "yes" > (it might be the cause of poor performance) > {code} > var a = sc.textFile("/tmp/yes.txt") > a.count() > {code} > Same code took 33 sec in spark-shell > Reproduce steps: > Run this to generate big file (press Ctrl+C after 5-6 seconds) > [spark@c6401 ~]$ yes > /tmp/yes.txt > [spark@c6401 ~]$ ll /tmp/ > -rw-r--r-- 1 spark hadoop 516079616 Feb 13 11:10 yes.txt > [spark@c6401 ~]$ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/ > [spark@c6401 ~]$ pyspark > {code} > Python 2.6.6 (r266:84292, Feb 22 2013, 00:00:18) > [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2 > 17/02/13 11:12:36 INFO SparkContext: Running Spark version 1.6.2 > {code} > >>> a = sc.textFile("/tmp/yes.txt") > {code} > 17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0 stored as values in > memory (estimated size 341.1 KB, free 341.1 KB) > 17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes > in memory (estimated size 28.3 KB, free 369.4 KB) > 17/02/13 11:12:58 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory > on localhost:43389 (size: 28.3 KB, free: 517.4 MB) > 17/02/13 11:12:58 INFO SparkContext: Created broadcast 0 from textFile at > NativeMethodAccessorImpl.java:-2 > {code} > >>> 
a.count() > {code} > 17/02/13 11:13:03 INFO FileInputFormat: Total input paths to process : 1 > 17/02/13 11:13:03 INFO SparkContext: Starting job: count at :1 > 17/02/13 11:13:03 INFO DAGScheduler: Got job 0 (count at :1) with 4 > output partitions > 17/02/13 11:13:03 INFO DAGScheduler: Final stage: ResultStage 0 (count at > :1) > 17/02/13 11:13:03 INFO DAGScheduler: Parents of final stage: List() > 17/02/13 11:13:03 INFO DAGScheduler: Missing parents: List() > 17/02/13 11:13:03 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] > at count at :1), which has no missing parents > 17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1 stored as values in > memory (estimated size 5.7 KB, free 375.1 KB) > 17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes > in memory (estimated size 3.5 KB, free 378.6 KB) > 17/02/13 11:13:03 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory > on localhost:43389 (size: 3.5 KB, free: 517.4 MB) > 17/02/13 11:13:03 INFO SparkContext: Created broadcast 1 from broadcast at > DAGScheduler.scala:1008 > 17/02/13 11:13:03 INFO DAGScheduler: Submitting 4 missing tasks from > ResultStage 0 (PythonRDD[2] at count at :1) > 17/02/13 11:13:03 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks > 17/02/13 11:13:03 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, > localhost, partition 0,ANY, 2149 bytes) > 17/02/13 11:13:03 INFO Executor: Running task 0.0 in stage 0.0 (TID 0) > 17/02/13 11:13:03 INFO HadoopRDD: Input split: > hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:0+134217728 > 17/02/13 11:13:03 INFO deprecation: mapred.tip.id is deprecated. Instead, use > mapreduce.task.id > 17/02/13 11:13:03 INFO deprecation: mapred.task.id is deprecated. Instead, > use mapreduce.task.attempt.id > 17/02/13 11:13:03 INFO deprecation: mapred.task.is.map is deprecated. > Instead, use mapreduce.task.ismap > 17/02/13 11:13:03 INFO deprecation: mapred.task.partition is deprecated. 
> Instead, use mapreduce.task.partition > 17/02/13 11:13:03 INFO deprecation: mapred.job.id is deprecated. Instead, use > mapreduce.job.id > 17/02/13 11:16:37 INFO PythonRunner: Times: total = 213335, boot = 317, init > = 445, finish = 212573 > 17/02/13 11:16:37 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2182 > bytes result sent to driver > 17/02/13 11:16:37 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, > localhost, partition 1,ANY, 2149 bytes) > 17/02/13 11:16:37 INFO Executor: Running task 1.0 in stage 0.0 (TID 1) > 17/02/13 11:16:37 INFO HadoopRDD: Input split: > hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:134217728+134217728 > 17/02/13 11:16:37 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) > in 213605 ms on localhost (1/4) > 17/02/13 11:20:05 INFO PythonRunner:
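The four tasks in this run follow directly from HDFS split sizing; the arithmetic can be checked with plain Python, using the file size from `ll /tmp/` and the 128 MiB offsets visible in the HadoopRDD "Input split" lines (the helper name is ours, not Spark's):

```python
# Sizes taken from the log above.
FILE_BYTES = 516_079_616      # ls -l size of /tmp/yes.txt
SPLIT_BYTES = 134_217_728     # 128 MiB, as in "Input split: ...yes.txt:0+134217728"

def input_splits(file_bytes, split_bytes):
    """List (offset, length) pairs the way FileInputFormat carves the file."""
    return [(off, min(split_bytes, file_bytes - off))
            for off in range(0, file_bytes, split_bytes)]

splits = input_splits(FILE_BYTES, SPLIT_BYTES)
print(len(splits))   # 4 -> matches "Got job 0 ... with 4 output partitions"
print(splits[0])     # (0, 134217728) -> the first input split in the log
```

With each partition taking ~210 s in PythonRunner and the four tasks running sequentially on one local executor, the ~14 minute total in pyspark is fully accounted for by per-partition Python processing time.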
[jira] [Updated] (SPARK-19578) Poor pyspark performance
[ https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-19578: --- Description: (full issue description and reproduction log as quoted above)
[jira] [Updated] (SPARK-19578) Poor pyspark performance
[ https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-19578: --- Attachment: (was: spark_shell_correct_inputsize.png)
[jira] [Updated] (SPARK-19578) Poor pyspark performance
[ https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-19578: --- Attachment: (was: pyspark_incorrect_inputsize.png)
[jira] [Updated] (SPARK-19578) Poor pyspark performance
[ https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-19578: --- Summary: Poor pyspark performance (was: Poor pyspark performance + incorrect UI input-size metrics)
[jira] [Resolved] (SPARK-20308) org.apache.spark.shuffle.FetchFailedException: Too large frame
[ https://issues.apache.org/jira/browse/SPARK-20308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko resolved SPARK-20308. Resolution: Duplicate > org.apache.spark.shuffle.FetchFailedException: Too large frame > -- > > Key: SPARK-20308 > URL: https://issues.apache.org/jira/browse/SPARK-20308 > Project: Spark > Issue Type: Bug > Components: Shuffle >Affects Versions: 2.1.0 >Reporter: Stanislav Chernichkin > > Spark uses a custom frame decoder (TransportFrameDecoder) which does not > support frames larger than 2 GB. This leads to failures when shuffling with large > partitions. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
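The 2 GB frame limit is hit when a single shuffle block grows too large, so the usual workaround is to raise the partition count until every block sits well under the limit. A back-of-the-envelope sizing helper (illustrative arithmetic only; the function name and the 256 MiB per-block target are our assumptions, not from the ticket):

```python
def min_partitions(shuffle_bytes: int, target_block_bytes: int = 256 * 1024**2) -> int:
    """Smallest partition count keeping each shuffle block near the target size,
    i.e. far below the 2 GiB TransportFrameDecoder frame limit."""
    return max(1, -(-shuffle_bytes // target_block_bytes))  # ceiling division

# Example: 1 TiB of shuffle data with 256 MiB target blocks.
print(min_partitions(1024**4))   # 4096 partitions
```

In practice a number like this would feed `spark.sql.shuffle.partitions` or an explicit `repartition()` before the wide operation.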
[jira] [Updated] (SPARK-20244) Incorrect input size in UI with pyspark
[ https://issues.apache.org/jira/browse/SPARK-20244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-20244: --- Attachment: pyspark_incorrect_inputsize.png sparkshell_correct_inputsize.png Spark-shell and pyspark UI screenshots. > Incorrect input size in UI with pyspark > --- > > Key: SPARK-20244 > URL: https://issues.apache.org/jira/browse/SPARK-20244 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 2.0.0, 2.1.0 >Reporter: Artur Sukhenko >Priority: Minor > Attachments: pyspark_incorrect_inputsize.png, > sparkshell_correct_inputsize.png > > > In Spark UI (Details for Stage) Input Size is 64.0 KB when running in > PySparkShell. > It is also incorrect in the Tasks table: > 64.0 KB / 132120575 in pyspark > 252.0 MB / 132120575 in spark-shell > I will attach screenshots. > Reproduce steps: > Run this to generate a big file (press Ctrl+C after 5-6 seconds) > $ yes > /tmp/yes.txt > $ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/ > $ ./bin/pyspark > {code} > Python 2.7.5 (default, Nov 6 2016, 00:28:07) > [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2 > Type "help", "copyright", "credits" or "license" for more information. > Setting default log level to "WARN". > To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use > setLogLevel(newLevel). > Welcome to > __ > / __/__ ___ _/ /__ > _\ \/ _ \/ _ `/ __/ '_/ >/__ / .__/\_,_/_/ /_/\_\ version 2.1.0 > /_/ > Using Python version 2.7.5 (default, Nov 6 2016 00:28:07) > SparkSession available as 'spark'.{code} > >>> a = sc.textFile("/tmp/yes.txt") > >>> a.count() > Open Spark UI and check Stage 0.
[jira] [Created] (SPARK-20244) Incorrect input size in UI with pyspark
Artur Sukhenko created SPARK-20244: -- Summary: Incorrect input size in UI with pyspark Key: SPARK-20244 URL: https://issues.apache.org/jira/browse/SPARK-20244 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 2.1.0, 2.0.0 Reporter: Artur Sukhenko Priority: Minor
[jira] [Resolved] (SPARK-19068) Large number of executors causing a ton of ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(41,W
[ https://issues.apache.org/jira/browse/SPARK-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko resolved SPARK-19068. Resolution: Duplicate Fix Version/s: 2.2.0 Closing as duplicate of SPARK-19146. Messages like : {code}ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(136,WrappedArray()){code} are the result of problem discussed in SPARK-19146. > Large number of executors causing a ton of ERROR scheduler.LiveListenerBus: > SparkListenerBus has already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(41,WrappedArray()) > -- > > Key: SPARK-19068 > URL: https://issues.apache.org/jira/browse/SPARK-19068 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.1, 2.1.0 > Environment: RHEL 7.2 >Reporter: JESSE CHEN > Fix For: 2.2.0 > > Attachments: sparklog.tar.gz > > > On a large cluster with 45TB RAM and 1,000 cores, we used 1008 executors in > order to use all RAM and cores for a 100TB Spark SQL workload. Long-running > queries tend to report the following ERRORs > {noformat} > 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(136,WrappedArray()) > 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(853,WrappedArray()) > 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(395,WrappedArray()) > 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(736,WrappedArray()) > 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! 
Dropping event > SparkListenerExecutorMetricsUpdate(439,WrappedArray()) > 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(16,WrappedArray()) > 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(307,WrappedArray()) > 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(51,WrappedArray()) > 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(535,WrappedArray()) > 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(63,WrappedArray()) > 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(333,WrappedArray()) > 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(484,WrappedArray()) > (omitted) > {noformat} > The message itself may be a reasonable response to an already stopped > SparkListenerBus (so subsequent events are thrown away with that ERROR > message). The issue is that SparkContext does NOT exit until all > these ERRORs/events are reported, which is a huge number in our setup -- and > this can take, in some cases, hours! > We tried increasing it (Adding default property: spark.scheduler.listenerbus.eventqueue.size=13) > from 10K, but this still occurs.
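The queueing behavior being tuned here (a bounded listener event queue that starts dropping events once full, the capacity set by spark.scheduler.listenerbus.eventqueue.size) can be sketched with a toy model; the class and method names are ours, not Spark's:

```python
from collections import deque

class ToyListenerBus:
    """Bounded event queue that drops and counts events once full,
    a simplified stand-in for Spark's listener bus queue."""
    def __init__(self, capacity: int):
        self.queue = deque()
        self.capacity = capacity
        self.dropped = 0

    def post(self, event) -> bool:
        if len(self.queue) >= self.capacity:
            self.dropped += 1   # Spark logs an ERROR like the ones above instead
            return False
        self.queue.append(event)
        return True

bus = ToyListenerBus(capacity=3)
for i in range(5):
    bus.post(f"SparkListenerExecutorMetricsUpdate({i},WrappedArray())")
print(bus.dropped)   # 2 events dropped once the queue is full
```

Raising the capacity only delays the drops if the consumer (the listener thread) cannot keep up with the producers, which matches the observation that increasing the queue size did not make the errors go away.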
[jira] [Updated] (SPARK-19068) Large number of executors causing a ton of ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(41,Wr
[ https://issues.apache.org/jira/browse/SPARK-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-19068: --- Affects Version/s: 2.0.1 > Large number of executors causing a ton of ERROR scheduler.LiveListenerBus: > SparkListenerBus has already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(41,WrappedArray()) > -- > > Key: SPARK-19068 > URL: https://issues.apache.org/jira/browse/SPARK-19068 > Project: Spark > Issue Type: Bug >Affects Versions: 2.0.1, 2.1.0 > Environment: RHEL 7.2 >Reporter: JESSE CHEN > Attachments: sparklog.tar.gz > > > On a large cluster with 45TB RAM and 1,000 cores, we used 1008 executors in > order to use all RAM and cores for a 100TB Spark SQL workload. Long-running > queries tend to report the following ERRORs > {noformat} > 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(136,WrappedArray()) > 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(853,WrappedArray()) > 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(395,WrappedArray()) > 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(736,WrappedArray()) > 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(439,WrappedArray()) > 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(16,WrappedArray()) > 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! 
Dropping event > SparkListenerExecutorMetricsUpdate(307,WrappedArray()) > 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(51,WrappedArray()) > 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(535,WrappedArray()) > 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(63,WrappedArray()) > 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(333,WrappedArray()) > 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has > already stopped! Dropping event > SparkListenerExecutorMetricsUpdate(484,WrappedArray()) > (omitted) > {noformat} > The message itself may be a reasonable response to an already stopped > SparkListenerBus (subsequent events are thrown away with that ERROR > message). The issue is that SparkContext does NOT exit until all > these ERROR events are reported, which is a huge number in our setup -- and > this can take, in some cases, hours. > We tried increasing the queue size: > Adding default property: spark.scheduler.listenerbus.eventqueue.size=13 > up from the default of 10K, but this still occurs. -- This message was sent by Atlassian JIRA (v6.3.15#6346) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
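The behavior reported above can be sketched outside Spark: the listener bus hands events to a single consumer thread through a bounded queue, so once producers outrun the consumer, further events are dropped, and raising spark.scheduler.listenerbus.eventqueue.size only moves the point at which drops begin. A minimal Python sketch (queue size and event counts here are illustrative, not Spark's defaults):

```python
from queue import Queue, Full

def post_events(bus: Queue, n_events: int) -> int:
    """Try to enqueue n_events without blocking; return how many were dropped."""
    dropped = 0
    for i in range(n_events):
        try:
            bus.put_nowait(("SparkListenerExecutorMetricsUpdate", i))
        except Full:
            # Mirrors the "Dropped N SparkListenerEvents" warnings:
            # a full bounded queue silently discards the event.
            dropped += 1
    return dropped

# A queue of 10 slots receiving 25 events with no consumer draining it:
bus = Queue(maxsize=10)
print(post_events(bus, 25))  # 15 events are dropped
```

With no consumer running, the first 10 events fill the queue and the remaining 15 are counted as drops, which is why only a much larger queue (or a faster consumer) changes the outcome.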
[jira] [Commented] (SPARK-19068) Large number of executors causing a ton of ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(41,
[ https://issues.apache.org/jira/browse/SPARK-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957258#comment-15957258 ] Artur Sukhenko commented on SPARK-19068: Having similar problem. Reproduce: {panel}[spark-2.1.0-bin-without-hadoop]$ ./bin/run-example --master yarn --deploy-mode client --num-executors 4 SparkPi 100{panel} Made jstack of driver process and found this thread: {code}"SparkListenerBus" #10 daemon prio=5 os_prio=0 tid=0x7fc8fdc8c800 nid=0x37e7 runnable [0x7fc838764000] java.lang.Thread.State: RUNNABLE at scala.collection.mutable.HashTable$class.resize(HashTable.scala:262) at scala.collection.mutable.HashTable$class.scala$collection$mutable$HashTable$$addEntry0(HashTable.scala:154) at scala.collection.mutable.HashTable$class.findOrAddEntry(HashTable.scala:166) at scala.collection.mutable.LinkedHashMap.findOrAddEntry(LinkedHashMap.scala:49) at scala.collection.mutable.LinkedHashMap.put(LinkedHashMap.scala:71) at scala.collection.mutable.LinkedHashMap.$plus$eq(LinkedHashMap.scala:89) at scala.collection.mutable.LinkedHashMap.$plus$eq(LinkedHashMap.scala:49) at scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:59) at scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:59) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59) at scala.collection.mutable.AbstractMap.$plus$plus$eq(Map.scala:80) at scala.collection.IterableLike$class.drop(IterableLike.scala:152) at scala.collection.AbstractIterable.drop(Iterable.scala:54) at org.apache.spark.ui.jobs.JobProgressListener.onTaskEnd(JobProgressListener.scala:412) - locked <0x800b9db8> (a org.apache.spark.ui.jobs.JobProgressListener) at org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:45) at 
org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36) at org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36) at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63) at org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:36) at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:94) at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79) at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79) at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58) at org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78) at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1245) at org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77) {code} At 100k+ tasks job starts to be very slow and I am getting following errors: {code} [Stage 0:=> (108000 + 4) / 100]17/04/06 01:58:49 WARN LiveListenerBus: Dropped 182143 SparkListenerEvents since Thu Apr 06 01:57:49 JST 2017 [Stage 0:=> (109588 + 4) / 100]17/04/06 01:59:49 WARN LiveListenerBus: Dropped 196647 SparkListenerEvents since Thu Apr 06 01:58:49 JST 2017 [Stage 0:=> (111241 + 5) / 100] {code} After some time we get this: {code} rBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(2,WrappedArray()) [Stage 0:==> (126782 + 4) / 100]17/04/06 02:12:28 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(3,WrappedArray()) [Stage 0:==> (126919 + 5) / 100]17/04/06 02:12:31 ERROR LiveListenerBus: SparkListenerBus has already stopped! 
Dropping event SparkListenerExecutorMetricsUpdate(4,WrappedArray()) [Stage 0:==> (126982 + 4) / 100]17/04/06 02:12:32 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(1,WrappedArray()) [Stage 0:==> (127030 + 5) / 100]17/04/06 02:12:33 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(2,WrappedArray()) [Stage 0:==>
[jira] [Updated] (SPARK-19578) Poor pyspark performance + incorrect UI input-size metrics
[ https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-19578: --- Description: Simple job in pyspark takes 14 minutes to complete. The text file used to reproduce contains multiple millions lines of one word "yes" (it might be the cause of poor performance) {code} var a = sc.textFile("/tmp/yes.txt") a.count() {code} Same code took 33 sec in spark-shell Reproduce steps: Run this to generate big file (press Ctrl+C after 5-6 seconds) [spark@c6401 ~]$ yes > /tmp/yes.txt [spark@c6401 ~]$ ll /tmp/ -rw-r--r-- 1 spark hadoop 516079616 Feb 13 11:10 yes.txt [spark@c6401 ~]$ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/ [spark@c6401 ~]$ pyspark {code} Python 2.6.6 (r266:84292, Feb 22 2013, 00:00:18) [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2 17/02/13 11:12:36 INFO SparkContext: Running Spark version 1.6.2 {code} >>> a = sc.textFile("/tmp/yes.txt") {code} 17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 341.1 KB, free 341.1 KB) 17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 28.3 KB, free 369.4 KB) 17/02/13 11:12:58 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:43389 (size: 28.3 KB, free: 517.4 MB) 17/02/13 11:12:58 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2 {code} >>> a.count() {code} 17/02/13 11:13:03 INFO FileInputFormat: Total input paths to process : 1 17/02/13 11:13:03 INFO SparkContext: Starting job: count at :1 17/02/13 11:13:03 INFO DAGScheduler: Got job 0 (count at :1) with 4 output partitions 17/02/13 11:13:03 INFO DAGScheduler: Final stage: ResultStage 0 (count at :1) 17/02/13 11:13:03 INFO DAGScheduler: Parents of final stage: List() 17/02/13 11:13:03 INFO DAGScheduler: Missing parents: List() 17/02/13 11:13:03 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] at count at :1), which has no 
missing parents 17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.7 KB, free 375.1 KB) 17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.5 KB, free 378.6 KB) 17/02/13 11:13:03 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:43389 (size: 3.5 KB, free: 517.4 MB) 17/02/13 11:13:03 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1008 17/02/13 11:13:03 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 0 (PythonRDD[2] at count at :1) 17/02/13 11:13:03 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks 17/02/13 11:13:03 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,ANY, 2149 bytes) 17/02/13 11:13:03 INFO Executor: Running task 0.0 in stage 0.0 (TID 0) 17/02/13 11:13:03 INFO HadoopRDD: Input split: hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:0+134217728 17/02/13 11:13:03 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id 17/02/13 11:13:03 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id 17/02/13 11:13:03 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap 17/02/13 11:13:03 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition 17/02/13 11:13:03 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id 17/02/13 11:16:37 INFO PythonRunner: Times: total = 213335, boot = 317, init = 445, finish = 212573 17/02/13 11:16:37 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 
2182 bytes result sent to driver 17/02/13 11:16:37 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1,ANY, 2149 bytes) 17/02/13 11:16:37 INFO Executor: Running task 1.0 in stage 0.0 (TID 1) 17/02/13 11:16:37 INFO HadoopRDD: Input split: hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:134217728+134217728 17/02/13 11:16:37 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 213605 ms on localhost (1/4) 17/02/13 11:20:05 INFO PythonRunner: Times: total = 208227, boot = -81, init = 122, finish = 208186 17/02/13 11:20:05 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 2182 bytes result sent to driver 17/02/13 11:20:05 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, partition 2,ANY, 2149 bytes) 17/02/13 11:20:05 INFO Executor: Running task 2.0 in stage 0.0 (TID 2) 17/02/13 11:20:05 INFO HadoopRDD: Input split: hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:268435456+134217728 17/02/13 11:20:05 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 208302 ms on localhost (2/4) 17/02/13 11:23:37 INFO PythonRunner: Times: total = 212021, boot = -27, init = 45, finish = 212003 17/02/13 11:23:37 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 2182 bytes
[jira] [Updated] (SPARK-19578) Poor pyspark performance + incorrect UI input-size metrics
[ https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-19578: --- Attachment: reproduce_log spark_shell_correct_inputsize.png pyspark_incorrect_inputsize.png > Poor pyspark performance + incorrect UI input-size metrics > -- > > Key: SPARK-19578 > URL: https://issues.apache.org/jira/browse/SPARK-19578 > Project: Spark > Issue Type: Bug > Components: PySpark, Web UI >Affects Versions: 1.6.1, 1.6.2, 2.0.1 > Environment: Spark 1.6.2 Hortonworks > Spark 2.0.1 MapR > Spark 1.6.1 MapR >Reporter: Artur Sukhenko > Attachments: pyspark_incorrect_inputsize.png, reproduce_log, > spark_shell_correct_inputsize.png > > > Simple job in pyspark takes 14 minutes to complete > {code} > var a = sc.textFile("/tmp/yes.txt") > a.count() > {code} > Same code took 33 sec in spark-shell > Reproduce steps: > Run this to generate big file (press Ctrl+C after 5-6 seconds) > [spark@c6401 ~]$ yes > /tmp/yes.txt > [spark@c6401 ~]$ ll /tmp/ > -rw-r--r-- 1 spark hadoop 516079616 Feb 13 11:10 yes.txt > [spark@c6401 ~]$ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/ > [spark@c6401 ~]$ pyspark > {code} > Python 2.6.6 (r266:84292, Feb 22 2013, 00:00:18) > [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2 > 17/02/13 11:12:36 INFO SparkContext: Running Spark version 1.6.2 > {code} > >>> a = sc.textFile("/tmp/yes.txt") > {code} > 17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0 stored as values in > memory (estimated size 341.1 KB, free 341.1 KB) > 17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes > in memory (estimated size 28.3 KB, free 369.4 KB) > 17/02/13 11:12:58 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory > on localhost:43389 (size: 28.3 KB, free: 517.4 MB) > 17/02/13 11:12:58 INFO SparkContext: Created broadcast 0 from textFile at > NativeMethodAccessorImpl.java:-2 > {code} > >>> a.count() > {code} > 17/02/13 11:13:03 INFO FileInputFormat: Total input paths to 
process : 1 > 17/02/13 11:13:03 INFO SparkContext: Starting job: count at :1 > 17/02/13 11:13:03 INFO DAGScheduler: Got job 0 (count at :1) with 4 > output partitions > 17/02/13 11:13:03 INFO DAGScheduler: Final stage: ResultStage 0 (count at > :1) > 17/02/13 11:13:03 INFO DAGScheduler: Parents of final stage: List() > 17/02/13 11:13:03 INFO DAGScheduler: Missing parents: List() > 17/02/13 11:13:03 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] > at count at :1), which has no missing parents > 17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1 stored as values in > memory (estimated size 5.7 KB, free 375.1 KB) > 17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes > in memory (estimated size 3.5 KB, free 378.6 KB) > 17/02/13 11:13:03 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory > on localhost:43389 (size: 3.5 KB, free: 517.4 MB) > 17/02/13 11:13:03 INFO SparkContext: Created broadcast 1 from broadcast at > DAGScheduler.scala:1008 > 17/02/13 11:13:03 INFO DAGScheduler: Submitting 4 missing tasks from > ResultStage 0 (PythonRDD[2] at count at :1) > 17/02/13 11:13:03 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks > 17/02/13 11:13:03 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, > localhost, partition 0,ANY, 2149 bytes) > 17/02/13 11:13:03 INFO Executor: Running task 0.0 in stage 0.0 (TID 0) > 17/02/13 11:13:03 INFO HadoopRDD: Input split: > hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:0+134217728 > 17/02/13 11:13:03 INFO deprecation: mapred.tip.id is deprecated. Instead, use > mapreduce.task.id > 17/02/13 11:13:03 INFO deprecation: mapred.task.id is deprecated. Instead, > use mapreduce.task.attempt.id > 17/02/13 11:13:03 INFO deprecation: mapred.task.is.map is deprecated. > Instead, use mapreduce.task.ismap > 17/02/13 11:13:03 INFO deprecation: mapred.task.partition is deprecated. > Instead, use mapreduce.task.partition > 17/02/13 11:13:03 INFO deprecation: mapred.job.id is deprecated. 
Instead, use > mapreduce.job.id > 17/02/13 11:16:37 INFO PythonRunner: Times: total = 213335, boot = 317, init > = 445, finish = 212573 > 17/02/13 11:16:37 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2182 > bytes result sent to driver > 17/02/13 11:16:37 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, > localhost, partition 1,ANY, 2149 bytes) > 17/02/13 11:16:37 INFO Executor: Running task 1.0 in stage 0.0 (TID 1) > 17/02/13 11:16:37 INFO HadoopRDD: Input split: > hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:134217728+134217728 > 17/02/13 11:16:37 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) > in 213605 ms on localhost (1/4) > 17/02/13 11:20:05 INFO PythonRunner: Times: total = 208227, boot = -81,
[jira] [Created] (SPARK-19578) Poor pyspark performance + incorrect UI input-size metrics
Artur Sukhenko created SPARK-19578: -- Summary: Poor pyspark performance + incorrect UI input-size metrics Key: SPARK-19578 URL: https://issues.apache.org/jira/browse/SPARK-19578 Project: Spark Issue Type: Bug Components: PySpark, Web UI Affects Versions: 2.0.1, 1.6.2, 1.6.1 Environment: Spark 1.6.2 Hortonworks Spark 2.0.1 MapR Spark 1.6.1 MapR Reporter: Artur Sukhenko Simple job in pyspark takes 14 minutes to complete {code} var a = sc.textFile("/tmp/yes.txt") a.count() {code} Same code took 33 sec in spark-shell Reproduce steps: Run this to generate big file (press Ctrl+C after 5-6 seconds) [spark@c6401 ~]$ yes > /tmp/yes.txt [spark@c6401 ~]$ ll /tmp/ -rw-r--r-- 1 spark hadoop 516079616 Feb 13 11:10 yes.txt [spark@c6401 ~]$ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/ [spark@c6401 ~]$ pyspark {code} Python 2.6.6 (r266:84292, Feb 22 2013, 00:00:18) [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2 17/02/13 11:12:36 INFO SparkContext: Running Spark version 1.6.2 {code} >>> a = sc.textFile("/tmp/yes.txt") {code} 17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 341.1 KB, free 341.1 KB) 17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 28.3 KB, free 369.4 KB) 17/02/13 11:12:58 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:43389 (size: 28.3 KB, free: 517.4 MB) 17/02/13 11:12:58 INFO SparkContext: Created broadcast 0 from textFile at NativeMethodAccessorImpl.java:-2 {code} >>> a.count() {code} 17/02/13 11:13:03 INFO FileInputFormat: Total input paths to process : 1 17/02/13 11:13:03 INFO SparkContext: Starting job: count at :1 17/02/13 11:13:03 INFO DAGScheduler: Got job 0 (count at :1) with 4 output partitions 17/02/13 11:13:03 INFO DAGScheduler: Final stage: ResultStage 0 (count at :1) 17/02/13 11:13:03 INFO DAGScheduler: Parents of final stage: List() 17/02/13 11:13:03 INFO DAGScheduler: Missing parents: List() 17/02/13 11:13:03 INFO 
DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] at count at :1), which has no missing parents 17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 5.7 KB, free 375.1 KB) 17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 3.5 KB, free 378.6 KB) 17/02/13 11:13:03 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:43389 (size: 3.5 KB, free: 517.4 MB) 17/02/13 11:13:03 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1008 17/02/13 11:13:03 INFO DAGScheduler: Submitting 4 missing tasks from ResultStage 0 (PythonRDD[2] at count at :1) 17/02/13 11:13:03 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks 17/02/13 11:13:03 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, partition 0,ANY, 2149 bytes) 17/02/13 11:13:03 INFO Executor: Running task 0.0 in stage 0.0 (TID 0) 17/02/13 11:13:03 INFO HadoopRDD: Input split: hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:0+134217728 17/02/13 11:13:03 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id 17/02/13 11:13:03 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id 17/02/13 11:13:03 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap 17/02/13 11:13:03 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition 17/02/13 11:13:03 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id 17/02/13 11:16:37 INFO PythonRunner: Times: total = 213335, boot = 317, init = 445, finish = 212573 17/02/13 11:16:37 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 
2182 bytes result sent to driver 17/02/13 11:16:37 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, partition 1,ANY, 2149 bytes) 17/02/13 11:16:37 INFO Executor: Running task 1.0 in stage 0.0 (TID 1) 17/02/13 11:16:37 INFO HadoopRDD: Input split: hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:134217728+134217728 17/02/13 11:16:37 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 213605 ms on localhost (1/4) 17/02/13 11:20:05 INFO PythonRunner: Times: total = 208227, boot = -81, init = 122, finish = 208186 17/02/13 11:20:05 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 2182 bytes result sent to driver 17/02/13 11:20:05 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, localhost, partition 2,ANY, 2149 bytes) 17/02/13 11:20:05 INFO Executor: Running task 2.0 in stage 0.0 (TID 2) 17/02/13 11:20:05 INFO HadoopRDD: Input split: hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:268435456+134217728 17/02/13 11:20:05 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 208302 ms on localhost (2/4) 17/02/13 11:23:37
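The reproduce steps in this report (yes > /tmp/yes.txt, then counting it) can be mimicked without a cluster; a small Python stand-in that writes repeated "yes" lines and counts them (the file here is tiny, unlike the ~516 MB file in the report, so it only demonstrates the shape of the workload, not the slowdown):

```python
import os
import tempfile

def write_yes_file(path: str, lines: int) -> None:
    """Write `lines` copies of the word "yes", one per line,
    like a short run of the `yes` command redirected to a file."""
    with open(path, "w") as f:
        for _ in range(lines):
            f.write("yes\n")

def count_lines(path: str) -> int:
    """Count lines by streaming the file, as a count() over a text file does."""
    with open(path) as f:
        return sum(1 for _ in f)

path = os.path.join(tempfile.gettempdir(), "yes_small.txt")
write_yes_file(path, 1000)
print(count_lines(path))  # 1000
```

The interesting part of the bug is that the same count is ~25x slower through PySpark than through spark-shell, because in PySpark every record is shipped through the Python worker pipe (visible as the long PythonRunner times in the log).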
[jira] [Updated] (SPARK-4105) FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
[ https://issues.apache.org/jira/browse/SPARK-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-4105: -- Affects Version/s: 2.0.0 > FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based > shuffle > - > > Key: SPARK-4105 > URL: https://issues.apache.org/jira/browse/SPARK-4105 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 1.2.0, 1.2.1, 1.3.0, 1.4.1, 1.5.1, 1.6.1, 2.0.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Critical > Attachments: JavaObjectToSerialize.java, > SparkFailedToUncompressGenerator.scala > > > We have seen non-deterministic {{FAILED_TO_UNCOMPRESS(5)}} errors during > shuffle read. Here's a sample stacktrace from an executor: > {code} > 14/10/23 18:34:11 ERROR Executor: Exception in task 1747.3 in stage 11.0 (TID > 33053) > java.io.IOException: FAILED_TO_UNCOMPRESS(5) > at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78) > at org.xerial.snappy.SnappyNative.rawUncompress(Native Method) > at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391) > at org.xerial.snappy.Snappy.uncompress(Snappy.java:427) > at > org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:127) > at > org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88) > at org.xerial.snappy.SnappyInputStream.(SnappyInputStream.java:58) > at > org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128) > at > org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1090) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:116) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:115) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243) > at > 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at > org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > at > org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:129) > at > org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159) > at > org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158) > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) > at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > at > org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > at > org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) > at > 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} > Here's another occurrence of a similar error: > {code} >
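The stack trace shows the failure surfacing when the fetched shuffle block is wrapped in a decompressing input stream: any corruption of the compressed bytes in flight shows up there as a decompression error like FAILED_TO_UNCOMPRESS(5). A minimal sketch of that failure mode, using zlib from the standard library as a stand-in for Snappy (python-snappy may not be installed; the principle is the same):

```python
import zlib

def fetch_and_decompress(block: bytes) -> bytes:
    # Stand-in for wrapping a fetched shuffle block in a decompressing
    # stream; corrupted input raises here rather than at fetch time.
    return zlib.decompress(block)

payload = b"shuffle data " * 100
block = zlib.compress(payload)

# The intact block round-trips cleanly.
assert fetch_and_decompress(block) == payload

# Flip one byte inside the compressed stream, as in-flight or on-disk
# corruption would, and decompression fails.
corrupted = block[:10] + bytes([block[10] ^ 0xFF]) + block[11:]
try:
    fetch_and_decompress(corrupted)
except zlib.error as e:
    print("decompression failed:", e)
```

This is why these errors point at corrupted or mismatched shuffle data rather than at the codec itself: the decompressor is just the first component that notices the bytes are wrong.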
[jira] [Commented] (SPARK-17340) .sparkStaging not cleaned if application exited incorrectly
[ https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15470574#comment-15470574 ] Artur Sukhenko commented on SPARK-17340: Agree with [~jerryshao]. > .sparkStaging not cleaned if application exited incorrectly > --- > > Key: SPARK-17340 > URL: https://issues.apache.org/jira/browse/SPARK-17340 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.5.2, 1.6.1, 2.0.0 >Reporter: Artur Sukhenko >Priority: Minor > > When running Spark (yarn,cluster mode) and killing application > .sparkStaging is not cleaned. > Reproduce: > 1. run SparkPi job in yarn cluster mode > 2. Wait app to switch to RUNNING and press Ctrl+C > 3. kill app: $ yarn application -kill > 4. Verify that it is not cleaned up: $ hadoop fs -ls .sparkStaging > Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error. > All of these apps are already finished/killed, but > sparkStaging/application_ remains: > {code} > $ hadoop fs -ls .sparkStaging > Found 6 items > drwx-- - user user 3 2016-08-26 00:57 > .sparkStaging/application_1472140614688_0001 > drwx-- - user user 3 2016-08-26 01:09 > .sparkStaging/application_1472140614688_0002 > drwx-- - user user 3 2016-08-26 19:51 > .sparkStaging/application_1472140614688_0005 > drwx-- - user user 3 2016-08-26 19:53 > .sparkStaging/application_1472140614688_0007 > drwx-- - user user 3 2016-08-31 22:43 > .sparkStaging/application_1472634296300_0011 > drwx-- - user user 3 2016-08-31 23:30 > .sparkStaging/application_1472651370711_0006 > {code} > {code} > ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM > 16/08/26 00:51:17 INFO spark.SparkContext: Invoking stop() from shutdown hook > 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 506.0 in stage > 0.0 (TID 506, node1, PROCESS_LOCAL, 2085 bytes) > 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 507.0 in stage > 0.0 (TID 507, node1, PROCESS_LOCAL, 2085 bytes) > 16/08/26 00:51:17 INFO 
scheduler.TaskSetManager: Finished task 504.0 in stage > 0.0 (TID 504) in 14 ms on node1 (505/1000) > 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 505.0 in stage > 0.0 (TID 505) in 14 ms on node1 (506/1000) > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/metrics/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/stage/kill,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/api,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/static,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors/threadDump,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/environment/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/environment,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage/rdd/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage/rdd,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/pool/json,null} > 16/08/26 00:51:17 INFO 
handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/pool,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/stage/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/stage,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/jobs/job/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped >
[jira] [Comment Edited] (SPARK-17340) .sparkStaging not cleaned if application exited incorrectly
[ https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15470514#comment-15470514 ] Artur Sukhenko edited comment on SPARK-17340 at 9/7/16 12:57 PM: - I thought you can do {code} $yarn application -list -appTypes SPARK -appStates FINISHED,FAILED,KILLED {code} to check if it is used. But there is the spark.yarn.preserve.staging.files property, which will make the staging folder stay in HDFS; if it is enabled, you probably don't want them to be cleaned. was (Author: asukhenko): I thought you can do {code} $yarn application -list -appTypes SPARK -appStates FINISHED,FAILED,KILLED {code} to check if it used. But there is spark.yarn.preserve.staging.files property which will make staging folder stay in HDFS, if it is enabled - you probably don't want them to be cleaned. > .sparkStaging not cleaned if application exited incorrectly > --- > > Key: SPARK-17340 > URL: https://issues.apache.org/jira/browse/SPARK-17340 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.5.2, 1.6.1, 2.0.0 >Reporter: Artur Sukhenko >Priority: Minor > > When running Spark (yarn,cluster mode) and killing application > .sparkStaging is not cleaned. > Reproduce: > 1. run SparkPi job in yarn cluster mode > 2. Wait app to switch to RUNNING and press Ctrl+C > 3. kill app: $ yarn application -kill > 4. Verify that it is not cleaned up: $ hadoop fs -ls .sparkStaging > Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error. 
> All of these apps are already finished/killed, but > sparkStaging/application_ remains: > {code} > $ hadoop fs -ls .sparkStaging > Found 6 items > drwx-- - user user 3 2016-08-26 00:57 > .sparkStaging/application_1472140614688_0001 > drwx-- - user user 3 2016-08-26 01:09 > .sparkStaging/application_1472140614688_0002 > drwx-- - user user 3 2016-08-26 19:51 > .sparkStaging/application_1472140614688_0005 > drwx-- - user user 3 2016-08-26 19:53 > .sparkStaging/application_1472140614688_0007 > drwx-- - user user 3 2016-08-31 22:43 > .sparkStaging/application_1472634296300_0011 > drwx-- - user user 3 2016-08-31 23:30 > .sparkStaging/application_1472651370711_0006 > {code} > {code} > ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM > 16/08/26 00:51:17 INFO spark.SparkContext: Invoking stop() from shutdown hook > 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 506.0 in stage > 0.0 (TID 506, node1, PROCESS_LOCAL, 2085 bytes) > 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 507.0 in stage > 0.0 (TID 507, node1, PROCESS_LOCAL, 2085 bytes) > 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 504.0 in stage > 0.0 (TID 504) in 14 ms on node1 (505/1000) > 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 505.0 in stage > 0.0 (TID 505) in 14 ms on node1 (506/1000) > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/metrics/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/stage/kill,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/api,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/static,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} > 16/08/26 
00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors/threadDump,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/environment/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/environment,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage/rdd/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage/rdd,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/pool/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped >
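The check suggested in the comment above can be scripted by cross-referencing the applications that `yarn application -list -appTypes SPARK -appStates FINISHED,FAILED,KILLED` reports against the directories that `hadoop fs -ls .sparkStaging` shows. A minimal sketch of only the matching logic, with sample data standing in for real cluster output (the object and method names are illustrative, not Spark APIs):

```scala
// Hedged sketch: flag .sparkStaging directories whose application has already
// finished/failed/been killed. In practice `finishedApps` would come from the
// yarn CLI and `stagingDirs` from `hadoop fs -ls .sparkStaging`.
object StaleStagingCheck {
  def staleDirs(finishedApps: Set[String], stagingDirs: Seq[String]): Seq[String] =
    stagingDirs.filter { dir =>
      // strip the parent path to recover the application ID
      finishedApps.contains(dir.stripPrefix(".sparkStaging/"))
    }

  def main(args: Array[String]): Unit = {
    val finished = Set("application_1472140614688_0001", "application_1472140614688_0002")
    val dirs = Seq(
      ".sparkStaging/application_1472140614688_0001",
      ".sparkStaging/application_1472140614688_0005") // 0005 assumed still running
    staleDirs(finished, dirs).foreach(println)
    // prints .sparkStaging/application_1472140614688_0001
  }
}
```

Directories this returns are leftovers that are safe candidates for manual removal, unless spark.yarn.preserve.staging.files was deliberately enabled.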
[jira] [Commented] (SPARK-17379) Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)
[ https://issues.apache.org/jira/browse/SPARK-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464604#comment-15464604 ] Artur Sukhenko commented on SPARK-17379: I tried to use Spark with 4.1.5; after fixing the deprecated calls and implementing the new methods, some tests still failed. So I agree with you that we should just upgrade to 4.0.41. > Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible) > -- > > Key: SPARK-17379 > URL: https://issues.apache.org/jira/browse/SPARK-17379 > Project: Spark > Issue Type: Improvement >Affects Versions: 2.0.0 >Reporter: Adam Roberts >Priority: Trivial > > We should use the newest version of netty based on info here: > http://netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html, especially > interested in the static initialiser deadlock fix: > https://github.com/netty/netty/pull/5730 > Lots more fixes mentioned so will create the pull request - again a case of > updating the pom and then the dependency files -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17340) .sparkStaging not cleaned if application exited incorrectly
[ https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15455086#comment-15455086 ] Artur Sukhenko commented on SPARK-17340: Yes, I can. However, some users will stop the app like this, and we cannot ask everyone not to kill the local yarn#client process. Eventually the .sparkStaging folder will fill up and you'll have to clean it manually.
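One way to script the manual cleanup this comment alludes to is to select staging directories older than some threshold. The sketch below models only the selection logic; the age-based policy, the names, and the sample modification times are assumptions, not anything Spark ships, and in practice the (directory, mtime) pairs would come from `hadoop fs -ls .sparkStaging` or the Hadoop FileSystem API.

```scala
// Pure decision logic for an age-based cleanup of leftover .sparkStaging dirs
// (a sketch, not Spark source): keep the HDFS calls out so this is testable.
object StagingCleanupByAge {
  // Return dirs whose modification time is older than maxAgeMs relative to nowMs.
  def dirsToDelete(dirMtimes: Map[String, Long], nowMs: Long, maxAgeMs: Long): Seq[String] =
    dirMtimes.collect {
      case (dir, mtime) if nowMs - mtime > maxAgeMs => dir
    }.toSeq.sorted

  def main(args: Array[String]): Unit = {
    val oneDayMs = 24L * 60 * 60 * 1000
    val now = 10 * oneDayMs
    val dirs = Map(
      ".sparkStaging/application_1472140614688_0001" -> 1 * oneDayMs, // ~9 days old
      ".sparkStaging/application_1472651370711_0006" -> 9 * oneDayMs) // ~1 day old
    // With a 7-day threshold only the older directory is selected.
    dirsToDelete(dirs, now, 7 * oneDayMs).foreach(println)
    // prints .sparkStaging/application_1472140614688_0001
  }
}
```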
[jira] [Commented] (SPARK-17340) .sparkStaging not cleaned if application exited incorrectly
[ https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15455077#comment-15455077 ] Artur Sukhenko commented on SPARK-17340: Oh, now I get this. However, if you stop it in yarn-client mode with Ctrl+C, it does clean up. Shouldn't yarn-cluster mode be tolerant to such a shutdown?
[jira] [Commented] (SPARK-17340) .sparkStaging not cleaned if application exited incorrectly
[ https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15455038#comment-15455038 ] Artur Sukhenko commented on SPARK-17340: With yarn-client it will be cleaned up. I am talking about yarn-cluster, where cleanup is done by ApplicationMaster.scala#cleanupStagingDir only when {code} if (finalStatus == FinalApplicationStatus.SUCCEEDED || isLastAttempt) { unregister(finalStatus, finalMsg) cleanupStagingDir(fs) } {code}
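The guard quoted in that comment reduces to a two-input condition. The sketch below is a simplified model (the names are illustrative, not Spark source) showing why a yarn-killed, non-final attempt never reaches cleanupStagingDir:

```scala
// Simplified model of the ApplicationMaster cleanup guard: staging is removed
// only when the app succeeded or this was the last attempt, so killing a
// non-final attempt leaves .sparkStaging behind.
object CleanupGuardSketch {
  def shouldCleanStaging(succeeded: Boolean, isLastAttempt: Boolean): Boolean =
    succeeded || isLastAttempt

  def main(args: Array[String]): Unit = {
    // yarn application -kill on a mid-run, non-final attempt:
    println(shouldCleanStaging(succeeded = false, isLastAttempt = false)) // false -> dir kept
    // normal successful completion:
    println(shouldCleanStaging(succeeded = true, isLastAttempt = false))  // true -> dir removed
  }
}
```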
[jira] [Comment Edited] (SPARK-17340) .sparkStaging not cleaned if application exited incorrectly
[ https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453102#comment-15453102 ] Artur Sukhenko edited comment on SPARK-17340 at 8/31/16 7:14 PM: - Tested on 1.6.1: exitCode 15 (EXIT_EXCEPTION_USER_CLASS). It did clean up the .sparkStaging directory when I Ctrl+C'd spark-submit and then yarn-killed the app. {code:java} if (!unregistered) { // we only want to unregister if we don't want the RM to retry if (finalStatus == FinalApplicationStatus.SUCCEEDED || exitCode == ApplicationMaster.EXIT_EXCEPTION_USER_CLASS || isLastAttempt) { unregister(finalStatus, finalMsg) cleanupStagingDir(fs) } } {code} And in Spark 2.0.0 a yarn kill will produce exitCode 16 (EXIT_EARLY): {code:java} if (!unregistered) { // we only want to unregister if we don't want the RM to retry if (finalStatus == FinalApplicationStatus.SUCCEEDED || exitCode == ApplicationMaster.EXIT_EARLY || isLastAttempt) { unregister(finalStatus, finalMsg) cleanupStagingDir(fs) } } {code} was (Author: asukhenko): Tested on 1.6.1: exitCode 15 (EXIT_EXCEPTION_USER_CLASS) {code:java} if (!unregistered) { // we only want to unregister if we don't want the RM to retry if (finalStatus == FinalApplicationStatus.SUCCEEDED || exitCode == ApplicationMaster.EXIT_EXCEPTION_USER_CLASS || isLastAttempt) { unregister(finalStatus, finalMsg) cleanupStagingDir(fs) } } {code} On 2.0.0 yarn kill will produce exitCode 16 (EXIT_EARLY): {code:java} if (!unregistered) { // we only want to unregister if we don't want the RM to retry if (finalStatus == FinalApplicationStatus.SUCCEEDED || exitCode == ApplicationMaster.EXIT_EARLY || isLastAttempt) { unregister(finalStatus, finalMsg) cleanupStagingDir(fs) } } {code}
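The two guards quoted in that comment differ only in which exit code they accept. A compact model of that version difference (the constants 15 and 16 are taken from the comment; everything else is simplified and not Spark source):

```scala
// Models the 1.6.1 vs 2.0.0 difference described in the comment: each guard
// accepts the exit code that a yarn kill produces in that version, so the
// same exit code unlocks cleanup under one version's guard but not the other's.
object ExitCodeGuards {
  val ExitExceptionUserClass = 15 // per the comment, Spark 1.6.1
  val ExitEarly = 16              // per the comment, Spark 2.0.0

  def cleans_1_6_1(succeeded: Boolean, exitCode: Int, lastAttempt: Boolean): Boolean =
    succeeded || exitCode == ExitExceptionUserClass || lastAttempt

  def cleans_2_0_0(succeeded: Boolean, exitCode: Int, lastAttempt: Boolean): Boolean =
    succeeded || exitCode == ExitEarly || lastAttempt

  def main(args: Array[String]): Unit = {
    // exit code 15 passes the 1.6.1 guard but not the 2.0.0 one, and vice versa
    println(cleans_1_6_1(false, 15, false)) // true
    println(cleans_2_0_0(false, 15, false)) // false
    println(cleans_2_0_0(false, 16, false)) // true
  }
}
```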
[jira] [Commented] (SPARK-17340) .sparkStaging not cleaned if application exited incorrectly
[ https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453102#comment-15453102 ] Artur Sukhenko commented on SPARK-17340: Tested on 1.6.1: exitCode 15 (EXIT_EXCEPTION_USER_CLASS) {code:java} if (!unregistered) { // we only want to unregister if we don't want the RM to retry if (finalStatus == FinalApplicationStatus.SUCCEEDED || exitCode == ApplicationMaster.EXIT_EXCEPTION_USER_CLASS || isLastAttempt) { unregister(finalStatus, finalMsg) cleanupStagingDir(fs) } } {code:java} On 2.0.0 yarn kill will produce exitCode 16 (EXIT_EARLY): {code} if (!unregistered) { // we only want to unregister if we don't want the RM to retry if (finalStatus == FinalApplicationStatus.SUCCEEDED || exitCode == ApplicationMaster.EXIT_EARLY || isLastAttempt) { unregister(finalStatus, finalMsg) cleanupStagingDir(fs) } } {code} > .sparkStaging not cleaned if application exited incorrectly > --- > > Key: SPARK-17340 > URL: https://issues.apache.org/jira/browse/SPARK-17340 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 1.5.2, 1.6.1, 2.0.0 >Reporter: Artur Sukhenko >Priority: Minor > > When running Spark (yarn,cluster mode) and killing application > .sparkStaging is not cleaned. > Reproduce: > 1. run SparkPi job in yarn cluster mode > 2. Wait app to switch to RUNNING and press Ctrl+C > 3. kill app: $ yarn application -kill > 4. Verify that it is not cleaned up: $ hadoop fs -ls .sparkStaging > Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error. 
> All of these apps are already finished/killed, but > sparkStaging/application_ remains: > {code} > $ hadoop fs -ls .sparkStaging > Found 6 items > drwx-- - user user 3 2016-08-26 00:57 > .sparkStaging/application_1472140614688_0001 > drwx-- - user user 3 2016-08-26 01:09 > .sparkStaging/application_1472140614688_0002 > drwx-- - user user 3 2016-08-26 19:51 > .sparkStaging/application_1472140614688_0005 > drwx-- - user user 3 2016-08-26 19:53 > .sparkStaging/application_1472140614688_0007 > drwx-- - user user 3 2016-08-31 22:43 > .sparkStaging/application_1472634296300_0011 > drwx-- - user user 3 2016-08-31 23:30 > .sparkStaging/application_1472651370711_0006 > {code} > {code} > ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM > 16/08/26 00:51:17 INFO spark.SparkContext: Invoking stop() from shutdown hook > 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 506.0 in stage > 0.0 (TID 506, node1, PROCESS_LOCAL, 2085 bytes) > 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 507.0 in stage > 0.0 (TID 507, node1, PROCESS_LOCAL, 2085 bytes) > 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 504.0 in stage > 0.0 (TID 504) in 14 ms on node1 (505/1000) > 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 505.0 in stage > 0.0 (TID 505) in 14 ms on node1 (506/1000) > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/metrics/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/stages/stage/kill,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/api,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/static,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors/threadDump/json,null} > 16/08/26 
00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors/threadDump,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/executors,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/environment/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/environment,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage/rdd/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage/rdd,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage/json,null} > 16/08/26 00:51:17 INFO handler.ContextHandler: stopped > o.s.j.s.ServletContextHandler{/storage,null} > 16/08/26
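The two {code} snippets in the comment above differ only in which exit code unlocks the staging-directory cleanup. A minimal Java sketch of that gate, for illustration only: the constants 15 and 16 are the values quoted in the comment, and the method name and boolean shape are assumptions restating the Scala condition, not Spark's actual API.

```java
public class UnregisterGate {
    // Exit codes quoted in the comment above (assumed values, restated
    // from the comment rather than read from Spark source here).
    static final int EXIT_EXCEPTION_USER_CLASS = 15; // checked by the 1.6.1 snippet
    static final int EXIT_EARLY = 16;                // checked by the 2.0.0 snippet

    // Illustrative restatement of the condition guarding
    // unregister(...) + cleanupStagingDir(fs) in both snippets.
    static boolean cleansStagingDir(boolean succeeded, int exitCode,
                                    int checkedCode, boolean isLastAttempt) {
        return succeeded || exitCode == checkedCode || isLastAttempt;
    }

    public static void main(String[] args) {
        // A killed app that is not on its last attempt only gets its staging
        // dir cleaned when its exit code matches the one the AM checks for.
        System.out.println(cleansStagingDir(false, 15, EXIT_EXCEPTION_USER_CLASS, false)); // true
        System.out.println(cleansStagingDir(false, 16, EXIT_EXCEPTION_USER_CLASS, false)); // false
    }
}
```

The sketch shows why the checked constant matters: if the exit code produced by a kill does not match the one the ApplicationMaster tests for, neither unregister nor the staging-dir cleanup runs.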
[jira] [Updated] (SPARK-17340) .sparkStaging not cleaned if application exited incorrectly
[ https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-17340: --- Summary: .sparkStaging not cleaned if application exited incorrectly (was: .sparkStaging is not cleaned if killed)
[jira] [Updated] (SPARK-17340) .sparkStaging is not cleaned if killed
[ https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-17340: --- Component/s: YARN
[jira] [Updated] (SPARK-17340) .sparkStaging is not cleaned if killed
[ https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-17340: --- Description: When running Spark (yarn, cluster mode) and killing the application, .sparkStaging is not cleaned. Reproduce: 1. Run a SparkPi job in yarn-cluster mode 2. Wait for the app to switch to RUNNING and press Ctrl+C 3. Kill the app: $ yarn application -kill 4. Verify that it is not cleaned up: $ hadoop fs -ls .sparkStaging Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error.
[jira] [Updated] (SPARK-17340) .sparkStaging is not cleaned if killed
[ https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-17340: --- Summary: .sparkStaging is not cleaned if killed (was: .sparkStaging not cleaned in yarn-cluster mode)
[jira] [Created] (SPARK-17340) .sparkStaging not cleaned in yarn-cluster mode
Artur Sukhenko created SPARK-17340: -- Summary: .sparkStaging not cleaned in yarn-cluster mode Key: SPARK-17340 URL: https://issues.apache.org/jira/browse/SPARK-17340 Project: Spark Issue Type: Bug Affects Versions: 2.0.0, 1.6.1, 1.5.2 Reporter: Artur Sukhenko Priority: Minor When running Spark (yarn, cluster mode) and killing the application, .sparkStaging is not cleaned. Reproduce: 1. Run a SparkPi job in yarn-cluster mode 2. Wait for the app to switch to RUNNING and press Ctrl+C 3. Kill the app: $ yarn application -kill 4. Verify that it is not cleaned up: $ hadoop fs -ls .sparkStaging Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error.
[jira] [Updated] (SPARK-4105) FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
[ https://issues.apache.org/jira/browse/SPARK-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-4105: -- Affects Version/s: 1.5.1 > FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based > shuffle > - > > Key: SPARK-4105 > URL: https://issues.apache.org/jira/browse/SPARK-4105 > Project: Spark > Issue Type: Bug > Components: Shuffle, Spark Core >Affects Versions: 1.2.0, 1.2.1, 1.3.0, 1.4.1, 1.5.1, 1.6.1 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Critical > Attachments: JavaObjectToSerialize.java, > SparkFailedToUncompressGenerator.scala > > > We have seen non-deterministic {{FAILED_TO_UNCOMPRESS(5)}} errors during > shuffle read. Here's a sample stacktrace from an executor: > {code} > 14/10/23 18:34:11 ERROR Executor: Exception in task 1747.3 in stage 11.0 (TID > 33053) > java.io.IOException: FAILED_TO_UNCOMPRESS(5) > at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78) > at org.xerial.snappy.SnappyNative.rawUncompress(Native Method) > at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391) > at org.xerial.snappy.Snappy.uncompress(Snappy.java:427) > at > org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:127) > at > org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88) > at org.xerial.snappy.SnappyInputStream.<init>(SnappyInputStream.java:58) > at > org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128) > at > org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1090) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:116) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:115) > at > org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243) > at > 
org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at > org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > at > org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:129) > at > org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159) > at > org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158) > at > scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771) > at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > at > org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > at > org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:229) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68) > at > 
org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41) > at org.apache.spark.scheduler.Task.run(Task.scala:56) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) > {code} > Here's another occurrence of a similar error: > {code} >
[jira] [Updated] (SPARK-4105) FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle
[ https://issues.apache.org/jira/browse/SPARK-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-4105: -- Affects Version/s: 1.6.1