[jira] [Commented] (SPARK-21065) Spark Streaming concurrentJobs + StreamingJobProgressListener conflict

2020-02-19 Thread Artur Sukhenko (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-21065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17040456#comment-17040456
 ] 

Artur Sukhenko commented on SPARK-21065:


[~zsxwing] Is `spark.streaming.concurrentJobs` still risky to use in 2.2/2.3/2.4?

> Spark Streaming concurrentJobs + StreamingJobProgressListener conflict
> --
>
> Key: SPARK-21065
> URL: https://issues.apache.org/jira/browse/SPARK-21065
> Project: Spark
>  Issue Type: Bug
>  Components: DStreams, Scheduler, Web UI
>Affects Versions: 2.1.0
>Reporter: Dan Dutrow
>Priority: Major
>
> My streaming application has 200+ output operations, many of them stateful 
> and several of them windowed. In an attempt to reduce the processing times, I 
> set "spark.streaming.concurrentJobs" to 2+. Initial results are very 
> positive, cutting our processing time from ~3 minutes to ~1 minute, but 
> eventually we encounter an exception as follows:
> (Note that 149697756 ms is 2017-06-09 03:06:00, so it's trying to get a 
> batch from 45 minutes before the exception is thrown.)
> 2017-06-09 03:50:28,259 [Spark Listener Bus] ERROR 
> org.apache.spark.streaming.scheduler.StreamingListenerBus - Listener 
> StreamingJobProgressListener threw an exception
> java.util.NoSuchElementException: key not found: 149697756 ms
> at scala.collection.MapLike$class.default(MapLike.scala:228)
> at scala.collection.AbstractMap.default(Map.scala:59)
> at scala.collection.mutable.HashMap.apply(HashMap.scala:65)
> at 
> org.apache.spark.streaming.ui.StreamingJobProgressListener.onOutputOperationCompleted(StreamingJobProgressListener.scala:128)
> at 
> org.apache.spark.streaming.scheduler.StreamingListenerBus.doPostEvent(StreamingListenerBus.scala:67)
> at 
> org.apache.spark.streaming.scheduler.StreamingListenerBus.doPostEvent(StreamingListenerBus.scala:29)
> at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
> at 
> org.apache.spark.streaming.scheduler.StreamingListenerBus.postToAll(StreamingListenerBus.scala:29)
> at 
> org.apache.spark.streaming.scheduler.StreamingListenerBus.onOtherEvent(StreamingListenerBus.scala:43)
> ...
> The Spark code causing the exception is here:
> https://github.com/apache/spark/blob/branch-2.1/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala#LC125
>   override def onOutputOperationCompleted(
>       outputOperationCompleted: StreamingListenerOutputOperationCompleted): Unit = synchronized {
>     // This method is called before onBatchCompleted
>     {color:red}runningBatchUIData(outputOperationCompleted.outputOperationInfo.batchTime).{color}
>       updateOutputOperationInfo(outputOperationCompleted.outputOperationInfo)
>   }
> It seems to me that this may be caused by that batch having been removed earlier:
> https://github.com/apache/spark/blob/branch-2.1/streaming/src/main/scala/org/apache/spark/streaming/ui/StreamingJobProgressListener.scala#LC102
>   override def onBatchCompleted(batchCompleted: StreamingListenerBatchCompleted): Unit = {
>     synchronized {
>       waitingBatchUIData.remove(batchCompleted.batchInfo.batchTime)
>       {color:red}runningBatchUIData.remove(batchCompleted.batchInfo.batchTime){color}
>       val batchUIData = BatchUIData(batchCompleted.batchInfo)
>       completedBatchUIData.enqueue(batchUIData)
>       if (completedBatchUIData.size > batchUIDataLimit) {
>         val removedBatch = completedBatchUIData.dequeue()
>         batchTimeToOutputOpIdSparkJobIdPair.remove(removedBatch.batchTime)
>       }
>       totalCompletedBatches += 1L
>       totalProcessedRecords += batchUIData.numRecords
>     }
>   }
> What is the solution here? Should I make my Spark streaming context remember 
> batches for a lot longer, e.g. ssc.remember(batchDuration * rememberMultiple)?
> Otherwise, it seems like there should be some kind of existence check on 
> runningBatchUIData before dereferencing it.
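
Not the actual fix from this ticket, only a minimal sketch of the kind of existence check suggested above, reusing the field and method names from the quoted branch-2.1 listener code:

{code}
  override def onOutputOperationCompleted(
      outputOperationCompleted: StreamingListenerOutputOperationCompleted): Unit = synchronized {
    val batchTime = outputOperationCompleted.outputOperationInfo.batchTime
    // Only update the batch if it is still tracked as running; a concurrent
    // onBatchCompleted may already have removed it from runningBatchUIData.
    runningBatchUIData.get(batchTime).foreach { batchUIData =>
      batchUIData.updateOutputOperationInfo(outputOperationCompleted.outputOperationInfo)
    }
  }
{code}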



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Closed] (SPARK-16709) Task with commit failed will retry infinite when speculation set to true

2017-10-03 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-16709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko closed SPARK-16709.
--
   Resolution: Duplicate
Fix Version/s: 1.6.2

> Task with commit failed will retry infinite when speculation set to true
> 
>
> Key: SPARK-16709
> URL: https://issues.apache.org/jira/browse/SPARK-16709
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.0
>Reporter: Hong Shen
> Fix For: 1.6.2
>
> Attachments: commit failed.png
>
>
> In our cluster, we set spark.speculation=true, but when a task throws an 
> exception at SparkHadoopMapRedUtil.performCommit(), the task can retry 
> infinitely.
> https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala#L83



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21593) Fix broken configuration page

2017-08-01 Thread Artur Sukhenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16109101#comment-16109101
 ] 

Artur Sukhenko commented on SPARK-21593:


[~srowen] Yes, {code} spark.storage.replication.proactive 
{code} is breaking the page.

> Fix broken configuration page
> -
>
> Key: SPARK-21593
> URL: https://issues.apache.org/jira/browse/SPARK-21593
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.2.0
> Environment: Chrome/Firefox
>Reporter: Artur Sukhenko
>Priority: Minor
> Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg
>
>
> Latest configuration page for Spark 2.2.0 has broken menu list and named 
> anchors.
> Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
> with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]
> Or try this link [Configuration # Dynamic 
> Allocation|https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation]
>  which should open the Dynamic Allocation part of the page, but doesn't.
> !dyn_latest.jpg!
> 
> !dyn_211.jpg!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21593) Fix broken configuration page

2017-08-01 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-21593:
---
Description: 
Latest configuration page for Spark 2.2.0 has broken menu list and named 
anchors.
Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]

Or try this link [Configuration # Dynamic 
Allocation|https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation]
 which should open the Dynamic Allocation part of the page, but doesn't.
!dyn_latest.jpg!



!dyn_211.jpg!



  was:
Latest configuration page for Spark 2.2.0 has broken menu list and named 
anchors.
Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]

Or try this link [Configuration # Dynamic 
Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]
!dyn_latest.jpg!



!dyn_211.jpg!




> Fix broken configuration page
> -
>
> Key: SPARK-21593
> URL: https://issues.apache.org/jira/browse/SPARK-21593
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.2.0
> Environment: Chrome/Firefox
>Reporter: Artur Sukhenko
>Priority: Minor
> Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg
>
>
> Latest configuration page for Spark 2.2.0 has broken menu list and named 
> anchors.
> Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
> with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]
> Or try this link [Configuration # Dynamic 
> Allocation|https://spark.apache.org/docs/latest/configuration.html#dynamic-allocation]
>  which should open the Dynamic Allocation part of the page, but doesn't.
> !dyn_latest.jpg!
> 
> !dyn_211.jpg!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-21593) Fix broken configuration page

2017-08-01 Thread Artur Sukhenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16108966#comment-16108966
 ] 

Artur Sukhenko commented on SPARK-21593:


Yes, as well as having {code}### Dynamic Allocation{code} instead of {code}Dynamic Allocation{code}.

> Fix broken configuration page
> -
>
> Key: SPARK-21593
> URL: https://issues.apache.org/jira/browse/SPARK-21593
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.2.0
> Environment: Chrome/Firefox
>Reporter: Artur Sukhenko
>Priority: Minor
> Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg
>
>
> Latest configuration page for Spark 2.2.0 has broken menu list and named 
> anchors.
> Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
> with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]
> Or try this link [Configuration # Dynamic 
> Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]
> !dyn_latest.jpg!
> 
> !dyn_211.jpg!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21593) Fix broken configuration page

2017-08-01 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-21593:
---
Description: 
Latest configuration page for Spark 2.2.0 has broken menu list and named 
anchors.
Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]

Or try this link [Configuration # Dynamic 
Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]
!dyn_latest.jpg!



!dyn_211.jpg!



  was:
Latest configuration page for Spark 2.2.0 has broken menu list and named 
anchors.
Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]

Or try this link [Configuration # Dynamic 
Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]
!dyn_latest.jpg|thumbnail!
!dyn_211.jpg|thumbnail!

!doc_latest.jpg|thumbnail!
!doc_211.jpg|thumbnail!



> Fix broken configuration page
> -
>
> Key: SPARK-21593
> URL: https://issues.apache.org/jira/browse/SPARK-21593
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.2.0
> Environment: Chrome/Firefox
>Reporter: Artur Sukhenko
>Priority: Minor
> Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg
>
>
> Latest configuration page for Spark 2.2.0 has broken menu list and named 
> anchors.
> Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
> with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]
> Or try this link [Configuration # Dynamic 
> Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]
> !dyn_latest.jpg!
> 
> !dyn_211.jpg!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21593) Fix broken configuration page

2017-08-01 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-21593:
---
Description: 
Latest configuration page for Spark 2.2.0 has broken menu list and named 
anchors.
Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]

Or try this link [Configuration # Dynamic 
Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]
!dyn_latest.jpg|thumbnail!
!dyn_211.jpg|thumbnail!

!doc_latest.jpg|thumbnail!
!doc_211.jpg|thumbnail!


  was:
Latest configuration page for Spark 2.2.0 has broken menu list and named 
anchors.
Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]

Or try this link [Configuration # Dynamic 
Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]



> Fix broken configuration page
> -
>
> Key: SPARK-21593
> URL: https://issues.apache.org/jira/browse/SPARK-21593
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.2.0
> Environment: Chrome/Firefox
>Reporter: Artur Sukhenko
>Priority: Minor
> Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg
>
>
> Latest configuration page for Spark 2.2.0 has broken menu list and named 
> anchors.
> Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
> with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]
> Or try this link [Configuration # Dynamic 
> Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]
> !dyn_latest.jpg|thumbnail!
> !dyn_211.jpg|thumbnail!
> !doc_latest.jpg|thumbnail!
> !doc_211.jpg|thumbnail!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21593) Fix broken configuration page

2017-08-01 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-21593:
---
Attachment: (was: doc_211.png)

> Fix broken configuration page
> -
>
> Key: SPARK-21593
> URL: https://issues.apache.org/jira/browse/SPARK-21593
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.2.0
> Environment: Chrome/Firefox
>Reporter: Artur Sukhenko
>Priority: Minor
> Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg
>
>
> Latest configuration page for Spark 2.2.0 has broken menu list and named 
> anchors.
> Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
> with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]
> Or try this link [Configuration # Dynamic 
> Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21593) Fix broken configuration page

2017-08-01 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-21593:
---
Attachment: doc_latest.jpg
doc_211.jpg
dyn_211.jpg
dyn_latest.jpg

Broken anchors and menu

> Fix broken configuration page
> -
>
> Key: SPARK-21593
> URL: https://issues.apache.org/jira/browse/SPARK-21593
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.2.0
> Environment: Chrome/Firefox
>Reporter: Artur Sukhenko
>Priority: Minor
> Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg
>
>
> Latest configuration page for Spark 2.2.0 has broken menu list and named 
> anchors.
> Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
> with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]
> Or try this link [Configuration # Dynamic 
> Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21593) Fix broken configuration page

2017-08-01 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-21593:
---
Attachment: (was: doc_latest.png)

> Fix broken configuration page
> -
>
> Key: SPARK-21593
> URL: https://issues.apache.org/jira/browse/SPARK-21593
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.2.0
> Environment: Chrome/Firefox
>Reporter: Artur Sukhenko
>Priority: Minor
> Attachments: doc_211.jpg, doc_latest.jpg, dyn_211.jpg, dyn_latest.jpg
>
>
> Latest configuration page for Spark 2.2.0 has broken menu list and named 
> anchors.
> Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
> with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]
> Or try this link [Configuration # Dynamic 
> Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21593) Fix broken configuration page

2017-08-01 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-21593:
---
Description: 
Latest configuration page for Spark 2.2.0 has broken menu list and named 
anchors.
Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]

Or try this link [Configuration # Dynamic 
Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]


  was:
Latest configuration page for Spark 2.2.0 has broken menu list and named 
anchors.
Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]

Or try this link [Configuration # Dynamic 
Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]

!doc_latest.jpg|thumbnail!


> Fix broken configuration page
> -
>
> Key: SPARK-21593
> URL: https://issues.apache.org/jira/browse/SPARK-21593
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.2.0
> Environment: Chrome/Firefox
>Reporter: Artur Sukhenko
>Priority: Minor
> Attachments: doc_211.png, doc_latest.png
>
>
> Latest configuration page for Spark 2.2.0 has broken menu list and named 
> anchors.
> Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
> with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]
> Or try this link [Configuration # Dynamic 
> Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21593) Fix broken configuration page

2017-08-01 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-21593:
---
Description: 
Latest configuration page for Spark 2.2.0 has broken menu list and named 
anchors.
Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]

Or try this link [Configuration # Dynamic 
Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]

!doc_latest.jpg|thumbnail!

  was:
Latest configuration page for Spark 2.2.0 has broken menu list and named 
anchors.
Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]

Or try this link [Configuration # Dynamic 
Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]


> Fix broken configuration page
> -
>
> Key: SPARK-21593
> URL: https://issues.apache.org/jira/browse/SPARK-21593
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.2.0
> Environment: Chrome/Firefox
>Reporter: Artur Sukhenko
>Priority: Minor
> Attachments: doc_211.png, doc_latest.png
>
>
> Latest configuration page for Spark 2.2.0 has broken menu list and named 
> anchors.
> Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
> with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]
> Or try this link [Configuration # Dynamic 
> Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]
> !doc_latest.jpg|thumbnail!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-21593) Fix broken configuration page

2017-08-01 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-21593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-21593:
---
Attachment: doc_latest.png
doc_211.png

Latest documentation page with broken menu and working 2.1.1 

> Fix broken configuration page
> -
>
> Key: SPARK-21593
> URL: https://issues.apache.org/jira/browse/SPARK-21593
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 2.2.0
> Environment: Chrome/Firefox
>Reporter: Artur Sukhenko
>Priority: Minor
> Attachments: doc_211.png, doc_latest.png
>
>
> Latest configuration page for Spark 2.2.0 has broken menu list and named 
> anchors.
> Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
> with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]
> Or try this link [Configuration # Dynamic 
> Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-21593) Fix broken configuration page

2017-08-01 Thread Artur Sukhenko (JIRA)
Artur Sukhenko created SPARK-21593:
--

 Summary: Fix broken configuration page
 Key: SPARK-21593
 URL: https://issues.apache.org/jira/browse/SPARK-21593
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Affects Versions: 2.2.0
 Environment: Chrome/Firefox
Reporter: Artur Sukhenko
Priority: Minor


Latest configuration page for Spark 2.2.0 has broken menu list and named 
anchors.
Compare [2.1.1 docs |https://spark.apache.org/docs/2.1.1/configuration.html] 
with [Latest docs |https://spark.apache.org/docs/latest/configuration.html]

Or try this link [Configuration # Dynamic 
Allocation|https://spark.apache.org/docs/2.1.1/configuration.html#dynamic-allocation]



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18191) Port RDD API to use commit protocol

2017-07-04 Thread Artur Sukhenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-18191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16073860#comment-16073860
 ] 

Artur Sukhenko commented on SPARK-18191:


[~shridharama] Yes, updated 'Fix Version' from 2.1 to 2.2

> Port RDD API to use commit protocol
> ---
>
> Key: SPARK-18191
> URL: https://issues.apache.org/jira/browse/SPARK-18191
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Jiang Xingbo
> Fix For: 2.2.0
>
>
> Commit protocol is actually not specific to SQL. We can move it over to core 
> so the RDD API can use it too.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-18191) Port RDD API to use commit protocol

2017-07-04 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-18191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-18191:
---
Fix Version/s: (was: 2.1.0)
   2.2.0

> Port RDD API to use commit protocol
> ---
>
> Key: SPARK-18191
> URL: https://issues.apache.org/jira/browse/SPARK-18191
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Reporter: Reynold Xin
>Assignee: Jiang Xingbo
> Fix For: 2.2.0
>
>
> Commit protocol is actually not specific to SQL. We can move it over to core 
> so the RDD API can use it too.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19360) Spark 2.X does not support stored by clause

2017-06-07 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-19360:
---
Summary: Spark 2.X does not support stored by clause  (was: Spark 2.X does 
not support stored by cluase)

> Spark 2.X does not support stored by clause
> ---
>
> Key: SPARK-19360
> URL: https://issues.apache.org/jira/browse/SPARK-19360
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0
>Reporter: Ran Haim
>Priority: Minor
>
> Spark 1.6 and earlier versions support HiveContext, which supports Hive storage 
> handlers via the "stored by" clause. However, Spark 2.x does not support "stored 
> by".
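
For illustration only, a hedged sketch (in Scala, via spark.sql) of the kind of storage-handler DDL this is about; the handler class and table below are just examples, not taken from this ticket. HiveContext in Spark 1.6 could hand such statements to Hive, while Spark 2.x's parser rejects the STORED BY clause:

{code}
// Hedged illustration; the storage handler and table below are examples only.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("stored-by-example")
  .enableHiveSupport()
  .getOrCreate()

// Expected to fail on Spark 2.x, which does not accept STORED BY:
spark.sql(
  """CREATE TABLE hbase_backed_table (key INT, value STRING)
    |STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
    |WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:value")
  """.stripMargin)
{code}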



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4906) Spark master OOMs with exception stack trace stored in JobProgressListener

2017-05-22 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-4906:
--
Affects Version/s: 1.6.1

> Spark master OOMs with exception stack trace stored in JobProgressListener
> --
>
> Key: SPARK-4906
> URL: https://issues.apache.org/jira/browse/SPARK-4906
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.1.1, 1.6.1
>Reporter: Mingyu Kim
> Attachments: LeakingJobProgressListener2OOM.docx, Screen Shot 
> 2016-06-26 at 10.43.57 AM.png
>
>
> Spark master was OOMing with a lot of stack traces retained in 
> JobProgressListener. The object dependency chain is as follows:
> JobProgressListener.stageIdToData => StageUIData.taskData => 
> TaskUIData.errorMessage
> Each error message is ~10 KB since it contains the entire stack trace. As we 
> have a lot of tasks, when all of the tasks across multiple stages go bad, these 
> error messages accounted for 0.5 GB of heap at one point.
> Please correct me if I'm wrong, but it looks like all the task info for 
> running applications is kept in memory, which means it's almost always bound 
> to OOM for long-running applications. Would it make sense to fix this, for 
> example, by spilling some UI state to disk?
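
Not the spill-to-disk fix proposed above, but a hedged mitigation sketch: the amount of completed job/stage/task state the UI listener retains can be bounded with the spark.ui.retained* settings. The values below are arbitrary examples, not recommendations from this ticket:

{code}
// Hedged mitigation sketch: cap retained UI state so leaked error messages
// cannot grow without bound. Numbers are arbitrary examples.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("bounded-ui-state")
  .set("spark.ui.retainedJobs", "200")     // default 1000
  .set("spark.ui.retainedStages", "200")   // default 1000
  .set("spark.ui.retainedTasks", "10000")  // default 100000 (Spark 2.0.1+)
{code}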



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-17922) ClassCastException java.lang.ClassCastException: org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator cannot be cast to org.apache.spark.sql.cataly

2017-05-19 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-17922:
---
Affects Version/s: 2.0.1

> ClassCastException java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator 
> cannot be cast to org.apache.spark.sql.catalyst.expressions.UnsafeProjection 
> -
>
> Key: SPARK-17922
> URL: https://issues.apache.org/jira/browse/SPARK-17922
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 2.0.1
>Reporter: kanika dhuria
> Attachments: spark_17922.tar.gz
>
>
> I am using Spark 2.0 and seeing a class loading issue because whole-stage code 
> generation produces multiple classes with the same name, 
> "org.apache.spark.sql.catalyst.expressions.GeneratedClass".
> I am using a DataFrame transform, and within the transform I use OSGi.
> OSGi replaces the thread context class loader with ContextFinder, which looks at 
> all the class loaders on the stack to resolve the newly generated class and 
> finds the byte class loader of GeneratedClass with inner class GeneratedIterator 
> (instead of falling back to the byte class loader created by the Janino 
> compiler). Since the class name is the same, that byte class loader loads the 
> class and returns GeneratedClass$GeneratedIterator instead of the expected 
> GeneratedClass$UnsafeProjection.
> Can we generate different classes with different names, or is it expected to 
> generate only one class?
> This is roughly what I am trying to do:
> {noformat}
> import org.apache.spark.sql._
> import org.apache.spark.sql.types._
> import com.databricks.spark.avro._
>
> def exePart(out: StructType): ((Iterator[Row]) => Iterator[Row]) = {
>   // Initialize OSGi
>   (rows: Iterator[Row]) => {
>     var outi = Iterator[Row]()
>     while (rows.hasNext) {
>       val r = rows.next
>       outi = outi.++(Iterator(Row(r.get(0))))
>     }
>     // val ors = Row("abc")
>     // outi = outi.++(Iterator(ors))
>     outi
>   }
> }
>
> def transform1(outType: StructType): ((DataFrame) => DataFrame) = {
>   (d: DataFrame) => {
>     val inType = d.schema
>     val rdd = d.rdd.mapPartitions(exePart(outType))
>     d.sqlContext.createDataFrame(rdd, outType)
>   }
> }
>
> val df = spark.read.avro("file:///data/builds/a1.avro")
> val df1 = df.select($"id2").filter(false)
> val df2 = df1.transform(transform1(StructType(StructField("p1", IntegerType,
>   true) :: Nil))).createOrReplaceTempView("tbl0")
> spark.sql("insert overwrite table testtable select p1 from tbl0")
> {noformat}
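
Not something from this ticket, but a purely illustrative workaround sketch for the class-loader interference described above: pinning the thread context class loader around the action that triggers codegen so an OSGi ContextFinder is not consulted. Whether this helps in this particular setup is an assumption, not a confirmed fix:

{code}
// Hedged workaround sketch, not a confirmed fix for this issue.
def withPinnedContextClassLoader[T](body: => T): T = {
  val thread = Thread.currentThread()
  val original = thread.getContextClassLoader
  try {
    // Assumption: this class's loader stands in for Spark's own loader.
    thread.setContextClassLoader(getClass.getClassLoader)
    body
  } finally {
    thread.setContextClassLoader(original)
  }
}

// Example usage around the action that triggers whole-stage codegen:
// withPinnedContextClassLoader {
//   spark.sql("insert overwrite table testtable select p1 from tbl0")
// }
{code}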



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19578) Poor pyspark performance

2017-05-03 Thread Artur Sukhenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15994617#comment-15994617
 ] 

Artur Sukhenko commented on SPARK-19578:


[~nchammas] I've opened a separate issue for the incorrect UI input-size: 
https://issues.apache.org/jira/browse/SPARK-20244

> Poor pyspark performance
> 
>
> Key: SPARK-19578
> URL: https://issues.apache.org/jira/browse/SPARK-19578
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Web UI
>Affects Versions: 1.6.1, 1.6.2, 2.0.1
> Environment: Spark 1.6.2 Hortonworks
> Spark 2.0.1 MapR
> Spark 1.6.1 MapR
>Reporter: Artur Sukhenko
> Attachments: reproduce_log
>
>
> A simple job in pyspark takes 14 minutes to complete.
> The text file used to reproduce this contains multiple millions of lines of the 
> single word "yes" (which might be the cause of the poor performance).
> {code}
> var a = sc.textFile("/tmp/yes.txt")
> a.count()
> {code}
> Same code took 33 sec in spark-shell
> Reproduce steps:
> Run this  to generate big file (press Ctrl+C after 5-6 seconds)
> [spark@c6401 ~]$ yes > /tmp/yes.txt
> [spark@c6401 ~]$ ll /tmp/
> -rw-r--r--  1 spark hadoop  516079616 Feb 13 11:10 yes.txt
> [spark@c6401 ~]$ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/
> [spark@c6401 ~]$ pyspark
> {code}
> Python 2.6.6 (r266:84292, Feb 22 2013, 00:00:18) 
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
> 17/02/13 11:12:36 INFO SparkContext: Running Spark version 1.6.2
> {code}
> >>> a = sc.textFile("/tmp/yes.txt")
> {code}
> 17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0 stored as values in 
> memory (estimated size 341.1 KB, free 341.1 KB)
> 17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
> in memory (estimated size 28.3 KB, free 369.4 KB)
> 17/02/13 11:12:58 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
> on localhost:43389 (size: 28.3 KB, free: 517.4 MB)
> 17/02/13 11:12:58 INFO SparkContext: Created broadcast 0 from textFile at 
> NativeMethodAccessorImpl.java:-2
> {code}
> >>> a.count()
> {code}
> 17/02/13 11:13:03 INFO FileInputFormat: Total input paths to process : 1
> 17/02/13 11:13:03 INFO SparkContext: Starting job: count at :1
> 17/02/13 11:13:03 INFO DAGScheduler: Got job 0 (count at :1) with 4 
> output partitions
> 17/02/13 11:13:03 INFO DAGScheduler: Final stage: ResultStage 0 (count at 
> :1)
> 17/02/13 11:13:03 INFO DAGScheduler: Parents of final stage: List()
> 17/02/13 11:13:03 INFO DAGScheduler: Missing parents: List()
> 17/02/13 11:13:03 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] 
> at count at :1), which has no missing parents
> 17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1 stored as values in 
> memory (estimated size 5.7 KB, free 375.1 KB)
> 17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes 
> in memory (estimated size 3.5 KB, free 378.6 KB)
> 17/02/13 11:13:03 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory 
> on localhost:43389 (size: 3.5 KB, free: 517.4 MB)
> 17/02/13 11:13:03 INFO SparkContext: Created broadcast 1 from broadcast at 
> DAGScheduler.scala:1008
> 17/02/13 11:13:03 INFO DAGScheduler: Submitting 4 missing tasks from 
> ResultStage 0 (PythonRDD[2] at count at :1)
> 17/02/13 11:13:03 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
> 17/02/13 11:13:03 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 
> localhost, partition 0,ANY, 2149 bytes)
> 17/02/13 11:13:03 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
> 17/02/13 11:13:03 INFO HadoopRDD: Input split: 
> hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:0+134217728
> 17/02/13 11:13:03 INFO deprecation: mapred.tip.id is deprecated. Instead, use 
> mapreduce.task.id
> 17/02/13 11:13:03 INFO deprecation: mapred.task.id is deprecated. Instead, 
> use mapreduce.task.attempt.id
> 17/02/13 11:13:03 INFO deprecation: mapred.task.is.map is deprecated. 
> Instead, use mapreduce.task.ismap
> 17/02/13 11:13:03 INFO deprecation: mapred.task.partition is deprecated. 
> Instead, use mapreduce.task.partition
> 17/02/13 11:13:03 INFO deprecation: mapred.job.id is deprecated. Instead, use 
> mapreduce.job.id
> 17/02/13 11:16:37 INFO PythonRunner: Times: total = 213335, boot = 317, init 
> = 445, finish = 212573
> 17/02/13 11:16:37 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2182 
> bytes result sent to driver
> 17/02/13 11:16:37 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 
> localhost, partition 1,ANY, 2149 bytes)
> 17/02/13 11:16:37 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
> 17/02/13 11:16:37 INFO HadoopRDD: Input split: 
> hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:134217728+134217728
> 17/02/13 11:16:37 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) 
> in 213605 ms on localhost (1/4)
> 17/02/13 11:20:05 INFO PythonRunner: 

[jira] [Updated] (SPARK-19578) Poor pyspark performance

2017-05-03 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-19578:
---
Description: 
A simple job in pyspark takes 14 minutes to complete.
The text file used to reproduce this contains multiple millions of lines of the 
single word "yes" (which might be the cause of the poor performance).
{code}
var a = sc.textFile("/tmp/yes.txt")
a.count()
{code}
Same code took 33 sec in spark-shell

Reproduce steps:
Run this to generate a big file (press Ctrl+C after 5-6 seconds):
[spark@c6401 ~]$ yes > /tmp/yes.txt

[spark@c6401 ~]$ ll /tmp/
-rw-r--r--  1 spark hadoop  516079616 Feb 13 11:10 yes.txt
[spark@c6401 ~]$ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/
[spark@c6401 ~]$ pyspark
{code}
Python 2.6.6 (r266:84292, Feb 22 2013, 00:00:18) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
17/02/13 11:12:36 INFO SparkContext: Running Spark version 1.6.2
{code}
>>> a = sc.textFile("/tmp/yes.txt")
{code}
17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0 stored as values in 
memory (estimated size 341.1 KB, free 341.1 KB)
17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in 
memory (estimated size 28.3 KB, free 369.4 KB)
17/02/13 11:12:58 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 
localhost:43389 (size: 28.3 KB, free: 517.4 MB)
17/02/13 11:12:58 INFO SparkContext: Created broadcast 0 from textFile at 
NativeMethodAccessorImpl.java:-2
{code}
>>> a.count()
{code}
17/02/13 11:13:03 INFO FileInputFormat: Total input paths to process : 1
17/02/13 11:13:03 INFO SparkContext: Starting job: count at :1
17/02/13 11:13:03 INFO DAGScheduler: Got job 0 (count at :1) with 4 
output partitions
17/02/13 11:13:03 INFO DAGScheduler: Final stage: ResultStage 0 (count at 
:1)
17/02/13 11:13:03 INFO DAGScheduler: Parents of final stage: List()
17/02/13 11:13:03 INFO DAGScheduler: Missing parents: List()
17/02/13 11:13:03 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] at 
count at :1), which has no missing parents
17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1 stored as values in 
memory (estimated size 5.7 KB, free 375.1 KB)
17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in 
memory (estimated size 3.5 KB, free 378.6 KB)
17/02/13 11:13:03 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 
localhost:43389 (size: 3.5 KB, free: 517.4 MB)
17/02/13 11:13:03 INFO SparkContext: Created broadcast 1 from broadcast at 
DAGScheduler.scala:1008
17/02/13 11:13:03 INFO DAGScheduler: Submitting 4 missing tasks from 
ResultStage 0 (PythonRDD[2] at count at :1)
17/02/13 11:13:03 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
17/02/13 11:13:03 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 
localhost, partition 0,ANY, 2149 bytes)
17/02/13 11:13:03 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/02/13 11:13:03 INFO HadoopRDD: Input split: 
hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:0+134217728
17/02/13 11:13:03 INFO deprecation: mapred.tip.id is deprecated. Instead, use 
mapreduce.task.id
17/02/13 11:13:03 INFO deprecation: mapred.task.id is deprecated. Instead, use 
mapreduce.task.attempt.id
17/02/13 11:13:03 INFO deprecation: mapred.task.is.map is deprecated. Instead, 
use mapreduce.task.ismap
17/02/13 11:13:03 INFO deprecation: mapred.task.partition is deprecated. 
Instead, use mapreduce.task.partition
17/02/13 11:13:03 INFO deprecation: mapred.job.id is deprecated. Instead, use 
mapreduce.job.id

17/02/13 11:16:37 INFO PythonRunner: Times: total = 213335, boot = 317, init = 
445, finish = 212573
17/02/13 11:16:37 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2182 
bytes result sent to driver
17/02/13 11:16:37 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 
localhost, partition 1,ANY, 2149 bytes)
17/02/13 11:16:37 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
17/02/13 11:16:37 INFO HadoopRDD: Input split: 
hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:134217728+134217728
17/02/13 11:16:37 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) 
in 213605 ms on localhost (1/4)

17/02/13 11:20:05 INFO PythonRunner: Times: total = 208227, boot = -81, init = 
122, finish = 208186
17/02/13 11:20:05 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 2182 
bytes result sent to driver
17/02/13 11:20:05 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, 
localhost, partition 2,ANY, 2149 bytes)
17/02/13 11:20:05 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
17/02/13 11:20:05 INFO HadoopRDD: Input split: 
hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:268435456+134217728
17/02/13 11:20:05 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) 
in 208302 ms on localhost (2/4)
17/02/13 11:23:37 INFO PythonRunner: Times: total = 212021, boot = -27, init = 
45, finish = 212003
17/02/13 11:23:37 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 2182 
bytes 

[jira] [Updated] (SPARK-19578) Poor pyspark performance

2017-05-03 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-19578:
---
Attachment: (was: spark_shell_correct_inputsize.png)

> Poor pyspark performance
> 
>
> Key: SPARK-19578
> URL: https://issues.apache.org/jira/browse/SPARK-19578
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Web UI
>Affects Versions: 1.6.1, 1.6.2, 2.0.1
> Environment: Spark 1.6.2 Hortonworks
> Spark 2.0.1 MapR
> Spark 1.6.1 MapR
>Reporter: Artur Sukhenko
> Attachments: reproduce_log
>
>
> A simple job in pyspark takes 14 minutes to complete.
> The text file used to reproduce this contains multiple millions of lines of the 
> single word "yes" (which might be the cause of the poor performance).
> {code}
> var a = sc.textFile("/tmp/yes.txt")
> a.count()
> {code}
> Same code took 33 sec in spark-shell
> Reproduce steps:
> Run this  to generate big file (press Ctrl+C after 5-6 seconds)
> [spark@c6401 ~]$ yes > /tmp/yes.txt
> [spark@c6401 ~]$ ll /tmp/
> -rw-r--r--  1 spark hadoop  516079616 Feb 13 11:10 yes.txt
> [spark@c6401 ~]$ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/
> [spark@c6401 ~]$ pyspark
> {code}
> Python 2.6.6 (r266:84292, Feb 22 2013, 00:00:18) 
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
> 17/02/13 11:12:36 INFO SparkContext: Running Spark version 1.6.2
> {code}
> >>> a = sc.textFile("/tmp/yes.txt")
> {code}
> 17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0 stored as values in 
> memory (estimated size 341.1 KB, free 341.1 KB)
> 17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
> in memory (estimated size 28.3 KB, free 369.4 KB)
> 17/02/13 11:12:58 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
> on localhost:43389 (size: 28.3 KB, free: 517.4 MB)
> 17/02/13 11:12:58 INFO SparkContext: Created broadcast 0 from textFile at 
> NativeMethodAccessorImpl.java:-2
> {code}
> >>> a.count()
> {code}
> 17/02/13 11:13:03 INFO FileInputFormat: Total input paths to process : 1
> 17/02/13 11:13:03 INFO SparkContext: Starting job: count at :1
> 17/02/13 11:13:03 INFO DAGScheduler: Got job 0 (count at :1) with 4 
> output partitions
> 17/02/13 11:13:03 INFO DAGScheduler: Final stage: ResultStage 0 (count at 
> :1)
> 17/02/13 11:13:03 INFO DAGScheduler: Parents of final stage: List()
> 17/02/13 11:13:03 INFO DAGScheduler: Missing parents: List()
> 17/02/13 11:13:03 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] 
> at count at :1), which has no missing parents
> 17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1 stored as values in 
> memory (estimated size 5.7 KB, free 375.1 KB)
> 17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes 
> in memory (estimated size 3.5 KB, free 378.6 KB)
> 17/02/13 11:13:03 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory 
> on localhost:43389 (size: 3.5 KB, free: 517.4 MB)
> 17/02/13 11:13:03 INFO SparkContext: Created broadcast 1 from broadcast at 
> DAGScheduler.scala:1008
> 17/02/13 11:13:03 INFO DAGScheduler: Submitting 4 missing tasks from 
> ResultStage 0 (PythonRDD[2] at count at :1)
> 17/02/13 11:13:03 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
> 17/02/13 11:13:03 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 
> localhost, partition 0,ANY, 2149 bytes)
> 17/02/13 11:13:03 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
> 17/02/13 11:13:03 INFO HadoopRDD: Input split: 
> hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:0+134217728
> 17/02/13 11:13:03 INFO deprecation: mapred.tip.id is deprecated. Instead, use 
> mapreduce.task.id
> 17/02/13 11:13:03 INFO deprecation: mapred.task.id is deprecated. Instead, 
> use mapreduce.task.attempt.id
> 17/02/13 11:13:03 INFO deprecation: mapred.task.is.map is deprecated. 
> Instead, use mapreduce.task.ismap
> 17/02/13 11:13:03 INFO deprecation: mapred.task.partition is deprecated. 
> Instead, use mapreduce.task.partition
> 17/02/13 11:13:03 INFO deprecation: mapred.job.id is deprecated. Instead, use 
> mapreduce.job.id
> 17/02/13 11:16:37 INFO PythonRunner: Times: total = 213335, boot = 317, init 
> = 445, finish = 212573
> 17/02/13 11:16:37 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2182 
> bytes result sent to driver
> 17/02/13 11:16:37 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 
> localhost, partition 1,ANY, 2149 bytes)
> 17/02/13 11:16:37 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
> 17/02/13 11:16:37 INFO HadoopRDD: Input split: 
> hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:134217728+134217728
> 17/02/13 11:16:37 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) 
> in 213605 ms on localhost (1/4)
> 17/02/13 11:20:05 INFO PythonRunner: Times: total = 208227, boot = -81, init 
> = 122, finish = 208186
> 17/02/13 11:20:05 INFO Executor: 

[jira] [Updated] (SPARK-19578) Poor pyspark performance

2017-05-03 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-19578:
---
Attachment: (was: pyspark_incorrect_inputsize.png)

> Poor pyspark performance
> 
>
> Key: SPARK-19578
> URL: https://issues.apache.org/jira/browse/SPARK-19578
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Web UI
>Affects Versions: 1.6.1, 1.6.2, 2.0.1
> Environment: Spark 1.6.2 Hortonworks
> Spark 2.0.1 MapR
> Spark 1.6.1 MapR
>Reporter: Artur Sukhenko
> Attachments: reproduce_log, spark_shell_correct_inputsize.png
>
>
> A simple job in pyspark takes 14 minutes to complete.
> The text file used to reproduce this contains multiple millions of lines of the 
> single word "yes" (which might be the cause of the poor performance).
> {code}
> var a = sc.textFile("/tmp/yes.txt")
> a.count()
> {code}
> Same code took 33 sec in spark-shell
> Reproduce steps:
> Run this  to generate big file (press Ctrl+C after 5-6 seconds)
> [spark@c6401 ~]$ yes > /tmp/yes.txt
> [spark@c6401 ~]$ ll /tmp/
> -rw-r--r--  1 spark hadoop  516079616 Feb 13 11:10 yes.txt
> [spark@c6401 ~]$ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/
> [spark@c6401 ~]$ pyspark
> {code}
> Python 2.6.6 (r266:84292, Feb 22 2013, 00:00:18) 
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
> 17/02/13 11:12:36 INFO SparkContext: Running Spark version 1.6.2
> {code}
> >>> a = sc.textFile("/tmp/yes.txt")
> {code}
> 17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0 stored as values in 
> memory (estimated size 341.1 KB, free 341.1 KB)
> 17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
> in memory (estimated size 28.3 KB, free 369.4 KB)
> 17/02/13 11:12:58 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
> on localhost:43389 (size: 28.3 KB, free: 517.4 MB)
> 17/02/13 11:12:58 INFO SparkContext: Created broadcast 0 from textFile at 
> NativeMethodAccessorImpl.java:-2
> {code}
> >>> a.count()
> {code}
> 17/02/13 11:13:03 INFO FileInputFormat: Total input paths to process : 1
> 17/02/13 11:13:03 INFO SparkContext: Starting job: count at :1
> 17/02/13 11:13:03 INFO DAGScheduler: Got job 0 (count at :1) with 4 
> output partitions
> 17/02/13 11:13:03 INFO DAGScheduler: Final stage: ResultStage 0 (count at 
> :1)
> 17/02/13 11:13:03 INFO DAGScheduler: Parents of final stage: List()
> 17/02/13 11:13:03 INFO DAGScheduler: Missing parents: List()
> 17/02/13 11:13:03 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] 
> at count at :1), which has no missing parents
> 17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1 stored as values in 
> memory (estimated size 5.7 KB, free 375.1 KB)
> 17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes 
> in memory (estimated size 3.5 KB, free 378.6 KB)
> 17/02/13 11:13:03 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory 
> on localhost:43389 (size: 3.5 KB, free: 517.4 MB)
> 17/02/13 11:13:03 INFO SparkContext: Created broadcast 1 from broadcast at 
> DAGScheduler.scala:1008
> 17/02/13 11:13:03 INFO DAGScheduler: Submitting 4 missing tasks from 
> ResultStage 0 (PythonRDD[2] at count at :1)
> 17/02/13 11:13:03 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
> 17/02/13 11:13:03 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 
> localhost, partition 0,ANY, 2149 bytes)
> 17/02/13 11:13:03 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
> 17/02/13 11:13:03 INFO HadoopRDD: Input split: 
> hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:0+134217728
> 17/02/13 11:13:03 INFO deprecation: mapred.tip.id is deprecated. Instead, use 
> mapreduce.task.id
> 17/02/13 11:13:03 INFO deprecation: mapred.task.id is deprecated. Instead, 
> use mapreduce.task.attempt.id
> 17/02/13 11:13:03 INFO deprecation: mapred.task.is.map is deprecated. 
> Instead, use mapreduce.task.ismap
> 17/02/13 11:13:03 INFO deprecation: mapred.task.partition is deprecated. 
> Instead, use mapreduce.task.partition
> 17/02/13 11:13:03 INFO deprecation: mapred.job.id is deprecated. Instead, use 
> mapreduce.job.id
> 17/02/13 11:16:37 INFO PythonRunner: Times: total = 213335, boot = 317, init 
> = 445, finish = 212573
> 17/02/13 11:16:37 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2182 
> bytes result sent to driver
> 17/02/13 11:16:37 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 
> localhost, partition 1,ANY, 2149 bytes)
> 17/02/13 11:16:37 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
> 17/02/13 11:16:37 INFO HadoopRDD: Input split: 
> hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:134217728+134217728
> 17/02/13 11:16:37 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) 
> in 213605 ms on localhost (1/4)
> 17/02/13 11:20:05 INFO PythonRunner: Times: total = 208227, boot = -81, init 
> = 122, finish = 208186
> 

[jira] [Updated] (SPARK-19578) Poor pyspark performance

2017-05-03 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-19578:
---
Summary: Poor pyspark performance  (was: Poor pyspark performance + 
incorrect UI input-size metrics)

> Poor pyspark performance
> 
>
> Key: SPARK-19578
> URL: https://issues.apache.org/jira/browse/SPARK-19578
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Web UI
>Affects Versions: 1.6.1, 1.6.2, 2.0.1
> Environment: Spark 1.6.2 Hortonworks
> Spark 2.0.1 MapR
> Spark 1.6.1 MapR
>Reporter: Artur Sukhenko
> Attachments: reproduce_log, spark_shell_correct_inputsize.png
>
>
> A simple job in pyspark takes 14 minutes to complete.
> The text file used to reproduce this contains multiple millions of lines of the 
> single word "yes" (which might be the cause of the poor performance).
> {code}
> var a = sc.textFile("/tmp/yes.txt")
> a.count()
> {code}
> Same code took 33 sec in spark-shell
> Reproduce steps:
> Run this  to generate big file (press Ctrl+C after 5-6 seconds)
> [spark@c6401 ~]$ yes > /tmp/yes.txt
> [spark@c6401 ~]$ ll /tmp/
> -rw-r--r--  1 spark hadoop  516079616 Feb 13 11:10 yes.txt
> [spark@c6401 ~]$ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/
> [spark@c6401 ~]$ pyspark
> {code}
> Python 2.6.6 (r266:84292, Feb 22 2013, 00:00:18) 
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
> 17/02/13 11:12:36 INFO SparkContext: Running Spark version 1.6.2
> {code}
> >>> a = sc.textFile("/tmp/yes.txt")
> {code}
> 17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0 stored as values in 
> memory (estimated size 341.1 KB, free 341.1 KB)
> 17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
> in memory (estimated size 28.3 KB, free 369.4 KB)
> 17/02/13 11:12:58 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
> on localhost:43389 (size: 28.3 KB, free: 517.4 MB)
> 17/02/13 11:12:58 INFO SparkContext: Created broadcast 0 from textFile at 
> NativeMethodAccessorImpl.java:-2
> {code}
> >>> a.count()
> {code}
> 17/02/13 11:13:03 INFO FileInputFormat: Total input paths to process : 1
> 17/02/13 11:13:03 INFO SparkContext: Starting job: count at :1
> 17/02/13 11:13:03 INFO DAGScheduler: Got job 0 (count at :1) with 4 
> output partitions
> 17/02/13 11:13:03 INFO DAGScheduler: Final stage: ResultStage 0 (count at 
> :1)
> 17/02/13 11:13:03 INFO DAGScheduler: Parents of final stage: List()
> 17/02/13 11:13:03 INFO DAGScheduler: Missing parents: List()
> 17/02/13 11:13:03 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] 
> at count at :1), which has no missing parents
> 17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1 stored as values in 
> memory (estimated size 5.7 KB, free 375.1 KB)
> 17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes 
> in memory (estimated size 3.5 KB, free 378.6 KB)
> 17/02/13 11:13:03 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory 
> on localhost:43389 (size: 3.5 KB, free: 517.4 MB)
> 17/02/13 11:13:03 INFO SparkContext: Created broadcast 1 from broadcast at 
> DAGScheduler.scala:1008
> 17/02/13 11:13:03 INFO DAGScheduler: Submitting 4 missing tasks from 
> ResultStage 0 (PythonRDD[2] at count at :1)
> 17/02/13 11:13:03 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
> 17/02/13 11:13:03 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 
> localhost, partition 0,ANY, 2149 bytes)
> 17/02/13 11:13:03 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
> 17/02/13 11:13:03 INFO HadoopRDD: Input split: 
> hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:0+134217728
> 17/02/13 11:13:03 INFO deprecation: mapred.tip.id is deprecated. Instead, use 
> mapreduce.task.id
> 17/02/13 11:13:03 INFO deprecation: mapred.task.id is deprecated. Instead, 
> use mapreduce.task.attempt.id
> 17/02/13 11:13:03 INFO deprecation: mapred.task.is.map is deprecated. 
> Instead, use mapreduce.task.ismap
> 17/02/13 11:13:03 INFO deprecation: mapred.task.partition is deprecated. 
> Instead, use mapreduce.task.partition
> 17/02/13 11:13:03 INFO deprecation: mapred.job.id is deprecated. Instead, use 
> mapreduce.job.id
> 17/02/13 11:16:37 INFO PythonRunner: Times: total = 213335, boot = 317, init 
> = 445, finish = 212573
> 17/02/13 11:16:37 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2182 
> bytes result sent to driver
> 17/02/13 11:16:37 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 
> localhost, partition 1,ANY, 2149 bytes)
> 17/02/13 11:16:37 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
> 17/02/13 11:16:37 INFO HadoopRDD: Input split: 
> hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:134217728+134217728
> 17/02/13 11:16:37 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) 
> in 213605 ms on localhost (1/4)
> 17/02/13 11:20:05 INFO PythonRunner: Times: total = 208227, 

[jira] [Resolved] (SPARK-20308) org.apache.spark.shuffle.FetchFailedException: Too large frame

2017-04-12 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko resolved SPARK-20308.

Resolution: Duplicate

> org.apache.spark.shuffle.FetchFailedException: Too large frame
> --
>
> Key: SPARK-20308
> URL: https://issues.apache.org/jira/browse/SPARK-20308
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 2.1.0
>Reporter: Stanislav Chernichkin
>
> Spark uses a custom frame decoder (TransportFrameDecoder) which does not 
> support frames larger than 2 GB. This leads to failures when shuffling with 
> large partitions.
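A common workaround for this symptom (not the fix tracked in this ticket, and shown only as a hedged sketch with illustrative names and values) is to raise the shuffle partition count so that no single shuffle block approaches the 2 GB frame limit:
{code}
// Sketch only: more shuffle partitions => smaller per-partition shuffle blocks,
// which keeps each frame under the 2 GB TransportFrameDecoder limit.
// The app name, column name, and the value 2000 are illustrative assumptions.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("too-large-frame-workaround")
  .config("spark.sql.shuffle.partitions", "2000")  // default is 200
  .getOrCreate()

val df = spark.range(0L, 100000000L).toDF("id")
// A wide shuffle; with enough partitions no single shuffle block gets near 2 GB.
df.groupBy((df("id") % 1000).as("bucket")).count().show(5)
{code}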



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-20244) Incorrect input size in UI with pyspark

2017-04-06 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-20244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-20244:
---
Attachment: pyspark_incorrect_inputsize.png
sparkshell_correct_inputsize.png

Spark-shell and pyspark UI screenshots.

> Incorrect input size in UI with pyspark
> ---
>
> Key: SPARK-20244
> URL: https://issues.apache.org/jira/browse/SPARK-20244
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 2.0.0, 2.1.0
>Reporter: Artur Sukhenko
>Priority: Minor
> Attachments: pyspark_incorrect_inputsize.png, 
> sparkshell_correct_inputsize.png
>
>
> In the Spark UI (Details for Stage) the Input Size is 64.0 KB when running in 
> PySparkShell. 
> It is also incorrect in the Tasks table:
> 64.0 KB / 132120575 in pyspark
> 252.0 MB / 132120575 in spark-shell
> I will attach screenshots.
> Reproduce steps:
> Run this to generate a big file (press Ctrl+C after 5-6 seconds):
> $ yes > /tmp/yes.txt
> $ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/
> $ ./bin/pyspark
> {code}
> Python 2.7.5 (default, Nov  6 2016, 00:28:07) 
> [GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> Setting default log level to "WARN".
> To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
> setLogLevel(newLevel).
> Welcome to
>     __
>  / __/__  ___ _/ /__
> _\ \/ _ \/ _ `/ __/  '_/
>/__ / .__/\_,_/_/ /_/\_\   version 2.1.0
>   /_/
> Using Python version 2.7.5 (default, Nov  6 2016 00:28:07)
> SparkSession available as 'spark'.{code}
> >>> a = sc.textFile("/tmp/yes.txt")
> >>> a.count()
> Open Spark UI and check Stage 0.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-20244) Incorrect input size in UI with pyspark

2017-04-06 Thread Artur Sukhenko (JIRA)
Artur Sukhenko created SPARK-20244:
--

 Summary: Incorrect input size in UI with pyspark
 Key: SPARK-20244
 URL: https://issues.apache.org/jira/browse/SPARK-20244
 Project: Spark
  Issue Type: Bug
  Components: Web UI
Affects Versions: 2.1.0, 2.0.0
Reporter: Artur Sukhenko
Priority: Minor


In the Spark UI (Details for Stage) the Input Size is 64.0 KB when running in 
PySparkShell. 
It is also incorrect in the Tasks table:
64.0 KB / 132120575 in pyspark
252.0 MB / 132120575 in spark-shell

I will attach screenshots.

Reproduce steps:
Run this to generate a big file (press Ctrl+C after 5-6 seconds):
$ yes > /tmp/yes.txt
$ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/
$ ./bin/pyspark
{code}
Python 2.7.5 (default, Nov  6 2016, 00:28:07) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use 
setLogLevel(newLevel).
Welcome to
    __
 / __/__  ___ _/ /__
_\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.0
  /_/

Using Python version 2.7.5 (default, Nov  6 2016 00:28:07)
SparkSession available as 'spark'.{code}
>>> a = sc.textFile("/tmp/yes.txt")
>>> a.count()


Open Spark UI and check Stage 0.
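For the spark-shell side of the comparison (the run that shows 252.0 MB), the presumably equivalent Scala commands are:
{code}
// spark-shell counterpart of the pyspark steps above (assumed, not quoted from
// this report); Stage 0 in the Spark UI should show the full ~252 MB input size.
val a = sc.textFile("/tmp/yes.txt")
a.count()
{code}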



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-19068) Large number of executors causing a ton of ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(41,W

2017-04-06 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko resolved SPARK-19068.

   Resolution: Duplicate
Fix Version/s: 2.2.0

Closing as duplicate of SPARK-19146. 
Messages like:
{code}ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! 
Dropping event SparkListenerExecutorMetricsUpdate(136,WrappedArray()){code}
are the result of the problem discussed in SPARK-19146.

> Large number of executors causing a ton of ERROR scheduler.LiveListenerBus: 
> SparkListenerBus has already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(41,WrappedArray())
> --
>
> Key: SPARK-19068
> URL: https://issues.apache.org/jira/browse/SPARK-19068
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.1, 2.1.0
> Environment: RHEL 7.2
>Reporter: JESSE CHEN
> Fix For: 2.2.0
>
> Attachments: sparklog.tar.gz
>
>
> On a large cluster with 45TB RAM and 1,000 cores, we used 1008 executors in 
> order to use all RAM and cores for a 100TB Spark SQL workload. Long-running 
> queries tend to report the following ERRORs
> {noformat}
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(136,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(853,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(395,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(736,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(439,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(16,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(307,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(51,WrappedArray())
> 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(535,WrappedArray())
> 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(63,WrappedArray())
> 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(333,WrappedArray())
> 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(484,WrappedArray())
> (omitted) 
> {noformat}
> The message itself may be a reasonable response to an already stopped 
> SparkListenerBus (so subsequent events are thrown away with that ERROR 
> message). The issue is that SparkContext does NOT exit until all of these 
> ERROR/events are reported, which is a huge number in our setup -- and 
> this can take, in some cases, hours.
> We tried increasing the queue size (Adding default property: 
> spark.scheduler.listenerbus.eventqueue.size=13) from the 10K default, 
> but this still occurs. 
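The queue-size property quoted above is an ordinary Spark configuration entry. A minimal, hedged sketch of setting it programmatically (illustrative app name and value; as the report notes, a larger queue alone did not stop the dropped-event errors):
{code}
// Sketch only: raise the listener bus event queue from its 10K default.
// Property name as used in the report above; later Spark versions rename it.
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val conf = new SparkConf()
  .setAppName("listener-bus-queue-demo")  // illustrative name
  .set("spark.scheduler.listenerbus.eventqueue.size", "100000")

val spark = SparkSession.builder().config(conf).getOrCreate()
{code}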



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-19068) Large number of executors causing a ton of ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(41,Wr

2017-04-05 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-19068:
---
Affects Version/s: 2.0.1

> Large number of executors causing a ton of ERROR scheduler.LiveListenerBus: 
> SparkListenerBus has already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(41,WrappedArray())
> --
>
> Key: SPARK-19068
> URL: https://issues.apache.org/jira/browse/SPARK-19068
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 2.0.1, 2.1.0
> Environment: RHEL 7.2
>Reporter: JESSE CHEN
> Attachments: sparklog.tar.gz
>
>
> On a large cluster with 45TB RAM and 1,000 cores, we used 1008 executors in 
> order to use all RAM and cores for a 100TB Spark SQL workload. Long-running 
> queries tend to report the following ERRORs
> {noformat}
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(136,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(853,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(395,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(736,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(439,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(16,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(307,WrappedArray())
> 16/12/27 12:44:28 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(51,WrappedArray())
> 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(535,WrappedArray())
> 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(63,WrappedArray())
> 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(333,WrappedArray())
> 16/12/27 12:44:29 ERROR scheduler.LiveListenerBus: SparkListenerBus has 
> already stopped! Dropping event 
> SparkListenerExecutorMetricsUpdate(484,WrappedArray())
> (omitted) 
> {noformat}
> The message itself may be a reasonable response to an already stopped 
> SparkListenerBus (so subsequent events are thrown away with that ERROR 
> message). The issue is that SparkContext does NOT exit until all of these 
> ERROR/events are reported, which is a huge number in our setup -- and 
> this can take, in some cases, hours.
> We tried increasing the queue size (Adding default property: 
> spark.scheduler.listenerbus.eventqueue.size=13) from the 10K default, 
> but this still occurs. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19068) Large number of executors causing a ton of ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(41,

2017-04-05 Thread Artur Sukhenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957258#comment-15957258
 ] 

Artur Sukhenko commented on SPARK-19068:


I am having a similar problem.
Reproduce: 
{panel}[spark-2.1.0-bin-without-hadoop]$ ./bin/run-example --master yarn 
--deploy-mode client --num-executors 4 SparkPi 100{panel}
I took a jstack of the driver process and found this thread:
{code}"SparkListenerBus" #10 daemon prio=5 os_prio=0 tid=0x7fc8fdc8c800 
nid=0x37e7 runnable [0x7fc838764000]
   java.lang.Thread.State: RUNNABLE
at scala.collection.mutable.HashTable$class.resize(HashTable.scala:262)
at 
scala.collection.mutable.HashTable$class.scala$collection$mutable$HashTable$$addEntry0(HashTable.scala:154)
at 
scala.collection.mutable.HashTable$class.findOrAddEntry(HashTable.scala:166)
at 
scala.collection.mutable.LinkedHashMap.findOrAddEntry(LinkedHashMap.scala:49)
at scala.collection.mutable.LinkedHashMap.put(LinkedHashMap.scala:71)
at 
scala.collection.mutable.LinkedHashMap.$plus$eq(LinkedHashMap.scala:89)
at 
scala.collection.mutable.LinkedHashMap.$plus$eq(LinkedHashMap.scala:49)
at 
scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:59)
at 
scala.collection.generic.Growable$$anonfun$$plus$plus$eq$1.apply(Growable.scala:59)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at 
scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:59)
at scala.collection.mutable.AbstractMap.$plus$plus$eq(Map.scala:80)
at scala.collection.IterableLike$class.drop(IterableLike.scala:152)
at scala.collection.AbstractIterable.drop(Iterable.scala:54)
at 
org.apache.spark.ui.jobs.JobProgressListener.onTaskEnd(JobProgressListener.scala:412)
- locked <0x800b9db8> (a 
org.apache.spark.ui.jobs.JobProgressListener)
at 
org.apache.spark.scheduler.SparkListenerBus$class.doPostEvent(SparkListenerBus.scala:45)
at 
org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at 
org.apache.spark.scheduler.LiveListenerBus.doPostEvent(LiveListenerBus.scala:36)
at 
org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:63)
at 
org.apache.spark.scheduler.LiveListenerBus.postToAll(LiveListenerBus.scala:36)
at 
org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(LiveListenerBus.scala:94)
at 
org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at 
org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1$$anonfun$apply$mcV$sp$1.apply(LiveListenerBus.scala:79)
at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
at 
org.apache.spark.scheduler.LiveListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(LiveListenerBus.scala:78)
at org.apache.spark.util.Utils$.tryOrStopSparkContext(Utils.scala:1245)
at 
org.apache.spark.scheduler.LiveListenerBus$$anon$1.run(LiveListenerBus.scala:77)
{code}

At 100k+ tasks the job becomes very slow and I am getting the following errors:
{code}
[Stage 0:=>  (108000 + 4) / 
100]17/04/06 01:58:49 WARN LiveListenerBus: Dropped 182143 
SparkListenerEvents since Thu Apr 06 01:57:49 JST 2017
[Stage 0:=>  (109588 + 4) / 
100]17/04/06 01:59:49 WARN LiveListenerBus: Dropped 196647 
SparkListenerEvents since Thu Apr 06 01:58:49 JST 2017
[Stage 0:=>  (111241 + 5) / 100]
{code}
After some time we get this:
{code}
rBus: SparkListenerBus has already stopped! Dropping event 
SparkListenerExecutorMetricsUpdate(2,WrappedArray())
[Stage 0:==> (126782 + 4) / 
100]17/04/06 02:12:28 ERROR LiveListenerBus: SparkListenerBus has already 
stopped! Dropping event SparkListenerExecutorMetricsUpdate(3,WrappedArray())
[Stage 0:==> (126919 + 5) / 
100]17/04/06 02:12:31 ERROR LiveListenerBus: SparkListenerBus has already 
stopped! Dropping event SparkListenerExecutorMetricsUpdate(4,WrappedArray())
[Stage 0:==> (126982 + 4) / 
100]17/04/06 02:12:32 ERROR LiveListenerBus: SparkListenerBus has already 
stopped! Dropping event SparkListenerExecutorMetricsUpdate(1,WrappedArray())
[Stage 0:==> (127030 + 5) / 
100]17/04/06 02:12:33 ERROR LiveListenerBus: SparkListenerBus has already 
stopped! Dropping event SparkListenerExecutorMetricsUpdate(2,WrappedArray())
[Stage 0:==> 

[jira] [Updated] (SPARK-19578) Poor pyspark performance + incorrect UI input-size metrics

2017-02-13 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-19578:
---
Description: 
A simple job in pyspark takes 14 minutes to complete.
The text file used to reproduce contains multiple millions of lines of the 
single word "yes" (which might be the cause of the poor performance).
{code}
a = sc.textFile("/tmp/yes.txt")
a.count()
{code}
The equivalent code took 33 sec in spark-shell.
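The spark-shell run referenced above (the 33-second one) corresponds to the following Scala, shown here only for side-by-side clarity:
{code}
// Scala/spark-shell version of the two-line pyspark job; the intended difference
// from the pyspark run is only the language binding.
val a = sc.textFile("/tmp/yes.txt")
a.count()
{code}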

Reproduce steps:
Run this to generate a big file (press Ctrl+C after 5-6 seconds):
[spark@c6401 ~]$ yes > /tmp/yes.txt

[spark@c6401 ~]$ ll /tmp/
-rw-r--r--  1 spark hadoop  516079616 Feb 13 11:10 yes.txt
[spark@c6401 ~]$ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/
[spark@c6401 ~]$ pyspark
{code}
Python 2.6.6 (r266:84292, Feb 22 2013, 00:00:18) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
17/02/13 11:12:36 INFO SparkContext: Running Spark version 1.6.2
{code}
>>> a = sc.textFile("/tmp/yes.txt")
{code}
17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0 stored as values in 
memory (estimated size 341.1 KB, free 341.1 KB)
17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in 
memory (estimated size 28.3 KB, free 369.4 KB)
17/02/13 11:12:58 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 
localhost:43389 (size: 28.3 KB, free: 517.4 MB)
17/02/13 11:12:58 INFO SparkContext: Created broadcast 0 from textFile at 
NativeMethodAccessorImpl.java:-2
{code}
>>> a.count()
{code}
17/02/13 11:13:03 INFO FileInputFormat: Total input paths to process : 1
17/02/13 11:13:03 INFO SparkContext: Starting job: count at :1
17/02/13 11:13:03 INFO DAGScheduler: Got job 0 (count at :1) with 4 
output partitions
17/02/13 11:13:03 INFO DAGScheduler: Final stage: ResultStage 0 (count at 
:1)
17/02/13 11:13:03 INFO DAGScheduler: Parents of final stage: List()
17/02/13 11:13:03 INFO DAGScheduler: Missing parents: List()
17/02/13 11:13:03 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] at 
count at :1), which has no missing parents
17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1 stored as values in 
memory (estimated size 5.7 KB, free 375.1 KB)
17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in 
memory (estimated size 3.5 KB, free 378.6 KB)
17/02/13 11:13:03 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 
localhost:43389 (size: 3.5 KB, free: 517.4 MB)
17/02/13 11:13:03 INFO SparkContext: Created broadcast 1 from broadcast at 
DAGScheduler.scala:1008
17/02/13 11:13:03 INFO DAGScheduler: Submitting 4 missing tasks from 
ResultStage 0 (PythonRDD[2] at count at :1)
17/02/13 11:13:03 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
17/02/13 11:13:03 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 
localhost, partition 0,ANY, 2149 bytes)
17/02/13 11:13:03 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/02/13 11:13:03 INFO HadoopRDD: Input split: 
hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:0+134217728
17/02/13 11:13:03 INFO deprecation: mapred.tip.id is deprecated. Instead, use 
mapreduce.task.id
17/02/13 11:13:03 INFO deprecation: mapred.task.id is deprecated. Instead, use 
mapreduce.task.attempt.id
17/02/13 11:13:03 INFO deprecation: mapred.task.is.map is deprecated. Instead, 
use mapreduce.task.ismap
17/02/13 11:13:03 INFO deprecation: mapred.task.partition is deprecated. 
Instead, use mapreduce.task.partition
17/02/13 11:13:03 INFO deprecation: mapred.job.id is deprecated. Instead, use 
mapreduce.job.id

17/02/13 11:16:37 INFO PythonRunner: Times: total = 213335, boot = 317, init = 
445, finish = 212573
17/02/13 11:16:37 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2182 
bytes result sent to driver
17/02/13 11:16:37 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 
localhost, partition 1,ANY, 2149 bytes)
17/02/13 11:16:37 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
17/02/13 11:16:37 INFO HadoopRDD: Input split: 
hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:134217728+134217728
17/02/13 11:16:37 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) 
in 213605 ms on localhost (1/4)

17/02/13 11:20:05 INFO PythonRunner: Times: total = 208227, boot = -81, init = 
122, finish = 208186
17/02/13 11:20:05 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 2182 
bytes result sent to driver
17/02/13 11:20:05 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, 
localhost, partition 2,ANY, 2149 bytes)
17/02/13 11:20:05 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
17/02/13 11:20:05 INFO HadoopRDD: Input split: 
hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:268435456+134217728
17/02/13 11:20:05 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) 
in 208302 ms on localhost (2/4)
17/02/13 11:23:37 INFO PythonRunner: Times: total = 212021, boot = -27, init = 
45, finish = 212003
17/02/13 11:23:37 INFO Executor: Finished task 2.0 in stage 0.0 (TID 2). 2182 
bytes 

[jira] [Updated] (SPARK-19578) Poor pyspark performance + incorrect UI input-size metrics

2017-02-13 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-19578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-19578:
---
Attachment: reproduce_log
spark_shell_correct_inputsize.png
pyspark_incorrect_inputsize.png

> Poor pyspark performance + incorrect UI input-size metrics
> --
>
> Key: SPARK-19578
> URL: https://issues.apache.org/jira/browse/SPARK-19578
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Web UI
>Affects Versions: 1.6.1, 1.6.2, 2.0.1
> Environment: Spark 1.6.2 Hortonworks
> Spark 2.0.1 MapR
> Spark 1.6.1 MapR
>Reporter: Artur Sukhenko
> Attachments: pyspark_incorrect_inputsize.png, reproduce_log, 
> spark_shell_correct_inputsize.png
>
>
> A simple job in pyspark takes 14 minutes to complete.
> {code}
> a = sc.textFile("/tmp/yes.txt")
> a.count()
> {code}
> The equivalent code took 33 sec in spark-shell.
> Reproduce steps:
> Run this to generate a big file (press Ctrl+C after 5-6 seconds):
> [spark@c6401 ~]$ yes > /tmp/yes.txt
> [spark@c6401 ~]$ ll /tmp/
> -rw-r--r--  1 spark hadoop  516079616 Feb 13 11:10 yes.txt
> [spark@c6401 ~]$ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/
> [spark@c6401 ~]$ pyspark
> {code}
> Python 2.6.6 (r266:84292, Feb 22 2013, 00:00:18) 
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
> 17/02/13 11:12:36 INFO SparkContext: Running Spark version 1.6.2
> {code}
> >>> a = sc.textFile("/tmp/yes.txt")
> {code}
> 17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0 stored as values in 
> memory (estimated size 341.1 KB, free 341.1 KB)
> 17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes 
> in memory (estimated size 28.3 KB, free 369.4 KB)
> 17/02/13 11:12:58 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory 
> on localhost:43389 (size: 28.3 KB, free: 517.4 MB)
> 17/02/13 11:12:58 INFO SparkContext: Created broadcast 0 from textFile at 
> NativeMethodAccessorImpl.java:-2
> {code}
> >>> a.count()
> {code}
> 17/02/13 11:13:03 INFO FileInputFormat: Total input paths to process : 1
> 17/02/13 11:13:03 INFO SparkContext: Starting job: count at :1
> 17/02/13 11:13:03 INFO DAGScheduler: Got job 0 (count at :1) with 4 
> output partitions
> 17/02/13 11:13:03 INFO DAGScheduler: Final stage: ResultStage 0 (count at 
> :1)
> 17/02/13 11:13:03 INFO DAGScheduler: Parents of final stage: List()
> 17/02/13 11:13:03 INFO DAGScheduler: Missing parents: List()
> 17/02/13 11:13:03 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] 
> at count at :1), which has no missing parents
> 17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1 stored as values in 
> memory (estimated size 5.7 KB, free 375.1 KB)
> 17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes 
> in memory (estimated size 3.5 KB, free 378.6 KB)
> 17/02/13 11:13:03 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory 
> on localhost:43389 (size: 3.5 KB, free: 517.4 MB)
> 17/02/13 11:13:03 INFO SparkContext: Created broadcast 1 from broadcast at 
> DAGScheduler.scala:1008
> 17/02/13 11:13:03 INFO DAGScheduler: Submitting 4 missing tasks from 
> ResultStage 0 (PythonRDD[2] at count at :1)
> 17/02/13 11:13:03 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
> 17/02/13 11:13:03 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 
> localhost, partition 0,ANY, 2149 bytes)
> 17/02/13 11:13:03 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
> 17/02/13 11:13:03 INFO HadoopRDD: Input split: 
> hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:0+134217728
> 17/02/13 11:13:03 INFO deprecation: mapred.tip.id is deprecated. Instead, use 
> mapreduce.task.id
> 17/02/13 11:13:03 INFO deprecation: mapred.task.id is deprecated. Instead, 
> use mapreduce.task.attempt.id
> 17/02/13 11:13:03 INFO deprecation: mapred.task.is.map is deprecated. 
> Instead, use mapreduce.task.ismap
> 17/02/13 11:13:03 INFO deprecation: mapred.task.partition is deprecated. 
> Instead, use mapreduce.task.partition
> 17/02/13 11:13:03 INFO deprecation: mapred.job.id is deprecated. Instead, use 
> mapreduce.job.id
> 17/02/13 11:16:37 INFO PythonRunner: Times: total = 213335, boot = 317, init 
> = 445, finish = 212573
> 17/02/13 11:16:37 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2182 
> bytes result sent to driver
> 17/02/13 11:16:37 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 
> localhost, partition 1,ANY, 2149 bytes)
> 17/02/13 11:16:37 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
> 17/02/13 11:16:37 INFO HadoopRDD: Input split: 
> hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:134217728+134217728
> 17/02/13 11:16:37 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) 
> in 213605 ms on localhost (1/4)
> 17/02/13 11:20:05 INFO PythonRunner: Times: total = 208227, boot = -81, 

[jira] [Created] (SPARK-19578) Poor pyspark performance + incorrect UI input-size metrics

2017-02-13 Thread Artur Sukhenko (JIRA)
Artur Sukhenko created SPARK-19578:
--

 Summary: Poor pyspark performance + incorrect UI input-size metrics
 Key: SPARK-19578
 URL: https://issues.apache.org/jira/browse/SPARK-19578
 Project: Spark
  Issue Type: Bug
  Components: PySpark, Web UI
Affects Versions: 2.0.1, 1.6.2, 1.6.1
 Environment: Spark 1.6.2 Hortonworks
Spark 2.0.1 MapR
Spark 1.6.1 MapR
Reporter: Artur Sukhenko


A simple job in pyspark takes 14 minutes to complete.
{code}
a = sc.textFile("/tmp/yes.txt")
a.count()
{code}
The equivalent code took 33 sec in spark-shell.

Reproduce steps:
Run this to generate a big file (press Ctrl+C after 5-6 seconds):
[spark@c6401 ~]$ yes > /tmp/yes.txt

[spark@c6401 ~]$ ll /tmp/
-rw-r--r--  1 spark hadoop  516079616 Feb 13 11:10 yes.txt
[spark@c6401 ~]$ hadoop fs -copyFromLocal /tmp/yes.txt /tmp/
[spark@c6401 ~]$ pyspark
{code}
Python 2.6.6 (r266:84292, Feb 22 2013, 00:00:18) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-3)] on linux2
17/02/13 11:12:36 INFO SparkContext: Running Spark version 1.6.2
{code}
>>> a = sc.textFile("/tmp/yes.txt")
{code}
17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0 stored as values in 
memory (estimated size 341.1 KB, free 341.1 KB)
17/02/13 11:12:58 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in 
memory (estimated size 28.3 KB, free 369.4 KB)
17/02/13 11:12:58 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 
localhost:43389 (size: 28.3 KB, free: 517.4 MB)
17/02/13 11:12:58 INFO SparkContext: Created broadcast 0 from textFile at 
NativeMethodAccessorImpl.java:-2
{code}
>>> a.count()
{code}
17/02/13 11:13:03 INFO FileInputFormat: Total input paths to process : 1
17/02/13 11:13:03 INFO SparkContext: Starting job: count at :1
17/02/13 11:13:03 INFO DAGScheduler: Got job 0 (count at :1) with 4 
output partitions
17/02/13 11:13:03 INFO DAGScheduler: Final stage: ResultStage 0 (count at 
:1)
17/02/13 11:13:03 INFO DAGScheduler: Parents of final stage: List()
17/02/13 11:13:03 INFO DAGScheduler: Missing parents: List()
17/02/13 11:13:03 INFO DAGScheduler: Submitting ResultStage 0 (PythonRDD[2] at 
count at :1), which has no missing parents
17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1 stored as values in 
memory (estimated size 5.7 KB, free 375.1 KB)
17/02/13 11:13:03 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in 
memory (estimated size 3.5 KB, free 378.6 KB)
17/02/13 11:13:03 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 
localhost:43389 (size: 3.5 KB, free: 517.4 MB)
17/02/13 11:13:03 INFO SparkContext: Created broadcast 1 from broadcast at 
DAGScheduler.scala:1008
17/02/13 11:13:03 INFO DAGScheduler: Submitting 4 missing tasks from 
ResultStage 0 (PythonRDD[2] at count at :1)
17/02/13 11:13:03 INFO TaskSchedulerImpl: Adding task set 0.0 with 4 tasks
17/02/13 11:13:03 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 
localhost, partition 0,ANY, 2149 bytes)
17/02/13 11:13:03 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
17/02/13 11:13:03 INFO HadoopRDD: Input split: 
hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:0+134217728
17/02/13 11:13:03 INFO deprecation: mapred.tip.id is deprecated. Instead, use 
mapreduce.task.id
17/02/13 11:13:03 INFO deprecation: mapred.task.id is deprecated. Instead, use 
mapreduce.task.attempt.id
17/02/13 11:13:03 INFO deprecation: mapred.task.is.map is deprecated. Instead, 
use mapreduce.task.ismap
17/02/13 11:13:03 INFO deprecation: mapred.task.partition is deprecated. 
Instead, use mapreduce.task.partition
17/02/13 11:13:03 INFO deprecation: mapred.job.id is deprecated. Instead, use 
mapreduce.job.id

17/02/13 11:16:37 INFO PythonRunner: Times: total = 213335, boot = 317, init = 
445, finish = 212573
17/02/13 11:16:37 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 2182 
bytes result sent to driver
17/02/13 11:16:37 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, 
localhost, partition 1,ANY, 2149 bytes)
17/02/13 11:16:37 INFO Executor: Running task 1.0 in stage 0.0 (TID 1)
17/02/13 11:16:37 INFO HadoopRDD: Input split: 
hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:134217728+134217728
17/02/13 11:16:37 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) 
in 213605 ms on localhost (1/4)

17/02/13 11:20:05 INFO PythonRunner: Times: total = 208227, boot = -81, init = 
122, finish = 208186
17/02/13 11:20:05 INFO Executor: Finished task 1.0 in stage 0.0 (TID 1). 2182 
bytes result sent to driver
17/02/13 11:20:05 INFO TaskSetManager: Starting task 2.0 in stage 0.0 (TID 2, 
localhost, partition 2,ANY, 2149 bytes)
17/02/13 11:20:05 INFO Executor: Running task 2.0 in stage 0.0 (TID 2)
17/02/13 11:20:05 INFO HadoopRDD: Input split: 
hdfs://c6401.ambari.apache.org:8020/tmp/yes.txt:268435456+134217728
17/02/13 11:20:05 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) 
in 208302 ms on localhost (2/4)
17/02/13 11:23:37 

[jira] [Updated] (SPARK-4105) FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle

2016-10-11 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-4105:
--
Affects Version/s: 2.0.0

> FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based 
> shuffle
> -
>
> Key: SPARK-4105
> URL: https://issues.apache.org/jira/browse/SPARK-4105
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.0, 1.2.1, 1.3.0, 1.4.1, 1.5.1, 1.6.1, 2.0.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Critical
> Attachments: JavaObjectToSerialize.java, 
> SparkFailedToUncompressGenerator.scala
>
>
> We have seen non-deterministic {{FAILED_TO_UNCOMPRESS(5)}} errors during 
> shuffle read.  Here's a sample stacktrace from an executor:
> {code}
> 14/10/23 18:34:11 ERROR Executor: Exception in task 1747.3 in stage 11.0 (TID 
> 33053)
> java.io.IOException: FAILED_TO_UNCOMPRESS(5)
>   at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
>   at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
>   at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
>   at org.xerial.snappy.Snappy.uncompress(Snappy.java:427)
>   at 
> org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:127)
>   at 
> org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88)
>   at org.xerial.snappy.SnappyInputStream.(SnappyInputStream.java:58)
>   at 
> org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128)
>   at 
> org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1090)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:116)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:115)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at 
> org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:129)
>   at 
> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)
>   at 
> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at 
> org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at 
> org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Here's another occurrence of a similar error:
> {code}
> 

[jira] [Commented] (SPARK-17340) .sparkStaging not cleaned if application exited incorrectly

2016-09-07 Thread Artur Sukhenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15470574#comment-15470574
 ] 

Artur Sukhenko commented on SPARK-17340:


Agree with [~jerryshao].

> .sparkStaging not cleaned if application exited incorrectly
> ---
>
> Key: SPARK-17340
> URL: https://issues.apache.org/jira/browse/SPARK-17340
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.2, 1.6.1, 2.0.0
>Reporter: Artur Sukhenko
>Priority: Minor
>
> When running Spark (yarn, cluster mode) and killing the application,
> .sparkStaging is not cleaned up.
> Reproduce:
> 1. Run the SparkPi job in yarn cluster mode.
> 2. Wait for the app to switch to RUNNING and press Ctrl+C.
> 3. Kill the app: $ yarn application -kill 
> 4. Verify that it is not cleaned up: $ hadoop fs -ls .sparkStaging
> Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error.
> All of these apps are already finished/killed, but 
> sparkStaging/application_ remains:
> {code}
> $ hadoop fs -ls .sparkStaging
> Found 6 items
> drwx--   - user user  3 2016-08-26 00:57 
> .sparkStaging/application_1472140614688_0001
> drwx--   - user user  3 2016-08-26 01:09 
> .sparkStaging/application_1472140614688_0002
> drwx--   - user user  3 2016-08-26 19:51 
> .sparkStaging/application_1472140614688_0005
> drwx--   - user user  3 2016-08-26 19:53 
> .sparkStaging/application_1472140614688_0007
> drwx--   - user user  3 2016-08-31 22:43 
> .sparkStaging/application_1472634296300_0011
> drwx--   - user user  3 2016-08-31 23:30 
> .sparkStaging/application_1472651370711_0006
> {code}
> {code}
> ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
> 16/08/26 00:51:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 506.0 in stage 
> 0.0 (TID 506, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 507.0 in stage 
> 0.0 (TID 507, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 504.0 in stage 
> 0.0 (TID 504) in 14 ms on node1 (505/1000)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 505.0 in stage 
> 0.0 (TID 505) in 14 ms on node1 (506/1000)
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/metrics/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/api,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/static,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/jobs/job/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> 

[jira] [Comment Edited] (SPARK-17340) .sparkStaging not cleaned if application exited incorrectly

2016-09-07 Thread Artur Sukhenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15470514#comment-15470514
 ] 

Artur Sukhenko edited comment on SPARK-17340 at 9/7/16 12:57 PM:
-

I thought you could do 
{code}
$ yarn application -list -appTypes SPARK -appStates FINISHED,FAILED,KILLED
{code}
to check whether it is still in use.

But there is the spark.yarn.preserve.staging.files property, which will keep the 
staging folder in HDFS; if it is enabled, you probably don't want the files to be 
cleaned up.
 



was (Author: asukhenko):
I thought you can do 
{code}
$yarn application -list -appTypes SPARK -appStates FINISHED,FAILED,KILLED
{code}
to check if it used.

But there is spark.yarn.preserve.staging.files property which will make staging 
folder stay in hfds, if it is enabled - you probably don't want them to be 
cleaned.
 


> .sparkStaging not cleaned if application exited incorrectly
> ---
>
> Key: SPARK-17340
> URL: https://issues.apache.org/jira/browse/SPARK-17340
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.2, 1.6.1, 2.0.0
>Reporter: Artur Sukhenko
>Priority: Minor
>
> When running Spark (yarn, cluster mode) and killing the application,
> .sparkStaging is not cleaned up.
> Reproduce:
> 1. Run the SparkPi job in yarn cluster mode.
> 2. Wait for the app to switch to RUNNING and press Ctrl+C.
> 3. Kill the app: $ yarn application -kill 
> 4. Verify that it is not cleaned up: $ hadoop fs -ls .sparkStaging
> Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error.
> All of these apps are already finished/killed, but 
> sparkStaging/application_ remains:
> {code}
> $ hadoop fs -ls .sparkStaging
> Found 6 items
> drwx--   - user user  3 2016-08-26 00:57 
> .sparkStaging/application_1472140614688_0001
> drwx--   - user user  3 2016-08-26 01:09 
> .sparkStaging/application_1472140614688_0002
> drwx--   - user user  3 2016-08-26 19:51 
> .sparkStaging/application_1472140614688_0005
> drwx--   - user user  3 2016-08-26 19:53 
> .sparkStaging/application_1472140614688_0007
> drwx--   - user user  3 2016-08-31 22:43 
> .sparkStaging/application_1472634296300_0011
> drwx--   - user user  3 2016-08-31 23:30 
> .sparkStaging/application_1472651370711_0006
> {code}
> {code}
> ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
> 16/08/26 00:51:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 506.0 in stage 
> 0.0 (TID 506, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 507.0 in stage 
> 0.0 (TID 507, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 504.0 in stage 
> 0.0 (TID 504) in 14 ms on node1 (505/1000)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 505.0 in stage 
> 0.0 (TID 505) in 14 ms on node1 (506/1000)
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/metrics/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/api,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/static,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> 

[jira] [Commented] (SPARK-17340) .sparkStaging not cleaned if application exited incorrectly

2016-09-07 Thread Artur Sukhenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15470514#comment-15470514
 ] 

Artur Sukhenko commented on SPARK-17340:


I thought you could do 
{code}
$ yarn application -list -appTypes SPARK -appStates FINISHED,FAILED,KILLED
{code}
to check whether it is still in use.

But there is the spark.yarn.preserve.staging.files property, which will keep the 
staging folder in HDFS; if it is enabled, you probably don't want the files to be 
cleaned up.
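For reference, this property can be set like any other Spark configuration; a hedged sketch (illustrative app name; false is the usual default, meaning staging files should be cleaned up on a normal shutdown):
{code}
// Sketch only: spark.yarn.preserve.staging.files controls whether the
// .sparkStaging directory is kept after the application finishes.
// Pass the conf to SparkContext/SparkSession when submitting in yarn mode.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("staging-cleanup-demo")                 // illustrative name
  .set("spark.yarn.preserve.staging.files", "false")  // true would retain staging files
{code}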
 


> .sparkStaging not cleaned if application exited incorrectly
> ---
>
> Key: SPARK-17340
> URL: https://issues.apache.org/jira/browse/SPARK-17340
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.2, 1.6.1, 2.0.0
>Reporter: Artur Sukhenko
>Priority: Minor
>
> When running Spark (yarn, cluster mode) and killing the application,
> .sparkStaging is not cleaned up.
> Reproduce:
> 1. Run the SparkPi job in yarn cluster mode.
> 2. Wait for the app to switch to RUNNING and press Ctrl+C.
> 3. Kill the app: $ yarn application -kill 
> 4. Verify that it is not cleaned up: $ hadoop fs -ls .sparkStaging
> Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error.
> All of these apps are already finished/killed, but 
> sparkStaging/application_ remains:
> {code}
> $ hadoop fs -ls .sparkStaging
> Found 6 items
> drwx--   - user user  3 2016-08-26 00:57 
> .sparkStaging/application_1472140614688_0001
> drwx--   - user user  3 2016-08-26 01:09 
> .sparkStaging/application_1472140614688_0002
> drwx--   - user user  3 2016-08-26 19:51 
> .sparkStaging/application_1472140614688_0005
> drwx--   - user user  3 2016-08-26 19:53 
> .sparkStaging/application_1472140614688_0007
> drwx--   - user user  3 2016-08-31 22:43 
> .sparkStaging/application_1472634296300_0011
> drwx--   - user user  3 2016-08-31 23:30 
> .sparkStaging/application_1472651370711_0006
> {code}
> {code}
> ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
> 16/08/26 00:51:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 506.0 in stage 
> 0.0 (TID 506, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 507.0 in stage 
> 0.0 (TID 507, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 504.0 in stage 
> 0.0 (TID 504) in 14 ms on node1 (505/1000)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 505.0 in stage 
> 0.0 (TID 505) in 14 ms on node1 (506/1000)
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/metrics/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/api,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/static,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/json,null}
> 16/08/26 

[jira] [Commented] (SPARK-17379) Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)

2016-09-05 Thread Artur Sukhenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15464604#comment-15464604
 ] 

Artur Sukhenko commented on SPARK-17379:


I tried to use Spark with 4.1.5; after fixing the deprecated calls and 
implementing the new methods, some tests still failed. So I agree with you that 
we should just upgrade to 4.0.41.

> Upgrade netty-all to 4.0.41.Final (4.1.5-Final not compatible)
> --
>
> Key: SPARK-17379
> URL: https://issues.apache.org/jira/browse/SPARK-17379
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 2.0.0
>Reporter: Adam Roberts
>Priority: Trivial
>
> We should use the newest version of netty based on info here: 
> http://netty.io/news/2016/08/29/4-0-41-Final-4-1-5-Final.html, especially 
> interested in the static initialiser deadlock fix: 
> https://github.com/netty/netty/pull/5730
> Lots more fixes are mentioned, so I will create the pull request - again a case 
> of updating the pom and then the dependency files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17340) .sparkStaging not cleaned if application exited incorrectly

2016-09-01 Thread Artur Sukhenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15455086#comment-15455086
 ] 

Artur Sukhenko commented on SPARK-17340:


Yes, I can. 
However, some users will stop the app like this, and we cannot ask everyone not 
to kill the local yarn#client process. Eventually the .sparkStaging folder will 
fill up and you will have to clean it manually.

> .sparkStaging not cleaned if application exited incorrectly
> ---
>
> Key: SPARK-17340
> URL: https://issues.apache.org/jira/browse/SPARK-17340
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.2, 1.6.1, 2.0.0
>Reporter: Artur Sukhenko
>Priority: Minor
>
> When running Spark (yarn, cluster mode) and killing the application,
> .sparkStaging is not cleaned up.
> Reproduce:
> 1. Run the SparkPi job in yarn cluster mode.
> 2. Wait for the app to switch to RUNNING and press Ctrl+C.
> 3. Kill the app: $ yarn application -kill 
> 4. Verify that it is not cleaned up: $ hadoop fs -ls .sparkStaging
> Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error.
> All of these apps are already finished/killed, but 
> sparkStaging/application_ remains:
> {code}
> $ hadoop fs -ls .sparkStaging
> Found 6 items
> drwx--   - user user  3 2016-08-26 00:57 
> .sparkStaging/application_1472140614688_0001
> drwx--   - user user  3 2016-08-26 01:09 
> .sparkStaging/application_1472140614688_0002
> drwx--   - user user  3 2016-08-26 19:51 
> .sparkStaging/application_1472140614688_0005
> drwx--   - user user  3 2016-08-26 19:53 
> .sparkStaging/application_1472140614688_0007
> drwx--   - user user  3 2016-08-31 22:43 
> .sparkStaging/application_1472634296300_0011
> drwx--   - user user  3 2016-08-31 23:30 
> .sparkStaging/application_1472651370711_0006
> {code}
> {code}
> ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
> 16/08/26 00:51:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 506.0 in stage 
> 0.0 (TID 506, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 507.0 in stage 
> 0.0 (TID 507, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 504.0 in stage 
> 0.0 (TID 504) in 14 ms on node1 (505/1000)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 505.0 in stage 
> 0.0 (TID 505) in 14 ms on node1 (506/1000)
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/metrics/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/api,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/static,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages,null}

[jira] [Commented] (SPARK-17340) .sparkStaging not cleaned if application exited incorrectly

2016-09-01 Thread Artur Sukhenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15455077#comment-15455077
 ] 

Artur Sukhenko commented on SPARK-17340:


Oh, now I get it.
However, if you stop it in yarn-client mode with Ctrl+C, it does clean up.
Shouldn't yarn-cluster mode be tolerant to such a shutdown?
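For illustration only, here is a minimal sketch (not Spark's actual fix) of what tolerating such a shutdown could look like: a JVM shutdown hook in the cluster-mode AM that removes the staging directory when the process is terminated. The StagingDirShutdownHook name and the stagingDir argument are assumptions made up for this example.
{code:java}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

// Hypothetical helper, not part of Spark: registers a shutdown hook that
// deletes the given staging directory when the JVM exits (e.g. on SIGTERM).
object StagingDirShutdownHook {
  def register(stagingDir: String): Unit = {
    Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
      override def run(): Unit = {
        val fs = FileSystem.get(new Configuration())
        // recursive = true removes the whole application_<id> directory
        fs.delete(new Path(stagingDir), true)
      }
    }))
  }
}
{code}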

> .sparkStaging not cleaned if application exited incorrectly
> ---
>
> Key: SPARK-17340
> URL: https://issues.apache.org/jira/browse/SPARK-17340
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.2, 1.6.1, 2.0.0
>Reporter: Artur Sukhenko
>Priority: Minor
>
> When running Spark (YARN, cluster mode) and killing the application,
> .sparkStaging is not cleaned.
> Reproduce:
> 1. Run the SparkPi job in yarn-cluster mode
> 2. Wait for the app to switch to RUNNING and press Ctrl+C
> 3. Kill the app: $ yarn application -kill 
> 4. Verify that it is not cleaned up: $ hadoop fs -ls .sparkStaging
> Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error.
> All of these apps are already finished/killed, but 
> sparkStaging/application_ remains:
> {code}
> $ hadoop fs -ls .sparkStaging
> Found 6 items
> drwx--   - user user  3 2016-08-26 00:57 
> .sparkStaging/application_1472140614688_0001
> drwx--   - user user  3 2016-08-26 01:09 
> .sparkStaging/application_1472140614688_0002
> drwx--   - user user  3 2016-08-26 19:51 
> .sparkStaging/application_1472140614688_0005
> drwx--   - user user  3 2016-08-26 19:53 
> .sparkStaging/application_1472140614688_0007
> drwx--   - user user  3 2016-08-31 22:43 
> .sparkStaging/application_1472634296300_0011
> drwx--   - user user  3 2016-08-31 23:30 
> .sparkStaging/application_1472651370711_0006
> {code}
> {code}
> ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
> 16/08/26 00:51:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 506.0 in stage 
> 0.0 (TID 506, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 507.0 in stage 
> 0.0 (TID 507, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 504.0 in stage 
> 0.0 (TID 504) in 14 ms on node1 (505/1000)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 505.0 in stage 
> 0.0 (TID 505) in 14 ms on node1 (506/1000)
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/metrics/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/api,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/static,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> 

[jira] [Commented] (SPARK-17340) .sparkStaging not cleaned if application exited incorrectly

2016-09-01 Thread Artur Sukhenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15455038#comment-15455038
 ] 

Artur Sukhenko commented on SPARK-17340:


With yarn-client it will be cleaned up.
I am talking about yarn-cluster, where cleanup is done by 
ApplicationMaster.scala#cleanupStagingDir only when:
{code}
if (finalStatus == FinalApplicationStatus.SUCCEEDED || isLastAttempt) {
  unregister(finalStatus, finalMsg)
  cleanupStagingDir(fs)
}
{code}
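Until that condition is broadened, the leftover directories have to be removed by hand. Below is a minimal cleanup sketch, assuming the default staging location under the submitting user's HDFS home directory and an application id taken from the listing in this issue; RemoveStaleStagingDir is just a name invented for this example, not anything shipped with Spark.
{code:java}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object RemoveStaleStagingDir {
  def main(args: Array[String]): Unit = {
    // Pass the leftover application id on the command line,
    // e.g. application_1472140614688_0001 from the listing above.
    val appId = args.headOption.getOrElse("application_1472140614688_0001")
    val fs = FileSystem.get(new Configuration())
    val dir = new Path(fs.getHomeDirectory, s".sparkStaging/$appId")
    if (fs.exists(dir)) {
      println(s"Deleting leftover staging dir: $dir")
      fs.delete(dir, true) // recursive delete of the stale directory
    } else {
      println(s"Nothing to do: $dir does not exist")
    }
  }
}
{code}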

> .sparkStaging not cleaned if application exited incorrectly
> ---
>
> Key: SPARK-17340
> URL: https://issues.apache.org/jira/browse/SPARK-17340
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.2, 1.6.1, 2.0.0
>Reporter: Artur Sukhenko
>Priority: Minor
>
> When running Spark (YARN, cluster mode) and killing the application,
> .sparkStaging is not cleaned.
> Reproduce:
> 1. Run the SparkPi job in yarn-cluster mode
> 2. Wait for the app to switch to RUNNING and press Ctrl+C
> 3. Kill the app: $ yarn application -kill 
> 4. Verify that it is not cleaned up: $ hadoop fs -ls .sparkStaging
> Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error.
> All of these apps are already finished/killed, but 
> sparkStaging/application_ remains:
> {code}
> $ hadoop fs -ls .sparkStaging
> Found 6 items
> drwx--   - user user  3 2016-08-26 00:57 
> .sparkStaging/application_1472140614688_0001
> drwx--   - user user  3 2016-08-26 01:09 
> .sparkStaging/application_1472140614688_0002
> drwx--   - user user  3 2016-08-26 19:51 
> .sparkStaging/application_1472140614688_0005
> drwx--   - user user  3 2016-08-26 19:53 
> .sparkStaging/application_1472140614688_0007
> drwx--   - user user  3 2016-08-31 22:43 
> .sparkStaging/application_1472634296300_0011
> drwx--   - user user  3 2016-08-31 23:30 
> .sparkStaging/application_1472651370711_0006
> {code}
> {code}
> ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
> 16/08/26 00:51:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 506.0 in stage 
> 0.0 (TID 506, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 507.0 in stage 
> 0.0 (TID 507, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 504.0 in stage 
> 0.0 (TID 504) in 14 ms on node1 (505/1000)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 505.0 in stage 
> 0.0 (TID 505) in 14 ms on node1 (506/1000)
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/metrics/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/api,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/static,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> 

[jira] [Comment Edited] (SPARK-17340) .sparkStaging not cleaned if application exited incorrectly

2016-08-31 Thread Artur Sukhenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453102#comment-15453102
 ] 

Artur Sukhenko edited comment on SPARK-17340 at 8/31/16 7:14 PM:
-

Tested on 1.6.1: exitCode 15 (EXIT_EXCEPTION_USER_CLASS).
It did clean up the staging directory when I pressed Ctrl+C on spark-submit and yarn-killed the app.
{code:java}
if (!unregistered) {
  // we only want to unregister if we don't want the RM to retry
  if (finalStatus == FinalApplicationStatus.SUCCEEDED ||
      exitCode == ApplicationMaster.EXIT_EXCEPTION_USER_CLASS ||
      isLastAttempt) {
    unregister(finalStatus, finalMsg)
    cleanupStagingDir(fs)
  }
}
{code}
And in Spark 2.0.0 a yarn kill will produce exitCode 16 (EXIT_EARLY):
{code:java}
if (!unregistered) {
  // we only want to unregister if we don't want the RM to retry
  if (finalStatus == FinalApplicationStatus.SUCCEEDED ||
      exitCode == ApplicationMaster.EXIT_EARLY || isLastAttempt) {
    unregister(finalStatus, finalMsg)
    cleanupStagingDir(fs)
  }
}
{code}
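To make the version difference easier to compare, here is a small self-contained sketch (a deliberate simplification, not the actual ApplicationMaster code) expressing the two guards as plain functions, with the exit-code values quoted from this comment.
{code:java}
object StagingCleanupGuards {
  // Values as reported above: 15 = EXIT_EXCEPTION_USER_CLASS, 16 = EXIT_EARLY.
  val ExitExceptionUserClass = 15
  val ExitEarly = 16

  // 1.6.1-style guard: cleanup on success, user-class exception, or last attempt.
  def cleansUpOn161(succeeded: Boolean, exitCode: Int, isLastAttempt: Boolean): Boolean =
    succeeded || exitCode == ExitExceptionUserClass || isLastAttempt

  // 2.0.0-style guard: cleanup on success, early exit, or last attempt.
  def cleansUpOn200(succeeded: Boolean, exitCode: Int, isLastAttempt: Boolean): Boolean =
    succeeded || exitCode == ExitEarly || isLastAttempt
}
{code}
Both guards cover a Ctrl+C / yarn kill in these versions (exit 15 on 1.6.1, exit 16 on 2.0.0), which matches the cleanup observed above.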



was (Author: asukhenko):
Tested on 1.6.1: exitCode 15 (EXIT_EXCEPTION_USER_CLASS)
{code:java}
if (!unregistered) {
  // we only want to unregister if we don't want the RM to retry
  if (finalStatus == FinalApplicationStatus.SUCCEEDED ||
      exitCode == ApplicationMaster.EXIT_EXCEPTION_USER_CLASS ||
      isLastAttempt) {
    unregister(finalStatus, finalMsg)
    cleanupStagingDir(fs)
  }
}
{code}
On 2.0.0 yarn kill will produce exitCode 16 (EXIT_EARLY):
{code:java}
if (!unregistered) {
  // we only want to unregister if we don't want the RM to retry
  if (finalStatus == FinalApplicationStatus.SUCCEEDED ||
      exitCode == ApplicationMaster.EXIT_EARLY || isLastAttempt) {
    unregister(finalStatus, finalMsg)
    cleanupStagingDir(fs)
  }
}
{code}


> .sparkStaging not cleaned if application exited incorrectly
> ---
>
> Key: SPARK-17340
> URL: https://issues.apache.org/jira/browse/SPARK-17340
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.2, 1.6.1, 2.0.0
>Reporter: Artur Sukhenko
>Priority: Minor
>
> When running Spark (YARN, cluster mode) and killing the application,
> .sparkStaging is not cleaned.
> Reproduce:
> 1. Run the SparkPi job in yarn-cluster mode
> 2. Wait for the app to switch to RUNNING and press Ctrl+C
> 3. Kill the app: $ yarn application -kill 
> 4. Verify that it is not cleaned up: $ hadoop fs -ls .sparkStaging
> Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error.
> All of these apps are already finished/killed, but 
> sparkStaging/application_ remains:
> {code}
> $ hadoop fs -ls .sparkStaging
> Found 6 items
> drwx--   - user user  3 2016-08-26 00:57 
> .sparkStaging/application_1472140614688_0001
> drwx--   - user user  3 2016-08-26 01:09 
> .sparkStaging/application_1472140614688_0002
> drwx--   - user user  3 2016-08-26 19:51 
> .sparkStaging/application_1472140614688_0005
> drwx--   - user user  3 2016-08-26 19:53 
> .sparkStaging/application_1472140614688_0007
> drwx--   - user user  3 2016-08-31 22:43 
> .sparkStaging/application_1472634296300_0011
> drwx--   - user user  3 2016-08-31 23:30 
> .sparkStaging/application_1472651370711_0006
> {code}
> {code}
> ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
> 16/08/26 00:51:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 506.0 in stage 
> 0.0 (TID 506, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 507.0 in stage 
> 0.0 (TID 507, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 504.0 in stage 
> 0.0 (TID 504) in 14 ms on node1 (505/1000)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 505.0 in stage 
> 0.0 (TID 505) in 14 ms on node1 (506/1000)
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/metrics/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/api,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/static,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> 

[jira] [Comment Edited] (SPARK-17340) .sparkStaging not cleaned if application exited incorrectly

2016-08-31 Thread Artur Sukhenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453102#comment-15453102
 ] 

Artur Sukhenko edited comment on SPARK-17340 at 8/31/16 7:12 PM:
-

Tested on 1.6.1: exitCode 15 (EXIT_EXCEPTION_USER_CLASS)
{code:java}
if (!unregistered) {
  // we only want to unregister if we don't want the RM to retry
  if (finalStatus == FinalApplicationStatus.SUCCEEDED ||
      exitCode == ApplicationMaster.EXIT_EXCEPTION_USER_CLASS ||
      isLastAttempt) {
    unregister(finalStatus, finalMsg)
    cleanupStagingDir(fs)
  }
}
{code}
On 2.0.0 yarn kill will produce exitCode 16 (EXIT_EARLY):
{code:java}
if (!unregistered) {
  // we only want to unregister if we don't want the RM to retry
  if (finalStatus == FinalApplicationStatus.SUCCEEDED ||
      exitCode == ApplicationMaster.EXIT_EARLY || isLastAttempt) {
    unregister(finalStatus, finalMsg)
    cleanupStagingDir(fs)
  }
}
{code}



was (Author: asukhenko):
Tested on 1.6.1: exitCode 15 (EXIT_EXCEPTION_USER_CLASS)
{code:java}
if (!unregistered) {
  // we only want to unregister if we don't want the RM to retry
  if (finalStatus == FinalApplicationStatus.SUCCEEDED ||
      exitCode == ApplicationMaster.EXIT_EXCEPTION_USER_CLASS ||
      isLastAttempt) {
    unregister(finalStatus, finalMsg)
    cleanupStagingDir(fs)
  }
}
{code}
On 2.0.0 yarn kill will produce exitCode 16 (EXIT_EARLY):
{code}
if (!unregistered) {
  // we only want to unregister if we don't want the RM to retry
  if (finalStatus == FinalApplicationStatus.SUCCEEDED ||
      exitCode == ApplicationMaster.EXIT_EARLY || isLastAttempt) {
    unregister(finalStatus, finalMsg)
    cleanupStagingDir(fs)
  }
}
{code}


> .sparkStaging not cleaned if application exited incorrectly
> ---
>
> Key: SPARK-17340
> URL: https://issues.apache.org/jira/browse/SPARK-17340
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.2, 1.6.1, 2.0.0
>Reporter: Artur Sukhenko
>Priority: Minor
>
> When running Spark (YARN, cluster mode) and killing the application,
> .sparkStaging is not cleaned.
> Reproduce:
> 1. Run the SparkPi job in yarn-cluster mode
> 2. Wait for the app to switch to RUNNING and press Ctrl+C
> 3. Kill the app: $ yarn application -kill 
> 4. Verify that it is not cleaned up: $ hadoop fs -ls .sparkStaging
> Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error.
> All of these apps are already finished/killed, but 
> sparkStaging/application_ remains:
> {code}
> $ hadoop fs -ls .sparkStaging
> Found 6 items
> drwx--   - user user  3 2016-08-26 00:57 
> .sparkStaging/application_1472140614688_0001
> drwx--   - user user  3 2016-08-26 01:09 
> .sparkStaging/application_1472140614688_0002
> drwx--   - user user  3 2016-08-26 19:51 
> .sparkStaging/application_1472140614688_0005
> drwx--   - user user  3 2016-08-26 19:53 
> .sparkStaging/application_1472140614688_0007
> drwx--   - user user  3 2016-08-31 22:43 
> .sparkStaging/application_1472634296300_0011
> drwx--   - user user  3 2016-08-31 23:30 
> .sparkStaging/application_1472651370711_0006
> {code}
> {code}
> ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
> 16/08/26 00:51:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 506.0 in stage 
> 0.0 (TID 506, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 507.0 in stage 
> 0.0 (TID 507, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 504.0 in stage 
> 0.0 (TID 504) in 14 ms on node1 (505/1000)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 505.0 in stage 
> 0.0 (TID 505) in 14 ms on node1 (506/1000)
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/metrics/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/api,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/static,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> 

[jira] [Commented] (SPARK-17340) .sparkStaging not cleaned if application exited incorrectly

2016-08-31 Thread Artur Sukhenko (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15453102#comment-15453102
 ] 

Artur Sukhenko commented on SPARK-17340:


Tested on 1.6.1: exitCode 15 (EXIT_EXCEPTION_USER_CLASS)
{code:java}
if (!unregistered) {
  // we only want to unregister if we don't want the RM to retry
  if (finalStatus == FinalApplicationStatus.SUCCEEDED ||
      exitCode == ApplicationMaster.EXIT_EXCEPTION_USER_CLASS ||
      isLastAttempt) {
    unregister(finalStatus, finalMsg)
    cleanupStagingDir(fs)
  }
}
{code}
On 2.0.0 yarn kill will produce exitCode 16 (EXIT_EARLY):
{code}
if (!unregistered) {
  // we only want to unregister if we don't want the RM to retry
  if (finalStatus == FinalApplicationStatus.SUCCEEDED ||
      exitCode == ApplicationMaster.EXIT_EARLY || isLastAttempt) {
    unregister(finalStatus, finalMsg)
    cleanupStagingDir(fs)
  }
}
{code}


> .sparkStaging not cleaned if application exited incorrectly
> ---
>
> Key: SPARK-17340
> URL: https://issues.apache.org/jira/browse/SPARK-17340
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.2, 1.6.1, 2.0.0
>Reporter: Artur Sukhenko
>Priority: Minor
>
> When running Spark (YARN, cluster mode) and killing the application,
> .sparkStaging is not cleaned.
> Reproduce:
> 1. Run the SparkPi job in yarn-cluster mode
> 2. Wait for the app to switch to RUNNING and press Ctrl+C
> 3. Kill the app: $ yarn application -kill 
> 4. Verify that it is not cleaned up: $ hadoop fs -ls .sparkStaging
> Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error.
> All of these apps are already finished/killed, but 
> sparkStaging/application_ remains:
> {code}
> $ hadoop fs -ls .sparkStaging
> Found 6 items
> drwx--   - user user  3 2016-08-26 00:57 
> .sparkStaging/application_1472140614688_0001
> drwx--   - user user  3 2016-08-26 01:09 
> .sparkStaging/application_1472140614688_0002
> drwx--   - user user  3 2016-08-26 19:51 
> .sparkStaging/application_1472140614688_0005
> drwx--   - user user  3 2016-08-26 19:53 
> .sparkStaging/application_1472140614688_0007
> drwx--   - user user  3 2016-08-31 22:43 
> .sparkStaging/application_1472634296300_0011
> drwx--   - user user  3 2016-08-31 23:30 
> .sparkStaging/application_1472651370711_0006
> {code}
> {code}
> ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
> 16/08/26 00:51:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 506.0 in stage 
> 0.0 (TID 506, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 507.0 in stage 
> 0.0 (TID 507, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 504.0 in stage 
> 0.0 (TID 504) in 14 ms on node1 (505/1000)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 505.0 in stage 
> 0.0 (TID 505) in 14 ms on node1 (506/1000)
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/metrics/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/api,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/static,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage,null}
> 16/08/26 

[jira] [Updated] (SPARK-17340) .sparkStaging not cleaned if application exited incorrectly

2016-08-31 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-17340:
---
Summary: .sparkStaging not cleaned if application exited incorrectly  (was: 
.sparkStaging is not cleaned if killed )

> .sparkStaging not cleaned if application exited incorrectly
> ---
>
> Key: SPARK-17340
> URL: https://issues.apache.org/jira/browse/SPARK-17340
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.2, 1.6.1, 2.0.0
>Reporter: Artur Sukhenko
>Priority: Minor
>
> When running Spark (YARN, cluster mode) and killing the application,
> .sparkStaging is not cleaned.
> Reproduce:
> 1. Run the SparkPi job in yarn-cluster mode
> 2. Wait for the app to switch to RUNNING and press Ctrl+C
> 3. Kill the app: $ yarn application -kill 
> 4. Verify that it is not cleaned up: $ hadoop fs -ls .sparkStaging
> Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error.
> All of these apps are already finished/killed, but 
> sparkStaging/application_ remains:
> {code}
> $ hadoop fs -ls .sparkStaging
> Found 6 items
> drwx--   - user user  3 2016-08-26 00:57 
> .sparkStaging/application_1472140614688_0001
> drwx--   - user user  3 2016-08-26 01:09 
> .sparkStaging/application_1472140614688_0002
> drwx--   - user user  3 2016-08-26 19:51 
> .sparkStaging/application_1472140614688_0005
> drwx--   - user user  3 2016-08-26 19:53 
> .sparkStaging/application_1472140614688_0007
> drwx--   - user user  3 2016-08-31 22:43 
> .sparkStaging/application_1472634296300_0011
> drwx--   - user user  3 2016-08-31 23:30 
> .sparkStaging/application_1472651370711_0006
> {code}
> {code}
> ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
> 16/08/26 00:51:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 506.0 in stage 
> 0.0 (TID 506, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 507.0 in stage 
> 0.0 (TID 507, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 504.0 in stage 
> 0.0 (TID 504) in 14 ms on node1 (505/1000)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 505.0 in stage 
> 0.0 (TID 505) in 14 ms on node1 (506/1000)
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/metrics/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/api,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/static,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/jobs/job/json,null}
> 16/08/26 00:51:17 

[jira] [Updated] (SPARK-17340) .sparkStaging is not cleaned if killed

2016-08-31 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-17340:
---
Component/s: YARN

> .sparkStaging is not cleaned if killed 
> ---
>
> Key: SPARK-17340
> URL: https://issues.apache.org/jira/browse/SPARK-17340
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.5.2, 1.6.1, 2.0.0
>Reporter: Artur Sukhenko
>Priority: Minor
>
> When running Spark (YARN, cluster mode) and killing the application,
> .sparkStaging is not cleaned.
> Reproduce:
> 1. Run the SparkPi job in yarn-cluster mode
> 2. Wait for the app to switch to RUNNING and press Ctrl+C
> 3. Kill the app: $ yarn application -kill 
> 4. Verify that it is not cleaned up: $ hadoop fs -ls .sparkStaging
> Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error.
> All of these apps are already finished/killed, but 
> sparkStaging/application_ remains:
> {code}
> $ hadoop fs -ls .sparkStaging
> Found 6 items
> drwx--   - user user  3 2016-08-26 00:57 
> .sparkStaging/application_1472140614688_0001
> drwx--   - user user  3 2016-08-26 01:09 
> .sparkStaging/application_1472140614688_0002
> drwx--   - user user  3 2016-08-26 19:51 
> .sparkStaging/application_1472140614688_0005
> drwx--   - user user  3 2016-08-26 19:53 
> .sparkStaging/application_1472140614688_0007
> drwx--   - user user  3 2016-08-31 22:43 
> .sparkStaging/application_1472634296300_0011
> drwx--   - user user  3 2016-08-31 23:30 
> .sparkStaging/application_1472651370711_0006
> {code}
> {code}
> ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
> 16/08/26 00:51:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 506.0 in stage 
> 0.0 (TID 506, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 507.0 in stage 
> 0.0 (TID 507, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 504.0 in stage 
> 0.0 (TID 504) in 14 ms on node1 (505/1000)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 505.0 in stage 
> 0.0 (TID 505) in 14 ms on node1 (506/1000)
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/metrics/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/api,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/static,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/jobs/job/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/jobs/job,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: 

[jira] [Updated] (SPARK-17340) .sparkStaging is not cleaned if killed

2016-08-31 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-17340:
---
Description: 
When running Spark (YARN, cluster mode) and killing the application,
.sparkStaging is not cleaned.

Reproduce:
1. Run the SparkPi job in yarn-cluster mode
2. Wait for the app to switch to RUNNING and press Ctrl+C
3. Kill the app: $ yarn application -kill 
4. Verify that it is not cleaned up: $ hadoop fs -ls .sparkStaging

Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error.

All of these apps are already finished/killed, but 
sparkStaging/application_ remains:
{code}
$ hadoop fs -ls .sparkStaging
Found 6 items
drwx--   - user user  3 2016-08-26 00:57 
.sparkStaging/application_1472140614688_0001
drwx--   - user user  3 2016-08-26 01:09 
.sparkStaging/application_1472140614688_0002
drwx--   - user user  3 2016-08-26 19:51 
.sparkStaging/application_1472140614688_0005
drwx--   - user user  3 2016-08-26 19:53 
.sparkStaging/application_1472140614688_0007
drwx--   - user user  3 2016-08-31 22:43 
.sparkStaging/application_1472634296300_0011
drwx--   - user user  3 2016-08-31 23:30 
.sparkStaging/application_1472651370711_0006
{code}
{code}
ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
16/08/26 00:51:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 506.0 in stage 
0.0 (TID 506, node1, PROCESS_LOCAL, 2085 bytes)
16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 507.0 in stage 
0.0 (TID 507, node1, PROCESS_LOCAL, 2085 bytes)
16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 504.0 in stage 
0.0 (TID 504) in 14 ms on node1 (505/1000)
16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 505.0 in stage 
0.0 (TID 505) in 14 ms on node1 (506/1000)
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/metrics/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/api,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/static,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors/threadDump,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/environment/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/environment,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/storage/rdd,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/storage/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/storage,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/pool/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/pool,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/stage/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/stage,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/jobs/job/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/jobs/job,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/jobs/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/jobs,null}
16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 508.0 in stage 
0.0 (TID 508, node1, PROCESS_LOCAL, 2085 bytes)
16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 507.0 in stage 
0.0 (TID 507) in 20 ms on node1 (507/1000)
16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 509.0 in stage 
0.0 (TID 509, node1, PROCESS_LOCAL, 2085 bytes)
16/08/26 00:51:17 INFO 

[jira] [Updated] (SPARK-17340) .sparkStaging is not cleaned if killed

2016-08-31 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-17340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-17340:
---
Summary: .sparkStaging is not cleaned if killed   (was: .sparkStaging not 
cleaned in yarn-cluster mode)

> .sparkStaging is not cleaned if killed 
> ---
>
> Key: SPARK-17340
> URL: https://issues.apache.org/jira/browse/SPARK-17340
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.5.2, 1.6.1, 2.0.0
>Reporter: Artur Sukhenko
>Priority: Minor
>
> When running Spark (YARN, cluster mode) and killing the application,
> .sparkStaging is not cleaned.
> Reproduce:
> 1. Run the SparkPi job in yarn-cluster mode
> 2. Wait for the app to switch to RUNNING and press Ctrl+C
> 3. Kill the app: 
> #yarn application -kill 
> 4. Verify that it is not cleaned up:
> #hadoop fs -ls .sparkStaging
> Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error.
> When the app is killed like this 
> ==
> $ hadoop fs -ls .sparkStaging
> Found 6 items
> drwx--   - user user  3 2016-08-26 00:57 
> .sparkStaging/application_1472140614688_0001
> drwx--   - user user  3 2016-08-26 01:09 
> .sparkStaging/application_1472140614688_0002
> drwx--   - user user  3 2016-08-26 19:51 
> .sparkStaging/application_1472140614688_0005
> drwx--   - user user  3 2016-08-26 19:53 
> .sparkStaging/application_1472140614688_0007
> drwx--   - user user  3 2016-08-31 22:43 
> .sparkStaging/application_1472634296300_0011
> drwx--   - user user  3 2016-08-31 23:30 
> .sparkStaging/application_1472651370711_0006
> {code}
> ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
> 16/08/26 00:51:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 506.0 in stage 
> 0.0 (TID 506, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 507.0 in stage 
> 0.0 (TID 507, node1, PROCESS_LOCAL, 2085 bytes)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 504.0 in stage 
> 0.0 (TID 504) in 14 ms on node1 (505/1000)
> 16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 505.0 in stage 
> 0.0 (TID 505) in 14 ms on node1 (506/1000)
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/metrics/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/api,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/static,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/threadDump,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/executors,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/environment,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/rdd,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/storage,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/pool,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/stage,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/stages,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/jobs/job/json,null}
> 16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
> o.s.j.s.ServletContextHandler{/jobs/job,null}
> 16/08/26 

[jira] [Created] (SPARK-17340) .sparkStaging not cleaned in yarn-cluster mode

2016-08-31 Thread Artur Sukhenko (JIRA)
Artur Sukhenko created SPARK-17340:
--

 Summary: .sparkStaging not cleaned in yarn-cluster mode
 Key: SPARK-17340
 URL: https://issues.apache.org/jira/browse/SPARK-17340
 Project: Spark
  Issue Type: Bug
Affects Versions: 2.0.0, 1.6.1, 1.5.2
Reporter: Artur Sukhenko
Priority: Minor


When running Spark (YARN, cluster mode) and killing the application,
.sparkStaging is not cleaned.

Reproduce:
1. Run the SparkPi job in yarn-cluster mode
2. Wait for the app to switch to RUNNING and press Ctrl+C
3. Kill the app: 
#yarn application -kill 
4. Verify that it is not cleaned up:
#hadoop fs -ls .sparkStaging

Tested in Spark 1.5.2, 1.6.1, 2.0.0 - same error.

When the app is killed like this 
==
$ hadoop fs -ls .sparkStaging
Found 6 items
drwx--   - user user  3 2016-08-26 00:57 
.sparkStaging/application_1472140614688_0001
drwx--   - user user  3 2016-08-26 01:09 
.sparkStaging/application_1472140614688_0002
drwx--   - user user  3 2016-08-26 19:51 
.sparkStaging/application_1472140614688_0005
drwx--   - user user  3 2016-08-26 19:53 
.sparkStaging/application_1472140614688_0007
drwx--   - user user  3 2016-08-31 22:43 
.sparkStaging/application_1472634296300_0011
drwx--   - user user  3 2016-08-31 23:30 
.sparkStaging/application_1472651370711_0006

{code}
ERROR yarn.ApplicationMaster: RECEIVED SIGNAL 15: SIGTERM
16/08/26 00:51:17 INFO spark.SparkContext: Invoking stop() from shutdown hook
16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 506.0 in stage 
0.0 (TID 506, node1, PROCESS_LOCAL, 2085 bytes)
16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 507.0 in stage 
0.0 (TID 507, node1, PROCESS_LOCAL, 2085 bytes)
16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 504.0 in stage 
0.0 (TID 504) in 14 ms on node1 (505/1000)
16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 505.0 in stage 
0.0 (TID 505) in 14 ms on node1 (506/1000)
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/metrics/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/stage/kill,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/api,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/static,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors/threadDump/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors/threadDump,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/executors,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/environment/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/environment,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/storage/rdd/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/storage/rdd,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/storage/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/storage,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/pool/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/pool,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/stage/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/stage,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/stages,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/jobs/job/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/jobs/job,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/jobs/json,null}
16/08/26 00:51:17 INFO handler.ContextHandler: stopped 
o.s.j.s.ServletContextHandler{/jobs,null}
16/08/26 00:51:17 INFO scheduler.TaskSetManager: Starting task 508.0 in stage 
0.0 (TID 508, node1, PROCESS_LOCAL, 2085 bytes)
16/08/26 00:51:17 INFO scheduler.TaskSetManager: Finished task 507.0 in stage 
0.0 (TID 507) in 20 ms on 

[jira] [Updated] (SPARK-4105) FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle

2016-08-16 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-4105:
--
Affects Version/s: 1.5.1

> FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based 
> shuffle
> -
>
> Key: SPARK-4105
> URL: https://issues.apache.org/jira/browse/SPARK-4105
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.0, 1.2.1, 1.3.0, 1.4.1, 1.5.1, 1.6.1
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Critical
> Attachments: JavaObjectToSerialize.java, 
> SparkFailedToUncompressGenerator.scala
>
>
> We have seen non-deterministic {{FAILED_TO_UNCOMPRESS(5)}} errors during 
> shuffle read.  Here's a sample stacktrace from an executor:
> {code}
> 14/10/23 18:34:11 ERROR Executor: Exception in task 1747.3 in stage 11.0 (TID 
> 33053)
> java.io.IOException: FAILED_TO_UNCOMPRESS(5)
>   at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
>   at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
>   at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
>   at org.xerial.snappy.Snappy.uncompress(Snappy.java:427)
>   at 
> org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:127)
>   at 
> org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88)
>   at org.xerial.snappy.SnappyInputStream.<init>(SnappyInputStream.java:58)
>   at 
> org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128)
>   at 
> org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1090)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:116)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:115)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at 
> org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:129)
>   at 
> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)
>   at 
> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at 
> org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at 
> org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Here's another occurrence of a similar error:
> {code}
> 

[jira] [Updated] (SPARK-4105) FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle

2016-08-16 Thread Artur Sukhenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artur Sukhenko updated SPARK-4105:
--
Affects Version/s: 1.6.1

> FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based 
> shuffle
> -
>
> Key: SPARK-4105
> URL: https://issues.apache.org/jira/browse/SPARK-4105
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.0, 1.2.1, 1.3.0, 1.4.1, 1.6.1
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Critical
> Attachments: JavaObjectToSerialize.java, 
> SparkFailedToUncompressGenerator.scala
>
>
> We have seen non-deterministic {{FAILED_TO_UNCOMPRESS(5)}} errors during 
> shuffle read.  Here's a sample stacktrace from an executor:
> {code}
> 14/10/23 18:34:11 ERROR Executor: Exception in task 1747.3 in stage 11.0 (TID 
> 33053)
> java.io.IOException: FAILED_TO_UNCOMPRESS(5)
>   at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
>   at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
>   at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
>   at org.xerial.snappy.Snappy.uncompress(Snappy.java:427)
>   at 
> org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:127)
>   at 
> org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88)
>   at org.xerial.snappy.SnappyInputStream.<init>(SnappyInputStream.java:58)
>   at 
> org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128)
>   at 
> org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1090)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:116)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:115)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at 
> org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:129)
>   at 
> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)
>   at 
> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at 
> org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at 
> org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:181)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> Here's another occurrence of a similar error:
> {code}
> java.io.IOException: