[jira] [Commented] (SPARK-20882) Executor is waiting for ShuffleBlockFetcherIterator

2017-11-16 Thread ShuMing Li (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16256578#comment-16256578
 ] 

ShuMing Li commented on SPARK-20882:


Can I ask  what `Still have 1 requests outstanding when connection from 
10.0.139.110:7337 is closed` error means ? what's the real problem?

> Executor is waiting for ShuffleBlockFetcherIterator
> ---
>
> Key: SPARK-20882
> URL: https://issues.apache.org/jira/browse/SPARK-20882
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1
>Reporter: cen yuhai
> Attachments: executor_jstack, executor_log, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png
>
>
> This bug is like https://issues.apache.org/jira/browse/SPARK-19300.
> but I have updated my client netty version to 4.0.43.Final.
> The shuffle service handler is still 4.0.42.Final
> spark.sql.adaptive.enabled is true
> {code}
> "Executor task launch worker for task 4808985" #5373 daemon prio=5 os_prio=0 
> tid=0x7f54ef437000 nid=0x1aed0 waiting on condition [0x7f53aebfe000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> parking to wait for <0x000498c249c0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:58)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
> at org.apache.spark.scheduler.Task.run(Task.scala:114)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:323)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {code}
> {code}
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_805, shuffle_5_1431_808, shuffle_5_1431_806, 
> shuffle_5_1431_809, shuffle_5_1431_807)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_808, shuffle_5_1431_806, shuffle_5_1431_809, 
> shuffle_5_1431_807)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_808, shuffle_5_1431_809, shuffle_5_1431_807)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_808, shuffle_5_1431_809)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_809)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: Set()
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 21
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 20
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 19
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 18
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 17
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 16
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 15
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 14
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 13
> 17/05/26 12:04:06 DEBUG 

[jira] [Commented] (SPARK-20882) Executor is waiting for ShuffleBlockFetcherIterator

2017-05-28 Thread cen yuhai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16027743#comment-16027743
 ] 

cen yuhai commented on SPARK-20882:
---

[~zsxwing] It is my problem... It is none of spark's busniess.

> Executor is waiting for ShuffleBlockFetcherIterator
> ---
>
> Key: SPARK-20882
> URL: https://issues.apache.org/jira/browse/SPARK-20882
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1
>Reporter: cen yuhai
> Attachments: executor_jstack, executor_log, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png
>
>
> This bug is like https://issues.apache.org/jira/browse/SPARK-19300.
> but I have updated my client netty version to 4.0.43.Final.
> The shuffle service handler is still 4.0.42.Final
> spark.sql.adaptive.enabled is true
> {code}
> "Executor task launch worker for task 4808985" #5373 daemon prio=5 os_prio=0 
> tid=0x7f54ef437000 nid=0x1aed0 waiting on condition [0x7f53aebfe000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> parking to wait for <0x000498c249c0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:58)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
> at org.apache.spark.scheduler.Task.run(Task.scala:114)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:323)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {code}
> {code}
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_805, shuffle_5_1431_808, shuffle_5_1431_806, 
> shuffle_5_1431_809, shuffle_5_1431_807)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_808, shuffle_5_1431_806, shuffle_5_1431_809, 
> shuffle_5_1431_807)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_808, shuffle_5_1431_809, shuffle_5_1431_807)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_808, shuffle_5_1431_809)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_809)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: Set()
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 21
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 20
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 19
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 18
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 17
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 16
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 15
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 14
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 13
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 12
> 17/05/26 12:04:06 DEBUG 

[jira] [Commented] (SPARK-20882) Executor is waiting for ShuffleBlockFetcherIterator

2017-05-26 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026595#comment-16026595
 ] 

Shixiong Zhu commented on SPARK-20882:
--

[~cenyuhai] do you mind to share what's the root cause?

> Executor is waiting for ShuffleBlockFetcherIterator
> ---
>
> Key: SPARK-20882
> URL: https://issues.apache.org/jira/browse/SPARK-20882
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1
>Reporter: cen yuhai
> Attachments: executor_jstack, executor_log, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png
>
>
> This bug is like https://issues.apache.org/jira/browse/SPARK-19300.
> but I have updated my client netty version to 4.0.43.Final.
> The shuffle service handler is still 4.0.42.Final
> spark.sql.adaptive.enabled is true
> {code}
> "Executor task launch worker for task 4808985" #5373 daemon prio=5 os_prio=0 
> tid=0x7f54ef437000 nid=0x1aed0 waiting on condition [0x7f53aebfe000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> parking to wait for <0x000498c249c0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:58)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
> at org.apache.spark.scheduler.Task.run(Task.scala:114)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:323)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {code}
> {code}
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_805, shuffle_5_1431_808, shuffle_5_1431_806, 
> shuffle_5_1431_809, shuffle_5_1431_807)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_808, shuffle_5_1431_806, shuffle_5_1431_809, 
> shuffle_5_1431_807)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_808, shuffle_5_1431_809, shuffle_5_1431_807)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_808, shuffle_5_1431_809)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_809)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: Set()
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 21
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 20
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 19
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 18
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 17
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 16
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 15
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 14
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 13
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 12
> 17/05/26 12:04:06 DEBUG 

[jira] [Commented] (SPARK-20882) Executor is waiting for ShuffleBlockFetcherIterator

2017-05-26 Thread cen yuhai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16026040#comment-16026040
 ] 

cen yuhai commented on SPARK-20882:
---

yes.
{code}
17/05/26 12:04:14 INFO TransportResponseHandler: Still have 1 requests 
outstanding when connection from 10.0.139.110:7337 is closed
17/05/26 12:04:14 INFO TransportResponseHandler: Still have 1 requests 
outstanding when connection from 10.0.139.110:7337 is closed
{code}
before the requests in flight 1. the remainingBlocks is empty.
{code}
17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
Set(shuffle_5_1431_809)
17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: Set()
{code}

> Executor is waiting for ShuffleBlockFetcherIterator
> ---
>
> Key: SPARK-20882
> URL: https://issues.apache.org/jira/browse/SPARK-20882
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1
>Reporter: cen yuhai
> Attachments: executor_jstack, executor_log, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png
>
>
> This bug is like https://issues.apache.org/jira/browse/SPARK-19300.
> but I have updated my client netty version to 4.0.43.Final.
> The shuffle service handler is still 4.0.42.Final
> spark.sql.adaptive.enabled is true
> {code}
> "Executor task launch worker for task 4808985" #5373 daemon prio=5 os_prio=0 
> tid=0x7f54ef437000 nid=0x1aed0 waiting on condition [0x7f53aebfe000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> parking to wait for <0x000498c249c0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:58)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
> at org.apache.spark.scheduler.Task.run(Task.scala:114)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:323)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {code}
> {code}
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_805, shuffle_5_1431_808, shuffle_5_1431_806, 
> shuffle_5_1431_809, shuffle_5_1431_807)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_808, shuffle_5_1431_806, shuffle_5_1431_809, 
> shuffle_5_1431_807)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_808, shuffle_5_1431_809, shuffle_5_1431_807)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_808, shuffle_5_1431_809)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: 
> Set(shuffle_5_1431_809)
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: remainingBlocks: Set()
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 21
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 20
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 19
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 18
> 17/05/26 12:04:06 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 17
> 

[jira] [Commented] (SPARK-20882) Executor is waiting for ShuffleBlockFetcherIterator

2017-05-25 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025588#comment-16025588
 ] 

Shixiong Zhu commented on SPARK-20882:
--

[~cenyuhai] Did you see this log "logger.error("Still have {} requests 
outstanding when connection from {} is closed","?

> Executor is waiting for ShuffleBlockFetcherIterator
> ---
>
> Key: SPARK-20882
> URL: https://issues.apache.org/jira/browse/SPARK-20882
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1
>Reporter: cen yuhai
> Attachments: executor_jstack, executor_log, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png
>
>
> This bug is like https://issues.apache.org/jira/browse/SPARK-19300.
> but I have updated my client netty version to 4.0.43.Final.
> The shuffle service handler is still 4.0.42.Final
> spark.sql.adaptive.enabled is true
> {code}
> "Executor task launch worker for task 4808985" #5373 daemon prio=5 os_prio=0 
> tid=0x7f54ef437000 nid=0x1aed0 waiting on condition [0x7f53aebfe000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> parking to wait for <0x000498c249c0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:58)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
> at org.apache.spark.scheduler.Task.run(Task.scala:114)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:323)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {code}
> {code}
> 7/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 3
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 2
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 1
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 1
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 1
> 17/05/26 02:02:03 WARN TransportChannelHandler: Exception in connection from 
> bigdata-apache-hdp-132.xg01/10.0.132.58:7337
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>   at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>   at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
>   at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
>   at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:275)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
>   at 

[jira] [Commented] (SPARK-20882) Executor is waiting for ShuffleBlockFetcherIterator

2017-05-25 Thread cen yuhai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025581#comment-16025581
 ] 

cen yuhai commented on SPARK-20882:
---

I think the problem is when the connection to nodemanger:7337 is closed. It 
will hang forever?
{code}
17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
flight 1
17/05/26 02:02:03 WARN TransportChannelHandler: Exception in connection from 
bigdata-apache-hdp-132.xg01/10.0.132.58:7337
java.io.IOException: Connection reset by peer
{code}

> Executor is waiting for ShuffleBlockFetcherIterator
> ---
>
> Key: SPARK-20882
> URL: https://issues.apache.org/jira/browse/SPARK-20882
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1
>Reporter: cen yuhai
> Attachments: executor_jstack, executor_log, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png
>
>
> This bug is like https://issues.apache.org/jira/browse/SPARK-19300.
> but I have updated my client netty version to 4.0.43.Final.
> The shuffle service handler is still 4.0.42.Final
> spark.sql.adaptive.enabled is true
> {code}
> "Executor task launch worker for task 4808985" #5373 daemon prio=5 os_prio=0 
> tid=0x7f54ef437000 nid=0x1aed0 waiting on condition [0x7f53aebfe000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> parking to wait for <0x000498c249c0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:58)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
> at org.apache.spark.scheduler.Task.run(Task.scala:114)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:323)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {code}
> {code}
> 7/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 3
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 2
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 1
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 1
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 1
> 17/05/26 02:02:03 WARN TransportChannelHandler: Exception in connection from 
> bigdata-apache-hdp-132.xg01/10.0.132.58:7337
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>   at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>   at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
>   at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
>   at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:275)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
>   at 
> 

[jira] [Commented] (SPARK-20882) Executor is waiting for ShuffleBlockFetcherIterator

2017-05-25 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025506#comment-16025506
 ] 

Shixiong Zhu commented on SPARK-20882:
--

[~cenyuhai] Is it possible to reproduce it and get logs of the shuffle service?

> Executor is waiting for ShuffleBlockFetcherIterator
> ---
>
> Key: SPARK-20882
> URL: https://issues.apache.org/jira/browse/SPARK-20882
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1
>Reporter: cen yuhai
> Attachments: executor_jstack, executor_log, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png
>
>
> This bug is like https://issues.apache.org/jira/browse/SPARK-19300.
> but I have updated my client netty version to 4.0.43.Final.
> The shuffle service handler is still 4.0.42.Final
> spark.sql.adaptive.enabled is true
> {code}
> "Executor task launch worker for task 4808985" #5373 daemon prio=5 os_prio=0 
> tid=0x7f54ef437000 nid=0x1aed0 waiting on condition [0x7f53aebfe000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> parking to wait for <0x000498c249c0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:58)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
> at org.apache.spark.scheduler.Task.run(Task.scala:114)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:323)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {code}
> {code}
> 7/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 3
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 2
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 1
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 1
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 1
> 17/05/26 02:02:03 WARN TransportChannelHandler: Exception in connection from 
> bigdata-apache-hdp-132.xg01/10.0.132.58:7337
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>   at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>   at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
>   at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
>   at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:275)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)

[jira] [Commented] (SPARK-20882) Executor is waiting for ShuffleBlockFetcherIterator

2017-05-25 Thread cen yuhai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025496#comment-16025496
 ] 

cen yuhai commented on SPARK-20882:
---

Yes, I am using shuffle service

> Executor is waiting for ShuffleBlockFetcherIterator
> ---
>
> Key: SPARK-20882
> URL: https://issues.apache.org/jira/browse/SPARK-20882
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1
>Reporter: cen yuhai
> Attachments: executor_jstack, executor_log, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png
>
>
> This bug is like https://issues.apache.org/jira/browse/SPARK-19300.
> but I have updated my client netty version to 4.0.43.Final.
> The shuffle service handler is still 4.0.42.Final
> spark.sql.adaptive.enabled is true
> {code}
> "Executor task launch worker for task 4808985" #5373 daemon prio=5 os_prio=0 
> tid=0x7f54ef437000 nid=0x1aed0 waiting on condition [0x7f53aebfe000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> parking to wait for <0x000498c249c0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:58)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
> at org.apache.spark.scheduler.Task.run(Task.scala:114)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:323)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {code}
> {code}
> 7/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 3
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 2
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 1
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 1
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 1
> 17/05/26 02:02:03 WARN TransportChannelHandler: Exception in connection from 
> bigdata-apache-hdp-132.xg01/10.0.132.58:7337
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>   at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>   at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
>   at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
>   at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:275)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
>   at 
> 

[jira] [Commented] (SPARK-20882) Executor is waiting for ShuffleBlockFetcherIterator

2017-05-25 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025377#comment-16025377
 ] 

Shixiong Zhu commented on SPARK-20882:
--

NVM. I saw NettyBlockTransferService in the log.

> Executor is waiting for ShuffleBlockFetcherIterator
> ---
>
> Key: SPARK-20882
> URL: https://issues.apache.org/jira/browse/SPARK-20882
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1
>Reporter: cen yuhai
> Attachments: executor_jstack, executor_log, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png
>
>
> This bug is like https://issues.apache.org/jira/browse/SPARK-19300.
> but I have updated my client netty version to 4.0.43.Final.
> The shuffle service handler is still 4.0.42.Final
> spark.sql.adaptive.enabled is true
> {code}
> "Executor task launch worker for task 4808985" #5373 daemon prio=5 os_prio=0 
> tid=0x7f54ef437000 nid=0x1aed0 waiting on condition [0x7f53aebfe000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> parking to wait for <0x000498c249c0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:58)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
> at org.apache.spark.scheduler.Task.run(Task.scala:114)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:323)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {code}
> {code}
> 7/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 3
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 2
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 1
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 1
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 1
> 17/05/26 02:02:03 WARN TransportChannelHandler: Exception in connection from 
> bigdata-apache-hdp-132.xg01/10.0.132.58:7337
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>   at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>   at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
>   at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
>   at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:275)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
>   at 
> 

[jira] [Commented] (SPARK-20882) Executor is waiting for ShuffleBlockFetcherIterator

2017-05-25 Thread Shixiong Zhu (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16025227#comment-16025227
 ] 

Shixiong Zhu commented on SPARK-20882:
--

[~cenyuhai] are you using shuffle service?

> Executor is waiting for ShuffleBlockFetcherIterator
> ---
>
> Key: SPARK-20882
> URL: https://issues.apache.org/jira/browse/SPARK-20882
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1
>Reporter: cen yuhai
> Attachments: executor_jstack, executor_log, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png
>
>
> This bug is like https://issues.apache.org/jira/browse/SPARK-19300.
> but I have updated my netty version to 4.0.43.Final.
> spark.sql.adaptive.enabled is true
> {code}
> "Executor task launch worker for task 4808985" #5373 daemon prio=5 os_prio=0 
> tid=0x7f54ef437000 nid=0x1aed0 waiting on condition [0x7f53aebfe000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> parking to wait for <0x000498c249c0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:58)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
> at org.apache.spark.scheduler.Task.run(Task.scala:114)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:323)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {code}
> {code}
> 7/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 3
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 2
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 1
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 1
> 17/05/26 02:01:55 DEBUG ShuffleBlockFetcherIterator: Number of requests in 
> flight 1
> 17/05/26 02:02:03 WARN TransportChannelHandler: Exception in connection from 
> bigdata-apache-hdp-132.xg01/10.0.132.58:7337
> java.io.IOException: Connection reset by peer
>   at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
>   at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
>   at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
>   at sun.nio.ch.IOUtil.read(IOUtil.java:192)
>   at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
>   at 
> io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:221)
>   at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:899)
>   at 
> io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:275)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:643)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:566)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:480)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:442)
>   at 
> 

[jira] [Commented] (SPARK-20882) Executor is waiting for ShuffleBlockFetcherIterator

2017-05-25 Thread cen yuhai (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-20882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16024503#comment-16024503
 ] 

cen yuhai commented on SPARK-20882:
---

[~zsxwing]

> Executor is waiting for ShuffleBlockFetcherIterator
> ---
>
> Key: SPARK-20882
> URL: https://issues.apache.org/jira/browse/SPARK-20882
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.1.0, 2.1.1
>Reporter: cen yuhai
> Attachments: executor_jstack, executor_log, screenshot-1.png, 
> screenshot-2.png, screenshot-3.png
>
>
> This bug is like https://issues.apache.org/jira/browse/SPARK-19300.
> but I have updated my netty version to 4.0.43.Final.
> {code}
> "Executor task launch worker for task 4808985" #5373 daemon prio=5 os_prio=0 
> tid=0x7f54ef437000 nid=0x1aed0 waiting on condition [0x7f53aebfe000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> parking to wait for <0x000498c249c0> (a 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at java.util.concurrent.locks.LockSupport.park(LockSupport.java:189)
> at 
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:332)
> at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:58)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
> at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
> at 
> org.apache.spark.util.collection.ExternalSorter.insertAll(ExternalSorter.scala:199)
> at 
> org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:63)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:97)
> at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
> at org.apache.spark.scheduler.Task.run(Task.scala:114)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:323)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1147)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
> at java.lang.Thread.run(Thread.java:834)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org