[ 
https://issues.apache.org/jira/browse/SPARK-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498991#comment-14498991
 ] 

Michael Allman commented on SPARK-6962:
---------------------------------------

[~adav] Which logs would be helpful?
[~pwend...@gmail.com] I've seen this problem occur where a stage is hung waiting 
for multiple tasks from more than one executor to complete. Also, the GC time 
as reported for the blocked tasks is insignificant, or at least nothing odd 
compared to the other tasks.

Additionally, I see no unusual CPU usage or load level. The tasks seem to be 
simply idle, waiting for some never-to-be-received input. Also, I see the same 
thread stack trace as the OP (the thread whose stack includes the line 
"org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:278)").
I think that stack frame can be used to distinguish this hang from others.
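
For anyone who wants to check an executor thread dump for that frame, here is a 
rough sketch in Python. It assumes the usual jstack text layout (a thread header 
line starting with a double-quoted name, stack frames below it, and a blank line 
between threads). The function name is made up for illustration, and only the 
method name is matched, since the line number (278 here) may differ between 
Spark versions.

```python
# Frame reported in this issue. The ":278" line number is left out on purpose
# because it can change between Spark builds (an assumption, not verified).
HANG_FRAME = "org.apache.spark.storage.ShuffleBlockFetcherIterator.next("


def threads_waiting_on_shuffle_fetch(dump_text):
    """Return the names of threads whose stack contains HANG_FRAME.

    Hypothetical helper: assumes jstack-style output where each thread
    starts with a line like '"thread name" daemon prio=...' and threads
    are separated by blank lines.
    """
    matches = []
    current = None  # name of the thread whose stack is being read
    for line in dump_text.splitlines():
        if line.startswith('"'):
            # Thread header: take the text between the first pair of quotes.
            end = line.find('"', 1)
            current = line[1:end] if end > 0 else line[1:]
        elif not line.strip():
            # A blank line ends the current thread's stack.
            current = None
        elif current is not None and HANG_FRAME in line:
            matches.append(current)
            current = None  # report each thread at most once
    return matches
```

Passing the contents of a dump such as the attached jstacks.txt to this function 
would list the threads sitting in that frame. A thread showing up there in a 
single dump is not proof of a hang, so comparing several dumps taken a few 
minutes apart is probably more telling.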

I've also just confirmed with [~rxin] on the mailing list that I'm still seeing 
this problem on branch-1.3 as of 
https://github.com/apache/spark/commit/6d3c4d8b04b2738a821dfcc3df55a5635b89e506.

> Netty BlockTransferService hangs in the middle of SQL query
> -----------------------------------------------------------
>
>                 Key: SPARK-6962
>                 URL: https://issues.apache.org/jira/browse/SPARK-6962
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.2.0, 1.2.1, 1.3.0
>            Reporter: Jon Chase
>         Attachments: jstacks.txt
>
>
> Spark SQL queries (though this seems to be a Spark Core issue - I'm just 
> using queries in the REPL to surface this, so I mention Spark SQL) hang 
> indefinitely under certain (not totally understood) circumstances.  
> This is resolved by setting spark.shuffle.blockTransferService=nio, which 
> seems to point to netty as the issue.  Netty was set as the default for the 
> block transport layer in 1.2.0, which is when this issue started.  Setting 
> the service to nio allows queries to complete normally.
> I do not see this problem when running queries over smaller datasets (~20 
> files of ~5MB each).  When I increase the scope to include more data (several 
> hundred ~5MB files), the queries will get through several steps but eventually 
> hang indefinitely.
> Here's the email chain regarding this issue, including stack traces:
> http://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/<cae61spfqt2y7d5vqzomzz2dmr-jx2c2zggcyky40npkjjx4...@mail.gmail.com>
> For context, here's the announcement regarding the block transfer service 
> change: 
> http://mail-archives.apache.org/mod_mbox/spark-dev/201411.mbox/<cabpqxssl04q+rbltp-d8w+z3atn+g-um6gmdgdnh-hzcvd-...@mail.gmail.com>
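
The workaround mentioned in the description can also be passed on the command 
line rather than set in code. The sketch below, in Python, only builds the 
argument list. It assumes the job is launched through spark-submit (or 
spark-shell) with its --conf PROP=VALUE option; the helper name is hypothetical, 
and the setting is relevant to the affected 1.2.x/1.3.x versions listed above.

```python
# Property and value taken from the issue description.
NIO_WORKAROUND = "spark.shuffle.blockTransferService=nio"


def with_nio_block_transfer(submit_args):
    """Return a copy of a spark-submit argument list with the nio workaround.

    Hypothetical helper: inserts '--conf <property>=nio' right after the
    executable name so that it precedes the application jar and its arguments.
    """
    args = list(submit_args)
    return args[:1] + ["--conf", NIO_WORKAROUND] + args[1:]
```

For example, ["spark-shell"] becomes ["spark-shell", "--conf", 
"spark.shuffle.blockTransferService=nio"], which can then be handed to 
subprocess.run or printed as a command line.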



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
