[ https://issues.apache.org/jira/browse/SPARK-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500406#comment-14500406 ]
Jon Chase commented on SPARK-6962:
----------------------------------

I'm tailing the executor logs before/as this is happening and I don't see anything out of the ordinary (errors, etc.). Here's what the logs look like when the lockup occurs (again, not seeing anything out of the ordinary). I tailed all executors' logs, and all of them look similar to this:

==> /mnt/var/log/hadoop/yarn-hadoop-nodemanager-ip-XX-XX-XX-XXX.eu-west-1.compute.internal.log <==
2015-04-17 18:27:58,206 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl (Container Monitor): Memory usage of ProcessTree 11216 for container-id container_1429189930421_0012_01_000002: 6.7 GB of 10 GB physical memory used; 11.3 GB of 50 GB virtual memory used
2015-04-17 18:28:01,214 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl (Container Monitor): Memory usage of ProcessTree 11216 for container-id container_1429189930421_0012_01_000002: 6.7 GB of 10 GB physical memory used; 11.3 GB of 50 GB virtual memory used
2015-04-17 18:28:04,221 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl (Container Monitor): Memory usage of ProcessTree 11216 for container-id container_1429189930421_0012_01_000002: 6.7 GB of 10 GB physical memory used; 11.3 GB of 50 GB virtual memory used
2015-04-17 18:28:07,229 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl (Container Monitor): Memory usage of ProcessTree 11216 for container-id container_1429189930421_0012_01_000002: 6.7 GB of 10 GB physical memory used; 11.3 GB of 50 GB virtual memory used

> Netty BlockTransferService hangs in the middle of SQL query
> -----------------------------------------------------------
>
> Key: SPARK-6962
> URL: https://issues.apache.org/jira/browse/SPARK-6962
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.2.0, 1.2.1, 1.3.0
> Reporter: Jon Chase
> Attachments: jstacks.txt
>
> Spark SQL queries (though this seems to be a Spark Core issue - I'm just using queries in the REPL to surface this, so I mention Spark SQL) hang indefinitely under certain (not totally understood) circumstances.
> This is resolved by setting spark.shuffle.blockTransferService=nio, which seems to point to netty as the issue. Netty was set as the default for the block transport layer in 1.2.0, which is when this issue started. Setting the service to nio allows queries to complete normally.
> I do not see this problem when running queries over smaller (~20 5MB files) datasets. When I increase the scope to include more data (several hundred ~5MB files), the queries will get through several steps but eventually hang indefinitely.
> Here's the email chain regarding this issue, including stack traces:
> http://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/<cae61spfqt2y7d5vqzomzz2dmr-jx2c2zggcyky40npkjjx4...@mail.gmail.com>
> For context, here's the announcement regarding the block transfer service change:
> http://mail-archives.apache.org/mod_mbox/spark-dev/201411.mbox/<cabpqxssl04q+rbltp-d8w+z3atn+g-um6gmdgdnh-hzcvd-...@mail.gmail.com>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
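For readers hitting the same hang, the workaround the description mentions can be applied at launch time. A minimal sketch, assuming the queries are run from spark-shell (the same --conf form works with spark-submit); the property name and value are taken from the report, and note this nio service only exists in the Spark 1.2–1.5 line:

```shell
# Workaround sketch from the report: fall back to the NIO block transfer
# service instead of the Netty default that became standard in Spark 1.2.0.
# spark-shell is shown as an example launcher (assumption, not from the report).
spark-shell --conf spark.shuffle.blockTransferService=nio
```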