[ https://issues.apache.org/jira/browse/SPARK-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500406#comment-14500406 ]
Jon Chase commented on SPARK-6962:
----------------------------------

I'm tailing the executor logs before/as this is happening and I don't see anything out of the ordinary (errors, etc.). Here's what the logs look like when the lockup occurs (again, not seeing anything out of the ordinary). I tailed all executors' logs, and all of them look similar to this:

==> /mnt/var/log/hadoop/yarn-hadoop-nodemanager-ip-XX-XX-XX-XXX.eu-west-1.compute.internal.log <==
2015-04-17 18:27:58,206 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl (Container Monitor): Memory usage of ProcessTree 11216 for container-id container_1429189930421_0012_01_000002: 6.7 GB of 10 GB physical memory used; 11.3 GB of 50 GB virtual memory used
2015-04-17 18:28:01,214 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl (Container Monitor): Memory usage of ProcessTree 11216 for container-id container_1429189930421_0012_01_000002: 6.7 GB of 10 GB physical memory used; 11.3 GB of 50 GB virtual memory used
2015-04-17 18:28:04,221 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl (Container Monitor): Memory usage of ProcessTree 11216 for container-id container_1429189930421_0012_01_000002: 6.7 GB of 10 GB physical memory used; 11.3 GB of 50 GB virtual memory used
2015-04-17 18:28:07,229 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl (Container Monitor): Memory usage of ProcessTree 11216 for container-id container_1429189930421_0012_01_000002: 6.7 GB of 10 GB physical memory used; 11.3 GB of 50 GB virtual memory used

> Netty BlockTransferService hangs in the middle of SQL query
> -----------------------------------------------------------
>
> Key: SPARK-6962
> URL: https://issues.apache.org/jira/browse/SPARK-6962
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.2.0, 1.2.1, 1.3.0
> Reporter: Jon Chase
> Attachments: jstacks.txt
>
> Spark SQL queries (though this seems to be a Spark Core issue - I'm just using queries in the REPL to surface this, so I mention Spark SQL) hang indefinitely under certain (not totally understood) circumstances.
> This is resolved by setting spark.shuffle.blockTransferService=nio, which seems to point to netty as the issue. Netty was set as the default for the block transport layer in 1.2.0, which is when this issue started. Setting the service to nio allows queries to complete normally.
> I do not see this problem when running queries over smaller (~20 5MB files) datasets. When I increase the scope to include more data (several hundred ~5MB files), the queries will get through several steps but eventually hang indefinitely.
> Here's the email chain regarding this issue, including stack traces:
> http://mail-archives.apache.org/mod_mbox/spark-user/201503.mbox/<cae61spfqt2y7d5vqzomzz2dmr-jx2c2zggcyky40npkjjx4...@mail.gmail.com>
> For context, here's the announcement regarding the block transfer service change:
> http://mail-archives.apache.org/mod_mbox/spark-dev/201411.mbox/<cabpqxssl04q+rbltp-d8w+z3atn+g-um6gmdgdnh-hzcvd-...@mail.gmail.com>

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
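For readers hitting the same hang, the workaround the description mentions can be applied at launch time. A minimal sketch, assuming the queries are run from spark-shell (the same --conf form works with spark-submit); the property name and value are taken from the report, and note this nio service only exists in the Spark 1.2–1.5 line:

```shell
# Workaround sketch from the report: fall back to the NIO block transfer
# service instead of the Netty default that became standard in Spark 1.2.0.
# spark-shell is shown as an example launcher (assumption, not from the report).
spark-shell --conf spark.shuffle.blockTransferService=nio
```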