[jira] [Commented] (SPARK-6014) java.io.IOException: Filesystem is thrown when ctrl+c or ctrl+d spark-sql on YARN
[ https://issues.apache.org/jira/browse/SPARK-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510190#comment-14510190 ]

Apache Spark commented on SPARK-6014:
-------------------------------------

User 'nishkamravi2' has created a pull request for this issue:
https://github.com/apache/spark/pull/5672

                Key: SPARK-6014
                URL: https://issues.apache.org/jira/browse/SPARK-6014
            Project: Spark
         Issue Type: Bug
         Components: YARN
   Affects Versions: 1.3.0
        Environment: Hadoop 2.4, YARN
           Reporter: Cheolsoo Park
           Assignee: Marcelo Vanzin
           Priority: Minor
             Labels: yarn
            Fix For: 1.4.0

This is a regression of SPARK-2261. In branch-1.3 and master, {{EventLoggingListener}} throws {{java.io.IOException: Filesystem closed}} when the spark-sql shell is exited with ctrl+c or ctrl+d. The root cause is that the DFSClient has already been shut down by the time EventLoggingListener invokes the following HDFS methods, so the DFSClient.isClientRunning() check fails:

{code}
Line #135: hadoopDataStream.foreach(hadoopFlushMethod.invoke(_))
Line #187: if (fileSystem.exists(target)) {
{code}

The full stack traces are:

{code}
java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:135)
	at org.apache.spark.scheduler.EventLoggingListener$$anonfun$logEvent$3.apply(EventLoggingListener.scala:135)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.scheduler.EventLoggingListener.logEvent(EventLoggingListener.scala:135)
	at org.apache.spark.scheduler.EventLoggingListener.onApplicationEnd(EventLoggingListener.scala:170)
	at org.apache.spark.scheduler.SparkListenerBus$class.onPostEvent(SparkListenerBus.scala:54)
	at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
	at org.apache.spark.scheduler.LiveListenerBus.onPostEvent(LiveListenerBus.scala:31)
	at org.apache.spark.util.ListenerBus$class.postToAll(ListenerBus.scala:53)
	at org.apache.spark.util.AsynchronousListenerBus.postToAll(AsynchronousListenerBus.scala:36)
	at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply$mcV$sp(AsynchronousListenerBus.scala:76)
	at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
	at org.apache.spark.util.AsynchronousListenerBus$$anon$1$$anonfun$run$1.apply(AsynchronousListenerBus.scala:61)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1613)
	at org.apache.spark.util.AsynchronousListenerBus$$anon$1.run(AsynchronousListenerBus.scala:60)
Caused by: java.io.IOException: Filesystem closed
	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:707)
	at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:1843)
	at org.apache.hadoop.hdfs.DFSOutputStream.hflush(DFSOutputStream.java:1804)
	at org.apache.hadoop.fs.FSDataOutputStream.hflush(FSDataOutputStream.java:127)
	... 19 more
{code}

{code}
Exception in thread Thread-3 java.io.IOException: Filesystem closed
	at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:707)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1760)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1124)
	at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1120)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1120)
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
	at org.apache.spark.scheduler.EventLoggingListener.stop(EventLoggingListener.scala:187)
	at org.apache.spark.SparkContext$$anonfun$stop$4.apply(SparkContext.scala:1379)
	at org.apache.spark.SparkContext$$anonfun$stop$4.apply(SparkContext.scala:1379)
	at scala.Option.foreach(Option.scala:236)
	at org.apache.spark.SparkContext.stop(SparkContext.scala:1379)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLEnv$.stop(SparkSQLEnv.scala:66)
	at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$$anon$1.run(SparkSQLCLIDriver.scala:107)
{code}
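The failure mode in these traces can be sketched with a minimal, self-contained model. {{FakeDfsClient}} below is a hypothetical stand-in for HDFS's DFSClient, not the real class: once the shutdown path has flipped the client's running flag, any later flush from the event-logging path surfaces as {{java.io.IOException: Filesystem closed}}, which is what DFSClient.checkOpen() does when DFSClient.isClientRunning() is false.

```java
import java.io.IOException;

// Hypothetical stand-in for HDFS's DFSClient.
class FakeDfsClient {
    private volatile boolean clientRunning = true;

    // Models DFSClient.checkOpen(): throws once the client is shut down.
    void checkOpen() throws IOException {
        if (!clientRunning) {
            throw new IOException("Filesystem closed");
        }
    }

    // Models DFSOutputStream.hflush(): flushing requires an open client.
    void hflush() throws IOException {
        checkOpen();
    }

    // Models the HDFS shutdown hook closing the client.
    void close() {
        clientRunning = false;
    }
}

public class FilesystemClosedRace {
    public static void main(String[] args) {
        FakeDfsClient client = new FakeDfsClient();
        client.close();      // the HDFS shutdown hook wins the race
        try {
            client.hflush(); // EventLoggingListener's flush loses it
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints "Filesystem closed"
        }
    }
}
```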
[jira] [Commented] (SPARK-6014) java.io.IOException: Filesystem is thrown when ctrl+c or ctrl+d spark-sql on YARN
[ https://issues.apache.org/jira/browse/SPARK-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500412#comment-14500412 ]

Apache Spark commented on SPARK-6014:
-------------------------------------

User 'vanzin' has created a pull request for this issue:
https://github.com/apache/spark/pull/5560
[jira] [Commented] (SPARK-6014) java.io.IOException: Filesystem is thrown when ctrl+c or ctrl+d spark-sql on YARN
[ https://issues.apache.org/jira/browse/SPARK-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342077#comment-14342077 ]

Sean Owen commented on SPARK-6014:
----------------------------------

Although a fix is possible for Hadoop 2.2+, it is not clear there is any way to avoid a race with HDFS's shutdown hook before that. It would be moderately painful to solve this with reflection, and probably not worth it. This can be resolved with the approach in the PR above for 2.2+.
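The Hadoop 2.2+ approach Sean Owen refers to relies on running shutdown hooks in priority order (Hadoop's org.apache.hadoop.util.ShutdownHookManager), so that a flush-and-close hook registered at a higher priority than the FileSystem-closing hook runs first. The toy {{PriorityHookManager}} below is a hypothetical sketch of that ordering, not Hadoop's actual class; it exposes {{runAll()}} directly so the ordering is observable outside JVM shutdown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of priority-ordered shutdown hooks.
class PriorityHookManager {
    private static final class Hook {
        final int priority;
        final Runnable body;
        Hook(int priority, Runnable body) {
            this.priority = priority;
            this.body = body;
        }
    }

    private final List<Hook> hooks = new ArrayList<>();

    void addShutdownHook(int priority, Runnable body) {
        hooks.add(new Hook(priority, body));
    }

    // Runs hooks in descending priority, like Hadoop's ShutdownHookManager.
    void runAll() {
        hooks.stream()
             .sorted(Comparator.comparingInt((Hook h) -> h.priority).reversed())
             .forEach(h -> h.body.run());
    }
}

public class HookOrderDemo {
    public static void main(String[] args) {
        List<String> order = new ArrayList<>();
        PriorityHookManager mgr = new PriorityHookManager();
        // Low priority: the hook that closes cached FileSystems.
        mgr.addShutdownHook(10, () -> order.add("close filesystems"));
        // Higher priority: flush and close the event log first.
        mgr.addShutdownHook(50, () -> order.add("flush event log"));
        mgr.runAll();
        System.out.println(order); // [flush event log, close filesystems]
    }
}
```

With plain Runtime.addShutdownHook there is no such ordering guarantee, which is why the pre-2.2 race is hard to avoid without reflection.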
[jira] [Commented] (SPARK-6014) java.io.IOException: Filesystem is thrown when ctrl+c or ctrl+d spark-sql on YARN
[ https://issues.apache.org/jira/browse/SPARK-6014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337055#comment-14337055 ]

Apache Spark commented on SPARK-6014:
-------------------------------------

User 'piaozhexiu' has created a pull request for this issue:
https://github.com/apache/spark/pull/4771