[jira] [Commented] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337785#comment-14337785 ]

Ziv Huang commented on SPARK-3687:
----------------------------------

I've already resigned from my job (not because of this issue), so I no longer have the environment to reproduce the problem or test it on Spark 1.3.0. I'm sorry about this, and I hope someone can take over and trace it for us.

Spark hang while processing more than 100 sequence files
--------------------------------------------------------

                Key: SPARK-3687
                URL: https://issues.apache.org/jira/browse/SPARK-3687
            Project: Spark
         Issue Type: Bug
         Components: Spark Core
   Affects Versions: 1.0.2, 1.1.0
           Reporter: Ziv Huang

In my application, I read more than 100 sequence files into a JavaPairRDD, perform a flatMap to get another JavaRDD, and then use takeOrdered to get the result.

It is quite often (but not always) that Spark hangs while executing some of the 120th-150th tasks.

In 1.0.2, the job can hang for several hours, maybe forever (I can't wait for its completion). When the Spark job hangs, I can't kill it from the web UI.

In 1.1.0, the job hangs for a couple of minutes (3.x minutes, actually), and then the Spark master web UI shows that the job finished with state FAILED. In addition, the job stage web UI still hangs, and the execution duration is still accumulating.

For both 1.0.2 and 1.1.0, the job hangs with no error messages anywhere.

The current workaround is to use coalesce to reduce the number of partitions to be processed. I never see a job hang if the number of partitions to be processed is no greater than 100.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
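(Editor's note: a rough sketch of the pipeline described in the report, with the coalesce workaround applied. Class name, paths, master URL, and the flatMap body are hypothetical; this assumes the Spark 1.x Java API and a Java 8 lambda-friendly build. It is not the reporter's actual code.)

```java
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SequenceFileJob {
    public static void main(String[] args) {
        // Master URL and app name are illustrative.
        JavaSparkContext sc = new JavaSparkContext("spark://master:7077", "seqfile-job");

        // Reading 100+ sequence files; each input split becomes a partition,
        // so this typically yields well over 100 partitions.
        JavaPairRDD<Text, Text> pairs =
            sc.sequenceFile("hdfs:///data/seqfiles/*", Text.class, Text.class);

        // flatMap to another RDD; the real transformation is application-specific,
        // tokenizing the value is just a placeholder.
        JavaRDD<String> records =
            pairs.flatMap(kv -> Arrays.asList(kv._2().toString().split("\\s+")));

        // The workaround from the report: cap the partition count at 100 or fewer,
        // since the hang was only observed with more than 100 partitions.
        JavaRDD<String> coalesced = records.coalesce(100);

        List<String> top = coalesced.takeOrdered(10);
        top.forEach(System.out::println);
        sc.stop();
    }
}
```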
[jira] [Commented] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156236#comment-14156236 ]

Ziv Huang commented on SPARK-3687:
----------------------------------

The following is the jstack dump of one CoarseGrainedExecutorBackend when the job hangs (the spark version is 1.1.0):

Attach Listener daemon prio=10 tid=0x7fded0001000 nid=0x7836 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

Hashed wheel timer #1 daemon prio=10 tid=0x7fde9c001000 nid=0x7811 waiting on condition [0x7fdf26a84000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
        at java.lang.Thread.sleep(Native Method)
        at org.jboss.netty.util.HashedWheelTimer$Worker.waitForNextTick(HashedWheelTimer.java:503)
        at org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:401)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at java.lang.Thread.run(Thread.java:745)

New I/O server boss #6 daemon prio=10 tid=0x7fdeb4084000 nid=0x7810 runnable [0x7fdf26b85000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked 0x0007db53acc0 (a sun.nio.ch.Util$2)
        - locked 0x0007db53acb0 (a java.util.Collections$UnmodifiableSet)
        - locked 0x0007db53ab98 (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:102)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.select(NioServerBoss.java:163)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:206)
        at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

New I/O worker #5 daemon prio=10 tid=0x7fdeb4037000 nid=0x780f runnable [0x7fdf26c86000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked 0x0007db529f98 (a sun.nio.ch.Util$2)
        - locked 0x0007db529f88 (a java.util.Collections$UnmodifiableSet)
        - locked 0x0007db529e70 (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:64)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:409)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:206)
        at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
        at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
        at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

New I/O worker #4 daemon prio=10 tid=0x7fdeb4032800 nid=0x780e runnable [0x7fdf26d87000]
   java.lang.Thread.State: RUNNABLE
        at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
        at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
        at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
        at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
        - locked 0x0007db528610 (a sun.nio.ch.Util$2)
        - locked 0x0007db528600 (a java.util.Collections$UnmodifiableSet)
        - locked 0x0007db5284e8 (a sun.nio.ch.EPollSelectorImpl)
        at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
        at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:64)
        at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:409)
[jira] [Commented] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152985#comment-14152985 ]

Ziv Huang commented on SPARK-3687:
----------------------------------

I ran jps on a worker node when it hung. I see two processes: Worker and CoarseGrainedExecutorBackend. Do you mean I should print the stack of CoarseGrainedExecutorBackend?
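(Editor's note: for readers following along, capturing the executor's stack can be done roughly as below. This assumes a JDK is installed on the worker node, since jps and jstack ship with the JDK; the output filename is illustrative.)

```shell
#!/bin/sh
# Find the executor JVM on the worker node and dump its stack.
# Run as the same user that owns the CoarseGrainedExecutorBackend process.
pid=$(jps | awk '/CoarseGrainedExecutorBackend/ {print $1; exit}')

if [ -n "$pid" ]; then
    # Write a timestamped stack dump so repeated snapshots can be compared.
    jstack "$pid" > "executor-stack-$(date +%s).txt"
else
    echo "no CoarseGrainedExecutorBackend process found" >&2
fi
```

Taking two or three dumps a minute apart helps distinguish a genuine deadlock from threads that are merely slow.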
[jira] [Commented] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147504#comment-14147504 ]

Ziv Huang commented on SPARK-3687:
----------------------------------

The following is the jstack dump of one executor when it hangs:

File appending thread for /opt/spark-1.1.0-bin-hadoop2.4/work/app-20140925150845-0007/2/stderr daemon prio=10 tid=0x7ffe0c002800 nid=0x18a3 runnable [0x7ffebc402000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:272)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked 0xfaeee1d0 (a java.lang.UNIXProcess$ProcessPipeInputStream)
        at java.io.FilterInputStream.read(FilterInputStream.java:107)
        at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
        at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)

File appending thread for /opt/spark-1.1.0-bin-hadoop2.4/work/app-20140925150845-0007/2/stdout daemon prio=10 tid=0x7ffe0c004000 nid=0x18a2 runnable [0x7ffebc503000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:272)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked 0xfaeec108 (a java.lang.UNIXProcess$ProcessPipeInputStream)
        at java.io.FilterInputStream.read(FilterInputStream.java:107)
        at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
        at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)

process reaper daemon prio=10 tid=0x7ffe0c001000 nid=0x1868 runnable [0x7ffecc0c7000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.UNIXProcess.waitForProcessExit(Native Method)
        at java.lang.UNIXProcess.access$500(UNIXProcess.java:54)
        at java.lang.UNIXProcess$4.run(UNIXProcess.java:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

ExecutorRunner for app-20140925150845-0007/2 daemon prio=10 tid=0x7ffe7011b800 nid=0x1866 in Object.wait() [0x7ffebc705000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on 0xfaee9df8 (a java.lang.UNIXProcess)
        at java.lang.Object.wait(Object.java:503)
        at java.lang.UNIXProcess.waitFor(UNIXProcess.java:263)
        - locked 0xfaee9df8 (a java.lang.UNIXProcess)
        at org.apache.spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:164)
        at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:63)

Attach Listener daemon prio=10 tid=0x7ffe84001000 nid=0x170f waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

sparkWorker-akka.actor.default-dispatcher-16 daemon prio=10 tid=0x7ffe68214800 nid=0x13a3 waiting on condition [0x7ffebc806000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for 0xfd614a78 (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
        at scala.concurrent.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

sparkWorker-akka.actor.default-dispatcher-15 daemon prio=10 tid=0x7ffe7011e000 nid=0x13a2 waiting on condition [0x7ffebc604000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
[jira] [Commented] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147513#comment-14147513 ]

Ziv Huang commented on SPARK-3687:
----------------------------------

Just a few minutes ago I ran a job twice, processing 203 sequence files. Both times I saw the job hang with different behavior from before:
1. the Spark master web UI shows that the job finished with state FAILED after 3.x minutes
2. the job stage web UI still hangs, and the execution duration is still accumulating.
Hope this information helps debugging :)
[jira] [Comment Edited] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147513#comment-14147513 ]

Ziv Huang edited comment on SPARK-3687 at 9/25/14 8:36 AM:
-----------------------------------------------------------

Just a few minutes ago I ran a job twice, processing 203 sequence files. Both times I saw the job hang with different behavior than before:
1. the Spark master web UI shows that the job finished with state FAILED after 3.x minutes
2. the job stage web UI still hangs, and the execution duration is still accumulating.
Hope this information helps debugging :)

was (Author: taqilabon):
Just a few mins ago I ran a job twice, processing 203 sequence files. Both times I saw the job hanging with different behavior from before: 1. the web UI of spark master shows that the job is finished with state failed after 3.x mins 2. the job stage web UI still hangs, and execution duration time is still accumulating. Hope this information helps debugging :)
[jira] [Issue Comment Deleted] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ziv Huang updated SPARK-3687:
-----------------------------
    Comment: was deleted

(was: Just a few mins ago I ran a job twice, processing 203 sequence files. Both times I saw the job hanging with different behavior than before: 1. the web UI of spark master shows that the job is finished with state failed after 3.x mins 2. the job stage web UI still hangs, and execution duration time is still accumulating. Hope this information helps debugging :))
[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ziv Huang updated SPARK-3687:
-----------------------------
    Description:

In my application, I read more than 100 sequence files to a JavaPairRDD, perform flatmap to get another JavaRDD, and then use takeOrdered to get the result.
It is quite often (but not always) that the spark hangs while the executing some of 120th-150th tasks.
In 1.0.2, the job can hang for several hours, maybe forever (I can't wait for its completion). When the spark job hangs, I can't kill the job from web UI.
In 1.1.0, the job hangs for couple mins (3.x mins actually), and then web UI of spark master shows that the job is finished with state FAILED. In addition, the job stage web UI still hangs, and execution duration time is still accumulating.
For both 1.0.2 and 1.1.0, the job hangs with no error messages in anywhere.
The current workaround is to use coalesce to reduce the number of partitions to be processed. I never get a job hanged if the number of partitions to be processed is no greater than 100.

  was:

In my application, I read more than 100 sequence files to a JavaPairRDD, perform flatmap to get another JavaRDD, and then use takeOrdered to get the result.
It is quite often (but not always) that the spark hangs while the executing some of 110th-130th tasks.
The job can hang for several hours, maybe forever (I can't wait for its completion). When the spark job hangs, I can't find any error message in anywhere, and I can't kill the job from web UI.
The current workaround is to use coalesce to reduce the number of partitions to be processed. I never get a job hanged if the number of partitions to be processed is no greater than 80.
[jira] [Comment Edited] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147504#comment-14147504 ]

Ziv Huang edited comment on SPARK-3687 at 9/25/14 3:09 PM:
-----------------------------------------------------------

The following is the jstack dump of one executor when it hangs (the spark version is 1.1.0):

File appending thread for /opt/spark-1.1.0-bin-hadoop2.4/work/app-20140925150845-0007/2/stderr daemon prio=10 tid=0x7ffe0c002800 nid=0x18a3 runnable [0x7ffebc402000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:272)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked 0xfaeee1d0 (a java.lang.UNIXProcess$ProcessPipeInputStream)
        at java.io.FilterInputStream.read(FilterInputStream.java:107)
        at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
        at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)

File appending thread for /opt/spark-1.1.0-bin-hadoop2.4/work/app-20140925150845-0007/2/stdout daemon prio=10 tid=0x7ffe0c004000 nid=0x18a2 runnable [0x7ffebc503000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileInputStream.readBytes(Native Method)
        at java.io.FileInputStream.read(FileInputStream.java:272)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked 0xfaeec108 (a java.lang.UNIXProcess$ProcessPipeInputStream)
        at java.io.FilterInputStream.read(FilterInputStream.java:107)
        at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
        at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
        at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)

process reaper daemon prio=10 tid=0x7ffe0c001000 nid=0x1868 runnable [0x7ffecc0c7000]
   java.lang.Thread.State: RUNNABLE
        at java.lang.UNIXProcess.waitForProcessExit(Native Method)
        at java.lang.UNIXProcess.access$500(UNIXProcess.java:54)
        at java.lang.UNIXProcess$4.run(UNIXProcess.java:227)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

ExecutorRunner for app-20140925150845-0007/2 daemon prio=10 tid=0x7ffe7011b800 nid=0x1866 in Object.wait() [0x7ffebc705000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on 0xfaee9df8 (a java.lang.UNIXProcess)
        at java.lang.Object.wait(Object.java:503)
        at java.lang.UNIXProcess.waitFor(UNIXProcess.java:263)
        - locked 0xfaee9df8 (a java.lang.UNIXProcess)
        at org.apache.spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:164)
        at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:63)

Attach Listener daemon prio=10 tid=0x7ffe84001000 nid=0x170f waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

sparkWorker-akka.actor.default-dispatcher-16 daemon prio=10 tid=0x7ffe68214800 nid=0x13a3 waiting on condition [0x7ffebc806000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for 0xfd614a78 (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
        at scala.concurrent.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

sparkWorker-akka.actor.default-dispatcher-15 daemon prio=10 tid=0x7ffe7011e000 nid=0x13a2 waiting on condition [0x7ffebc604000]
[jira] [Created] (SPARK-3687) Spark hang while
Ziv Huang created SPARK-3687:
--------------------------------

             Summary: Spark hang while
                 Key: SPARK-3687
                 URL: https://issues.apache.org/jira/browse/SPARK-3687
             Project: Spark
          Issue Type: Bug
            Reporter: Ziv Huang
[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ziv Huang updated SPARK-3687:
-----------------------------
    Summary: Spark hang while processing more than 100 sequence files  (was: Spark hang while )
[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ziv Huang updated SPARK-3687:
-----------------------------
    Affects Version/s: 1.0.2
                       1.1.0
[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ziv Huang updated SPARK-3687:
-----------------------------
    Component/s: Spark Core
[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ziv Huang updated SPARK-3687:
-----------------------------
    Description: I use spark
[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ziv Huang updated SPARK-3687:
-----------------------------
    Description: In my application, I read more than 100 sequence files to a JavaPairRDD, perform flatmap to get another JavaRDD, and then use takeOrdered  (was: In my application, I read more than 100 sequence files, )
[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ziv Huang updated SPARK-3687:
-----------------------------
    Description:

In my application, I read more than 100 sequence files to a JavaPairRDD, perform flatmap to get another JavaRDD, and then use takeOrdered to get the result.
It is quite often (but not always) that the spark hangs while the executing some of 110th-130th tasks.
The job can hang for several hours, maybe forever (I can't wait for its completion). When the spark job hangs, I can't find any error message in anywhere, and I can't kill the job from web UI.
The current workaround is to use coalesce to reduce the number of partitions to be processed. I never get job hanged if the number of partitions to be processed is no greater than 80.

  was: In my application, I read more than 100 sequence files to a JavaPairRDD, perform flatmap to get another JavaRDD, and then use takeOrdered
[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files
[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ziv Huang updated SPARK-3687:
-----------------------------
    Description:

In my application, I read more than 100 sequence files to a JavaPairRDD, perform flatmap to get another JavaRDD, and then use takeOrdered to get the result.
It is quite often (but not always) that the spark hangs while the executing some of 110th-130th tasks.
The job can hang for several hours, maybe forever (I can't wait for its completion). When the spark job hangs, I can't find any error message in anywhere, and I can't kill the job from web UI.
The current workaround is to use coalesce to reduce the number of partitions to be processed. I never get a job hanged if the number of partitions to be processed is no greater than 80.

  was:

In my application, I read more than 100 sequence files to a JavaPairRDD, perform flatmap to get another JavaRDD, and then use takeOrdered to get the result.
It is quite often (but not always) that the spark hangs while the executing some of 110th-130th tasks.
The job can hang for several hours, maybe forever (I can't wait for its completion). When the spark job hangs, I can't find any error message in anywhere, and I can't kill the job from web UI.
The current workaround is to use coalesce to reduce the number of partitions to be processed. I never get job hanged if the number of partitions to be processed is no greater than 80.