[jira] [Commented] (SPARK-3687) Spark hang while processing more than 100 sequence files

2015-02-25 Thread Ziv Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337785#comment-14337785
 ] 

Ziv Huang commented on SPARK-3687:
--

I've already resigned from my job (not due to this issue), and hence I don't have
an environment to reproduce it or test it on Spark 1.3.0.
I'm sorry about this, and I do hope someone can take over and trace it for us.




 Spark hang while processing more than 100 sequence files
 

 Key: SPARK-3687
 URL: https://issues.apache.org/jira/browse/SPARK-3687
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.0
Reporter: Ziv Huang

 In my application, I read more than 100 sequence files into a JavaPairRDD, 
 perform a flatMap to get another JavaRDD, and then use takeOrdered to get the 
 result.
 It is quite often (but not always) that Spark hangs while executing some of 
 the 120th-150th tasks.
 In 1.0.2, the job can hang for several hours, maybe forever (I can't wait for 
 its completion).
 When the Spark job hangs, I can't kill the job from the web UI.
 In 1.1.0, the job hangs for a couple of minutes (a bit over 3, actually), and 
 then the web UI of the Spark master shows that the job finished with state 
 FAILED.
 In addition, the job stage web UI still hangs, and the execution duration 
 keeps accumulating.
 For both 1.0.2 and 1.1.0, the job hangs with no error messages anywhere.
 The current workaround is to use coalesce to reduce the number of partitions 
 to be processed.
 I never get a job hanged if the number of partitions to be processed is no 
 greater than 100.
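
For readers trying to reproduce this, the following is a minimal sketch of the
kind of job described above, against the Spark 1.x Java API. The input path,
the Text key/value types, and the split-into-tokens transform are illustrative
assumptions, not the reporter's actual code; the coalesce call shows the
workaround mentioned at the end.

    import java.util.Arrays;
    import java.util.List;
    import org.apache.hadoop.io.Text;
    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.FlatMapFunction;
    import scala.Tuple2;

    public class SequenceFileJob {
      public static void main(String[] args) {
        JavaSparkContext sc =
            new JavaSparkContext(new SparkConf().setAppName("seqfile-repro"));

        // Read a directory holding 100+ sequence files; each file yields at
        // least one partition. Path and key/value types are assumptions.
        JavaPairRDD<Text, Text> pairs =
            sc.sequenceFile("hdfs:///data/seqfiles", Text.class, Text.class);

        // flatMap each record to zero or more strings (the reporter's actual
        // transform is unknown; splitting the value on whitespace stands in).
        JavaRDD<String> values = pairs.flatMap(
            new FlatMapFunction<Tuple2<Text, Text>, String>() {
              public Iterable<String> call(Tuple2<Text, Text> record) {
                return Arrays.asList(record._2().toString().split("\\s+"));
              }
            });

        // Reported workaround: coalesce below ~100 partitions before the
        // action; with coalesce(80) the reporter never saw the hang.
        List<String> top = values.coalesce(80).takeOrdered(10);
        System.out.println(top);

        sc.stop();
      }
    }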






[jira] [Commented] (SPARK-3687) Spark hang while processing more than 100 sequence files

2014-10-02 Thread Ziv Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156236#comment-14156236
 ] 

Ziv Huang commented on SPARK-3687:
--

The following is the jstack dump of one CoarseGrainedExecutorBackend when the job hangs (the Spark version is 1.1.0):

Attach Listener daemon prio=10 tid=0x7fded0001000 nid=0x7836 waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

Hashed wheel timer #1 daemon prio=10 tid=0x7fde9c001000 nid=0x7811 waiting on condition [0x7fdf26a84000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at org.jboss.netty.util.HashedWheelTimer$Worker.waitForNextTick(HashedWheelTimer.java:503)
at org.jboss.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:401)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at java.lang.Thread.run(Thread.java:745)

New I/O server boss #6 daemon prio=10 tid=0x7fdeb4084000 nid=0x7810 runnable [0x7fdf26b85000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked 0x0007db53acc0 (a sun.nio.ch.Util$2)
- locked 0x0007db53acb0 (a java.util.Collections$UnmodifiableSet)
- locked 0x0007db53ab98 (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:102)
at org.jboss.netty.channel.socket.nio.NioServerBoss.select(NioServerBoss.java:163)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:206)
at org.jboss.netty.channel.socket.nio.NioServerBoss.run(NioServerBoss.java:42)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

New I/O worker #5 daemon prio=10 tid=0x7fdeb4037000 nid=0x780f runnable [0x7fdf26c86000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked 0x0007db529f98 (a sun.nio.ch.Util$2)
- locked 0x0007db529f88 (a java.util.Collections$UnmodifiableSet)
- locked 0x0007db529e70 (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:64)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:409)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:206)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

New I/O worker #4 daemon prio=10 tid=0x7fdeb4032800 nid=0x780e runnable [0x7fdf26d87000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
- locked 0x0007db528610 (a sun.nio.ch.Util$2)
- locked 0x0007db528600 (a java.util.Collections$UnmodifiableSet)
- locked 0x0007db5284e8 (a sun.nio.ch.EPollSelectorImpl)
at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98)
at org.jboss.netty.channel.socket.nio.SelectorUtil.select(SelectorUtil.java:64)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.select(AbstractNioSelector.java:409)

[jira] [Commented] (SPARK-3687) Spark hang while processing more than 100 sequence files

2014-09-30 Thread Ziv Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14152985#comment-14152985
 ] 

Ziv Huang commented on SPARK-3687:
--

I ran jps on the worker node while the job hangs.
I see two processes: Worker and CoarseGrainedExecutorBackend.
Do you mean I should print the stack of the CoarseGrainedExecutorBackend process?
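
For reference, capturing that dump comes down to finding the executor's PID
with jps and passing it to jstack; the PID and output file below are
illustrative:

    jps | grep CoarseGrainedExecutorBackend
    jstack 12345 > executor-stack.txt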

 Spark hang while processing more than 100 sequence files
 

 Key: SPARK-3687
 URL: https://issues.apache.org/jira/browse/SPARK-3687
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.0
Reporter: Ziv Huang

 In my application, I read more than 100 sequence files into a JavaPairRDD, 
 perform a flatMap to get another JavaRDD, and then use takeOrdered to get the 
 result.
 It is quite often (but not always) that Spark hangs while executing some of 
 the 120th-150th tasks.
 In 1.0.2, the job can hang for several hours, maybe forever (I can't wait for 
 its completion).
 When the Spark job hangs, I can't kill the job from the web UI.
 In 1.1.0, the job hangs for a couple of minutes (a bit over 3, actually), and 
 then the web UI of the Spark master shows that the job finished with state 
 FAILED.
 In addition, the job stage web UI still hangs, and the execution duration 
 keeps accumulating.
 For both 1.0.2 and 1.1.0, the job hangs with no error messages anywhere.
 The current workaround is to use coalesce to reduce the number of partitions 
 to be processed.
 I never get a job hanged if the number of partitions to be processed is no 
 greater than 100.






[jira] [Commented] (SPARK-3687) Spark hang while processing more than 100 sequence files

2014-09-25 Thread Ziv Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147504#comment-14147504
 ] 

Ziv Huang commented on SPARK-3687:
--

The following is the jstack dump of one executor when it hangs:

File appending thread for /opt/spark-1.1.0-bin-hadoop2.4/work/app-20140925150845-0007/2/stderr daemon prio=10 tid=0x7ffe0c002800 nid=0x18a3 runnable [0x7ffebc402000]
   java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:272)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked 0xfaeee1d0 (a java.lang.UNIXProcess$ProcessPipeInputStream)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)

File appending thread for /opt/spark-1.1.0-bin-hadoop2.4/work/app-20140925150845-0007/2/stdout daemon prio=10 tid=0x7ffe0c004000 nid=0x18a2 runnable [0x7ffebc503000]
   java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:272)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked 0xfaeec108 (a java.lang.UNIXProcess$ProcessPipeInputStream)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)

process reaper daemon prio=10 tid=0x7ffe0c001000 nid=0x1868 runnable [0x7ffecc0c7000]
   java.lang.Thread.State: RUNNABLE
at java.lang.UNIXProcess.waitForProcessExit(Native Method)
at java.lang.UNIXProcess.access$500(UNIXProcess.java:54)
at java.lang.UNIXProcess$4.run(UNIXProcess.java:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

ExecutorRunner for app-20140925150845-0007/2 daemon prio=10 tid=0x7ffe7011b800 nid=0x1866 in Object.wait() [0x7ffebc705000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 0xfaee9df8 (a java.lang.UNIXProcess)
at java.lang.Object.wait(Object.java:503)
at java.lang.UNIXProcess.waitFor(UNIXProcess.java:263)
- locked 0xfaee9df8 (a java.lang.UNIXProcess)
at org.apache.spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:164)
at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:63)

Attach Listener daemon prio=10 tid=0x7ffe84001000 nid=0x170f waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

sparkWorker-akka.actor.default-dispatcher-16 daemon prio=10 tid=0x7ffe68214800 nid=0x13a3 waiting on condition [0x7ffebc806000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0xfd614a78 (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
at scala.concurrent.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

sparkWorker-akka.actor.default-dispatcher-15 daemon prio=10 tid=0x7ffe7011e000 nid=0x13a2 waiting on condition [0x7ffebc604000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)

[jira] [Commented] (SPARK-3687) Spark hang while processing more than 100 sequence files

2014-09-25 Thread Ziv Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147513#comment-14147513
 ] 

Ziv Huang commented on SPARK-3687:
--

Just a few minutes ago I ran a job twice, processing 203 sequence files.
Both times I saw the job hang with different behavior from before: 
1. the web UI of the Spark master shows that the job finished with state 
FAILED after about 3 minutes
2. the job stage web UI still hangs, and the execution duration keeps 
accumulating.
Hope this information helps debugging :)

 Spark hang while processing more than 100 sequence files
 

 Key: SPARK-3687
 URL: https://issues.apache.org/jira/browse/SPARK-3687
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.0
Reporter: Ziv Huang

 In my application, I read more than 100 sequence files into a JavaPairRDD, 
 perform a flatMap to get another JavaRDD, and then use takeOrdered to get the 
 result.
 It is quite often (but not always) that Spark hangs while executing some of 
 the 110th-130th tasks.
 The job can hang for several hours, maybe forever (I can't wait for its 
 completion).
 When the Spark job hangs, I can't find any error message anywhere, and I 
 can't kill the job from the web UI.
 The current workaround is to use coalesce to reduce the number of partitions 
 to be processed.
 I never get a job hanged if the number of partitions to be processed is no 
 greater than 80.






[jira] [Comment Edited] (SPARK-3687) Spark hang while processing more than 100 sequence files

2014-09-25 Thread Ziv Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147513#comment-14147513
 ] 

Ziv Huang edited comment on SPARK-3687 at 9/25/14 8:36 AM:
---

Just a few minutes ago I ran a job twice, processing 203 sequence files.
Both times I saw the job hang with different behavior than before: 
1. the web UI of the Spark master shows that the job finished with state 
FAILED after about 3 minutes
2. the job stage web UI still hangs, and the execution duration keeps 
accumulating.
Hope this information helps debugging :)


was (Author: taqilabon):
Just a few minutes ago I ran a job twice, processing 203 sequence files.
Both times I saw the job hang with different behavior from before: 
1. the web UI of the Spark master shows that the job finished with state 
FAILED after about 3 minutes
2. the job stage web UI still hangs, and the execution duration keeps 
accumulating.
Hope this information helps debugging :)

 Spark hang while processing more than 100 sequence files
 

 Key: SPARK-3687
 URL: https://issues.apache.org/jira/browse/SPARK-3687
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.0
Reporter: Ziv Huang

 In my application, I read more than 100 sequence files into a JavaPairRDD, 
 perform a flatMap to get another JavaRDD, and then use takeOrdered to get the 
 result.
 It is quite often (but not always) that Spark hangs while executing some of 
 the 110th-130th tasks.
 The job can hang for several hours, maybe forever (I can't wait for its 
 completion).
 When the Spark job hangs, I can't find any error message anywhere, and I 
 can't kill the job from the web UI.
 The current workaround is to use coalesce to reduce the number of partitions 
 to be processed.
 I never get a job hanged if the number of partitions to be processed is no 
 greater than 80.






[jira] [Issue Comment Deleted] (SPARK-3687) Spark hang while processing more than 100 sequence files

2014-09-25 Thread Ziv Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziv Huang updated SPARK-3687:
-
Comment: was deleted

(was: Just a few minutes ago I ran a job twice, processing 203 sequence files.
Both times I saw the job hang with different behavior than before: 
1. the web UI of the Spark master shows that the job finished with state 
FAILED after about 3 minutes
2. the job stage web UI still hangs, and the execution duration keeps 
accumulating.
Hope this information helps debugging :))

 Spark hang while processing more than 100 sequence files
 

 Key: SPARK-3687
 URL: https://issues.apache.org/jira/browse/SPARK-3687
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.0
Reporter: Ziv Huang

 In my application, I read more than 100 sequence files into a JavaPairRDD, 
 perform a flatMap to get another JavaRDD, and then use takeOrdered to get the 
 result.
 It is quite often (but not always) that Spark hangs while executing some of 
 the 120th-150th tasks.
 In 1.0.2, the job can hang for several hours, maybe forever (I can't wait for 
 its completion).
 When the Spark job hangs, I can't kill the job from the web UI.
 In 1.1.0, the job hangs for a couple of minutes (a bit over 3, actually), and 
 then the web UI of the Spark master shows that the job finished with state 
 FAILED.
 In addition, the job stage web UI still hangs, and the execution duration 
 keeps accumulating.
 For both 1.0.2 and 1.1.0, the job hangs with no error messages anywhere.
 The current workaround is to use coalesce to reduce the number of partitions 
 to be processed.
 I never get a job hanged if the number of partitions to be processed is no 
 greater than 100.






[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files

2014-09-25 Thread Ziv Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziv Huang updated SPARK-3687:
-
Description: 
In my application, I read more than 100 sequence files into a JavaPairRDD, 
perform a flatMap to get another JavaRDD, and then use takeOrdered to get the 
result.
It is quite often (but not always) that Spark hangs while executing some of 
the 120th-150th tasks.

In 1.0.2, the job can hang for several hours, maybe forever (I can't wait for 
its completion).
When the Spark job hangs, I can't kill the job from the web UI.

In 1.1.0, the job hangs for a couple of minutes (a bit over 3, actually), and 
then the web UI of the Spark master shows that the job finished with state 
FAILED.
In addition, the job stage web UI still hangs, and the execution duration 
keeps accumulating.

For both 1.0.2 and 1.1.0, the job hangs with no error messages anywhere.

The current workaround is to use coalesce to reduce the number of partitions 
to be processed.
I never get a job hanged if the number of partitions to be processed is no 
greater than 100.

  was:
In my application, I read more than 100 sequence files into a JavaPairRDD, 
perform a flatMap to get another JavaRDD, and then use takeOrdered to get the 
result.
It is quite often (but not always) that Spark hangs while executing some of 
the 110th-130th tasks.
The job can hang for several hours, maybe forever (I can't wait for its 
completion).
When the Spark job hangs, I can't find any error message anywhere, and I 
can't kill the job from the web UI.

The current workaround is to use coalesce to reduce the number of partitions 
to be processed.
I never get a job hanged if the number of partitions to be processed is no 
greater than 80.


 Spark hang while processing more than 100 sequence files
 

 Key: SPARK-3687
 URL: https://issues.apache.org/jira/browse/SPARK-3687
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.0
Reporter: Ziv Huang

 In my application, I read more than 100 sequence files into a JavaPairRDD, 
 perform a flatMap to get another JavaRDD, and then use takeOrdered to get the 
 result.
 It is quite often (but not always) that Spark hangs while executing some of 
 the 120th-150th tasks.
 In 1.0.2, the job can hang for several hours, maybe forever (I can't wait for 
 its completion).
 When the Spark job hangs, I can't kill the job from the web UI.
 In 1.1.0, the job hangs for a couple of minutes (a bit over 3, actually), and 
 then the web UI of the Spark master shows that the job finished with state 
 FAILED.
 In addition, the job stage web UI still hangs, and the execution duration 
 keeps accumulating.
 For both 1.0.2 and 1.1.0, the job hangs with no error messages anywhere.
 The current workaround is to use coalesce to reduce the number of partitions 
 to be processed.
 I never get a job hanged if the number of partitions to be processed is no 
 greater than 100.






[jira] [Comment Edited] (SPARK-3687) Spark hang while processing more than 100 sequence files

2014-09-25 Thread Ziv Huang (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14147504#comment-14147504
 ] 

Ziv Huang edited comment on SPARK-3687 at 9/25/14 3:09 PM:
---

The following is the jstack dump of one executor when it hangs (the Spark version is 1.1.0):

File appending thread for /opt/spark-1.1.0-bin-hadoop2.4/work/app-20140925150845-0007/2/stderr daemon prio=10 tid=0x7ffe0c002800 nid=0x18a3 runnable [0x7ffebc402000]
   java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:272)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked 0xfaeee1d0 (a java.lang.UNIXProcess$ProcessPipeInputStream)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)

File appending thread for /opt/spark-1.1.0-bin-hadoop2.4/work/app-20140925150845-0007/2/stdout daemon prio=10 tid=0x7ffe0c004000 nid=0x18a2 runnable [0x7ffebc503000]
   java.lang.Thread.State: RUNNABLE
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:272)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
- locked 0xfaeec108 (a java.lang.UNIXProcess$ProcessPipeInputStream)
at java.io.FilterInputStream.read(FilterInputStream.java:107)
at org.apache.spark.util.logging.FileAppender.appendStreamToFile(FileAppender.scala:70)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply$mcV$sp(FileAppender.scala:39)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.logging.FileAppender$$anon$1$$anonfun$run$1.apply(FileAppender.scala:39)
at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1311)
at org.apache.spark.util.logging.FileAppender$$anon$1.run(FileAppender.scala:38)

process reaper daemon prio=10 tid=0x7ffe0c001000 nid=0x1868 runnable [0x7ffecc0c7000]
   java.lang.Thread.State: RUNNABLE
at java.lang.UNIXProcess.waitForProcessExit(Native Method)
at java.lang.UNIXProcess.access$500(UNIXProcess.java:54)
at java.lang.UNIXProcess$4.run(UNIXProcess.java:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

ExecutorRunner for app-20140925150845-0007/2 daemon prio=10 tid=0x7ffe7011b800 nid=0x1866 in Object.wait() [0x7ffebc705000]
   java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on 0xfaee9df8 (a java.lang.UNIXProcess)
at java.lang.Object.wait(Object.java:503)
at java.lang.UNIXProcess.waitFor(UNIXProcess.java:263)
- locked 0xfaee9df8 (a java.lang.UNIXProcess)
at org.apache.spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:164)
at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:63)

Attach Listener daemon prio=10 tid=0x7ffe84001000 nid=0x170f waiting on condition [0x]
   java.lang.Thread.State: RUNNABLE

sparkWorker-akka.actor.default-dispatcher-16 daemon prio=10 tid=0x7ffe68214800 nid=0x13a3 waiting on condition [0x7ffebc806000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  0xfd614a78 (a akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinPool)
at scala.concurrent.forkjoin.ForkJoinPool.scan(ForkJoinPool.java:2075)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

sparkWorker-akka.actor.default-dispatcher-15 daemon prio=10 tid=0x7ffe7011e000 nid=0x13a2 waiting on condition [0x7ffebc604000]
   java.lang.Thread.State: WAITING (parking)

[jira] [Created] (SPARK-3687) Spark hang while

2014-09-24 Thread Ziv Huang (JIRA)
Ziv Huang created SPARK-3687:


 Summary: Spark hang while 
 Key: SPARK-3687
 URL: https://issues.apache.org/jira/browse/SPARK-3687
 Project: Spark
  Issue Type: Bug
Reporter: Ziv Huang









[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files

2014-09-24 Thread Ziv Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziv Huang updated SPARK-3687:
-
Summary: Spark hang while processing more than 100 sequence files  (was: 
Spark hang while )

 Spark hang while processing more than 100 sequence files
 

 Key: SPARK-3687
 URL: https://issues.apache.org/jira/browse/SPARK-3687
 Project: Spark
  Issue Type: Bug
Reporter: Ziv Huang








[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files

2014-09-24 Thread Ziv Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziv Huang updated SPARK-3687:
-
Affects Version/s: 1.0.2
   1.1.0

 Spark hang while processing more than 100 sequence files
 

 Key: SPARK-3687
 URL: https://issues.apache.org/jira/browse/SPARK-3687
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.0
Reporter: Ziv Huang








[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files

2014-09-24 Thread Ziv Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziv Huang updated SPARK-3687:
-
Component/s: Spark Core

 Spark hang while processing more than 100 sequence files
 

 Key: SPARK-3687
 URL: https://issues.apache.org/jira/browse/SPARK-3687
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.0
Reporter: Ziv Huang








[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files

2014-09-24 Thread Ziv Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziv Huang updated SPARK-3687:
-
Description: I use spark 

 Spark hang while processing more than 100 sequence files
 

 Key: SPARK-3687
 URL: https://issues.apache.org/jira/browse/SPARK-3687
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.0
Reporter: Ziv Huang

 I use spark 






[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files

2014-09-24 Thread Ziv Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziv Huang updated SPARK-3687:
-
Description: In my application, I read more than 100 sequence files to a 
JavaPairRDD, perform flatmap to get another JavaRDD, and then use takeOrdered  
(was: In my application, I read more than 100 sequence files, )

 Spark hang while processing more than 100 sequence files
 

 Key: SPARK-3687
 URL: https://issues.apache.org/jira/browse/SPARK-3687
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.0
Reporter: Ziv Huang

 In my application, I read more than 100 sequence files to a JavaPairRDD, 
 perform flatmap to get another JavaRDD, and then use takeOrdered






[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files

2014-09-24 Thread Ziv Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziv Huang updated SPARK-3687:
-
Description: 
In my application, I read more than 100 sequence files into a JavaPairRDD, 
perform a flatMap to get another JavaRDD, and then use takeOrdered to get the 
result.
It is quite often (but not always) that Spark hangs while executing some of 
the 110th-130th tasks.
The job can hang for several hours, maybe forever (I can't wait for its 
completion).
When the Spark job hangs, I can't find any error message anywhere, and I 
can't kill the job from the web UI.

The current workaround is to use coalesce to reduce the number of partitions 
to be processed.
I never get job hanged if the number of partitions to be processed is no 
greater than 80.

  was:In my application, I read more than 100 sequence files to a JavaPairRDD, 
perform flatmap to get another JavaRDD, and then use takeOrdered


 Spark hang while processing more than 100 sequence files
 

 Key: SPARK-3687
 URL: https://issues.apache.org/jira/browse/SPARK-3687
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.0
Reporter: Ziv Huang

 In my application, I read more than 100 sequence files into a JavaPairRDD, 
 perform a flatMap to get another JavaRDD, and then use takeOrdered to get the 
 result.
 It is quite often (but not always) that Spark hangs while executing some of 
 the 110th-130th tasks.
 The job can hang for several hours, maybe forever (I can't wait for its 
 completion).
 When the Spark job hangs, I can't find any error message anywhere, and I 
 can't kill the job from the web UI.
 The current workaround is to use coalesce to reduce the number of partitions 
 to be processed.
 I never get job hanged if the number of partitions to be processed is no 
 greater than 80.






[jira] [Updated] (SPARK-3687) Spark hang while processing more than 100 sequence files

2014-09-24 Thread Ziv Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ziv Huang updated SPARK-3687:
-
Description: 
In my application, I read more than 100 sequence files into a JavaPairRDD, 
perform a flatMap to get another JavaRDD, and then use takeOrdered to get the 
result.
It is quite often (but not always) that Spark hangs while executing some of 
the 110th-130th tasks.
The job can hang for several hours, maybe forever (I can't wait for its 
completion).
When the Spark job hangs, I can't find any error message anywhere, and I 
can't kill the job from the web UI.

The current workaround is to use coalesce to reduce the number of partitions 
to be processed.
I never get a job hanged if the number of partitions to be processed is no 
greater than 80.

  was:
In my application, I read more than 100 sequence files into a JavaPairRDD, 
perform a flatMap to get another JavaRDD, and then use takeOrdered to get the 
result.
It is quite often (but not always) that Spark hangs while executing some of 
the 110th-130th tasks.
The job can hang for several hours, maybe forever (I can't wait for its 
completion).
When the Spark job hangs, I can't find any error message anywhere, and I 
can't kill the job from the web UI.

The current workaround is to use coalesce to reduce the number of partitions 
to be processed.
I never get job hanged if the number of partitions to be processed is no 
greater than 80.


 Spark hang while processing more than 100 sequence files
 

 Key: SPARK-3687
 URL: https://issues.apache.org/jira/browse/SPARK-3687
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.2, 1.1.0
Reporter: Ziv Huang

 In my application, I read more than 100 sequence files into a JavaPairRDD, 
 perform a flatMap to get another JavaRDD, and then use takeOrdered to get the 
 result.
 It is quite often (but not always) that Spark hangs while executing some of 
 the 110th-130th tasks.
 The job can hang for several hours, maybe forever (I can't wait for its 
 completion).
 When the Spark job hangs, I can't find any error message anywhere, and I 
 can't kill the job from the web UI.
 The current workaround is to use coalesce to reduce the number of partitions 
 to be processed.
 I never get a job hanged if the number of partitions to be processed is no 
 greater than 80.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org