[ https://issues.apache.org/jira/browse/SPARK-3687?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ziv Huang updated SPARK-3687:
-----------------------------
    Description: 
In my application, I read more than 100 sequence files into a JavaPairRDD, 
perform a flatMap to get another JavaRDD, and then use takeOrdered to get the 
result.
Quite often (but not always), Spark hangs while executing one of the 
120th-150th tasks.
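
For reference, here is a minimal sketch of that pipeline against the Spark 1.x 
Java API. The input path, the Text/BytesWritable key/value types, the 
whitespace-splitting flatMap, and the top-10 size are placeholders for 
illustration, not the actual application code.

{code:java}
// Minimal sketch of the pipeline described above (Spark 1.x Java API).
// Path, key/value types, flatMap logic, and N=10 are placeholders.
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FlatMapFunction;

import scala.Tuple2;

public class SequenceFileTakeOrdered {
  public static void main(String[] args) {
    JavaSparkContext sc =
        new JavaSparkContext(new SparkConf().setAppName("SPARK-3687 sketch"));

    // Reading 100+ sequence files typically yields one partition (and task) per file.
    JavaPairRDD<Text, BytesWritable> pairs = sc.sequenceFile(
        "hdfs:///path/to/seqfiles", Text.class, BytesWritable.class);

    // flatMap each record to zero or more strings (the real function is application-specific).
    JavaRDD<String> values = pairs.flatMap(
        new FlatMapFunction<Tuple2<Text, BytesWritable>, String>() {
          @Override
          public Iterable<String> call(Tuple2<Text, BytesWritable> kv) {
            return Arrays.asList(kv._2().toString().split("\\s+"));
          }
        });

    // takeOrdered launches the job whose 120th-150th task sometimes hangs.
    List<String> top = values.takeOrdered(10);
    System.out.println(top);

    sc.stop();
  }
}
{code}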

In 1.0.2, the job can hang for several hours, maybe forever (I can't wait for 
its completion).
When the Spark job hangs, I can't kill the job from the web UI.

In 1.1.0, the job hangs for a couple of minutes (a little over 3 minutes, 
actually), and then the web UI of the Spark master shows that the job finished 
with state "FAILED".
In addition, the job's stage web UI still hangs, and the execution duration 
keeps accumulating.

For both 1.0.2 and 1.1.0, the job hangs with no error messages anywhere.

The current workaround is to use coalesce to reduce the number of partitions to 
be processed; a sketch follows.
I have never had a job hang when the number of partitions to be processed is no 
greater than 100.
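
A sketch of that workaround, reusing the "pairs" RDD from the snippet above. 
The cap of 100 simply mirrors the observation above; it is not a documented 
Spark limit.

{code:java}
// Workaround sketch: cap the partition count before flatMap/takeOrdered,
// assuming the same "pairs" JavaPairRDD as in the snippet above.
JavaPairRDD<Text, BytesWritable> capped = pairs.coalesce(100);
// Then run the same flatMap and takeOrdered on "capped" instead of "pairs";
// jobs with no more than 100 tasks have not hung so far.
{code}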

  was:
In my application, I read more than 100 sequence files into a JavaPairRDD, 
perform a flatMap to get another JavaRDD, and then use takeOrdered to get the 
result.
Quite often (but not always), Spark hangs while executing one of the 
110th-130th tasks.
The job can hang for several hours, maybe forever (I can't wait for its 
completion).
When the Spark job hangs, I can't find any error message anywhere, and I can't 
kill the job from the web UI.

The current workaround is to use coalesce to reduce the number of partitions to 
be processed.
I have never had a job hang when the number of partitions to be processed is no 
greater than 80.


> Spark hangs while processing more than 100 sequence files
> ---------------------------------------------------------
>
>                 Key: SPARK-3687
>                 URL: https://issues.apache.org/jira/browse/SPARK-3687
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.2, 1.1.0
>            Reporter: Ziv Huang
>
> In my application, I read more than 100 sequence files into a JavaPairRDD, 
> perform a flatMap to get another JavaRDD, and then use takeOrdered to get the 
> result.
> Quite often (but not always), Spark hangs while executing one of the 
> 120th-150th tasks.
> In 1.0.2, the job can hang for several hours, maybe forever (I can't wait for 
> its completion).
> When the Spark job hangs, I can't kill the job from the web UI.
> In 1.1.0, the job hangs for a couple of minutes (a little over 3 minutes, 
> actually), and then the web UI of the Spark master shows that the job 
> finished with state "FAILED".
> In addition, the job's stage web UI still hangs, and the execution duration 
> keeps accumulating.
> For both 1.0.2 and 1.1.0, the job hangs with no error messages anywhere.
> The current workaround is to use coalesce to reduce the number of partitions 
> to be processed.
> I have never had a job hang when the number of partitions to be processed is 
> no greater than 100.


