StanZhai created SPARK-19532:
--------------------------------

             Summary: [Core]`DataStreamer for file` threads of DFSOutputStream 
leak if set `spark.speculation` to true
                 Key: SPARK-19532
                 URL: https://issues.apache.org/jira/browse/SPARK-19532
             Project: Spark
          Issue Type: Bug
          Components: Spark Core, SQL
    Affects Versions: 2.1.0
            Reporter: StanZhai
            Priority: Blocker


When `spark.speculation` is set to true, I found on the Executor thread dump 
page of the WebUI about 1300 threads named "DataStreamer for file 
/test/data/test_temp/_temporary/0/_temporary/attempt_20170207172435_80750_m_000069_1/part-00069-690407af-0900-46b1-9590-a6d6c696fe68.snappy.parquet"
in TIMED_WAITING state.

{code}
java.lang.Object.wait(Native Method)
org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:564)
{code}

Off-heap memory usage keeps growing until the Executor exits with an OOM exception. 

This problem occurs only when writing data to Hadoop (tasks may be killed by 
the Executor while they are writing).
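
To track the leak over time rather than reading the thread dump page by hand, the thread count can be sampled from inside the JVM. The following is a minimal sketch, not part of Spark or Hadoop: the class name `DataStreamerThreadCount` and the method `countThreads` are hypothetical, and it only sees threads of the JVM it runs in, so it would have to be invoked inside the Executor process (how to do that is left open here).

```java
public class DataStreamerThreadCount {
    // Name prefix taken from the thread names reported in this issue.
    static final String PREFIX = "DataStreamer for file";

    /** Counts live threads in the current JVM whose name starts with the given prefix. */
    static long countThreads(String prefix) {
        // Thread.getAllStackTraces() returns a map keyed by every live thread.
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.getName().startsWith(prefix))
                .count();
    }

    public static void main(String[] args) {
        // Hypothetical usage: print the current count once.
        System.out.println(PREFIX + " threads: " + countThreads(PREFIX));
    }
}
```

If the count keeps rising across speculative task attempts and never drops after the tasks are killed, that would be consistent with the leak described above.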



