[jira] [Commented] (SPARK-16643) When doing Shuffle, report "java.io.FileNotFoundException"
    [ https://issues.apache.org/jira/browse/SPARK-16643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16147139#comment-16147139 ]

Sajith Dimal commented on SPARK-16643:
--------------------------------------

We observed this in Spark version 1.6.2 as well; please find the error log below:

TID: [-1] [] [2017-08-01 22:05:16,768] ERROR {org.apache.spark.executor.Executor} - Exception in task 0.0 in stage 6.0 (TID 6) {org.apache.spark.executor.Executor}
java.io.FileNotFoundException: /tmp/blockmgr-d44d050f-8727-4f96-83f5-69e3281d7aa5/39/temp_shuffle_3145a66b-1823-4c82-a7ac-2ac55fd5726e (Stale file handle)
	at java.io.FileOutputStream.open(Native Method)
	at java.io.FileOutputStream.<init>(FileOutputStream.java:206)
	at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:140)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
	at org.apache.spark.scheduler.Task.run(Task.scala:89)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

> When doing Shuffle, report "java.io.FileNotFoundException"
> ----------------------------------------------------------
>
>                 Key: SPARK-16643
>                 URL: https://issues.apache.org/jira/browse/SPARK-16643
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.2
>         Environment: LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
> Distributor ID: CentOS
> Description: CentOS release 6.6 (Final)
> Release: 6.6
> Codename: Final
> java version "1.7.0_10"
> Java(TM) SE Runtime Environment (build 1.7.0_10-b18)
> Java HotSpot(TM) 64-Bit Server VM (build 23.6-b04, mixed mode)
>            Reporter: Deng Changchun
>
> In our Spark cluster running in standalone mode, we execute SQL queries on Spark SQL, for example aggregate queries such as
> "select count(rowKey) from HVRC_B_LOG where 1=1 and RESULTTIME >= 146332800 and RESULTTIME <= 1463414399000".
> At the beginning everything works; however, after about 15 days, executing the aggregate queries reports an error. The log looks like:
> 【Notice: strangely, the error is not reported every time an aggregate query is executed. It appears to be random: after some number of aggregate queries, the error is logged by chance.】
> 2016-07-20 13:48:50,250 ERROR [Executor task launch worker-75] executor.Executor: Managed memory leak detected; size = 8388608 bytes, TID = 624
> 2016-07-20 13:48:50,250 ERROR [Executor task launch worker-75] executor.Executor: Exception in task 0.3 in stage 580.0 (TID 624)
> java.io.FileNotFoundException: /tmp/spark-cb199fce-bb80-4e6f-853f-4d7984bf5f34/executor-fb7c2149-c6c4-4697-ba2f-3b53dcd7f34a/blockmgr-0a9003ad-23b3-4ff5-b76f-6fbc5d71e730/3e/temp_shuffle_ef68b340-85e4-483c-90e8-5e8c8d8ee4ee (No such file or directory)
>       at java.io.FileOutputStream.open(Native Method)
>       at java.io.FileOutputStream.<init>(FileOutputStream.java:212)
>       at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:88)
>       at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.insertAll(BypassMergeSortShuffleWriter.java:110)
>       at org.apache.spark.shuffle.sort.SortShuffleWriter.write(SortShuffleWriter.scala:73)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:73)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>       at org.apache.spark.scheduler.Task.run(Task.scala:88)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>       at java.lang.Thread.run(Thread.java:722)

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
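
Both traces in this thread fail inside java.io.FileOutputStream.open, reached from DiskBlockObjectWriter.open. The thread does not establish a root cause, so the following is only a minimal sketch of one way the same exception type can arise in plain Java: the parent directory of the target file no longer exists when the file is opened. The class name MissingShuffleDirDemo, the helper tryOpen, and the "blockmgr-demo-" directory are hypothetical names for illustration and are not part of Spark.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class MissingShuffleDirDemo {

    // Tries to open the file for writing.
    // Returns null on success, or the exception message if the open fails.
    static String tryOpen(File target) throws IOException {
        try (FileOutputStream out = new FileOutputStream(target)) {
            return null;
        } catch (FileNotFoundException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for a block manager directory tree.
        Path root = Files.createTempDirectory("blockmgr-demo-");
        Path subDir = root.resolve("3e");
        Files.createDirectory(subDir);
        File tempShuffle = subDir.resolve("temp_shuffle_demo").toFile();

        // The parent directory exists, so the open succeeds.
        System.out.println("directory present: " + tryOpen(tempShuffle));

        // Simulate something external removing the directory tree.
        Files.deleteIfExists(tempShuffle.toPath());
        Files.delete(subDir);

        // The parent directory is gone, so the open now fails.
        // The text in parentheses comes from the operating system.
        System.out.println("directory removed: " + tryOpen(tempShuffle));

        Files.delete(root);
    }
}
```

If this pattern matches what a cluster is seeing, checking whether the blockmgr-* directories under the configured local directory still exist on the affected worker is a reasonable first diagnostic step.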
[jira] [Commented] (SPARK-16643) When doing Shuffle, report "java.io.FileNotFoundException"
    [ https://issues.apache.org/jira/browse/SPARK-16643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387272#comment-15387272 ]

Sean Owen commented on SPARK-16643:
-----------------------------------

Could be. There are others related to FileNotFoundException, just had trouble finding them. Yes try a different version. I don't think anyone would look at an issue that's only known to affect 1.5 at this point, as there are no more releases in that line.
[jira] [Commented] (SPARK-16643) When doing Shuffle, report "java.io.FileNotFoundException"
    [ https://issues.apache.org/jira/browse/SPARK-16643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15387012#comment-15387012 ]

Deng Changchun commented on SPARK-16643:
----------------------------------------

Thank you for the response.
I think this problem is entirely different from https://issues.apache.org/jira/browse/SPARK-12240, even though both report FileNotFoundException.

For SPARK-12240, I can solve it by raising ulimit.

In this case, I have already set ulimit to unlimited, and the error is not "too many open files" but "that file or directory doesn't exist". So I don't think they are the same problem.

In any case, I will try a more recent version.
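
The distinction drawn in this comment rests on the reason the operating system appends in parentheses to the FileNotFoundException message. As a rough sketch only, the helper below separates the three reasons that appear in this thread ("Too many open files" from SPARK-12240, "No such file or directory", and "Stale file handle"). The class FnfReason and its Kind enum are hypothetical names, and the matching assumes English OS messages; on a localized system the text will differ and everything falls through to OTHER.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FnfReason {

    enum Kind { TOO_MANY_OPEN_FILES, NO_SUCH_FILE, STALE_HANDLE, OTHER }

    // Captures the last parenthesized group at the end of the message,
    // e.g. "/tmp/x/temp_shuffle_y (Stale file handle)" -> "Stale file handle".
    private static final Pattern TRAILING_REASON =
            Pattern.compile("\\(([^()]*)\\)\\s*$");

    // Assumption: the OS reason is in English. Localized text yields OTHER.
    static Kind classify(String message) {
        if (message == null) {
            return Kind.OTHER;
        }
        Matcher m = TRAILING_REASON.matcher(message);
        if (!m.find()) {
            return Kind.OTHER;
        }
        String reason = m.group(1).toLowerCase(Locale.ROOT);
        if (reason.contains("too many open files")) {
            return Kind.TOO_MANY_OPEN_FILES; // descriptor limit, see ulimit -n
        }
        if (reason.contains("no such file or directory")) {
            return Kind.NO_SUCH_FILE;        // path or parent directory missing
        }
        if (reason.contains("stale")) {
            return Kind.STALE_HANDLE;        // reason seen in the 1.6.2 report
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        System.out.println(classify("/tmp/a/temp_shuffle_1 (Too many open files)"));
        System.out.println(classify("/tmp/a/temp_shuffle_2 (No such file or directory)"));
        System.out.println(classify("/tmp/a/temp_shuffle_3 (Stale file handle)"));
    }
}
```

Only the first kind is addressed by raising the open-file limit, which is consistent with the observation above that setting ulimit did not help here.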
[jira] [Commented] (SPARK-16643) When doing Shuffle, report "java.io.FileNotFoundException"
    [ https://issues.apache.org/jira/browse/SPARK-16643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15385548#comment-15385548 ]

Sean Owen commented on SPARK-16643:
-----------------------------------

This might have been resolved since 1.5.0; can you try with a more recent version? There are similar old issues like https://issues.apache.org/jira/browse/SPARK-12240