[ 
https://issues.apache.org/jira/browse/HUDI-3724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

sivabalan narayanan closed HUDI-3724.
-------------------------------------
    Resolution: Fixed

Not reproducible anymore. Will revisit if we run into issues again.

> Too many open files w/ COW spark long running tests
> ---------------------------------------------------
>
>                 Key: HUDI-3724
>                 URL: https://issues.apache.org/jira/browse/HUDI-3724
>             Project: Apache Hudi
>          Issue Type: Bug
>            Reporter: sivabalan narayanan
>            Assignee: sivabalan narayanan
>            Priority: Blocker
>              Labels: pull-request-available
>             Fix For: 0.11.0
>
>
> We run integration tests against Hudi, and recently our long-running Spark 
> tests have been failing for COW tables with "Too many open files". We may 
> have file-handle leaks that need to be tracked down and fixed. 
> {code:java}
>       ... 6 more
> Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 6808.0 failed 1 times, most recent failure: Lost task 0.0 in stage 6808.0 (TID 109960) (ip-10-0-40-161.us-west-1.compute.internal executor driver): java.io.FileNotFoundException: /tmp/blockmgr-96dd9c25-86c7-4d00-a20a-d6515eef9a37/39/temp_shuffle_9149fce7-e9b0-4fee-bb21-1eba16dd89a3 (Too many open files)
>       at java.io.FileOutputStream.open0(Native Method)
>       at java.io.FileOutputStream.open(FileOutputStream.java:270)
>       at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>       at org.apache.spark.storage.DiskBlockObjectWriter.initialize(DiskBlockObjectWriter.scala:133)
>       at org.apache.spark.storage.DiskBlockObjectWriter.open(DiskBlockObjectWriter.scala:152)
>       at org.apache.spark.storage.DiskBlockObjectWriter.write(DiskBlockObjectWriter.scala:279)
>       at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:171)
>       at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
>       at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
>       at org.apache.spark.scheduler.Task.run(Task.scala:131)
>       at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
>       at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1462)
>       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  {code}
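
One way to chase the suspected leak described above is to log the JVM's open file descriptor count between test rounds: a count that keeps growing across rounds would suggest handles are not being closed. The following is only a minimal sketch, not something taken from the Hudi test suite. The class name `FdProbe` and the method `fdUsage` are hypothetical, and it assumes a Unix-like host where the JVM exposes `com.sun.management.UnixOperatingSystemMXBean`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper (not part of Hudi or Spark): reports how many file
// descriptors the current JVM has open and what its limit is.
class FdProbe {

    /**
     * Returns {openCount, maxCount}, or null when the platform's
     * OperatingSystemMXBean does not expose descriptor counts
     * (for example on Windows).
     */
    static long[] fdUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = fdUsage();
        if (usage == null) {
            System.out.println("File descriptor counts are not available on this JVM/OS");
            return;
        }
        System.out.printf("open=%d max=%d%n", usage[0], usage[1]);
    }
}
```

Calling `FdProbe.fdUsage()` from the driver after each test round and comparing the values over time would show whether the open count trends upward. Note that in the trace above the failing open happens inside `BypassMergeSortShuffleWriter`, so it may also be worth comparing the reported limit against the number of shuffle files open at once before concluding that there is a leak.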



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
