Re: [I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]
codope closed issue #9826: [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running URL: https://github.com/apache/hudi/issues/9826 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
ad1happy2go commented on issue #9826: URL: https://github.com/apache/hudi/issues/9826#issuecomment-1894915016 Closing this issue as 0.14.1 is released. Please reopen in case you see this issue again @zyclove
zyclove commented on issue #9826: URL: https://github.com/apache/hudi/issues/9826#issuecomment-1851189708 We recently verified this against version 0.14 with #10224 applied. It has been running for a week and the problem has not reproduced so far. The fix still needs to be merged into release-0.14.1. @nsivabalan
zyclove commented on issue #9826: URL: https://github.com/apache/hudi/issues/9826#issuecomment-1835889750 > We already fixed this in 0.12.3 #7799 I will look into the daemon thread pool w/ marker handler. But upgrading to 0.12.3 should help you fix the issue you are facing. > > Let us know if the fix may not solve your case. 0.12.3 has worked well in my jobs for a long time.
zyclove commented on issue #9826: URL: https://github.com/apache/hudi/issues/9826#issuecomment-1835885551 @nsivabalan ok, thanks. #10224
nsivabalan commented on issue #9826: URL: https://github.com/apache/hudi/issues/9826#issuecomment-1834853977 @zyclove : for now you can probably patch internally. But we plan to have 0.14.1 in 2 weeks.
nsivabalan commented on issue #9826: URL: https://github.com/apache/hudi/issues/9826#issuecomment-1834851062 While investigating this, we found a related gap, so I have put out a fix: https://github.com/apache/hudi/pull/10224
nsivabalan commented on issue #9826: URL: https://github.com/apache/hudi/issues/9826#issuecomment-1834708610 We already fixed this in 0.12.3 https://github.com/apache/hudi/pull/7799 I will look into the daemon thread pool w/ marker handler. But upgrading to 0.12.3 should help you fix the issue you are facing. Let us know if the fix may not solve your case.
zyclove commented on issue #9826: URL: https://github.com/apache/hudi/issues/9826#issuecomment-1823781426 Hi, this issue occurs frequently; has it been resolved? https://issues.apache.org/jira/browse/HUDI-6980 is still open. When will version 0.14.1 be released? We urgently need to upgrade, for this and other issues. @ad1happy2go @pravin1406
ad1happy2go commented on issue #9826: URL: https://github.com/apache/hudi/issues/9826#issuecomment-1778651895 @pravin1406 Great. Thanks. I encourage you to contribute and raise a PR. Let me know in case you need any help. Created a JIRA - https://issues.apache.org/jira/browse/HUDI-6980
pravin1406 commented on issue #9826: URL: https://github.com/apache/hudi/issues/9826#issuecomment-1770433093 Hi @ad1happy2go Sure, would be happy to help out. These are the thread pools I suspected were running, so I made sure they spawn daemon threads only. [screenshot: https://github.com/apache/hudi/assets/25177655/676b8d2c-ac78-4a22-aacf-efcf4fd5475d] I also updated BaseHoodieWriteClient to check whether all steps completed before exiting; if not, it calls close if the timeline service was created in this client. [screenshot: https://github.com/apache/hudi/assets/25177655/c5422c0d-7aa3-4db4-a59c-66ef091fb1d7]
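The change described above — making the suspect pools "spawn daemon threads only" — can be sketched as a daemon `ThreadFactory`. This is an illustrative standalone example, not the actual Hudi marker-handler code; class and method names are made up for the sketch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

public class DaemonPools {

    // Every thread produced is marked daemon, so a pool that is never
    // shut down can no longer keep the JVM alive after main() returns.
    public static ThreadFactory daemonFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.getAndIncrement());
            t.setDaemon(true);
            return t;
        };
    }

    public static ExecutorService newDaemonFixedPool(int size, String prefix) {
        return Executors.newFixedThreadPool(size, daemonFactory(prefix));
    }

    // Submits a task and reports whether it ran on a daemon thread.
    public static boolean runsOnDaemonThread() throws Exception {
        ExecutorService pool = newDaemonFixedPool(1, "marker-handler");
        try {
            return pool.submit(() -> Thread.currentThread().isDaemon()).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("task ran on daemon thread: " + runsOnDaemonThread());
    }
}
```

Note that daemon threads are killed abruptly at JVM exit, so this is a safety net against hangs, not a substitute for an orderly shutdown of the service.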
ad1happy2go commented on issue #9826: URL: https://github.com/apache/hudi/issues/9826#issuecomment-1768272437 Great! Thanks @pravin1406. Do you mind sharing the changes with the community? Also, are you interested in contributing to Hudi? I can help in case you face any issues. Thanks.
pravin1406 commented on issue #9826: URL: https://github.com/apache/hudi/issues/9826#issuecomment-1767881958 Anyway, I was able to reproduce this using a local setup with MinIO for S3 support, and was able to catch the relevant error in SparkRDDWriteClient and explicitly call close on the timeline server.
danny0405 commented on issue #9826: URL: https://github.com/apache/hudi/issues/9826#issuecomment-1756650395 cc @yihua for taking care of this.
pravin1406 commented on issue #9826: URL: https://github.com/apache/hudi/issues/9826#issuecomment-1752626251 @yihua anything on this?
danny0405 commented on issue #9826: URL: https://github.com/apache/hudi/issues/9826#issuecomment-1751565733 @yihua Any good ideas?
[I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]
pravin1406 opened a new issue, #9826: URL: https://github.com/apache/hudi/issues/9826

Hi,

In one of our jobs writing incremental data (upsert) into a Hudi table, we failed to write the deltacommit file to storage, which caused a HoodieIOException. The Spark context exited afterwards, but the JVM did not die. When we took a thread dump we saw many non-daemon thread pools still running, which would have prevented the JVM from shutting down.

On further debugging, we concluded this was due to the marker handler creating two sets of thread pools in a non-daemon way. When the exception was thrown, the timeline service thread was stopped because it was a daemon (we have made changes to the open-source code for this), but the marker handler threads were left non-daemon. Had the exception been caught properly and `.close` of the timeline service been called in a finally block, this would not have happened.

I've tried to make these thread pools daemon, and also wanted to handle graceful shutdown of the timeline service, but couldn't figure it out. Need your help. Attaching the exception stacktrace and thread dump.

**Environment Description**

* Hudi version : 0.12.2
* Spark version : 3.2.0
* Hive version : 3.1.1
* Hadoop version : 3.1.1
* Storage (HDFS/S3/GCS..) : S3
* Running on Docker? (yes/no) : no
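The "thread dump showed non-daemon pools still running" observation can also be checked programmatically. A minimal diagnostic sketch (plain JDK, not Hudi code) that lists the live non-daemon threads capable of keeping a JVM alive after main() returns:

```java
import java.util.List;
import java.util.stream.Collectors;

public class NonDaemonThreads {

    // Returns the names of all live, non-daemon threads in the JVM.
    // These are exactly the threads that block normal JVM exit.
    public static List<String> nonDaemonThreadNames() {
        return Thread.getAllStackTraces().keySet().stream()
                .filter(t -> t.isAlive() && !t.isDaemon())
                .map(Thread::getName)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // The "main" thread itself is non-daemon, so the list is never empty here.
        System.out.println("non-daemon threads: " + nonDaemonThreadNames());
    }
}
```

Calling this just before the application's exit point (or logging it from a shutdown hook) makes leaked pools like the marker-handler executors show up by name.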
**Stacktrace**

```
Exception in thread "main" org.apache.hudi.exception.HoodieIOException: Failed to create file s3a://intsys-erp/datalake_staging/p_fa_books/.hoodie/metadata/.hoodie/20231002172521154.deltacommit.inflight
	at org.apache.hudi.common.util.FileIOUtils.createFileInPath(FileIOUtils.java:183)
	at org.apache.hudi.common.util.FileIOUtils.createFileInPath(FileIOUtils.java:189)
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:580)
	at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionRequestedToInflight(HoodieActiveTimeline.java:642)
	at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.saveWorkloadProfileMetadataToInflight(BaseCommitActionExecutor.java:148)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:163)
	at org.apache.hudi.table.action.deltacommit.SparkUpsertPreppedDeltaCommitActionExecutor.execute(SparkUpsertPreppedDeltaCommitActionExecutor.java:45)
	at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:107)
	at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:76)
	at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:173)
	at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:166)
	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.processAndCommit(HoodieBackedTableMetadataWriter.java:819)
	at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:886)
	at org.apache.hudi.client.BaseHoodieWriteClient.lambda$writeTableMetadata$52(BaseHoodieWriteClient.java:350)
	at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
	at org.apache.hudi.client.BaseHoodieWriteClient.writeTableMetadata(BaseHoodieWriteClient.java:350)
	at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:282)
	at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:235)
	at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:126)
	at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:701)
	at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:345)
	at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145)
	at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
	at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExec
```

(stacktrace truncated in the original report)
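The fix direction discussed throughout the thread — close the timeline service even when the write throws — boils down to a try/finally pattern. The sketch below uses a stub stand-in for the service (the real class lives in Hudi's timeline-service module); names here are illustrative only.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class CloseOnFailure {

    // Minimal stand-in: owns a non-daemon pool, like the marker-handler
    // executors described in the issue. Not the real Hudi TimelineService.
    static class StubTimelineService implements AutoCloseable {
        private final ExecutorService pool = Executors.newFixedThreadPool(2);
        boolean closed = false;

        @Override
        public void close() {
            pool.shutdownNow();   // without this, the pool's threads pin the JVM open
            closed = true;
        }
    }

    // Even if the write throws (like the HoodieIOException in the stacktrace
    // above), the finally block releases the service's threads.
    public static boolean writeAndAlwaysClose(Runnable write) {
        StubTimelineService service = new StubTimelineService();
        try {
            write.run();
        } catch (RuntimeException e) {
            // surface or log the failure; shutdown still happens below
        } finally {
            service.close();
        }
        return service.closed;
    }

    public static void main(String[] args) {
        boolean closed = writeAndAlwaysClose(() -> {
            throw new RuntimeException("simulated deltacommit write failure");
        });
        System.out.println("service closed after failure: " + closed);
    }
}
```

Since the stub implements `AutoCloseable`, a try-with-resources block would achieve the same guarantee with less ceremony where the service's lifetime matches a single scope.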