Re: [I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]

2024-01-16 Thread via GitHub


codope closed issue #9826: [SUPPORT] Spark job stuck after completion, due to 
some non daemon threads still running
URL: https://github.com/apache/hudi/issues/9826





Re: [I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]

2024-01-16 Thread via GitHub


ad1happy2go commented on issue #9826:
URL: https://github.com/apache/hudi/issues/9826#issuecomment-1894915016

   Closing this issue as 0.14.1 is released. Please reopen in case you see this issue again @zyclove 





Re: [I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]

2023-12-11 Thread via GitHub


zyclove commented on issue #9826:
URL: https://github.com/apache/hudi/issues/9826#issuecomment-1851189708

   This problem was recently verified on version 0.14 with #10224 applied. It has been running for a week and the problem has not reproduced so far. The fix still needs to be merged into release-0.14.1.
   @nsivabalan 





Re: [I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]

2023-12-01 Thread via GitHub


zyclove commented on issue #9826:
URL: https://github.com/apache/hudi/issues/9826#issuecomment-1835889750

   > We already fixed this in 0.12.3 #7799 I will look into the daemon thread 
pool w/ marker handler. But upgrading to 0.12.3 should help you fix the issue 
you are facing.
   > 
   > Let us know if the fix may not solve your case.
   
   0.12.3 has been working well in my jobs for a long time.





Re: [I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]

2023-12-01 Thread via GitHub


zyclove commented on issue #9826:
URL: https://github.com/apache/hudi/issues/9826#issuecomment-1835885551

   @nsivabalan ok, thanks. 
   
   #10224 





Re: [I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]

2023-11-30 Thread via GitHub


nsivabalan commented on issue #9826:
URL: https://github.com/apache/hudi/issues/9826#issuecomment-1834853977

   @zyclove: for now you can probably patch it internally, but we plan to have 0.14.1 out in 2 weeks.





Re: [I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]

2023-11-30 Thread via GitHub


nsivabalan commented on issue #9826:
URL: https://github.com/apache/hudi/issues/9826#issuecomment-1834851062

   While investigating this, I found a related gap, so I have put out a fix: https://github.com/apache/hudi/pull/10224
   





Re: [I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]

2023-11-30 Thread via GitHub


nsivabalan commented on issue #9826:
URL: https://github.com/apache/hudi/issues/9826#issuecomment-1834708610

   We already fixed this in 0.12.3: https://github.com/apache/hudi/pull/7799
   I will look into the daemon thread pool with the marker handler, but upgrading to 0.12.3 should help fix the issue you are facing.
   
   Let us know if the fix does not solve your case.
   





Re: [I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]

2023-11-22 Thread via GitHub


zyclove commented on issue #9826:
URL: https://github.com/apache/hudi/issues/9826#issuecomment-1823781426

   Hi, this issue occurs frequently; has it been resolved? https://issues.apache.org/jira/browse/HUDI-6980 is still open.
   When will version 0.14.1 be released? There is an urgent need to upgrade, for this and other issues.
   
   @ad1happy2go @pravin1406 





Re: [I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]

2023-10-25 Thread via GitHub


ad1happy2go commented on issue #9826:
URL: https://github.com/apache/hudi/issues/9826#issuecomment-1778651895

   @pravin1406 Great, thanks. I encourage you to contribute and raise a PR; let me know in case you need any help.
   
   Created a JIRA: https://issues.apache.org/jira/browse/HUDI-6980





Re: [I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]

2023-10-19 Thread via GitHub


pravin1406 commented on issue #9826:
URL: https://github.com/apache/hudi/issues/9826#issuecomment-1770433093

   Hi @ad1happy2go, sure, I would be happy to help out.
   
   These are the thread pools I suspected were still running, so I made sure they spawn daemon threads only.
   
   https://github.com/apache/hudi/assets/25177655/676b8d2c-ac78-4a22-aacf-efcf4fd5475d
   
   I also updated BaseHoodieWriteClient to check whether all steps have completed before exiting; if not, it calls close when the timeline service was created by this client.
   
   https://github.com/apache/hudi/assets/25177655/c5422c0d-7aa3-4db4-a59c-66ef091fb1d7
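   
   For readers following the thread without the screenshots, here is a minimal, self-contained sketch of the first change described above: building the pools from a thread factory that only produces daemon threads, so they can never keep the JVM alive after the driver finishes. The class, method, and thread names are illustrative stand-ins, not the actual Hudi marker-handler code.
   
```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DaemonPoolSketch {

  // A JVM exits once only daemon threads remain, so pools built from this
  // factory can never block shutdown the way the non-daemon pools did.
  static ThreadFactory daemonFactory(String prefix) {
    AtomicInteger counter = new AtomicInteger();
    return runnable -> {
      Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
      t.setDaemon(true);
      return t;
    };
  }

  public static void main(String[] args) throws InterruptedException {
    // Hypothetical stand-in for the marker-handler pools discussed above.
    ExecutorService pool =
        Executors.newFixedThreadPool(2, daemonFactory("marker-handler-sketch"));
    pool.submit(() -> System.out.println("work runs on a daemon thread"));
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS); // wait for the demo task
  }
}
```
   
   Making the threads daemon only guarantees the JVM can exit; the second change described above, closing the timeline service from the write client that created it, is still needed to release its resources promptly.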
   
   





Re: [I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]

2023-10-18 Thread via GitHub


ad1happy2go commented on issue #9826:
URL: https://github.com/apache/hudi/issues/9826#issuecomment-1768272437

   Great! Thanks @pravin1406. Do you mind sharing the changes with the community?
   
   Also, are you interested in contributing to Hudi? I can help in case you face any issues. Thanks.





Re: [I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]

2023-10-18 Thread via GitHub


pravin1406 commented on issue #9826:
URL: https://github.com/apache/hudi/issues/9826#issuecomment-1767881958

   Anyway, I was able to reproduce this using a local setup with MinIO for S3 support. I was also able to catch the relevant error in SparkRDDWriteClient and explicitly call close on the timeline server.
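   
   For anyone who wants to reproduce it the same way, a rough sketch of that kind of local setup, assuming MinIO is running locally and hadoop-aws plus the AWS SDK are on the classpath; the endpoint, credentials, and app name below are placeholders, not values from this issue.
   
```java
import org.apache.spark.sql.SparkSession;

public class MinioReproSketch {
  public static void main(String[] args) {
    // Local Spark session whose s3a:// scheme talks to a MinIO server on
    // localhost instead of AWS S3. All values are placeholders.
    SparkSession spark = SparkSession.builder()
        .appName("hudi-minio-repro-sketch")
        .master("local[*]")
        .config("spark.hadoop.fs.s3a.endpoint", "http://127.0.0.1:9000")
        .config("spark.hadoop.fs.s3a.access.key", "minioadmin")
        .config("spark.hadoop.fs.s3a.secret.key", "minioadmin")
        .config("spark.hadoop.fs.s3a.path.style.access", "true")
        .config("spark.hadoop.fs.s3a.connection.ssl.enabled", "false")
        .getOrCreate();

    // Running a Hudi upsert against an s3a:// base path on this session then
    // makes it possible to inject storage failures locally and take a thread
    // dump to see which non-daemon pools survive after the write fails.
    spark.stop();
  }
}
```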





Re: [I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]

2023-10-10 Thread via GitHub


danny0405 commented on issue #9826:
URL: https://github.com/apache/hudi/issues/9826#issuecomment-1756650395

   cc @yihua for taking care of this.





Re: [I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]

2023-10-09 Thread via GitHub


pravin1406 commented on issue #9826:
URL: https://github.com/apache/hudi/issues/9826#issuecomment-1752626251

   @yihua anything on this?





Re: [I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]

2023-10-06 Thread via GitHub


danny0405 commented on issue #9826:
URL: https://github.com/apache/hudi/issues/9826#issuecomment-1751565733

   @yihua Any good ideas?





[I] [SUPPORT] Spark job stuck after completion, due to some non daemon threads still running [hudi]

2023-10-06 Thread via GitHub


pravin1406 opened a new issue, #9826:
URL: https://github.com/apache/hudi/issues/9826

   Hi,
   In one case, while writing incremental data (upsert) to a Hudi table, we failed to write the deltacommit file to storage, which caused a HoodieIOException. After that the Spark context exited, but the JVM did not die. When we took a thread dump we saw many non-daemon thread pools still running, which would have kept the JVM from shutting down.
   
   Upon further debugging, we concluded that this was because the marker handler creates two sets of thread pools in a non-daemon way. When the exception was thrown, the timeline service thread was stopped because it was a daemon (we have made changes to the open-source code), but the marker handler threads were left non-daemon. Had the exception been caught properly and .close of the timeline service been called in a finally block, this would not have happened. I've tried to make these thread pools daemon and also wanted to handle graceful shutdown of the timeline service, but couldn't figure it out. Need your help.
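   
   As a rough illustration of the finally-block idea above, here is a minimal sketch; TimelineServiceHandle and runUpsert are hypothetical stand-ins for the real Hudi classes, and the point is only the shape of the control flow.
   
```java
// Hypothetical stand-in for the embedded timeline service; closing it is
// assumed to stop the server and the thread pools it owns.
interface TimelineServiceHandle extends AutoCloseable {
  @Override
  void close();
}

public class FinallyCloseSketch {

  // Placeholder for the write that failed with HoodieIOException in our case.
  static void runUpsert() {
    throw new RuntimeException("simulated deltacommit write failure");
  }

  public static void main(String[] args) {
    TimelineServiceHandle timelineService =
        () -> System.out.println("timeline service closed");
    try {
      runUpsert();
    } catch (RuntimeException e) {
      System.out.println("write failed: " + e.getMessage());
    } finally {
      // Even when the write fails, the timeline service (and any non-daemon
      // pools it created) is shut down, so the JVM is free to exit.
      timelineService.close();
    }
  }
}
```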
   
   Attaching the exception stacktrace and thread dump.
   
   
   **Environment Description**
   
   * Hudi version : 0.12.2
   
   * Spark version : 3.2.0
   
   * Hive version : 3.1.1
   * Hadoop version : 3.1.1
   
   * Storage (HDFS/S3/GCS..) : S3
   
   * Running on Docker? (yes/no) : no
   
   
   **Additional context**
   
   Add any other context about the problem here.
   
   **Stacktrace**
   
   
   
   Exception in thread "main" org.apache.hudi.exception.HoodieIOException: Failed to create file s3a://intsys-erp/datalake_staging/p_fa_books/.hoodie/metadata/.hoodie/20231002172521154.deltacommit.inflight
   at org.apache.hudi.common.util.FileIOUtils.createFileInPath(FileIOUtils.java:183)
   at org.apache.hudi.common.util.FileIOUtils.createFileInPath(FileIOUtils.java:189)
   at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionState(HoodieActiveTimeline.java:580)
   at org.apache.hudi.common.table.timeline.HoodieActiveTimeline.transitionRequestedToInflight(HoodieActiveTimeline.java:642)
   at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.saveWorkloadProfileMetadataToInflight(BaseCommitActionExecutor.java:148)
   at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:163)
   at org.apache.hudi.table.action.deltacommit.SparkUpsertPreppedDeltaCommitActionExecutor.execute(SparkUpsertPreppedDeltaCommitActionExecutor.java:45)
   at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:107)
   at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:76)
   at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:173)
   at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:166)
   at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.processAndCommit(HoodieBackedTableMetadataWriter.java:819)
   at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:886)
   at org.apache.hudi.client.BaseHoodieWriteClient.lambda$writeTableMetadata$52(BaseHoodieWriteClient.java:350)
   at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
   at org.apache.hudi.client.BaseHoodieWriteClient.writeTableMetadata(BaseHoodieWriteClient.java:350)
   at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:282)
   at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:235)
   at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:126)
   at org.apache.hudi.HoodieSparkSqlWriter$.commitAndPerformPostOperations(HoodieSparkSqlWriter.scala:701)
   at org.apache.hudi.HoodieSparkSqlWriter$.write(HoodieSparkSqlWriter.scala:345)
   at org.apache.hudi.DefaultSource.createRelation(DefaultSource.scala:145)
   at org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
   at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
   at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
   at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExec