cxzl25 commented on code in PR #3488:
URL: https://github.com/apache/celeborn/pull/3488#discussion_r2374960578


##########
common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala:
##########
@@ -3915,6 +3917,17 @@ object CelebornConf extends Logging {
       .booleanConf
       .createWithDefault(false)
 
+  val WORKER_GRACEFUL_SHUTDOWN_DB_DELETE_FAILURE_POLICY: ConfigEntry[String] =
+    buildConf("celeborn.worker.graceful.shutdown.dbDeleteFailurePolicy")
+      .categories("worker")
+      .doc("Policy for handling DB delete failures during graceful shutdown. " +
+        "THROW: throw exception, EXIT: trigger graceful shutdown, IGNORE: log error and continue (default).")
+      .version("0.7.0")
+      .stringConf
+      .transform(_.toUpperCase(Locale.ROOT))
+      .checkValues(Set("THROW", "EXIT", "IGNORE"))

Review Comment:
   Currently, `THROW` only lets the exception escape the cleaner task, where the thread pool's uncaught-exception handler logs it, and the remaining cleanup is skipped:
   
   ```
   25/09/06 22:14:37,927 ERROR [worker-expired-shuffle-cleaner-35042] ThreadExceptionHandler: Uncaught exception in executor service worker-expired-shuffle-cleaner, thread Thread[worker-expired-shuffle-cleaner-35042,5,main]
   java.lang.RuntimeException: org.rocksdb.RocksDBException: While open a file for appending: Xrecovery.rdb/002960.log: No space left on device
           at com.google.common.base.Throwables.propagate(Throwables.java:234)
           at org.apache.celeborn.service.deploy.worker.shuffledb.RocksDB.delete(RocksDB.java:75)
           at org.apache.celeborn.service.deploy.worker.storage.StorageManager.$anonfun$cleanupExpiredShuffleKey$1(StorageManager.scala:600)
   ```
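
   For illustration, a minimal sketch of the gap. All names here (`CleanupPolicySketch`, `delete`, `cleanupWithoutCatch`, `cleanupPerKey`, the `onExit` hook) are hypothetical and are not the PR's code or Celeborn's `StorageManager` API. When `delete` throws inside the loop over expired keys, every key after the failing one is left behind. Catching per key lets the loop finish and only then apply the policy. Deferring `THROW` until after the loop is one possible option, not something the PR currently does.

   ```scala
   import scala.collection.mutable.ListBuffer

   // Hypothetical sketch: names and structure are illustrative only,
   // not Celeborn's StorageManager or shuffledb.RocksDB API.
   object CleanupPolicySketch {

     // Simulated DB delete that fails for one key, e.g. when the disk is full.
     def delete(key: String, deleted: ListBuffer[String]): Unit = {
       if (key == "shuffle-2") throw new RuntimeException("No space left on device")
       deleted += key
     }

     // Current behaviour: the exception escapes the loop, so later keys are skipped.
     // The outer catch stands in for the thread pool's uncaught-exception handler.
     def cleanupWithoutCatch(keys: Seq[String]): List[String] = {
       val deleted = ListBuffer.empty[String]
       try keys.foreach(k => delete(k, deleted))
       catch { case _: RuntimeException => () }
       deleted.toList
     }

     // Per-key handling: record failures, keep going, then apply the policy
     // ("THROW" / "EXIT" / "IGNORE", as in the config) once the loop is done.
     def cleanupPerKey(
         keys: Seq[String],
         policy: String,
         onExit: () => Unit): (List[String], List[String]) = {
       val deleted = ListBuffer.empty[String]
       val failed = ListBuffer.empty[String]
       keys.foreach { k =>
         try delete(k, deleted)
         catch { case _: RuntimeException => failed += k }
       }
       if (failed.nonEmpty) {
         policy match {
           case "THROW" =>
             throw new IllegalStateException(s"Failed to delete: ${failed.mkString(", ")}")
           case "EXIT" => onExit() // hypothetical hook that would trigger graceful shutdown
           case _      => () // "IGNORE": a real implementation would log the failed keys
         }
       }
       (deleted.toList, failed.toList)
     }
   }
   ```

   The simulated failure on `shuffle-2` stands in for the `No space left on device` error above. With the first variant `shuffle-3` is never deleted; with the second it is, and the policy only decides what happens after the pass completes.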


