Copilot commented on code in PR #2151:
URL: https://github.com/apache/auron/pull/2151#discussion_r3026121706


##########
auron-core/src/main/java/org/apache/auron/jni/AuronCallNativeWrapper.java:
##########
@@ -90,13 +70,45 @@ public AuronCallNativeWrapper(
         this.stageId = stageId;
         this.taskId = taskId;
 
+        init();
         LOG.warn("Start executing native plan");
         this.nativeRuntimePtr = JniBridge.callNative(
                 nativeMemory,
                 
AuronAdaptor.getInstance().getAuronConfiguration().get(AuronConfiguration.NATIVE_LOG_LEVEL),
                 this);
     }
 
+    private static void init() {
+        if (!initialized) {
+            synchronized (AuronCallNativeWrapper.class) {
+                if (!initialized) {
+                    // initialize native environment
+                    LOG.info("Initializing native environment (batchSize="
+                            + 
AuronAdaptor.getInstance().getAuronConfiguration().get(AuronConfiguration.BATCH_SIZE)
+                            + ", "
+                            + "memoryFraction="
+                            + 
AuronAdaptor.getInstance().getAuronConfiguration().get(AuronConfiguration.MEMORY_FRACTION)
+                            + ")");
+
+                    // arrow configuration
+                    System.setProperty("arrow.struct.conflict.policy", 
"CONFLICT_APPEND");
+
+                    // preload JNI bridge classes
+                    try {
+                        Class.forName("org.apache.auron.jni.JniBridge");
+                    } catch (ClassNotFoundException e) {
+                        throw new RuntimeException("Cannot load JniBridge 
class", e);
+                    }
+
+                    AuronAdaptor.getInstance().loadAuronLib();
+                    Runtime.getRuntime().addShutdownHook(new 
Thread(JniBridge::onExit));
+
+                    initialized = true;

Review Comment:
   Because initialization is now retried after failures, 
`AuronAdaptor.getInstance().loadAuronLib()` may be invoked multiple times in 
the same JVM. In the Spark adaptor, each call extracts to a fresh 
`createTempFile(...).deleteOnExit()`, so repeated failures can accumulate temp 
files until executor exit. Consider making `loadAuronLib()` idempotent / cached 
(or cleaning up temp files on failure) to avoid disk bloat on 
flaky/interrupt-driven init attempts.
   ```suggestion
                       try {
                           // initialize native environment
                           LOG.info("Initializing native environment 
(batchSize="
                                   + 
AuronAdaptor.getInstance().getAuronConfiguration().get(AuronConfiguration.BATCH_SIZE)
                                   + ", "
                                   + "memoryFraction="
                                   + 
AuronAdaptor.getInstance().getAuronConfiguration().get(AuronConfiguration.MEMORY_FRACTION)
                                   + ")");
   
                           // arrow configuration
                           System.setProperty("arrow.struct.conflict.policy", 
"CONFLICT_APPEND");
   
                           // preload JNI bridge classes
                           try {
                               Class.forName("org.apache.auron.jni.JniBridge");
                           } catch (ClassNotFoundException e) {
                               throw new RuntimeException("Cannot load 
JniBridge class", e);
                           }
   
                           AuronAdaptor.getInstance().loadAuronLib();
                           Runtime.getRuntime().addShutdownHook(new 
Thread(JniBridge::onExit));
                       } finally {
                           // Mark initialization as attempted to avoid 
repeated native library loading
                           initialized = true;
                       }
   ```



##########
auron-core/src/main/java/org/apache/auron/jni/AuronCallNativeWrapper.java:
##########
@@ -90,13 +70,45 @@ public AuronCallNativeWrapper(
         this.stageId = stageId;
         this.taskId = taskId;
 
+        init();
         LOG.warn("Start executing native plan");
         this.nativeRuntimePtr = JniBridge.callNative(
                 nativeMemory,
                 
AuronAdaptor.getInstance().getAuronConfiguration().get(AuronConfiguration.NATIVE_LOG_LEVEL),
                 this);
     }
 
+    private static void init() {
+        if (!initialized) {
+            synchronized (AuronCallNativeWrapper.class) {
+                if (!initialized) {
+                    // initialize native environment
+                    LOG.info("Initializing native environment (batchSize="
+                            + 
AuronAdaptor.getInstance().getAuronConfiguration().get(AuronConfiguration.BATCH_SIZE)
+                            + ", "
+                            + "memoryFraction="
+                            + 
AuronAdaptor.getInstance().getAuronConfiguration().get(AuronConfiguration.MEMORY_FRACTION)

Review Comment:
   `init()` can run while a Spark task thread is already interrupted/cancelled 
(the production issue shows `ClosedByInterruptException`). Even though 
initialization is now retriable, attempting `Files.copy/System.load` from an 
interrupted task is still likely to fail and generate noisy errors. Consider 
short-circuiting initialization when the current task is not running (e.g., 
`!AuronAdaptor.getInstance().isTaskRunning()` or 
`Thread.currentThread().isInterrupted()`), and fail fast so a later 
non-cancelled task can perform the one-time init successfully.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to