Copilot commented on code in PR #2151:
URL: https://github.com/apache/auron/pull/2151#discussion_r3026121706
##########
auron-core/src/main/java/org/apache/auron/jni/AuronCallNativeWrapper.java:
##########
@@ -90,13 +70,45 @@ public AuronCallNativeWrapper(
this.stageId = stageId;
this.taskId = taskId;
+ init();
LOG.warn("Start executing native plan");
this.nativeRuntimePtr = JniBridge.callNative(
nativeMemory,
AuronAdaptor.getInstance().getAuronConfiguration().get(AuronConfiguration.NATIVE_LOG_LEVEL),
this);
}
+ private static void init() {
+ if (!initialized) {
+ synchronized (AuronCallNativeWrapper.class) {
+ if (!initialized) {
+ // initialize native environment
+ LOG.info("Initializing native environment (batchSize="
+ +
AuronAdaptor.getInstance().getAuronConfiguration().get(AuronConfiguration.BATCH_SIZE)
+ + ", "
+ + "memoryFraction="
+ +
AuronAdaptor.getInstance().getAuronConfiguration().get(AuronConfiguration.MEMORY_FRACTION)
+ + ")");
+
+ // arrow configuration
+ System.setProperty("arrow.struct.conflict.policy",
"CONFLICT_APPEND");
+
+ // preload JNI bridge classes
+ try {
+ Class.forName("org.apache.auron.jni.JniBridge");
+ } catch (ClassNotFoundException e) {
+ throw new RuntimeException("Cannot load JniBridge
class", e);
+ }
+
+ AuronAdaptor.getInstance().loadAuronLib();
+ Runtime.getRuntime().addShutdownHook(new
Thread(JniBridge::onExit));
+
+ initialized = true;
Review Comment:
Because initialization is now retried after failures,
`AuronAdaptor.getInstance().loadAuronLib()` may be invoked multiple times in
the same JVM. In the Spark adaptor, each call extracts to a fresh
`createTempFile(...).deleteOnExit()`, so repeated failures can accumulate temp
files until executor exit. Consider making `loadAuronLib()` idempotent / cached
(or cleaning up temp files on failure) to avoid disk bloat on
flaky/interrupt-driven init attempts.
```suggestion
try {
// initialize native environment
LOG.info("Initializing native environment
(batchSize="
+
AuronAdaptor.getInstance().getAuronConfiguration().get(AuronConfiguration.BATCH_SIZE)
+ ", "
+ "memoryFraction="
+
AuronAdaptor.getInstance().getAuronConfiguration().get(AuronConfiguration.MEMORY_FRACTION)
+ ")");
// arrow configuration
System.setProperty("arrow.struct.conflict.policy",
"CONFLICT_APPEND");
// preload JNI bridge classes
try {
Class.forName("org.apache.auron.jni.JniBridge");
} catch (ClassNotFoundException e) {
throw new RuntimeException("Cannot load
JniBridge class", e);
}
AuronAdaptor.getInstance().loadAuronLib();
Runtime.getRuntime().addShutdownHook(new
Thread(JniBridge::onExit));
} finally {
// Mark initialization as attempted to avoid
repeated native library loading
initialized = true;
}
```
##########
auron-core/src/main/java/org/apache/auron/jni/AuronCallNativeWrapper.java:
##########
@@ -90,13 +70,45 @@ public AuronCallNativeWrapper(
this.stageId = stageId;
this.taskId = taskId;
+ init();
LOG.warn("Start executing native plan");
this.nativeRuntimePtr = JniBridge.callNative(
nativeMemory,
AuronAdaptor.getInstance().getAuronConfiguration().get(AuronConfiguration.NATIVE_LOG_LEVEL),
this);
}
+ private static void init() {
+ if (!initialized) {
+ synchronized (AuronCallNativeWrapper.class) {
+ if (!initialized) {
+ // initialize native environment
+ LOG.info("Initializing native environment (batchSize="
+ +
AuronAdaptor.getInstance().getAuronConfiguration().get(AuronConfiguration.BATCH_SIZE)
+ + ", "
+ + "memoryFraction="
+ +
AuronAdaptor.getInstance().getAuronConfiguration().get(AuronConfiguration.MEMORY_FRACTION)
Review Comment:
`init()` can run while a Spark task thread is already interrupted/cancelled
(the production issue shows `ClosedByInterruptException`). Even though
initialization is now retriable, attempting `Files.copy/System.load` from an
interrupted task is still likely to fail and generate noisy errors. Consider
short-circuiting initialization when the current task is not running (e.g.,
`!AuronAdaptor.getInstance().isTaskRunning()` or
`Thread.currentThread().isInterrupted()`), and fail fast so a later
non-cancelled task can perform the one-time init successfully.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]