renechoi opened a new pull request, #4998:
URL: https://github.com/apache/zeppelin/pull/4998

   ### What is this PR for?
   This PR improves process resource management in the 
`detectSparkScalaVersion` method of `SparkInterpreterLauncher.java` to prevent 
process hangs and resource leaks.
   
   ### Current Issues Fixed:
   1. **Pipe Buffer Blocking Risk**: Process stdout was not consumed, so 
spark-submit could block on write (and the launcher hang) once the OS pipe 
buffer filled
   2. **Indefinite Blocking**: No timeout on `process.waitFor()` could cause 
indefinite blocking
   3. **Missing Process Cleanup**: Process wasn't explicitly destroyed after use
   4. **No Exit Value Validation**: Process exit status wasn't checked
   
   ### What type of PR is it?
   Improvement
   
   ### Todos
   * [ ] - Code review
   * [ ] - CI build verification
   
   ### What is the Jira issue?
   * https://issues.apache.org/jira/browse/ZEPPELIN-6258
   
   ### How should this be tested?
   * **Unit test added**: `testDetectSparkScalaVersionProcessManagement` - 
Verifies that the method still works correctly with the new process management
   * **Manual testing**:
       - Start Zeppelin with Spark interpreter
       - Verify spark-submit processes are properly terminated
       - No zombie processes should remain after interpreter initialization
   
   * **CI**: All existing tests pass
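
The manual check for leftover processes can also be scripted. The following is a minimal sketch, not part of this PR, that uses the JDK's `ProcessHandle` API (Java 9+) to list live processes whose command line contains a marker such as `spark-submit`. The class and method names are hypothetical, and some platforms do not expose the command line of processes owned by other users, so an empty result is a hint rather than proof.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for manual verification; not part of Zeppelin.
final class LeftoverProcessCheck {

  // Returns the live processes whose command line contains the given marker.
  // Processes whose command line is not visible to this JVM are skipped.
  static List<ProcessHandle> findByCommandLine(String marker) {
    return ProcessHandle.allProcesses()
        .filter(ProcessHandle::isAlive)
        .filter(h -> h.info().commandLine()
            .map(cmd -> cmd.contains(marker))
            .orElse(false))
        .collect(Collectors.toList());
  }
}
```

Calling `LeftoverProcessCheck.findByCommandLine("spark-submit")` after interpreter initialization should, under these assumptions, return an empty list if the launcher cleaned up after itself.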
   
   ### Screenshots (if appropriate)
   N/A
   
   ### Questions:
   * Do the license files need to be updated? **No**
   * Are there breaking changes for older versions? **No**
   * Does this need documentation? **No**
   
   ### Implementation Details
   - Added stdout consumption using `IOUtils.copy()` to `NullOutputStream` so 
the pipe buffer cannot fill up and block the process
   - Implemented 30-second timeout with `process.waitFor(30, TimeUnit.SECONDS)`
   - Added forceful process termination on timeout
   - Added exit value logging for better diagnostics
   - Ensured process cleanup in finally block with `destroyForcibly()`
   - Moved file reading into try-finally block for better resource management
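
The steps listed above can be sketched as follows. This is an illustrative outline under stated assumptions, not the code in this PR: the class and method names are hypothetical, it uses the JDK's `InputStream.transferTo` and `OutputStream.nullOutputStream()` (Java 11+) in place of commons-io `IOUtils.copy()` and `NullOutputStream`, it merges stderr into stdout for simplicity, and it drains output on a daemon thread so that a process which never closes its stdout cannot defeat the timeout.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.List;
import java.util.OptionalInt;
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating the pattern; not the PR's implementation.
final class BoundedProcessRunner {

  // Runs the command, discards its output, and waits at most timeoutSeconds.
  // Returns the exit value, or an empty OptionalInt if the timeout elapsed.
  static OptionalInt runWithTimeout(List<String> command, long timeoutSeconds)
      throws IOException, InterruptedException {
    ProcessBuilder builder = new ProcessBuilder(command);
    // Assumption: stderr is merged into stdout so one drainer covers both.
    builder.redirectErrorStream(true);
    Process process = builder.start();
    try {
      // Drain output in the background so the child never blocks on a full pipe.
      Thread drainer = new Thread(() -> {
        try (InputStream in = process.getInputStream()) {
          in.transferTo(OutputStream.nullOutputStream());
        } catch (IOException ignored) {
          // The stream is closed when the process is destroyed; nothing to do.
        }
      });
      drainer.setDaemon(true);
      drainer.start();

      // Bounded wait: false means the process is still running after the timeout.
      if (!process.waitFor(timeoutSeconds, TimeUnit.SECONDS)) {
        return OptionalInt.empty();
      }
      return OptionalInt.of(process.exitValue());
    } finally {
      // Harmless if the process already exited; kills it otherwise.
      process.destroyForcibly();
    }
  }
}
```

A caller could log the returned exit value for diagnostics and treat an empty result as a timeout. Draining on a separate thread is a design choice of this sketch: a synchronous copy placed before `waitFor` would itself block until the process closes its stdout.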
   
   ### Benefits
   1. **Prevents Hangs**: Stdout consumption prevents buffer-related process 
hangs
   2. **Guaranteed Termination**: 30-second timeout ensures the launcher won't 
wait on the process indefinitely
   3. **Better Diagnostics**: Exit value logging helps identify spark-submit 
failures
   4. **Resource Safety**: Explicit process cleanup prevents zombie processes


