renechoi opened a new pull request, #5001:
URL: https://github.com/apache/zeppelin/pull/5001

   ###  What is this PR for?
   
     This PR fixes a memory leak issue where WebSocket connections that are 
switched to watcher mode are never removed from the watcherSockets queue when 
the connection is closed. This causes the queue to grow indefinitely, leading 
to increased memory
     usage over time and potential OutOfMemoryError in long-running Zeppelin 
servers.
   
   ###  What type of PR is it?
   
     Bug Fix
   
   ###  What is the Jira issue?
   
     https://issues.apache.org/jira/browse/ZEPPELIN-6260
   
   ###  How should this be tested?
   
     Steps to reproduce the issue:
     1. Open a WebSocket connection to Zeppelin
     2. Send a WATCHER message to switch the connection to watcher mode
     3. Close the connection
     4. The connection remains in watcherSockets queue indefinitely
   
     Verification after fix:
     - Unit tests have been added to verify the fix:
       - removeWatcherConnectionCleansQueue: Basic functionality test
       - removeWatcherConnectionWithMultipleWatchers: Tests selective removal
       - removeWatcherConnectionConcurrentTest: Tests thread safety
       - switchConnectionToWatcherAndRemove: Tests complete lifecycle
   
   ###  Screenshots (if appropriate)
   
     N/A
   
   ###  Questions:
   
     - Does the license files need update? No
     - Is there breaking changes for older versions? No
     - Does this needs documentation? No
   
   ###  Description of changes:
   
     1. Added removeWatcherConnection() method to ConnectionManager to safely 
remove connections from the watcherSockets queue
     2. Modified NotebookServer.removeConnection() to call 
removeWatcherConnection() when any connection is closed
     3. Added debug logging to track watcher connection removal
     4. Added comprehensive unit tests including concurrent access scenarios
   
     The fix is minimal and safe:
     - The ConcurrentLinkedQueue.remove() operation is safe even if the element 
doesn't exist
     - The synchronization block is very short, minimizing performance impact
     - The approach ensures no watcher connections are leaked, regardless of 
how they were closed
   
   ###  Related observations:
   
     During the investigation, I noticed that broadcastToWatchers() doesn't 
remove watchers when IOException occurs. This could cause performance 
degradation as closed connections would repeatedly fail. However, this is a 
separate issue and should be
     addressed in a different PR to keep this fix focused.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to