renechoi opened a new pull request, #5001:
URL: https://github.com/apache/zeppelin/pull/5001
### What is this PR for?
This PR fixes a memory leak issue where WebSocket connections that are
switched to watcher mode are never removed from the watcherSockets queue when
the connection is closed. This causes the queue to grow indefinitely, leading
to increased memory
usage over time and potential OutOfMemoryError in long-running Zeppelin
servers.
### What type of PR is it?
Bug Fix
### What is the Jira issue?
https://issues.apache.org/jira/browse/ZEPPELIN-6260
### How should this be tested?
Steps to reproduce the issue:
1. Open a WebSocket connection to Zeppelin
2. Send a WATCHER message to switch the connection to watcher mode
3. Close the connection
4. The connection remains in watcherSockets queue indefinitely
Verification after fix:
- Unit tests have been added to verify the fix:
- removeWatcherConnectionCleansQueue: Basic functionality test
- removeWatcherConnectionWithMultipleWatchers: Tests selective removal
- removeWatcherConnectionConcurrentTest: Tests thread safety
- switchConnectionToWatcherAndRemove: Tests complete lifecycle
### Screenshots (if appropriate)
N/A
### Questions:
- Does the license files need update? No
- Is there breaking changes for older versions? No
- Does this needs documentation? No
### Description of changes:
1. Added removeWatcherConnection() method to ConnectionManager to safely
remove connections from the watcherSockets queue
2. Modified NotebookServer.removeConnection() to call
removeWatcherConnection() when any connection is closed
3. Added debug logging to track watcher connection removal
4. Added comprehensive unit tests including concurrent access scenarios
The fix is minimal and safe:
- The ConcurrentLinkedQueue.remove() operation is safe even if the element
doesn't exist
- The synchronization block is very short, minimizing performance impact
- The approach ensures no watcher connections are leaked, regardless of
how they were closed
### Related observations:
During the investigation, I noticed that broadcastToWatchers() doesn't
remove watchers when IOException occurs. This could cause performance
degradation as closed connections would repeatedly fail. However, this is a
separate issue and should be
addressed in a different PR to keep this fix focused.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]