nathant27 opened a new pull request, #5027:
URL: https://github.com/apache/texera/pull/5027
<!--
Thanks for sending a pull request (PR)! Here are some tips for you:
1. If this is your first time, please read our contributor guidelines:
[Contributing to
Texera](https://github.com/apache/texera/blob/main/CONTRIBUTING.md)
2. Ensure you have added or run the appropriate tests for your PR
3. If the PR is work in progress, mark it a draft on GitHub.
4. Please write your PR title to summarize what this PR proposes, we
are following Conventional Commits style for PR titles as well.
5. Be sure to keep the PR description updated to reflect all changes.
-->
### What changes were proposed in this PR?
<!--
Please clarify what changes you are proposing. The purpose of this section
is to outline the changes. Here are some tips for you:
1. If you propose a new API, clarify the use case for a new API.
2. If you fix a bug, you can clarify why it is a bug.
3. If it is a refactoring, clarify what has been changed.
3. It would be helpful to include a before-and-after comparison using
screenshots or GIFs.
4. Please consider writing useful notes for better and faster reviews.
-->
Old version of Heartbeat._check_heartbeat(self) wraps the socket connection
and socket close in a single try except, and handles with the same try except.
The issue is that if close raises after a successful connection, there is a
false negative since it will handle it as if socket connection had failed. In
Heartbeat.run(self), this could cause the server to shutdown on close raising
exception since _check_heartbeat will return False in this case.
New version separates the connection handling and the close in two adjacent
try except blocks, and returns true as long as connection is successfully
established as well, getting rid of the false negative.
Old Version:
```
def _check_heartbeat(self) -> bool:
"""
Attempt to connect to JVM on the specific port. If succeeds, it
means the
socket is still available and the JVM is still alive. Otherwise, the
JVM
might have been gone.
:return: bool, indicating if the socket is available.
"""
try:
temp_socket = socket.create_connection(
(self._parsed_server_host, self._parsed_server_port),
timeout=1
)
temp_socket.close()
return True
except Exception as e:
logger.warning(f"Server is down with exception: {e}")
return False
```
New Version:
```
def _check_heartbeat(self) -> bool:
"""
Attempt to connect to the JVM port. If connection failure, then JVM
is dead, or if connection success
then JVM is alive even if close() raises. Logs on connection failure
and on close error.
:return: bool, True if connect succeeded, False if connect failed.
"""
try:
temp_socket = socket.create_connection(
(self._parsed_server_host, self._parsed_server_port),
timeout=1
)
except Exception as e:
logger.warning(f"Server is down with exception: {e}")
return False
try:
temp_socket.close()
except Exception as e:
logger.warning(f"Failed to close socket: {e}")
return True
```
### Any related issues, documentation, discussions?
Closes #4726
### How was this PR tested?
Changed old test to reflect new behavior(True on connection succeeds but
socket close rasises)
OLD TEST(REMOVED):
```
def test_returns_false_when_socket_close_raises(self):
# Pins the false-negative path: connect succeeds but the subsequent
# close() throws (e.g. broken pipe on a half-open socket). The bare
# `except Exception` in _check_heartbeat() catches it and reports
# "server down", and a regression that narrows that handler would be
# caught here.
hb = make_heartbeat()
fake_sock = MagicMock()
fake_sock.close.side_effect = OSError("close failed")
with patch(
"core.runnables.heartbeat.socket.create_connection",
return_value=fake_sock,
):
assert hb._check_heartbeat() is False
```
NEW TEST:
```
def
test_returns_true_when_connection_succeeds_but_socket_close_raises(self):
hb = make_heartbeat()
fake_sock = MagicMock()
fake_sock.close.side_effect = OSError("close failed")
with patch(
"core.runnables.heartbeat.socket.create_connection",
return_value=fake_sock,
):
assert hb._check_heartbeat() is True
```
Passes on existing old tests
### Was this PR authored or co-authored using generative AI tooling?
No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]