thisisnic commented on issue #50009:
URL: https://github.com/apache/arrow/issues/50009#issuecomment-4556643806
I had Claude take a look at this, and here's its summary/suggestions. This
is a bit beyond my C++ knowledge; @jonkeane - does this look about right to you?
---
## Analysis
The R package registers a finalizer via `reg.finalizer(s3_finalizer,
finalize_s3, onexit = TRUE)` that calls `FinalizeS3()` when R exits. This
ultimately calls `Aws::ShutdownAPI()` in the AWS C++ SDK.
The likely sequence of events:
1. `read_parquet("s3://...")` at the start of the process opens HTTP
connections to S3, which get cached in the AWS SDK's connection pool
2. ~42 minutes of processing with no S3 activity — during this time,
something in the network path (S3 endpoint, load balancer, NAT gateway) closes
the idle TCP connections
3. `write_parquet("s3://...")` at the end succeeds (the SDK opens fresh
connections), but the old dead connections from step 1 may still be in the pool
4. `quit()` triggers the R finalizer → `FinalizeS3()` → `Aws::ShutdownAPI()`
5. The SDK walks through all connections in the pool during teardown and
hits the dead ones, triggering a SIGPIPE (writing to a closed socket)
6. The C++ `fs::FinalizeS3()` returns a non-OK status
7. **The R binding wraps this in `StopIfNotOk()`, which calls
`cpp11::stop()` — this does a `longjmp` out of the middle of SDK teardown**,
leaving it half-destroyed
8. R continues its exit process, and subsequent cleanup (GC, static
destructors) hits the half-torn-down SDK state → segfault at address 0x30
This is consistent with the reporter's finding that replacing `read_parquet`
from S3 with a local file read eliminated the segfault — no stale connections
in the pool means clean shutdown.
## The `StopIfNotOk` problem
The core issue is in `r/src/filesystem.cpp:355`:
```cpp
void FinalizeS3() {
#if defined(ARROW_R_WITH_S3)
StopIfNotOk(fs::FinalizeS3());
#endif
}
```
`StopIfNotOk` converts any non-OK C++ status into an R error via
`cpp11::stop()`, which does a `longjmp`. This is fine for normal operations,
but during SDK shutdown it's dangerous — you can't safely jump out of the
middle of `Aws::ShutdownAPI()`. The SDK ends up in a partially-finalized state,
which causes the actual segfault.
## Suggested fix
Replace `StopIfNotOk` with a warning so the shutdown completes even if there
are errors:
```cpp
void FinalizeS3() {
#if defined(ARROW_R_WITH_S3)
auto status = fs::FinalizeS3();
if (!status.ok()) {
cpp11::warning("FinalizeS3 failed: %s", status.ToString().c_str());
}
#endif
}
```
This way:
- The SDK shutdown runs to completion rather than being interrupted
mid-teardown
- Users still get a visible message if something went wrong
- Mid-session callers (e.g. via `arrow:::FinalizeS3()`) see the warning too
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]