jwijffels commented on issue #50009:
URL: https://github.com/apache/arrow/issues/50009#issuecomment-4555936150
Hello @thisisnic
Many thanks for checking. Indeed, this looks like a credential-expiry issue
that triggers a segfault.
1. How are you authenticating to S3? e.g. instance role, env vars etc. This
might be a credential expiry issue (still would need fixing in Arrow but helps
us narrow it down)
Authentication is done via AWS roles.
2. Does the write_parquet() actually succeed despite the segfault?
Yes, it succeeds. The segfault only occurs afterwards, at
quit(save = "no").
3. Is the bulk of the time before or after the write_parquet() in the long
jobs that drop out?
Most of the process time (e.g. the 42 minutes I mentioned) is spent
between read_parquet() and write_parquet().
4. Is 42 mins a hard cutoff or exact?
It's a rough estimate, based on comparing the logs of runs that did not crash
with the logs of runs that did.
FWIW, because of the segfault, I replaced the direct read_parquet() from S3:
the job now first downloads the parquet file to the local hard drive of the
server running the job and then calls read_parquet() on the local file. Since
write_parquet() only happens at the very end of the process and nothing runs
after it, the job no longer segfaults (possibly because the S3 connection is
then still recent rather than more than 40 minutes old).