jwijffels commented on issue #50009:
URL: https://github.com/apache/arrow/issues/50009#issuecomment-4555936150

   Hello @thisisnic 
   Many thanks for checking. Indeed, this looks like a credential expiry issue that triggers a segfault.
   
   1. How are you authenticating to S3? e.g. instance role, env vars etc. This 
might be a credential expiry issue (still would need fixing in Arrow but helps 
us narrow it down)
   
   Authentication is done using AWS roles.
   
   2. Does the write_parquet() actually succeed despite the segfault?
   
   Yes, it does succeed. The segfault only occurs at quit(save = "no").
   
   3. Is the bulk of the time before or after the write_parquet() in the long 
jobs that drop out?
   
   Most of the process time (e.g. the 42 minutes I mentioned) is spent between read_parquet() and write_parquet().
    
   4. Is 42 mins a hard cutoff or exact?
   
   It's a rough estimate, based on comparing the logs of runs that did not crash with the logs of runs that did.

   FWIW, because of the segfault I replaced the direct read_parquet() from S3. The job now first downloads the parquet file to the hard drive of the server running it and then calls read_parquet() on the local file. Since write_parquet() is only called at the very end of the process and nothing happens after it, the job no longer segfaults. Maybe Arrow no longer segfaults because the S3 connection is then still recent, rather than more than 40 minutes old.