0x26res opened a new issue, #39503:
URL: https://github.com/apache/arrow/issues/39503

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Since updating from pyarrow==12.0.0 to pyarrow==14.0.1, I have been seeing 
transient failures in some of my Python jobs.
   
   I have simple ETL jobs: they load a few Parquet files from S3, manipulate 
the data with a mix of pyarrow and pandas, and save one Parquet file to S3. 
   
   Since updating to pyarrow==14.0.1, some of my jobs (roughly 1 in 1000) 
are failing. Here are the symptoms:
   - I see this message in stderr:
   ```
   Fatal Python error: PyGILState_Release: auto-releasing thread-state, but no 
thread-state for this thread
   Python runtime state: finalizing (tstate=0x000056046142faa0)
   ```
   - My job orchestrator (AWS Batch) reports a return code of 139, which stands 
for:
   > 139 – Occurs when a segmentation fault is experienced. Likely, the 
application tried to access a memory region which is not available, or there is 
an unset or invalid environment variable.
   - From looking at the full log and the expected side effect of my job (i.e. 
the S3 file that is saved), it looks like the job is successful.
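
   For reference, a return code above 128 conventionally means the process was 
terminated by a signal whose number is the code minus 128, so 139 corresponds 
to signal 11. A minimal sketch for decoding such codes follows. The helper name 
`describe_exit_code` is hypothetical, and the 128 + N convention is that of 
POSIX shells, which is assumed here to be passed through unchanged by the 
orchestrator:

```python
import signal


def describe_exit_code(code: int) -> str:
    """Describe a shell-style exit code (hypothetical helper)."""
    if code > 128:
        # Assumption: codes above 128 encode "128 + signal number".
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited with status {code}"


print(describe_exit_code(139))  # on Linux: killed by SIGSEGV
```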
   
   So it looks like the job crashes during interpreter shutdown, while 
releasing resources after it has finished its work. 
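
   One way to get more information about a crash at shutdown is the 
standard-library `faulthandler` module, which dumps the Python tracebacks of 
all threads when the process receives a fatal signal such as SIGSEGV or 
SIGABRT. The sketch below is only a diagnostic aid, not a fix. The helper names 
are hypothetical, and `threading.enumerate()` only lists threads known to 
Python's `threading` module, so threads created by native libraries would not 
show up:

```python
import faulthandler
import sys
import threading


def enable_crash_diagnostics(stream=None):
    """Dump Python tracebacks of all threads on a fatal signal."""
    # faulthandler needs a file with a real file descriptor; default to stderr.
    target = stream if stream is not None else sys.stderr
    faulthandler.enable(file=target, all_threads=True)


def live_thread_names():
    """Names of the Python-level threads that are still alive."""
    return sorted(t.name for t in threading.enumerate())


# Hypothetical usage around a job's entry point:
#   enable_crash_diagnostics()
#   ... run the ETL job ...
#   print("threads still alive:", live_thread_names(), file=sys.stderr)
```

   The same handler can be enabled without code changes by setting the 
`PYTHONFAULTHANDLER=1` environment variable.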
   
   I can't create a reproducible example, simply because the error is 
transient. I also can't isolate which update is causing the issue, because I 
updated fsspec and s3fs at the same time as pyarrow, though I'm not sure they 
use `PyGILState_Release`.
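
   Since three packages changed at once, one way to narrow this down would be 
to log the exact installed versions at the start of every job, so that failing 
and passing runs can be compared while the packages are upgraded one at a time. 
A minimal sketch, where the helper name `installed_versions` is hypothetical and 
the package list is taken from the description above:

```python
from importlib import metadata


def installed_versions(packages=("pyarrow", "fsspec", "s3fs")):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Log this once at job start so it ends up next to any crash output.
print(installed_versions())
```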
   
   I looked at related issues and code in the repo, but couldn't find any 
relevant change. It could be related to 
https://github.com/apache/arrow/issues/38626, but I doubt it (I don't use 
`atexit.register`).
   
   
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
