adrienchaton commented on issue #14229: URL: https://github.com/apache/arrow/issues/14229#issuecomment-1260679569
Thanks for looking this up. Unfortunately it's not possible to put together the generating steps of this dataframe in a self-contained script. I could share the resulting dataframe that triggers the error on read, although it is about 14 GB ...

Some more observations which may help:
- If I run the same code but, instead of saving the dataframe as a single parquet file, use a numpy `array_split` into e.g. 20 chunks saved separately (about 700 MB each), I can load these smaller chunks and concatenate them back (I was looking for a workaround).
- The index datatype is int64.
- If I only load one column, I do not get the error.

I suspect that the "flattened" size (rows * columns) of the dataframe is probably too big then?
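For reference, a minimal sketch of the chunked-write workaround described above, assuming pandas with pyarrow as the parquet engine; the chunk count and file names are illustrative, not the exact script used:

```python
import numpy as np
import pandas as pd

# df is the large dataframe that fails to round-trip as a single parquet file.
n_chunks = 20  # illustrative; the comment mentions ~20 chunks of ~700 MB each

# Split the dataframe row-wise and write each chunk to its own parquet file.
for i, chunk in enumerate(np.array_split(df, n_chunks)):
    chunk.to_parquet(f"df_chunk_{i:02d}.parquet")

# Read the chunks back and concatenate them into a single dataframe.
df_restored = pd.concat(
    pd.read_parquet(f"df_chunk_{i:02d}.parquet") for i in range(n_chunks)
)
```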
