adrienchaton commented on issue #14229: URL: https://github.com/apache/arrow/issues/14229#issuecomment-1260679569
Thanks for looking this up. Unfortunately it's not possible to put together the generating steps of this dataframe in a self-contained script. I could share the resulting dataframe that triggers the error on read, although it is about 14 GB ...

Some more observations which may help:
- If I run the same code but, instead of saving the dataframe as a single parquet file, use a numpy `array_split` into e.g. 20 chunks saved separately (about 700 MB each), I can load these smaller chunks and concatenate them back (I was looking for a workaround).
- The index datatype is int64.
- If I only load one column, I do not get the error.

I suspect that the "flattened" size (rows * columns) of the dataframe is probably too big then?
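For reference, a minimal sketch of the chunked-write workaround described above, assuming pandas with pyarrow as the parquet engine; the chunk count and file names are illustrative, not the exact script used:

```python
import numpy as np
import pandas as pd

# df is the large dataframe that fails to round-trip as a single parquet file.
n_chunks = 20  # illustrative; the comment mentions ~20 chunks of ~700 MB each

# Split the dataframe row-wise and write each chunk to its own parquet file.
for i, chunk in enumerate(np.array_split(df, n_chunks)):
    chunk.to_parquet(f"df_chunk_{i:02d}.parquet")

# Read the chunks back and concatenate them into a single dataframe.
df_restored = pd.concat(
    pd.read_parquet(f"df_chunk_{i:02d}.parquet") for i in range(n_chunks)
)
```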
