westonpace commented on pull request #10019: URL: https://github.com/apache/arrow/pull/10019#issuecomment-819197009
OK, after getting lost in the weeds for a while, I was able to confirm that this is very much related to ARROW-12220. Some fun facts:

* Since threading has been removed, teardown is a lot more deterministic, so we get the error 100% of the time.
* It only happens on the Windows build that has mimalloc turned on.
* There are no logs because the failure is not a segmentation fault but a mimalloc assertion.
* The assertion fires while the background generator is being destroyed (I verified this in the debugger, so this is pretty concrete evidence).

Unfortunately, merging in ARROW-12220 did not fix the issue. So, more debugging, and I was able to find https://github.com/microsoft/mimalloc/issues/363

It is triggered by a thread exit, and the exiting thread must have made a huge allocation, where "huge" means larger than `1<<21` bytes (2 MiB). The serial thread readers use a dedicated thread that is destroyed when the reader finishes, and the test that triggers this uses a block size of `1<<22` (4 MiB).

So, I can work around it by changing the `block_size` in the test. We could conceivably limit the block size for users. I'm not sure of every place where we destroy threads (there aren't many, so we may be able to get away with it). We may want to reconsider using mimalloc 2.0 until this is fixed.
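A minimal sketch of the block-size workaround in plain Python: it encodes the "huge" threshold quoted in the comment and caps a requested block size at it. The helper names (`is_huge_allocation`, `clamp_block_size`) are hypothetical, not Arrow APIs, and the sketch assumes the reader's largest allocation on its dedicated thread is roughly one block; if the reader allocates more than that per block, a cap like this would not be sufficient.

```python
# Hypothetical helpers illustrating the block-size workaround.
# Assumption (taken from the comment, not verified against mimalloc sources):
# allocations larger than 1 << 21 bytes count as "huge", and a thread that made
# such an allocation can trip the assertion from mimalloc issue #363 on exit.

HUGE_ALLOCATION_THRESHOLD = 1 << 21  # 2,097,152 bytes (2 MiB)


def is_huge_allocation(nbytes: int) -> bool:
    """Return True if `nbytes` exceeds the assumed "huge" threshold."""
    return nbytes > HUGE_ALLOCATION_THRESHOLD


def clamp_block_size(requested: int, limit: int = HUGE_ALLOCATION_THRESHOLD) -> int:
    """Cap a user-supplied block size so it never exceeds `limit` bytes.

    Assumes the largest single allocation on the reader's dedicated thread
    is about one block in size.
    """
    if requested <= 0:
        raise ValueError("block size must be positive")
    return min(requested, limit)


# The failing test used a block size of 1 << 22 (4 MiB), above the threshold.
test_block_size = 1 << 22
print(is_huge_allocation(test_block_size))                    # True
print(clamp_block_size(test_block_size))                      # 2097152
print(is_huge_allocation(clamp_block_size(test_block_size)))  # False
```

Capping exactly at `1<<21` rather than just below it is enough under this assumption, because "huge" is defined as strictly larger than `1<<21`.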
