westonpace commented on pull request #10019:
URL: https://github.com/apache/arrow/pull/10019#issuecomment-819197009


   OK, after getting lost in the weeds for a while, I was able to confirm that 
this is very much related to ARROW-12220.  Some fun facts...
   
   * Since threading has been removed, the teardown is a lot more deterministic, 
so we get the error 100% of the time
   * It only happens on the Windows build that has mimalloc turned on
   * There are no logs because the failure is not a segmentation fault but a 
mimalloc assertion
   * The assertion happens as the background generator is being destroyed (I 
verified this with the debugger, which is pretty concrete evidence)
   
   Unfortunately, merging in ARROW-12220 did not fix the issue.  So...more 
debugging and I was able to discover...
   
   https://github.com/microsoft/mimalloc/issues/363
   
   It is triggered by a thread exit.  The serial thread readers use a dedicated 
thread that is destroyed when the reader finishes.  The exiting thread must also 
have made a huge allocation, where huge is defined as "larger than `1<<21`" 
bytes (2 MiB).  The test in question that triggers this uses a block size of 
`1<<22` (4 MiB).
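
To make the size comparison concrete, here is a minimal Python sketch. It assumes the huge threshold is exactly the `1<<21` bytes quoted above (I have not checked this against the mimalloc source) and that the reader allocates at least one buffer of block-size bytes. `HUGE_THRESHOLD` and `is_huge` are hypothetical names used only for illustration.

```python
# Assumption: threshold taken from the comment above, not from mimalloc itself.
HUGE_THRESHOLD = 1 << 21  # 2 MiB


def is_huge(num_bytes: int) -> bool:
    """Return True if an allocation of num_bytes counts as "huge",
    i.e. is strictly larger than the assumed threshold."""
    return num_bytes > HUGE_THRESHOLD


test_block_size = 1 << 22  # 4 MiB, the block size used by the failing test
print(is_huge(test_block_size))  # True
print(is_huge(HUGE_THRESHOLD))   # False: exactly at the threshold is not "larger than"
```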
   
   So, I can work around it by changing the `block_size` in the test.  We could 
conceivably limit the block size for users.  I'm not fully aware of all the 
places we destroy threads (there aren't many so we can maybe get away with it). 
We may want to reconsider mimalloc 2.0 until this is fixed.
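
As a rough illustration of what limiting the block size could look like, here is a small Python sketch. It is not existing Arrow code: `MAX_SAFE_BLOCK_SIZE` and `clamp_block_size` are hypothetical names, and the cap assumes "huge" means strictly larger than `1<<21` bytes as described above. A real limit might need some headroom below the threshold if allocations carry extra overhead.

```python
# Assumption: "huge" means strictly larger than 1 << 21 bytes, per the comment above.
MAX_SAFE_BLOCK_SIZE = 1 << 21  # 2 MiB


def clamp_block_size(requested: int) -> int:
    """Cap a requested block size so a single block stays at or below
    the assumed huge-allocation threshold."""
    if requested <= 0:
        raise ValueError("block size must be positive")
    return min(requested, MAX_SAFE_BLOCK_SIZE)


print(clamp_block_size(1 << 22))  # 2097152: the test's 4 MiB request is capped to 2 MiB
print(clamp_block_size(1 << 16))  # 65536: smaller requests pass through unchanged
```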
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

