JanKaul commented on issue #22090:
URL: https://github.com/apache/datafusion/issues/22090#issuecomment-4425320338

   > This is not just a theory -- we have seen such deadlocks in production at 
InfluxData -- typically on very skewed datasets
   
   All right, thanks for the input. I thought this would generally be a good 
idea, but I don't have as much experience with deadlocks. I'll focus on solving 
the problem downstream such that the consumers work more uniformly.
   
   > Related: if the partition that is spilling is because its upstream 
operator is slow because the upstream operator spilled, I wonder if it would be 
beneficial to have an enum InFlightData { Memory(RecordBatch), Disk { file, 
start, end } } or something like that. The point is: if we are going from one 
spilling operator to another maybe pushing the data around on disk instead of 
loading only to spill it again would make sense. But that'd be a big change.
   
   Passing the spilled files from one operator to another would only work if 
the operator doesn't change the data, which is probably not often the case.
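
   To make the constraint concrete, here is a rough sketch of how such a 
hand-off might look. Everything in it is hypothetical and not DataFusion API: 
`Batch` stands in for arrow's `RecordBatch`, and the `Operator` trait, 
`is_passthrough`, `forward` and the `load` callback are names invented for 
illustration. Only the shape of `InFlightData` follows the suggestion quoted 
above.

```rust
use std::path::{Path, PathBuf};

/// Stand-in for arrow's `RecordBatch` so the sketch has no dependencies.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    rows: Vec<i64>,
}

/// Data travelling between two operators (shape taken from the quoted idea).
#[derive(Debug, Clone, PartialEq)]
enum InFlightData {
    /// Batch held in memory.
    Memory(Batch),
    /// Byte range `start..end` of a spill file on disk.
    Disk { file: PathBuf, start: u64, end: u64 },
}

/// Hypothetical operator interface, invented for this sketch.
trait Operator {
    /// True only if the output is identical to the input.
    fn is_passthrough(&self) -> bool;
    fn apply(&self, batch: Batch) -> Batch;
}

/// Leaves the data untouched, so a spill file can be handed on as-is.
struct PassThrough;
impl Operator for PassThrough {
    fn is_passthrough(&self) -> bool {
        true
    }
    fn apply(&self, batch: Batch) -> Batch {
        batch
    }
}

/// Changes every row, so spilled input has to be loaded first.
struct AddOne;
impl Operator for AddOne {
    fn is_passthrough(&self) -> bool {
        false
    }
    fn apply(&self, batch: Batch) -> Batch {
        Batch {
            rows: batch.rows.into_iter().map(|v| v + 1).collect(),
        }
    }
}

/// Forward the spill file untouched when the operator does not change the
/// data; otherwise load it (via the caller-supplied `load`) and apply the
/// operator in memory.
fn forward(
    op: &dyn Operator,
    input: InFlightData,
    load: impl Fn(&Path, u64, u64) -> Batch,
) -> InFlightData {
    match input {
        InFlightData::Disk { file, start, end } if op.is_passthrough() => {
            InFlightData::Disk { file, start, end }
        }
        InFlightData::Disk { file, start, end } => {
            InFlightData::Memory(op.apply(load(file.as_path(), start, end)))
        }
        InFlightData::Memory(batch) => InFlightData::Memory(op.apply(batch)),
    }
}
```

   Only the first match arm avoids the load-then-spill-again round trip, and 
it is reached only for operators that leave the data untouched.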


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

