JanKaul commented on issue #22090:
URL: https://github.com/apache/datafusion/issues/22090#issuecomment-4425320338
> This is not just a theory -- we have seen such deadlocks in production at
> InfluxData -- typically on very skewed datasets
All right, thanks for the input. I thought this would generally be a good
idea, but I don't have as much experience with deadlocks. I'll focus on solving
the problem downstream such that the consumers work more uniformly.
> Related: if the partition is spilling because its upstream operator is
> slow because that upstream operator spilled, I wonder if it would be
> beneficial to have an enum InFlightData { Memory(RecordBatch), Disk { file,
> start, end } } or something like that. The point is: if we are going from one
> spilling operator to another, maybe pushing the data around on disk instead of
> loading it only to spill it again would make sense. But that'd be a big change.
Passing the spilled files from one operator to another would only work if
the operator doesn't change the data, which is probably not that often.
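
To make the trade-off concrete, here is a minimal sketch of the quoted `InFlightData` idea. It is not DataFusion code: `RecordBatch` is a dependency-free stand-in for the Arrow type, and `Operator`, `apply`, and the `load` callback are hypothetical names introduced only for illustration. It shows that a spilled file reference can be forwarded untouched only when the operator leaves the data unchanged; any operator that rewrites rows has to load the file first.

```rust
use std::path::{Path, PathBuf};

/// Stand-in for Arrow's `RecordBatch` so the sketch compiles without dependencies.
#[derive(Debug, Clone, PartialEq)]
pub struct RecordBatch {
    pub rows: Vec<i64>,
}

/// Hypothetical shape of the enum suggested in the quoted comment.
#[derive(Debug, Clone, PartialEq)]
pub enum InFlightData {
    /// Data that is resident in memory.
    Memory(RecordBatch),
    /// Data that lives in a spill file, addressed by a byte range.
    Disk { file: PathBuf, start: u64, end: u64 },
}

/// Hypothetical operators: one forwards its input, one rewrites every row.
pub enum Operator {
    PassThrough,
    AddConstant(i64),
}

/// Applies `op` to `input`. `load` is an assumed callback that reads a
/// byte range of a spill file back into memory.
pub fn apply(
    op: &Operator,
    input: InFlightData,
    load: impl Fn(&Path, u64, u64) -> RecordBatch,
) -> InFlightData {
    match op {
        // The data is unchanged, so a `Disk` reference can be handed on as-is.
        Operator::PassThrough => input,
        // The data changes, so a spilled input must be loaded before rewriting.
        Operator::AddConstant(c) => {
            let batch = match input {
                InFlightData::Memory(batch) => batch,
                InFlightData::Disk { file, start, end } => load(&file, start, end),
            };
            InFlightData::Memory(RecordBatch {
                rows: batch.rows.iter().map(|v| v + c).collect(),
            })
        }
    }
}
```

In this sketch only `PassThrough` avoids the load, which matches the concern above: most operators transform their input, so the disk-to-disk shortcut would apply in few places.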