cetra3 commented on issue #19216:
URL: https://github.com/apache/datafusion/issues/19216#issuecomment-3920442449

   I've played around with some prototype changes and managed to get this 
working.
   
   I've saved the investigation in this branch:
   
   
https://github.com/apache/datafusion/compare/main...pydantic:datafusion:memory_observations
   
   There are a few changes that I needed to make to get this to run to 
completion:
   
   ## Memory Coordinator
   
   The main change is designing a **Memory Coordinator**, which is like a 
memory pool, but adds a few ways to `.await` until memory becomes 
available. For example, if there are consumers waiting for memory, then 
other spillable consumers should spill.
   
   So in one execution path, we wait for memory:
   
   ```rust
   self.memory_consumer.allocate(size, AllocationType::Required).await
   ```
   
   This registers the consumer in a FIFO queue of wakers waiting for 
memory.
   
   Then in another execution path, we periodically check if we should spill:
   
   ```rust
   if self.memory_consumer.should_spill() {
     self.sort_and_spill_in_mem_batches().await?;
   }
   ```
   
   This *does* have the downside of potentially deadlocking on memory, so 
some protections should be in place, maybe a timeout while waiting. 
Regardless, it's better than the current strategy of bailing out entirely 
if memory isn't available at request time.
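   To make the waiting/spilling interplay concrete, here is a minimal synchronous sketch of the coordinator idea. All names (`Coordinator`, `allocate`, `should_spill`, `free`) are hypothetical stand-ins, and the FIFO queue of request sizes stands in for the registered wakers; the real async version would park the task and wake it when memory is freed.
   
   ```rust
   use std::collections::VecDeque;
   
   /// Hypothetical coordinator state: a byte limit, current usage, and a
   /// FIFO queue of pending requests (stand-ins for registered wakers).
   struct Coordinator {
       limit: usize,
       used: usize,
       waiters: VecDeque<usize>,
   }
   
   impl Coordinator {
       fn new(limit: usize) -> Self {
           Coordinator { limit, used: 0, waiters: VecDeque::new() }
       }
   
       /// Try to satisfy a required allocation; if it doesn't fit, queue
       /// the request (the real version would register a waker and `.await`).
       fn allocate(&mut self, size: usize) -> bool {
           if self.used + size <= self.limit {
               self.used += size;
               true
           } else {
               self.waiters.push_back(size);
               false
           }
       }
   
       /// A spillable consumer should spill as soon as anyone is queued.
       fn should_spill(&self) -> bool {
           !self.waiters.is_empty()
       }
   
       /// Freeing memory grants queued requests in FIFO order (the real
       /// version would wake the corresponding tasks here).
       fn free(&mut self, size: usize) {
           self.used = self.used.saturating_sub(size);
           while let Some(next) = self.waiters.front().copied() {
               if self.used + next <= self.limit {
                   self.used += next;
                   self.waiters.pop_front();
               } else {
                   break;
               }
           }
       }
   }
   
   fn main() {
       let mut c = Coordinator::new(100);
       assert!(c.allocate(60));   // fits: 60/100 used
       assert!(!c.allocate(60));  // doesn't fit: queued FIFO
       assert!(c.should_spill()); // a waiter exists, so spillers spill
       c.free(60);                // spilling frees memory; waiter is granted
       assert!(!c.should_spill());
       assert_eq!(c.used, 60);
   }
   ```
   
   The point of the FIFO order is fairness: the consumer that has waited longest is granted memory first, rather than whichever one happens to retry at the right moment.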
   
   ## MultiLevelMergeBuilder
   
   This is the main reason I think things are failing. Essentially, when 
you have more than one partition, one will greedily take all the available 
memory, starving out the other consumers/partitions.
   
   That leaves no headroom for other allocations, like in-memory sorts. 
I've changed this to be a bit nicer: first it will `.try_grow` and await 
the minimum required memory, then it will try to grow the allocation 
further if there is headroom, up to a limit. The limit is based on the 
number of registered consumers, which normally equals the number of 
partitions, but if you have multiple queries running this scales 
accordingly.
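   A rough sketch of that two-phase policy, under my own naming (`plan_allocation` and `fair_share` are hypothetical, not functions from the branch): secure the minimum first, then grow opportunistically toward a per-consumer fair share of the pool.
   
   ```rust
   /// Hypothetical fair-share cap: the pool divided by the number of
   /// registered consumers (normally one per partition).
   fn fair_share(pool_limit: usize, registered_consumers: usize) -> usize {
       pool_limit / registered_consumers.max(1)
   }
   
   /// Two-phase policy: if even the minimum doesn't fit, return `None`
   /// (the caller would `.await` for memory); otherwise grow past the
   /// minimum only up to the headroom and the fair-share cap.
   fn plan_allocation(
       pool_limit: usize,
       pool_used: usize,
       registered_consumers: usize,
       min_required: usize,
       desired: usize,
   ) -> Option<usize> {
       let headroom = pool_limit.saturating_sub(pool_used);
       if headroom < min_required {
           return None;
       }
       let cap = fair_share(pool_limit, registered_consumers);
       Some(desired.min(headroom).min(cap).max(min_required))
   }
   
   fn main() {
       // Four consumers sharing 1000 bytes: a greedy 900-byte request is
       // capped at the 250-byte fair share, leaving room for the others.
       assert_eq!(plan_allocation(1000, 0, 4, 100, 900), Some(250));
       // Not even the minimum fits: wait for memory instead of failing.
       assert_eq!(plan_allocation(1000, 950, 4, 100, 200), None);
   }
   ```
   
   The cap is what prevents the first partition from starving the rest, while still letting a lone consumer use more when fewer consumers are registered.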
   
   ## Allocation Sizes
   
   There is a *lot* of double accounting and other weirdness around how we 
track how much memory is used. I turned off a few areas of double 
accounting that I saw, and I have run this with `dhat` to actually check 
it; it seems to fall under the 4 GB limit. But we need a more holistic 
solution here.
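   As a toy illustration of the double-accounting problem (hypothetical, not DataFusion's actual accounting code): if the operator holding a batch and a child reservation both register the same bytes, the pool overstates usage and rejects allocations that would actually fit.
   
   ```rust
   /// Minimal stand-in for a memory pool that just sums reservations.
   struct Pool {
       used: usize,
   }
   
   impl Pool {
       fn grow(&mut self, bytes: usize) {
           self.used += bytes;
       }
   }
   
   fn main() {
       let batch_bytes = 1024;
       let mut pool = Pool { used: 0 };
       pool.grow(batch_bytes); // the operator accounts for the batch
       pool.grow(batch_bytes); // a child reservation accounts for it again
       // The pool now reports double the memory actually held, so a
       // real allocator would hit its limit at half the true capacity.
       assert_eq!(pool.used, 2 * batch_bytes);
   }
   ```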
   
   
   
   So yeah, a lot of changes are needed for this one. I will work on 
breaking the changes out into separate PRs if people are amenable, 
starting with `MemoryCoordinator`.

