I missed this thread because its subject line read like a GitHub discussion.
Comments below.
On Friday, April 27, 2018, 5:42:37 PM PDT, salim achouche
<[email protected]> wrote:
> Another point, I don't see a functional benefit from avoiding a change of
> ownership for pass-through operators.
Please read my responses to Vlad. Change of ownership is critical to how
Drill's memory allocators work today. Of course, you are right that, if we
could do a new design (perhaps based on the budget-based approach), we would
not need the ownership stuff. But, without ownership changes now, the existing
allocators will simply cause us all manner of problems. In particular, none of
the spill logic added to Sort or HashAgg would work, because it relies on a
properly functioning allocator.
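One way to see the dependency is with a toy model. The sketch below is not
Drill's allocator API; ToyAllocator, run and the 20 MB limit are made up for
illustration. It assumes a buffering operator decides when to spill by looking
only at the bytes charged to its own allocator. If incoming batches are never
transferred to it, that number stays at zero and the spill check never fires.

```python
MB = 1024 * 1024
BATCH = 8 * MB


class ToyAllocator:
    """Hypothetical stand-in for a per-operator allocator (not a Drill class)."""

    def __init__(self, name, limit):
        self.name = name
        self.limit = limit
        self.allocated = 0

    def allocate(self, size):
        self.allocated += size

    def transfer_to(self, other, size):
        # Ownership change: the bytes leave this ledger and join the other one.
        self.allocated -= size
        other.allocated += size


def run(batches, transfer_ownership):
    """Feed batches into a toy sort; return (spills, upstream bytes, sort bytes)."""
    upstream = ToyAllocator("upstream", limit=64 * MB)
    sort = ToyAllocator("sort", limit=20 * MB)
    spills = 0
    for _ in range(batches):
        upstream.allocate(BATCH)
        # The spill decision consults only the sort's own ledger.
        if sort.allocated + BATCH > sort.limit:
            spills += 1
            sort.allocated = 0  # pretend the buffered batches went to disk
        if transfer_ownership:
            upstream.transfer_to(sort, BATCH)
    return spills, upstream.allocated, sort.allocated


with_transfer = run(4, transfer_ownership=True)
without_transfer = run(4, transfer_ownership=False)
print(with_transfer, without_transfer)
```

With the transfer, the toy sort spills once and the upstream ledger returns to
zero. Without it, the sort never spills and all 32 MB stay charged upstream.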
> Consider the following use-cases:
> Example I -
> - Single batch of size 8MB is received at time t0 and then is passed
>   through a set of pass-through operators
> - At time t1 owned by operator Opr1, time t2 owned by operator Opr2, and so
>   forth
> - Assume we report memory usage at time t0 - t2; this is what will be seen
>   - t0: (fragment, opr-1, opr-2) = (8MB, 0, 0)
>   - t1: (fragment, opr-1, opr-2) = (0, 8MB, 0)
>   - t2: (fragment, opr-1, opr-2) = (0, 0, 8MB)
You are right. Each minor fragment is single-threaded: only one operator is
"active" at a time as control passes from downstream to upstream operators.
(Yes, this is the unfortunate Drill terminology: downstream calls upstream, and
data flows in the direction opposite to the calls.)
This single-threaded model is the insight behind the budget-based memory model.
But, to get there, we must consider the whole system; unfortunately, we can't
just make localized changes.
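The numbers in Example I can be reproduced with a toy ledger. This is only a
sketch: trace_single_batch and the holder labels are illustrative, taken from
the example rather than from any Drill class. One 8 MB batch changes owner at
each step, and a snapshot is taken after each change.

```python
MB = 1024 * 1024


def trace_single_batch(holders, batch_size):
    """Return one usage snapshot (in MB) per step as the batch changes owner."""
    usage = {h: 0 for h in holders}
    snapshots = []
    previous = None
    for owner in holders:
        if previous is not None:
            usage[previous] -= batch_size  # the old owner gives the batch up
        usage[owner] += batch_size         # the new owner is charged for it
        snapshots.append(tuple(usage[h] // MB for h in holders))
        previous = owner
    return snapshots


snapshots = trace_single_batch(["fragment", "opr-1", "opr-2"], 8 * MB)
for t, snap in enumerate(snapshots):
    print(f"t{t}: {snap}")
# t0: (8, 0, 0)
# t1: (0, 8, 0)
# t2: (0, 0, 8)
```

Every snapshot sums to 8 MB: the batch is charged to exactly one holder at a
time, which is what the single-threaded model implies.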
> Example II -
> - Multiple batches of size 8MB are received at time t0 - t2 and then are
>   passed through a set of pass-through operators
> - At time t1 owned by operator Opr1, time t2 owned by operator Opr2, and so
>   forth
> - Assume we report memory usage at time t0 - t2; this is what will be seen
>   - t0: (fragment, opr-1, opr-2) = (8MB, 0, 0)
>   - t1: (fragment, opr-1, opr-2) = (8MB, 8MB, 0)
>   - t2: (fragment, opr-1, opr-2) = (8MB, 8MB, 8MB)
The above can, AFAIK, never happen. A batch is owned by an operator, not a
fragment. A batch passes up the operator tree until it reaches the top or until
it reaches a "buffering" operator such as Sort.
> The key thing is that we clarify our reporting metrics so that users do not
> draw the wrong conclusions.
This is a good thing. But, we need to understand how the batches flow and
report that accurately. Further, we must deeply understand this flow if we want
to move to budget-based allocation without per-operator allocators.
Let's separate various concepts. First is the instantaneous "stats" maintained
by each operator allocator to enforce memory limits. Second is the total data
that has passed through an operator. Third is the maximum memory used at any
one time over the life of the operator.
These are all very useful, but they measure different things.
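To show how the three numbers diverge, here is a minimal sketch. The class
OperatorMemoryStats and its method names are hypothetical, not something that
exists in Drill. It tracks a pass-through operator that receives three 8 MB
batches, one at a time, and hands each one on before the next arrives.

```python
class OperatorMemoryStats:
    """Hypothetical per-operator counters for the three concepts above."""

    def __init__(self):
        self.current = 0        # instantaneous bytes held right now
        self.total_through = 0  # cumulative bytes that have passed through
        self.peak = 0           # largest value 'current' has ever reached

    def take(self, size):
        # The operator becomes owner of 'size' bytes.
        self.current += size
        self.total_through += size
        self.peak = max(self.peak, self.current)

    def give(self, size):
        # The operator hands 'size' bytes to the next owner (or frees them).
        self.current -= size


stats = OperatorMemoryStats()
for _ in range(3):
    stats.take(8)  # sizes in MB
    stats.give(8)

print(stats.current, stats.total_through, stats.peak)
# 0 24 8
```

The same operator reports 0 MB held now, 24 MB processed in total, and an 8 MB
peak, so a report has to say which of the three it is showing.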
Thanks,
- Paul