Hi Igor,
You asked about the fixed-size block idea. This is the classic DB memory
management mechanism: a "buffer pool" consisting of some number of fixed-size
blocks. Memory allocation is simply a matter of grabbing a buffer from the
pool. Freeing memory returns the buffer to the pool. Since all blocks are of
the same size (and basically live forever), there is no external fragmentation.
In an RDBMS, the buffer block size matches the disk page size. Tables and
indexes are read directly from disk into a buffer block. Spilling simply takes
the least recently used blocks and pages them out to disk, rather like a
virtual memory system.
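To make the mechanism concrete, here is a minimal, single-threaded sketch in
Java (the names are hypothetical, not Drill's actual API). Allocation pops a
free block; freeing pushes it back. A real pool would also need thread safety
and a spill policy:

    import java.util.ArrayDeque;
    import java.util.Deque;

    // Minimal buffer pool sketch: all blocks are the same size and live
    // for the lifetime of the pool, so there is no external fragmentation.
    class BufferPool {
      private final Deque<byte[]> freeList = new ArrayDeque<>();

      BufferPool(int blockCount, int blockSize) {
        for (int i = 0; i < blockCount; i++) {
          freeList.push(new byte[blockSize]);
        }
      }

      byte[] allocate() {
        byte[] block = freeList.poll();
        if (block == null) {
          // Pool exhausted: this is where an RDBMS would spill an LRU block.
          throw new IllegalStateException("pool exhausted");
        }
        return block;
      }

      void release(byte[] block) {
        freeList.push(block);
      }
    }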
Drill is not an RDBMS and does not have disk blocks, of course. So, in our
case, a block would represent a vector, or a "slice" of a vector. Let's say we
have a VARCHAR column that, for whatever reason, needs to be 10 MB in size. It
would be formed by chaining 10 blocks of, say, 1 MB each. Netty already does
this with its composite ByteBuf, so it is not a new idea.
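For example, here is an illustrative use of Netty's composite buffer to stitch
the 10 MB column above out of ten 1 MB blocks (the block count and sizes are
just the numbers from this example):

    import io.netty.buffer.ByteBuf;
    import io.netty.buffer.CompositeByteBuf;
    import io.netty.buffer.Unpooled;

    public class CompositeDemo {
      public static void main(String[] args) {
        // Stitch ten 1 MB blocks into one logical 10 MB buffer.
        CompositeByteBuf vector = Unpooled.compositeBuffer();
        for (int i = 0; i < 10; i++) {
          ByteBuf block = Unpooled.buffer(1 << 20, 1 << 20);
          block.writerIndex(block.capacity()); // treat the whole block as content
          vector.addComponent(true, block);    // true: grow the composite's writerIndex
        }
        // One logical index; Netty resolves which block internally.
        vector.setByte((9 << 20) + 123, 42);
      }
    }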
Drill vectors are power-of-two sized: 1 MB, 2 MB, ..., 16 MB. If you have 8 MB +
1 byte of data, you need a 16 MB vector, and you have internal fragmentation
(wasted space) of about 8 MB. I believe Arrow works the same way. With blocks,
you need 9 blocks of 1 MB each. In this case, we reduce internal fragmentation
as well as external fragmentation. (On average, Drill (and Arrow) vectors will
be 3/4 full: the first half is always full, while the second half is filled to
a random level, averaging half full.)
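A quick sanity check of that arithmetic, assuming 1 MB blocks:

    // Waste for 8 MB + 1 byte of data under both schemes.
    public class FragmentationDemo {
      public static void main(String[] args) {
        long data = (8L << 20) + 1;
        long blockSize = 1L << 20;
        // Round up to the next power of two (valid for data > 1): 16 MB.
        long vectorSize = Long.highestOneBit(data - 1) << 1;
        // Ceiling divide: 9 blocks.
        long blockCount = (data + blockSize - 1) / blockSize;
        System.out.println("vector waste: " + (vectorSize - data));           // 8388607 (~8 MB)
        System.out.println("block waste:  " + (blockCount * blockSize - data)); // 1048575 (<1 MB)
      }
    }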
Drill's current memory allocator goes to great lengths to avoid locks during
allocation. In the fixed-size block scheme, each thread might get a mini-pool
of blocks. The most-upstream operator pulls blocks from the mini-pool. The
most-downstream operator puts blocks back into the pool after writing the data
somewhere. So, this scheme can be "lock free" in the steady-state case.
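A sketch of the mini-pool idea, building on the BufferPool sketch above (again
hypothetical; the shared pool would have to be thread-safe in a real
implementation, but it is touched only on the rare refill/drain paths):

    import java.util.ArrayDeque;

    // Per-thread mini-pool: in steady state, allocate() and release()
    // touch only thread-confined state, so no locks are needed.
    class MiniPool {
      private static final int MAX_LOCAL = 16;
      private final BufferPool shared;
      private final ArrayDeque<byte[]> local = new ArrayDeque<>();

      MiniPool(BufferPool shared) { this.shared = shared; }

      byte[] allocate() {
        byte[] block = local.poll();
        return (block != null) ? block : shared.allocate(); // rare slow path
      }

      void release(byte[] block) {
        if (local.size() < MAX_LOCAL) {
          local.push(block);      // fast path: thread-local, lock-free
        } else {
          shared.release(block);  // hand excess back to the shared pool
        }
      }
    }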
Reading and writing to blocks requires an indirection to handle multiple
blocks. But, even with existing vectors, we need to do a check on every access
(to ensure the request is in bounds and to grow the vector on write if it is
full). Our experiments showed we could get basically the same performance with
fixed-size blocks as with variable-size vectors because both need to do that
per-access check.
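For what it's worth, the indirection is cheap when the block size is a power of
two: "which block" is a shift and "where in the block" is a mask. A sketch of
the write path (illustrative names, not Drill's actual vector code):

    import java.util.Arrays;

    // Write path with fixed-size blocks: the per-access check decides
    // whether a new block is needed, much as a vector write checks bounds
    // and reallocates.
    class BlockedBuffer {
      private static final int BLOCK_SHIFT = 20;            // 1 MB blocks
      private static final int BLOCK_SIZE = 1 << BLOCK_SHIFT;

      private byte[][] blocks = new byte[0][];

      void setByte(long index, byte value) {
        int block = (int) (index >>> BLOCK_SHIFT);          // which block
        int offset = (int) (index & (BLOCK_SIZE - 1));      // offset within it
        while (block >= blocks.length) {                    // the per-access check
          blocks = Arrays.copyOf(blocks, blocks.length + 1);
          blocks[blocks.length - 1] = new byte[BLOCK_SIZE]; // from the pool, in practice
        }
        blocks[block][offset] = value;
      }

      byte getByte(long index) {
        return blocks[(int) (index >>> BLOCK_SHIFT)][(int) (index & (BLOCK_SIZE - 1))];
      }
    }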
Are fixed-size blocks the solution? Maybe, maybe not. It is an old technology;
there are probably newer alternatives. (And, again, if we use heap memory, the
issue is moot.) The key point is, let's think outside the ValueVector/Arrow
Vector box.
So, again, I'd suggest we do two things:
1. Identify our project goals.
2. Do some prototyping to see which choice best achieves these goals at what
cost.
Thanks,
- Paul
On Friday, January 10, 2020, 01:38:43 PM PST, Igor Guzenko
<[email protected]> wrote:
Hello Paul,
Thank you very much for your active participation in this discussion. I
agree that we shouldn't blindly accept Arrow as the only option in the
world.
Also, I would like to learn more about fixed-size blocks, so I'll read
the paper and hope to have some related ideas to discuss later.
Thanks,
Igor