Github user paul-rogers commented on the issue:
https://github.com/apache/drill/pull/958
@Ben-Zvi, you are right that, in the worst case, this change will allow
operators to exceed the memory allotment. But, that is actually the purpose.
As we know, it is *very* difficult to get memory management just right at
present due to the wildly varying memory layouts for vectors, power-of-two
rounding of buffer sizes, unexpected doubling of vectors, and lack of control
over the size of incoming batches. We'd love to fix these, but doing so will
take time.
In the meantime, we have the choice of failing queries because the calcs
are off by a bit, or being more flexible and letting queries succeed at the
risk of running out of memory. The change here does log each "excess"
allocation so we can find them and fix any remaining issues. Also, in a test
environment, strict limits are enforced to find bugs.
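
As a rough illustration of the lenient-versus-strict behavior described above, here is a minimal Java sketch. It is not Drill's allocator code: the class `LenientLimit`, its `reserve`/`release` methods, and the in-memory `excessLog` are hypothetical names invented for this example, and a real implementation would write to a logger rather than a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch only; none of these names come from Drill.
final class LenientLimit {
  private final long limit;       // memory allotment in bytes
  private final boolean strict;   // true in a test environment
  private long allocated;         // bytes currently reserved
  private final List<String> excessLog = new ArrayList<>();

  LenientLimit(long limit, boolean strict) {
    this.limit = limit;
    this.strict = strict;
  }

  // Reserve bytes. Over the limit: fail in strict mode, otherwise
  // record the excess and let the allocation proceed.
  void reserve(long bytes) {
    long newTotal = allocated + bytes;
    if (newTotal > limit) {
      String msg = "Excess allocation: requested " + bytes
          + ", total " + newTotal + ", limit " + limit;
      if (strict) {
        throw new IllegalStateException(msg);
      }
      excessLog.add(msg);
    }
    allocated = newTotal;
  }

  // Return bytes to the budget, never dropping below zero.
  void release(long bytes) {
    allocated = Math.max(0, allocated - bytes);
  }

  long allocated() {
    return allocated;
  }

  List<String> excessLog() {
    return excessLog;
  }
}
```

With a limit of 100, reserving 80 and then 40 succeeds in lenient mode and leaves one entry in the excess log; in strict mode the second call throws and the total stays at 80.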
All of this is set against the backdrop of the exchange operators, hash
join, and other operators that have an unlimited appetite for memory. Until we
rein in those operators, it seems silly to kill user queries because those
operators that *do* manage memory make a small mistake here or there.
Once all operators are under control, and Drill's internal memory
allocation is more predictable, we can back out this change and be much
more strict about enforcing memory limits.
Bottom line: should we fail user queries because of remaining rough spots
in the "managed" operators? Or, should we allow user queries to succeed at a
very small additional risk of running out of memory?
---