Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/958
  
    @Ben-Zvi, you are right that, in the worst case, this change will allow 
operators to exceed their memory allotment. But that is actually the purpose.
    
    As we know, it is *very* difficult to get memory management just right at 
present due to the wildly varying memory layouts for vectors, power-of-two 
rounding of buffer sizes, unexpected doubling of vectors, and lack of control 
over the size of incoming batches. We'd love to fix these, but doing so will 
take time.
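    
    To make the power-of-two rounding concrete, here is a minimal sketch. It 
is not Drill's allocator code: the class and method names are hypothetical, 
and it only illustrates how a request just over a power of two nearly doubles 
in actual size.

```java
public class PowerOfTwoRounding {
    // Hypothetical helper: round a requested size up to the next power of two.
    // Assumes 1 <= size <= 2^30 so the shift cannot overflow an int.
    public static int roundUp(int size) {
        int highest = Integer.highestOneBit(size);
        return highest == size ? size : highest << 1;
    }

    public static void main(String[] args) {
        int requested = 1025;               // one byte past 1024
        int allocated = roundUp(requested); // rounded up to 2048
        System.out.println(requested + " bytes requested, "
            + allocated + " bytes allocated");
    }
}
```

    A size estimate based on the requested bytes can therefore be low by 
almost a factor of two for a single buffer.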
    
    In the meantime, we have the choice of failing queries because the 
calculations are off by a bit, or being more flexible and letting queries 
succeed at the risk of running out of memory. The change here does log each 
"excess" allocation so we can find them and fix any remaining issues. Also, in 
a test environment, strict limits are enforced to find bugs.
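    
    The policy described above can be sketched roughly as follows. This is 
an illustration only, not the code in this PR: `LenientLimit`, its fields, and 
the `strict` flag are hypothetical names standing in for the real allocator 
logic.

```java
import java.util.logging.Logger;

public class LenientLimit {
    private static final Logger LOG = Logger.getLogger("LenientLimit");

    private final long limit;     // memory allotment in bytes
    private final boolean strict; // true in a test environment (assumption)
    private long allocated;       // bytes currently accounted for
    private int excessCount;      // number of allocations that went over

    public LenientLimit(long limit, boolean strict) {
        this.limit = limit;
        this.strict = strict;
    }

    public void allocate(long bytes) {
        long newTotal = allocated + bytes;
        if (newTotal > limit) {
            if (strict) {
                // Strict mode: fail fast so the bug is found in testing.
                throw new IllegalStateException(
                    "Limit " + limit + " exceeded: " + newTotal);
            }
            // Lenient mode: let the query proceed, but log the excess
            // so it can be found and fixed later.
            excessCount++;
            LOG.warning("Excess allocation: " + newTotal
                + " bytes against a limit of " + limit);
        }
        allocated = newTotal;
    }

    public long allocated() { return allocated; }

    public int excessCount() { return excessCount; }
}
```

    With a limit of 100 bytes, allocating 80 and then 40 succeeds in lenient 
mode and records one excess allocation, while a strict instance throws on the 
first request that crosses the limit.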
    
    All of this is set against the backdrop of the exchange operators, hash 
join, and other operators that have an unlimited appetite for memory. Until we 
rein in those operators, it seems silly to kill user queries because those 
operators that *do* manage memory make a small mistake here or there.
    
    Once all operators are under control, and Drill's internal memory 
allocation is more predictable, we can back out this change and be much 
more strict about enforcing memory limits.
    
    Bottom line: should we fail user queries because of remaining rough spots 
in the "managed" operators? Or, should we allow user queries to succeed at a 
very small additional risk of running out of memory?

