Also, the old hash join and aggregation implementations will be removed, and the --enable_partitioned_aggregation=false and --enable_partitioned_hash_join=false flags will become no-ops.
Another change that may be relevant for development is that the max_block_mgr_memory query option is replaced with buffer_pool_limit. It has essentially the same effect for now.

On Fri, Jul 14, 2017 at 12:16 PM, Tim Armstrong <tarmstr...@cloudera.com> wrote:

> Hi All,
> I have +2s on the main patches to switch query execution over from the
> old BufferedBlockMgr to the new BufferPool. I wanted to let everyone know
> what to expect. I'll do it in a question and answer format.
>
> *When is the merge happening?*
> Once I've done enough testing (stress test, perf, etc.) to be confident
> that merging the changes won't disrupt other people's workflow. Probably
> mid-to-late next week.
>
> *Is the new code production ready?*
> No, although it is already more robust in many ways than the old code. We
> need time to test it and have people play around with it to find any bugs
> or usability problems before we release it.
>
> I also need to finish executing the full test plan
> <https://docs.google.com/document/d/10glhb7KKc_2JeSMQTxb0Zc_l7A-w1IqsmyqcgcJj_Ao/edit?usp=sharing>.
> I'm using IMPALA-3200 <https://issues.apache.org/jira/browse/IMPALA-3200>
> to track completion of the test plan and remaining issues.
>
> *Is this the last piece of the memory management/spill-to-disk work?*
> No, this is a big milestone, but there are a lot of other improvements
> that will be unblocked by this, e.g. HDFS scan memory improvements,
> admission control improvements, spill-to-disk performance improvements,
> simplification of memory transfer.
>
> I've tried to link all the JIRAs to IMPALA-3200
> <https://issues.apache.org/jira/browse/IMPALA-3200>.
>
> *How will the change affect production users of Impala?*
> In most circumstances the switch should be non-disruptive. We've put a lot
> of effort into this part of the design and testing.
>
> Query memory requirements will generally decrease, although they may
> increase somewhat in some circumstances because more memory is reserved
> upfront, instead of allocated best-effort. We now have more safety-valve
> tuning parameters that give some degree of control over this.
>
> "Memory limit exceeded" will become a lot less frequent - instead the
> query will fail at startup if it cannot get its initial memory reservation.
> IMPALA-4834 <https://issues.apache.org/jira/browse/IMPALA-4834> tracks
> fixing the remaining common cases of "memory limit exceeded".
>
> Spill-to-disk performance may change somewhat because so much code has
> changed, and the default spill buffer size is smaller (2MB vs. 8MB), but in
> my experiments the performance change has been within variance.
>
> In-memory performance of large aggs and joins will improve significantly
> if transparent huge pages are available, due to reduced TLB misses.
>
> *How does this affect my development workflow?*
> Hopefully not much at all, unless you are working directly on
> spill-to-disk or memory management code. In the medium to long term this
> switch will unblock a lot of other work. It will also become easier to
> understand and test the spill-to-disk code - we'll catch more bugs in
> functional testing instead of stress testing.
>
> If you are working on adding new operators you should be thinking about
> how they can be made to operate within a memory constraint: IMPALA-4834
> <https://issues.apache.org/jira/browse/IMPALA-4834>
>
> If you see anything weird, check to see if there's a JIRA linked to
> IMPALA-3200 <https://issues.apache.org/jira/browse/IMPALA-3200>, or just
> file a bug and assign it to me.
>
> *Can I switch back to the old code with a flag?*
> No, it would have been very difficult to keep both versions of the code
> enabled side-by-side. It would have been very difficult to test the old
> code as well.
> One of the main goals of the work is to reduce the volume of
> legacy code we have to maintain.
>
> *How can I learn about the new code?*
> I'm working on putting together slides with a high-level summary. The new
> APIs are documented in detail in the headers in be/src/runtime/bufferpool/.
> I'd also recommend looking at explain plans with explain_level=2, which has
> information about memory reservations.
>
> Thanks everyone, please reach out if you have any questions or concerns.
>
> - Tim
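
For anyone who wants a concrete picture of the "reserve upfront, fail at startup" behaviour described in the quoted message, here is a rough toy model in Python. It is only a sketch of the general idea: ToyBufferPool, ReservationError and the byte sizes are made up for illustration and are not Impala's BufferPool API, which is documented in the headers under be/src/runtime/bufferpool/.

```python
class ReservationError(Exception):
    """Raised when a query cannot obtain its initial reservation."""


class ToyBufferPool:
    """Hypothetical pool with upfront reservations (not Impala's real API)."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.reserved_bytes = 0

    def reserve(self, query_id, nbytes):
        # Reject the query before it starts, rather than letting it hit
        # "memory limit exceeded" partway through execution.
        available = self.limit_bytes - self.reserved_bytes
        if nbytes > available:
            raise ReservationError(
                f"{query_id}: cannot reserve {nbytes} bytes "
                f"({available} available)")
        self.reserved_bytes += nbytes

    def release(self, nbytes):
        # Return a finished query's reservation to the pool.
        self.reserved_bytes = max(0, self.reserved_bytes - nbytes)


MB = 1024 * 1024
pool = ToyBufferPool(limit_bytes=64 * MB)  # illustrative limit only

pool.reserve("q1", 48 * MB)       # fits, so q1 may start
try:
    pool.reserve("q2", 32 * MB)   # only 16MB left, so q2 fails at startup
    q2_started = True
except ReservationError:
    q2_started = False
```

In this model q2 can be retried once q1 releases its reservation. The trade-off is the one mentioned above: a query may claim more memory at startup than best-effort allocation would have used, in exchange for fewer failures mid-query.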