Also, the old hash join and aggregation implementations will be removed, and
the --enable_partitioned_aggregation=false and
--enable_partitioned_hash_join=false flags will become no-ops.

Another change that may be relevant for development is that the
max_block_mgr_memory query option is replaced with buffer_pool_limit. It
has essentially the same effect for now.
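
For anyone with test or tooling scripts that still set the old option, here is
a minimal sketch of how the rename could be handled. Only the two option names
come from this message; the helper names and the dict-based option handling
are hypothetical and are not part of Impala.

```python
# Hypothetical helpers: only the two option names below come from the
# message above. The function names and dict-based handling are assumed.
OLD_OPTION = "max_block_mgr_memory"
NEW_OPTION = "buffer_pool_limit"


def migrate_query_options(options):
    """Return a copy of `options` with the old option renamed to the new one."""
    migrated = dict(options)
    if OLD_OPTION in migrated:
        old_value = migrated.pop(OLD_OPTION)
        # If the new option was also set explicitly, keep that value.
        migrated.setdefault(NEW_OPTION, old_value)
    return migrated


def to_set_statements(options):
    """Render options as SET statements, sorted by option name."""
    return ["SET {}={};".format(name, value)
            for name, value in sorted(options.items())]
```

Calling migrate_query_options() on a dict that contains max_block_mgr_memory
returns a new dict keyed by buffer_pool_limit and leaves the input untouched.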

On Fri, Jul 14, 2017 at 12:16 PM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> Hi All,
>   I have +2s on the main patches to switch query execution over from the
> old BufferedBlockMgr to the new BufferPool. I wanted to let everyone know
> what to expect. I'll do it in a question and answer format.
>
> *When is the merge happening?*
> Once I've done enough testing (stress test, perf, etc) to be confident
> that merging the changes won't disrupt other people's workflow. Probably
> mid-to-late next week.
>
> *Is the new code production ready?*
> No, although it is already more robust in many ways than the old code. We
> need time to test it and have people play around with it to find any bugs
> or usability problems before we release it.
>
> I also need to finish executing the full test plan
> <https://docs.google.com/document/d/10glhb7KKc_2JeSMQTxb0Zc_l7A-w1IqsmyqcgcJj_Ao/edit?usp=sharing>.
> I'm using IMPALA-3200 <https://issues.apache.org/jira/browse/IMPALA-3200>
> to track completion of the test plan and remaining issues.
>
>
> *Is this the last piece of the memory management/spill-to-disk work?*
> No, this is a big milestone, but there are a lot of other improvements that
> will be unblocked by this, e.g. HDFS scan memory improvements, admission
> control improvements, spill-to-disk performance improvements, and
> simplification of memory transfer.
>
> I've tried to link all the JIRAs to IMPALA-3200.
> <https://issues.apache.org/jira/browse/IMPALA-3200>
>
> *How will the change affect production users of Impala?*
> In most circumstances the switch should be non-disruptive. We've put a lot
> of effort into this part of the design and testing.
>
> Query memory requirements will generally decrease, although they may
> increase somewhat in some circumstances because more memory is reserved
> upfront instead of allocated best-effort. We now have more safety-valve
> tuning parameters that give some degree of control over this.
>
> "Memory limit exceeded" will become a lot less frequent - instead the
> query will fail at startup if it cannot get its initial memory reservation.
> IMPALA-4834 <https://issues.apache.org/jira/browse/IMPALA-4834> tracks
> fixing the remaining common cases of "memory limit exceeded".
>
> Spill-to-disk performance may change somewhat because so much code has
> changed, and the default spill buffer size is smaller (2MB vs. 8MB), but in
> my experiments the performance change has been within variance.
>
> In-memory performance of large aggs and joins will improve significantly
> when transparent huge pages are available, because of reduced TLB misses.
>
> *How does this affect my development workflow?*
> Hopefully not much at all, unless you are working directly on
> spill-to-disk or memory management code. In the medium to long term this
> switch will unblock a lot of other work. It will also become easier to
> understand and test the spill-to-disk code - we'll catch more bugs in
> functional testing instead of stress testing.
>
> If you are working on adding new operators, you should be thinking about
> how they can be made to operate within a memory constraint: IMPALA-4834
> <https://issues.apache.org/jira/browse/IMPALA-4834>
>
> If you see anything weird, check whether there's a JIRA linked to
> IMPALA-3200 <https://issues.apache.org/jira/browse/IMPALA-3200>, or just
> file a bug and assign it to me.
>
> *Can I switch back to the old code with a flag?*
> No, it would have been very difficult to keep both versions of the code
> enabled side-by-side. It would have been very difficult to test the old
> code as well. One of the main goals of the work is to reduce the volume of
> legacy code we have to maintain.
>
> *How can I learn about the new code?*
> I'm working on putting together slides with a high-level summary. The new
> APIs are documented in detail in the headers in be/src/runtime/bufferpool/.
> I'd also recommend looking at explain plans with explain_level=2, which has
> information about memory reservations.
>
> Thanks everyone, please reach out if you have any questions or concerns.
>
> - Tim
>
