[Impala-ASF-CR] IMPALA-5870: Improve explain/profile output for partial sort
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8123 ) Change subject: IMPALA-5870: Improve explain/profile output for partial sort .. Patch Set 1: (3 comments) The profile changes look great to me and make it way less confusing - just had one comment about variable names then I'm happy with that part of the patch. I'm unsure about the explain plan changes - I think that needs more thought and discussion. We could potentially decouple the two parts and get the profile changes in soon. http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG@14 PS1, Line 14: For EXPLAIN, this patch removes the 'spill-buffer' mem-estimate for I'm skeptical about this part of the change. It definitely a wart but there are downsides to changing it. I think reporting the buffer size is still useful. It seems like this is mainly an issue with the name diplayed so if we want to change it I'd prefer directly changing the output bypass in a parameter to the resource profile to controls the display name and otherwise leaving the logic unchanged. Preaggregations also have the exact same naming inconsistency so I think whatever is done here should also be applied there. Another problem is that changing the name makes inconsistent with the query options that control the buffer sizes - min_spillable_buffer_bytes and default_spillable_buffer_bytes. I think the naming of the query options is imperfect but a reasonable compromise - it disambiguates it from the scanner's I/O buffers and most of time it's accurate. In the cases where it's slightly inaccurate the non-spilling operators are at least variants of spilling operators. We could maybe make an argument for decoupling the non-spillable and spillable buffer size query options since the non-spilling sizes don't affect I/O performance. Unclear if it's worth adding more query options though. I guess my preference is probably to leave the explain output unchanged and accept that there's some potential for confusion. I think a lot of the explain_level=2 output is low-level enough that you need to understand the mechanisms in order to interpret it. http://gerrit.cloudera.org:8080/#/c/8123/1//COMMIT_MSG@40 PS1, Line 40: - RunsCreated: 1 (1) This is much better! http://gerrit.cloudera.org:8080/#/c/8123/1/be/src/runtime/sorter.cc File be/src/runtime/sorter.cc: http://gerrit.cloudera.org:8080/#/c/8123/1/be/src/runtime/sorter.cc@1520 PS1, Line 1520: runs_counter_ "runs_counter_" is a bit inaccurate for spilling sorts since it doesn't include merged runs. Maybe we should keep the old variable name and just change the string? "initial_runs_counter_" seems to make sense for both cases, since they are still initial runs for non-spilling sorts. -- To view, visit http://gerrit.cloudera.org:8080/8123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b Gerrit-Change-Number: 8123 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 22 Sep 2017 18:08:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5870: Improve explain/profile output for partial sort
Thomas Tauber-Marshall has uploaded this change for review. ( http://gerrit.cloudera.org:8080/8123 Change subject: IMPALA-5870: Improve explain/profile output for partial sort .. IMPALA-5870: Improve explain/profile output for partial sort A recent change (IMPALA-5498) added the ability to do partial sorts, which divide their input up into runs each of which is sorted individually, avoiding the need to spill. Some of the debug output wasn't updated vs. regular sorts, leading to confusion. For EXPLAIN, this patch removes the 'spill-buffer' mem-estimate for partial sorts, since they can't spill. It does this by setting the spillable buffer size in the resource profile to -1. Since the BE relied on that number to determine the page size for sorts, it now calculates the page size from the min reservation, which gives an equivalent value. For the runtime profile, it removes the counters 'SpilledRuns' and 'MergesPerformed' since they will always be 0, and it renames the 'IntialRunsCreated' counter to 'RunsCreated' since the 'Initial' refers to the fact that in a regular sort those runs may be spilled or merged. It also adds a profile info string 'SortType' that can take the values 'Total', 'TopN', or 'Partial' to reflect the type of exec node being used. Example profile snippet for a partial sort: SORT_NODE (id=2):(Total: 403.261us, non-child: 382.029us, % non-child: 94.73%) SortType: Partial ExecOption: Codegen Enabled - NumRowsPerRun: (Avg: 44 (44) ; Min: 44 (44) ; Max: 44 (44) ; Number of samples: 1) - InMemorySortTime: 34.201us - PeakMemoryUsage: 2.02 MB (2117632) - RowsReturned: 44 (44) - RowsReturnedRate: 109.11 K/sec - RunsCreated: 1 (1) - SortDataSize: 572.00 B (572) Testing: - Manually ran several sorting queries and inspected their explain plans and profile Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b --- M be/src/exec/partial-sort-node.cc M be/src/exec/sort-node.cc M be/src/exec/topn-node.cc M be/src/runtime/sorter.cc M be/src/runtime/sorter.h M fe/src/main/java/org/apache/impala/planner/SortNode.java 6 files changed, 22 insertions(+), 9 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/23/8123/1 -- To view, visit http://gerrit.cloudera.org:8080/8123 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I2b15af78d8299db8edc44ff820c85db1cbe0be1b Gerrit-Change-Number: 8123 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Tauber-Marshall