[Impala-ASF-CR] IMPALA-13304: Include aggregate instance-level metrics within experimental profile(V2)
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21683 ) Change subject: IMPALA-13304: Include aggregate instance-level metrics within experimental profile(V2) .. Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/21683/4/be/src/util/runtime-profile.cc File be/src/util/runtime-profile.cc: http://gerrit.cloudera.org:8080/#/c/21683/4/be/src/util/runtime-profile.cc@2952 PS4, Line 2952: lock_guard l(event_sequence_lock_); Given the number of potential memory allocations below, it is going to be better to pre-allocate a vector and copy the events into it for processing so that the spinlock is not held very long. http://gerrit.cloudera.org:8080/#/c/21683/4/be/src/util/runtime-profile.cc@2953 PS4, Line 2953: if (!event_sequence_map_.empty()) { Is the empty case common enough to warrant a branch? http://gerrit.cloudera.org:8080/#/c/21683/4/be/src/util/runtime-profile.cc@2960 PS4, Line 2960: vector labels(event_sequence_contents.labels.size()); Can these structures get reused? http://gerrit.cloudera.org:8080/#/c/21683/4/be/src/util/runtime-profile.cc@2982 PS4, Line 2982: vector min_ts_list(BUCKET_SIZE, max); Also try to re-use these vectors. -- To view, visit http://gerrit.cloudera.org:8080/21683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I49e18a7a7e1288e3e674e15b6fc86aad60a08214 Gerrit-Change-Number: 21683 Gerrit-PatchSet: 4 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar Gerrit-Comment-Date: Mon, 23 Sep 2024 12:43:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13233: Improve display of instance-level skew in query timeline
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21593 ) Change subject: IMPALA-13233: Improve display of instance-level skew in query timeline .. Patch Set 8: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21593 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ied8a5966e9e4111bf7aa25aee11d23881daad7d2 Gerrit-Change-Number: 21593 Gerrit-PatchSet: 8 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar Gerrit-Comment-Date: Thu, 12 Sep 2024 15:24:10 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13233: Improve display of instance-level skew in query timeline
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21593 ) Change subject: IMPALA-13233: Improve display of instance-level skew in query timeline .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/21593/3/www/scripts/query_timeline/fragment_diagram.js File www/scripts/query_timeline/fragment_diagram.js: http://gerrit.cloudera.org:8080/#/c/21593/3/www/scripts/query_timeline/fragment_diagram.js@524 PS3, Line 524: var name_flds = pp.profile_name.split(/[()]/); : var node_type = name_flds[0].trim(); : var node_id = name_flds.length > 1 ? name_flds[1].split(/[=]/)[1] : 0; > Same here, I am not sure, why it was kept that way, need to confirm. Are their any implications of using string vs integer in this code? http://gerrit.cloudera.org:8080/#/c/21593/3/www/scripts/query_timeline/fragment_diagram.js@548 PS3, Line 548: node_metadata.join_build_id > I will confirm with @kdeschle, I also could not find it. join_build_id should be an mt_dop case that uses join build but I don't remember the details of the json content. -- To view, visit http://gerrit.cloudera.org:8080/21593 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ied8a5966e9e4111bf7aa25aee11d23881daad7d2 Gerrit-Change-Number: 21593 Gerrit-PatchSet: 3 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar Gerrit-Comment-Date: Tue, 27 Aug 2024 12:07:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10995: Allow remote submit time to be in the past
Kurt Deschler has abandoned this change. ( http://gerrit.cloudera.org:8080/17985 ) Change subject: IMPALA-10995: Allow remote submit time to be in the past .. Abandoned -- To view, visit http://gerrit.cloudera.org:8080/17985 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: If452d5ea48030c134ae4654e58b157f4fd24df04 Gerrit-Change-Number: 17985 Gerrit-PatchSet: 2 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Wenzhe Zhou
[Impala-ASF-CR] IMPALA-12576: Display warning on the query timeline page for the experimental profile
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21596 ) Change subject: IMPALA-12576: Display warning on the query timeline page for the experimental profile .. Patch Set 2: Is it possible to add an automated test? Not having automated tests for WebUI features can cause these tickets to get flagged as missing test coverage. -- To view, visit http://gerrit.cloudera.org:8080/21596 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6ad62be3572b7a126ce10dc4a11fcc9b8a58013d Gerrit-Change-Number: 21596 Gerrit-PatchSet: 2 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Thu, 15 Aug 2024 15:25:47 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13233: Represent phases of each instance separately on the query timeline
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21593 ) Change subject: IMPALA-13233: Represent phases of each instance separately on the query timeline .. Patch Set 3: (4 comments) http://gerrit.cloudera.org:8080/#/c/21593/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21593/3//COMMIT_MSG@7 PS3, Line 7: IMPALA-13233: Represent phases of each instance separately on the query timeline Improve display of instance-level skew in query timeline http://gerrit.cloudera.org:8080/#/c/21593/3//COMMIT_MSG@16 PS3, Line 16: being shown parallelly aggregated and displayed as a histogram http://gerrit.cloudera.org:8080/#/c/21593/3//COMMIT_MSG@21 PS3, Line 21: are bucketed into 4 groups. Each group's timestamps are averaged and The histogram should display the min/max values somehow sicne averages can be very misleading with outliers. http://gerrit.cloudera.org:8080/#/c/21593/3/www/scripts/query_timeline/fragment_diagram.js File www/scripts/query_timeline/fragment_diagram.js: http://gerrit.cloudera.org:8080/#/c/21593/3/www/scripts/query_timeline/fragment_diagram.js@263 PS3, Line 263: if (dx > ignore_px) { With the bucketed display, there is no good reason to skip displaying small skews. That was done previously because they could not be rendered. -- To view, visit http://gerrit.cloudera.org:8080/21593 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ied8a5966e9e4111bf7aa25aee11d23881daad7d2 Gerrit-Change-Number: 21593 Gerrit-PatchSet: 3 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar Gerrit-Comment-Date: Thu, 15 Aug 2024 15:21:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12906: Incorporate scan range information into the tuple cache key
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21541 ) Change subject: IMPALA-12906: Incorporate scan range information into the tuple cache key .. Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/21541/4/be/src/exec/tuple-cache-node.h File be/src/exec/tuple-cache-node.h: http://gerrit.cloudera.org:8080/#/c/21541/4/be/src/exec/tuple-cache-node.h@21 PS4, Line 21: #include Why include vector in this header? http://gerrit.cloudera.org:8080/#/c/21541/4/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/21541/4/be/src/scheduling/scheduler.cc@600 PS4, Line 600: bool operator<(InstanceAssignment& other) const { return weight > other.weight; } Make this deterministic for equal weight case by comparing instance_idx. http://gerrit.cloudera.org:8080/#/c/21541/4/be/src/scheduling/scheduler.cc@816 PS4, Line 816: return ScanRangeWeight(a) > ScanRangeWeight(b); Make this deterministic for equal weight http://gerrit.cloudera.org:8080/#/c/21541/4/be/src/scheduling/scheduler.cc@821 PS4, Line 821: pop_heap(instance_heap.begin(), instance_heap.end()); Kind of hard to follow why pop_heap/push_heap is being used here instead of just updating and sorting the vector. Also why isn't make_heap needed? http://gerrit.cloudera.org:8080/#/c/21541/4/fe/src/main/java/org/apache/impala/planner/TupleCacheNode.java File fe/src/main/java/org/apache/impala/planner/TupleCacheNode.java: http://gerrit.cloudera.org:8080/#/c/21541/4/fe/src/main/java/org/apache/impala/planner/TupleCacheNode.java@128 PS4, Line 128: output.append(detailPrefix + "input scan node ids: " + If this list will typically be long, it should be split at keyFormatWidth -- To view, visit http://gerrit.cloudera.org:8080/21541 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibe298fff0f644ce931a2aa934ebb98f69aab9d34 Gerrit-Change-Number: 21541 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 15 Aug 2024 15:09:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13214: Skip wait until connected when shell exits
Kurt Deschler has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/21598 ) Change subject: IMPALA-13214: Skip wait_until_connected when shell exits .. IMPALA-13214: Skip wait_until_connected when shell exits The ImpalaShell class expects to start impala-shell and interact with it by sending instructions over stdin and reading the results. This assumption was incorrect when used for impala-shell batch sessions, where the process exits on its own. If there's a delay in ImpalaShell.__init__ - between starting the process and polling to see that it's running - for a batch process, ImpalaShell will fail the assertion that process_status is None. This can be easily reproduced by adding a small (0.1s) sleep after starting the new process. Most batch runs of impala-shell happen through `run_impala_shell_cmd`. Updated that function to only wait for a successful connection when stdin input is supplied. Otherwise the command is assumed to be a batch function and any failures will be detected during `get_result`. Removed explicit use of `wait_until_connected` as redundant. Fixed cases in test_config_file that previously ignored WARNING before the connection string because they did not specify `wait_until_connected`. Tested by running shell/test_shell_commandline.py with a 0.1s delay before ImpalaShell polls. Change-Id: I24e029b6192a17773760cb44fd7a4f87b71c0aae Reviewed-on: http://gerrit.cloudera.org:8080/21598 Tested-by: Impala Public Jenkins Reviewed-by: Jason Fehr Reviewed-by: Kurt Deschler --- M tests/custom_cluster/test_client_ssl.py M tests/shell/test_shell_commandline.py M tests/shell/util.py 3 files changed, 30 insertions(+), 25 deletions(-) Approvals: Impala Public Jenkins: Verified Jason Fehr: Looks good to me, but someone else must approve Kurt Deschler: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/21598 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I24e029b6192a17773760cb44fd7a4f87b71c0aae Gerrit-Change-Number: 21598 Gerrit-PatchSet: 5 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-13214: Skip wait until connected when shell exits
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21598 ) Change subject: IMPALA-13214: Skip wait_until_connected when shell exits .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21598 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I24e029b6192a17773760cb44fd7a4f87b71c0aae Gerrit-Change-Number: 21598 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Wed, 24 Jul 2024 15:38:30 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13214: Skip wait until connected when shell exits
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21598 ) Change subject: IMPALA-13214: Skip wait_until_connected when shell exits .. Patch Set 4: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21598 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I24e029b6192a17773760cb44fd7a4f87b71c0aae Gerrit-Change-Number: 21598 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Wed, 24 Jul 2024 00:56:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13214: Skip wait until connected when shell exits
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21598 ) Change subject: IMPALA-13214: Skip wait_until_connected when shell exits .. Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/21598/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21598/2//COMMIT_MSG@10 PS2, Line 10: It This assumption was incorrect when.. http://gerrit.cloudera.org:8080/#/c/21598/2//COMMIT_MSG@19 PS2, Line 19: Updates that function to only wait for a successful connection when Updates->Updated http://gerrit.cloudera.org:8080/#/c/21598/2//COMMIT_MSG@21 PS2, Line 21: , extra comma http://gerrit.cloudera.org:8080/#/c/21598/2//COMMIT_MSG@21 PS2, Line 21: Removes Removes->Removed http://gerrit.cloudera.org:8080/#/c/21598/2//COMMIT_MSG@24 PS2, Line 24: Fixes Fixes->Fixed http://gerrit.cloudera.org:8080/#/c/21598/2//COMMIT_MSG@28 PS2, Line 28: the the->a -- To view, visit http://gerrit.cloudera.org:8080/21598 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I24e029b6192a17773760cb44fd7a4f87b71c0aae Gerrit-Change-Number: 21598 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Tue, 23 Jul 2024 20:45:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12906: Incorporate scan range information into the tuple cache key
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21541 ) Change subject: IMPALA-12906: Incorporate scan range information into the tuple cache key .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/21541/1/fe/src/main/java/org/apache/impala/planner/TupleCacheNode.java File fe/src/main/java/org/apache/impala/planner/TupleCacheNode.java: http://gerrit.cloudera.org:8080/#/c/21541/1/fe/src/main/java/org/apache/impala/planner/TupleCacheNode.java@125 PS1, Line 125: return output.toString(); Should we print input scan nodes here? -- To view, visit http://gerrit.cloudera.org:8080/21541 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibe298fff0f644ce931a2aa934ebb98f69aab9d34 Gerrit-Change-Number: 21541 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 27 Jun 2024 18:48:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13088: Use RoaringBitmap instead of sorted vector of int64s
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21557 ) Change subject: IMPALA-13088: Use RoaringBitmap instead of sorted vector of int64s .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/21557/1/be/src/exec/iceberg-delete-builder.cc File be/src/exec/iceberg-delete-builder.cc: http://gerrit.cloudera.org:8080/#/c/21557/1/be/src/exec/iceberg-delete-builder.cc@260 PS1, Line 260: deletes.AddElements(positions); CRoaring does not handle out-of-memory. We should maybe add some reserve check here before calling it's functions. http://gerrit.cloudera.org:8080/#/c/21557/1/be/src/exec/iceberg-delete-builder.cc@303 PS1, Line 303: RETURN_IF_ERROR(AddToDeletedRows(prev_file_path, pos_buffer)); Should sort pos_buffer here to guarantee performance of bitmap insert. -- To view, visit http://gerrit.cloudera.org:8080/21557 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib769965d094149e99c43e0044914d9e76107 Gerrit-Change-Number: 21557 Gerrit-PatchSet: 1 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Comment-Date: Wed, 26 Jun 2024 13:58:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12800: Implement hashCode everywhere
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21550 ) Change subject: IMPALA-12800: Implement hashCode everywhere .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21550 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I129bff6fd0968be135e23e0b24e273b2ea384eca Gerrit-Change-Number: 21550 Gerrit-PatchSet: 6 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 25 Jun 2024 20:18:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12800: Implement hashCode everywhere
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21550 ) Change subject: IMPALA-12800: Implement hashCode everywhere .. Patch Set 6: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21550 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I129bff6fd0968be135e23e0b24e273b2ea384eca Gerrit-Change-Number: 21550 Gerrit-PatchSet: 6 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jason Fehr Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 25 Jun 2024 18:44:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12800: Implement hashCode everywhere
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21550 ) Change subject: IMPALA-12800: Implement hashCode everywhere .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/21550/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21550/1//COMMIT_MSG@10 PS1, Line 10: ScalarTypes could be equal, but have different hash codes. That caused > I was looking to understand why hashCode is different for equal objects. Re Stated another way, any class that overrides equals() must guarantee that a.equals(b) => (a.hasbCode() == b.hashCode()). It is desirable but not necessary to override hashCode() if a suitable hashCode() implementation has been inherited. -- To view, visit http://gerrit.cloudera.org:8080/21550 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I129bff6fd0968be135e23e0b24e273b2ea384eca Gerrit-Change-Number: 21550 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 25 Jun 2024 17:36:35 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12800: Implement hashCode everywhere
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21550 ) Change subject: IMPALA-12800: Implement hashCode everywhere .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/21550/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21550/1//COMMIT_MSG@10 PS1, Line 10: ScalarTypes could be equal, but have different hash codes. That caused > Done I was looking to understand why hashCode is different for equal objects. Reading java docs, the reason is that the default implementation hashes instance memory addresses. Since ColumnDef does not inherit from anything, these is no suitable default implementation to fall back to. We should add comments in Expr classes or wherever is calling outside of Expr to ensure that non-derived classes have suitable hashCode implementations. -- To view, visit http://gerrit.cloudera.org:8080/21550 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I129bff6fd0968be135e23e0b24e273b2ea384eca Gerrit-Change-Number: 21550 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 25 Jun 2024 17:28:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13106: Support larger imported query profile sizes through compression
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21463 ) Change subject: IMPALA-13106: Support larger imported query profile sizes through compression .. Patch Set 6: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/21463/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21463/4//COMMIT_MSG@37 PS4, Line 37: Added tests for the compression library methods utilized by > Automated tests have been written for compression. Done http://gerrit.cloudera.org:8080/#/c/21463/4/www/queries.tmpl File www/queries.tmpl: http://gerrit.cloudera.org:8080/#/c/21463/4/www/queries.tmpl@341 PS4, Line 341: uploadProfile(); > Web workers operate in a single thread, the processing is sequential. If we Ack -- To view, visit http://gerrit.cloudera.org:8080/21463 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c4f31beb9cac89051460bf764b6d50c3933bd03 Gerrit-Change-Number: 21463 Gerrit-PatchSet: 6 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 25 Jun 2024 16:13:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12800: Implement hashCode everywhere
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21550 ) Change subject: IMPALA-12800: Implement hashCode everywhere .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/21550/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21550/1//COMMIT_MSG@10 PS1, Line 10: ScalarTypes could be equal, but have different hash codes. That caused Please explain more how this manifested so there is a better record for future maintenance. It's not obvious why the base hashCode would yield different values. -- To view, visit http://gerrit.cloudera.org:8080/21550 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I129bff6fd0968be135e23e0b24e273b2ea384eca Gerrit-Change-Number: 21550 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Smith Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Mon, 24 Jun 2024 21:29:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13106: Support larger imported query profile sizes through compression
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21463 ) Change subject: IMPALA-13106: Support larger imported query profile sizes through compression .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/21463/4//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21463/4//COMMIT_MSG@37 PS4, Line 37: Added tests for the compression library methods utilized by Please mention manual testing. What browsers is this tested on? http://gerrit.cloudera.org:8080/#/c/21463/4/www/queries.tmpl File www/queries.tmpl: http://gerrit.cloudera.org:8080/#/c/21463/4/www/queries.tmpl@341 PS4, Line 341: uploadProfile(); Why is being done recursively vs in a loop? What is the behavior if there are multiple errors? -- To view, visit http://gerrit.cloudera.org:8080/21463 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c4f31beb9cac89051460bf764b6d50c3933bd03 Gerrit-Change-Number: 21463 Gerrit-PatchSet: 4 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 21 Jun 2024 12:59:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13137: Add additional client fetch metrics columns to the queries page
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21482 ) Change subject: IMPALA-13137: Add additional client fetch metrics columns to the queries page .. Patch Set 8: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/21482/8/www/queries.tmpl File www/queries.tmpl: http://gerrit.cloudera.org:8080/#/c/21482/8/www/queries.tmpl@158 PS8, Line 158: {{first_fetch}}{{client_fetch_duration}} We should consolidate these formatting style tags in a subsequent patch. -- To view, visit http://gerrit.cloudera.org:8080/21482 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74a9393a7b38750de0c3f6230b6e5e048048c4b5 Gerrit-Change-Number: 21482 Gerrit-PatchSet: 8 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 18 Jun 2024 20:03:27 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13137: Add additional client fetch metrics columns to the queries page
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21482 ) Change subject: IMPALA-13137: Add additional client fetch metrics columns to the queries page .. Patch Set 6: (3 comments) http://gerrit.cloudera.org:8080/#/c/21482/6/be/src/service/impala-http-handler.cc File be/src/service/impala-http-handler.cc: http://gerrit.cloudera.org:8080/#/c/21482/6/be/src/service/impala-http-handler.cc@463 PS6, Line 463: document->AddMember("tips_fetch_duration", "The time taken for the client to fetch all" Total time spent returning rows to the client and other client-side processing. http://gerrit.cloudera.org:8080/#/c/21482/6/be/src/service/query-state-record.h File be/src/service/query-state-record.h: http://gerrit.cloudera.org:8080/#/c/21482/6/be/src/service/query-state-record.h@83 PS6, Line 83: /// The time taken for the client to fetch all rows. Total time spent returning rows to the client and other client-side processing. http://gerrit.cloudera.org:8080/#/c/21482/6/www/queries.tmpl File www/queries.tmpl: http://gerrit.cloudera.org:8080/#/c/21482/6/www/queries.tmpl@119 PS6, Line 119: Fetch Duration Client Fetch Duration -- To view, visit http://gerrit.cloudera.org:8080/21482 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74a9393a7b38750de0c3f6230b6e5e048048c4b5 Gerrit-Change-Number: 21482 Gerrit-PatchSet: 6 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 17 Jun 2024 16:13:18 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13137: Add additional client fetch metrics columns to the queries page
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21482 ) Change subject: IMPALA-13137: Add additional client fetch metrics columns to the queries page .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/21482/4/www/queries.tmpl File www/queries.tmpl: http://gerrit.cloudera.org:8080/#/c/21482/4/www/queries.tmpl@117 PS4, Line 117: First row fetched : : Client fetch wait time > Different name between this table header and query profile is OK. I'd say g "Fetch Time" would be misleading. This actually the time after the rows have been fetched from the server and can include transport and client processing or even the client doing nothing. -- To view, visit http://gerrit.cloudera.org:8080/21482 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74a9393a7b38750de0c3f6230b6e5e048048c4b5 Gerrit-Change-Number: 21482 Gerrit-PatchSet: 4 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 14 Jun 2024 17:14:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13137: Add additional client fetch metrics columns to the queries page
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21482 ) Change subject: IMPALA-13137: Add additional client fetch metrics columns to the queries page .. Patch Set 2: Please add tests to tests/query_test/test_observability.py and/or tests/webserver/test_web_pages.py. Also incldue example output in the commit message as it's not clear what human readable means. -- To view, visit http://gerrit.cloudera.org:8080/21482 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I74a9393a7b38750de0c3f6230b6e5e048048c4b5 Gerrit-Change-Number: 21482 Gerrit-PatchSet: 2 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 11 Jun 2024 19:41:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13057: Incorporate tuple/slot information into tuple cache key
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21398 ) Change subject: IMPALA-13057: Incorporate tuple/slot information into tuple cache key .. Patch Set 6: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f5278e9dbb976cbebdc6a21a6e66bc90ce06c6c Gerrit-Change-Number: 21398 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 30 May 2024 18:48:50 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13057: Incorporate tuple/slot information into tuple cache key
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21398 ) Change subject: IMPALA-13057: Incorporate tuple/slot information into tuple cache key .. Patch Set 6: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f5278e9dbb976cbebdc6a21a6e66bc90ce06c6c Gerrit-Change-Number: 21398 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 30 May 2024 18:48:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13057: Incorporate tuple/slot information into tuple cache key
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21398 ) Change subject: IMPALA-13057: Incorporate tuple/slot information into tuple cache key .. Patch Set 4: (5 comments) http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/analysis/Expr.java File fe/src/main/java/org/apache/impala/analysis/Expr.java: http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@868 PS4, Line 868: return treeToThrift(new ThriftSerializationCtx()); Why pass context vs null? Does this have any implications WRT being able to disable as much code as possible when caching is turned off or not applicable? http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/common/ThriftSerializationCtx.java File fe/src/main/java/org/apache/impala/common/ThriftSerializationCtx.java: http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/common/ThriftSerializationCtx.java@72 PS4, Line 72: public void registerTuple(TupleId id) { Would it be cleaner to have this logic in Analyzer instead? http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java: http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1945 PS4, Line 1945: serialCtx.translateTupleId(entry.getKey().getId()).asInt(), Consider moving this inside Expr.treesToThrift() and passing in entry. http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/PlanNode.java@510 PS4, Line 510: msg.node_id = serialCtx.isTupleCache()? 0 : id_.asInt(); Move these assignments to the conditional branch below http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/PlanNode.java@528 PS4, Line 528: msg.addToRow_tuples(serialCtx.translateTupleId(tid).asInt()); Could local tids be employed early so that translation isn't needed here? -- To view, visit http://gerrit.cloudera.org:8080/21398 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7f5278e9dbb976cbebdc6a21a6e66bc90ce06c6c Gerrit-Change-Number: 21398 Gerrit-PatchSet: 4 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 09 May 2024 15:12:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13038: Support profile tab for imported query profiles
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21400 ) Change subject: IMPALA-13038: Support profile tab for imported query profiles .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddcf2e285abbf42f97bde19014be076ccd6374bc Gerrit-Change-Number: 21400 Gerrit-PatchSet: 5 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 09 May 2024 12:31:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-13038: Support profile tab for imported query profiles
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21400 ) Change subject: IMPALA-13038: Support profile tab for imported query profiles .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/21400/3/www/query_profile.tmpl File www/query_profile.tmpl: http://gerrit.cloudera.org:8080/#/c/21400/3/www/query_profile.tmpl@62 PS3, Line 62: result = `${result}${indent}Offset: ${eventSeq.offset}\n`; > Check performance of these repeated appends. It may be better to accumulate Is this approach actually faster? Concatenation is only efficient if the string doesn't reserve space. Please check javascript docs and measure performance. http://gerrit.cloudera.org:8080/#/c/21400/4/www/scripts/util.js File www/scripts/util.js: http://gerrit.cloudera.org:8080/#/c/21400/4/www/scripts/util.js@105 PS4, Line 105: return window.location.search.substring(10,window.location.search.indexOf("&")); Should parse out using the query_id tag. These attributes can appear in any order in URLs. -- To view, visit http://gerrit.cloudera.org:8080/21400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddcf2e285abbf42f97bde19014be076ccd6374bc Gerrit-Change-Number: 21400 Gerrit-PatchSet: 4 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 08 May 2024 20:02:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13040: Add waiting mechanism in UpdateFilterFromRemote
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21383 ) Change subject: IMPALA-13040: Add waiting mechanism in UpdateFilterFromRemote .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/21383/3/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/21383/3/be/src/scheduling/scheduler.cc@59 PS3, Line 59: DEFINE_bool_hidden(sort_runtime_filter_aggregator_cadidates, false, candidates http://gerrit.cloudera.org:8080/#/c/21383/3/be/src/service/data-stream-service.cc File be/src/service/data-stream-service.cc: http://gerrit.cloudera.org:8080/#/c/21383/3/be/src/service/data-stream-service.cc@143 PS3, Line 143: sleep_duration_ms > Isn't this too much, especially at the beginning? I would vote a short slee Maybe even less than 20ms. Especially if these sleeps can stack with complex plan. http://gerrit.cloudera.org:8080/#/c/21383/3/be/src/util/network-util.h File be/src/util/network-util.h: http://gerrit.cloudera.org:8080/#/c/21383/3/be/src/util/network-util.h@123 PS3, Line 123: if (comp == 0) comp = lhs.uds_address().compare(rhs.uds_address()); Check has_uds_address since it is optional field. -- To view, visit http://gerrit.cloudera.org:8080/21383 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I156d1f0c694b91ba34be70bc53ae9bacf924b3b9 Gerrit-Change-Number: 21383 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 08 May 2024 14:45:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-13038: Support profile tab for imported query profiles
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21400 ) Change subject: IMPALA-13038: Support profile tab for imported query profiles .. Patch Set 3: (3 comments) http://gerrit.cloudera.org:8080/#/c/21400/3/be/src/service/impala-http-handler.cc File be/src/service/impala-http-handler.cc: http://gerrit.cloudera.org:8080/#/c/21400/3/be/src/service/impala-http-handler.cc@307 PS3, Line 307: Value query_id_val(PrintId(unique_id).c_str(), document->GetAllocator()); Move these after parse_status error check. http://gerrit.cloudera.org:8080/#/c/21400/3/www/query_profile.tmpl File www/query_profile.tmpl: http://gerrit.cloudera.org:8080/#/c/21400/3/www/query_profile.tmpl@62 PS3, Line 62: result += `${indent}Offset: ${eventSeq.offset}\n`; Check performance of these repeated appends. It may be better to accumulate the sections into separate strings then append at the return. http://gerrit.cloudera.org:8080/#/c/21400/3/www/query_profile.tmpl@113 PS3, Line 113: query.id = query.id.substring(10, query.id.indexOf("&")); Is there a tag that can be checked here? Please add a comment of where this URL/string comes from and what the format is. -- To view, visit http://gerrit.cloudera.org:8080/21400 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iddcf2e285abbf42f97bde19014be076ccd6374bc Gerrit-Change-Number: 21400 Gerrit-PatchSet: 3 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 08 May 2024 14:09:26 + Gerrit-HasComments: Yes
[native-toolchain-CR] IMPALA-13020: Use 64-bit integer for Thrift max message size on C++
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21343 ) Change subject: IMPALA-13020: Use 64-bit integer for Thrift max message size on C++ .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/21343/3/source/thrift/thrift-0.16.0-patches/0007-IMPALA-13020-Use-64-bit-integer-for-max-message-size.patch File source/thrift/thrift-0.16.0-patches/0007-IMPALA-13020-Use-64-bit-integer-for-max-message-size.patch: http://gerrit.cloudera.org:8080/#/c/21343/3/source/thrift/thrift-0.16.0-patches/0007-IMPALA-13020-Use-64-bit-integer-for-max-message-size.patch@24 PS3, Line 24: Are these trailing spaces normal in patch files? -- To view, visit http://gerrit.cloudera.org:8080/21343 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I94fdd7b07fcc8dca0639839b40a9eff815090835 Gerrit-Change-Number: 21343 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Tue, 30 Apr 2024 11:24:55 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21277 ) Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. Patch Set 15: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923 Gerrit-Change-Number: 21277 Gerrit-PatchSet: 15 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 19 Apr 2024 19:15:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21277 ) Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. Patch Set 13: (3 comments) http://gerrit.cloudera.org:8080/#/c/21277/13/fe/src/main/java/org/apache/impala/planner/CostingSegment.java File fe/src/main/java/org/apache/impala/planner/CostingSegment.java: http://gerrit.cloudera.org:8080/#/c/21277/13/fe/src/main/java/org/apache/impala/planner/CostingSegment.java@102 PS13, Line 102: if (isOutputSegment()) { Change to if (topnode == null) http://gerrit.cloudera.org:8080/#/c/21277/13/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/21277/13/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1206 PS13, Line 1206: int maxScannerThreads = Integer.MAX_VALUE; maxScannerThreads is set to int max then never updated. Probably can be removed. http://gerrit.cloudera.org:8080/#/c/21277/13/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/21277/13/fe/src/main/java/org/apache/impala/service/Frontend.java@2315 PS13, Line 2315: cpuAskUnbounded = scaleDownCpuSublinear(cpuCountRootFactor, We should report the original values somewhere before scaling. -- To view, visit http://gerrit.cloudera.org:8080/21277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923 Gerrit-Change-Number: 21277 Gerrit-PatchSet: 13 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Fri, 19 Apr 2024 14:34:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 15: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 15 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 17 Apr 2024 22:18:28 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20850 ) Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. Patch Set 16: Patch set 16 accidentally uploaded. Please do not review. -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 16 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Tue, 16 Apr 2024 23:55:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12679: Improve test rows sent counters assert
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21310 ) Change subject: IMPALA-12679: Improve test_rows_sent_counters assert .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/21310/1/tests/query_test/test_fetch.py File tests/query_test/test_fetch.py: http://gerrit.cloudera.org:8080/#/c/21310/1/tests/query_test/test_fetch.py@27 PS1, Line 27: class TestFetch(ImpalaTestSuite): > I'm surprised gerrit-code-review-checks didn't complain about spacing, I th Done http://gerrit.cloudera.org:8080/#/c/21310/1/tests/query_test/test_fetch.py@68 PS1, Line 68: rpc_count = int(re.search("RPCCount: ([5-9])", runtime_profile).group(1)) > If rpc_count is not in the range 5-9, re.search will return None and this w Fixed RE -- To view, visit http://gerrit.cloudera.org:8080/21310 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic6b48cf4039028e749c914ee60b88f04833a0069 Gerrit-Change-Number: 21310 Gerrit-PatchSet: 1 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Tue, 16 Apr 2024 22:29:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#16). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/blocking-plan-root-sink.cc M be/src/exec/buffered-plan-root-sink.cc M be/src/exec/exchange-node.cc M be/src/exec/sort-node.cc M be/src/exprs/expr-codegen-test.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/fragment-state.cc M be/src/runtime/fragment-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M be/src/util/runtime-profile-counters.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 28 files changed, 450 insertions(+), 123 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/16 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 16 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12679: Improve test rows sent counters assert
Hello Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/21310 to look at the new patch set (#2). Change subject: IMPALA-12679: Improve test_rows_sent_counters assert .. IMPALA-12679: Improve test_rows_sent_counters assert This patch changes the assert for failed test test_rows_sent_counters so that the actual RPC count is displayed in the assert output. The root cause of the failure will be addressed once sufficient data is collected with the new output. Testing: Ran test_rows_sent_counters with modified expected RPC count range to simulate failure. Change-Id: Ic6b48cf4039028e749c914ee60b88f04833a0069 --- M tests/query_test/test_fetch.py 1 file changed, 2 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/21310/2 -- To view, visit http://gerrit.cloudera.org:8080/21310 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic6b48cf4039028e749c914ee60b88f04833a0069 Gerrit-Change-Number: 21310 Gerrit-PatchSet: 2 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12679: Improve test rows sent counters assert
Kurt Deschler has uploaded this change for review. ( http://gerrit.cloudera.org:8080/21310 Change subject: IMPALA-12679: Improve test_rows_sent_counters assert .. IMPALA-12679: Improve test_rows_sent_counters assert This patch changes the assert for failed test test_rows_sent_counters so that the actual RPC count is displayed in the assert output. The root cause of the failure will be addressed once sufficient data is collected with the new output. Testing: Ran test_rows_sent_counters with modified expected RPC count range to simulate failure. Change-Id: Ic6b48cf4039028e749c914ee60b88f04833a0069 --- M tests/query_test/test_fetch.py 1 file changed, 2 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/21310/1 -- To view, visit http://gerrit.cloudera.org:8080/21310 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic6b48cf4039028e749c914ee60b88f04833a0069 Gerrit-Change-Number: 21310 Gerrit-PatchSet: 1 Gerrit-Owner: Kurt Deschler
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21277 ) Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. Patch Set 7: (5 comments) http://gerrit.cloudera.org:8080/#/c/21277/7/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/21277/7/be/src/scheduling/scheduler.cc@1210 PS7, Line 1210: in at nit: change to "is with" http://gerrit.cloudera.org:8080/#/c/21277/7/be/src/scheduling/scheduler.cc@1210 PS7, Line 1210: is nit: remove "is" or change to "that is" http://gerrit.cloudera.org:8080/#/c/21277/7/be/src/scheduling/scheduler.cc@1211 PS7, Line 1211: scheduler only assign nit: "the schedule would only assign" http://gerrit.cloudera.org:8080/#/c/21277/7/be/src/scheduling/scheduler.cc@1213 PS7, Line 1213: // dominant plan subtree ('dominant_instance_count' is 0). Comment/terminology is overall hard to follow here. Is this referring to any fragment that would include the coordinator? Unclear why rounding up is the neccesary action. http://gerrit.cloudera.org:8080/#/c/21277/7/common/thrift/Query.thrift File common/thrift/Query.thrift: http://gerrit.cloudera.org:8080/#/c/21277/7/common/thrift/Query.thrift@133 PS7, Line 133: Scheduler nit: The scheduler -- To view, visit http://gerrit.cloudera.org:8080/21277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923 Gerrit-Change-Number: 21277 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 16 Apr 2024 12:14:48 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21277 ) Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/21277/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21277/1//COMMIT_MSG@16 PS1, Line 16: MAX_FRAGMENT_INSTANCES_PER_NODE options accordingly. > The current bounding still make sense. It still enforced by to determine pa The EG selection should only need the unbounded number for all EG. The last EG should simply ignore it. We should leave the plan sizing bounding in the parallel planner. I'm ok propagating the bounded number if it helps observability. -- To view, visit http://gerrit.cloudera.org:8080/21277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923 Gerrit-Change-Number: 21277 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 10 Apr 2024 19:00:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/21257/9/fe/src/main/java/org/apache/impala/planner/CoreCount.java File fe/src/main/java/org/apache/impala/planner/CoreCount.java: http://gerrit.cloudera.org:8080/#/c/21257/9/fe/src/main/java/org/apache/impala/planner/CoreCount.java@148 PS9, Line 148: // If core1 and core2 has equal total(), return one that does not have coordinator Should this be more consistent in removing coordinator core count? Seems it is counted above for equal case then not here for core 1 case. -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 9 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 10 Apr 2024 18:40:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12905: Disk-based tuple caching
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21171 ) Change subject: IMPALA-12905: Disk-based tuple caching .. Patch Set 10: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.h File be/src/exec/tuple-cache-node.h: http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.h@27 PS6, Line 27: class TupleFileReader; > If I remove this declaration, compilation of tuple-cache-node.cc fails. Thanks. I understand why this works now after reading the doc mentioned above -- To view, visit http://gerrit.cloudera.org:8080/21171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf Gerrit-Change-Number: 21171 Gerrit-PatchSet: 10 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Wed, 10 Apr 2024 02:07:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21277 ) Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/21277/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21277/1//COMMIT_MSG@16 PS1, Line 16: MAX_FRAGMENT_INSTANCES_PER_NODE options accordingly. If these bounds don't make sense to apply with multiple executor groups, then we should reconsider their naming or whether these bounds should be applied at all by the planner or instead moved to the admission control. http://gerrit.cloudera.org:8080/#/c/21277/1/common/thrift/Query.thrift File common/thrift/Query.thrift: http://gerrit.cloudera.org:8080/#/c/21277/1/common/thrift/Query.thrift@1035 PS1, Line 1035: no specifi need to clarify what this means and why we have this concept. Is one core sufficient? http://gerrit.cloudera.org:8080/#/c/21277/1/fe/src/main/java/org/apache/impala/planner/CostingSegment.java File fe/src/main/java/org/apache/impala/planner/CostingSegment.java: http://gerrit.cloudera.org:8080/#/c/21277/1/fe/src/main/java/org/apache/impala/planner/CostingSegment.java@80 PS1, Line 80: private PlanFragment getFragment() { Can you make this getTopNode() then call this by both createCoreCount() and create UnboundedCoreCount()? http://gerrit.cloudera.org:8080/#/c/21277/1/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: http://gerrit.cloudera.org:8080/#/c/21277/1/fe/src/main/java/org/apache/impala/planner/Planner.java@590 PS1, Line 590: int coresRequiredUnbounded = boundedCores.total(); Move this assignment to the else branch for clarity. I'm not really clear why this is using boundedCores and not an unbounded Number.. http://gerrit.cloudera.org:8080/#/c/21277/1/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/21277/1/fe/src/main/java/org/apache/impala/service/Frontend.java@313 PS1, Line 313: protected int cores_required_unbounded_ = -1; Can this default to 1 instead? -- To view, visit http://gerrit.cloudera.org:8080/21277 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923 Gerrit-Change-Number: 21277 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Tue, 09 Apr 2024 22:20:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12905: Disk-based tuple caching
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21171 ) Change subject: IMPALA-12905: Disk-based tuple caching .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.h File be/src/exec/tuple-cache-node.h: http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.h@27 PS6, Line 27: class TupleFileReader; > Ack Did you try removing this declaration? I meant to imply that the definition already had to be included to compile. -- To view, visit http://gerrit.cloudera.org:8080/21171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf Gerrit-Change-Number: 21171 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Alexey Serbin Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 09 Apr 2024 21:02:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 7: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 7 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 09 Apr 2024 19:51:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21257 ) Change subject: IMPALA-12980: Translate CpuAsk into admission control slots .. Patch Set 5: (10 comments) http://gerrit.cloudera.org:8080/#/c/21257/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21257/5//COMMIT_MSG@11 PS5, Line 11: the number of processors and can be overridden with processors per executor? http://gerrit.cloudera.org:8080/#/c/21257/5//COMMIT_MSG@16 PS5, Line 16: instances of any fragment on that backend. This is simplistic, because change "is simplistic" to "can lead to underestimation" http://gerrit.cloudera.org:8080/#/c/21257/5//COMMIT_MSG@44 PS5, Line 44: AvgAdmissionSlotsPerExecutor It's not clear how this is calculated. Is it the maximum expected parallelism from above? http://gerrit.cloudera.org:8080/#/c/21257/5//COMMIT_MSG@45 PS5, Line 45: think thinks http://gerrit.cloudera.org:8080/#/c/21257/5//COMMIT_MSG@47 PS5, Line 47: AvgAdmissionSlotsPerExecutor, depending on what scheduled on that is scheduled http://gerrit.cloudera.org:8080/#/c/21257/5/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: http://gerrit.cloudera.org:8080/#/c/21257/5/be/src/scheduling/scheduler.cc@1184 PS5, Line 1184: for (auto& finstance : backend.second.exec_params->instance_params()) { This is kind of hard to follow. There seems be assumptions that fragment indexes are ordered and not unique. http://gerrit.cloudera.org:8080/#/c/21257/5/be/src/scheduling/scheduler.cc@1193 PS5, Line 1193: be_max_instances = max(be_max_instances, curr_instance_count); Can this be moved out of the loop? http://gerrit.cloudera.org:8080/#/c/21257/5/fe/src/main/java/org/apache/impala/planner/CostingSegment.java File fe/src/main/java/org/apache/impala/planner/CostingSegment.java: http://gerrit.cloudera.org:8080/#/c/21257/5/fe/src/main/java/org/apache/impala/planner/CostingSegment.java@84 PS5, Line 84: Preconditions.checkState(!nodes_.isEmpty()); Can this just get computed once at the top and passed down as a recursion arg? http://gerrit.cloudera.org:8080/#/c/21257/5/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/21257/5/fe/src/main/java/org/apache/impala/service/Frontend.java@2338 PS5, Line 2338: derived is derived http://gerrit.cloudera.org:8080/#/c/21257/5/fe/src/main/java/org/apache/impala/service/Frontend.java@2399 PS5, Line 2399: cores_requirement may want to assert cores_requirement > 0 and numExecutors > 0 here to detect problems -- To view, visit http://gerrit.cloudera.org:8080/21257 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2 Gerrit-Change-Number: 21257 Gerrit-PatchSet: 5 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Tue, 09 Apr 2024 14:26:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12852: Make Kudu service start and stop independent
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21090 ) Change subject: IMPALA-12852: Make Kudu service start and stop independent .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/21090 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9624aaa61353bb4520e879570e5688d5e3493201 Gerrit-Change-Number: 21090 Gerrit-PatchSet: 2 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Wenzhe Zhou Gerrit-Reviewer: Yifan Zhang Gerrit-Comment-Date: Tue, 02 Apr 2024 01:05:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12905: Disk-based tuple caching
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21171 ) Change subject: IMPALA-12905: Disk-based tuple caching .. Patch Set 6: (7 comments) http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.h File be/src/exec/tuple-cache-node.h: http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.h@27 PS6, Line 27: class TupleFileReader; Are these forward declarations needed? I though the unique_ptr members below need the classes declared? http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.cc File be/src/exec/tuple-cache-node.cc: http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.cc@61 PS6, Line 61: SCOPED_TIMER(runtime_profile_->total_time_counter()); runtime_profile() used above. http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.cc@142 PS6, Line 142: } else if (writer_->BytesWritten() > tuple_cache_mgr->MaxSize()) { Should check the size and not write if it is over. http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.h File be/src/exec/tuple-file-reader.h: http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.h@30 PS6, Line 30: class MemTracker; Is this forward declaration needed? http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.cc File be/src/exec/tuple-file-reader.cc: http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.cc@77 PS6, Line 77: reader_.read(reinterpret_cast(&tuple_data_len), sizeof(tuple_data_len)); Check status after reads using if(!reader_) like below. http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.cc@79 PS6, Line 79: DCHECK_GE(tuple_data_len, 0); These may not be initialized if read fails. http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-writer.cc File be/src/exec/tuple-file-writer.cc: http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-writer.cc@89 PS6, Line 89: if (!writer_) { Is it sufficient to just check the stream status here or can a transient error leave the write file invalid? -- To view, visit http://gerrit.cloudera.org:8080/21171 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf Gerrit-Change-Number: 21171 Gerrit-PatchSet: 6 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Tue, 02 Apr 2024 01:00:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12856: Event processor should ignore processing partition with empty partition values
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21143 ) Change subject: IMPALA-12856: Event processor should ignore processing partition with empty partition values .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/21143/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21143/5//COMMIT_MSG@13 PS5, Line 13: partitions with empty values, EP should ignore such partitions instead Can the HMS bug also return non-empty partitions that are incomplete (missing files)? -- To view, visit http://gerrit.cloudera.org:8080/21143 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id2469930ccd74948325f1723bd8b2bd6aad02d09 Gerrit-Change-Number: 21143 Gerrit-PatchSet: 5 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 26 Mar 2024 12:50:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12852: Make Kudu service start and stop independent
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21090 ) Change subject: IMPALA-12852: Make Kudu service start and stop independent .. Patch Set 2: Please rename the new scripts to start-kudu and stop-kudu -- To view, visit http://gerrit.cloudera.org:8080/21090 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9624aaa61353bb4520e879570e5688d5e3493201 Gerrit-Change-Number: 21090 Gerrit-PatchSet: 2 Gerrit-Owner: Yifan Zhang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Comment-Date: Tue, 26 Mar 2024 12:13:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12856: Event processor should ignore processing partition with empty partition values
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21143 ) Change subject: IMPALA-12856: Event processor should ignore processing partition with empty partition values .. Patch Set 4: (1 comment) http://gerrit.cloudera.org:8080/#/c/21143/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/21143/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2869 PS4, Line 2869: continue; > The possibility of hitting this issue twice continuely should be really sma I agree with Quanlong here but the check should be in a separate loop above so reloads are not wasted if the empty partition is late in the list. -- To view, visit http://gerrit.cloudera.org:8080/21143 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id2469930ccd74948325f1723bd8b2bd6aad02d09 Gerrit-Change-Number: 21143 Gerrit-PatchSet: 4 Gerrit-Owner: Sai Hemanth Gantasala Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Mon, 25 Mar 2024 11:59:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#15). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/blocking-plan-root-sink.cc M be/src/exec/buffered-plan-root-sink.cc M be/src/exec/exchange-node.cc M be/src/exec/sort-node.cc M be/src/exprs/expr-codegen-test.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/fragment-state.cc M be/src/runtime/fragment-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M be/src/util/runtime-profile-counters.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 28 files changed, 450 insertions(+), 123 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/15 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 15 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20850 ) Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. Patch Set 14: (1 comment) http://gerrit.cloudera.org:8080/#/c/20850/14/be/src/util/runtime-profile-counters.h File be/src/util/runtime-profile-counters.h: http://gerrit.cloudera.org:8080/#/c/20850/14/be/src/util/runtime-profile-counters.h@378 PS14, Line 378: /// Class for maintaining scoped reference counts with HighWaterMarkCounter > line has trailing whitespace Done -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 14 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Fri, 15 Mar 2024 22:46:17 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20850 ) Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. Patch Set 14: (9 comments) http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@23 PS11, Line 23: Once enabled, delayed materialization will be used for the remainder of : the rows fetched for the current query > Just brain dump, I don't think that the patch needs to be changed: The hive JDBC driver ramps up threads slowly after multiple batches are fetch which avoids this case. http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@23 PS11, Line 23: Once enabled, delayed materialization will be used for the remainder of : the rows fetched for the current query > Thanks for the info, I agree that 2 is solved by gradual ramping on client The maximum number of streams is enforced by the driver. The highest useful stream count that we have observed so far is around 5. Being CPU is a compressible resource, there does not seem to be much of an issue absorbing this additional parallelism. http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@24 PS11, Line 24: It is expected that the default : threshold will never be reached with a single fetch stream. > >There are no parallel fetches as they are serialized by the fetch lock Added counter for number of streams as suggested. http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/exec/buffered-plan-root-sink.cc@186 PS13, Line 186: Initialize > This is only done in BufferedPlanRootSink, but not in class BlockingPlanRoo Added support to BlockingPlanRootSync. http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/exec/buffered-plan-root-sink.cc@187 PS13, Line 187: . > nit: . Done http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/exec/buffered-plan-root-sink.cc@188 PS13, Line 188: RETURN_IF_ERROR(results->InitDelayedMaterialization(*row_desc_, > If I understand correctly this will create a new RowBatchContext on each fe It might be possible to attach a cache to the QueryDriver. I agree let's defer this. http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/runtime/row-batch.h File be/src/runtime/row-batch.h: http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/runtime/row-batch.h@295 PS13, Line 295: /// Move 'num_rows' rows from 'src' to 'dest' within the batch. Useful for exec > Should the description be updated too? Done http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/service/impala-hs2-server.cc@251 PS13, Line 251: fetch_results->__set_hasMoreRows(!query_handle->eos()); > If I understand correctly at this point the coordinator may have already hi The query_handle reference keeps the QueryDriver/RunTimeState/MemPool around that the RowBachContext is using. http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/service/query-result-set.cc File be/src/service/query-result-set.cc: http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/service/query-result-set.cc@477 PS13, Line 477: c > nit: extra space Done -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 14 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Fri, 15 Mar 2024 21:12:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#14). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/blocking-plan-root-sink.cc M be/src/exec/buffered-plan-root-sink.cc M be/src/exec/exchange-node.cc M be/src/exec/sort-node.cc M be/src/exprs/expr-codegen-test.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/fragment-state.cc M be/src/runtime/fragment-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M be/src/util/runtime-profile-counters.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 28 files changed, 450 insertions(+), 123 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/14 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 14 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12818: Intermediate Result Caching plan node framework
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21035 ) Change subject: IMPALA-12818: Intermediate Result Caching plan node framework .. Patch Set 5: Code-Review+2 (4 comments) http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc File be/src/exec/tuple-cache-node.cc: http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc@69 PS3, Line 69: // Save the number of rows in case GetNext() is called with a non-empty batch, > Changed this comment to make it clearer. I was trying to say that TupleCach Done http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java@499 PS3, Line 499:*/ > There will be multiple patches that revisit this function as we do more mas Done http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java File fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java: http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@150 PS3, Line 150: StringBuilder errorLog = new StringBuilder(); > Changed this to verify the hash trace in the same way. I also changed the c Done http://gerrit.cloudera.org:8080/#/c/21035/3/testdata/workloads/functional-query/queries/QueryTest/explain-level1.test File testdata/workloads/functional-query/queries/QueryTest/explain-level1.test: http://gerrit.cloudera.org:8080/#/c/21035/3/testdata/workloads/functional-query/queries/QueryTest/explain-level1.test@63 PS3, Line 63: row_regex:.*\[TPlanNode\(.*\] > This particular test file is checking what the explain output contains at d Done -- To view, visit http://gerrit.cloudera.org:8080/21035 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia1f36a87dcce6efd5d1e1f0bc04009bf009b1961 Gerrit-Change-Number: 21035 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Thu, 14 Mar 2024 18:58:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12818: Intermediate Result Caching plan node framework
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21035 ) Change subject: IMPALA-12818: Intermediate Result Caching plan node framework .. Patch Set 3: (5 comments) http://gerrit.cloudera.org:8080/#/c/21035/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/21035/3//COMMIT_MSG@24 PS3, Line 24: eligible location. Eligibility is currently restricted to immediately It's a little awkward that placement is restricted in this commit since that likely makes untested codepaths. http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc File be/src/exec/tuple-cache-node.cc: http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc@69 PS3, Line 69: // Note: TupleCacheNode does not support limits, so this does not need logic to Limits can be supported for deterministic results. Maybe change to a TODO. http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java File fe/src/main/java/org/apache/impala/planner/PlanNode.java: http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java@499 PS3, Line 499: private void initThrift(TPlanNode msg, TupleCacheInfo tupleCacheInfo) { Maybe cleaner to split this function so the subset can be called for IRC. http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java File fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java: http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@150 PS3, Line 150: assertEquals(intersection.size(), 0); Consider verifying the hash trace here also. http://gerrit.cloudera.org:8080/#/c/21035/3/testdata/workloads/functional-query/queries/QueryTest/explain-level1.test File testdata/workloads/functional-query/queries/QueryTest/explain-level1.test: http://gerrit.cloudera.org:8080/#/c/21035/3/testdata/workloads/functional-query/queries/QueryTest/explain-level1.test@63 PS3, Line 63: row_regex:.*\[TPlanNode\(.*\] Maybe better to verify the actual cache key and at least the first and last trace lines so we can easily now if something fundamental changes. -- To view, visit http://gerrit.cloudera.org:8080/21035 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia1f36a87dcce6efd5d1e1f0bc04009bf009b1961 Gerrit-Change-Number: 21035 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Yida Wu Gerrit-Comment-Date: Fri, 01 Mar 2024 16:55:45 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12827: Fix failures in processing AbortTxnEvent due to aborted write id is cleaned up
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21071 ) Change subject: IMPALA-12827: Fix failures in processing AbortTxnEvent due to aborted write id is cleaned up .. Patch Set 5: (2 comments) http://gerrit.cloudera.org:8080/#/c/21071/5/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java: http://gerrit.cloudera.org:8080/#/c/21071/5/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@842 PS5, Line 842: LOG.info("EventId: {} EventType: COMMIT_TXN transaction id: {}", getEventId(), > These are very helpful in connecting the COMMIT/ABORT_TXN events with the t Thanks for the clarification. http://gerrit.cloudera.org:8080/#/c/21071/5/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@842 PS5, Line 842: LOG.info("EventId: {} EventType: COMMIT_TXN transaction id: {}", getEventId(), > Why are these (and other) messages INFO level now? This may create a lot of Done -- To view, visit http://gerrit.cloudera.org:8080/21071 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93b6f684d6e4b94961d804a0c022029249873681 Gerrit-Change-Number: 21071 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Thu, 29 Feb 2024 00:12:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12827: Fix failures in processing AbortTxnEvent due to aborted write id is cleaned up
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/21071 ) Change subject: IMPALA-12827: Fix failures in processing AbortTxnEvent due to aborted write id is cleaned up .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/21071/5/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java: http://gerrit.cloudera.org:8080/#/c/21071/5/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@842 PS5, Line 842: LOG.info("EventId: {} EventType: COMMIT_TXN transaction id: {}", getEventId(), Why are these (and other) messages INFO level now? This may create a lot of chatter in the logs. -- To view, visit http://gerrit.cloudera.org:8080/21071 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I93b6f684d6e4b94961d804a0c022029249873681 Gerrit-Change-Number: 21071 Gerrit-PatchSet: 5 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Sai Hemanth Gantasala Gerrit-Comment-Date: Tue, 27 Feb 2024 15:35:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12433: Share buffers among channels in KrpcDataStreamSender
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20719 ) Change subject: IMPALA-12433: Share buffers among channels in KrpcDataStreamSender .. Patch Set 15: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20719 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I64854a350a9dae8bf3af11c871882ea4750e60b3 Gerrit-Change-Number: 20719 Gerrit-PatchSet: 15 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Zihao Ye Gerrit-Comment-Date: Wed, 14 Feb 2024 16:12:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#13). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exec/exchange-node.cc M be/src/exec/sort-node.cc M be/src/exprs/expr-codegen-test.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/fragment-state.cc M be/src/runtime/fragment-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 26 files changed, 412 insertions(+), 120 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/13 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 13 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20850 ) Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. Patch Set 11: (7 comments) http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@21 PS11, Line 21: (default 0.2). > As there is only minimal testing in the patch, it may better to disable thi This is effectively inactive without setting options in the Hive JDBC driver or setting the threshold to 0, and that potential performance regressions are minimal (see comments below). I would rather leave this with the threshold set as-is so that it is not required to set additional options from JDBC to benefit from this enhancement. http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@23 PS11, Line 23: Once enabled, delayed materialization will be used for the remainder of : the rows fetched for the current query > Just brain dump, I don't think that the patch needs to be changed: The Hive JDBC driver ramps up threads slowly as row batches are fetched. That should avoid most of these cases. http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@24 PS11, Line 24: It is expected that the default : threshold will never be reached with a single fetch stream. > It seems safer to guarantee this by detecting parallel fetches and only ena There are no parallel fetches as they are serialized by the fetch lock. Using the fetch lock acquisition latency is superior to counting how many streams are referencing the statement since we can avoid delayed materialization when the benefits would be negligible. The current threshold (20%) will typically only delay when there are more than 2 streams. I also measured performance with narrow and wide table and the performance difference was very negligible, <5% for degenerate wide table cases and <1% for narrow tables. http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@31 PS11, Line 31: -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% > A follow up Jira would be nice about creating some simple multi thread clie Possibly once the Hive driver changes are available in a release. I did test manually with asan and tsan and the multi-stream driver. http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/runtime/row-batch.h File be/src/runtime/row-batch.h: http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/runtime/row-batch.h@329 PS11, Line 329: ror > typo Done http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/impala-hs2-server.cc@254 PS11, Line 254: RETURN_IF_ERROR(result_set->Flush()); : result_flush_timer.Stop(); : query_handle->AddResultFlushTime(result_flush_timer.ElapsedTime()); > the timer is not stopped in case line 254 returns an error This is not necessary since we would not want to report flush time in the case of an error. http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/query-result-set.cc File be/src/service/query-result-set.cc: http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/query-result-set.cc@56 PS11, Line 56: class RowBatchContext { > Can you add some comment about the role of this class? Done -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 11 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Mon, 05 Feb 2024 19:23:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#12). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exec/exchange-node.cc M be/src/exec/sort-node.cc M be/src/exprs/expr-codegen-test.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/fragment-state.cc M be/src/runtime/fragment-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 26 files changed, 413 insertions(+), 120 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/12 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 12 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20850 ) Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. Patch Set 11: (7 comments) http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/codegen/llvm-codegen-test.cc File be/src/codegen/llvm-codegen-test.cc: http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/codegen/llvm-codegen-test.cc@459 PS10, Line 459: llvm::Value* llvm_len1 = codegen->GetI32Constant(strlen(data1)); : llvm::Value* llvm_len2 = codegen->GetI32Constant(strlen(data > Nit: The shared_ptr should be freed automatically at the end of this functi Done http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/codegen/llvm-codegen-test.cc@525 PS10, Line 525: // state. > Nit: The shared_ptr will get reset at the end of the function, so the "code Done http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/codegen/llvm-codegen-test.cc@553 PS10, Line 553: llvm::Function* complete_fn = complete_prototype.GeneratePrototype(&builder, nullpt > Same thing as above. Done http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/codegen/llvm-codegen.h File be/src/codegen/llvm-codegen.h: http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/codegen/llvm-codegen.h@666 PS10, Line 666: MemTracker* parent_mem_tracker, const std::string& file, : const std::string& id, std::unique_ptr* codegen); : > Nit: I think this might be the last use of boost::scoped_ptr in this file. Changed to unique_ptr. shared_ptr uses atomic primitives and is more expensive at runtime. http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/exprs/expr-codegen-test.cc File be/src/exprs/expr-codegen-test.cc: http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/exprs/expr-codegen-test.cc@360 PS10, Line 360: } > Nit: Same comment as elsewhere, this reset() call is not strictly necessary Done http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/runtime/row-batch.h File be/src/runtime/row-batch.h: http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/runtime/row-batch.h@327 PS10, Line 327: /// Deep copy num_rows from this row batch into dst stating at start_idx, using memory : /// allocated from dst's tuple_data_pool_. : /// This variant supports copy of a subset of the rows and non-empty dst ror batch : /// TODO: the current implementation of DeepCopyRows can produce an oversized : /// row batch if there are duplicate tuples in this row ba > Two thoughts: Done http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/runtime/row-batch.cc File be/src/runtime/row-batch.cc: http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/runtime/row-batch.cc@484 PS10, Line 484: void RowBatch::DeepCopyRows(RowBatch* dst, int start_idx, int num_row > One other DCHECK we can do is to assert that the source and destination are Done -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 11 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Tue, 30 Jan 2024 19:56:05 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#11). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exec/exchange-node.cc M be/src/exec/sort-node.cc M be/src/exprs/expr-codegen-test.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/fragment-state.cc M be/src/runtime/fragment-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 26 files changed, 411 insertions(+), 120 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/11 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 11 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#10). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/benchmarks/hash-benchmark.cc M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exprs/expr-codegen-test.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/fragment-state.cc M be/src/runtime/fragment-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 24 files changed, 399 insertions(+), 98 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/10 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 10 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20850 ) Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. Patch Set 7: (9 comments) http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/codegen/llvm-codegen.cc File be/src/codegen/llvm-codegen.cc: http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/codegen/llvm-codegen.cc@263 PS7, Line 263: return status; > Should this also call codegen->reset()? Done http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/runtime/coordinator.cc@1050 PS7, Line 1050: coord_instance_->GetCodegenRef(results->GetCodegenPtr()); > Oh, I think I understand this now. It's assigning the codegen object held b Changed as suggested. http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/runtime/fragment-state.h File be/src/runtime/fragment-state.h: http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/runtime/fragment-state.h@58 PS7, Line 58: void GetCodegenRef(std::shared_ptr& codegen); > nit: this makes more sense to me as SetCodegenRef. And most idiomatic versi Changed to SetCodegenRef. I want to preserve the existing references. http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-options.h@53 PS1, Line 53: TImpalaQueryOptions::DELAY_MATERIALIZE_RESULTS_THRESHOLD + 1); \ > line too long (108 > 90) Done http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/service/query-result-set.h File be/src/service/query-result-set.h: http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/service/query-result-set.h@98 PS7, Line 98: std::shared_ptr& GetCodegenPtr() { return codegen; } > Safer to return a const&. Don't want a caller to be able to reset it to del This is set externally and needs to be non-const. http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/service/query-result-set.h@109 PS7, Line 109: std::shared_ptr codegen; > I don't see this set anywhere. It is set externally as a reference. http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc File be/src/service/query-result-set.cc: http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@190 PS1, Line 190: /// Add a row from a TResultRow > Functions and variables have blank lines around them, and comments describi Done http://gerrit.cloudera.org:8080/#/c/20850/3/be/src/service/query-result-set.cc File be/src/service/query-result-set.cc: http://gerrit.cloudera.org:8080/#/c/20850/3/be/src/service/query-result-set.cc@168 PS3, Line 168: > This should be moved to the 'private' block below. Done http://gerrit.cloudera.org:8080/#/c/20850/5/be/src/service/query-result-set.cc File be/src/service/query-result-set.cc: http://gerrit.cloudera.org:8080/#/c/20850/5/be/src/service/query-result-set.cc@107 PS5, Line 107: /// Handle to shared runtime state. This is reentrant > nit: Trailing slash. Done -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 7 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Sat, 27 Jan 2024 01:29:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#9). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exprs/expr-codegen-test.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/fragment-state.cc M be/src/runtime/fragment-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 23 files changed, 397 insertions(+), 96 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/9 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 9 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#8). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/codegen/llvm-codegen-cache-test.cc M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/buffered-plan-root-sink.cc M be/src/exprs/expr-codegen-test.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/fragment-state.cc M be/src/runtime/fragment-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 22 files changed, 380 insertions(+), 79 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/8 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 8 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#7). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/buffered-plan-root-sink.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/fragment-state.cc M be/src/runtime/fragment-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 20 files changed, 359 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/7 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 7 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#6). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/codegen/llvm-codegen.cc M be/src/codegen/llvm-codegen.h M be/src/exec/buffered-plan-root-sink.cc M be/src/runtime/coordinator.cc M be/src/runtime/fragment-instance-state.cc M be/src/runtime/fragment-instance-state.h M be/src/runtime/fragment-state.cc M be/src/runtime/fragment-state.h M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 20 files changed, 358 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/6 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 6 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#5). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/exec/buffered-plan-root-sink.cc M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 13 files changed, 324 insertions(+), 40 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/5 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 5 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#4). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/exec/buffered-plan-root-sink.cc M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 13 files changed, 324 insertions(+), 40 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/4 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 4 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12688: Support JSON profile imports for visualizing query timeline in webUI
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20867 ) Change subject: IMPALA-12688: Support JSON profile imports for visualizing query timeline in webUI .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/20867/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20867/1//COMMIT_MSG@9 PS1, Line 9: A button has been added to import JSON query profiles for visualizing : them on the query timeline page. > It's weird to add import JSON button in query timeline page since the page Let's see if we can add a new /import link at the top that open a screen for import and only offers supported tabs. -- To view, visit http://gerrit.cloudera.org:8080/20867 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife6eb59bf2030fd19fc92aaf134eb51c609e04d0 Gerrit-Change-Number: 20867 Gerrit-PatchSet: 1 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 10 Jan 2024 14:59:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#3). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/exec/buffered-plan-root-sink.cc M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 13 files changed, 298 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/3 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 3 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20850 ) Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. Patch Set 1: (23 comments) http://gerrit.cloudera.org:8080/#/c/20850/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20850/1//COMMIT_MSG@13 PS1, Line 13: FetchLockWaitTimer and ResultFlushTimer and have been added to account > nit: extra "and" after ResultFlushTimer Done http://gerrit.cloudera.org:8080/#/c/20850/1//COMMIT_MSG@15 PS1, Line 15: materializing results outside of the fetch lock. New query option > nit: A line break would be nice somewhere around here. Big wall of text. Done http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/exec/buffered-plan-root-sink.cc File be/src/exec/buffered-plan-root-sink.cc: http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/exec/buffered-plan-root-sink.cc@240 PS1, Line 240: num_rows_to_read, output_expr_evals_, state)); > I think this needs a comment. It's not obvious why this uses num_rows_to_re Done http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/runtime/row-batch.cc File be/src/runtime/row-batch.cc: http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/runtime/row-batch.cc@469 PS1, Line 469: void RowBatch::CopyRows(RowBatch* dst, int start_idx, int num_rows) { > nit: can we put this after DeepCopyTo so the ordering is consistent with th Done http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/runtime/row-batch.cc@471 PS1, Line 471: DCHECK_LE(num_rows, num_rows_); > This feels a little redundant with the next line. What we actually care abo removed. http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/runtime/row-batch.cc@472 PS1, Line 472: DCHECK_LE(start_idx + num_rows, num_rows_); > Maybe also DCHECK start_idx >= 0, and num_rows > 0? Done http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/impala-hs2-server.cc@208 PS1, Line 208: MonotonicStopWatch fetch_lock_timer; > nit: No reason this needs to be outside the block where it's used. Done http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/impala-hs2-server.cc@248 PS1, Line 248: fetch_lock_timer.Start(); > When does this get stopped again? Or used? removed. http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-options.h File be/src/service/query-options.h: http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-options.h@53 PS1, Line 53: TImpalaQueryOptions::DELAY_MATERIALIZE_RESULTS_THRESHOLD + 1); \ > line too long (108 > 90) Done http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc File be/src/service/query-result-set.cc: http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@61 PS1, Line 61: pendingRowBatch_(), > nit: you don't really need to default initialize all of these. It happens a Now the ctors are non-default. http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@82 PS1, Line 82: DCHECK(!pendingRowBatch_); > nit: could some of this happen in the constructor? You're asserting things Done http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@90 PS1, Line 90: mem_tracker_.reset(new MemTracker((impala::IntGauge*)nullptr, -1, "Result row batch", state_->query_mem_tracker())); > line too long (120 > 90) Done http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@91 PS1, Line 91: expr_mem_tracker_.reset(new MemTracker((impala::IntGauge*)nullptr, -1, "Result expr", mem_tracker_.get())); > line too long (111 > 90) Done http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@189 PS1, Line 189: virtual size_t size() override { return row_batch_context_ ? row_batch_context_->size() : num_rows_; } > line too long (104 > 90) Done http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@190 PS1, Line 190: virtual Status InitDelayedMaterialization(const RowDescriptor& row_desc, > Please try to match style (spacing, functions descriptions). Please be more specific what you want changed. http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@193 PS1, Line 193: virtual bool DelayedMaterializationEnabled() const override { return delay_materialization_; } > line too long (96 > 90) Done http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@198 PS1, Line 198: RETURN_IF_ERROR(AddRows(row_batch_context_->GetExprEvals(), > Re-using AddRows here makes the AddRows conditional feel awkward. How about Done http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@280 PS1, Line 280: metadata, rowset, stringify_map_keys, expected_resul
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Hello Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20850 to look at the new patch set (#2). Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/exec/buffered-plan-root-sink.cc M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 13 files changed, 297 insertions(+), 38 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/2 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 2 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Smith
[Impala-ASF-CR] IMPALA-12433: Share buffers among channels in KrpcDataStreamSender
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20719 ) Change subject: IMPALA-12433: Share buffers among channels in KrpcDataStreamSender .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/20719/11/be/src/runtime/row-batch.cc File be/src/runtime/row-batch.cc: http://gerrit.cloudera.org:8080/#/c/20719/11/be/src/runtime/row-batch.cc@271 PS11, Line 271: bool* is_compressed, int64_t size, TrackedString* compression_scratch) { > Change to TrackedString& or TrackedString*const here and above NM. I missed that this can be null. -- To view, visit http://gerrit.cloudera.org:8080/20719 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I64854a350a9dae8bf3af11c871882ea4750e60b3 Gerrit-Change-Number: 20719 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Mon, 08 Jan 2024 20:07:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock
Kurt Deschler has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20850 Change subject: IMPALA-12533: Support row materialization outside of fetch lock .. IMPALA-12533: Support row materialization outside of fetch lock This patch supports an alternative result materialization path where results are copied to a temporary row batch then materialized into thrift outside of the fetch lock. This can improve fetch throughput when the client is fetching concurrently from multiple result streams. New metrics FetchLockWaitTimer and ResultFlushTimer and have been added to account for synchrounous time waiting for the fetch lock and cumulative time materializing results outside of the fetch lock. New query option delay_materialize_results_threshold has been added to automatically enable delayed result materialization when the fetch lock wait time exceeds the specified fraction of the materialization time (default 0.2). Once enabled, delayed materialization will be used for the remainder of the rows fetched for the current query. It is expected that the default threshold will never be reached with a single fetch stream. This optimization is disabled when query result caching is active since the cache stores materialized rows and is populated inside of the fetch lock. Testing: -Tests added to tests/query_test/test_observability.py -Manual testing with Multi-stream Hive JDBC driver. This showed a 20% improvement in fetch time with a wide table and 5 streams. On the mini-cluster, performance is now limited by root sink exchange throughput. Production clusters should be able to achieve higher gains. Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 --- M be/src/exec/buffered-plan-root-sink.cc M be/src/runtime/row-batch.cc M be/src/runtime/row-batch.h M be/src/service/client-request-state.cc M be/src/service/client-request-state.h M be/src/service/impala-hs2-server.cc M be/src/service/query-options.cc M be/src/service/query-options.h M be/src/service/query-result-set.cc M be/src/service/query-result-set.h M common/thrift/ImpalaService.thrift M common/thrift/Query.thrift M tests/query_test/test_observability.py 13 files changed, 294 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/1 -- To view, visit http://gerrit.cloudera.org:8080/20850 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23 Gerrit-Change-Number: 20850 Gerrit-PatchSet: 1 Gerrit-Owner: Kurt Deschler
[Impala-ASF-CR] IMPALA-12433: Share buffers among channels in KrpcDataStreamSender
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20719 ) Change subject: IMPALA-12433: Share buffers among channels in KrpcDataStreamSender .. Patch Set 11: (1 comment) http://gerrit.cloudera.org:8080/#/c/20719/11/be/src/runtime/row-batch.cc File be/src/runtime/row-batch.cc: http://gerrit.cloudera.org:8080/#/c/20719/11/be/src/runtime/row-batch.cc@271 PS11, Line 271: bool* is_compressed, int64_t size, TrackedString* compression_scratch) { Change to TrackedString& or TrackedString*const here and above -- To view, visit http://gerrit.cloudera.org:8080/20719 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I64854a350a9dae8bf3af11c871882ea4750e60b3 Gerrit-Change-Number: 20719 Gerrit-PatchSet: 11 Gerrit-Owner: Csaba Ringhofer Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Fri, 15 Dec 2023 18:36:47 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12614: Use atomics for 64-bit host metrics
Hello Andrew Sherman, Riza Suminto, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20776 to look at the new patch set (#3). Change subject: IMPALA-12614: Use atomics for 64-bit host metrics .. IMPALA-12614: Use atomics for 64-bit host metrics This patch changes 64-bit host metrics to use atomics to avoid potential partial load/store data races. There are no functional changes. This issue was exposed by TSAN tests when IMPALA-12385 enabled periodic metrics by default. Testing: TSAN tests cover this case. Change-Id: I88a15d3ccfeab29506427b3bfcaeebf3a2831176 --- M be/src/runtime/query-state.cc M be/src/util/system-state-info-test.cc M be/src/util/system-state-info.cc M be/src/util/system-state-info.h 4 files changed, 18 insertions(+), 17 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/20776/3 -- To view, visit http://gerrit.cloudera.org:8080/20776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I88a15d3ccfeab29506427b3bfcaeebf3a2831176 Gerrit-Change-Number: 20776 Gerrit-PatchSet: 3 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Riza Suminto
[Impala-ASF-CR] IMPALA-12614: Use atomics for 64-bit host metrics
Kurt Deschler has uploaded this change for review. ( http://gerrit.cloudera.org:8080/20776 Change subject: IMPALA-12614: Use atomics for 64-bit host metrics .. IMPALA-12614: Use atomics for 64-bit host metrics This patch changes 64-bit host metrics to use atomics to avoid potential partial load/store data races. There are no functional changes. This issue was exposed by TSAN tests when IMPALA-12385 enabled periodic metrics by default. Testing: TSAN tests cover this case. Change-Id: I88a15d3ccfeab29506427b3bfcaeebf3a2831176 --- M be/src/runtime/query-state.cc M be/src/util/system-state-info.cc M be/src/util/system-state-info.h 3 files changed, 13 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/20776/1 -- To view, visit http://gerrit.cloudera.org:8080/20776 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I88a15d3ccfeab29506427b3bfcaeebf3a2831176 Gerrit-Change-Number: 20776 Gerrit-PatchSet: 1 Gerrit-Owner: Kurt Deschler
[Impala-ASF-CR] IMPALA-12385: Enable Periodic metrics by default
Hello Riza Suminto, David Rorke, Surya Hebbar, Csaba Ringhofer, Michael Smith, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/20377 to look at the new patch set (#10). Change subject: IMPALA-12385: Enable Periodic metrics by default .. IMPALA-12385: Enable Periodic metrics by default This patch enables periodic metrics in query profiles by default and changes the metric collectors to be more suitable for mixed workloads. -Changed default of resource_trace_ratio to 1. -Changed profile metrics to use sampling counters which can automatically resize for long queries. -Reduced profile metric samping interval to 50ms to support short-running queries. -Changed fragment metrics to use the same sampling interval as periodic metrics. -Added singleton PeriodicCounterUpdater and thread for updating system (KRPC) metrics at a different period than fragment metrics. -Added new flag periodic_system_counter_update_period_ms for configuring system metric update period. Default is 500ms. Example profile output: - HostCpuIoWaitPercentage (400.000ms): 1, 0, 2, 3, 4, 6, 5, 2, 1, ... - HostCpuSysPercentage (400.000ms): 1, 12, 10, 11, 11, 5, 3, 3, 3, ... - HostCpuUserPercentage (400.000ms): 8, 46, 39, 39, 39, 29, 22, 23, ... Testing: -Updated runtime-profile-test and test_observability.py for new defaults -Manual inspection of query profile metrics for long-running queries -Tested for performance regression using 50 concurrent TPCH Q1 and with periodic_counter_update_period_ms=1 to verify that the periodic update loop does not affect peformance or become a bottleneck Change-Id: Ic8e5cbfd4b324081158574ceb8f4b3a062a69fd1 --- M be/src/runtime/exec-env.cc M be/src/runtime/krpc-data-stream-recvr.cc M be/src/runtime/query-state.cc M be/src/util/periodic-counter-updater.cc M be/src/util/periodic-counter-updater.h M be/src/util/runtime-profile-counters.h M be/src/util/runtime-profile-test.cc M be/src/util/runtime-profile.cc M be/src/util/runtime-profile.h M be/src/util/streaming-sampler.h M common/thrift/Query.thrift M tests/query_test/test_observability.py 12 files changed, 147 insertions(+), 81 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/20377/10 -- To view, visit http://gerrit.cloudera.org:8080/20377 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic8e5cbfd4b324081158574ceb8f4b3a062a69fd1 Gerrit-Change-Number: 20377 Gerrit-PatchSet: 10 Gerrit-Owner: Kurt Deschler Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar
[Impala-ASF-CR] IMPALA-12548: Fix behavior of AGG MEM CORRELATION FACTOR
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20684 ) Change subject: IMPALA-12548: Fix behavior of AGG_MEM_CORRELATION_FACTOR .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/20684 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6f81db32a1818abc257957f6de942b5c9f36211a Gerrit-Change-Number: 20684 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Aman Sinha Gerrit-Reviewer: David Rorke Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Riza Suminto Gerrit-Comment-Date: Wed, 08 Nov 2023 23:58:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12373: Small String Optimization for StringValue
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20496 ) Change subject: IMPALA-12373: Small String Optimization for StringValue .. Patch Set 11: Please consider an extension to SSO which would be to templatize the small string size so that wider fixed values like timestamp can also benefit. This should probably be filed separately unless you would do something fundamentally different here. -- To view, visit http://gerrit.cloudera.org:8080/20496 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I741c3a5f12ab620b6b64b57d4c89b5f8e056efd3 Gerrit-Change-Number: 20496 Gerrit-PatchSet: 11 Gerrit-Owner: Zoltan Borok-Nagy Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Daniel Becker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Wed, 08 Nov 2023 22:08:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 ) Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables .. Patch Set 31: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/17842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2 Gerrit-Change-Number: 17842 Gerrit-PatchSet: 31 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 09 Oct 2023 21:14:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 ) Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables .. Patch Set 30: Code-Review+1 (2 comments) http://gerrit.cloudera.org:8080/#/c/17842/30//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17842/30//COMMIT_MSG@13 PS30, Line 13: - It is not distributed, e.g, fregment is unpartitioned. The queries nit: fragment http://gerrit.cloudera.org:8080/#/c/17842/30//COMMIT_MSG@15 PS30, Line 15: - Queries which read following data types from external jdbc tables JDBC -- To view, visit http://gerrit.cloudera.org:8080/17842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2 Gerrit-Change-Number: 17842 Gerrit-PatchSet: 30 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 09 Oct 2023 20:52:13 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 ) Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables .. Patch Set 27: (2 comments) http://gerrit.cloudera.org:8080/#/c/17842/29//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17842/29//COMMIT_MSG@20 PS29, Line 20: - Don't support external tables with complex types of columns. Please reword all of these the begin with Don't. http://gerrit.cloudera.org:8080/#/c/17842/29//COMMIT_MSG@23 PS29, Line 23: - Don't support Catalog V2 (IMPALA-7131) Support is limited to the following databases: -- To view, visit http://gerrit.cloudera.org:8080/17842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2 Gerrit-Change-Number: 17842 Gerrit-PatchSet: 27 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 09 Oct 2023 20:09:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12473: Fix profile's missing event timestamp exception in query timeline
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20522 ) Change subject: IMPALA-12473: Fix profile's missing event timestamp exception in query timeline .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/20522/5/www/query_timeline.tmpl File www/query_timeline.tmpl: http://gerrit.cloudera.org:8080/#/c/20522/5/www/query_timeline.tmpl@281 PS5, Line 281: for (var instance = 1; instance < fp.child_profiles.length; ++instance) { > In one of the instances mentioned in IMPALA-12473, intermediate event times It might better to consolidate the 2 cases so that the labels are only checked while missing labels are not accounted for. Could even go last to first to optimize for the case where the last events are missing. http://gerrit.cloudera.org:8080/#/c/20522/5/www/query_timeline.tmpl@300 PS5, Line 300: continue; remove the continue and change the next branch to else if. http://gerrit.cloudera.org:8080/#/c/20522/5/www/query_timeline.tmpl@308 PS5, Line 308: ++i; factor ++i out of branches -- To view, visit http://gerrit.cloudera.org:8080/20522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I025b00d8632a5a1953ecdaaa7d8a4ae224dd2610 Gerrit-Change-Number: 20522 Gerrit-PatchSet: 5 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 09 Oct 2023 16:42:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 ) Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables .. Patch Set 27: (4 comments) http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG@13 PS27, Line 13: - It is not distributed. Unclear what this actually means. Queries executed on the coordinator? http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG@14 PS27, Line 14: - Don't support following data types for column data from external Change wording: From: Don't support queries... Don't support datatypes XYZ. To: Queries... are not supported. ... XYZ datatypes are not supported. http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG@16 PS27, Line 16: - Only support binary predicates with operators =, !=, <=, >=, Will between/in/like planned be supported in the future? http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG@43 PS27, Line 43: Does this require mincluster restart? -- To view, visit http://gerrit.cloudera.org:8080/17842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2 Gerrit-Change-Number: 17842 Gerrit-PatchSet: 27 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 09 Oct 2023 15:51:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20355 ) Change subject: IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline .. Patch Set 18: We need to do something different for the zooming and scrolling as it's hard to figure out how to control it, especially with a track pad. The grey bar seems out of place and is confusing. I think it would be better to have buttons to control the zoom. Panning should work via click and drag or buttons. -- To view, visit http://gerrit.cloudera.org:8080/20355 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd25e6f0bc9fbd664ec98936daff3f27182dfc7f Gerrit-Change-Number: 20355 Gerrit-PatchSet: 18 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Fri, 06 Oct 2023 16:17:49 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12482: Make VLOG level of RpcEventHandler adjustable
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20537 ) Change subject: IMPALA-12482: Make VLOG level of RpcEventHandler adjustable .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/20537/1/be/src/statestore/statestore-subscriber.cc File be/src/statestore/statestore-subscriber.cc: http://gerrit.cloudera.org:8080/#/c/20537/1/be/src/statestore/statestore-subscriber.cc@221 PS1, Line 221: new RpcEventHandler("statestore-subscriber", metrics_, 3)); > nit: add a comment about what the 3 here means, like "Logging statestore su maybe add a new #define for this class of logging? -- To view, visit http://gerrit.cloudera.org:8080/20537 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic7658ee0016411a9ace0ca3f2eb535b03d2a7add Gerrit-Change-Number: 20537 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Michael Smith Gerrit-Comment-Date: Thu, 05 Oct 2023 16:47:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20355 ) Change subject: IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline .. Patch Set 17: (13 comments) http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG@8 PS17, Line 8: Add: This patch adds fragment-level metrics to the WebUI query timeline display.. Also mention splitting up the timeline logic into multiple files. http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG@9 PS17, Line 9: The fragment's plan nodes are enlarged with an animated transition on hovering Limit line length to 73 characters in commit message. In vim: :set textwidth=73 gggqG http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG@10 PS17, Line 10: query timeline's fragment fragment's row in the phase http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG@25 PS17, Line 25: Once a fragment's metrics are displayed, they are updated as they Do you mean they will be updated for a running query? http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG@35 PS17, Line 35: RESOURCE_TRACE_RATIO query option provides the utilization values to be Say that RESOURCE_TRACE_RATIO must be set to enable periodic metrics and remove the rest of the description. What matters for this change is what metrics are displayed. http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG@48 PS17, Line 48: are being : displayed are displayed http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG@59 PS17, Line 59: To help with manually testing the query timeline web page, We may want to split the test changes into a separate patch. http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG@59 PS17, Line 59: manually What is preventing automated testing? http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG@76 PS17, Line 76: As the order and name of counters may change within the profile, Unclear what this is relevant to. Is this part of the testing infrastructure? http://gerrit.cloudera.org:8080/#/c/20355/2/www/query_timeline.tmpl File www/query_timeline.tmpl: http://gerrit.cloudera.org:8080/#/c/20355/2/www/query_timeline.tmpl@65 PS2, Line 65: > I can assign a trigger such as clicking on the timeticks footer or any othe I think it's ok the way it is. http://gerrit.cloudera.org:8080/#/c/20355/2/www/query_timeline.tmpl@471 PS2, Line 471: > Yes, it would be very helpful to compare multiple fragment metrics together First comment was relating to the position of the charts meaning make the fragment metrics appear below the periodic metrics. http://gerrit.cloudera.org:8080/#/c/20355/2/www/query_timeline.tmpl@778 PS2, Line 778: > The tics/scaling for the fragment metrics are not aligned properly now. The x-axis for the periodic metrics and the phase diagram have exactly the same max value and align perfectly. However, the fragment metrics has a different max value and does not align. http://gerrit.cloudera.org:8080/#/c/20355/2/www/query_timeline.tmpl@1018 PS2, Line 1018: > As the unit used to measure the fragment metrics is different compared to t Makes sense. So we can only show 2 scales or can there be other sets of labels? -- To view, visit http://gerrit.cloudera.org:8080/20355 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifd25e6f0bc9fbd664ec98936daff3f27182dfc7f Gerrit-Change-Number: 20355 Gerrit-PatchSet: 17 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Thu, 05 Oct 2023 14:07:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12473: Fix profile's missing event timestamp exception in query timeline
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20522 ) Change subject: IMPALA-12473: Fix profile's missing event timestamp exception in query timeline .. Patch Set 5: (1 comment) http://gerrit.cloudera.org:8080/#/c/20522/5/www/query_timeline.tmpl File www/query_timeline.tmpl: http://gerrit.cloudera.org:8080/#/c/20522/5/www/query_timeline.tmpl@281 PS5, Line 281: for (var instance = 1; instance < fp.child_profiles.length; ++instance) { Is is always the last event(s) that are missing? If so, we may not need to check labels. -- To view, visit http://gerrit.cloudera.org:8080/20522 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I025b00d8632a5a1953ecdaaa7d8a4ae224dd2610 Gerrit-Change-Number: 20522 Gerrit-PatchSet: 5 Gerrit-Owner: Surya Hebbar Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Surya Hebbar Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Wed, 04 Oct 2023 14:28:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/17842 ) Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables .. Patch Set 22: (8 comments) http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java: http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@273 PS22, Line 273: .collect(Collectors.joining(", ")); Is quoting added somewhere? Postgres will convert unquoted capitals to lowercase. http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java: http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@96 PS22, Line 96: String countQuery = "SELECT COUNT(*) FROM (" + sql + ") tmptable"; Are we sure all of the target databases will flatten this view query? Otherwise could text replace the generated "select *" http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@285 PS22, Line 285: props.put("maxActive", "3"); Are these database-specific? http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java: http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java@86 PS22, Line 86: LOGGER.warn("hasNext() threw exception", se); This should re-throw http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java@175 PS22, Line 175: LOGGER.warn("Caught exception while trying to close database objects", e); probably should re-throw http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java: http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java@39 PS22, Line 39: return "Select * from (" + sql + ") as \"tmp\" limit " + limit; This may not always flatten well. consider text replacement instead. http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java File java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java: http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java@54 PS22, Line 54: joiner.add(String.format("%s %s %s", name, op, value)); May need quoting here. http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/create-data-source-table.sql File testdata/bin/create-data-source-table.sql: http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/create-data-source-table.sql@56 PS22, Line 56: CREATE TABLE alltypes_jdbc_datasource ( Add tests that require quoting -- To view, visit http://gerrit.cloudera.org:8080/17842 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2 Gerrit-Change-Number: 17842 Gerrit-PatchSet: 22 Gerrit-Owner: Fucun Chu Gerrit-Reviewer: Abhishek Rawat Gerrit-Reviewer: Anonymous Coward Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Quanlong Huang Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 02 Oct 2023 21:17:50 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12444: Fix minimum parallelism bug in scan fragment
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20475 ) Change subject: IMPALA-12444: Fix minimum parallelism bug in scan fragment .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/20475 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I69e5a80146d4ac41de5ef406fc2bdceffe3ec394 Gerrit-Change-Number: 20475 Gerrit-PatchSet: 3 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 02 Oct 2023 16:13:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-12444: Fix minimum parallelism bug in scan fragment
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20475 ) Change subject: IMPALA-12444: Fix minimum parallelism bug in scan fragment .. Patch Set 1: (3 comments) http://gerrit.cloudera.org:8080/#/c/20475/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/20475/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1065 PS1, Line 1065: long maxScannerThreads = Long.MAX_VALUE; > Done. Changed from long to int. Thanks for catching this! Please follow up below. http://gerrit.cloudera.org:8080/#/c/20475/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1108 PS1, Line 1108: (int) Math.min(costBasedMaxParallelism, maxScannerThreads)); Is the cast necessary here? http://gerrit.cloudera.org:8080/#/c/20475/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1139 PS1, Line 1139: maxParallelism_ = (int) Math.min(maxParallelism_, maxScannerThreads); Is the cast necessary here? -- To view, visit http://gerrit.cloudera.org:8080/20475 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I69e5a80146d4ac41de5ef406fc2bdceffe3ec394 Gerrit-Change-Number: 20475 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Riza Suminto Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 02 Oct 2023 16:04:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-12444: Fix minimum parallelism bug in scan fragment
Kurt Deschler has posted comments on this change. ( http://gerrit.cloudera.org:8080/20475 ) Change subject: IMPALA-12444: Fix minimum parallelism bug in scan fragment .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/20475/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: http://gerrit.cloudera.org:8080/#/c/20475/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1065 PS1, Line 1065: long maxScannerThreads = Long.MAX_VALUE; What long? Should make thread counts in everywhere and check/cast computed values as early as possible. -- To view, visit http://gerrit.cloudera.org:8080/20475 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I69e5a80146d4ac41de5ef406fc2bdceffe3ec394 Gerrit-Change-Number: 20475 Gerrit-PatchSet: 1 Gerrit-Owner: Riza Suminto Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Kurt Deschler Gerrit-Reviewer: Wenzhe Zhou Gerrit-Comment-Date: Mon, 02 Oct 2023 15:43:31 + Gerrit-HasComments: Yes