[Impala-ASF-CR] IMPALA-13304: Include aggregate instance-level metrics within experimental profile(V2)

2024-09-23 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21683 )

Change subject: IMPALA-13304: Include aggregate instance-level metrics within 
experimental profile(V2)
..


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21683/4/be/src/util/runtime-profile.cc
File be/src/util/runtime-profile.cc:

http://gerrit.cloudera.org:8080/#/c/21683/4/be/src/util/runtime-profile.cc@2952
PS4, Line 2952: lock_guard l(event_sequence_lock_);
Given the number of potential memory allocations below, it is going to be 
better to pre-allocate a vector and copy the events into it for processing so 
that the spinlock is not held very long.


http://gerrit.cloudera.org:8080/#/c/21683/4/be/src/util/runtime-profile.cc@2953
PS4, Line 2953: if (!event_sequence_map_.empty()) {
Is the empty case common enough to warrant a branch?


http://gerrit.cloudera.org:8080/#/c/21683/4/be/src/util/runtime-profile.cc@2960
PS4, Line 2960: vector 
labels(event_sequence_contents.labels.size());
Can these structures get reused?


http://gerrit.cloudera.org:8080/#/c/21683/4/be/src/util/runtime-profile.cc@2982
PS4, Line 2982: vector min_ts_list(BUCKET_SIZE, max);
Also try to re-use these vectors.



--
To view, visit http://gerrit.cloudera.org:8080/21683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I49e18a7a7e1288e3e674e15b6fc86aad60a08214
Gerrit-Change-Number: 21683
Gerrit-PatchSet: 4
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Comment-Date: Mon, 23 Sep 2024 12:43:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13233: Improve display of instance-level skew in query timeline

2024-09-12 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21593 )

Change subject: IMPALA-13233: Improve display of instance-level skew in query 
timeline
..


Patch Set 8: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21593
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied8a5966e9e4111bf7aa25aee11d23881daad7d2
Gerrit-Change-Number: 21593
Gerrit-PatchSet: 8
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Comment-Date: Thu, 12 Sep 2024 15:24:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13233: Improve display of instance-level skew in query timeline

2024-08-27 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21593 )

Change subject: IMPALA-13233: Improve display of instance-level skew in query 
timeline
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21593/3/www/scripts/query_timeline/fragment_diagram.js
File www/scripts/query_timeline/fragment_diagram.js:

http://gerrit.cloudera.org:8080/#/c/21593/3/www/scripts/query_timeline/fragment_diagram.js@524
PS3, Line 524: var name_flds = pp.profile_name.split(/[()]/);
 : var node_type = name_flds[0].trim();
 : var node_id = name_flds.length > 1 ? 
name_flds[1].split(/[=]/)[1] : 0;
> Same here, I am not sure, why it was kept that way, need to confirm.
Are their any implications of using string vs integer in this code?


http://gerrit.cloudera.org:8080/#/c/21593/3/www/scripts/query_timeline/fragment_diagram.js@548
PS3, Line 548: node_metadata.join_build_id
> I will confirm with @kdeschle, I also could not find it.
join_build_id should be an mt_dop case that uses join build but I don't 
remember the details of the json content.



--
To view, visit http://gerrit.cloudera.org:8080/21593
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied8a5966e9e4111bf7aa25aee11d23881daad7d2
Gerrit-Change-Number: 21593
Gerrit-PatchSet: 3
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Comment-Date: Tue, 27 Aug 2024 12:07:30 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10995: Allow remote submit time to be in the past

2024-08-15 Thread Kurt Deschler (Code Review)
Kurt Deschler has abandoned this change. ( 
http://gerrit.cloudera.org:8080/17985 )

Change subject: IMPALA-10995: Allow remote submit time to be in the past
..


Abandoned
--
To view, visit http://gerrit.cloudera.org:8080/17985
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: If452d5ea48030c134ae4654e58b157f4fd24df04
Gerrit-Change-Number: 17985
Gerrit-PatchSet: 2
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Wenzhe Zhou 


[Impala-ASF-CR] IMPALA-12576: Display warning on the query timeline page for the experimental profile

2024-08-15 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21596 )

Change subject: IMPALA-12576: Display warning on the query timeline page for 
the experimental profile
..


Patch Set 2:

Is it possible to add an automated test? Not having automated tests for WebUI 
features can cause these tickets to get flagged as missing test coverage.


--
To view, visit http://gerrit.cloudera.org:8080/21596
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6ad62be3572b7a126ce10dc4a11fcc9b8a58013d
Gerrit-Change-Number: 21596
Gerrit-PatchSet: 2
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Thu, 15 Aug 2024 15:25:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13233: Represent phases of each instance separately on the query timeline

2024-08-15 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21593 )

Change subject: IMPALA-13233: Represent phases of each instance separately on 
the query timeline
..


Patch Set 3:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21593/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21593/3//COMMIT_MSG@7
PS3, Line 7: IMPALA-13233: Represent phases of each instance separately on the 
query timeline
Improve display of instance-level skew in query timeline


http://gerrit.cloudera.org:8080/#/c/21593/3//COMMIT_MSG@16
PS3, Line 16: being shown parallelly
aggregated and displayed as a histogram


http://gerrit.cloudera.org:8080/#/c/21593/3//COMMIT_MSG@21
PS3, Line 21: are bucketed into 4 groups. Each group's timestamps are averaged 
and
The histogram should display the min/max values somehow sicne averages can be 
very misleading with outliers.


http://gerrit.cloudera.org:8080/#/c/21593/3/www/scripts/query_timeline/fragment_diagram.js
File www/scripts/query_timeline/fragment_diagram.js:

http://gerrit.cloudera.org:8080/#/c/21593/3/www/scripts/query_timeline/fragment_diagram.js@263
PS3, Line 263:   if (dx > ignore_px) {
With the bucketed display, there is no good reason to skip displaying small 
skews. That was done previously because they could not be rendered.



--
To view, visit http://gerrit.cloudera.org:8080/21593
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ied8a5966e9e4111bf7aa25aee11d23881daad7d2
Gerrit-Change-Number: 21593
Gerrit-PatchSet: 3
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Comment-Date: Thu, 15 Aug 2024 15:21:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12906: Incorporate scan range information into the tuple cache key

2024-08-15 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21541 )

Change subject: IMPALA-12906: Incorporate scan range information into the tuple 
cache key
..


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/21541/4/be/src/exec/tuple-cache-node.h
File be/src/exec/tuple-cache-node.h:

http://gerrit.cloudera.org:8080/#/c/21541/4/be/src/exec/tuple-cache-node.h@21
PS4, Line 21: #include 
Why include vector in this header?


http://gerrit.cloudera.org:8080/#/c/21541/4/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/21541/4/be/src/scheduling/scheduler.cc@600
PS4, Line 600:   bool operator<(InstanceAssignment& other) const { return 
weight > other.weight; }
Make this deterministic for equal weight case by comparing instance_idx.


http://gerrit.cloudera.org:8080/#/c/21541/4/be/src/scheduling/scheduler.cc@816
PS4, Line 816: return ScanRangeWeight(a) > ScanRangeWeight(b);
Make this deterministic for equal weight


http://gerrit.cloudera.org:8080/#/c/21541/4/be/src/scheduling/scheduler.cc@821
PS4, Line 821: pop_heap(instance_heap.begin(), instance_heap.end());
Kind of hard to follow why pop_heap/push_heap is being used here instead of 
just updating and sorting the vector. Also why isn't make_heap needed?


http://gerrit.cloudera.org:8080/#/c/21541/4/fe/src/main/java/org/apache/impala/planner/TupleCacheNode.java
File fe/src/main/java/org/apache/impala/planner/TupleCacheNode.java:

http://gerrit.cloudera.org:8080/#/c/21541/4/fe/src/main/java/org/apache/impala/planner/TupleCacheNode.java@128
PS4, Line 128: output.append(detailPrefix + "input scan node ids: " +
If this list will typically be long, it should be split at keyFormatWidth



--
To view, visit http://gerrit.cloudera.org:8080/21541
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe298fff0f644ce931a2aa934ebb98f69aab9d34
Gerrit-Change-Number: 21541
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 15 Aug 2024 15:09:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13214: Skip wait until connected when shell exits

2024-07-24 Thread Kurt Deschler (Code Review)
Kurt Deschler has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/21598 )

Change subject: IMPALA-13214: Skip wait_until_connected when shell exits
..

IMPALA-13214: Skip wait_until_connected when shell exits

The ImpalaShell class expects to start impala-shell and interact with it
by sending instructions over stdin and reading the results. This
assumption was incorrect when used for impala-shell batch sessions,
where the process exits on its own. If there's a delay in
ImpalaShell.__init__ - between starting the process and polling to see
that it's running - for a batch process, ImpalaShell will fail the
assertion that process_status is None. This can be easily reproduced by
adding a small (0.1s) sleep after starting the new process.

Most batch runs of impala-shell happen through `run_impala_shell_cmd`.
Updated that function to only wait for a successful connection when
stdin input is supplied. Otherwise the command is assumed to be a batch
function and any failures will be detected during `get_result`. Removed
explicit use of `wait_until_connected` as redundant.

Fixed cases in test_config_file that previously ignored WARNING before
the connection string because they did not specify
`wait_until_connected`.

Tested by running shell/test_shell_commandline.py with a 0.1s delay
before ImpalaShell polls.

Change-Id: I24e029b6192a17773760cb44fd7a4f87b71c0aae
Reviewed-on: http://gerrit.cloudera.org:8080/21598
Tested-by: Impala Public Jenkins 
Reviewed-by: Jason Fehr 
Reviewed-by: Kurt Deschler 
---
M tests/custom_cluster/test_client_ssl.py
M tests/shell/test_shell_commandline.py
M tests/shell/util.py
3 files changed, 30 insertions(+), 25 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Jason Fehr: Looks good to me, but someone else must approve
  Kurt Deschler: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/21598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I24e029b6192a17773760cb44fd7a4f87b71c0aae
Gerrit-Change-Number: 21598
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-13214: Skip wait until connected when shell exits

2024-07-24 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21598 )

Change subject: IMPALA-13214: Skip wait_until_connected when shell exits
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24e029b6192a17773760cb44fd7a4f87b71c0aae
Gerrit-Change-Number: 21598
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Wed, 24 Jul 2024 15:38:30 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13214: Skip wait until connected when shell exits

2024-07-23 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21598 )

Change subject: IMPALA-13214: Skip wait_until_connected when shell exits
..


Patch Set 4: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24e029b6192a17773760cb44fd7a4f87b71c0aae
Gerrit-Change-Number: 21598
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Wed, 24 Jul 2024 00:56:08 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13214: Skip wait until connected when shell exits

2024-07-23 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21598 )

Change subject: IMPALA-13214: Skip wait_until_connected when shell exits
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/21598/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21598/2//COMMIT_MSG@10
PS2, Line 10: It
This assumption was incorrect when..


http://gerrit.cloudera.org:8080/#/c/21598/2//COMMIT_MSG@19
PS2, Line 19: Updates that function to only wait for a successful connection 
when
Updates->Updated


http://gerrit.cloudera.org:8080/#/c/21598/2//COMMIT_MSG@21
PS2, Line 21: ,
extra comma


http://gerrit.cloudera.org:8080/#/c/21598/2//COMMIT_MSG@21
PS2, Line 21: Removes
Removes->Removed


http://gerrit.cloudera.org:8080/#/c/21598/2//COMMIT_MSG@24
PS2, Line 24: Fixes
Fixes->Fixed


http://gerrit.cloudera.org:8080/#/c/21598/2//COMMIT_MSG@28
PS2, Line 28: the
the->a



--
To view, visit http://gerrit.cloudera.org:8080/21598
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I24e029b6192a17773760cb44fd7a4f87b71c0aae
Gerrit-Change-Number: 21598
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Tue, 23 Jul 2024 20:45:41 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12906: Incorporate scan range information into the tuple cache key

2024-06-27 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21541 )

Change subject: IMPALA-12906: Incorporate scan range information into the tuple 
cache key
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21541/1/fe/src/main/java/org/apache/impala/planner/TupleCacheNode.java
File fe/src/main/java/org/apache/impala/planner/TupleCacheNode.java:

http://gerrit.cloudera.org:8080/#/c/21541/1/fe/src/main/java/org/apache/impala/planner/TupleCacheNode.java@125
PS1, Line 125: return output.toString();
Should we print input scan nodes here?



--
To view, visit http://gerrit.cloudera.org:8080/21541
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibe298fff0f644ce931a2aa934ebb98f69aab9d34
Gerrit-Change-Number: 21541
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 27 Jun 2024 18:48:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13088: Use RoaringBitmap instead of sorted vector of int64s

2024-06-26 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21557 )

Change subject: IMPALA-13088: Use RoaringBitmap instead of sorted vector of 
int64s
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21557/1/be/src/exec/iceberg-delete-builder.cc
File be/src/exec/iceberg-delete-builder.cc:

http://gerrit.cloudera.org:8080/#/c/21557/1/be/src/exec/iceberg-delete-builder.cc@260
PS1, Line 260: deletes.AddElements(positions);
CRoaring does not handle out-of-memory. We should maybe add some reserve check 
here before calling it's functions.


http://gerrit.cloudera.org:8080/#/c/21557/1/be/src/exec/iceberg-delete-builder.cc@303
PS1, Line 303:   RETURN_IF_ERROR(AddToDeletedRows(prev_file_path, 
pos_buffer));
Should sort pos_buffer here to guarantee performance of bitmap insert.



--
To view, visit http://gerrit.cloudera.org:8080/21557
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib769965d094149e99c43e0044914d9e76107
Gerrit-Change-Number: 21557
Gerrit-PatchSet: 1
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Wed, 26 Jun 2024 13:58:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12800: Implement hashCode everywhere

2024-06-25 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21550 )

Change subject: IMPALA-12800: Implement hashCode everywhere
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I129bff6fd0968be135e23e0b24e273b2ea384eca
Gerrit-Change-Number: 21550
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 25 Jun 2024 20:18:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12800: Implement hashCode everywhere

2024-06-25 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21550 )

Change subject: IMPALA-12800: Implement hashCode everywhere
..


Patch Set 6: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I129bff6fd0968be135e23e0b24e273b2ea384eca
Gerrit-Change-Number: 21550
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jason Fehr 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 25 Jun 2024 18:44:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12800: Implement hashCode everywhere

2024-06-25 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21550 )

Change subject: IMPALA-12800: Implement hashCode everywhere
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21550/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21550/1//COMMIT_MSG@10
PS1, Line 10: ScalarTypes could be equal, but have different hash codes. That 
caused
> I was looking to understand why hashCode is different for equal objects. Re
Stated another way, any class that overrides equals() must guarantee that 
a.equals(b) => (a.hasbCode() == b.hashCode()). It is desirable but not 
necessary to override hashCode() if a suitable hashCode() implementation has 
been inherited.



--
To view, visit http://gerrit.cloudera.org:8080/21550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I129bff6fd0968be135e23e0b24e273b2ea384eca
Gerrit-Change-Number: 21550
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 25 Jun 2024 17:36:35 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12800: Implement hashCode everywhere

2024-06-25 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21550 )

Change subject: IMPALA-12800: Implement hashCode everywhere
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21550/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21550/1//COMMIT_MSG@10
PS1, Line 10: ScalarTypes could be equal, but have different hash codes. That 
caused
> Done
I was looking to understand why hashCode is different for equal objects. 
Reading java docs, the reason is that the default implementation hashes 
instance memory addresses. Since   ColumnDef does not inherit from anything, 
these is no suitable default implementation to fall back to. We should add 
comments in Expr classes or wherever is calling outside of Expr to ensure that 
non-derived classes have suitable hashCode implementations.



--
To view, visit http://gerrit.cloudera.org:8080/21550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I129bff6fd0968be135e23e0b24e273b2ea384eca
Gerrit-Change-Number: 21550
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 25 Jun 2024 17:28:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13106: Support larger imported query profile sizes through compression

2024-06-25 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21463 )

Change subject: IMPALA-13106: Support larger imported query profile sizes 
through compression
..


Patch Set 6: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21463/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21463/4//COMMIT_MSG@37
PS4, Line 37: Added tests for the compression library methods utilized by
> Automated tests have been written for compression.
Done


http://gerrit.cloudera.org:8080/#/c/21463/4/www/queries.tmpl
File www/queries.tmpl:

http://gerrit.cloudera.org:8080/#/c/21463/4/www/queries.tmpl@341
PS4, Line 341: uploadProfile();
> Web workers operate in a single thread, the processing is sequential. If we
Ack



--
To view, visit http://gerrit.cloudera.org:8080/21463
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c4f31beb9cac89051460bf764b6d50c3933bd03
Gerrit-Change-Number: 21463
Gerrit-PatchSet: 6
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 25 Jun 2024 16:13:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12800: Implement hashCode everywhere

2024-06-24 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21550 )

Change subject: IMPALA-12800: Implement hashCode everywhere
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21550/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21550/1//COMMIT_MSG@10
PS1, Line 10: ScalarTypes could be equal, but have different hash codes. That 
caused
Please explain more how this manifested so there is a better record for future 
maintenance. It's not obvious why the base hashCode would yield different 
values.



--
To view, visit http://gerrit.cloudera.org:8080/21550
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I129bff6fd0968be135e23e0b24e273b2ea384eca
Gerrit-Change-Number: 21550
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Smith 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Mon, 24 Jun 2024 21:29:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13106: Support larger imported query profile sizes through compression

2024-06-21 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21463 )

Change subject: IMPALA-13106: Support larger imported query profile sizes 
through compression
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21463/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21463/4//COMMIT_MSG@37
PS4, Line 37: Added tests for the compression library methods utilized by
Please mention manual testing. What browsers is this tested on?


http://gerrit.cloudera.org:8080/#/c/21463/4/www/queries.tmpl
File www/queries.tmpl:

http://gerrit.cloudera.org:8080/#/c/21463/4/www/queries.tmpl@341
PS4, Line 341: uploadProfile();
Why is being done recursively vs in a loop? What is the behavior if there are 
multiple errors?



--
To view, visit http://gerrit.cloudera.org:8080/21463
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c4f31beb9cac89051460bf764b6d50c3933bd03
Gerrit-Change-Number: 21463
Gerrit-PatchSet: 4
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 21 Jun 2024 12:59:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13137: Add additional client fetch metrics columns to the queries page

2024-06-18 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21482 )

Change subject: IMPALA-13137: Add additional client fetch metrics columns to 
the queries page
..


Patch Set 8: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21482/8/www/queries.tmpl
File www/queries.tmpl:

http://gerrit.cloudera.org:8080/#/c/21482/8/www/queries.tmpl@158
PS8, Line 158: {{first_fetch}}{{client_fetch_duration}}
We should consolidate these formatting style tags in a subsequent patch.



--
To view, visit http://gerrit.cloudera.org:8080/21482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74a9393a7b38750de0c3f6230b6e5e048048c4b5
Gerrit-Change-Number: 21482
Gerrit-PatchSet: 8
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 18 Jun 2024 20:03:27 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13137: Add additional client fetch metrics columns to the queries page

2024-06-17 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21482 )

Change subject: IMPALA-13137: Add additional client fetch metrics columns to 
the queries page
..


Patch Set 6:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21482/6/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

http://gerrit.cloudera.org:8080/#/c/21482/6/be/src/service/impala-http-handler.cc@463
PS6, Line 463:   document->AddMember("tips_fetch_duration", "The time taken for 
the client to fetch all"
Total time spent returning rows to the client and other client-side processing.


http://gerrit.cloudera.org:8080/#/c/21482/6/be/src/service/query-state-record.h
File be/src/service/query-state-record.h:

http://gerrit.cloudera.org:8080/#/c/21482/6/be/src/service/query-state-record.h@83
PS6, Line 83:   /// The time taken for the client to fetch all rows.
Total time spent returning rows to the client and other client-side processing.


http://gerrit.cloudera.org:8080/#/c/21482/6/www/queries.tmpl
File www/queries.tmpl:

http://gerrit.cloudera.org:8080/#/c/21482/6/www/queries.tmpl@119
PS6, Line 119:   Fetch Duration
Client Fetch Duration



--
To view, visit http://gerrit.cloudera.org:8080/21482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74a9393a7b38750de0c3f6230b6e5e048048c4b5
Gerrit-Change-Number: 21482
Gerrit-PatchSet: 6
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 17 Jun 2024 16:13:18 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13137: Add additional client fetch metrics columns to the queries page

2024-06-14 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21482 )

Change subject: IMPALA-13137: Add additional client fetch metrics columns to 
the queries page
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21482/4/www/queries.tmpl
File www/queries.tmpl:

http://gerrit.cloudera.org:8080/#/c/21482/4/www/queries.tmpl@117
PS4, Line 117: First row fetched
 :   
 :   Client fetch 
wait time
> Different name between this table header and query profile is OK. I'd say g
"Fetch Time" would be misleading. This actually the time after the rows have 
been fetched from the server and can include transport and client processing or 
even the client doing nothing.



--
To view, visit http://gerrit.cloudera.org:8080/21482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74a9393a7b38750de0c3f6230b6e5e048048c4b5
Gerrit-Change-Number: 21482
Gerrit-PatchSet: 4
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 14 Jun 2024 17:14:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13137: Add additional client fetch metrics columns to the queries page

2024-06-11 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21482 )

Change subject: IMPALA-13137: Add additional client fetch metrics columns to 
the queries page
..


Patch Set 2:

Please add tests to tests/query_test/test_observability.py and/or 
tests/webserver/test_web_pages.py. Also incldue example output in the commit 
message as it's not clear what human readable means.


--
To view, visit http://gerrit.cloudera.org:8080/21482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I74a9393a7b38750de0c3f6230b6e5e048048c4b5
Gerrit-Change-Number: 21482
Gerrit-PatchSet: 2
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 11 Jun 2024 19:41:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13057: Incorporate tuple/slot information into tuple cache key

2024-05-30 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21398 )

Change subject: IMPALA-13057: Incorporate tuple/slot information into tuple 
cache key
..


Patch Set 6: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f5278e9dbb976cbebdc6a21a6e66bc90ce06c6c
Gerrit-Change-Number: 21398
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 30 May 2024 18:48:50 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13057: Incorporate tuple/slot information into tuple cache key

2024-05-30 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21398 )

Change subject: IMPALA-13057: Incorporate tuple/slot information into tuple 
cache key
..


Patch Set 6: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f5278e9dbb976cbebdc6a21a6e66bc90ce06c6c
Gerrit-Change-Number: 21398
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 30 May 2024 18:48:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13057: Incorporate tuple/slot information into tuple cache key

2024-05-09 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21398 )

Change subject: IMPALA-13057: Incorporate tuple/slot information into tuple 
cache key
..


Patch Set 4:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/analysis/Expr.java
File fe/src/main/java/org/apache/impala/analysis/Expr.java:

http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/analysis/Expr.java@868
PS4, Line 868: return treeToThrift(new ThriftSerializationCtx());
Why pass context vs null? Does this have any implications WRT being able to 
disable as much code as possible when caching is turned off or not applicable?


http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/common/ThriftSerializationCtx.java
File fe/src/main/java/org/apache/impala/common/ThriftSerializationCtx.java:

http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/common/ThriftSerializationCtx.java@72
PS4, Line 72:   public void registerTuple(TupleId id) {
Would it be cleaner to have this logic in Analyzer instead?


http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
File fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java:

http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java@1945
PS4, Line 1945: 
serialCtx.translateTupleId(entry.getKey().getId()).asInt(),
Consider moving this inside Expr.treesToThrift() and passing in entry.


http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/PlanNode.java@510
PS4, Line 510: msg.node_id = serialCtx.isTupleCache()? 0 : id_.asInt();
Move these assignments to the conditional branch below


http://gerrit.cloudera.org:8080/#/c/21398/4/fe/src/main/java/org/apache/impala/planner/PlanNode.java@528
PS4, Line 528:   
msg.addToRow_tuples(serialCtx.translateTupleId(tid).asInt());
Could local tids be employed early so that translation isn't needed here?



--
To view, visit http://gerrit.cloudera.org:8080/21398
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7f5278e9dbb976cbebdc6a21a6e66bc90ce06c6c
Gerrit-Change-Number: 21398
Gerrit-PatchSet: 4
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 09 May 2024 15:12:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13038: Support profile tab for imported query profiles

2024-05-09 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21400 )

Change subject: IMPALA-13038: Support profile tab for imported query profiles
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddcf2e285abbf42f97bde19014be076ccd6374bc
Gerrit-Change-Number: 21400
Gerrit-PatchSet: 5
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 09 May 2024 12:31:48 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-13038: Support profile tab for imported query profiles

2024-05-08 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21400 )

Change subject: IMPALA-13038: Support profile tab for imported query profiles
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21400/3/www/query_profile.tmpl
File www/query_profile.tmpl:

http://gerrit.cloudera.org:8080/#/c/21400/3/www/query_profile.tmpl@62
PS3, Line 62:   result = `${result}${indent}Offset: 
${eventSeq.offset}\n`;
> Check performance of these repeated appends. It may be better to accumulate
Is this approach actually faster? Concatenation is only efficient if the string 
doesn't reserve space. Please check javascript docs and measure performance.


http://gerrit.cloudera.org:8080/#/c/21400/4/www/scripts/util.js
File www/scripts/util.js:

http://gerrit.cloudera.org:8080/#/c/21400/4/www/scripts/util.js@105
PS4, Line 105:   return 
window.location.search.substring(10,window.location.search.indexOf("&"));
Should parse out using the query_id tag. These attributes can appear in any 
order in URLs.



--
To view, visit http://gerrit.cloudera.org:8080/21400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddcf2e285abbf42f97bde19014be076ccd6374bc
Gerrit-Change-Number: 21400
Gerrit-PatchSet: 4
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 08 May 2024 20:02:08 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13040: Add waiting mechanism in UpdateFilterFromRemote

2024-05-08 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21383 )

Change subject: IMPALA-13040: Add waiting mechanism in UpdateFilterFromRemote
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21383/3/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/21383/3/be/src/scheduling/scheduler.cc@59
PS3, Line 59: DEFINE_bool_hidden(sort_runtime_filter_aggregator_cadidates, 
false,
candidates


http://gerrit.cloudera.org:8080/#/c/21383/3/be/src/service/data-stream-service.cc
File be/src/service/data-stream-service.cc:

http://gerrit.cloudera.org:8080/#/c/21383/3/be/src/service/data-stream-service.cc@143
PS3, Line 143: sleep_duration_ms
> Isn't this too much, especially at the beginning? I would vote a short slee
Maybe even less than 20ms. Especially if these sleeps can stack with complex 
plan.


http://gerrit.cloudera.org:8080/#/c/21383/3/be/src/util/network-util.h
File be/src/util/network-util.h:

http://gerrit.cloudera.org:8080/#/c/21383/3/be/src/util/network-util.h@123
PS3, Line 123:   if (comp == 0) comp = 
lhs.uds_address().compare(rhs.uds_address());
Check has_uds_address since it is optional field.



--
To view, visit http://gerrit.cloudera.org:8080/21383
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I156d1f0c694b91ba34be70bc53ae9bacf924b3b9
Gerrit-Change-Number: 21383
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 08 May 2024 14:45:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-13038: Support profile tab for imported query profiles

2024-05-08 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21400 )

Change subject: IMPALA-13038: Support profile tab for imported query profiles
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21400/3/be/src/service/impala-http-handler.cc
File be/src/service/impala-http-handler.cc:

http://gerrit.cloudera.org:8080/#/c/21400/3/be/src/service/impala-http-handler.cc@307
PS3, Line 307:   Value query_id_val(PrintId(unique_id).c_str(), 
document->GetAllocator());
Move these after parse_status error check.


http://gerrit.cloudera.org:8080/#/c/21400/3/www/query_profile.tmpl
File www/query_profile.tmpl:

http://gerrit.cloudera.org:8080/#/c/21400/3/www/query_profile.tmpl@62
PS3, Line 62:   result += `${indent}Offset: ${eventSeq.offset}\n`;
Check performance of these repeated appends. It may be better to accumulate the 
sections into separate strings then append at the return.


http://gerrit.cloudera.org:8080/#/c/21400/3/www/query_profile.tmpl@113
PS3, Line 113: query.id = query.id.substring(10, query.id.indexOf("&"));
Is there a tag that can be checked here? Please add a comment of where this 
URL/string comes from and what the format is.



--
To view, visit http://gerrit.cloudera.org:8080/21400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iddcf2e285abbf42f97bde19014be076ccd6374bc
Gerrit-Change-Number: 21400
Gerrit-PatchSet: 3
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 08 May 2024 14:09:26 +
Gerrit-HasComments: Yes


[native-toolchain-CR] IMPALA-13020: Use 64-bit integer for Thrift max message size on C++

2024-04-30 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21343 )

Change subject: IMPALA-13020: Use 64-bit integer for Thrift max message size on 
C++
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21343/3/source/thrift/thrift-0.16.0-patches/0007-IMPALA-13020-Use-64-bit-integer-for-max-message-size.patch
File 
source/thrift/thrift-0.16.0-patches/0007-IMPALA-13020-Use-64-bit-integer-for-max-message-size.patch:

http://gerrit.cloudera.org:8080/#/c/21343/3/source/thrift/thrift-0.16.0-patches/0007-IMPALA-13020-Use-64-bit-integer-for-max-message-size.patch@24
PS3, Line 24:
Are these trailing spaces normal in patch files?



--
To view, visit http://gerrit.cloudera.org:8080/21343
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I94fdd7b07fcc8dca0639839b40a9eff815090835
Gerrit-Change-Number: 21343
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Tue, 30 Apr 2024 11:24:55 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk

2024-04-19 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21277 )

Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk
..


Patch Set 15: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923
Gerrit-Change-Number: 21277
Gerrit-PatchSet: 15
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 19 Apr 2024 19:15:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk

2024-04-19 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21277 )

Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk
..


Patch Set 13:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/21277/13/fe/src/main/java/org/apache/impala/planner/CostingSegment.java
File fe/src/main/java/org/apache/impala/planner/CostingSegment.java:

http://gerrit.cloudera.org:8080/#/c/21277/13/fe/src/main/java/org/apache/impala/planner/CostingSegment.java@102
PS13, Line 102: if (isOutputSegment()) {
Change to if (topnode == null)


http://gerrit.cloudera.org:8080/#/c/21277/13/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/21277/13/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1206
PS13, Line 1206: int maxScannerThreads = Integer.MAX_VALUE;
maxScannerThreads is set to int max then never updated. Probably can be removed.


http://gerrit.cloudera.org:8080/#/c/21277/13/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/21277/13/fe/src/main/java/org/apache/impala/service/Frontend.java@2315
PS13, Line 2315: cpuAskUnbounded = 
scaleDownCpuSublinear(cpuCountRootFactor,
We should report the original values somewhere before scaling.



--
To view, visit http://gerrit.cloudera.org:8080/21277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923
Gerrit-Change-Number: 21277
Gerrit-PatchSet: 13
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Fri, 19 Apr 2024 14:34:38 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots

2024-04-17 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21257 )

Change subject: IMPALA-12980: Translate CpuAsk into admission control slots
..


Patch Set 15: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/21257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2
Gerrit-Change-Number: 21257
Gerrit-PatchSet: 15
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 17 Apr 2024 22:18:28 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-04-16 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20850 )

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..


Patch Set 16:

Patch set 16 accidentally uploaded. Please do not review.


--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 16
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Tue, 16 Apr 2024 23:55:20 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12679: Improve test rows sent counters assert

2024-04-16 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21310 )

Change subject: IMPALA-12679: Improve test_rows_sent_counters assert
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21310/1/tests/query_test/test_fetch.py
File tests/query_test/test_fetch.py:

http://gerrit.cloudera.org:8080/#/c/21310/1/tests/query_test/test_fetch.py@27
PS1, Line 27: class TestFetch(ImpalaTestSuite):
> I'm surprised gerrit-code-review-checks didn't complain about spacing, I th
Done


http://gerrit.cloudera.org:8080/#/c/21310/1/tests/query_test/test_fetch.py@68
PS1, Line 68:   rpc_count = int(re.search("RPCCount: ([5-9])", 
runtime_profile).group(1))
> If rpc_count is not in the range 5-9, re.search will return None and this w
Fixed RE



--
To view, visit http://gerrit.cloudera.org:8080/21310
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic6b48cf4039028e749c914ee60b88f04833a0069
Gerrit-Change-Number: 21310
Gerrit-PatchSet: 1
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Tue, 16 Apr 2024 22:29:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-04-16 Thread Kurt Deschler (Code Review)
Hello Michael Smith, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20850

to look at the new patch set (#16).

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..

IMPALA-12533: Support row materialization outside of fetch lock

This patch supports an alternative result materialization path where
results are copied to a temporary row batch then materialized into thrift
outside of the fetch lock. This can improve fetch throughput when the
client is fetching concurrently from multiple result streams.

New metrics FetchLockWaitTimer and ResultFlushTimer have been added to
account for synchrounous time waiting for the fetch lock and cumulative
time materializing results outside of the fetch lock.

New query option delay_materialize_results_threshold has been added to
automatically enable delayed result materialization when the fetch lock
wait time exceeds the specified fraction of the materialization time
(default 0.2).

Once enabled, delayed materialization will be used for the remainder of
the rows fetched for the current query. It is expected that the default
threshold will never be reached with a single fetch stream. This
optimization is disabled when query result caching is active since the
cache stores materialized rows and is populated inside of the fetch lock.

Testing:
-Tests added to tests/query_test/test_observability.py
-Manual testing with Multi-stream Hive JDBC driver. This showed a 20%
 improvement in fetch time with a wide table and 5 streams. On the
 mini-cluster, performance is now limited by root sink exchange
 throughput. Production clusters should be able to achieve higher gains.

Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/llvm-codegen-cache-test.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/exchange-node.cc
M be/src/exec/sort-node.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/fragment-state.cc
M be/src/runtime/fragment-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
M be/src/util/runtime-profile-counters.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_observability.py
28 files changed, 450 insertions(+), 123 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/16
--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 16
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12679: Improve test rows sent counters assert

2024-04-16 Thread Kurt Deschler (Code Review)
Hello Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/21310

to look at the new patch set (#2).

Change subject: IMPALA-12679: Improve test_rows_sent_counters assert
..

IMPALA-12679: Improve test_rows_sent_counters assert

This patch changes the assert for failed test test_rows_sent_counters so
that the actual RPC count is displayed in the assert output. The root
cause of the failure will be addressed once sufficient data is collected
with the new output.

Testing:
  Ran test_rows_sent_counters with modified expected RPC count range to
simulate failure.

Change-Id: Ic6b48cf4039028e749c914ee60b88f04833a0069
---
M tests/query_test/test_fetch.py
1 file changed, 2 insertions(+), 1 deletion(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/21310/2
--
To view, visit http://gerrit.cloudera.org:8080/21310
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic6b48cf4039028e749c914ee60b88f04833a0069
Gerrit-Change-Number: 21310
Gerrit-PatchSet: 2
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12679: Improve test rows sent counters assert

2024-04-16 Thread Kurt Deschler (Code Review)
Kurt Deschler has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/21310


Change subject: IMPALA-12679: Improve test_rows_sent_counters assert
..

IMPALA-12679: Improve test_rows_sent_counters assert

This patch changes the assert for failed test test_rows_sent_counters so
that the actual RPC count is displayed in the assert output. The root
cause of the failure will be addressed once sufficient data is collected
with the new output.

Testing:
  Ran test_rows_sent_counters with modified expected RPC count range to
simulate failure.

Change-Id: Ic6b48cf4039028e749c914ee60b88f04833a0069
---
M tests/query_test/test_fetch.py
1 file changed, 2 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/10/21310/1
--
To view, visit http://gerrit.cloudera.org:8080/21310
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic6b48cf4039028e749c914ee60b88f04833a0069
Gerrit-Change-Number: 21310
Gerrit-PatchSet: 1
Gerrit-Owner: Kurt Deschler 


[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk

2024-04-16 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21277 )

Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk
..


Patch Set 7:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/21277/7/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/21277/7/be/src/scheduling/scheduler.cc@1210
PS7, Line 1210: in at
nit: change to "is with"


http://gerrit.cloudera.org:8080/#/c/21277/7/be/src/scheduling/scheduler.cc@1210
PS7, Line 1210: is
nit: remove "is" or change to "that is"


http://gerrit.cloudera.org:8080/#/c/21277/7/be/src/scheduling/scheduler.cc@1211
PS7, Line 1211: scheduler only assign
nit: "the schedule would only assign"


http://gerrit.cloudera.org:8080/#/c/21277/7/be/src/scheduling/scheduler.cc@1213
PS7, Line 1213: // dominant plan subtree ('dominant_instance_count' is 
0).
Comment/terminology is overall hard to follow here. Is this referring to any 
fragment that would include the coordinator? Unclear why rounding up is the 
neccesary action.


http://gerrit.cloudera.org:8080/#/c/21277/7/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/21277/7/common/thrift/Query.thrift@133
PS7, Line 133: Scheduler
nit: The scheduler



--
To view, visit http://gerrit.cloudera.org:8080/21277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923
Gerrit-Change-Number: 21277
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 16 Apr 2024 12:14:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk

2024-04-10 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21277 )

Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21277/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21277/1//COMMIT_MSG@16
PS1, Line 16: MAX_FRAGMENT_INSTANCES_PER_NODE options accordingly.
> The current bounding still make sense. It still enforced by to determine pa
The EG selection should only need the unbounded number for all EG. The last EG 
should simply ignore it. We should leave the plan sizing bounding in the 
parallel planner. I'm ok propagating the bounded number if it helps 
observability.



--
To view, visit http://gerrit.cloudera.org:8080/21277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923
Gerrit-Change-Number: 21277
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 10 Apr 2024 19:00:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots

2024-04-10 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21257 )

Change subject: IMPALA-12980: Translate CpuAsk into admission control slots
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21257/9/fe/src/main/java/org/apache/impala/planner/CoreCount.java
File fe/src/main/java/org/apache/impala/planner/CoreCount.java:

http://gerrit.cloudera.org:8080/#/c/21257/9/fe/src/main/java/org/apache/impala/planner/CoreCount.java@148
PS9, Line 148:   // If core1 and core2 has equal total(), return one that 
does not have coordinator
Should this be more consistent in removing coordinator core count? Seems it is 
counted above for equal case then not here for core 1 case.



--
To view, visit http://gerrit.cloudera.org:8080/21257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2
Gerrit-Change-Number: 21257
Gerrit-PatchSet: 9
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 10 Apr 2024 18:40:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12905: Disk-based tuple caching

2024-04-09 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21171 )

Change subject: IMPALA-12905: Disk-based tuple caching
..


Patch Set 10: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.h
File be/src/exec/tuple-cache-node.h:

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.h@27
PS6, Line 27: class TupleFileReader;
> If I remove this declaration, compilation of tuple-cache-node.cc fails.
Thanks. I understand why this works now after reading the doc mentioned above



--
To view, visit http://gerrit.cloudera.org:8080/21171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf
Gerrit-Change-Number: 21171
Gerrit-PatchSet: 10
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Wed, 10 Apr 2024 02:07:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12988: Calculate an unbounded version of CpuAsk

2024-04-09 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21277 )

Change subject: IMPALA-12988: Calculate an unbounded version of CpuAsk
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/21277/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21277/1//COMMIT_MSG@16
PS1, Line 16: MAX_FRAGMENT_INSTANCES_PER_NODE options accordingly.
If these bounds don't make sense to apply with multiple executor groups, then 
we should reconsider their naming or whether these bounds should be applied at 
all by the planner or instead moved to the admission control.


http://gerrit.cloudera.org:8080/#/c/21277/1/common/thrift/Query.thrift
File common/thrift/Query.thrift:

http://gerrit.cloudera.org:8080/#/c/21277/1/common/thrift/Query.thrift@1035
PS1, Line 1035:  no specifi
need to clarify what this means and why we have this concept. Is one core 
sufficient?


http://gerrit.cloudera.org:8080/#/c/21277/1/fe/src/main/java/org/apache/impala/planner/CostingSegment.java
File fe/src/main/java/org/apache/impala/planner/CostingSegment.java:

http://gerrit.cloudera.org:8080/#/c/21277/1/fe/src/main/java/org/apache/impala/planner/CostingSegment.java@80
PS1, Line 80:   private PlanFragment getFragment() {
Can you make this getTopNode() then call this by both createCoreCount() and 
create UnboundedCoreCount()?


http://gerrit.cloudera.org:8080/#/c/21277/1/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

http://gerrit.cloudera.org:8080/#/c/21277/1/fe/src/main/java/org/apache/impala/planner/Planner.java@590
PS1, Line 590: int coresRequiredUnbounded = boundedCores.total();
Move this assignment to the else branch for clarity. I'm not really clear why 
this is using boundedCores and not an unbounded Number..


http://gerrit.cloudera.org:8080/#/c/21277/1/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/21277/1/fe/src/main/java/org/apache/impala/service/Frontend.java@313
PS1, Line 313:   protected int cores_required_unbounded_ = -1;
Can this default to 1 instead?



--
To view, visit http://gerrit.cloudera.org:8080/21277
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5441e31088f90761062af35862be4ce09d116923
Gerrit-Change-Number: 21277
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Tue, 09 Apr 2024 22:20:00 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12905: Disk-based tuple caching

2024-04-09 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21171 )

Change subject: IMPALA-12905: Disk-based tuple caching
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.h
File be/src/exec/tuple-cache-node.h:

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.h@27
PS6, Line 27: class TupleFileReader;
> Ack
Did you try removing this declaration? I meant to imply that the definition 
already had to be included to compile.



--
To view, visit http://gerrit.cloudera.org:8080/21171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf
Gerrit-Change-Number: 21171
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Alexey Serbin 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Tue, 09 Apr 2024 21:02:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots

2024-04-09 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21257 )

Change subject: IMPALA-12980: Translate CpuAsk into admission control slots
..


Patch Set 7: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2
Gerrit-Change-Number: 21257
Gerrit-PatchSet: 7
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 09 Apr 2024 19:51:35 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12980: Translate CpuAsk into admission control slots

2024-04-09 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21257 )

Change subject: IMPALA-12980: Translate CpuAsk into admission control slots
..


Patch Set 5:

(10 comments)

http://gerrit.cloudera.org:8080/#/c/21257/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21257/5//COMMIT_MSG@11
PS5, Line 11: the number of processors and can be overridden with
processors per executor?


http://gerrit.cloudera.org:8080/#/c/21257/5//COMMIT_MSG@16
PS5, Line 16: instances of any fragment on that backend. This is simplistic, 
because
change "is simplistic" to "can lead to underestimation"


http://gerrit.cloudera.org:8080/#/c/21257/5//COMMIT_MSG@44
PS5, Line 44: AvgAdmissionSlotsPerExecutor
It's not clear how this is calculated. Is it the maximum expected parallelism 
from above?


http://gerrit.cloudera.org:8080/#/c/21257/5//COMMIT_MSG@45
PS5, Line 45: think
thinks


http://gerrit.cloudera.org:8080/#/c/21257/5//COMMIT_MSG@47
PS5, Line 47: AvgAdmissionSlotsPerExecutor, depending on what scheduled on that
is scheduled


http://gerrit.cloudera.org:8080/#/c/21257/5/be/src/scheduling/scheduler.cc
File be/src/scheduling/scheduler.cc:

http://gerrit.cloudera.org:8080/#/c/21257/5/be/src/scheduling/scheduler.cc@1184
PS5, Line 1184: for (auto& finstance : 
backend.second.exec_params->instance_params()) {
This is kind of hard to follow. There seems be assumptions that fragment 
indexes are ordered and not unique.


http://gerrit.cloudera.org:8080/#/c/21257/5/be/src/scheduling/scheduler.cc@1193
PS5, Line 1193:   be_max_instances = max(be_max_instances, 
curr_instance_count);
Can this be moved out of the loop?


http://gerrit.cloudera.org:8080/#/c/21257/5/fe/src/main/java/org/apache/impala/planner/CostingSegment.java
File fe/src/main/java/org/apache/impala/planner/CostingSegment.java:

http://gerrit.cloudera.org:8080/#/c/21257/5/fe/src/main/java/org/apache/impala/planner/CostingSegment.java@84
PS5, Line 84:   Preconditions.checkState(!nodes_.isEmpty());
Can this just get computed once at the top and passed down as a recursion arg?


http://gerrit.cloudera.org:8080/#/c/21257/5/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/21257/5/fe/src/main/java/org/apache/impala/service/Frontend.java@2338
PS5, Line 2338: derived
is derived


http://gerrit.cloudera.org:8080/#/c/21257/5/fe/src/main/java/org/apache/impala/service/Frontend.java@2399
PS5, Line 2399: cores_requirement
may want to assert cores_requirement > 0 and numExecutors > 0 here to detect 
problems



--
To view, visit http://gerrit.cloudera.org:8080/21257
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I338ca96555bfe8d07afce0320b3688a0861663f2
Gerrit-Change-Number: 21257
Gerrit-PatchSet: 5
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Tue, 09 Apr 2024 14:26:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12852: Make Kudu service start and stop independent

2024-04-01 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21090 )

Change subject: IMPALA-12852: Make Kudu service start and stop independent
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/21090
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9624aaa61353bb4520e879570e5688d5e3493201
Gerrit-Change-Number: 21090
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Reviewer: Yifan Zhang 
Gerrit-Comment-Date: Tue, 02 Apr 2024 01:05:12 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12905: Disk-based tuple caching

2024-04-01 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21171 )

Change subject: IMPALA-12905: Disk-based tuple caching
..


Patch Set 6:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.h
File be/src/exec/tuple-cache-node.h:

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.h@27
PS6, Line 27: class TupleFileReader;
Are these forward declarations needed? I though the unique_ptr members below 
need the classes declared?


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.cc
File be/src/exec/tuple-cache-node.cc:

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.cc@61
PS6, Line 61:   SCOPED_TIMER(runtime_profile_->total_time_counter());
runtime_profile() used above.


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-cache-node.cc@142
PS6, Line 142:   } else if (writer_->BytesWritten() > 
tuple_cache_mgr->MaxSize()) {
Should check the size and not write if it is over.


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.h
File be/src/exec/tuple-file-reader.h:

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.h@30
PS6, Line 30: class MemTracker;
Is this forward declaration needed?


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.cc
File be/src/exec/tuple-file-reader.cc:

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.cc@77
PS6, Line 77:   reader_.read(reinterpret_cast(&tuple_data_len), 
sizeof(tuple_data_len));
Check status after reads using if(!reader_) like below.


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-reader.cc@79
PS6, Line 79:   DCHECK_GE(tuple_data_len, 0);
These may not be initialized if read fails.


http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-writer.cc
File be/src/exec/tuple-file-writer.cc:

http://gerrit.cloudera.org:8080/#/c/21171/6/be/src/exec/tuple-file-writer.cc@89
PS6, Line 89:   if (!writer_) {
Is it sufficient to just check the stream status here or can a transient error 
leave the write file invalid?



-- 
To view, visit http://gerrit.cloudera.org:8080/21171
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I13a65c4c0559cad3559d5f714a074dd06e9cc9bf
Gerrit-Change-Number: 21171
Gerrit-PatchSet: 6
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Tue, 02 Apr 2024 01:00:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12856: Event processor should ignore processing partition with empty partition values

2024-03-26 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21143 )

Change subject: IMPALA-12856: Event processor should ignore processing 
partition with empty partition values
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21143/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21143/5//COMMIT_MSG@13
PS5, Line 13: partitions with empty values, EP should ignore such partitions 
instead
Can the HMS bug also return non-empty partitions that are incomplete (missing 
files)?



--
To view, visit http://gerrit.cloudera.org:8080/21143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id2469930ccd74948325f1723bd8b2bd6aad02d09
Gerrit-Change-Number: 21143
Gerrit-PatchSet: 5
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 26 Mar 2024 12:50:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12852: Make Kudu service start and stop independent

2024-03-26 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21090 )

Change subject: IMPALA-12852: Make Kudu service start and stop independent
..


Patch Set 2:

Please rename the new scripts to start-kudu and stop-kudu


--
To view, visit http://gerrit.cloudera.org:8080/21090
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9624aaa61353bb4520e879570e5688d5e3493201
Gerrit-Change-Number: 21090
Gerrit-PatchSet: 2
Gerrit-Owner: Yifan Zhang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Tue, 26 Mar 2024 12:13:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12856: Event processor should ignore processing partition with empty partition values

2024-03-25 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21143 )

Change subject: IMPALA-12856: Event processor should ignore processing 
partition with empty partition values
..


Patch Set 4:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21143/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/21143/4/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@2869
PS4, Line 2869:   continue;
> The possibility of hitting this issue twice continuely should be really sma
I agree with Quanlong here but the check should be in a separate loop above so 
reloads are not wasted if the empty partition is late in the list.



--
To view, visit http://gerrit.cloudera.org:8080/21143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id2469930ccd74948325f1723bd8b2bd6aad02d09
Gerrit-Change-Number: 21143
Gerrit-PatchSet: 4
Gerrit-Owner: Sai Hemanth Gantasala 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Mon, 25 Mar 2024 11:59:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-03-15 Thread Kurt Deschler (Code Review)
Hello Michael Smith, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20850

to look at the new patch set (#15).

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..

IMPALA-12533: Support row materialization outside of fetch lock

This patch supports an alternative result materialization path where
results are copied to a temporary row batch then materialized into thrift
outside of the fetch lock. This can improve fetch throughput when the
client is fetching concurrently from multiple result streams.

New metrics FetchLockWaitTimer and ResultFlushTimer have been added to
account for synchrounous time waiting for the fetch lock and cumulative
time materializing results outside of the fetch lock.

New query option delay_materialize_results_threshold has been added to
automatically enable delayed result materialization when the fetch lock
wait time exceeds the specified fraction of the materialization time
(default 0.2).

Once enabled, delayed materialization will be used for the remainder of
the rows fetched for the current query. It is expected that the default
threshold will never be reached with a single fetch stream. This
optimization is disabled when query result caching is active since the
cache stores materialized rows and is populated inside of the fetch lock.

Testing:
-Tests added to tests/query_test/test_observability.py
-Manual testing with Multi-stream Hive JDBC driver. This showed a 20%
 improvement in fetch time with a wide table and 5 streams. On the
 mini-cluster, performance is now limited by root sink exchange
 throughput. Production clusters should be able to achieve higher gains.

Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/llvm-codegen-cache-test.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/exchange-node.cc
M be/src/exec/sort-node.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/fragment-state.cc
M be/src/runtime/fragment-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
M be/src/util/runtime-profile-counters.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_observability.py
28 files changed, 450 insertions(+), 123 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/15
--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 15
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-03-15 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20850 )

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..


Patch Set 14:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20850/14/be/src/util/runtime-profile-counters.h
File be/src/util/runtime-profile-counters.h:

http://gerrit.cloudera.org:8080/#/c/20850/14/be/src/util/runtime-profile-counters.h@378
PS14, Line 378: /// Class for maintaining scoped reference counts with 
HighWaterMarkCounter
> line has trailing whitespace
Done



--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 14
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Fri, 15 Mar 2024 22:46:17 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-03-15 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20850 )

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..


Patch Set 14:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@23
PS11, Line 23: Once enabled, delayed materialization will be used for the 
remainder of
 : the rows fetched for the current query
> Just brain dump, I don't think that the patch needs to be changed:
The hive JDBC driver ramps up threads slowly after multiple batches are fetch 
which avoids this case.


http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@23
PS11, Line 23: Once enabled, delayed materialization will be used for the 
remainder of
 : the rows fetched for the current query
> Thanks for the info, I agree that 2 is solved by gradual ramping on client
The maximum number of streams is enforced by the driver. The highest useful 
stream count that we have observed so far is around 5. Being CPU is a 
compressible resource, there does not seem to be much of an issue absorbing 
this additional parallelism.


http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@24
PS11, Line 24:  It is expected that the default
 : threshold will never be reached with a single fetch stream.
> >There are no parallel fetches as they are serialized by the fetch lock
Added counter for number of streams as suggested.


http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/exec/buffered-plan-root-sink.cc@186
PS13, Line 186: Initialize
> This is only done in BufferedPlanRootSink, but not in class BlockingPlanRoo
Added support to BlockingPlanRootSync.


http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/exec/buffered-plan-root-sink.cc@187
PS13, Line 187: .
> nit: .
Done


http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/exec/buffered-plan-root-sink.cc@188
PS13, Line 188:   
RETURN_IF_ERROR(results->InitDelayedMaterialization(*row_desc_,
> If I understand correctly this will create a new RowBatchContext on each fe
It might be possible to attach a cache to the QueryDriver. I agree let's defer 
this.


http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/runtime/row-batch.h@295
PS13, Line 295:   /// Move 'num_rows' rows from 'src' to 'dest' within the 
batch. Useful for exec
> Should the description be updated too?
Done


http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/service/impala-hs2-server.cc@251
PS13, Line 251: fetch_results->__set_hasMoreRows(!query_handle->eos());
> If I understand correctly at this point the coordinator may have already hi
The query_handle reference keeps the QueryDriver/RunTimeState/MemPool around 
that the RowBachContext is using.


http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/service/query-result-set.cc
File be/src/service/query-result-set.cc:

http://gerrit.cloudera.org:8080/#/c/20850/13/be/src/service/query-result-set.cc@477
PS13, Line 477: c
> nit: extra space
Done



--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 14
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Fri, 15 Mar 2024 21:12:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-03-15 Thread Kurt Deschler (Code Review)
Hello Michael Smith, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20850

to look at the new patch set (#14).

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..

IMPALA-12533: Support row materialization outside of fetch lock

This patch supports an alternative result materialization path where
results are copied to a temporary row batch then materialized into thrift
outside of the fetch lock. This can improve fetch throughput when the
client is fetching concurrently from multiple result streams.

New metrics FetchLockWaitTimer and ResultFlushTimer have been added to
account for synchrounous time waiting for the fetch lock and cumulative
time materializing results outside of the fetch lock.

New query option delay_materialize_results_threshold has been added to
automatically enable delayed result materialization when the fetch lock
wait time exceeds the specified fraction of the materialization time
(default 0.2).

Once enabled, delayed materialization will be used for the remainder of
the rows fetched for the current query. It is expected that the default
threshold will never be reached with a single fetch stream. This
optimization is disabled when query result caching is active since the
cache stores materialized rows and is populated inside of the fetch lock.

Testing:
-Tests added to tests/query_test/test_observability.py
-Manual testing with Multi-stream Hive JDBC driver. This showed a 20%
 improvement in fetch time with a wide table and 5 streams. On the
 mini-cluster, performance is now limited by root sink exchange
 throughput. Production clusters should be able to achieve higher gains.

Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/llvm-codegen-cache-test.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/blocking-plan-root-sink.cc
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/exchange-node.cc
M be/src/exec/sort-node.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/fragment-state.cc
M be/src/runtime/fragment-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
M be/src/util/runtime-profile-counters.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_observability.py
28 files changed, 450 insertions(+), 123 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/14
--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 14
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12818: Intermediate Result Caching plan node framework

2024-03-14 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21035 )

Change subject: IMPALA-12818: Intermediate Result Caching plan node framework
..


Patch Set 5: Code-Review+2

(4 comments)

http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc
File be/src/exec/tuple-cache-node.cc:

http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc@69
PS3, Line 69:   // Save the number of rows in case GetNext() is called with a 
non-empty batch,
> Changed this comment to make it clearer. I was trying to say that TupleCach
Done


http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java@499
PS3, Line 499:*/
> There will be multiple patches that revisit this function as we do more mas
Done


http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java
File fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java:

http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@150
PS3, Line 150:   StringBuilder errorLog = new StringBuilder();
> Changed this to verify the hash trace in the same way. I also changed the c
Done


http://gerrit.cloudera.org:8080/#/c/21035/3/testdata/workloads/functional-query/queries/QueryTest/explain-level1.test
File testdata/workloads/functional-query/queries/QueryTest/explain-level1.test:

http://gerrit.cloudera.org:8080/#/c/21035/3/testdata/workloads/functional-query/queries/QueryTest/explain-level1.test@63
PS3, Line 63: row_regex:.*\[TPlanNode\(.*\]
> This particular test file is checking what the explain output contains at d
Done



--
To view, visit http://gerrit.cloudera.org:8080/21035
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1f36a87dcce6efd5d1e1f0bc04009bf009b1961
Gerrit-Change-Number: 21035
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Thu, 14 Mar 2024 18:58:58 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12818: Intermediate Result Caching plan node framework

2024-03-01 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21035 )

Change subject: IMPALA-12818: Intermediate Result Caching plan node framework
..


Patch Set 3:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/21035/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/21035/3//COMMIT_MSG@24
PS3, Line 24: eligible location. Eligibility is currently restricted to 
immediately
It's a little awkward that placement is restricted in this commit since that 
likely makes untested codepaths.


http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc
File be/src/exec/tuple-cache-node.cc:

http://gerrit.cloudera.org:8080/#/c/21035/3/be/src/exec/tuple-cache-node.cc@69
PS3, Line 69:   // Note: TupleCacheNode does not support limits, so this does 
not need logic to
Limits can be supported for deterministic results. Maybe change to a TODO.


http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java
File fe/src/main/java/org/apache/impala/planner/PlanNode.java:

http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/main/java/org/apache/impala/planner/PlanNode.java@499
PS3, Line 499:   private void initThrift(TPlanNode msg, TupleCacheInfo 
tupleCacheInfo) {
Maybe cleaner to split this function so the subset can be called for IRC.


http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java
File fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java:

http://gerrit.cloudera.org:8080/#/c/21035/3/fe/src/test/java/org/apache/impala/planner/TupleCacheTest.java@150
PS3, Line 150: assertEquals(intersection.size(), 0);
Consider verifying the hash trace here also.


http://gerrit.cloudera.org:8080/#/c/21035/3/testdata/workloads/functional-query/queries/QueryTest/explain-level1.test
File testdata/workloads/functional-query/queries/QueryTest/explain-level1.test:

http://gerrit.cloudera.org:8080/#/c/21035/3/testdata/workloads/functional-query/queries/QueryTest/explain-level1.test@63
PS3, Line 63: row_regex:.*\[TPlanNode\(.*\]
Maybe better to verify the actual cache key and at least the first and last 
trace lines so we can easily now if something fundamental changes.



--
To view, visit http://gerrit.cloudera.org:8080/21035
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia1f36a87dcce6efd5d1e1f0bc04009bf009b1961
Gerrit-Change-Number: 21035
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Yida Wu 
Gerrit-Comment-Date: Fri, 01 Mar 2024 16:55:45 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12827: Fix failures in processing AbortTxnEvent due to aborted write id is cleaned up

2024-02-28 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21071 )

Change subject: IMPALA-12827: Fix failures in processing AbortTxnEvent due to 
aborted write id is cleaned up
..


Patch Set 5:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/21071/5/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/21071/5/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@842
PS5, Line 842:   LOG.info("EventId: {} EventType: COMMIT_TXN transaction 
id: {}", getEventId(),
> These are very helpful in connecting the COMMIT/ABORT_TXN events with the t
Thanks for the clarification.


http://gerrit.cloudera.org:8080/#/c/21071/5/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@842
PS5, Line 842:   LOG.info("EventId: {} EventType: COMMIT_TXN transaction 
id: {}", getEventId(),
> Why are these (and other) messages INFO level now? This may create a lot of
Done



--
To view, visit http://gerrit.cloudera.org:8080/21071
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93b6f684d6e4b94961d804a0c022029249873681
Gerrit-Change-Number: 21071
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Thu, 29 Feb 2024 00:12:26 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12827: Fix failures in processing AbortTxnEvent due to aborted write id is cleaned up

2024-02-27 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/21071 )

Change subject: IMPALA-12827: Fix failures in processing AbortTxnEvent due to 
aborted write id is cleaned up
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/21071/5/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/21071/5/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@842
PS5, Line 842:   LOG.info("EventId: {} EventType: COMMIT_TXN transaction 
id: {}", getEventId(),
Why are these (and other) messages INFO level now? This may create a lot of 
chatter in the logs.



--
To view, visit http://gerrit.cloudera.org:8080/21071
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I93b6f684d6e4b94961d804a0c022029249873681
Gerrit-Change-Number: 21071
Gerrit-PatchSet: 5
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Sai Hemanth Gantasala 
Gerrit-Comment-Date: Tue, 27 Feb 2024 15:35:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12433: Share buffers among channels in KrpcDataStreamSender

2024-02-14 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20719 )

Change subject: IMPALA-12433: Share buffers among channels in 
KrpcDataStreamSender
..


Patch Set 15: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20719
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64854a350a9dae8bf3af11c871882ea4750e60b3
Gerrit-Change-Number: 20719
Gerrit-PatchSet: 15
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Zihao Ye 
Gerrit-Comment-Date: Wed, 14 Feb 2024 16:12:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-02-05 Thread Kurt Deschler (Code Review)
Hello Michael Smith, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20850

to look at the new patch set (#13).

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..

IMPALA-12533: Support row materialization outside of fetch lock

This patch supports an alternative result materialization path where
results are copied to a temporary row batch then materialized into thrift
outside of the fetch lock. This can improve fetch throughput when the
client is fetching concurrently from multiple result streams.

New metrics FetchLockWaitTimer and ResultFlushTimer have been added to
account for synchrounous time waiting for the fetch lock and cumulative
time materializing results outside of the fetch lock.

New query option delay_materialize_results_threshold has been added to
automatically enable delayed result materialization when the fetch lock
wait time exceeds the specified fraction of the materialization time
(default 0.2).

Once enabled, delayed materialization will be used for the remainder of
the rows fetched for the current query. It is expected that the default
threshold will never be reached with a single fetch stream. This
optimization is disabled when query result caching is active since the
cache stores materialized rows and is populated inside of the fetch lock.

Testing:
-Tests added to tests/query_test/test_observability.py
-Manual testing with Multi-stream Hive JDBC driver. This showed a 20%
 improvement in fetch time with a wide table and 5 streams. On the
 mini-cluster, performance is now limited by root sink exchange
 throughput. Production clusters should be able to achieve higher gains.

Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/llvm-codegen-cache-test.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/exchange-node.cc
M be/src/exec/sort-node.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/fragment-state.cc
M be/src/runtime/fragment-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_observability.py
26 files changed, 412 insertions(+), 120 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/13
--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 13
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-02-05 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20850 )

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..


Patch Set 11:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@21
PS11, Line 21: (default 0.2).
> As there is only minimal testing in the patch, it may better to disable thi
This is effectively inactive without setting options in the Hive JDBC driver or 
setting the threshold to 0, and that potential performance regressions are 
minimal (see comments below). I would rather leave this with the threshold set 
as-is so that it is not required to set additional options from JDBC to benefit 
from this enhancement.


http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@23
PS11, Line 23: Once enabled, delayed materialization will be used for the 
remainder of
 : the rows fetched for the current query
> Just brain dump, I don't think that the patch needs to be changed:
The Hive JDBC driver ramps up threads slowly as row batches are fetched. That 
should avoid most of these cases.


http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@24
PS11, Line 24:  It is expected that the default
 : threshold will never be reached with a single fetch stream.
> It seems safer to guarantee this by detecting parallel fetches and only ena
There are no parallel fetches as they are serialized by the fetch lock. Using 
the fetch lock acquisition latency is superior to counting how many streams are 
referencing the statement since we can avoid delayed materialization when the 
benefits would be negligible. The current threshold (20%) will typically only 
delay when there are more than 2 streams.
I also measured performance with narrow and wide table and the performance 
difference was very negligible, <5% for degenerate wide table cases and <1% for 
narrow tables.


http://gerrit.cloudera.org:8080/#/c/20850/11//COMMIT_MSG@31
PS11, Line 31: -Manual testing with Multi-stream Hive JDBC driver. This showed 
a 20%
> A follow up Jira would be nice about creating some simple multi thread clie
Possibly once the Hive driver changes are available in a release. I did test 
manually with asan and tsan and the multi-stream driver.


http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/runtime/row-batch.h@329
PS11, Line 329: ror
> typo
Done


http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/impala-hs2-server.cc@254
PS11, Line 254: RETURN_IF_ERROR(result_set->Flush());
  : result_flush_timer.Stop();
  : 
query_handle->AddResultFlushTime(result_flush_timer.ElapsedTime());
> the timer is not stopped in case line 254 returns an error
This is not necessary since we would not want to report flush time in the case 
of an error.


http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/query-result-set.cc
File be/src/service/query-result-set.cc:

http://gerrit.cloudera.org:8080/#/c/20850/11/be/src/service/query-result-set.cc@56
PS11, Line 56: class RowBatchContext {
> Can you add some comment about the role of this class?
Done



--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 11
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Mon, 05 Feb 2024 19:23:25 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-02-05 Thread Kurt Deschler (Code Review)
Hello Michael Smith, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20850

to look at the new patch set (#12).

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..

IMPALA-12533: Support row materialization outside of fetch lock

This patch supports an alternative result materialization path where
results are copied to a temporary row batch then materialized into thrift
outside of the fetch lock. This can improve fetch throughput when the
client is fetching concurrently from multiple result streams.

New metrics FetchLockWaitTimer and ResultFlushTimer have been added to
account for synchrounous time waiting for the fetch lock and cumulative
time materializing results outside of the fetch lock.

New query option delay_materialize_results_threshold has been added to
automatically enable delayed result materialization when the fetch lock
wait time exceeds the specified fraction of the materialization time
(default 0.2).

Once enabled, delayed materialization will be used for the remainder of
the rows fetched for the current query. It is expected that the default
threshold will never be reached with a single fetch stream. This
optimization is disabled when query result caching is active since the
cache stores materialized rows and is populated inside of the fetch lock.

Testing:
-Tests added to tests/query_test/test_observability.py
-Manual testing with Multi-stream Hive JDBC driver. This showed a 20%
 improvement in fetch time with a wide table and 5 streams. On the
 mini-cluster, performance is now limited by root sink exchange
 throughput. Production clusters should be able to achieve higher gains.

Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/llvm-codegen-cache-test.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/exchange-node.cc
M be/src/exec/sort-node.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/fragment-state.cc
M be/src/runtime/fragment-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_observability.py
26 files changed, 413 insertions(+), 120 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/12
--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 12
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-01-30 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20850 )

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..


Patch Set 11:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/codegen/llvm-codegen-test.cc
File be/src/codegen/llvm-codegen-test.cc:

http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/codegen/llvm-codegen-test.cc@459
PS10, Line 459:   llvm::Value* llvm_len1 = 
codegen->GetI32Constant(strlen(data1));
  :   llvm::Value* llvm_len2 = codegen->GetI32Constant(strlen(data
> Nit: The shared_ptr should be freed automatically at the end of this functi
Done


http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/codegen/llvm-codegen-test.cc@525
PS10, Line 525:   // state.
> Nit: The shared_ptr will get reset at the end of the function, so the "code
Done


http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/codegen/llvm-codegen-test.cc@553
PS10, Line 553:   llvm::Function* complete_fn = 
complete_prototype.GeneratePrototype(&builder, nullpt
> Same thing as above.
Done


http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/codegen/llvm-codegen.h
File be/src/codegen/llvm-codegen.h:

http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/codegen/llvm-codegen.h@666
PS10, Line 666:   MemTracker* parent_mem_tracker, const std::string& file,
  :   const std::string& id, std::unique_ptr* 
codegen);
  :
> Nit: I think this might be the last use of boost::scoped_ptr in this file.
Changed to unique_ptr. shared_ptr uses atomic primitives and is more expensive 
at runtime.


http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/exprs/expr-codegen-test.cc
File be/src/exprs/expr-codegen-test.cc:

http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/exprs/expr-codegen-test.cc@360
PS10, Line 360: }
> Nit: Same comment as elsewhere, this reset() call is not strictly necessary
Done


http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/runtime/row-batch.h
File be/src/runtime/row-batch.h:

http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/runtime/row-batch.h@327
PS10, Line 327:   /// Deep copy num_rows from this row batch into dst stating 
at start_idx, using memory
  :   /// allocated from dst's tuple_data_pool_.
  :   /// This variant supports copy of a subset of the rows and 
non-empty dst ror batch
  :   /// TODO: the current implementation of DeepCopyRows can 
produce an oversized
  :   /// row batch if there are duplicate tuples in this row ba
> Two thoughts:
Done


http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/20850/10/be/src/runtime/row-batch.cc@484
PS10, Line 484: void RowBatch::DeepCopyRows(RowBatch* dst, int start_idx, int 
num_row
> One other DCHECK we can do is to assert that the source and destination are
Done



--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 11
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Tue, 30 Jan 2024 19:56:05 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-01-30 Thread Kurt Deschler (Code Review)
Hello Michael Smith, Joe McDonnell, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20850

to look at the new patch set (#11).

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..

IMPALA-12533: Support row materialization outside of fetch lock

This patch supports an alternative result materialization path where
results are copied to a temporary row batch then materialized into thrift
outside of the fetch lock. This can improve fetch throughput when the
client is fetching concurrently from multiple result streams.

New metrics FetchLockWaitTimer and ResultFlushTimer have been added to
account for synchrounous time waiting for the fetch lock and cumulative
time materializing results outside of the fetch lock.

New query option delay_materialize_results_threshold has been added to
automatically enable delayed result materialization when the fetch lock
wait time exceeds the specified fraction of the materialization time
(default 0.2).

Once enabled, delayed materialization will be used for the remainder of
the rows fetched for the current query. It is expected that the default
threshold will never be reached with a single fetch stream. This
optimization is disabled when query result caching is active since the
cache stores materialized rows and is populated inside of the fetch lock.

Testing:
-Tests added to tests/query_test/test_observability.py
-Manual testing with Multi-stream Hive JDBC driver. This showed a 20%
 improvement in fetch time with a wide table and 5 streams. On the
 mini-cluster, performance is now limited by root sink exchange
 throughput. Production clusters should be able to achieve higher gains.

Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/llvm-codegen-cache-test.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exec/exchange-node.cc
M be/src/exec/sort-node.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/fragment-state.cc
M be/src/runtime/fragment-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_observability.py
26 files changed, 411 insertions(+), 120 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/11
--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 11
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-01-26 Thread Kurt Deschler (Code Review)
Hello Michael Smith, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20850

to look at the new patch set (#10).

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..

IMPALA-12533: Support row materialization outside of fetch lock

This patch supports an alternative result materialization path where
results are copied to a temporary row batch then materialized into thrift
outside of the fetch lock. This can improve fetch throughput when the
client is fetching concurrently from multiple result streams.

New metrics FetchLockWaitTimer and ResultFlushTimer have been added to
account for synchrounous time waiting for the fetch lock and cumulative
time materializing results outside of the fetch lock.

New query option delay_materialize_results_threshold has been added to
automatically enable delayed result materialization when the fetch lock
wait time exceeds the specified fraction of the materialization time
(default 0.2).

Once enabled, delayed materialization will be used for the remainder of
the rows fetched for the current query. It is expected that the default
threshold will never be reached with a single fetch stream. This
optimization is disabled when query result caching is active since the
cache stores materialized rows and is populated inside of the fetch lock.

Testing:
-Tests added to tests/query_test/test_observability.py
-Manual testing with Multi-stream Hive JDBC driver. This showed a 20%
 improvement in fetch time with a wide table and 5 streams. On the
 mini-cluster, performance is now limited by root sink exchange
 throughput. Production clusters should be able to achieve higher gains.

Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
---
M be/src/benchmarks/hash-benchmark.cc
M be/src/codegen/llvm-codegen-cache-test.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/fragment-state.cc
M be/src/runtime/fragment-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_observability.py
24 files changed, 399 insertions(+), 98 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/10
--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 10
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-01-26 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20850 )

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..


Patch Set 7:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/codegen/llvm-codegen.cc
File be/src/codegen/llvm-codegen.cc:

http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/codegen/llvm-codegen.cc@263
PS7, Line 263:   return status;
> Should this also call codegen->reset()?
Done


http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/runtime/coordinator.cc@1050
PS7, Line 1050: coord_instance_->GetCodegenRef(results->GetCodegenPtr());
> Oh, I think I understand this now. It's assigning the codegen object held b
Changed as suggested.


http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/runtime/fragment-state.h
File be/src/runtime/fragment-state.h:

http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/runtime/fragment-state.h@58
PS7, Line 58:   void GetCodegenRef(std::shared_ptr& codegen);
> nit: this makes more sense to me as SetCodegenRef. And most idiomatic versi
Changed to SetCodegenRef. I want to preserve the existing references.


http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-options.h
File be/src/service/query-options.h:

http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-options.h@53
PS1, Line 53:   TImpalaQueryOptions::DELAY_MATERIALIZE_RESULTS_THRESHOLD + 
1); \
> line too long (108 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/service/query-result-set.h
File be/src/service/query-result-set.h:

http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/service/query-result-set.h@98
PS7, Line 98:   std::shared_ptr& GetCodegenPtr() { return codegen; 
}
> Safer to return a const&. Don't want a caller to be able to reset it to del
This is set externally and needs to be non-const.


http://gerrit.cloudera.org:8080/#/c/20850/7/be/src/service/query-result-set.h@109
PS7, Line 109:   std::shared_ptr codegen;
> I don't see this set anywhere.
It is set externally as a reference.


http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc
File be/src/service/query-result-set.cc:

http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@190
PS1, Line 190:   /// Add a row from a TResultRow
> Functions and variables have blank lines around them, and comments describi
Done


http://gerrit.cloudera.org:8080/#/c/20850/3/be/src/service/query-result-set.cc
File be/src/service/query-result-set.cc:

http://gerrit.cloudera.org:8080/#/c/20850/3/be/src/service/query-result-set.cc@168
PS3, Line 168:
> This should be moved to the 'private' block below.
Done


http://gerrit.cloudera.org:8080/#/c/20850/5/be/src/service/query-result-set.cc
File be/src/service/query-result-set.cc:

http://gerrit.cloudera.org:8080/#/c/20850/5/be/src/service/query-result-set.cc@107
PS5, Line 107:   /// Handle to shared runtime state. This is reentrant
> nit: Trailing slash.
Done



--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 7
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Sat, 27 Jan 2024 01:29:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-01-26 Thread Kurt Deschler (Code Review)
Hello Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20850

to look at the new patch set (#9).

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..

IMPALA-12533: Support row materialization outside of fetch lock

This patch supports an alternative result materialization path where
results are copied to a temporary row batch then materialized into thrift
outside of the fetch lock. This can improve fetch throughput when the
client is fetching concurrently from multiple result streams.

New metrics FetchLockWaitTimer and ResultFlushTimer have been added to
account for synchrounous time waiting for the fetch lock and cumulative
time materializing results outside of the fetch lock.

New query option delay_materialize_results_threshold has been added to
automatically enable delayed result materialization when the fetch lock
wait time exceeds the specified fraction of the materialization time
(default 0.2).

Once enabled, delayed materialization will be used for the remainder of
the rows fetched for the current query. It is expected that the default
threshold will never be reached with a single fetch stream. This
optimization is disabled when query result caching is active since the
cache stores materialized rows and is populated inside of the fetch lock.

Testing:
-Tests added to tests/query_test/test_observability.py
-Manual testing with Multi-stream Hive JDBC driver. This showed a 20%
 improvement in fetch time with a wide table and 5 streams. On the
 mini-cluster, performance is now limited by root sink exchange
 throughput. Production clusters should be able to achieve higher gains.

Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
---
M be/src/codegen/llvm-codegen-cache-test.cc
M be/src/codegen/llvm-codegen-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/fragment-state.cc
M be/src/runtime/fragment-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_observability.py
23 files changed, 397 insertions(+), 96 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/9
--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 9
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-01-26 Thread Kurt Deschler (Code Review)
Hello Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20850

to look at the new patch set (#8).

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..

IMPALA-12533: Support row materialization outside of fetch lock

This patch supports an alternative result materialization path where
results are copied to a temporary row batch then materialized into thrift
outside of the fetch lock. This can improve fetch throughput when the
client is fetching concurrently from multiple result streams.

New metrics FetchLockWaitTimer and ResultFlushTimer have been added to
account for synchrounous time waiting for the fetch lock and cumulative
time materializing results outside of the fetch lock.

New query option delay_materialize_results_threshold has been added to
automatically enable delayed result materialization when the fetch lock
wait time exceeds the specified fraction of the materialization time
(default 0.2).

Once enabled, delayed materialization will be used for the remainder of
the rows fetched for the current query. It is expected that the default
threshold will never be reached with a single fetch stream. This
optimization is disabled when query result caching is active since the
cache stores materialized rows and is populated inside of the fetch lock.

Testing:
-Tests added to tests/query_test/test_observability.py
-Manual testing with Multi-stream Hive JDBC driver. This showed a 20%
 improvement in fetch time with a wide table and 5 streams. On the
 mini-cluster, performance is now limited by root sink exchange
 throughput. Production clusters should be able to achieve higher gains.

Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
---
M be/src/codegen/llvm-codegen-cache-test.cc
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/exprs/expr-codegen-test.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/fragment-state.cc
M be/src/runtime/fragment-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_observability.py
22 files changed, 380 insertions(+), 79 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/8
--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 8
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-01-26 Thread Kurt Deschler (Code Review)
Hello Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20850

to look at the new patch set (#7).

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..

IMPALA-12533: Support row materialization outside of fetch lock

This patch supports an alternative result materialization path where
results are copied to a temporary row batch then materialized into thrift
outside of the fetch lock. This can improve fetch throughput when the
client is fetching concurrently from multiple result streams.

New metrics FetchLockWaitTimer and ResultFlushTimer have been added to
account for synchrounous time waiting for the fetch lock and cumulative
time materializing results outside of the fetch lock.

New query option delay_materialize_results_threshold has been added to
automatically enable delayed result materialization when the fetch lock
wait time exceeds the specified fraction of the materialization time
(default 0.2).

Once enabled, delayed materialization will be used for the remainder of
the rows fetched for the current query. It is expected that the default
threshold will never be reached with a single fetch stream. This
optimization is disabled when query result caching is active since the
cache stores materialized rows and is populated inside of the fetch lock.

Testing:
-Tests added to tests/query_test/test_observability.py
-Manual testing with Multi-stream Hive JDBC driver. This showed a 20%
 improvement in fetch time with a wide table and 5 streams. On the
 mini-cluster, performance is now limited by root sink exchange
 throughput. Production clusters should be able to achieve higher gains.

Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
---
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/fragment-state.cc
M be/src/runtime/fragment-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_observability.py
20 files changed, 359 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/7
--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 7
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-01-26 Thread Kurt Deschler (Code Review)
Hello Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20850

to look at the new patch set (#6).

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..

IMPALA-12533: Support row materialization outside of fetch lock

This patch supports an alternative result materialization path where
results are copied to a temporary row batch then materialized into thrift
outside of the fetch lock. This can improve fetch throughput when the
client is fetching concurrently from multiple result streams.

New metrics FetchLockWaitTimer and ResultFlushTimer have been added to
account for synchrounous time waiting for the fetch lock and cumulative
time materializing results outside of the fetch lock.

New query option delay_materialize_results_threshold has been added to
automatically enable delayed result materialization when the fetch lock
wait time exceeds the specified fraction of the materialization time
(default 0.2).

Once enabled, delayed materialization will be used for the remainder of
the rows fetched for the current query. It is expected that the default
threshold will never be reached with a single fetch stream. This
optimization is disabled when query result caching is active since the
cache stores materialized rows and is populated inside of the fetch lock.

Testing:
-Tests added to tests/query_test/test_observability.py
-Manual testing with Multi-stream Hive JDBC driver. This showed a 20%
 improvement in fetch time with a wide table and 5 streams. On the
 mini-cluster, performance is now limited by root sink exchange
 throughput. Production clusters should be able to achieve higher gains.

Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
---
M be/src/codegen/llvm-codegen.cc
M be/src/codegen/llvm-codegen.h
M be/src/exec/buffered-plan-root-sink.cc
M be/src/runtime/coordinator.cc
M be/src/runtime/fragment-instance-state.cc
M be/src/runtime/fragment-instance-state.h
M be/src/runtime/fragment-state.cc
M be/src/runtime/fragment-state.h
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_observability.py
20 files changed, 358 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/6
--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 6
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-01-10 Thread Kurt Deschler (Code Review)
Hello Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20850

to look at the new patch set (#5).

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..

IMPALA-12533: Support row materialization outside of fetch lock

This patch supports an alternative result materialization path where
results are copied to a temporary row batch then materialized into thrift
outside of the fetch lock. This can improve fetch throughput when the
client is fetching concurrently from multiple result streams.

New metrics FetchLockWaitTimer and ResultFlushTimer have been added to
account for synchrounous time waiting for the fetch lock and cumulative
time materializing results outside of the fetch lock.

New query option delay_materialize_results_threshold has been added to
automatically enable delayed result materialization when the fetch lock
wait time exceeds the specified fraction of the materialization time
(default 0.2).

Once enabled, delayed materialization will be used for the remainder of
the rows fetched for the current query. It is expected that the default
threshold will never be reached with a single fetch stream. This
optimization is disabled when query result caching is active since the
cache stores materialized rows and is populated inside of the fetch lock.

Testing:
-Tests added to tests/query_test/test_observability.py
-Manual testing with Multi-stream Hive JDBC driver. This showed a 20%
 improvement in fetch time with a wide table and 5 streams. On the
 mini-cluster, performance is now limited by root sink exchange
 throughput. Production clusters should be able to achieve higher gains.

Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
---
M be/src/exec/buffered-plan-root-sink.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_observability.py
13 files changed, 324 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/5
--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 5
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-01-10 Thread Kurt Deschler (Code Review)
Hello Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20850

to look at the new patch set (#4).

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..

IMPALA-12533: Support row materialization outside of fetch lock

This patch supports an alternative result materialization path where
results are copied to a temporary row batch then materialized into thrift
outside of the fetch lock. This can improve fetch throughput when the
client is fetching concurrently from multiple result streams.

New metrics FetchLockWaitTimer and ResultFlushTimer have been added to
account for synchrounous time waiting for the fetch lock and cumulative
time materializing results outside of the fetch lock.

New query option delay_materialize_results_threshold has been added to
automatically enable delayed result materialization when the fetch lock
wait time exceeds the specified fraction of the materialization time
(default 0.2).

Once enabled, delayed materialization will be used for the remainder of
the rows fetched for the current query. It is expected that the default
threshold will never be reached with a single fetch stream. This
optimization is disabled when query result caching is active since the
cache stores materialized rows and is populated inside of the fetch lock.

Testing:
-Tests added to tests/query_test/test_observability.py
-Manual testing with Multi-stream Hive JDBC driver. This showed a 20%
 improvement in fetch time with a wide table and 5 streams. On the
 mini-cluster, performance is now limited by root sink exchange
 throughput. Production clusters should be able to achieve higher gains.

Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
---
M be/src/exec/buffered-plan-root-sink.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_observability.py
13 files changed, 324 insertions(+), 40 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/4
--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 4
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12688: Support JSON profile imports for visualizing query timeline in webUI

2024-01-10 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20867 )

Change subject: IMPALA-12688: Support JSON profile imports for visualizing 
query timeline in webUI
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20867/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20867/1//COMMIT_MSG@9
PS1, Line 9: A button has been added to import JSON query profiles for 
visualizing
   : them on the query timeline page.
> It's weird to add import JSON button in query timeline page since the page
Let's see if we can add a new /import link at the top that open a screen for 
import and only offers supported tabs.



--
To view, visit http://gerrit.cloudera.org:8080/20867
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife6eb59bf2030fd19fc92aaf134eb51c609e04d0
Gerrit-Change-Number: 20867
Gerrit-PatchSet: 1
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 10 Jan 2024 14:59:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-01-10 Thread Kurt Deschler (Code Review)
Hello Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20850

to look at the new patch set (#3).

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..

IMPALA-12533: Support row materialization outside of fetch lock

This patch supports an alternative result materialization path where
results are copied to a temporary row batch then materialized into thrift
outside of the fetch lock. This can improve fetch throughput when the
client is fetching concurrently from multiple result streams.

New metrics FetchLockWaitTimer and ResultFlushTimer have been added to
account for synchrounous time waiting for the fetch lock and cumulative
time materializing results outside of the fetch lock.

New query option delay_materialize_results_threshold has been added to
automatically enable delayed result materialization when the fetch lock
wait time exceeds the specified fraction of the materialization time
(default 0.2).

Once enabled, delayed materialization will be used for the remainder of
the rows fetched for the current query. It is expected that the default
threshold will never be reached with a single fetch stream. This
optimization is disabled when query result caching is active since the
cache stores materialized rows and is populated inside of the fetch lock.

Testing:
-Tests added to tests/query_test/test_observability.py
-Manual testing with Multi-stream Hive JDBC driver. This showed a 20%
 improvement in fetch time with a wide table and 5 streams. On the
 mini-cluster, performance is now limited by root sink exchange
 throughput. Production clusters should be able to achieve higher gains.

Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
---
M be/src/exec/buffered-plan-root-sink.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_observability.py
13 files changed, 298 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/3
--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 3
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-01-09 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20850 )

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..


Patch Set 1:

(23 comments)

http://gerrit.cloudera.org:8080/#/c/20850/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20850/1//COMMIT_MSG@13
PS1, Line 13: FetchLockWaitTimer and ResultFlushTimer and have been added to 
account
> nit: extra "and" after ResultFlushTimer
Done


http://gerrit.cloudera.org:8080/#/c/20850/1//COMMIT_MSG@15
PS1, Line 15: materializing results outside of the fetch lock. New query option
> nit: A line break would be nice somewhere around here. Big wall of text.
Done


http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/exec/buffered-plan-root-sink.cc
File be/src/exec/buffered-plan-root-sink.cc:

http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/exec/buffered-plan-root-sink.cc@240
PS1, Line 240: num_rows_to_read, output_expr_evals_, state));
> I think this needs a comment. It's not obvious why this uses num_rows_to_re
Done


http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/runtime/row-batch.cc@469
PS1, Line 469: void RowBatch::CopyRows(RowBatch* dst, int start_idx, int 
num_rows) {
> nit: can we put this after DeepCopyTo so the ordering is consistent with th
Done


http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/runtime/row-batch.cc@471
PS1, Line 471:   DCHECK_LE(num_rows, num_rows_);
> This feels a little redundant with the next line. What we actually care abo
removed.


http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/runtime/row-batch.cc@472
PS1, Line 472:   DCHECK_LE(start_idx + num_rows, num_rows_);
> Maybe also DCHECK start_idx >= 0, and num_rows > 0?
Done


http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/impala-hs2-server.cc@208
PS1, Line 208:   MonotonicStopWatch fetch_lock_timer;
> nit: No reason this needs to be outside the block where it's used.
Done


http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/impala-hs2-server.cc@248
PS1, Line 248: fetch_lock_timer.Start();
> When does this get stopped again? Or used?
removed.


http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-options.h
File be/src/service/query-options.h:

http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-options.h@53
PS1, Line 53:   TImpalaQueryOptions::DELAY_MATERIALIZE_RESULTS_THRESHOLD + 
1);   \
> line too long (108 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc
File be/src/service/query-result-set.cc:

http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@61
PS1, Line 61: pendingRowBatch_(),
> nit: you don't really need to default initialize all of these. It happens a
Now the ctors are non-default.


http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@82
PS1, Line 82: DCHECK(!pendingRowBatch_);
> nit: could some of this happen in the constructor? You're asserting things
Done


http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@90
PS1, Line 90: mem_tracker_.reset(new MemTracker((impala::IntGauge*)nullptr, 
-1, "Result row batch", state_->query_mem_tracker()));
> line too long (120 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@91
PS1, Line 91: expr_mem_tracker_.reset(new 
MemTracker((impala::IntGauge*)nullptr, -1, "Result expr", mem_tracker_.get()));
> line too long (111 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@189
PS1, Line 189:   virtual size_t size() override { return row_batch_context_ ? 
row_batch_context_->size() : num_rows_; }
> line too long (104 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@190
PS1, Line 190:   virtual Status InitDelayedMaterialization(const RowDescriptor& 
row_desc,
> Please try to match style (spacing, functions descriptions).
Please be more specific what you want changed.


http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@193
PS1, Line 193:   virtual bool DelayedMaterializationEnabled() const override { 
return delay_materialization_; }
> line too long (96 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@198
PS1, Line 198:   RETURN_IF_ERROR(AddRows(row_batch_context_->GetExprEvals(),
> Re-using AddRows here makes the AddRows conditional feel awkward. How about
Done


http://gerrit.cloudera.org:8080/#/c/20850/1/be/src/service/query-result-set.cc@280
PS1, Line 280: metadata, rowset, stringify_map_keys, 
expected_resul

[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-01-09 Thread Kurt Deschler (Code Review)
Hello Michael Smith, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20850

to look at the new patch set (#2).

Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..

IMPALA-12533: Support row materialization outside of fetch lock

This patch supports an alternative result materialization path where
results are copied to a temporary row batch then materialized into thrift
outside of the fetch lock. This can improve fetch throughput when the
client is fetching concurrently from multiple result streams.

New metrics FetchLockWaitTimer and ResultFlushTimer have been added to
account for synchrounous time waiting for the fetch lock and cumulative
time materializing results outside of the fetch lock.

New query option delay_materialize_results_threshold has been added to
automatically enable delayed result materialization when the fetch lock
wait time exceeds the specified fraction of the materialization time
(default 0.2).

Once enabled, delayed materialization will be used for the remainder of
the rows fetched for the current query. It is expected that the default
threshold will never be reached with a single fetch stream. This
optimization is disabled when query result caching is active since the
cache stores materialized rows and is populated inside of the fetch lock.

Testing:
-Tests added to tests/query_test/test_observability.py
-Manual testing with Multi-stream Hive JDBC driver. This showed a 20%
 improvement in fetch time with a wide table and 5 streams. On the
 mini-cluster, performance is now limited by root sink exchange
 throughput. Production clusters should be able to achieve higher gains.

Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
---
M be/src/exec/buffered-plan-root-sink.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_observability.py
13 files changed, 297 insertions(+), 38 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/2
--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 2
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Smith 


[Impala-ASF-CR] IMPALA-12433: Share buffers among channels in KrpcDataStreamSender

2024-01-08 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20719 )

Change subject: IMPALA-12433: Share buffers among channels in 
KrpcDataStreamSender
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20719/11/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/20719/11/be/src/runtime/row-batch.cc@271
PS11, Line 271: bool* is_compressed, int64_t size, TrackedString* 
compression_scratch) {
> Change to TrackedString& or TrackedString*const here and above
NM. I missed that this can be null.



--
To view, visit http://gerrit.cloudera.org:8080/20719
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64854a350a9dae8bf3af11c871882ea4750e60b3
Gerrit-Change-Number: 20719
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Mon, 08 Jan 2024 20:07:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12533: Support row materialization outside of fetch lock

2024-01-03 Thread Kurt Deschler (Code Review)
Kurt Deschler has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20850


Change subject: IMPALA-12533: Support row materialization outside of fetch lock
..

IMPALA-12533: Support row materialization outside of fetch lock

This patch supports an alternative result materialization path where
results are copied to a temporary row batch then materialized into thrift
outside of the fetch lock. This can improve fetch throughput when the
client is fetching concurrently from multiple result streams. New metrics
FetchLockWaitTimer and ResultFlushTimer and have been added to account
for synchrounous time waiting for the fetch lock and cumulative time
materializing results outside of the fetch lock. New query option
delay_materialize_results_threshold has been added to automatically
enable delayed result materialization when the fetch lock wait time
exceeds the specified fraction of the materialization time (default 0.2).
Once enabled, delayed materialization will be used for the remainder of
the rows fetched for the current query. It is expected that the default
threshold will never be reached with a single fetch stream. This
optimization is disabled when query result caching is active since the
cache stores materialized rows and is populated inside of the fetch lock.

Testing:
-Tests added to tests/query_test/test_observability.py
-Manual testing with Multi-stream Hive JDBC driver. This showed a 20%
 improvement in fetch time with a wide table and 5 streams. On the
 mini-cluster, performance is now limited by root sink exchange
 throughput. Production clusters should be able to achieve higher gains.

Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
---
M be/src/exec/buffered-plan-root-sink.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M be/src/service/client-request-state.cc
M be/src/service/client-request-state.h
M be/src/service/impala-hs2-server.cc
M be/src/service/query-options.cc
M be/src/service/query-options.h
M be/src/service/query-result-set.cc
M be/src/service/query-result-set.h
M common/thrift/ImpalaService.thrift
M common/thrift/Query.thrift
M tests/query_test/test_observability.py
13 files changed, 294 insertions(+), 37 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/50/20850/1
--
To view, visit http://gerrit.cloudera.org:8080/20850
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: If9512aa6022dbcf81e7eb5f559853b89fe80bd23
Gerrit-Change-Number: 20850
Gerrit-PatchSet: 1
Gerrit-Owner: Kurt Deschler 


[Impala-ASF-CR] IMPALA-12433: Share buffers among channels in KrpcDataStreamSender

2023-12-15 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20719 )

Change subject: IMPALA-12433: Share buffers among channels in 
KrpcDataStreamSender
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20719/11/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/20719/11/be/src/runtime/row-batch.cc@271
PS11, Line 271: bool* is_compressed, int64_t size, TrackedString* 
compression_scratch) {
Change to TrackedString& or TrackedString*const here and above



--
To view, visit http://gerrit.cloudera.org:8080/20719
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I64854a350a9dae8bf3af11c871882ea4750e60b3
Gerrit-Change-Number: 20719
Gerrit-PatchSet: 11
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Fri, 15 Dec 2023 18:36:47 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12614: Use atomics for 64-bit host metrics

2023-12-12 Thread Kurt Deschler (Code Review)
Hello Andrew Sherman, Riza Suminto, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20776

to look at the new patch set (#3).

Change subject: IMPALA-12614: Use atomics for 64-bit host metrics
..

IMPALA-12614: Use atomics for 64-bit host metrics

This patch changes 64-bit host metrics to use atomics to avoid potential
partial load/store data races. There are no functional changes. This
issue was exposed by TSAN tests when IMPALA-12385 enabled periodic
metrics by default.

Testing: TSAN tests cover this case.
Change-Id: I88a15d3ccfeab29506427b3bfcaeebf3a2831176
---
M be/src/runtime/query-state.cc
M be/src/util/system-state-info-test.cc
M be/src/util/system-state-info.cc
M be/src/util/system-state-info.h
4 files changed, 18 insertions(+), 17 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/20776/3
--
To view, visit http://gerrit.cloudera.org:8080/20776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I88a15d3ccfeab29506427b3bfcaeebf3a2831176
Gerrit-Change-Number: 20776
Gerrit-PatchSet: 3
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Riza Suminto 


[Impala-ASF-CR] IMPALA-12614: Use atomics for 64-bit host metrics

2023-12-12 Thread Kurt Deschler (Code Review)
Kurt Deschler has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/20776


Change subject: IMPALA-12614: Use atomics for 64-bit host metrics
..

IMPALA-12614: Use atomics for 64-bit host metrics

This patch changes 64-bit host metrics to use atomics to avoid potential
partial load/store data races. There are no functional changes. This
issue was exposed by TSAN tests when IMPALA-12385 enabled periodic
metrics by default.

Testing: TSAN tests cover this case.
Change-Id: I88a15d3ccfeab29506427b3bfcaeebf3a2831176
---
M be/src/runtime/query-state.cc
M be/src/util/system-state-info.cc
M be/src/util/system-state-info.h
3 files changed, 13 insertions(+), 12 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/76/20776/1
--
To view, visit http://gerrit.cloudera.org:8080/20776
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I88a15d3ccfeab29506427b3bfcaeebf3a2831176
Gerrit-Change-Number: 20776
Gerrit-PatchSet: 1
Gerrit-Owner: Kurt Deschler 


[Impala-ASF-CR] IMPALA-12385: Enable Periodic metrics by default

2023-12-05 Thread Kurt Deschler (Code Review)
Hello Riza Suminto, David Rorke, Surya Hebbar, Csaba Ringhofer, Michael Smith, 
Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20377

to look at the new patch set (#10).

Change subject: IMPALA-12385: Enable Periodic metrics by default
..

IMPALA-12385: Enable Periodic metrics by default

This patch enables periodic metrics in query profiles by default and
changes the metric collectors to be more suitable for mixed workloads.

-Changed default of resource_trace_ratio to 1.
-Changed profile metrics to use sampling counters which can automatically
 resize for long queries.
-Reduced profile metric samping interval to 50ms to support short-running
 queries.
-Changed fragment metrics to use the same sampling interval as periodic
 metrics.
-Added singleton PeriodicCounterUpdater and thread for updating system
 (KRPC) metrics at a different period than fragment metrics.
-Added new flag periodic_system_counter_update_period_ms for configuring
 system metric update period. Default is 500ms.

Example profile output:
- HostCpuIoWaitPercentage (400.000ms): 1, 0, 2, 3, 4, 6, 5, 2, 1, ...
- HostCpuSysPercentage (400.000ms): 1, 12, 10, 11, 11, 5, 3, 3, 3, ...
- HostCpuUserPercentage (400.000ms): 8, 46, 39, 39, 39, 29, 22, 23, ...

Testing:
-Updated runtime-profile-test and test_observability.py for new defaults
-Manual inspection of query profile metrics for long-running queries
-Tested for performance regression using 50 concurrent TPCH Q1 and with
 periodic_counter_update_period_ms=1 to verify that the periodic update
 loop does not affect peformance or become a bottleneck

Change-Id: Ic8e5cbfd4b324081158574ceb8f4b3a062a69fd1
---
M be/src/runtime/exec-env.cc
M be/src/runtime/krpc-data-stream-recvr.cc
M be/src/runtime/query-state.cc
M be/src/util/periodic-counter-updater.cc
M be/src/util/periodic-counter-updater.h
M be/src/util/runtime-profile-counters.h
M be/src/util/runtime-profile-test.cc
M be/src/util/runtime-profile.cc
M be/src/util/runtime-profile.h
M be/src/util/streaming-sampler.h
M common/thrift/Query.thrift
M tests/query_test/test_observability.py
12 files changed, 147 insertions(+), 81 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/77/20377/10
--
To view, visit http://gerrit.cloudera.org:8080/20377
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic8e5cbfd4b324081158574ceb8f4b3a062a69fd1
Gerrit-Change-Number: 20377
Gerrit-PatchSet: 10
Gerrit-Owner: Kurt Deschler 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 


[Impala-ASF-CR] IMPALA-12548: Fix behavior of AGG MEM CORRELATION FACTOR

2023-11-08 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20684 )

Change subject: IMPALA-12548: Fix behavior of AGG_MEM_CORRELATION_FACTOR
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20684
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6f81db32a1818abc257957f6de942b5c9f36211a
Gerrit-Change-Number: 20684
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Aman Sinha 
Gerrit-Reviewer: David Rorke 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Comment-Date: Wed, 08 Nov 2023 23:58:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12373: Small String Optimization for StringValue

2023-11-08 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20496 )

Change subject: IMPALA-12373: Small String Optimization for StringValue
..


Patch Set 11:

Please consider an extension to SSO which would be to templatize the small 
string size so that wider fixed values like timestamp can also benefit. This 
should probably be filed separately unless you would do something fundamentally 
different here.


--
To view, visit http://gerrit.cloudera.org:8080/20496
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I741c3a5f12ab620b6b64b57d4c89b5f8e056efd3
Gerrit-Change-Number: 20496
Gerrit-PatchSet: 11
Gerrit-Owner: Zoltan Borok-Nagy 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Wed, 08 Nov 2023 22:08:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

2023-10-09 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
..


Patch Set 31: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 31
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 09 Oct 2023 21:14:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

2023-10-09 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
..


Patch Set 30: Code-Review+1

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17842/30//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/30//COMMIT_MSG@13
PS30, Line 13:   - It is not distributed, e.g, fregment is unpartitioned. The 
queries
nit: fragment


http://gerrit.cloudera.org:8080/#/c/17842/30//COMMIT_MSG@15
PS30, Line 15:   - Queries which read following data types from external jdbc 
tables
JDBC



--
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 30
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 09 Oct 2023 20:52:13 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

2023-10-09 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
..


Patch Set 27:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17842/29//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/29//COMMIT_MSG@20
PS29, Line 20:   - Don't support external tables with complex types of columns.
Please reword all of these the begin with Don't.


http://gerrit.cloudera.org:8080/#/c/17842/29//COMMIT_MSG@23
PS29, Line 23:   - Don't support Catalog V2 (IMPALA-7131)
Support is limited to the following databases:



--
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 27
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 09 Oct 2023 20:09:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12473: Fix profile's missing event timestamp exception in query timeline

2023-10-09 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20522 )

Change subject: IMPALA-12473: Fix profile's missing event timestamp exception 
in query timeline
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20522/5/www/query_timeline.tmpl
File www/query_timeline.tmpl:

http://gerrit.cloudera.org:8080/#/c/20522/5/www/query_timeline.tmpl@281
PS5, Line 281: for (var instance = 1; instance < 
fp.child_profiles.length; ++instance) {
> In one of the instances mentioned in IMPALA-12473, intermediate event times
It might better to consolidate the 2 cases so that the labels are only checked 
while missing labels are not accounted for. Could even go last to first to 
optimize for the case where the last events are missing.


http://gerrit.cloudera.org:8080/#/c/20522/5/www/query_timeline.tmpl@300
PS5, Line 300:   continue;
remove the continue and change the next branch to else if.


http://gerrit.cloudera.org:8080/#/c/20522/5/www/query_timeline.tmpl@308
PS5, Line 308:   ++i;
factor ++i out of branches



--
To view, visit http://gerrit.cloudera.org:8080/20522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I025b00d8632a5a1953ecdaaa7d8a4ae224dd2610
Gerrit-Change-Number: 20522
Gerrit-PatchSet: 5
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 09 Oct 2023 16:42:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

2023-10-09 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
..


Patch Set 27:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG@13
PS27, Line 13:   - It is not distributed.
Unclear what this actually means. Queries executed on the coordinator?


http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG@14
PS27, Line 14:   - Don't support following data types for column data from 
external
Change wording:
From: Don't support queries... Don't support datatypes XYZ.
To: Queries... are not supported. ... XYZ datatypes are not supported.


http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG@16
PS27, Line 16:   - Only support binary predicates with operators =, !=, <=, >=,
Will between/in/like planned be supported in the future?


http://gerrit.cloudera.org:8080/#/c/17842/27//COMMIT_MSG@43
PS27, Line 43:
Does this require mincluster restart?



--
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 27
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 09 Oct 2023 15:51:09 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline

2023-10-06 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20355 )

Change subject: IMPALA-12364: Display memory, disk and network metrics in 
webUI's query timeline
..


Patch Set 18:

We need to do something different for the zooming and scrolling as it's hard to 
figure out how to control it, especially with a track pad. The grey bar seems 
out of place and is confusing. I think it would be better to have buttons to 
control the  zoom. Panning should work via click and drag or buttons.


--
To view, visit http://gerrit.cloudera.org:8080/20355
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd25e6f0bc9fbd664ec98936daff3f27182dfc7f
Gerrit-Change-Number: 20355
Gerrit-PatchSet: 18
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Fri, 06 Oct 2023 16:17:49 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12482: Make VLOG level of RpcEventHandler adjustable

2023-10-05 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20537 )

Change subject: IMPALA-12482: Make VLOG level of RpcEventHandler adjustable
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20537/1/be/src/statestore/statestore-subscriber.cc
File be/src/statestore/statestore-subscriber.cc:

http://gerrit.cloudera.org:8080/#/c/20537/1/be/src/statestore/statestore-subscriber.cc@221
PS1, Line 221:   new RpcEventHandler("statestore-subscriber", metrics_, 3));
> nit: add a comment about what the 3 here means, like "Logging statestore su
maybe add a new #define for this class of logging?



--
To view, visit http://gerrit.cloudera.org:8080/20537
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic7658ee0016411a9ace0ca3f2eb535b03d2a7add
Gerrit-Change-Number: 20537
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Michael Smith 
Gerrit-Comment-Date: Thu, 05 Oct 2023 16:47:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12364: Display memory, disk and network metrics in webUI's query timeline

2023-10-05 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20355 )

Change subject: IMPALA-12364: Display memory, disk and network metrics in 
webUI's query timeline
..


Patch Set 17:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG@8
PS17, Line 8:
Add:
This patch adds fragment-level metrics to the WebUI query timeline display..

Also mention splitting up the timeline logic into multiple files.


http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG@9
PS17, Line 9: The fragment's plan nodes are enlarged with an animated 
transition on hovering
Limit line length to 73 characters in commit message.
In vim:
:set textwidth=73
gggqG


http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG@10
PS17, Line 10: query timeline's fragment
fragment's row in the phase


http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG@25
PS17, Line 25: Once a fragment's metrics are displayed, they are updated as they
Do you mean they will be updated for a running query?


http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG@35
PS17, Line 35: RESOURCE_TRACE_RATIO query option provides the utilization 
values to be
Say that RESOURCE_TRACE_RATIO must be set to enable periodic metrics and remove 
the rest of the description. What matters for this change is what metrics are 
displayed.


http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG@48
PS17, Line 48: are being
 : displayed
are displayed


http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG@59
PS17, Line 59: To help with manually testing the query timeline web page,
We may want to split the test changes into a separate patch.


http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG@59
PS17, Line 59: manually
What is preventing automated testing?


http://gerrit.cloudera.org:8080/#/c/20355/17//COMMIT_MSG@76
PS17, Line 76: As the order and name of counters may change within the profile,
Unclear what this is relevant to. Is this part of the testing infrastructure?


http://gerrit.cloudera.org:8080/#/c/20355/2/www/query_timeline.tmpl
File www/query_timeline.tmpl:

http://gerrit.cloudera.org:8080/#/c/20355/2/www/query_timeline.tmpl@65
PS2, Line 65:
> I can assign a trigger such as clicking on the timeticks footer or any othe
I think it's ok the way it is.


http://gerrit.cloudera.org:8080/#/c/20355/2/www/query_timeline.tmpl@471
PS2, Line 471:
> Yes, it would be very helpful to compare multiple fragment metrics together
First comment was relating to the position of the charts meaning make the 
fragment metrics appear below the periodic metrics.


http://gerrit.cloudera.org:8080/#/c/20355/2/www/query_timeline.tmpl@778
PS2, Line 778:
> The tics/scaling for the fragment metrics are not aligned properly now.
The x-axis for the periodic metrics and the phase diagram have exactly the same 
max value and align perfectly. However, the fragment metrics has a different 
max value and does not align.


http://gerrit.cloudera.org:8080/#/c/20355/2/www/query_timeline.tmpl@1018
PS2, Line 1018:
> As the unit used to measure the fragment metrics is different compared to t
Makes sense. So we can only show 2 scales or can there be other sets of labels?



--
To view, visit http://gerrit.cloudera.org:8080/20355
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifd25e6f0bc9fbd664ec98936daff3f27182dfc7f
Gerrit-Change-Number: 20355
Gerrit-PatchSet: 17
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Thu, 05 Oct 2023 14:07:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12473: Fix profile's missing event timestamp exception in query timeline

2023-10-04 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20522 )

Change subject: IMPALA-12473: Fix profile's missing event timestamp exception 
in query timeline
..


Patch Set 5:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20522/5/www/query_timeline.tmpl
File www/query_timeline.tmpl:

http://gerrit.cloudera.org:8080/#/c/20522/5/www/query_timeline.tmpl@281
PS5, Line 281: for (var instance = 1; instance < 
fp.child_profiles.length; ++instance) {
Is is always the last event(s) that are missing? If so, we may not need to 
check labels.



--
To view, visit http://gerrit.cloudera.org:8080/20522
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I025b00d8632a5a1953ecdaaa7d8a4ae224dd2610
Gerrit-Change-Number: 20522
Gerrit-PatchSet: 5
Gerrit-Owner: Surya Hebbar 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Surya Hebbar 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Wed, 04 Oct 2023 14:28:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5741: Initial support for reading tiny RDBMS tables

2023-10-02 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17842 )

Change subject: IMPALA-5741: Initial support for reading tiny RDBMS tables
..


Patch Set 22:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java
File 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java:

http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/JdbcDataSource.java@273
PS22, Line 273:   .collect(Collectors.joining(", "));
Is quoting added somewhere? Postgres will convert unquoted capitals to 
lowercase.


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java
File 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java:

http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@96
PS22, Line 96:   String countQuery = "SELECT COUNT(*) FROM (" + sql + ") 
tmptable";
Are we sure all of the target databases will flatten this view query? Otherwise 
could text replace the generated "select *"


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/GenericJdbcDatabaseAccessor.java@285
PS22, Line 285: props.put("maxActive", "3");
Are these database-specific?


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java
File 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java:

http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java@86
PS22, Line 86:   LOGGER.warn("hasNext() threw exception", se);
This should re-throw


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JdbcRecordIterator.java@175
PS22, Line 175:   LOGGER.warn("Caught exception while trying to close 
database objects", e);
probably should re-throw


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java
File 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java:

http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/dao/JethroDatabaseAccessor.java@39
PS22, Line 39: return "Select * from (" + sql + ") as \"tmp\" limit " + 
limit;
This may not always flatten well. consider text replacement instead.


http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java
File 
java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java:

http://gerrit.cloudera.org:8080/#/c/17842/22/java/ext-data-source/jdbc/src/main/java/org/apache/impala/extdatasource/jdbc/util/QueryConditionUtil.java@54
PS22, Line 54: joiner.add(String.format("%s %s %s", name, op, value));
May need quoting here.


http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/create-data-source-table.sql
File testdata/bin/create-data-source-table.sql:

http://gerrit.cloudera.org:8080/#/c/17842/22/testdata/bin/create-data-source-table.sql@56
PS22, Line 56: CREATE TABLE alltypes_jdbc_datasource (
Add tests that require quoting



--
To view, visit http://gerrit.cloudera.org:8080/17842
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8244e978c7717c6f1452f66f1630b6441392e7d2
Gerrit-Change-Number: 17842
Gerrit-PatchSet: 22
Gerrit-Owner: Fucun Chu 
Gerrit-Reviewer: Abhishek Rawat 
Gerrit-Reviewer: Anonymous Coward 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 02 Oct 2023 21:17:50 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12444: Fix minimum parallelism bug in scan fragment

2023-10-02 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20475 )

Change subject: IMPALA-12444: Fix minimum parallelism bug in scan fragment
..


Patch Set 3: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20475
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I69e5a80146d4ac41de5ef406fc2bdceffe3ec394
Gerrit-Change-Number: 20475
Gerrit-PatchSet: 3
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 02 Oct 2023 16:13:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12444: Fix minimum parallelism bug in scan fragment

2023-10-02 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20475 )

Change subject: IMPALA-12444: Fix minimum parallelism bug in scan fragment
..


Patch Set 1:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20475/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/20475/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1065
PS1, Line 1065: long maxScannerThreads = Long.MAX_VALUE;
> Done. Changed from long to int. Thanks for catching this!
Please follow up below.


http://gerrit.cloudera.org:8080/#/c/20475/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1108
PS1, Line 1108:   (int) Math.min(costBasedMaxParallelism, 
maxScannerThreads));
Is the cast necessary here?


http://gerrit.cloudera.org:8080/#/c/20475/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1139
PS1, Line 1139:   maxParallelism_ = (int) Math.min(maxParallelism_, 
maxScannerThreads);
Is the cast necessary here?



--
To view, visit http://gerrit.cloudera.org:8080/20475
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I69e5a80146d4ac41de5ef406fc2bdceffe3ec394
Gerrit-Change-Number: 20475
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Riza Suminto 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 02 Oct 2023 16:04:10 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12444: Fix minimum parallelism bug in scan fragment

2023-10-02 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20475 )

Change subject: IMPALA-12444: Fix minimum parallelism bug in scan fragment
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20475/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java
File fe/src/main/java/org/apache/impala/planner/PlanFragment.java:

http://gerrit.cloudera.org:8080/#/c/20475/1/fe/src/main/java/org/apache/impala/planner/PlanFragment.java@1065
PS1, Line 1065: long maxScannerThreads = Long.MAX_VALUE;
What long? Should make thread counts in everywhere and check/cast computed 
values as early as possible.



--
To view, visit http://gerrit.cloudera.org:8080/20475
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I69e5a80146d4ac41de5ef406fc2bdceffe3ec394
Gerrit-Change-Number: 20475
Gerrit-PatchSet: 1
Gerrit-Owner: Riza Suminto 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Reviewer: Wenzhe Zhou 
Gerrit-Comment-Date: Mon, 02 Oct 2023 15:43:31 +
Gerrit-HasComments: Yes


  1   2   3   4   5   >