[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-07 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/20462 )

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..

IMPALA-12430: Skip compression when sending row batches within same process

LZ4 compression doesn't seem useful when the RowBatch is sent to a
fragment instance within the same process instead of a remote host.

After this change KrpcDataStreamSender skips compression for channels
where the destination is in the same process.

Other changes:
- OutboundRowBatch os moved to a separate file to make the commonly
  included row-batch.h lighter.

See the Jira for more details on tasks that could be skipped in
intra process RowBatch transfer. From these compression is both
the most expensive and easiest to avoid.

Note that it may also make sense to skip compression if the target
is not the in same process but resides on the same host. This setup is
not typical in production environment AFAIK and it would complicate
testing of compression as impalad processes of often run on the
same host during tests. For these reasons it seems better to only
implement this if both the host and port are the same.

TPCH benchmark shows significant improvement but it uses only 3
impalad processes so 1/3 of exchanges are affected - in bigger
clusters the change should be much smaller.
+--+---+-++++
| Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+--+---+-++++
| TPCH(42) | parquet / none / none | 3.59| -4.95% | 2.37   | -2.51% 
|
+--+---+-++++

Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
A be/src/runtime/outbound-row-batch.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
7 files changed, 122 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/20462/2
--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-07 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20462 )

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..


Patch Set 2:

Build Failed

https://jenkins.impala.io/job/gerrit-code-review-checks/13940/ : Initial code 
review checks failed. See linked job for details on the failure.


--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 07 Sep 2023 17:52:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-07 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20462 )

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..


Patch Set 2:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20462/2/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/20462/2/be/src/runtime/krpc-data-stream-sender.cc@211
PS2, Line 211:   const NetworkAddressPB& GetAddress() { return address_; };
declare method const


http://gerrit.cloudera.org:8080/#/c/20462/2/be/src/runtime/krpc-data-stream-sender.cc@806
PS2, Line 806: bool is_local = channels_.size() == 1
Can Channels::is_local flag be used here instead of re-computing?


http://gerrit.cloudera.org:8080/#/c/20462/2/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/20462/2/be/src/runtime/row-batch.cc@277
PS2, Line 277:   if (size > 0 && !output_batch->skip_compression_) {
Is there any accounting for compression memory in the resource profile that 
should b changed to reflect this optimization?



--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 2
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Thu, 07 Sep 2023 20:04:14 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-08 Thread Csaba Ringhofer (Code Review)
Hello Kurt Deschler, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20462

to look at the new patch set (#3).

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..

IMPALA-12430: Skip compression when sending row batches within same process

LZ4 compression doesn't seem useful when the RowBatch is sent to a
fragment instance within the same process instead of a remote host.

After this change KrpcDataStreamSender skips compression for channels
where the destination is in the same process.

Other changes:
- OutboundRowBatch os moved to a separate file to make the commonly
  included row-batch.h lighter.

See the Jira for more details on tasks that could be skipped in
intra process RowBatch transfer. From these compression is both
the most expensive and easiest to avoid.

Note that it may also make sense to skip compression if the target
is not the in same process but resides on the same host. This setup is
not typical in production environment AFAIK and it would complicate
testing of compression as impalad processes of often run on the
same host during tests. For these reasons it seems better to only
implement this if both the host and port are the same.

TPCH benchmark shows significant improvement but it uses only 3
impalad processes so 1/3 of exchanges are affected - in bigger
clusters the change should be much smaller.
+--+---+-++++
| Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+--+---+-++++
| TPCH(42) | parquet / none / none | 3.59| -4.95% | 2.37   | -2.51% 
|
+--+---+-++++

Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
A be/src/runtime/outbound-row-batch.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
7 files changed, 121 insertions(+), 62 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/20462/3
--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-08 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20462 )

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20462/2/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/20462/2/be/src/runtime/krpc-data-stream-sender.cc@211
PS2, Line 211:   bool IsLocal() const { return is_local_; }
> declare method const
Property no longer needed after resolving other comment.


http://gerrit.cloudera.org:8080/#/c/20462/2/be/src/runtime/krpc-data-stream-sender.cc@806
PS2, Line 806: bool is_local = channels_.size() == 1 && 
channels_[0]->IsLocal();
> Can Channels::is_local flag be used here instead of re-computing?
Done


http://gerrit.cloudera.org:8080/#/c/20462/2/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/20462/2/be/src/runtime/row-batch.cc@277
PS2, Line 277:   if (size > 0 && !output_batch->skip_compression_) {
> Is there any accounting for compression memory in the resource profile that
I would prefer to decrease the reservation in a later patch

I think that it would be possible to work with much fewer buffers, created a 
ticket for this: IMPALA-12433
Doing that optimization would allow lowering mem estimates significantly.



--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Fri, 08 Sep 2023 11:42:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20462 )

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13953/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Fri, 08 Sep 2023 11:50:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-08 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20462 )

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..


Patch Set 3:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/20462/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20462/3//COMMIT_MSG@16
PS3, Line 16: os
Nit: is.


http://gerrit.cloudera.org:8080/#/c/20462/3//COMMIT_MSG@26
PS3, Line 26: of
Nit: 'of' seems unnecessary here.


http://gerrit.cloudera.org:8080/#/c/20462/3/be/src/runtime/outbound-row-batch.h
File be/src/runtime/outbound-row-batch.h:

http://gerrit.cloudera.org:8080/#/c/20462/3/be/src/runtime/outbound-row-batch.h@39
PS3, Line 39: =false
The default value is not needed, all call sites have been updated to pass this 
argument.



--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Fri, 08 Sep 2023 12:04:48 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-08 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20462 )

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20462/3/be/src/runtime/outbound-row-batch.h
File be/src/runtime/outbound-row-batch.h:

http://gerrit.cloudera.org:8080/#/c/20462/3/be/src/runtime/outbound-row-batch.h@39
PS3, Line 39: =false
> The default value is not needed, all call sites have been updated to pass t
I missed the tests and benchmarks, they're not updated there. Still I'd 
consider updating those call sites so it is more explicit that there is a 
second parameter.



--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 3
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Fri, 08 Sep 2023 12:09:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-08 Thread Csaba Ringhofer (Code Review)
Hello Kurt Deschler, Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20462

to look at the new patch set (#4).

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..

IMPALA-12430: Skip compression when sending row batches within same process

LZ4 compression doesn't seem useful when the RowBatch is sent to a
fragment instance within the same process instead of a remote host.

After this change KrpcDataStreamSender skips compression for channels
where the destination is in the same process.

Other changes:
- OutboundRowBatch was moved to a separate file to make the commonly
  included row-batch.h lighter.
- TestObservability.test_global_exchange_counters had to be changed
  as skipping compression changed metric ExchangeScanRatio. Also added
  a sleep to the test query because it was flaky on my machine (it
  doesn't seem flaky in jenkins runs, probably my CPU is faster).

See the Jira for more details on tasks that could be skipped in
intra process RowBatch transfer. From these compression is both
the most expensive and easiest to avoid.

Note that it may also make sense to skip compression if the target
is not the in same process but resides on the same host. This setup is
not typical in production environment AFAIK and it would complicate
testing of compression as impalad processes of often run on the
same host during tests. For these reasons it seems better to only
implement this if both the host and port are the same.

TPCH benchmark shows significant improvement but it uses only 3
impalad processes so 1/3 of exchanges are affected - in bigger
clusters the change should be much smaller.
+--+---+-++++
| Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+--+---+-++++
| TPCH(42) | parquet / none / none | 3.59| -4.95% | 2.37   | -2.51% 
|
+--+---+-++++

Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
A be/src/runtime/outbound-row-batch.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M tests/query_test/test_observability.py
8 files changed, 128 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/20462/4
--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-08 Thread Csaba Ringhofer (Code Review)
Hello Kurt Deschler, Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20462

to look at the new patch set (#5).

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..

IMPALA-12430: Skip compression when sending row batches within same process

LZ4 compression doesn't seem useful when the RowBatch is sent to a
fragment instance within the same process instead of a remote host.

After this change KrpcDataStreamSender skips compression for channels
where the destination is in the same process.

Other changes:
- OutboundRowBatch is moved to a separate file to make the commonly
  included row-batch.h lighter.
- TestObservability.test_global_exchange_counters had to be changed
  as skipping compression changed metric ExchangeScanRatio. Also added
  a sleep to the test query because it was flaky on my machine (it
  doesn't seem flaky in jenkins runs, probably my CPU is faster).

See the Jira for more details on tasks that could be skipped in
intra process RowBatch transfer. From these compression is both
the most expensive and easiest to avoid.

Note that it may also make sense to skip compression if the target
is not the in same process but resides on the same host. This setup is
not typical in production environment AFAIK and it would complicate
testing compression as impalad processes of often run on the
same host during tests. For these reasons it seems better to only
implement this if both the host and port are the same.

TPCH benchmark shows significant improvement but it uses only 3
impalad processes so 1/3 of exchanges are affected - in bigger
clusters the change should be much smaller.
+--+---+-++++
| Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+--+---+-++++
| TPCH(42) | parquet / none / none | 3.59| -4.95% | 2.37   | -2.51% 
|
+--+---+-++++

Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
A be/src/runtime/outbound-row-batch.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M tests/query_test/test_observability.py
8 files changed, 128 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/20462/5
--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 5
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-08 Thread Csaba Ringhofer (Code Review)
Hello Kurt Deschler, Daniel Becker, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/20462

to look at the new patch set (#6).

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..

IMPALA-12430: Skip compression when sending row batches within same process

LZ4 compression doesn't seem useful when the RowBatch is sent to a
fragment instance within the same process instead of a remote host.

After this change KrpcDataStreamSender skips compression for channels
where the destination is in the same process.

Other changes:
- OutboundRowBatch is moved to a separate file to make the commonly
  included row-batch.h lighter.
- TestObservability.test_global_exchange_counters had to be changed
  as skipping compression changed metric ExchangeScanRatio. Also added
  a sleep to the test query because it was flaky on my machine (it
  doesn't seem flaky in jenkins runs, probably my CPU is faster).

See the Jira for more details on tasks that could be skipped in
intra process RowBatch transfer. From these compression is both
the most expensive and easiest to avoid.

Note that it may also make sense to skip compression if the target
is not the in same process but resides on the same host. This setup is
not typical in production environment AFAIK and it would complicate
testing compression as impalad processes often run on the
same host during tests. For these reasons it seems better to only
implement this if both the host and port are the same.

TPCH benchmark shows significant improvement but it uses only 3
impalad processes so 1/3 of exchanges are affected - in bigger
clusters the change should be much smaller.
+--+---+-++++
| Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+--+---+-++++
| TPCH(42) | parquet / none / none | 3.59| -4.95% | 2.37   | -2.51% 
|
+--+---+-++++

Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
A be/src/runtime/outbound-row-batch.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M tests/query_test/test_observability.py
8 files changed, 128 insertions(+), 65 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/62/20462/6
--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-08 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20462 )

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..


Patch Set 6:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/20462/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/20462/3//COMMIT_MSG@16
PS3, Line 16: is
> Nit: is.
Done


http://gerrit.cloudera.org:8080/#/c/20462/3//COMMIT_MSG@26
PS3, Line 26:
> Nit: 'of' seems unnecessary here.
Done


http://gerrit.cloudera.org:8080/#/c/20462/3/be/src/runtime/outbound-row-batch.h
File be/src/runtime/outbound-row-batch.h:

http://gerrit.cloudera.org:8080/#/c/20462/3/be/src/runtime/outbound-row-batch.h@39
PS3, Line 39: =false
> I missed the tests and benchmarks, they're not updated there. Still I'd con
I was considering updating the benchmarks as ideally compression could be added 
is a new dimension but wanted to keep this change minimal.


http://gerrit.cloudera.org:8080/#/c/20462/6/tests/query_test/test_observability.py
File tests/query_test/test_observability.py:

http://gerrit.cloudera.org:8080/#/c/20462/6/tests/query_test/test_observability.py@504
PS6, Line 504: assert "ExchangeScanRatio: 4.63" in profile
Running all tests revealed that this was broken by the change.



--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Fri, 08 Sep 2023 12:39:57 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20462 )

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/13954/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 4
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Fri, 08 Sep 2023 12:57:47 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-08 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20462 )

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..


Patch Set 6: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Fri, 08 Sep 2023 12:58:14 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-08 Thread Kurt Deschler (Code Review)
Kurt Deschler has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20462 )

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..


Patch Set 6: Code-Review+1

(1 comment)

http://gerrit.cloudera.org:8080/#/c/20462/2/be/src/runtime/row-batch.cc
File be/src/runtime/row-batch.cc:

http://gerrit.cloudera.org:8080/#/c/20462/2/be/src/runtime/row-batch.cc@277
PS2, Line 277:   if (size > 0 && !output_batch->skip_compression_) {
> I would prefer to decrease the reservation in a later patch
Ok. Please make to prioritize that. Reducing memory requirements is more 
important than improving performance in many cases.



--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 6
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Fri, 08 Sep 2023 13:32:59 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20462 )

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9695/ 
DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Sun, 10 Sep 2023 16:21:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20462 )

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..


Patch Set 7: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Sun, 10 Sep 2023 20:40:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20462 )

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/9696/ 
DRY_RUN=false


-- 
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Mon, 11 Sep 2023 08:44:51 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-11 Thread Daniel Becker (Code Review)
Daniel Becker has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/20462 )

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler 
Gerrit-Comment-Date: Mon, 11 Sep 2023 08:44:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-12430: Skip compression when sending row batches within same process

2023-09-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/20462 )

Change subject: IMPALA-12430: Skip compression when sending row batches within 
same process
..

IMPALA-12430: Skip compression when sending row batches within same process

LZ4 compression doesn't seem useful when the RowBatch is sent to a
fragment instance within the same process instead of a remote host.

After this change KrpcDataStreamSender skips compression for channels
where the destination is in the same process.

Other changes:
- OutboundRowBatch is moved to a separate file to make the commonly
  included row-batch.h lighter.
- TestObservability.test_global_exchange_counters had to be changed
  as skipping compression changed metric ExchangeScanRatio. Also added
  a sleep to the test query because it was flaky on my machine (it
  doesn't seem flaky in jenkins runs, probably my CPU is faster).

See the Jira for more details on tasks that could be skipped in
intra process RowBatch transfer. From these compression is both
the most expensive and easiest to avoid.

Note that it may also make sense to skip compression if the target
is not the in same process but resides on the same host. This setup is
not typical in production environment AFAIK and it would complicate
testing compression as impalad processes often run on the
same host during tests. For these reasons it seems better to only
implement this if both the host and port are the same.

TPCH benchmark shows significant improvement but it uses only 3
impalad processes so 1/3 of exchanges are affected - in bigger
clusters the change should be much smaller.
+--+---+-++++
| Workload | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+--+---+-++++
| TPCH(42) | parquet / none / none | 3.59| -4.95% | 2.37   | -2.51% 
|
+--+---+-++++

Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Reviewed-on: http://gerrit.cloudera.org:8080/20462
Tested-by: Impala Public Jenkins 
Reviewed-by: Daniel Becker 
---
M be/src/benchmarks/row-batch-serialize-benchmark.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
A be/src/runtime/outbound-row-batch.h
M be/src/runtime/row-batch-serialize-test.cc
M be/src/runtime/row-batch.cc
M be/src/runtime/row-batch.h
M tests/query_test/test_observability.py
8 files changed, 128 insertions(+), 65 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Daniel Becker: Looks good to me, approved

--
To view, visit http://gerrit.cloudera.org:8080/20462
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I7ea23fd1f0f10f72f3dbd8594f3def3ee190230a
Gerrit-Change-Number: 20462
Gerrit-PatchSet: 8
Gerrit-Owner: Csaba Ringhofer 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Daniel Becker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Kurt Deschler