[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..

IMPALA-10956: datasketches UDFs: memory leak and merge overhead

- call destructors of sketch and union objects
- avoid overhead of constructing union and getting result from it every time

Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Reviewed-on: http://gerrit.cloudera.org:8080/17869
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 
---
M be/src/exprs/aggregate-functions-ir.cc
1 file changed, 273 insertions(+), 195 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 6
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 5: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 5
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 11 Dec 2021 17:56:36 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-11 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4: Code-Review+2

You are right, the issue seems unrelated - I was worried because I didn't see 
these broken tests before, they are not know flaky tests, but now I see that 
there were very similar failures in another run:
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5018/


--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 11 Dec 2021 11:34:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 5
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 11 Dec 2021 11:35:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7701/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 5
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 11 Dec 2021 11:35:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-10 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4:

Sorry about duplicate comments. I am not used to this Gerrit thing. The 
interface here is not very intuitive. And it seems there is no way to remove 
these comments once posted.


--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 11 Dec 2021 01:16:07 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-10 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4:

(1 comment)
 > Some of the failed tests are related to running out of memory in JVM, and 
 > these are not known flaky tests

As far as I understand, the failed tests seem to have nothing to do with 
DataSketches UDFs.


--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 11 Dec 2021 01:08:05 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-10 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4:

> (1 comment)
 >
 > > Some of the failed tests are related to running out of memory in
 > JVM, and these are not known flaky tests

As far as I understand, the failed tests seem to have nothing to do with 
DataSketches UDFs.


--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 11 Dec 2021 01:06:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-10 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4:

(1 comment)

> Some of the failed tests are related to running out of memory in JVM, and 
> these are not known flaky tests
As far as I understand, the failed tests seem to have nothing to do with 
DataSketches UDFs.

http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc@2075
PS4, Line 2075:   agg_state_ptr->second = new 
(ctx->Allocate())
  :   datasketches::update_theta_sketch(
  :   datasketches::update_theta_sketch::builder().build())
> I looked through the patch again looking for potential leaks. bit I didn't
Both failures seem unrelated to this change to me. One is about iceberg, 
another is about some statement expression limit.



--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 11 Dec 2021 01:06:03 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-10 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc@2075
PS4, Line 2075:   agg_state_ptr->second = new 
(ctx->Allocate())
  :   datasketches::update_theta_sketch(
  :   datasketches::update_theta_sketch::builder().build())
> I looked through the patch again looking for potential leaks. bit I didn't
Space is allocated just above, and placement new is called to put an object 
there. uninitilaized_fill_n does the same. I just did not quite like using it 
for one object. I think it is clearer to call placement new directly.


http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc@2151
PS4, Line 2151: u.update(*dst_sketch_ptr);
> Couldn't we use std::move here? update seems to be optimized for the rvalue
Yes, we could since we discard this object later. I doubt it makes any 
difference in practice. Rvalue case is supported in the API, but I don't think 
it is handled differently in the Theta union. Anyway, this is not incorrect. In 
the worst case it is a missed slight optimization opportunity.



--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 10 Dec 2021 18:16:29 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-10 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc@2075
PS4, Line 2075:   agg_state_ptr->second = new 
(ctx->Allocate())
  :   datasketches::update_theta_sketch(
  :   datasketches::update_theta_sketch::builder().build())
I looked through the patch again looking for potential leaks. bit I didn't find 
any. Theta seems the most suspicious to me because the   
std::uninitialized_fill_n(sketch_ptr, 1, sketch); used in the original version 
- I don't know what is the reason behind using it.


http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc@2151
PS4, Line 2151: u.update(*dst_sketch_ptr);
Couldn't we use std::move here? update seems to be optimized for the rvalue 
case.



--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 10 Dec 2021 11:35:07 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-10 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has removed a vote on this change.

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Removed Code-Review+2 by Impala Public Jenkins 

--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: deleteVote
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-09 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4: -Code-Review

Some of the failed tests are related to running out of memory in JVM, and these 
are not known flaky tests:
https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5031/testReport/junit/query_test.test_nested_types/TestNestedTypesNoMtDop/test_parquet_stats_protocol__beeswax___exec_optionbatch_size___0___num_nodes___0___disable_codegen_rows_threshold___0___disable_codegen___False___abort_on_error___1___exec_single_node_rows_threshold___0table_format__parquet_none_/

https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5034/testReport/junit/query_test.test_iceberg/TestIcebergTable/test_writing_many_files_protocol__beeswax___exec_optionbatch_size___0___num_nodes___0___disable_codegen_rows_threshold___0___disable_codegen___False___abort_on_error___1___exec_single_node_rows_threshold___0table_format__parquet_none_/

Probably there is some leak in the code that failed to notice.


--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 10 Dec 2021 07:46:27 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4:

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7698/


--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 09 Dec 2021 16:10:03 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-09 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4: Code-Review+2

I am unsure about the reason why the GVO failed, restarting it


--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 09 Dec 2021 09:55:53 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-09 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7698/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 09 Dec 2021 09:55:31 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7697/


--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Thu, 09 Dec 2021 03:42:13 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 08 Dec 2021 21:23:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 4:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7697/ 
DRY_RUN=false


--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 4
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 08 Dec 2021 21:23:00 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-08 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 3:

(1 comment)

> please write once again if you want the code to be merged as it is or add 
> updates
I would suggest merging as is, and do another round of improvements separately 
once the dependency problem is resolved.

http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc@1689
PS3, Line 1689:   agg_state_ptr->second = new 
(ctx->Allocate())
  :   datasketches::hll_sketch(DS_SKETCH_CONFIG, DS_HLL_TYPE);
> I agree that it is a great improvement as it is.
If the processing can start from merge, then indeed we can do lazy 
initialization, but let's not worry about this right now.



--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 3
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 08 Dec 2021 20:43:19 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-08 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 3: Code-Review+2

(2 comments)

Went through the code again, lgtm

gave +2 - please write once again if you want the code to be merged as it is or 
add updates, and then I will start the tests

http://gerrit.cloudera.org:8080/#/c/17869/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17869/3//COMMIT_MSG@9
PS3, Line 9: - call destructors of sketch and union objects
>It did not seem to me that these comments invite a discussion or something 
>might be expected from me.

The only actionable suggestion was adding tests with group by to have much more 
sketches, but it is not critical to do it in this change.

> I believe there is a ticket open for the dependency issue.
The one I know about is IMPALA-10868


http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc@1689
PS3, Line 1689:   agg_state_ptr->second = new 
(ctx->Allocate())
  :   datasketches::hll_sketch(DS_SKETCH_CONFIG, DS_HLL_TYPE);
> As I understand, there is always the update phase of the processing before
I agree that it is a great improvement as it is.

If there is no group by, we do aggregation in two phases, one using update, 
then we serialize it, and send it to a final node that will merge the 
aggregated results from different nodes.



-- 
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 3
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Wed, 08 Dec 2021 20:08:31 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-12-03 Thread Alexander Saydakov (Code Review)
Alexander Saydakov has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 3:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/17869/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17869/3//COMMIT_MSG@9
PS3, Line 9: - call destructors of sketch and union objects
> Thanks a lot for fixing this!
Sorry for the late reply. It did not seem to me that these comments invite a 
discussion or something might be expected from me. I am replying just in case I 
misunderstood, and I hope to move this forward.

Yes, ideally we should have an allocator, and I even started writing one, but 
it turned out that the version of Apache DataSketches currently used in Impala 
is not ready for this yet. There are some changes in the library to support 
this, but not released yet. It should happen soon. In the meantime this 
proposed change should make things better than before. I believe that the leaks 
should not happen in normal circumstances anymore (only in case of thrown 
exceptions).

The next step after adopting this change would be to fix the dependency on 
Apache Datasketches, so we could easily upgrade to the latest version, and then 
work on the allocator to take this to the next level. I believe there is a 
ticket open for the dependency issue.


http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc@1689
PS3, Line 1689:   agg_state_ptr->second = new 
(ctx->Allocate())
  :   datasketches::hll_sketch(DS_SKETCH_CONFIG, DS_HLL_TYPE);
> DsHllInit always initializes a hll_sketch while we'll only modify it if DsH
As I understand, there is always the update phase of the processing before 
moving on to the merge phase. Even if I am wrong, and it is not so, the 
overhead of this unnecessary constructor is not that high, and it was there 
before, so no regression here. On the contrary, some unnecessary complexity was 
eliminated in this change. Even if not perfect, this should be an improvement.



--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 3
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Sat, 04 Dec 2021 05:28:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-10-29 Thread Csaba Ringhofer (Code Review)
Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..


Patch Set 3:

(2 comments)

lgtm in general, I will to go through the code again to check all possible leaks

http://gerrit.cloudera.org:8080/#/c/17869/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17869/3//COMMIT_MSG@9
PS3, Line 9: - call destructors of sketch and union objects
Thanks a lot for fixing this!

A few notes here:

- We should have some utilities in Impala that help with memory management of 
complex types in aggregate states. I am thinking about creating an 
std::allocator like class that would use the same allocation as FunctionContext 
through some thread local state. This seems a valid thing to do as it is 
guaranteed that the aggregate state will be used only within a single thread. 
datasketch classes already can get an allocator template, so this would allow 
us to allocate all memory the "proper" way.

- We don't test this area too well - there are no GROUP BYs in 
https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test

and without GROUP BY, we'll only create a single sketch per fragment instance 
during aggregation, so we won't leak too much memory. Adding a test with a 
GROUP BY with e.g. 1000 groups would increase the chance of catching such 
issues.


http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc@1689
PS3, Line 1689:   agg_state_ptr->second = new 
(ctx->Allocate())
  :   datasketches::hll_sketch(DS_SKETCH_CONFIG, DS_HLL_TYPE);
DsHllInit always initializes a hll_sketch while we'll only modify it if 
DsHllUpdate is called later.

My idea would be to skip creating a hll_sketch in DsHllInit (e.g. by creating 
an uninitialized state) and then initialize a hll_sketch/hll_union during 
Update/Merge.



--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 3
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Fri, 29 Oct 2021 09:57:23 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead

2021-10-07 Thread Alexander Saydakov (Code Review)
Hello Fucun Chu, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17869

to look at the new patch set (#3).

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
..

IMPALA-10956: datasketches UDFs: memory leak and merge overhead

- call destructors of sketch and union objects
- avoid overhead of constructing union and getting result from it every time

Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
---
M be/src/exprs/aggregate-functions-ir.cc
1 file changed, 273 insertions(+), 195 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/17869/3
--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 3
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-10956 datasketches UDFs: memory leak and merge overhead

2021-10-07 Thread Alexander Saydakov (Code Review)
Hello Fucun Chu, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/17869

to look at the new patch set (#2).

Change subject: IMPALA-10956 datasketches UDFs: memory leak and merge overhead
..

IMPALA-10956 datasketches UDFs: memory leak and merge overhead

- call destructors of sketch and union objects
- avoid overhead of constructing union and getting result from it every time

Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
---
M be/src/exprs/aggregate-functions-ir.cc
1 file changed, 273 insertions(+), 195 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/17869/2
--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 2
Gerrit-Owner: Alexander Saydakov 
Gerrit-Reviewer: Fucun Chu 
Gerrit-Reviewer: Impala Public Jenkins