[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. IMPALA-10956: datasketches UDFs: memory leak and merge overhead - call destructors of sketch and union objects - avoid overhead of constructing union and getting result from it every time Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Reviewed-on: http://gerrit.cloudera.org:8080/17869 Reviewed-by: Impala Public Jenkins Tested-by: Impala Public Jenkins --- M be/src/exprs/aggregate-functions-ir.cc 1 file changed, 273 insertions(+), 195 deletions(-) Approvals: Impala Public Jenkins: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 6 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 5: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 5 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 11 Dec 2021 17:56:36 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 4: Code-Review+2 You are right, the issue seems unrelated - I was worried because I didn't see these broken tests before, they are not know flaky tests, but now I see that there were very similar failures in another run: https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5018/ -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 4 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 11 Dec 2021 11:34:55 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 5 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 11 Dec 2021 11:35:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7701/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 5 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 11 Dec 2021 11:35:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Alexander Saydakov has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 4: Sorry about duplicate comments. I am not used to this Gerrit thing. The interface here is not very intuitive. And it seems there is no way to remove these comments once posted. -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 4 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 11 Dec 2021 01:16:07 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Alexander Saydakov has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 4: (1 comment) > Some of the failed tests are related to running out of memory in JVM, and > these are not known flaky tests As far as I understand, the failed tests seem to have nothing to do with DataSketches UDFs. -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 4 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 11 Dec 2021 01:08:05 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Alexander Saydakov has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 4: > (1 comment) > > > Some of the failed tests are related to running out of memory in > JVM, and these are not known flaky tests As far as I understand, the failed tests seem to have nothing to do with DataSketches UDFs. -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 4 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 11 Dec 2021 01:06:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Alexander Saydakov has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 4: (1 comment) > Some of the failed tests are related to running out of memory in JVM, and > these are not known flaky tests As far as I understand, the failed tests seem to have nothing to do with DataSketches UDFs. http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc@2075 PS4, Line 2075: agg_state_ptr->second = new (ctx->Allocate()) : datasketches::update_theta_sketch( : datasketches::update_theta_sketch::builder().build()) > I looked through the patch again looking for potential leaks. bit I didn't Both failures seem unrelated to this change to me. One is about iceberg, another is about some statement expression limit. -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 4 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 11 Dec 2021 01:06:03 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Alexander Saydakov has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc@2075 PS4, Line 2075: agg_state_ptr->second = new (ctx->Allocate()) : datasketches::update_theta_sketch( : datasketches::update_theta_sketch::builder().build()) > I looked through the patch again looking for potential leaks. bit I didn't Space is allocated just above, and placement new is called to put an object there. uninitilaized_fill_n does the same. I just did not quite like using it for one object. I think it is clearer to call placement new directly. http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc@2151 PS4, Line 2151: u.update(*dst_sketch_ptr); > Couldn't we use std::move here? update seems to be optimized for the rvalue Yes, we could since we discard this object later. I doubt it makes any difference in practice. Rvalue case is supported in the API, but I don't think it is handled differently in the Theta union. Anyway, this is not incorrect. In the worst case it is a missed slight optimization opportunity. -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 4 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 10 Dec 2021 18:16:29 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 4: (2 comments) http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc@2075 PS4, Line 2075: agg_state_ptr->second = new (ctx->Allocate()) : datasketches::update_theta_sketch( : datasketches::update_theta_sketch::builder().build()) I looked through the patch again looking for potential leaks. bit I didn't find any. Theta seems the most suspicious to me because the std::uninitialized_fill_n(sketch_ptr, 1, sketch); used in the original version - I don't know what is the reason behind using it. http://gerrit.cloudera.org:8080/#/c/17869/4/be/src/exprs/aggregate-functions-ir.cc@2151 PS4, Line 2151: u.update(*dst_sketch_ptr); Couldn't we use std::move here? update seems to be optimized for the rvalue case. -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 4 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 10 Dec 2021 11:35:07 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Csaba Ringhofer has removed a vote on this change. Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Removed Code-Review+2 by Impala Public Jenkins -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: deleteVote Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 4 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 4: -Code-Review Some of the failed tests are related to running out of memory in JVM, and these are not known flaky tests: https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5031/testReport/junit/query_test.test_nested_types/TestNestedTypesNoMtDop/test_parquet_stats_protocol__beeswax___exec_optionbatch_size___0___num_nodes___0___disable_codegen_rows_threshold___0___disable_codegen___False___abort_on_error___1___exec_single_node_rows_threshold___0table_format__parquet_none_/ https://jenkins.impala.io/job/ubuntu-16.04-dockerised-tests/5034/testReport/junit/query_test.test_iceberg/TestIcebergTable/test_writing_many_files_protocol__beeswax___exec_optionbatch_size___0___num_nodes___0___disable_codegen_rows_threshold___0___disable_codegen___False___abort_on_error___1___exec_single_node_rows_threshold___0table_format__parquet_none_/ Probably there is some leak in the code that failed to notice. -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 4 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 10 Dec 2021 07:46:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 4: Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7698/ -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 4 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 09 Dec 2021 16:10:03 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 4: Code-Review+2 I am unsure about the reason why the GVO failed, restarting it -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 4 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 09 Dec 2021 09:55:53 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7698/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 4 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 09 Dec 2021 09:55:31 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 4: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/7697/ -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 4 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 09 Dec 2021 03:42:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 4: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 4 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 08 Dec 2021 21:23:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 4: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/7697/ DRY_RUN=false -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 4 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 08 Dec 2021 21:23:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Alexander Saydakov has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 3: (1 comment) > please write once again if you want the code to be merged as it is or add > updates I would suggest merging as is, and do another round of improvements separately once the dependency problem is resolved. http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc@1689 PS3, Line 1689: agg_state_ptr->second = new (ctx->Allocate()) : datasketches::hll_sketch(DS_SKETCH_CONFIG, DS_HLL_TYPE); > I agree that it is a great improvement as it is. If the processing can start from merge, then indeed we can do lazy initialization, but let's not worry about this right now. -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 3 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 08 Dec 2021 20:43:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 3: Code-Review+2 (2 comments) Went through the code again, lgtm gave +2 - please write once again if you want the code to be merged as it is or add updates, and then I will start the tests http://gerrit.cloudera.org:8080/#/c/17869/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17869/3//COMMIT_MSG@9 PS3, Line 9: - call destructors of sketch and union objects >It did not seem to me that these comments invite a discussion or something >might be expected from me. The only actionable suggestion was adding tests with group by to have much more sketches, but it is not critical to do it in this change. > I believe there is a ticket open for the dependency issue. The one I know about is IMPALA-10868 http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc@1689 PS3, Line 1689: agg_state_ptr->second = new (ctx->Allocate()) : datasketches::hll_sketch(DS_SKETCH_CONFIG, DS_HLL_TYPE); > As I understand, there is always the update phase of the processing before I agree that it is a great improvement as it is. If there is no group by, we do aggregation in two phases, one using update, then we serialize it, and send it to a final node that will merge the aggregated results from different nodes. -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 3 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Wed, 08 Dec 2021 20:08:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Alexander Saydakov has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/17869/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17869/3//COMMIT_MSG@9 PS3, Line 9: - call destructors of sketch and union objects > Thanks a lot for fixing this! Sorry for the late reply. It did not seem to me that these comments invite a discussion or something might be expected from me. I am replying just in case I misunderstood, and I hope to move this forward. Yes, ideally we should have an allocator, and I even started writing one, but it turned out that the version of Apache DataSketches currently used in Impala is not ready for this yet. There are some changes in the library to support this, but not released yet. It should happen soon. In the meantime this proposed change should make things better than before. I believe that the leaks should not happen in normal circumstances anymore (only in case of thrown exceptions). The next step after adopting this change would be to fix the dependency on Apache Datasketches, so we could easily upgrade to the latest version, and then work on the allocator to take this to the next level. I believe there is a ticket open for the dependency issue. http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc@1689 PS3, Line 1689: agg_state_ptr->second = new (ctx->Allocate()) : datasketches::hll_sketch(DS_SKETCH_CONFIG, DS_HLL_TYPE); > DsHllInit always initializes a hll_sketch while we'll only modify it if DsH As I understand, there is always the update phase of the processing before moving on to the merge phase. Even if I am wrong, and it is not so, the overhead of this unnecessary constructor is not that high, and it was there before, so no regression here. On the contrary, some unnecessary complexity was eliminated in this change. Even if not perfect, this should be an improvement. -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 3 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Sat, 04 Dec 2021 05:28:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 ) Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. Patch Set 3: (2 comments) lgtm in general, I will to go through the code again to check all possible leaks http://gerrit.cloudera.org:8080/#/c/17869/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17869/3//COMMIT_MSG@9 PS3, Line 9: - call destructors of sketch and union objects Thanks a lot for fixing this! A few notes here: - We should have some utilities in Impala that help with memory management of complex types in aggregate states. I am thinking about creating an std::allocator like class that would use the same allocation as FunctionContext through some thread local state. This seems a valid thing to do as it is guaranteed that the aggregate state will be used only within a single thread. datasketch classes already can get an allocator template, so this would allow us to allocate all memory the "proper" way. - We don't test this area too well - there are no GROUP BYs in https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test and without GROUP BY, we'll only create a single sketch per fragment instance during aggregation, so we won't leak too much memory. Adding a test with a GROUP BY with e.g. 1000 groups would increase the chance of catching such issues. http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc@1689 PS3, Line 1689: agg_state_ptr->second = new (ctx->Allocate()) : datasketches::hll_sketch(DS_SKETCH_CONFIG, DS_HLL_TYPE); DsHllInit always initializes a hll_sketch while we'll only modify it if DsHllUpdate is called later. My idea would be to skip creating a hll_sketch in DsHllInit (e.g. by creating an uninitialized state) and then initialize a hll_sketch/hll_union during Update/Merge. -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 3 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 29 Oct 2021 09:57:23 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-10956: datasketches UDFs: memory leak and merge overhead
Hello Fucun Chu, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17869 to look at the new patch set (#3). Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead .. IMPALA-10956: datasketches UDFs: memory leak and merge overhead - call destructors of sketch and union objects - avoid overhead of constructing union and getting result from it every time Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f --- M be/src/exprs/aggregate-functions-ir.cc 1 file changed, 273 insertions(+), 195 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/17869/3 -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 3 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-10956 datasketches UDFs: memory leak and merge overhead
Hello Fucun Chu, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/17869 to look at the new patch set (#2). Change subject: IMPALA-10956 datasketches UDFs: memory leak and merge overhead .. IMPALA-10956 datasketches UDFs: memory leak and merge overhead - call destructors of sketch and union objects - avoid overhead of constructing union and getting result from it every time Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f --- M be/src/exprs/aggregate-functions-ir.cc 1 file changed, 273 insertions(+), 195 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/69/17869/2 -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 2 Gerrit-Owner: Alexander Saydakov Gerrit-Reviewer: Fucun Chu Gerrit-Reviewer: Impala Public Jenkins