Alexander Saydakov has posted comments on this change. ( http://gerrit.cloudera.org:8080/17869 )
Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead ...................................................................... Patch Set 3: (2 comments) http://gerrit.cloudera.org:8080/#/c/17869/3//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/17869/3//COMMIT_MSG@9 PS3, Line 9: - call destructors of sketch and union objects > Thanks a lot for fixing this! Sorry for the late reply. It did not seem to me that these comments invite a discussion or something might be expected from me. I am replying just in case I misunderstood, and I hope to move this forward. Yes, ideally we should have an allocator, and I even started writing one, but it turned out that the version of Apache DataSketches currently used in Impala is not ready for this yet. There are some changes in the library to support this, but not released yet. It should happen soon. In the meantime this proposed change should make things better than before. I believe that the leaks should not happen in normal circumstances anymore (only in case of thrown exceptions). The next step after adopting this change would be to fix the dependency on Apache Datasketches, so we could easily upgrade to the latest version, and then work on the allocator to take this to the next level. I believe there is a ticket open for the dependency issue. http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc File be/src/exprs/aggregate-functions-ir.cc: http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc@1689 PS3, Line 1689: agg_state_ptr->second = new (ctx->Allocate<datasketches::hll_sketch>()) : datasketches::hll_sketch(DS_SKETCH_CONFIG, DS_HLL_TYPE); > DsHllInit always initializes a hll_sketch while we'll only modify it if DsH As I understand, there is always the update phase of the processing before moving on to the merge phase. Even if I am wrong, and it is not so, the overhead of this unnecessary constructor is not that high, and it was there before, so no regression here. On the contrary, some unnecessary complexity was eliminated in this change. Even if not perfect, this should be an improvement. -- To view, visit http://gerrit.cloudera.org:8080/17869 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f Gerrit-Change-Number: 17869 Gerrit-PatchSet: 3 Gerrit-Owner: Alexander Saydakov <al...@apache.org> Gerrit-Reviewer: Alexander Saydakov <al...@apache.org> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Fucun Chu <chufu...@hotmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Comment-Date: Sat, 04 Dec 2021 05:28:01 +0000 Gerrit-HasComments: Yes