Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17869 )

Change subject: IMPALA-10956: datasketches UDFs: memory leak and merge overhead
......................................................................


Patch Set 3:

(2 comments)

lgtm in general, I will go through the code again to check all possible leaks

http://gerrit.cloudera.org:8080/#/c/17869/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17869/3//COMMIT_MSG@9
PS3, Line 9: - call destructors of sketch and union objects
Thanks a lot for fixing this!

A few notes here:

- We should have some utilities in Impala that help with memory management of 
complex types in aggregate states. I am thinking about creating an 
std::allocator-like class that would use the same allocation as FunctionContext 
through some thread-local state. This seems like a valid thing to do, as it is 
guaranteed that the aggregate state will be used only within a single thread. 
datasketches classes can already take an allocator template parameter, so this 
would allow us to allocate all memory the "proper" way.

- We don't test this area too well - there are no GROUP BYs in 
https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/datasketches-hll.test

and without GROUP BY, we'll only create a single sketch per fragment instance 
during aggregation, so we won't leak too much memory. Adding a test with a 
GROUP BY with e.g. 1000 groups would increase the chance of catching such 
issues.
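
The allocator idea from the first note could look roughly like the sketch 
below. Everything in it is an assumption made for illustration: TrackingPool 
is a hypothetical stand-in for the accounting done by FunctionContext, 
tls_pool and PoolAllocator are invented names, and std::vector stands in for 
a sketch class that accepts an allocator template parameter. It is not code 
from the patch or from Impala.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical stand-in for the memory accounting done by FunctionContext.
struct TrackingPool {
  std::size_t bytes_in_use = 0;
  void* Allocate(std::size_t n) {
    void* p = std::malloc(n);
    if (p == nullptr) throw std::bad_alloc();
    bytes_in_use += n;
    return p;
  }
  void Free(void* p, std::size_t n) {
    bytes_in_use -= n;
    std::free(p);
  }
};

// Thread-local pointer to the pool of the aggregate state being processed.
// This relies on the state only ever being used within a single thread.
thread_local TrackingPool* tls_pool = nullptr;

// Minimal std::allocator-like class that routes all requests to tls_pool.
template <typename T>
struct PoolAllocator {
  using value_type = T;
  PoolAllocator() noexcept = default;
  template <typename U>
  PoolAllocator(const PoolAllocator<U>&) noexcept {}
  T* allocate(std::size_t n) {
    assert(tls_pool != nullptr);
    return static_cast<T*>(tls_pool->Allocate(n * sizeof(T)));
  }
  void deallocate(T* p, std::size_t n) noexcept {
    assert(tls_pool != nullptr);
    tls_pool->Free(p, n * sizeof(T));
  }
};
template <typename T, typename U>
bool operator==(const PoolAllocator<T>&, const PoolAllocator<U>&) { return true; }
template <typename T, typename U>
bool operator!=(const PoolAllocator<T>&, const PoolAllocator<U>&) { return false; }

// Returns the bytes tracked while the container is alive. The container is
// destroyed before the pool is unset, so the pool ends up back at zero.
std::size_t BytesWhileLive(TrackingPool* pool) {
  tls_pool = pool;
  std::size_t live = 0;
  {
    std::vector<int, PoolAllocator<int>> v;  // stand-in for a sketch object
    v.resize(100);
    live = pool->bytes_in_use;
  }
  tls_pool = nullptr;
  return live;
}
```

With such an allocator, any memory a sketch allocates internally would show 
up in the pool's accounting, so a missed destructor call becomes visible as 
a non-zero balance instead of a silent leak.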


http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

http://gerrit.cloudera.org:8080/#/c/17869/3/be/src/exprs/aggregate-functions-ir.cc@1689
PS3, Line 1689:   agg_state_ptr->second = new (ctx->Allocate<datasketches::hll_sketch>())
              :       datasketches::hll_sketch(DS_SKETCH_CONFIG, DS_HLL_TYPE);
DsHllInit always initializes a hll_sketch while we'll only modify it if 
DsHllUpdate is called later.

My idea would be to skip creating a hll_sketch in DsHllInit (e.g. by creating 
an uninitialized state) and then initialize a hll_sketch/hll_union during 
Update/Merge.
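
A minimal sketch of that lazy-initialization idea, under stated assumptions: 
FakeSketch is a hypothetical stand-in for datasketches::hll_sketch that only 
counts constructions, and AggState, Init, Update and Finalize are simplified 
illustrations, not the signatures used in aggregate-functions-ir.cc.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for datasketches::hll_sketch. It counts how many
// objects were constructed and destroyed so the effect can be observed.
struct FakeSketch {
  static int constructed;
  static int destroyed;
  std::uint64_t updates = 0;
  FakeSketch() { ++constructed; }
  ~FakeSketch() { ++destroyed; }
  void update(std::int64_t /*value*/) { ++updates; }
};
int FakeSketch::constructed = 0;
int FakeSketch::destroyed = 0;

// Simplified aggregate state: a null pointer means "no sketch created yet".
struct AggState {
  FakeSketch* sketch;
};

// Init only marks the state as empty; no sketch is created here.
void Init(AggState* state) {
  assert(state != nullptr);
  state->sketch = nullptr;
}

// Update creates the sketch on first use.
void Update(AggState* state, std::int64_t value) {
  assert(state != nullptr);
  if (state->sketch == nullptr) state->sketch = new FakeSketch();
  state->sketch->update(value);
}

// Finalize returns the number of updates seen (0 for an untouched state)
// and destroys the sketch if one was ever created.
std::uint64_t Finalize(AggState* state) {
  assert(state != nullptr);
  std::uint64_t n = (state->sketch == nullptr) ? 0 : state->sketch->updates;
  delete state->sketch;  // deleting a null pointer is a no-op
  state->sketch = nullptr;
  return n;
}
```

A state that never sees an Update then costs no sketch construction at all, 
and a Merge path could apply the same null check before creating a union 
object.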



--
To view, visit http://gerrit.cloudera.org:8080/17869
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8dd0e6736f4266f74f5f265f58d40a4e4707287f
Gerrit-Change-Number: 17869
Gerrit-PatchSet: 3
Gerrit-Owner: Alexander Saydakov <al...@apache.org>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Fucun Chu <chufu...@hotmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Comment-Date: Fri, 29 Oct 2021 09:57:23 +0000
Gerrit-HasComments: Yes
