Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
......................................................................


Patch Set 6:

(3 comments)

I'm still not convinced the new paths are necessary, e.g. the case I 
mentioned previously where you need to merge and the inputs are differently 
sized.

http://gerrit.cloudera.org:8080/#/c/6025/6/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS6, Line 918: const static int INIT_CAPACITY = 16;
             : const static int MAX_NUM_SAMPLES = NUM_BUCKETS * NUM_SAMPLES_PER_BUCKET;
Please add a comment. It's not clear why all of these are grouped together 
anymore. The first two are only relevant to histograms. These two are about 
capacity for anything using ResSampling. MAX_NUM_SAMPLES should probably also 
be a capacity now, e.g. MAX_CAPACITY.
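
A rough sketch of how the regrouped, commented constants might look. This is not the patch's actual code: the values of NUM_BUCKETS and NUM_SAMPLES_PER_BUCKET are placeholders (they aren't quoted in this review), the described meaning of INIT_CAPACITY is a guess, and MAX_CAPACITY is just the rename suggested above.

```cpp
// Histogram-only parameters. The values are placeholders for illustration.
constexpr int NUM_BUCKETS = 100;
constexpr int NUM_SAMPLES_PER_BUCKET = 200;

// Capacity limits shared by anything that uses reservoir sampling.
// INIT_CAPACITY: presumably the number of sample slots allocated up front.
constexpr int INIT_CAPACITY = 16;
// MAX_CAPACITY: suggested rename of MAX_NUM_SAMPLES, the cap on stored samples.
constexpr int MAX_CAPACITY = NUM_BUCKETS * NUM_SAMPLES_PER_BUCKET;

static_assert(INIT_CAPACITY <= MAX_CAPACITY,
              "initial capacity must not exceed the maximum capacity");
```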


PS6, Line 958: ReservoirSampleState
After thinking more about the casing comments, I've concluded that trying to 
make this look like a std::vector isn't the best interface in the first 
place.

I'd prefer method names that don't mimic std::vector, e.g. 

GetSample(idx)
AddSample(s)
GetStateSize()
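
To make the shape concrete, here is a minimal sketch of such an interface. It is illustrative only: the class name ReservoirSampleStateSketch is hypothetical, it wraps a std::vector purely for brevity (the real state manages its own buffer), the reservoir replacement logic is omitted, and the meaning given to GetStateSize() (bytes occupied by the stored samples) is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ReservoirSampleState<T>, showing only the naming.
template <typename T>
class ReservoirSampleStateSketch {
 public:
  // Appends one sample. Capacity checks and replacement are left out.
  void AddSample(const T& s) { samples_.push_back(s); }

  // Returns the sample at position idx; idx must be in range.
  const T& GetSample(std::size_t idx) const {
    assert(idx < samples_.size());
    return samples_[idx];
  }

  // Assumed meaning: number of bytes occupied by the stored samples.
  std::size_t GetStateSize() const { return samples_.size() * sizeof(T); }

 private:
  std::vector<T> samples_;  // Backing store used only for this sketch.
};
```

Callers would then write state.AddSample(x) and state.GetSample(i) rather than push_back(x) and operator[], which avoids implying full std::vector semantics.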


PS6, Line 1285:   nth_element(src->begin(), mid_point, src->end(), SampleValLess<T>);
nice
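
For context, a small self-contained sketch of the technique: std::nth_element places the element that would be at mid_point in sorted order, without fully sorting the range. The names ValLess and MedianSketch are hypothetical stand-ins, not the patch's SampleValLess<T> or its surrounding function.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical comparator in the spirit of SampleValLess<T>.
template <typename T>
bool ValLess(const T& a, const T& b) {
  return a < b;
}

// Returns the element that would sit at index size() / 2 if samples were sorted.
// Takes the vector by value because std::nth_element reorders its input.
template <typename T>
T MedianSketch(std::vector<T> samples) {
  assert(!samples.empty());
  auto mid_point = samples.begin() + samples.size() / 2;
  std::nth_element(samples.begin(), mid_point, samples.end(), ValLess<T>);
  return *mid_point;
}
```

std::nth_element runs in linear time on average, versus O(n log n) for a full sort.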


-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple-imp...@apache.org>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-HasComments: Yes
