Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-4787: Optimize APPX_MEDIAN() memory usage
......................................................................

Patch Set 6:

(3 comments)

I'm still not convinced the new paths are necessary, e.g. in the case I mentioned previously where you need to merge and the inputs are different sizes.

http://gerrit.cloudera.org:8080/#/c/6025/6/be/src/exprs/aggregate-functions-ir.cc
File be/src/exprs/aggregate-functions-ir.cc:

PS6, Line 918: const static int INIT_CAPACITY = 16;
             : const static int MAX_NUM_SAMPLES = NUM_BUCKETS * NUM_SAMPLES_PER_BUCKET;
Please add a comment. It's no longer clear why all of these are grouped together. The first two are only relevant to histograms. These two are about capacity for anything using ResSampling. MAX_NUM_SAMPLES should probably also be named as a capacity now, e.g. MAX_CAPACITY.

PS6, Line 958: ReservoirSampleState
After thinking more about the casing comments, I've concluded that making this look like a std::vector isn't the best interface. I'd prefer methods with non-std::vector-like names, e.g.
GetSample(idx)
AddSample(s)
GetStateSize()

PS6, Line 1285: nth_element(src->begin(), mid_point, src->end(), SampleValLess<T>);
nice

-- 
To view, visit http://gerrit.cloudera.org:8080/6025
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I99adaad574d4fb0a3cf38c6cbad8b2a23df12968
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbapple-imp...@apache.org>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Taras Bobrovytsky <tbobrovyt...@cloudera.com>
Gerrit-HasComments: Yes
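
For illustration only, the naming suggested in the PS6, Line 958 comment might look roughly like the sketch below. This is not the code in the patch: the class name ReservoirSampleStateSketch, the std::vector backing store, the MAX_CAPACITY value, and the reading of GetStateSize() as "bytes occupied by the stored samples" are all assumptions made for the example. It also leaves out the replacement step of reservoir sampling and only shows the interface shape.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the suggested interface; not the Impala implementation.
template <typename T>
class ReservoirSampleStateSketch {
 public:
  // Capacity limits. INIT_CAPACITY mirrors the constant quoted in the review;
  // the MAX_CAPACITY value is a placeholder chosen for this example.
  static constexpr std::size_t INIT_CAPACITY = 16;
  static constexpr std::size_t MAX_CAPACITY = 1024;

  ReservoirSampleStateSketch() { samples_.reserve(INIT_CAPACITY); }

  // Appends a sample. Returns false once MAX_CAPACITY samples are stored.
  // (A real reservoir sampler would replace a random sample here instead.)
  bool AddSample(const T& s) {
    if (samples_.size() >= MAX_CAPACITY) return false;
    samples_.push_back(s);
    return true;
  }

  // Returns the sample at position idx; idx must be less than the number
  // of samples added so far.
  const T& GetSample(std::size_t idx) const { return samples_[idx]; }

  // Assumed meaning: number of bytes occupied by the stored samples.
  std::size_t GetStateSize() const { return samples_.size() * sizeof(T); }

 private:
  std::vector<T> samples_;  // assumed backing store for the sketch
};
```

The point of the non-std::vector-like names is that callers can no longer mistake the state for a general-purpose container, so operations such as arbitrary insertion or resizing are simply not offered.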