[GitHub] gianm commented on a change in pull request #6062: Fix a bug in GroupByQueryEngine

GitBox Mon, 30 Jul 2018 08:29:54 -0700

gianm commented on a change in pull request #6062: Fix a bug in 
GroupByQueryEngine
URL: https://github.com/apache/incubator-druid/pull/6062#discussion_r206202261


 ##########
 File path: 
processing/src/main/java/io/druid/query/groupby/GroupByQueryEngine.java
 ##########
 @@ -178,37 +178,33 @@ public int getNumRows()
       return positions;
     }
 
-    private List<ByteBuffer> updateValues(
-        ByteBuffer key,
-        List<DimensionSelector> dims
-    )
+    @Nullable
+    private List<ByteBuffer> updateValues(ByteBuffer key, 
List<DimensionSelector> dims)
     {
       if (dims.size() > 0) {
-        List<ByteBuffer> retVal = null;
-        List<ByteBuffer> unaggregatedBuffers = null;
-
         final DimensionSelector dimSelector = dims.get(0);
         final IndexedInts row = dimSelector.getRow();
         final int rowSize = row.size();
         if (rowSize == 0) {
           ByteBuffer newKey = key.duplicate();
           newKey.putInt(MISSING_VALUE);
-          unaggregatedBuffers = updateValues(newKey, dims.subList(1, 
dims.size()));
+          return updateValues(newKey, dims.subList(1, dims.size()));
         } else {
+          List<ByteBuffer> retVal = null;
           for (int i = 0; i < rowSize; i++) {
             ByteBuffer newKey = key.duplicate();
             int dimValue = row.get(i);
             newKey.putInt(dimValue);
-            unaggregatedBuffers = updateValues(newKey, dims.subList(1, 
dims.size()));
-          }
-        }
-        if (unaggregatedBuffers != null) {
-          if (retVal == null) {
-            retVal = Lists.newArrayList();
+            List<ByteBuffer> unaggregatedBuffers = updateValues(newKey, 
dims.subList(1, dims.size()));
 
 Review comment:
   > @jihoonson or @jon-wei is that correct? If yes, do you plan to remove 
groupBy v1?
   
   It is correct, groupBy v1 is legacy and should be removed in the future. 
There is one case I am aware of where it is 'better' than v2: if you have some 
growable aggregators (like some sketches), v1 will probably use less memory 
than v2. This is because v2 uses off-heap aggregators which allocate all the 
memory they need up-front, but v1 uses on-heap aggregators that can grow 
gradually. I don't have the issue number handy, but someone raised this in the 
past as a reason that they switched back from v2 to v1 (they were doing groupBy 
on a high-cardinality dimension with thetaSketch metrics).
   
   I think if off-heap aggregators could learn how to start small and then 
grow, then v1 will be really outpaced by v2 in all known cases and could be 
removed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org

[GitHub] gianm commented on a change in pull request #6062: Fix a bug in GroupByQueryEngine

Reply via email to