[GitHub] [ignite-3] AMashenkov commented on a diff in pull request #2312: [IGNITE-19785] Sql. Rework execution of 2-phase aggregates

via GitHub Thu, 27 Jul 2023 03:08:57 -0700


AMashenkov commented on code in PR #2312:
URL: https://github.com/apache/ignite-3/pull/2312#discussion_r1276059987



##########
modules/sql-engine/src/main/java/org/apache/ignite/internal/sql/engine/util/PlanUtils.java:
##########
@@ -46,4 +62,124 @@ public static boolean 
complexDistinctAgg(List<AggregateCall> aggCalls) {
         }
         return false;
     }
+
+    /**
+     * Creates a row type for REDUCE phase of a two phase aggregate used by 
hash-based map/reduce implementation.
+     *
+     * <p>Given the grouping keys f0, f1, f2 and aggregates (agg1 and agg2) in 
produces the following type:
+     *
+     * <pre>
+     *     field_0 -> f0_type
+     *     field_1 -> f1_type
+     *     field_2 -> f2_type
+     *     field_3 -> agg1_result_type
+     *     field_4 -> agg2_result_type
+     * </pre>
+     */
+    public static RelDataType createSortAggRowType(ImmutableBitSet grpKeys,
+            IgniteTypeFactory typeFactory, RelDataType inputType, 
List<AggregateCall> aggregateCalls) {
+
+        RelDataTypeFactory.Builder builder = typeFactory.builder();
+
+        for (int fieldIdx : grpKeys) {
+            RelDataTypeField fld = inputType.getFieldList().get(fieldIdx);
+            builder.add(fld);
+        }
+
+        addAccumulatorFields(typeFactory, aggregateCalls, builder);
+
+        return builder.build();
+    }
+
+    /**
+     * Creates a row type returned from a MAP phase of a two phase aggregate 
used by hash-based map/reduce implementation.
+     *
+     * <p>Given the input row [f0, f1, f2, f3, f4], grouping sets [f1], [f2, 
f0], [f4], and 2 aggregates (agg1 and agg2)
+     * it produces the following type:
+     *
+     * <pre>
+     *     field_0 -> f0_type
+     *     field_1 -> f1_type
+     *     field_2 -> f2_type
+     *     field_3 -> f4_type
+     *     field_4 -> agg1_result_type
+     *     field_5 -> agg2_result_type
+     *     _group_id -> byte # each grouping set has id assigned to id and 
that id is equal to its position in GROUP BY GROUPING SET clause.
+     * </pre>
+     */
+    public static RelDataType createHashAggRowType(List<ImmutableBitSet> 
groupSets,
+            IgniteTypeFactory typeFactory, RelDataType inputType, 
List<AggregateCall> aggregateCalls) {
+
+        Mapping mapping = computeAggFieldMapping(groupSets, 
AggregateType.REDUCE);
+
+        RelDataTypeFactory.Builder builder = typeFactory.builder();
+
+        for (int i = 0; i < mapping.getTargetCount(); i++) {
+            int source = mapping.getSource(i);
+            RelDataTypeField fld = inputType.getFieldList().get(source);
+            builder.add(fld);
+        }
+
+        addAccumulatorFields(typeFactory, aggregateCalls, builder);
+
+        builder.add("_GROUP_ID", SqlTypeName.TINYINT);
+
+        return builder.build();
+    }
+
+    /**
+     * Field mapping for hash execution nodes. See {@link 
#createHashAggRowType(List, IgniteTypeFactory, RelDataType, List)}.
+     *
+     * <p>For REDUCE phase it produces the following mapping:
+     *
+     * <pre>
+     * group sets:
+     *   [0], [2, 0], [3]
+     * mapping:
+     *   0 -> 0
+     *   2 -> 1
+     *   3 -> 2
+     * </pre>
+     *
+     * <p>Generates identity mapping for other phases.
+     */
+    public static Mapping computeAggFieldMapping(List<ImmutableBitSet> 
groupingSets, AggregateType aggregateType) {
+        BitSet fieldIndices = new BitSet();
+
+        for (ImmutableBitSet groupingSet : groupingSets) {
+            for (int field : groupingSet) {
+                fieldIndices.set(field);
+            }
+        }
+
+        if (aggregateType == AggregateType.REDUCE) {
+            Mapping mapping = Mappings.create(MappingType.INVERSE_SURJECTION, 
fieldIndices.length(), fieldIndices.cardinality());
+            int[] position = new int[1];

Review Comment:
   Does it do the same as `Common.inverseTrimmingMapping`?
   Maybe we can reuse method from common. 
   
   Agree, you implementation is better as it avoids boxing. But it's better to 
use utility method here, and think about avoiding boxing/unboxing in Commons 
later.



##########
modules/sql-engine/src/main/java/org/apache/ignite/internal/sql/engine/util/PlanUtils.java:
##########
@@ -46,4 +62,124 @@ public static boolean 
complexDistinctAgg(List<AggregateCall> aggCalls) {
         }
         return false;
     }
+
+    /**
+     * Creates a row type for REDUCE phase of a two phase aggregate used by 
hash-based map/reduce implementation.
+     *
+     * <p>Given the grouping keys f0, f1, f2 and aggregates (agg1 and agg2) in 
produces the following type:
+     *
+     * <pre>
+     *     field_0 -> f0_type
+     *     field_1 -> f1_type
+     *     field_2 -> f2_type
+     *     field_3 -> agg1_result_type
+     *     field_4 -> agg2_result_type
+     * </pre>
+     */
+    public static RelDataType createSortAggRowType(ImmutableBitSet grpKeys,
+            IgniteTypeFactory typeFactory, RelDataType inputType, 
List<AggregateCall> aggregateCalls) {
+
+        RelDataTypeFactory.Builder builder = typeFactory.builder();
+
+        for (int fieldIdx : grpKeys) {
+            RelDataTypeField fld = inputType.getFieldList().get(fieldIdx);
+            builder.add(fld);
+        }
+
+        addAccumulatorFields(typeFactory, aggregateCalls, builder);
+
+        return builder.build();
+    }
+
+    /**
+     * Creates a row type returned from a MAP phase of a two phase aggregate 
used by hash-based map/reduce implementation.
+     *
+     * <p>Given the input row [f0, f1, f2, f3, f4], grouping sets [f1], [f2, 
f0], [f4], and 2 aggregates (agg1 and agg2)
+     * it produces the following type:
+     *
+     * <pre>
+     *     field_0 -> f0_type
+     *     field_1 -> f1_type
+     *     field_2 -> f2_type
+     *     field_3 -> f4_type
+     *     field_4 -> agg1_result_type
+     *     field_5 -> agg2_result_type
+     *     _group_id -> byte # each grouping set has id assigned to id and 
that id is equal to its position in GROUP BY GROUPING SET clause.
+     * </pre>
+     */
+    public static RelDataType createHashAggRowType(List<ImmutableBitSet> 
groupSets,
+            IgniteTypeFactory typeFactory, RelDataType inputType, 
List<AggregateCall> aggregateCalls) {
+
+        Mapping mapping = computeAggFieldMapping(groupSets, 
AggregateType.REDUCE);
+
+        RelDataTypeFactory.Builder builder = typeFactory.builder();
+
+        for (int i = 0; i < mapping.getTargetCount(); i++) {
+            int source = mapping.getSource(i);
+            RelDataTypeField fld = inputType.getFieldList().get(source);
+            builder.add(fld);
+        }
+
+        addAccumulatorFields(typeFactory, aggregateCalls, builder);
+
+        builder.add("_GROUP_ID", SqlTypeName.TINYINT);
+
+        return builder.build();
+    }
+
+    /**
+     * Field mapping for hash execution nodes. See {@link 
#createHashAggRowType(List, IgniteTypeFactory, RelDataType, List)}.
+     *
+     * <p>For REDUCE phase it produces the following mapping:
+     *
+     * <pre>
+     * group sets:
+     *   [0], [2, 0], [3]
+     * mapping:
+     *   0 -> 0
+     *   2 -> 1
+     *   3 -> 2
+     * </pre>
+     *
+     * <p>Generates identity mapping for other phases.
+     */
+    public static Mapping computeAggFieldMapping(List<ImmutableBitSet> 
groupingSets, AggregateType aggregateType) {
+        BitSet fieldIndices = new BitSet();
+
+        for (ImmutableBitSet groupingSet : groupingSets) {
+            for (int field : groupingSet) {
+                fieldIndices.set(field);
+            }
+        }
+
+        if (aggregateType == AggregateType.REDUCE) {
+            Mapping mapping = Mappings.create(MappingType.INVERSE_SURJECTION, 
fieldIndices.length(), fieldIndices.cardinality());
+            int[] position = new int[1];
+
+            fieldIndices.stream().forEach(b -> {
+                int i = position[0];
+                mapping.set(b, i);
+                position[0] = i + 1;
+            });

Review Comment:
   Does it do the same as `Common.inverseTrimmingMapping`?
   Maybe we can reuse method from common. 
   
   Agree, you implementation is better as it avoids boxing. But it's better to 
use utility method here, and think about avoiding boxing/unboxing in Commons 
later.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: notifications-unsubscr...@ignite.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

[GitHub] [ignite-3] AMashenkov commented on a diff in pull request #2312: [IGNITE-19785] Sql. Rework execution of 2-phase aggregates

Reply via email to