Jackie-Jiang commented on code in PR #8927:
URL: https://github.com/apache/pinot/pull/8927#discussion_r922568228
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/docvalsets/ProjectionBlockValSet.java:
##########
@@ -52,6 +58,30 @@ public ProjectionBlockValSet(DataBlockCache dataBlockCache,
String column, DataS
_dataSource = dataSource;
}
+ @Nullable
+ @Override
+ public RoaringBitmap getNullBitmap() {
+ if (!_nullBitmapSet) {
+ NullValueVectorReader nullValueReader = _dataSource.getNullValueVector();
+ ImmutableRoaringBitmap nullBitmap = nullValueReader != null ?
nullValueReader.getNullBitmap() : null;
+ if (nullBitmap != null && nullBitmap.getCardinality() > 0) {
Review Comment:
For better performance, same for other places where cardinality is
calculated just to identify if the bitmap is empty
```suggestion
if (nullBitmap != null && !nullBitmap.isEmpty()) {
```
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -390,28 +443,55 @@ private DataTable getAggregationResultDataTable()
columnNames[i] = aggregationFunction.getColumnName();
columnDataTypes[i] =
aggregationFunction.getIntermediateResultColumnType();
}
+ RoaringBitmap[] nullBitmaps = null;
Review Comment:
Let's revert the changes in this method for now. We may re-introduce them in
the PR to support null aggregation result
##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java:
##########
@@ -378,22 +486,30 @@ private int getKeyForValue(ByteArray value) {
private static class IntGroupKeyIterator implements Iterator<GroupKey> {
final Iterator<Int2IntMap.Entry> _iterator;
final GroupKey _groupKey;
+ Integer _groupKeyForNullValue;
- IntGroupKeyIterator(Int2IntOpenHashMap intMap) {
+ IntGroupKeyIterator(Int2IntOpenHashMap intMap, Integer
groupKeyForNullValue) {
_iterator = intMap.int2IntEntrySet().fastIterator();
_groupKey = new GroupKey();
+ _groupKeyForNullValue = groupKeyForNullValue;
}
@Override
public boolean hasNext() {
- return _iterator.hasNext();
+ return _iterator.hasNext() || _groupKeyForNullValue != null;
}
@Override
public GroupKey next() {
- Int2IntMap.Entry entry = _iterator.next();
- _groupKey._groupId = entry.getIntValue();
- _groupKey._keys = new Object[]{entry.getIntKey()};
+ if (_iterator.hasNext()) {
+ Int2IntMap.Entry entry = _iterator.next();
+ _groupKey._groupId = entry.getIntValue();
+ _groupKey._keys = new Object[]{entry.getIntKey()};
+ } else if (_groupKeyForNullValue != null) {
+ _groupKey._groupId = _groupKeyForNullValue;
+ _groupKey._keys = new Object[]{null};
+ _groupKeyForNullValue = null;
+ }
Review Comment:
Consider returning `null` group first to reduce the overhead, also add a
TODO to revisit this because this one adds overhead to regular case without
null handling
```suggestion
if (_groupKeyForNullValue != null) {
_groupKey._groupId = _groupKeyForNullValue;
_groupKey._keys = new Object[]{null};
_groupKeyForNullValue = null;
return _groupKey;
}
Int2IntMap.Entry entry = _iterator.next();
_groupKey._groupId = entry.getIntValue();
_groupKey._keys = new Object[]{entry.getIntKey()};
```
##########
pinot-core/src/main/java/org/apache/pinot/core/util/QueryOptionsUtils.java:
##########
@@ -87,4 +88,12 @@ public static Integer getMinServerGroupTrimSize(Map<String,
String> queryOptions
String minServerGroupTrimSizeString =
queryOptions.get(Request.QueryOptionKey.MIN_SERVER_GROUP_TRIM_SIZE);
return minServerGroupTrimSizeString != null ?
Integer.parseInt(minServerGroupTrimSizeString) : null;
}
+
+ public static boolean isNullHandlingEnabled(Map<String, String>
queryOptions) {
+ boolean nullHandlingEnabled =
Boolean.parseBoolean(queryOptions.get(Request.QueryOptionKey.ENABLE_NULL_HANDLING));
+ if (nullHandlingEnabled) {
+ Preconditions.checkState(DataTableFactory.getDataTableVersion() >=
DataTableFactory.VERSION_4);
Review Comment:
Add some exception message such as: `Null handling cannot be enabled for
data table version smaller than 4`
##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpecUtils.java:
##########
@@ -0,0 +1,45 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.spi.data;
+
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.spi.data.FieldSpec.DataType;
+
+
+public class FieldSpecUtils {
Review Comment:
Let's call it `NullValueUtils` and move it under `org.apache.pinot.spi.utils`
##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java:
##########
@@ -72,6 +80,13 @@ public int getGlobalGroupKeyUpperBound() {
@Override
public void generateKeysForBlock(TransformBlock transformBlock, int[]
groupKeys) {
BlockValSet blockValSet =
transformBlock.getBlockValueSet(_groupByExpression);
+ if (_nullHandlingEnabled) {
+ RoaringBitmap nullBitmap = blockValSet.getNullBitmap();
+ if (nullBitmap != null) {
Review Comment:
```suggestion
if (nullBitmap != null && !nullBitmap.isEmpty()) {
```
##########
pinot-core/src/main/java/org/apache/pinot/core/query/selection/SelectionOperatorService.java:
##########
@@ -147,11 +177,26 @@ public PriorityQueue<Object[]> getRows() {
* Reduces a collection of {@link DataTable}s to selection rows for
selection queries with <code>ORDER BY</code>.
* (Broker side)
*/
- public void reduceWithOrdering(Collection<DataTable> dataTables) {
+ public void reduceWithOrdering(Collection<DataTable> dataTables, boolean
nullHandlingEnabled) {
+ RoaringBitmap[] nullBitmaps = null;
for (DataTable dataTable : dataTables) {
+ if (nullHandlingEnabled) {
+ if (nullBitmaps == null) {
+ nullBitmaps = new RoaringBitmap[dataTable.getDataSchema().size()];
+ }
+ for (int colId = 0; colId < nullBitmaps.length; colId++) {
+ nullBitmaps[colId] = dataTable.getNullRowIds(colId);
+ }
+ }
int numRows = dataTable.getNumberOfRows();
for (int rowId = 0; rowId < numRows; rowId++) {
Object[] row =
SelectionOperatorUtils.extractRowFromDataTable(dataTable, rowId);
+ int len = nullBitmaps == null ? 0 : nullBitmaps.length;
+ for (int colId = 0; colId < len; colId++) {
+ if (nullBitmaps[colId] != null &&
nullBitmaps[colId].contains(rowId)) {
+ row[colId] = null;
+ }
+ }
Review Comment:
```suggestion
if (nullHandlingEnabled) {
for (int colId = 0; colId < nullBitmaps.length; colId++) {
if (nullBitmaps[colId] != null &&
nullBitmaps[colId].contains(rowId)) {
row[colId] = null;
}
}
}
```
##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/NoDictionarySingleColumnGroupKeyGenerator.java:
##########
@@ -122,6 +137,88 @@ public void generateKeysForBlock(TransformBlock
transformBlock, int[] groupKeys)
}
}
+ public void generateKeysForBlockNullHandlingEnabled(TransformBlock
transformBlock, int[] groupKeys,
+ RoaringBitmap nullBitmap) {
+ assert nullBitmap != null;
+ BlockValSet blockValSet =
transformBlock.getBlockValueSet(_groupByExpression);
+ int numDocs = transformBlock.getNumDocs();
+
+ switch (_storedType) {
+ case INT:
+ int[] intValues = blockValSet.getIntValuesSV();
+ if (nullBitmap.getCardinality() < numDocs) {
+ for (int i = 0; i < numDocs; i++) {
+ groupKeys[i] = nullBitmap.contains(i) ? getKeyForNullValue() :
getKeyForValue(intValues[i]);
+ }
+ } else if (numDocs > 0) {
+ Arrays.fill(groupKeys, getKeyForNullValue());
Review Comment:
Same for other places
```suggestion
Arrays.fill(groupKeys, getKeyForNullValue(), 0, numDocs);
```
##########
pinot-core/src/main/java/org/apache/pinot/core/query/aggregation/groupby/DefaultGroupByExecutor.java:
##########
@@ -81,10 +83,13 @@ public DefaultGroupByExecutor(QueryContext queryContext,
ExpressionContext[] gro
// Initialize group key generator
int numGroupsLimit = queryContext.getNumGroupsLimit();
int maxInitialResultHolderCapacity =
queryContext.getMaxInitialResultHolderCapacity();
- if (hasNoDictionaryGroupByExpression) {
+ if (hasNoDictionaryGroupByExpression || _nullHandlingEnabled) {
if (groupByExpressions.length == 1) {
+ // TODO(nhejazi): support MV and dictionary based. No dictionary
performance is much worse than dictionary
Review Comment:
(minor) Update the TODO a little bit to be more clear: `support MV and
dictionary based when null handling is enabled.`
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/blocks/IntermediateResultsBlock.java:
##########
@@ -80,45 +83,64 @@ public IntermediateResultsBlock() {
/**
* Constructor for selection result.
*/
- public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]>
selectionResult) {
+ public IntermediateResultsBlock(DataSchema dataSchema, Collection<Object[]>
selectionResult,
+ boolean nullHandlingEnabled) {
_dataSchema = dataSchema;
_selectionResult = selectionResult;
+ _nullHandlingEnabled = nullHandlingEnabled;
}
/**
* Constructor for aggregation result.
* <p>For aggregation only, the result is a list of values.
* <p>For aggregation group-by, the result is a list of maps from group keys
to aggregation values.
*/
- public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions,
List<Object> aggregationResult) {
+ public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions,
List<Object> aggregationResult,
+ boolean nullHandlingEnabled) {
_aggregationFunctions = aggregationFunctions;
_aggregationResult = aggregationResult;
+ _nullHandlingEnabled = nullHandlingEnabled;
+ }
+
+ /**
+ * Constructor for aggregation result.
+ * <p>For aggregation only, the result is a list of values.
+ * <p>For aggregation group-by, the result is a list of maps from group keys
to aggregation values.
+ */
+ public IntermediateResultsBlock(AggregationFunction[] aggregationFunctions,
List<Object> aggregationResult,
Review Comment:
Do we still need this new constructor? Not sure if we want to set the
distinct table schema as the results block schema
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]