[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8503: Support single-valued BigDecimal in schema, type conversion, SQL statements and minimum set of transforms.

GitBox Thu, 14 Apr 2022 12:11:34 -0700


Jackie-Jiang commented on code in PR #8503:
URL: https://github.com/apache/pinot/pull/8503#discussion_r850664743



##########
pinot-core/src/main/java/org/apache/pinot/core/common/evaluators/DefaultJsonPathEvaluator.java:
##########
@@ -38,8 +41,13 @@
 
 public final class DefaultJsonPathEvaluator implements JsonPathEvaluator {
 
-  private static final ParseContext JSON_PARSER_CONTEXT = JsonPath.using(
-      new Configuration.ConfigurationBuilder().jsonProvider(new 
JacksonJsonProvider())
+  private static final ObjectMapper OBJECT_MAPPER_WITH_EXACT_BIG_DECIMAL = 
(new ObjectMapper())

Review Comment:
   The change in this file will add overhead (caused of 
`USE_BIG_DECIMAL_FOR_FLOATS`) to the json parsing but not adding much values 
because we don't have a way to propagate the BigDecimal values to the executor 
via the bytes type currently. I'd suggest not introducing this change in this 
PR, but solve it in a separate thread along with the support of passing 
BigDecimal values to the executor.



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/BaseTransformFunction.java:
##########
@@ -278,6 +282,7 @@ public byte[][] transformToBytesValuesSV(ProjectionBlock 
projectionBlock) {
     Dictionary dictionary = getDictionary();
     if (dictionary != null) {
       int[] dictIds = transformToDictIdsSV(projectionBlock);
+      // todo: In the line below, _intValuesSV is populated but not returned! 
This is most likely a bug.

Review Comment:
   Good catch, this is a bug. Can you directly fix it by changing it to 
`dictionary.readBytesValues(dictIds, length, _byteValuesSV);`



##########
pinot-core/src/main/java/org/apache/pinot/core/data/table/TableResizer.java:
##########
@@ -78,6 +80,7 @@ public TableResizer(DataSchema dataSchema, QueryContext 
queryContext) {
     assert orderByExpressions != null;
     _numOrderByExpressions = orderByExpressions.size();
     _orderByValueExtractors = new 
OrderByValueExtractor[_numOrderByExpressions];
+    _orderByColumnDataTypes = new ColumnDataType[_numOrderByExpressions];

Review Comment:
   Seems not used?



##########
pinot-core/src/main/java/org/apache/pinot/core/query/reduce/GroupByDataTableReducer.java:
##########
@@ -337,7 +338,7 @@ private IndexedTable getIndexedTable(DataSchema dataSchema, 
Collection<DataTable
 
     Future[] futures = new Future[numReduceThreadsToUse];
     CountDownLatch countDownLatch = new CountDownLatch(numDataTables);
-    ColumnDataType[] storedColumnDataTypes = 
dataSchema.getStoredColumnDataTypes();
+    ColumnDataType[] columnDataTypes = dataSchema.getStoredColumnDataTypes();

Review Comment:
   ```suggestion
       ColumnDataType[] columnDataTypes = dataSchema.getColumnDataTypes();
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/BigDecimalColumnPredIndexStatsCollector.java:
##########
@@ -0,0 +1,144 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.stats;
+
+import it.unimi.dsi.fastutil.objects.ObjectOpenHashSet;
+import java.math.BigDecimal;
+import java.util.Arrays;
+import java.util.Set;
+import org.apache.pinot.segment.spi.creator.StatsCollectorConfig;
+import org.apache.pinot.spi.utils.BigDecimalUtils;
+
+
+/**
+ * Extension of {@link AbstractColumnStatisticsCollector} for BigDecimal 
column type.
+ */
+public class BigDecimalColumnPredIndexStatsCollector extends 
AbstractColumnStatisticsCollector {
+  private final Set<BigDecimal> _values = new 
ObjectOpenHashSet<>(INITIAL_HASH_SET_SIZE);
+
+  private int _minLength = Integer.MAX_VALUE;
+  private int _maxLength = 0;
+  private int _maxRowLength = 0;
+  private BigDecimal[] _sortedValues;
+  private boolean _sealed = false;
+
+  // todo: remove this class if not needed.
+  public BigDecimalColumnPredIndexStatsCollector(String column, 
StatsCollectorConfig statsCollectorConfig) {
+    super(column, statsCollectorConfig);
+  }
+
+  @Override
+  public void collect(Object entry) {
+    if (entry instanceof Object[]) {
+      Object[] values = (Object[]) entry;
+      int rowLength = 0;
+      for (Object obj : values) {
+        BigDecimal value = (BigDecimal) obj;
+        _values.add(value);
+        int length = BigDecimalUtils.byteSize(value);
+        _minLength = Math.min(_minLength, length);
+        _maxLength = Math.max(_maxLength, length);
+        rowLength += length;
+      }
+      _maxNumberOfMultiValues = Math.max(_maxNumberOfMultiValues, 
values.length);
+      _maxRowLength = Math.max(_maxRowLength, rowLength);
+      updateTotalNumberOfEntries(values);
+    } else {
+      BigDecimal value;
+      int length;
+      if (entry.getClass() == BigDecimal.class) {

Review Comment:
   We won't hit this branch



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/ScalarTransformFunctionWrapper.java:
##########
@@ -362,6 +364,12 @@ private void getNonLiteralValues(ProjectionBlock 
projectionBlock) {
         case DOUBLE:
           _nonLiteralValues[i] = 
ArrayUtils.toObject(transformFunction.transformToDoubleValuesSV(projectionBlock));
           break;
+        case BIG_DECIMAL:
+          byte[][] byteValues = 
transformFunction.transformToBytesValuesSV(projectionBlock);
+          BigDecimal[] bigDecimalValues = new BigDecimal[byteValues.length];
+          ArrayCopyUtils.copy(byteValues, bigDecimalValues, byteValues.length);

Review Comment:
   Let's not add this `copy` method because `BigDecimal` is not a storage type. 
We may handle it similar to how `BOOLEAN` and `TIMESTAMP` is handled



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/LiteralTransformFunction.java:
##########
@@ -45,30 +46,32 @@ public class LiteralTransformFunction implements 
TransformFunction {
   private final long _longLiteral;
   private final float _floatLiteral;
   private final double _doubleLiteral;
+  private final BigDecimal _bigDecimalLiteral;

Review Comment:
   Here we should introduce `byte[] _bytesLiteral`, and store the serialized 
big decimal



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/LiteralTransformFunction.java:
##########
@@ -84,6 +87,8 @@ static DataType inferLiteralDataType(String literal) {
         return DataType.FLOAT;
       } else if (number instanceof Double) {
         return DataType.DOUBLE;
+      } else if (number instanceof BigDecimal) {

Review Comment:
   `BigInteger` can also be represented as `BigDecimal`



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/BaseTransformFunction.java:
##########
@@ -55,6 +58,7 @@ public abstract class BaseTransformFunction implements 
TransformFunction {
   protected long[] _longValuesSV;
   protected float[] _floatValuesSV;
   protected double[] _doubleValuesSV;
+  protected BigDecimal[] _bigDecimalValuesSV;

Review Comment:
   This should be reverted



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/JsonExtractScalarTransformFunction.java:
##########
@@ -301,7 +313,7 @@ private String[] 
transformTransformedValuesToStringValuesSV(ProjectionBlock proj
     for (int i = 0; i < numDocs; i++) {
       Object result = null;
       try {
-        result = JSON_PARSER_CONTEXT.parse(jsonStrings[i]).read(_jsonPath);
+        result = 
JSON_PARSER_CONTEXT_WITH_EXACT_BIG_DECIMAL.parse(jsonStrings[i]).read(_jsonPath);

Review Comment:
   This should be used in the `transformToBytesValuesSV()` instead of 
`transformToStringValuesSV` because `BigDecimal` is stored as bytes. Again I'd 
suggest supporting it separately along with the additional support in the 
`JsonPathEvaluator`



##########
pinot-core/src/main/java/org/apache/pinot/core/query/reduce/GroupByDataTableReducer.java:
##########
@@ -370,7 +371,11 @@ public void runJob() {
                       values[colId] = dataTable.getString(rowId, colId);
                       break;
                     case BYTES:
-                      values[colId] = dataTable.getBytes(rowId, colId);
+                      if (columnDataTypes[colId] == 
ColumnDataType.BIG_DECIMAL) {

Review Comment:
   I don't think we should convert the values here. The underlying comparator 
will do the conversion



##########
pinot-core/src/main/java/org/apache/pinot/core/query/selection/SelectionOperatorService.java:
##########
@@ -121,7 +123,13 @@ private Comparator<Object[]> 
getTypeCompatibleComparator(List<OrderByExpressionC
         Object v2 = o2[index];
         int result;
         if (isNumber[i]) {
-          result = Double.compare(((Number) v1).doubleValue(), ((Number) 
v2).doubleValue());
+          // BigDecimal is a Number, but underlying stored type is ByteArray.
+          if (v1.getClass().equals(ByteArray.class)) {

Review Comment:
   Check `columnDataTypes[i]` instead of comparing class



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/BigDecimalColumnPredIndexStatsCollector.java:
##########
@@ -0,0 +1,144 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.stats;
+
+import it.unimi.dsi.fastutil.objects.ObjectOpenHashSet;
+import java.math.BigDecimal;
+import java.util.Arrays;
+import java.util.Set;
+import org.apache.pinot.segment.spi.creator.StatsCollectorConfig;
+import org.apache.pinot.spi.utils.BigDecimalUtils;
+
+
+/**
+ * Extension of {@link AbstractColumnStatisticsCollector} for BigDecimal 
column type.
+ */
+public class BigDecimalColumnPredIndexStatsCollector extends 
AbstractColumnStatisticsCollector {

Review Comment:
   ```suggestion
   public class BigDecimalColumnPreIndexStatsCollector extends 
AbstractColumnStatisticsCollector {
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/query/distinct/DistinctTable.java:
##########
@@ -86,18 +87,28 @@ public DistinctTable(DataSchema dataSchema, @Nullable 
List<OrderByExpressionCont
       int numOrderByExpressions = orderByExpressions.size();
       int[] orderByExpressionIndices = new int[numOrderByExpressions];
       int[] comparisonFactors = new int[numOrderByExpressions];
+      ColumnDataType[] columnDataTypes = new 
ColumnDataType[columnNames.size()];

Review Comment:
   ```suggestion
         ColumnDataType[] columnDataTypes = new 
ColumnDataType[numOrderByExpressions];
   ```



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/LiteralTransformFunction.java:
##########
@@ -225,6 +230,11 @@ public byte[][] transformToBytesValuesSV(ProjectionBlock 
projectionBlock) {
     if (bytesResult == null || bytesResult.length < numDocs) {
       bytesResult = new byte[numDocs][];
       Arrays.fill(bytesResult, BytesUtils.toBytes(_literal));

Review Comment:
   Here we should fill the pre-computed bytes value



##########
pinot-spi/src/main/java/org/apache/pinot/spi/data/FieldSpec.java:
##########
@@ -49,6 +51,7 @@ public abstract class FieldSpec implements 
Comparable<FieldSpec>, Serializable {
   public static final Long DEFAULT_DIMENSION_NULL_VALUE_OF_LONG = 
Long.MIN_VALUE;
   public static final Float DEFAULT_DIMENSION_NULL_VALUE_OF_FLOAT = 
Float.NEGATIVE_INFINITY;
   public static final Double DEFAULT_DIMENSION_NULL_VALUE_OF_DOUBLE = 
Double.NEGATIVE_INFINITY;
+  public static final byte[] DEFAULT_DIMENSION_NULL_VALUE_OF_BIG_DECIMAL = 
BigDecimalUtils.serialize(BigDecimal.ZERO);

Review Comment:
   `0` is not a proper default for dimension. IMO it might not make sense to 
configure a dimension as big decimal, so probably just not allow it



##########
pinot-core/src/main/java/org/apache/pinot/core/operator/transform/function/LiteralTransformFunction.java:
##########
@@ -45,30 +46,32 @@ public class LiteralTransformFunction implements 
TransformFunction {
   private final long _longLiteral;
   private final float _floatLiteral;
   private final double _doubleLiteral;
+  private final BigDecimal _bigDecimalLiteral;
 
   // literals may be shared but values are intentionally not volatile as 
assignment races are benign
   private int[] _intResult;
   private long[] _longResult;
   private float[] _floatResult;
   private double[] _doubleResult;
+  private BigDecimal[] _bigDecimalResult;
   private String[] _stringResult;
   private byte[][] _bytesResult;
 
   public LiteralTransformFunction(String literal) {
     _literal = literal;
     _dataType = inferLiteralDataType(literal);
-    if (_dataType.isNumeric()) {
-      BigDecimal bigDecimal = new BigDecimal(_literal);
-      _intLiteral = bigDecimal.intValue();
-      _longLiteral = bigDecimal.longValue();
-      _floatLiteral = bigDecimal.floatValue();
-      _doubleLiteral = bigDecimal.doubleValue();
+    if (_dataType == DataType.TIMESTAMP) {
+      _bigDecimalLiteral = 
BigDecimal.valueOf(Timestamp.valueOf(literal).getTime());

Review Comment:
   Why do we set big decimal value for `TIMESTAMP` type?



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/creator/impl/stats/BigDecimalColumnPredIndexStatsCollector.java:
##########
@@ -0,0 +1,144 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.segment.local.segment.creator.impl.stats;
+
+import it.unimi.dsi.fastutil.objects.ObjectOpenHashSet;
+import java.math.BigDecimal;
+import java.util.Arrays;
+import java.util.Set;
+import org.apache.pinot.segment.spi.creator.StatsCollectorConfig;
+import org.apache.pinot.spi.utils.BigDecimalUtils;
+
+
+/**
+ * Extension of {@link AbstractColumnStatisticsCollector} for BigDecimal 
column type.
+ */
+public class BigDecimalColumnPredIndexStatsCollector extends 
AbstractColumnStatisticsCollector {
+  private final Set<BigDecimal> _values = new 
ObjectOpenHashSet<>(INITIAL_HASH_SET_SIZE);
+
+  private int _minLength = Integer.MAX_VALUE;
+  private int _maxLength = 0;
+  private int _maxRowLength = 0;
+  private BigDecimal[] _sortedValues;
+  private boolean _sealed = false;
+
+  // todo: remove this class if not needed.
+  public BigDecimalColumnPredIndexStatsCollector(String column, 
StatsCollectorConfig statsCollectorConfig) {
+    super(column, statsCollectorConfig);
+  }
+
+  @Override
+  public void collect(Object entry) {
+    if (entry instanceof Object[]) {

Review Comment:
   This is for MV, which is not supported for big decimal



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [pinot] Jackie-Jiang commented on a diff in pull request #8503: Support single-valued BigDecimal in schema, type conversion, SQL statements and minimum set of transforms.

Reply via email to