Jackie-Jiang commented on code in PR #10186:
URL: https://github.com/apache/pinot/pull/10186#discussion_r1089382456


##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/BaseRealtimeClusterIntegrationTest.java:
##########
@@ -92,9 +92,11 @@ protected void overrideServerConf(PinotConfiguration 
configuration) {
   protected List<String> getNoDictionaryColumns() {
     // Randomly set time column as no dictionary column.
     if (new Random().nextInt(2) == 0) {

Review Comment:
   (nit, not introduced in this PR)
   ```suggestion
       if (new Random().nextBoolean()) {
   ```



##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/BaseRealtimeClusterIntegrationTest.java:
##########
@@ -92,9 +92,11 @@ protected void overrideServerConf(PinotConfiguration 
configuration) {
   protected List<String> getNoDictionaryColumns() {
     // Randomly set time column as no dictionary column.
     if (new Random().nextInt(2) == 0) {
-      return Arrays.asList("ActualElapsedTime", "ArrDelay", "DepDelay", 
"CRSDepTime", "DaysSinceEpoch");
+      return Arrays.asList("ActualElapsedTime", "ArrDelay", "DepDelay", 
"CRSDepTime", "RandomAirports",
+          "DivTotalGTimes", "DaysSinceEpoch");
     } else {
-      return super.getNoDictionaryColumns();
+      return Arrays.asList("ActualElapsedTime", "ArrDelay", "DepDelay", 
"CRSDepTime", "RandomAirports",

Review Comment:
   Suggest modifying the one in the `BaseClusterIntegrationTest` so that it is 
covered for offline tests as well



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/datasource/MutableDataSource.java:
##########
@@ -137,5 +137,9 @@ public Set<Integer> getPartitions() {
     public int getCardinality() {
       return _cardinality;
     }
+
+    public int getMaxRowLengthInBytes() {

Review Comment:
   Add`@Override`



##########
pinot-integration-tests/src/test/java/org/apache/pinot/integration/tests/BaseRealtimeClusterIntegrationTest.java:
##########
@@ -92,9 +92,11 @@ protected void overrideServerConf(PinotConfiguration 
configuration) {
   protected List<String> getNoDictionaryColumns() {
     // Randomly set time column as no dictionary column.
     if (new Random().nextInt(2) == 0) {
-      return Arrays.asList("ActualElapsedTime", "ArrDelay", "DepDelay", 
"CRSDepTime", "DaysSinceEpoch");
+      return Arrays.asList("ActualElapsedTime", "ArrDelay", "DepDelay", 
"CRSDepTime", "RandomAirports",

Review Comment:
   We can make it more readable by
   ```suggestion
         List<String> noDictionaryColumns = new 
ArrayList<>(super.getNoDictionaryColumns());
         noDictionaryColumns.add("DaysSinceEpoch");
         return noDictionaryColumns;
   ```



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/datasource/MutableDataSource.java:
##########
@@ -63,10 +62,11 @@ private static class MutableDataSourceMetadata implements 
DataSourceMetadata {
     final Set<Integer> _partitions;
     final Comparable _minValue;
     final Comparable _maxValue;
+    int _maxRowLengthInBytes = -1;

Review Comment:
   Make it `final`



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java:
##########
@@ -1243,18 +1246,60 @@ private boolean isAggregateMetricsEnabled() {
 
   // NOTE: Okay for single-writer
   @SuppressWarnings("NonAtomicOperationOnVolatileField")
-  private static class NumValuesInfo {
+  private static class ValuesInfo {
     volatile int _numValues = 0;
     volatile int _maxNumValuesPerMVEntry = -1;
+    volatile int _varByteMVMaxRowLengthInBytes = -1;
 
-    void updateSVEntry() {
+    void updateSVNumValues() {
       _numValues++;
     }
 
-    void updateMVEntry(int numValuesInMVEntry) {
+    void updateMVNumValues(int numValuesInMVEntry) {
       _numValues += numValuesInMVEntry;
       _maxNumValuesPerMVEntry = Math.max(_maxNumValuesPerMVEntry, 
numValuesInMVEntry);
     }
+
+    /**
+     * When an MV VarByte column is created with noDict, the realtime segment 
is still created with a dictionary.
+     * When the realtime segment is converted to offline segment, the offline 
segment creates a noDict column.
+     * MultiValueVarByteRawIndexCreator requires the maxRowLengthInBytes. 
Refer to OSS issue
+     * https://github.com/apache/pinot/issues/10127 for more details.
+     */
+    void updateVarByteMVMaxRowLengthInBytes(Object entry, DataType dataType) {
+      // MV support for BigDecimal is not available.
+      if (dataType != STRING && dataType != BYTES) {
+        return;
+      }
+
+      Object[] values = (Object[]) entry;
+      int rowLength = 0;
+
+      switch (dataType) {
+        case STRING: {
+          for (Object obj : values) {
+            String value = (String) obj;
+            int length = value.getBytes(UTF_8).length;
+            rowLength += length;
+          }
+
+          _varByteMVMaxRowLengthInBytes = 
Math.max(_varByteMVMaxRowLengthInBytes, rowLength);
+          break;
+        }
+        case BYTES: {
+          for (Object obj : values) {
+            ByteArray value = new ByteArray((byte[]) obj);
+            int length = value.length();
+            rowLength += length;
+          }
+
+          _varByteMVMaxRowLengthInBytes = 
Math.max(_varByteMVMaxRowLengthInBytes, rowLength);
+          break;
+        }
+        default:
+          throw new IllegalStateException("Invalid type=" + dataType);

Review Comment:
   That will probably introduce unnecessary branching which impact the 
performance



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/column/IntermediateIndexContainer.java:
##########
@@ -61,9 +61,10 @@ public IntermediateIndexContainer(FieldSpec fieldSpec, 
@Nullable PartitionFuncti
   }
 
   public DataSource toDataSource(int numDocsIndexed) {
+    // TODO(Vivek): Change to correct value of maxRowLengthInBytes
     return new MutableDataSource(_fieldSpec, numDocsIndexed, 
_numValuesInfo._numValues,
         _numValuesInfo._maxNumValuesPerMVEntry, _dictionary.length(), 
_partitionFunction, _partitions, _minValue,
-        _maxValue, _forwardIndex, _dictionary, null, null, null, null, null, 
null, null, null);
+        _maxValue, _forwardIndex, _dictionary, null, null, null, null, null, 
null, null, null, -1);

Review Comment:
   This class can actually be removed. It was introduced as an effort to unify 
the realtime and offline segment creation, but seems like the project is 
aborted.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to