sajjad-moradi commented on code in PR #10186:
URL: https://github.com/apache/pinot/pull/10186#discussion_r1088501494


##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/segment/index/column/IntermediateIndexContainer.java:
##########
@@ -61,9 +61,10 @@ public IntermediateIndexContainer(FieldSpec fieldSpec, 
@Nullable PartitionFuncti
   }
 
   public DataSource toDataSource(int numDocsIndexed) {
+    // TODO(Vivek): Change to correct value of maxRowLengthInBytes
     return new MutableDataSource(_fieldSpec, numDocsIndexed, 
_numValuesInfo._numValues,
         _numValuesInfo._maxNumValuesPerMVEntry, _dictionary.length(), 
_partitionFunction, _partitions, _minValue,
-        _maxValue, _forwardIndex, _dictionary, null, null, null, null, null, 
null, null, null);
+        _maxValue, _forwardIndex, _dictionary, null, null, null, null, null, 
null, null, null, -1);

Review Comment:
   I'm not sure where IntermediateSegment is being used. Based on the other 
parameter, I assume using -1 should be fine! In any case, please remove the 
TODO.



##########
pinot-segment-local/src/main/java/org/apache/pinot/segment/local/indexsegment/mutable/MutableSegmentImpl.java:
##########
@@ -1243,18 +1246,60 @@ private boolean isAggregateMetricsEnabled() {
 
   // NOTE: Okay for single-writer
   @SuppressWarnings("NonAtomicOperationOnVolatileField")
-  private static class NumValuesInfo {
+  private static class ValuesInfo {
     volatile int _numValues = 0;
     volatile int _maxNumValuesPerMVEntry = -1;
+    volatile int _varByteMVMaxRowLengthInBytes = -1;
 
-    void updateSVEntry() {
+    void updateSVNumValues() {
       _numValues++;
     }
 
-    void updateMVEntry(int numValuesInMVEntry) {
+    void updateMVNumValues(int numValuesInMVEntry) {
       _numValues += numValuesInMVEntry;
       _maxNumValuesPerMVEntry = Math.max(_maxNumValuesPerMVEntry, 
numValuesInMVEntry);
     }
+
+    /**
+     * When an MV VarByte column is created with noDict, the realtime segment 
is still created with a dictionary.
+     * When the realtime segment is converted to offline segment, the offline 
segment creates a noDict column.
+     * MultiValueVarByteRawIndexCreator requires the maxRowLengthInBytes. 
Refer to OSS issue
+     * https://github.com/apache/pinot/issues/10127 for more details.
+     */
+    void updateVarByteMVMaxRowLengthInBytes(Object entry, DataType dataType) {
+      // MV support for BigDecimal is not available.
+      if (dataType != STRING && dataType != BYTES) {
+        return;
+      }
+
+      Object[] values = (Object[]) entry;
+      int rowLength = 0;
+
+      switch (dataType) {
+        case STRING: {
+          for (Object obj : values) {
+            String value = (String) obj;
+            int length = value.getBytes(UTF_8).length;
+            rowLength += length;
+          }
+
+          _varByteMVMaxRowLengthInBytes = 
Math.max(_varByteMVMaxRowLengthInBytes, rowLength);
+          break;
+        }
+        case BYTES: {
+          for (Object obj : values) {
+            ByteArray value = new ByteArray((byte[]) obj);
+            int length = value.length();
+            rowLength += length;
+          }
+
+          _varByteMVMaxRowLengthInBytes = 
Math.max(_varByteMVMaxRowLengthInBytes, rowLength);
+          break;
+        }
+        default:
+          throw new IllegalStateException("Invalid type=" + dataType);

Review Comment:
   I suggest use the switch-case statement inside the for-loop, so the rest of 
the code gets reused.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to