Jackie-Jiang commented on a change in pull request #6710:
URL: https://github.com/apache/incubator-pinot/pull/6710#discussion_r604464226



##########
File path: 
pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;

Review comment:
       (nit)
   ```suggestion
       THREAD_CPU_TIME_NS("threadCpuTimeNs");
   ```

##########
File path: 
pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {

Review comment:
       I still suggest associating an explicit id with each key instead of relying on the enum ordinal. The convention should be to always increase the id when adding new keys.
   An id is more flexible than the ordinal for the following reasons:
   - The ordinal is just the key's position in the declaration. If anyone accidentally reorders the keys, compatibility breaks.
   - With an id, we can remove a key in a backward-compatible way over two releases if necessary. With the ordinal, we have to keep a placeholder so that the ordinals of the other keys don't change.
   
   @mqliang @siddharthteotia @mcvsubbu Thoughts?
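
A minimal sketch of the id-based approach, assuming a trimmed-down key set. The enum name `MetadataKey`, the `getById` helper, the concrete id values, and the retired id 3 are all illustrative assumptions, not code from this PR. It shows that a gap in the ids needs no placeholder constant and that the id is independent of declaration order.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: every key carries an explicit id that is written to the wire.
enum MetadataKey {
  UNKNOWN(0, "unknown"),
  TABLE(1, "table"),
  EXCEPTION(2, "Exception"),
  // Id 3 is assumed to belong to a key removed in an earlier release.
  // It is never reused, and no placeholder constant is needed.
  TOTAL_DOCS(4, "totalDocs");

  private static final Map<Integer, MetadataKey> ID_TO_KEY = new HashMap<>();

  static {
    for (MetadataKey key : values()) {
      // Fail fast at class-load time if two keys are given the same id.
      if (ID_TO_KEY.put(key._id, key) != null) {
        throw new IllegalStateException("Duplicate metadata key id: " + key._id);
      }
    }
  }

  private final int _id;
  private final String _name;

  MetadataKey(int id, String name) {
    _id = id;
    _name = name;
  }

  int getId() {
    return _id;
  }

  String getName() {
    return _name;
  }

  // Returns the key for the given id, or null if the id is unknown
  // (for example, an id sent by a newer server).
  static MetadataKey getById(int id) {
    return ID_TO_KEY.get(id);
  }
}
```

Here `TOTAL_DOCS` has ordinal 3 but id 4, so reordering or removing constants changes ordinals without touching what goes over the wire.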

##########
File path: 
pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      this._name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {

Review comment:
       You don't need `Optional` here. Either:
   - Throw an exception for an invalid id (I suggest this approach)
   - Return `null` for an invalid id
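
A minimal sketch of the first option, assuming a three-key enum for brevity. The enum name `MetadataKey` and the cached `VALUES` array are illustrative assumptions. Unlike the reviewed method, the sketch also rejects negative values.

```java
// Hypothetical sketch: look a key up by ordinal and fail loudly on bad input.
enum MetadataKey {
  UNKNOWN,
  TABLE,
  EXCEPTION;

  // values() clones the backing array on every call, so cache it once.
  private static final MetadataKey[] VALUES = values();

  static MetadataKey getByOrdinal(int ordinal) {
    if (ordinal < 0 || ordinal >= VALUES.length) {
      throw new IllegalArgumentException("Invalid metadata key ordinal: " + ordinal);
    }
    return VALUES[ordinal];
  }
}
```

Callers that can legitimately see an unknown value (for example, a value written by a newer version) would catch the exception or use the `null`-returning variant instead.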

##########
File path: 
pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {

Review comment:
       ```suggestion
     enum MetadataKey {
   ```

##########
File path: 
pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      this._name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {
+      if (ordinal >= MetadataKeys.values().length) {
+        return Optional.empty();
+      }
+      return Optional.ofNullable(MetadataKeys.values()[ordinal]);
+    }
+
+    // getByName returns an optional enum key for a given name.
+    public static Optional<MetadataKeys> getByName(String name) {
+      return Optional.ofNullable(_nameToEnumKeyMap.getOrDefault(name, null));

Review comment:
       `getOrDefault(name, null)` is the same as `get(name)`
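
A small sketch of the equivalence, using a stand-in map from names to integers. The class `NameLookup` and its contents are illustrative assumptions. `Map.get` already returns `null` for an absent key, so passing `null` as the default adds nothing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the name-to-key map in the reviewed enum.
class NameLookup {
  static final Map<String, Integer> NAME_TO_ID = new HashMap<>();

  static {
    NAME_TO_ID.put("totalDocs", 11);
  }

  // The form used in the reviewed code.
  static Integer viaGetOrDefault(String name) {
    return NAME_TO_ID.getOrDefault(name, null);
  }

  // The simpler equivalent.
  static Integer viaGet(String name) {
    return NAME_TO_ID.get(name);
  }
}
```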

##########
File path: 
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -77,6 +77,9 @@
 // TODO:   3. Given a data schema, write all values one by one instead of using rowId and colId to position (save time).
 // TODO:   4. Store bytes as variable size data instead of String
 public class DataTableBuilder {
+  public static final int VERSION_2 = 2;
+  public static final int VERSION_3 = 3;
+  private static int _version = VERSION_3;

Review comment:
       This should not be hardcoded; it should come from the config.
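
A minimal sketch of reading the version from configuration, assuming the config is available as a plain `Map<String, String>`. The class `DataTableVersionConfig`, the key name `datatable.version`, and the choice of v3 as the default are hypothetical and not taken from this PR.

```java
import java.util.Map;

// Hypothetical sketch: resolve the data table version from config instead of a constant.
class DataTableVersionConfig {
  static final int VERSION_2 = 2;
  static final int VERSION_3 = 3;

  // Assumed key name and default, for illustration only.
  static final String VERSION_KEY = "datatable.version";
  static final int DEFAULT_VERSION = VERSION_3;

  static int resolveVersion(Map<String, String> config) {
    String raw = config.get(VERSION_KEY);
    if (raw == null) {
      return DEFAULT_VERSION;
    }
    int version = Integer.parseInt(raw.trim());
    if (version != VERSION_2 && version != VERSION_3) {
      throw new IllegalArgumentException("Unsupported data table version: " + version);
    }
    return version;
  }
}
```

Rejecting unsupported values at startup keeps a typo in the config from silently producing tables a broker cannot read.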

##########
File path: 
pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      this._name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {
+      if (ordinal >= MetadataKeys.values().length) {
+        return Optional.empty();
+      }
+      return Optional.ofNullable(MetadataKeys.values()[ordinal]);
+    }
+
+    // getByName returns an optional enum key for a given name.
+    public static Optional<MetadataKeys> getByName(String name) {

Review comment:
       Same here

##########
File path: 
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableImplBase.java
##########
@@ -0,0 +1,284 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.pinot.core.common.datatable;
+
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.DataInputStream;
+import java.io.DataOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.Map;
+import org.apache.pinot.common.utils.DataSchema;
+import org.apache.pinot.common.utils.DataTable;
+import org.apache.pinot.common.utils.StringUtil;
+import org.apache.pinot.core.common.ObjectSerDeUtils;
+import org.apache.pinot.spi.utils.ByteArray;
+import org.apache.pinot.spi.utils.BytesUtils;
+
+import static org.apache.pinot.core.common.datatable.DataTableUtils.decodeString;
+
+
+/**
+ * Base implementation of the DataTable interface.
+ */
+public abstract class DataTableImplBase implements DataTable {

Review comment:
       Rename to `BaseDataTable`

##########
File path: 
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -77,6 +77,9 @@
 // TODO:   3. Given a data schema, write all values one by one instead of using rowId and colId to position (save time).
 // TODO:   4. Store bytes as variable size data instead of String
 public class DataTableBuilder {

Review comment:
       Suggest making 2 builders, one for v2 and one for v3. You can extract the common logic into a base class, or just duplicate the code, because we will deprecate v2 in the next release once v3 is well tested.
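
A rough sketch of the base-class variant, under heavy assumptions. The class names, the `startRow` and `writeMetadata` methods, and the byte layout are a toy illustration and not Pinot's real wire format. The shared `build()` writes a common header and delegates only the version-specific metadata encoding to the subclasses.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Toy layout (assumed): [version][numRows][one metadata entry].
abstract class BaseDataTableBuilder {
  private int _numRows;

  void startRow() {
    _numRows++;
  }

  abstract int getVersion();

  // Version-specific part: how a metadata entry is encoded.
  abstract void writeMetadata(DataOutputStream out) throws IOException;

  // Shared part: header plus delegation to the version-specific writer.
  byte[] build() {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(getVersion());
      out.writeInt(_numRows);
      writeMetadata(out);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}

class DataTableBuilderV2 extends BaseDataTableBuilder {
  @Override
  int getVersion() {
    return 2;
  }

  @Override
  void writeMetadata(DataOutputStream out) throws IOException {
    out.writeUTF("totalDocs"); // assumed: v2 sends the key as a string
    out.writeUTF("100");
  }
}

class DataTableBuilderV3 extends BaseDataTableBuilder {
  @Override
  int getVersion() {
    return 3;
  }

  @Override
  void writeMetadata(DataOutputStream out) throws IOException {
    out.writeInt(11); // assumed: v3 sends an illustrative int key instead
    out.writeUTF("100");
  }
}
```

If v2 really is removed one release later, deleting `DataTableBuilderV2` leaves the v3 path untouched, which is the main benefit over a single builder with version checks.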

##########
File path: 
pinot-core/src/main/java/org/apache/pinot/core/common/datatable/DataTableBuilder.java
##########
@@ -96,6 +99,17 @@ public DataTableBuilder(DataSchema dataSchema) {
     _rowSizeInBytes = DataTableUtils.computeColumnOffsets(dataSchema, _columnOffsets);

Review comment:
       This won't be correct because we want to fix the float value size (it should be 4 bytes, but v2 uses 8).
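
A minimal sketch of a version-aware offset computation, assuming that v3 stores a float in 4 bytes while v2 keeps its 8-byte slot. The class `ColumnOffsets`, the `ColumnType` enum, the extra `version` parameter, and the widths for the other types are illustrative assumptions, not the signature of `DataTableUtils.computeColumnOffsets`.

```java
// Hypothetical sketch: the row layout depends on the data table version.
class ColumnOffsets {
  enum ColumnType { INT, LONG, FLOAT, DOUBLE }

  static int sizeInBytes(ColumnType type, int version) {
    switch (type) {
      case INT:
        return 4;
      case LONG:
      case DOUBLE:
        return 8;
      case FLOAT:
        // v2 uses 8 bytes for a float; the fixed layout is assumed to use 4.
        return version >= 3 ? 4 : 8;
      default:
        throw new IllegalStateException("Unsupported column type: " + type);
    }
  }

  // Fills offsets[i] with the start of column i and returns the row size in bytes.
  static int computeColumnOffsets(ColumnType[] types, int[] offsets, int version) {
    int rowSize = 0;
    for (int i = 0; i < types.length; i++) {
      offsets[i] = rowSize;
      rowSize += sizeInBytes(types[i], version);
    }
    return rowSize;
  }
}
```

For a row of `INT, FLOAT, LONG`, this gives offsets 0, 4, 12 (row size 20) under v2 and 0, 4, 8 (row size 16) under v3, so a builder sharing one computation across versions would misplace every column after the first float.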

##########
File path: 
pinot-common/src/main/java/org/apache/pinot/common/utils/DataTable.java
##########
@@ -80,4 +85,87 @@
   double[] getDoubleArray(int rowId, int colId);
 
   String[] getStringArray(int rowId, int colId);
+
+  /* The MetadataKeys is used in V3, where we present metadata as Map<MetadataKeys, String>
+   * ATTENTION:
+   *  - Don't change existing keys.
+   *  - Don't remove existing keys.
+   *  - Always add new keys to the end.
+   *  Otherwise, backward compatibility will be broken.
+   */
+  enum MetadataKeys {
+    UNKNOWN("unknown"),
+    TABLE("table"), // NOTE: this key is only used in PrioritySchedulerTest
+    EXCEPTION("Exception"),
+    NUM_DOCS_SCANNED("numDocsScanned"),
+    NUM_ENTRIES_SCANNED_IN_FILTER("numEntriesScannedInFilter"),
+    NUM_ENTRIES_SCANNED_POST_FILTER("numEntriesScannedPostFilter"),
+    NUM_SEGMENTS_QUERIED("numSegmentsQueried"),
+    NUM_SEGMENTS_PROCESSED("numSegmentsProcessed"),
+    NUM_SEGMENTS_MATCHED("numSegmentsMatched"),
+    NUM_CONSUMING_SEGMENTS_PROCESSED("numConsumingSegmentsProcessed"),
+    MIN_CONSUMING_FRESHNESS_TIME_MS("minConsumingFreshnessTimeMs"),
+    TOTAL_DOCS("totalDocs"),
+    NUM_GROUPS_LIMIT_REACHED("numGroupsLimitReached"),
+    TIME_USED_MS("timeUsedMs"),
+    TRACE_INFO("traceInfo"),
+    REQUEST_ID("requestId"),
+    NUM_RESIZES("numResizes"),
+    RESIZE_TIME_MS("resizeTimeMs"),
+    THREAD_CPU_TIME_NS("threadCpuTimeNs"),
+    ;
+
+    private static final Map<String, MetadataKeys> _nameToEnumKeyMap = new HashMap<>();
+    // _intValueMetadataKeys contains all metadata keys which has value of int type.
+    private static final Set<MetadataKeys> _intValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_SEGMENTS_QUERIED, MetadataKeys.NUM_SEGMENTS_PROCESSED, MetadataKeys.NUM_SEGMENTS_MATCHED,
+            MetadataKeys.NUM_RESIZES, MetadataKeys.NUM_CONSUMING_SEGMENTS_PROCESSED, MetadataKeys.NUM_RESIZES);
+    // _longValueMetadataKeys contains all metadata keys which has value of long type.
+    private static final Set<MetadataKeys> _longValueMetadataKeys = ImmutableSet
+        .of(MetadataKeys.NUM_DOCS_SCANNED, MetadataKeys.NUM_ENTRIES_SCANNED_IN_FILTER,
+            MetadataKeys.NUM_ENTRIES_SCANNED_POST_FILTER, MetadataKeys.MIN_CONSUMING_FRESHNESS_TIME_MS,
+            MetadataKeys.TOTAL_DOCS, MetadataKeys.TIME_USED_MS, MetadataKeys.REQUEST_ID, MetadataKeys.RESIZE_TIME_MS,
+            MetadataKeys.THREAD_CPU_TIME_NS);
+    private final String _name;
+
+    MetadataKeys(String name) {
+      this._name = name;
+    }
+
+    // getByOrdinal returns an optional enum key for a given ordinal
+    public static Optional<MetadataKeys> getByOrdinal(int ordinal) {
+      if (ordinal >= MetadataKeys.values().length) {
+        return Optional.empty();
+      }
+      return Optional.ofNullable(MetadataKeys.values()[ordinal]);
+    }
+
+    // getByName returns an optional enum key for a given name.
+    public static Optional<MetadataKeys> getByName(String name) {
+      return Optional.ofNullable(_nameToEnumKeyMap.getOrDefault(name, null));
+    }
+
+    // isIntValueMetadataKey returns true if the given key has value of int type.
+    public static boolean isIntValueMetadataKey(MetadataKeys key) {
+      return _intValueMetadataKeys.contains(key);
+    }
+
+    // isLongValueMetadataKey returns true if the given key has value of long type.
+    public static boolean isLongValueMetadataKey(MetadataKeys key) {
+      return _longValueMetadataKeys.contains(key);
+    }
+
+    // getName returns the associated name(string) of the enum key.
+    public String getName() {
+      return _name;
+    }
+
+    static {

Review comment:
       Put this block following the map definition for better readability




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


