Copilot commented on code in PR #18428:
URL: https://github.com/apache/pinot/pull/18428#discussion_r3191233919


##########
pinot-common/src/main/java/org/apache/pinot/common/utils/PinotDataType.java:
##########
@@ -1796,121 +1832,126 @@ public PinotDataType getSingleValueType() {
     }
   }
 
-  public static PinotDataType getSingleValueType(Class<?> cls) {
-    if (cls == Integer.class) {
+  /// Returns the [PinotDataType] for the given single value, dispatched on 
the runtime class via
+  /// `instanceof`. Returns [#OBJECT] for any unrecognized type. Subclasses of 
non-final types
+  /// (e.g. vendor `Timestamp` subclasses returned by JDBC drivers) are 
matched by their parent type.
+  public static PinotDataType getSingleValueType(Object value) {

Review Comment:
   Replacing the public `Class<?>` entry points with value-only signatures is a 
source/binary incompatible API change. Downstream code compiled against 
`getSingleValueType(Class<?>)` / `getMultiValueType(Class<?>)` will no longer 
link, and callers that only have a declared type lose the only supported path 
for resolving types without materializing a sample instance. Keeping the old 
overloads as deprecated shims would preserve compatibility.



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/PinotDataType.java:
##########
@@ -1796,121 +1832,126 @@ public PinotDataType getSingleValueType() {
     }
   }
 
-  public static PinotDataType getSingleValueType(Class<?> cls) {
-    if (cls == Integer.class) {
+  /// Returns the [PinotDataType] for the given single value, dispatched on 
the runtime class via
+  /// `instanceof`. Returns [#OBJECT] for any unrecognized type. Subclasses of 
non-final types
+  /// (e.g. vendor `Timestamp` subclasses returned by JDBC drivers) are 
matched by their parent type.
+  public static PinotDataType getSingleValueType(Object value) {
+    if (value instanceof Integer) {
       return INTEGER;
     }
-    if (cls == Long.class) {
+    if (value instanceof Long) {
       return LONG;
     }
-    if (cls == Float.class) {
+    if (value instanceof Float) {
       return FLOAT;
     }
-    if (cls == Double.class) {
+    if (value instanceof Double) {
       return DOUBLE;
     }
-    if (cls == BigDecimal.class) {
+    if (value instanceof BigDecimal) {
       return BIG_DECIMAL;
     }
-    if (cls == String.class) {
-      return STRING;
-    }
-    if (cls == byte[].class) {
-      return BYTES;
-    }
-    if (cls == UUID.class) {
-      return UUID;
-    }
-    if (cls == Boolean.class) {
+    if (value instanceof Boolean) {
       return BOOLEAN;
     }
-    if (cls == Timestamp.class) {
+    if (value instanceof Timestamp) {
       return TIMESTAMP;
     }
-    if (cls != null && Map.class.isAssignableFrom(cls)) {
+    if (value instanceof String) {
+      return STRING;
+    }
+    if (value instanceof byte[]) {
+      return BYTES;
+    }
+    if (value instanceof Map) {
       return MAP;
     }
-    if (cls == LocalDate.class) {
+    if (value instanceof LocalDate) {
       return DATE;
     }
-    if (cls == LocalTime.class) {
+    if (value instanceof LocalTime) {
       return TIME;
     }
-    if (cls == Byte.class) {
+    if (value instanceof UUID) {
+      return UUID;
+    }
+    if (value instanceof Byte) {
       return BYTE;
     }
-    if (cls == Character.class) {
+    if (value instanceof Character) {
       return CHARACTER;
     }
-    if (cls == Short.class) {
+    if (value instanceof Short) {
       return SHORT;
     }
     return OBJECT;
   }
 
-  public static PinotDataType getMultiValueType(Class<?> cls) {
-    if (cls == Integer.class) {
+  /// Returns the multi-value [PinotDataType] for the given sample element, 
dispatched on the runtime class
+  /// via `instanceof`. Returns [#OBJECT_ARRAY] for any unrecognized type.
+  public static PinotDataType getMultiValueType(Object element) {

Review Comment:
   Replacing the public `Class<?>` entry points with value-only signatures is a 
source/binary incompatible API change. Downstream code compiled against 
`getSingleValueType(Class<?>)` / `getMultiValueType(Class<?>)` will no longer 
link, and callers that only have a declared type lose the only supported path 
for resolving types without materializing a sample instance. Keeping the old 
overloads as deprecated shims would preserve compatibility.



##########
pinot-common/src/main/java/org/apache/pinot/common/function/FunctionUtils.java:
##########
@@ -149,26 +99,55 @@ public static PinotDataType getParameterType(Class<?> 
clazz) {
     return PARAMETER_TYPE_MAP.get(clazz);
   }
 
-  /**
-   * Returns the corresponding PinotDataType for the given argument class, or 
{@code null} if there is no one matching.
-   */
-  @Nullable
-  public static PinotDataType getArgumentType(Class<?> clazz) {
-    if (Collection.class.isAssignableFrom(clazz)) {
-      return PinotDataType.COLLECTION;
+  /// Returns the corresponding [PinotDataType] for the given argument value 
(the actual value passed into
+  /// the function). Returns [PinotDataType#OBJECT] / 
[PinotDataType#OBJECT_ARRAY] for unrecognized types,
+  /// matching [PinotDataType#getSingleValueType]'s best-effort fallback. 
Subclasses of non-final types
+  /// (e.g. vendor `Timestamp` subclasses returned by JDBC drivers) are 
matched by their parent type.
+  ///
+  /// Dispatch (single-value first since it's the dominant case for function 
arguments):
+  /// - Single values → delegated to [PinotDataType#getSingleValueType] 
(covers all scalar types
+  ///   including `byte[]` → [PinotDataType#BYTES]).
+  /// - Reference arrays (`Object[]` and subtypes including `byte[][]`) → 
first non-null element is
+  ///   sampled and [PinotDataType#getMultiValueType] is consulted. Empty / 
all-null reference arrays
+  ///   fall back to [PinotDataType#OBJECT_ARRAY] since the element type is 
undeterminable.
+  /// - Primitive arrays (`int[]` / `long[]` / `float[]` / `double[]` / 
`boolean[]`) → handled here, since
+  ///   they can't be element-sampled into a boxed type.
+  /// - [PinotDataType#COLLECTION] for any [Collection]; otherwise falls back 
to [PinotDataType#OBJECT].
+  public static PinotDataType getArgumentType(Object value) {
+    PinotDataType singleValueType = PinotDataType.getSingleValueType(value);
+    if (singleValueType != PinotDataType.OBJECT) {
+      return singleValueType;
     }

Review Comment:
   This replaces the public `getArgumentType(Class<?>)` API with a value-based 
variant, and the same PR also removes `getDataType(Class<?>)`. That is a 
breaking change for downstream code that performs type resolution from declared 
classes or method signatures rather than live values. Please keep the 
`Class<?>` APIs as compatibility shims and deprecate them before removal.



##########
pinot-common/src/main/java/org/apache/pinot/common/function/FunctionUtils.java:
##########
@@ -149,26 +99,55 @@ public static PinotDataType getParameterType(Class<?> 
clazz) {
     return PARAMETER_TYPE_MAP.get(clazz);
   }
 
-  /**
-   * Returns the corresponding PinotDataType for the given argument class, or 
{@code null} if there is no one matching.
-   */
-  @Nullable
-  public static PinotDataType getArgumentType(Class<?> clazz) {
-    if (Collection.class.isAssignableFrom(clazz)) {
-      return PinotDataType.COLLECTION;
+  /// Returns the corresponding [PinotDataType] for the given argument value 
(the actual value passed into
+  /// the function). Returns [PinotDataType#OBJECT] / 
[PinotDataType#OBJECT_ARRAY] for unrecognized types,
+  /// matching [PinotDataType#getSingleValueType]'s best-effort fallback. 
Subclasses of non-final types
+  /// (e.g. vendor `Timestamp` subclasses returned by JDBC drivers) are 
matched by their parent type.
+  ///
+  /// Dispatch (single-value first since it's the dominant case for function 
arguments):
+  /// - Single values → delegated to [PinotDataType#getSingleValueType] 
(covers all scalar types
+  ///   including `byte[]` → [PinotDataType#BYTES]).
+  /// - Reference arrays (`Object[]` and subtypes including `byte[][]`) → 
first non-null element is
+  ///   sampled and [PinotDataType#getMultiValueType] is consulted. Empty / 
all-null reference arrays
+  ///   fall back to [PinotDataType#OBJECT_ARRAY] since the element type is 
undeterminable.
+  /// - Primitive arrays (`int[]` / `long[]` / `float[]` / `double[]` / 
`boolean[]`) → handled here, since
+  ///   they can't be element-sampled into a boxed type.
+  /// - [PinotDataType#COLLECTION] for any [Collection]; otherwise falls back 
to [PinotDataType#OBJECT].
+  public static PinotDataType getArgumentType(Object value) {
+    PinotDataType singleValueType = PinotDataType.getSingleValueType(value);
+    if (singleValueType != PinotDataType.OBJECT) {
+      return singleValueType;
     }
-    if (Map.class.isAssignableFrom(clazz)) {
-      return PinotDataType.MAP;
+    if (value instanceof Object[]) {
+      Object[] array = (Object[]) value;
+      for (Object element : array) {
+        if (element == null) {
+          continue;
+        }
+        return PinotDataType.getMultiValueType(element);
+      }
+      // Empty or all-null reference array — element type undeterminable.
+      return PinotDataType.OBJECT_ARRAY;
     }

Review Comment:
   This regresses typed empty/all-null reference arrays. `new Integer[0]`, `new 
Timestamp[0]`, `new String[0]`, etc. now resolve to `OBJECT_ARRAY` even though 
the runtime array class still exposes the component type. Callers such as 
`BaseDefaultColumnHandler.createDerivedColumnV1Indices` infer column type from 
the first produced value, so an early empty array can permanently lock the 
derived column to `OBJECT_ARRAY` and break later writes/conversions. Fall back 
to the array component type before returning `OBJECT_ARRAY`.



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/PinotDataType.java:
##########
@@ -1796,121 +1832,126 @@ public PinotDataType getSingleValueType() {
     }
   }
 
-  public static PinotDataType getSingleValueType(Class<?> cls) {
-    if (cls == Integer.class) {
+  /// Returns the [PinotDataType] for the given single value, dispatched on 
the runtime class via
+  /// `instanceof`. Returns [#OBJECT] for any unrecognized type. Subclasses of 
non-final types
+  /// (e.g. vendor `Timestamp` subclasses returned by JDBC drivers) are 
matched by their parent type.
+  public static PinotDataType getSingleValueType(Object value) {
+    if (value instanceof Integer) {
       return INTEGER;
     }
-    if (cls == Long.class) {
+    if (value instanceof Long) {
       return LONG;
     }
-    if (cls == Float.class) {
+    if (value instanceof Float) {
       return FLOAT;
     }
-    if (cls == Double.class) {
+    if (value instanceof Double) {
       return DOUBLE;
     }
-    if (cls == BigDecimal.class) {
+    if (value instanceof BigDecimal) {
       return BIG_DECIMAL;
     }
-    if (cls == String.class) {
-      return STRING;
-    }
-    if (cls == byte[].class) {
-      return BYTES;
-    }
-    if (cls == UUID.class) {
-      return UUID;
-    }
-    if (cls == Boolean.class) {
+    if (value instanceof Boolean) {
       return BOOLEAN;
     }
-    if (cls == Timestamp.class) {
+    if (value instanceof Timestamp) {
       return TIMESTAMP;
     }
-    if (cls != null && Map.class.isAssignableFrom(cls)) {
+    if (value instanceof String) {
+      return STRING;
+    }
+    if (value instanceof byte[]) {
+      return BYTES;
+    }
+    if (value instanceof Map) {
       return MAP;
     }
-    if (cls == LocalDate.class) {
+    if (value instanceof LocalDate) {
       return DATE;
     }
-    if (cls == LocalTime.class) {
+    if (value instanceof LocalTime) {
       return TIME;
     }
-    if (cls == Byte.class) {
+    if (value instanceof UUID) {
+      return UUID;
+    }
+    if (value instanceof Byte) {
       return BYTE;
     }
-    if (cls == Character.class) {
+    if (value instanceof Character) {
       return CHARACTER;
     }
-    if (cls == Short.class) {
+    if (value instanceof Short) {
       return SHORT;
     }
     return OBJECT;
   }
 
-  public static PinotDataType getMultiValueType(Class<?> cls) {
-    if (cls == Integer.class) {
+  /// Returns the multi-value [PinotDataType] for the given sample element, 
dispatched on the runtime class
+  /// via `instanceof`. Returns [#OBJECT_ARRAY] for any unrecognized type.
+  public static PinotDataType getMultiValueType(Object element) {

Review Comment:
   The previous tests covered `null`, but the updated suite no longer asserts 
the `null` fallbacks for these new value-based overloads. Since the new 
implementation depends on `instanceof` semantics to return `OBJECT` / 
`OBJECT_ARRAY` for `null`, please add explicit `getSingleValueType(null)` and 
`getMultiValueType(null)` assertions so this contract is locked down.



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/PinotDataType.java:
##########
@@ -1230,6 +1230,24 @@ public Integer[] toInternal(Object value) {
     }
   },
 
+  BOOLEAN_ARRAY {
+    @Override
+    public Boolean[] convert(Object value, PinotDataType sourceType) {
+      return sourceType.toBooleanArray(value);
+    }
+
+    @Override
+    public Integer[] toInternal(Object value) {
+      Boolean[] booleanArray = (Boolean[]) value;
+      int length = booleanArray.length;
+      Integer[] intArray = new Integer[length];
+      for (int i = 0; i < length; i++) {
+        intArray[i] = booleanArray[i] != null ? (booleanArray[i] ? 1 : 0) : 
null;
+      }

Review Comment:
   This repurposes the existing public `BOOLEAN_ARRAY` enum constant from 
carrying `boolean[]` to carrying `Boolean[]`. Any existing caller that still 
passes primitive `boolean[]` values under `BOOLEAN_ARRAY` will now fail at the 
cast on line 1241, so this is a runtime-breaking contract change rather than a 
purely internal refactor. The old `BOOLEAN_ARRAY` behavior should remain 
backward-compatible if you need to add a boxed variant.



##########
pinot-common/src/main/java/org/apache/pinot/common/function/FunctionUtils.java:
##########
@@ -149,26 +99,55 @@ public static PinotDataType getParameterType(Class<?> 
clazz) {
     return PARAMETER_TYPE_MAP.get(clazz);
   }
 
-  /**
-   * Returns the corresponding PinotDataType for the given argument class, or 
{@code null} if there is no one matching.
-   */
-  @Nullable
-  public static PinotDataType getArgumentType(Class<?> clazz) {
-    if (Collection.class.isAssignableFrom(clazz)) {
-      return PinotDataType.COLLECTION;
+  /// Returns the corresponding [PinotDataType] for the given argument value 
(the actual value passed into
+  /// the function). Returns [PinotDataType#OBJECT] / 
[PinotDataType#OBJECT_ARRAY] for unrecognized types,
+  /// matching [PinotDataType#getSingleValueType]'s best-effort fallback. 
Subclasses of non-final types
+  /// (e.g. vendor `Timestamp` subclasses returned by JDBC drivers) are 
matched by their parent type.
+  ///
+  /// Dispatch (single-value first since it's the dominant case for function 
arguments):
+  /// - Single values → delegated to [PinotDataType#getSingleValueType] 
(covers all scalar types
+  ///   including `byte[]` → [PinotDataType#BYTES]).
+  /// - Reference arrays (`Object[]` and subtypes including `byte[][]`) → 
first non-null element is
+  ///   sampled and [PinotDataType#getMultiValueType] is consulted. Empty / 
all-null reference arrays
+  ///   fall back to [PinotDataType#OBJECT_ARRAY] since the element type is 
undeterminable.
+  /// - Primitive arrays (`int[]` / `long[]` / `float[]` / `double[]` / 
`boolean[]`) → handled here, since
+  ///   they can't be element-sampled into a boxed type.
+  /// - [PinotDataType#COLLECTION] for any [Collection]; otherwise falls back 
to [PinotDataType#OBJECT].
+  public static PinotDataType getArgumentType(Object value) {
+    PinotDataType singleValueType = PinotDataType.getSingleValueType(value);
+    if (singleValueType != PinotDataType.OBJECT) {
+      return singleValueType;
     }

Review Comment:
   The old `Class<?>` helper accepted `null`, and this replacement still has a 
`null` path through `getSingleValueType(value)`. `FunctionUtilsTest` doesn't 
cover `getArgumentType(null)`, so a future refactor could accidentally change 
the fallback from `OBJECT` without tripping the new suite.



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/PinotDataType.java:
##########
@@ -1796,121 +1832,126 @@ public PinotDataType getSingleValueType() {
     }
   }
 
-  public static PinotDataType getSingleValueType(Class<?> cls) {
-    if (cls == Integer.class) {
+  /// Returns the [PinotDataType] for the given single value, dispatched on 
the runtime class via
+  /// `instanceof`. Returns [#OBJECT] for any unrecognized type. Subclasses of 
non-final types
+  /// (e.g. vendor `Timestamp` subclasses returned by JDBC drivers) are 
matched by their parent type.
+  public static PinotDataType getSingleValueType(Object value) {

Review Comment:
   The previous tests covered `null`, but the updated suite no longer asserts 
the `null` fallbacks for these new value-based overloads. Since the new 
implementation depends on `instanceof` semantics to return `OBJECT` / 
`OBJECT_ARRAY` for `null`, please add explicit `getSingleValueType(null)` and 
`getMultiValueType(null)` assertions so this contract is locked down.
   



##########
pinot-common/src/main/java/org/apache/pinot/common/function/FunctionUtils.java:
##########
@@ -149,26 +99,55 @@ public static PinotDataType getParameterType(Class<?> 
clazz) {
     return PARAMETER_TYPE_MAP.get(clazz);
   }
 
-  /**
-   * Returns the corresponding PinotDataType for the given argument class, or 
{@code null} if there is no one matching.
-   */
-  @Nullable
-  public static PinotDataType getArgumentType(Class<?> clazz) {
-    if (Collection.class.isAssignableFrom(clazz)) {
-      return PinotDataType.COLLECTION;
+  /// Returns the corresponding [PinotDataType] for the given argument value 
(the actual value passed into
+  /// the function). Returns [PinotDataType#OBJECT] / 
[PinotDataType#OBJECT_ARRAY] for unrecognized types,
+  /// matching [PinotDataType#getSingleValueType]'s best-effort fallback. 
Subclasses of non-final types
+  /// (e.g. vendor `Timestamp` subclasses returned by JDBC drivers) are 
matched by their parent type.
+  ///
+  /// Dispatch (single-value first since it's the dominant case for function 
arguments):
+  /// - Single values → delegated to [PinotDataType#getSingleValueType] 
(covers all scalar types
+  ///   including `byte[]` → [PinotDataType#BYTES]).
+  /// - Reference arrays (`Object[]` and subtypes including `byte[][]`) → 
first non-null element is
+  ///   sampled and [PinotDataType#getMultiValueType] is consulted. Empty / 
all-null reference arrays
+  ///   fall back to [PinotDataType#OBJECT_ARRAY] since the element type is 
undeterminable.
+  /// - Primitive arrays (`int[]` / `long[]` / `float[]` / `double[]` / 
`boolean[]`) → handled here, since
+  ///   they can't be element-sampled into a boxed type.
+  /// - [PinotDataType#COLLECTION] for any [Collection]; otherwise falls back 
to [PinotDataType#OBJECT].
+  public static PinotDataType getArgumentType(Object value) {
+    PinotDataType singleValueType = PinotDataType.getSingleValueType(value);
+    if (singleValueType != PinotDataType.OBJECT) {
+      return singleValueType;
     }
-    if (Map.class.isAssignableFrom(clazz)) {
-      return PinotDataType.MAP;
+    if (value instanceof Object[]) {
+      Object[] array = (Object[]) value;
+      for (Object element : array) {
+        if (element == null) {
+          continue;
+        }
+        return PinotDataType.getMultiValueType(element);
+      }
+      // Empty or all-null reference array — element type undeterminable.
+      return PinotDataType.OBJECT_ARRAY;
     }
-    return ARGUMENT_TYPE_MAP.get(clazz);
-  }
-
-  /**
-   * Returns the corresponding DataType for the given class, or {@code null} 
if there is no one matching.
-   */
-  @Nullable
-  public static DataType getDataType(Class<?> clazz) {
-    return DATA_TYPE_MAP.get(clazz);
+    if (value instanceof int[]) {
+      return PinotDataType.PRIMITIVE_INT_ARRAY;
+    }
+    if (value instanceof long[]) {
+      return PinotDataType.PRIMITIVE_LONG_ARRAY;
+    }
+    if (value instanceof float[]) {
+      return PinotDataType.PRIMITIVE_FLOAT_ARRAY;
+    }
+    if (value instanceof double[]) {
+      return PinotDataType.PRIMITIVE_DOUBLE_ARRAY;
+    }
+    if (value instanceof boolean[]) {
+      return PinotDataType.PRIMITIVE_BOOLEAN_ARRAY;
+    }
+    if (value instanceof Collection) {
+      return PinotDataType.COLLECTION;
+    }
+    return PinotDataType.OBJECT;

Review Comment:
   The old `Class<?>` helper accepted `null`, and this replacement still has a 
`null` path through `getSingleValueType(value)`. `FunctionUtilsTest` doesn't 
cover `getArgumentType(null)`, so a future refactor could accidentally change 
the fallback from `OBJECT` without tripping the new suite.



##########
pinot-common/src/main/java/org/apache/pinot/common/utils/PinotDataType.java:
##########
@@ -1731,19 +1743,42 @@ public UUID[] toUuidArray(Object value) {
     }
   }
 
+  private static Object[] toObjectArray(Object array) {
+    if (array instanceof Collection) {
+      return ((Collection<?>) array).toArray();
+    }
+    Class<?> componentType = array.getClass().getComponentType();
+    if (componentType.isPrimitive()) {
+      if (componentType == int.class) {
+        return ArrayUtils.toObject((int[]) array);
+      }
+      if (componentType == long.class) {
+        return ArrayUtils.toObject((long[]) array);
+      }
+      if (componentType == float.class) {
+        return ArrayUtils.toObject((float[]) array);
+      }
+      if (componentType == double.class) {
+        return ArrayUtils.toObject((double[]) array);
+      }
+      if (componentType == boolean.class) {
+        return ArrayUtils.toObject((boolean[]) array);
+      }
+      throw new UnsupportedOperationException("Unsupported primitive array 
type: " + componentType);
+    } else {
+      return (Object[]) array;
+    }
+  }
+
   public Object convert(Object value, PinotDataType sourceType) {
     throw new UnsupportedOperationException("Cannot convert value from " + 
sourceType + " to " + this);
   }
 
-  /**
-   * Converts to the internal representation of the value.
-   * <ul>
-   *   <li>BOOLEAN -> int</li>
-   *   <li>TIMESTAMP -> long</li>
-   *   <li>BOOLEAN_ARRAY -> int[]</li>
-   *   <li>TIMESTAMP_ARRAY -> long[]</li>
-   * </ul>
-   */
+  /// Converts to the internal representation of the value.
+  /// - `BOOLEAN` → `Integer` (0/1)
+  /// - `TIMESTAMP` → `Long` (epoch millis)
+  /// - `PRIMITIVE_BOOLEAN_ARRAY` / `BOOLEAN_ARRAY` → `Integer[]` (per-element 
0/1)
+  /// - `TIMESTAMP_ARRAY` → `Long[]` (per-element epoch millis)

Review Comment:
   This was previously Javadoc, but `///` is only a normal line comment in 
Java. That means the public `toInternal` contract drops out of generated 
Javadocs and IDE quick docs right when this PR changes the boolean-array 
behavior. Please keep this as real Javadoc (`/** ... */`) so the updated API 
contract remains discoverable.
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to