cloud-fan commented on code in PR #55599:
URL: https://github.com/apache/spark/pull/55599#discussion_r3184118148


##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala:
##########
@@ -952,3 +952,34 @@ case class SchemaOfVariantAgg(
   override protected def withNewChildInternal(newChild: Expression): 
Expression =
     copy(child = newChild)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(v) - Returns true if the variant is valid, false if it is 
malformed.",
+  examples = """
+    Examples:
+      > SELECT _FUNC_(parse_json('null'));
+       true
+      > SELECT _FUNC_(parse_json('[{"b":true,"a":0}]'));
+       true
+  """,
+  since = "4.2.0",
+  group = "variant_funcs"
+)
+case class IsValidVariant(child: Expression) extends UnaryExpression

Review Comment:
   Peer variant predicates have typed wrappers: `is_variant_null` and 
`to_variant_object` are in both 
`sql/api/src/main/scala/org/apache/spark/sql/functions.scala` and 
`python/pyspark/sql/functions/builtin.py`. Without them, DataFrame and PySpark 
users have to fall back to `expr("is_valid_variant(col)")`. Worth adding both 
in this PR for parity (Connect coverage will follow automatically).



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala:
##########
@@ -952,3 +952,34 @@ case class SchemaOfVariantAgg(
   override protected def withNewChildInternal(newChild: Expression): 
Expression =
     copy(child = newChild)
 }
+
+@ExpressionDescription(
+  usage = "_FUNC_(v) - Returns true if the variant is valid, false if it is 
malformed.",

Review Comment:
   The description doesn't mention the SQL NULL case. With `propagateNull = 
true` (`StaticInvoke` default), SQL NULL input returns SQL NULL — not `false`. 
Compare to `IsVariantNull`'s description (`...false otherwise (including in the 
case of SQL NULL)`). Suggest clarifying:
   
   ```suggestion
     usage = "_FUNC_(v) - Returns true if the variant is valid, false if it is 
malformed, NULL if `v` is NULL.",
   ```



##########
common/variant/src/main/java/org/apache/spark/types/variant/VariantUtil.java:
##########
@@ -589,6 +590,88 @@ public static <T> T handleArray(byte[] value, int pos, 
ArrayHandler<T> handler)
     return handler.apply(size, offsetSize, offsetStart, dataStart);
   }
 
+  // Validate whether a variant is well-formed. Returns true if the variant 
binary is structurally
+  // well-formed (all bounds and type-info checks pass), false if it is 
malformed.
+  //
+  // This is close to, but not strictly equivalent to, "`toJson` does not 
throw": this function
+  // does not enforce the `SIZE_LIMIT` check that the `Variant` constructor 
applies (which throws
+  // `VARIANT_CONSTRUCTOR_SIZE_LIMIT`). The implementation otherwise has the 
same structure as
+  // `toJson` (see `Variant.toJsonImpl`).
+  //
+  // Implementation note: this `try { ... } catch (SparkRuntimeException e)` 
is sound only because
+  // every helper invoked by `validateImpl` throws `MALFORMED_VARIANT` /
+  // `UNKNOWN_PRIMITIVE_TYPE_IN_VARIANT` rather than a raw 
`ArrayIndexOutOfBoundsException` on
+  // malformed input. Preserve that invariant when adding new cases.
+  public static boolean isValidVariant(byte[] value, byte[] metadata) {
+    if (value == null || metadata == null) return false;
+    // Validate the metadata version, similar to the check in the `Variant` 
constructor.
+    if (metadata.length < 1 || (metadata[0] & VERSION_MASK) != VERSION) return 
false;
+    try {
+      validateImpl(value, metadata, 0);
+      return true;
+    } catch (SparkRuntimeException e) {
+      return false;
+    }
+  }
+
+  private static void validateImpl(byte[] value, byte[] metadata, int pos) {
+    switch (getType(value, pos)) {

Review Comment:
   No `default` case — Java doesn't enforce exhaustive enum switches, so if a 
new entry is later added to `Type` and the maintainer forgets `validateImpl`, 
the validator silently no-ops on it and returns `true` for what is in fact 
malformed (or new) input. Adding `default: throw malformedVariant();` would 
surface the gap instead of hiding it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to