cloud-fan commented on code in PR #55599:
URL: https://github.com/apache/spark/pull/55599#discussion_r3184118148
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala:
##########
@@ -952,3 +952,34 @@ case class SchemaOfVariantAgg(
override protected def withNewChildInternal(newChild: Expression):
Expression =
copy(child = newChild)
}
+
+@ExpressionDescription(
+ usage = "_FUNC_(v) - Returns true if the variant is valid, false if it is
malformed.",
+ examples = """
+ Examples:
+ > SELECT _FUNC_(parse_json('null'));
+ true
+ > SELECT _FUNC_(parse_json('[{"b":true,"a":0}]'));
+ true
+ """,
+ since = "4.2.0",
+ group = "variant_funcs"
+)
+case class IsValidVariant(child: Expression) extends UnaryExpression
Review Comment:
Peer variant predicates have typed wrappers: `is_variant_null` and
`to_variant_object` are in both
`sql/api/src/main/scala/org/apache/spark/sql/functions.scala` and
`python/pyspark/sql/functions/builtin.py`. Without them, DataFrame and PySpark
users have to fall back to `expr("is_valid_variant(col)")`. Worth adding both
in this PR for parity (Connect coverage will follow automatically).
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/variant/variantExpressions.scala:
##########
@@ -952,3 +952,34 @@ case class SchemaOfVariantAgg(
override protected def withNewChildInternal(newChild: Expression):
Expression =
copy(child = newChild)
}
+
+@ExpressionDescription(
+ usage = "_FUNC_(v) - Returns true if the variant is valid, false if it is
malformed.",
Review Comment:
The description doesn't mention the SQL NULL case. With `propagateNull =
true` (`StaticInvoke` default), SQL NULL input returns SQL NULL — not `false`.
Compare to `IsVariantNull`'s description (`...false otherwise (including in the
case of SQL NULL)`). Suggest clarifying:
```suggestion
usage = "_FUNC_(v) - Returns true if the variant is valid, false if it is
malformed, NULL if `v` is NULL.",
```
##########
common/variant/src/main/java/org/apache/spark/types/variant/VariantUtil.java:
##########
@@ -589,6 +590,88 @@ public static <T> T handleArray(byte[] value, int pos,
ArrayHandler<T> handler)
return handler.apply(size, offsetSize, offsetStart, dataStart);
}
+ // Validate whether a variant is well-formed. Returns true if the variant
binary is structurally
+ // well-formed (all bounds and type-info checks pass), false if it is
malformed.
+ //
+ // This is close to, but not strictly equivalent to, "`toJson` does not
throw": this function
+ // does not enforce the `SIZE_LIMIT` check that the `Variant` constructor
applies (which throws
+ // `VARIANT_CONSTRUCTOR_SIZE_LIMIT`). The implementation otherwise has the
same structure as
+ // `toJson` (see `Variant.toJsonImpl`).
+ //
+ // Implementation note: this `try { ... } catch (SparkRuntimeException e)`
is sound only because
+ // every helper invoked by `validateImpl` throws `MALFORMED_VARIANT` /
+ // `UNKNOWN_PRIMITIVE_TYPE_IN_VARIANT` rather than a raw
`ArrayIndexOutOfBoundsException` on
+ // malformed input. Preserve that invariant when adding new cases.
+ public static boolean isValidVariant(byte[] value, byte[] metadata) {
+ if (value == null || metadata == null) return false;
+ // Validate the metadata version, similar to the check in the `Variant`
constructor.
+ if (metadata.length < 1 || (metadata[0] & VERSION_MASK) != VERSION) return
false;
+ try {
+ validateImpl(value, metadata, 0);
+ return true;
+ } catch (SparkRuntimeException e) {
+ return false;
+ }
+ }
+
+ private static void validateImpl(byte[] value, byte[] metadata, int pos) {
+ switch (getType(value, pos)) {
Review Comment:
No `default` case — Java doesn't enforce exhaustive enum switches, so if a
new entry is later added to `Type` and the maintainer forgets `validateImpl`,
the validator silently no-ops on it and returns `true` for what is in fact
malformed (or new) input. Adding `default: throw malformedVariant();` would
surface the gap instead of hiding it.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]