cloud-fan commented on code in PR #55631:
URL: https://github.com/apache/spark/pull/55631#discussion_r3206797560
##########
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/util/Geography.java:
##########
@@ -130,8 +130,13 @@ public static Geography fromEwkt(byte[] ewkt) {
/** Geography binary standard format converters: WKB and EWKB. */
@Override
- public byte[] toWkb() {
- return toWkbInternal(DEFAULT_ENDIANNESS);
+ public byte[] toWkb(String endianness) {
Review Comment:
Reinforcing the layer concern from last round — the hardcoded
`"st_asbinary"` argument is gone, but the deeper issue (validation living here
at all) wasn't addressed. The validation block is byte-identical in
`Geometry.java:133-140`, and the `Geo` interface itself now declares `byte[]
toWkb(String endianness)` (`Geo.java:61`) — extending the Geo type
abstraction's contract to include SQL surface tokens (`'NDR'`/`'XDR'`).
The peer feature `ST_GeomFromWKB` validates its second arg in
`STUtils.stGeomFromWKB(byte[], int)` (`STUtils.java:154`) — that's the
established location for ST argument validation. Following the same pattern
would (a) collapse the duplication, (b) keep `Geography`/`Geometry` purely
`ByteOrder`-typed, and (c) let the 1-arg `STUtils.stAsBinary(geo)` call
`.toWkb(ByteOrder.LITTLE_ENDIAN)` directly without a string-equality round-trip
on the parquet writer's per-row hot path (`ParquetWriteSupport.scala:283,290`).
Sketch:
```java
// In STUtils.java
private static ByteOrder parseEndianness(UTF8String endianness) {
String s = endianness.toString();
if (s.equalsIgnoreCase("NDR")) return ByteOrder.LITTLE_ENDIAN;
if (s.equalsIgnoreCase("XDR")) return ByteOrder.BIG_ENDIAN;
throw QueryExecutionErrors.stInvalidArgumentErrorInvalidEndiannessValue(s);
}
public static byte[] stAsBinary(GeographyVal geo) {
return fromPhysVal(geo).toWkb(ByteOrder.LITTLE_ENDIAN);
}
public static byte[] stAsBinary(GeographyVal geo, UTF8String endianness) {
return fromPhysVal(geo).toWkb(parseEndianness(endianness));
}
```
Then drop `toWkb(String)` from `Geo.java`, `Geography.java`, and
`Geometry.java`.
##########
sql/api/src/main/scala/org/apache/spark/sql/functions.scala:
##########
@@ -11113,6 +11113,18 @@ object functions {
def st_asbinary(geo: Column): Column =
Column.fn("st_asbinary", geo)
+ /**
+ * Returns the input GEOGRAPHY or GEOMETRY value in WKB format using the
specified endianness.
+ *
+ * @group st_funcs
+ * @since 4.2.0
+ */
+ def st_asbinary(geo: Column, endianness: Column): Column =
+ Column.fn("st_asbinary", geo, endianness)
+
+ def st_asbinary(geo: Column, endianness: String): Column =
+ Column.fn("st_asbinary", geo, lit(endianness))
Review Comment:
This overload is public API but has no scaladoc, no `@since`, and no
`@group` — diverges from the universal convention in this file (e.g.
`decode`/`encode`, `tuple_intersection_agg_double`, `lpad`, including the
`Column`+`String` sibling of the `Column`+`Column` form just above). Late catch
— I should have flagged it last round.
```suggestion
/**
* Returns the input GEOGRAPHY or GEOMETRY value in WKB format using the
specified endianness.
*
* @group st_funcs
* @since 4.2.0
*/
def st_asbinary(geo: Column, endianness: String): Column =
Column.fn("st_asbinary", geo, lit(endianness))
```
##########
sql/core/src/test/scala/org/apache/spark/sql/STExpressionsSuite.scala:
##########
@@ -476,16 +476,38 @@ class STExpressionsSuite
test("ST_AsBinary") {
// Test data: WKB representation of POINT(1 2).
- val wkb =
Hex.unhex("0101000000000000000000F03F0000000000000040".getBytes())
- val wkbLiteral = Literal.create(wkb, BinaryType)
+ val wkbNdr =
Hex.unhex("0101000000000000000000F03F0000000000000040".getBytes())
+ val wkbXdr =
Hex.unhex("00000000013FF00000000000004000000000000000".getBytes())
+ val wkbLiteral = Literal.create(wkbNdr, BinaryType)
+ val endiannessNdr = Literal.create("NDR")
+ val endiannessXdr = Literal.create("XDR")
// ST_GeogFromWKB and ST_AsBinary.
val geographyExpression = ST_GeogFromWKB(wkbLiteral)
assert(geographyExpression.dataType.sameType(defaultGeographyType))
- checkEvaluation(ST_AsBinary(geographyExpression), wkb)
+ checkEvaluation(new ST_AsBinary(geographyExpression), wkbNdr)
+ checkEvaluation(ST_AsBinary(geographyExpression, endiannessNdr), wkbNdr)
+ checkEvaluation(ST_AsBinary(geographyExpression, endiannessXdr), wkbXdr)
// ST_GeomFromWKB and ST_AsBinary.
val geometryExpression = new ST_GeomFromWKB(wkbLiteral)
assert(geometryExpression.dataType.sameType(defaultGeometryType))
- checkEvaluation(ST_AsBinary(geometryExpression), wkb)
+ checkEvaluation(new ST_AsBinary(geometryExpression), wkbNdr)
+ checkEvaluation(ST_AsBinary(geometryExpression, endiannessNdr), wkbNdr)
+ checkEvaluation(ST_AsBinary(geometryExpression, endiannessXdr), wkbXdr)
+ // Test NULL handling.
+ checkEvaluation(new ST_AsBinary(Literal.create(null,
defaultGeographyType)), null)
+ checkEvaluation(ST_AsBinary(Literal.create(null, defaultGeographyType),
endiannessNdr), null)
+ checkEvaluation(new ST_AsBinary(Literal.create(null,
defaultGeometryType)), null)
+ checkEvaluation(ST_AsBinary(Literal.create(null, defaultGeometryType),
endiannessXdr), null)
+ // Test invalid endianness.
+ Seq(geographyExpression, geometryExpression).foreach { expr =>
+ checkError(
+ exception = intercept[SparkIllegalArgumentException] {
+ ST_AsBinary(expr, Literal.create("ABC")).eval()
+ },
+ condition = "ST_INVALID_ENDIANNESS_VALUE",
+ parameters = Map("endianness" -> "ABC")
+ )
+ }
Review Comment:
Two cases I asked for last round are still missing — the new tests cover
invalid string and NULL-geo + various endianness, but not:
1. **Mixed-case endianness**: validation uses `equalsIgnoreCase`, so
`'ndr'`/`'xdr'`/`'NdR'` should round-trip successfully. With no test, a future
change that swaps `equalsIgnoreCase` for `equals` would break user queries
silently.
2. **Non-null geo + NULL endianness Literal**: confirms
`RuntimeReplaceable`/`StaticInvoke` null-propagation on the second arg.
Suggested additions:
```scala
// Mixed-case endianness (validation is case-insensitive).
checkEvaluation(ST_AsBinary(geographyExpression, Literal.create("ndr")),
wkbNdr)
checkEvaluation(ST_AsBinary(geometryExpression, Literal.create("XdR")),
wkbXdr)
// NULL endianness propagates through StaticInvoke.
checkEvaluation(
ST_AsBinary(geographyExpression, Literal.create(null, StringType)),
null)
checkEvaluation(
ST_AsBinary(geometryExpression, Literal.create(null, StringType)),
null)
```
Worth adding a mixed-case row to `sql-tests/inputs/st-functions.sql` too,
for end-to-end coverage.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]