cloud-fan commented on code in PR #55790:
URL: https://github.com/apache/spark/pull/55790#discussion_r3226725866


##########
docs/sql-migration-guide.md:
##########
@@ -31,6 +31,7 @@ license: |
 - Since Spark 4.2, Spark enables order-independent checksums for shuffle 
outputs by default to detect data inconsistencies during indeterminate shuffle 
stage retries. If a checksum mismatch is detected, Spark rolls back and 
re-executes all succeeding stages that depend on the shuffle output. If rolling 
back is not possible for some succeeding stages, the job will fail. To restore 
the previous behavior, set `spark.sql.shuffle.orderIndependentChecksum.enabled` 
and `spark.sql.shuffle.orderIndependentChecksum.enableFullRetryOnMismatch` to 
`false`.
 - Since Spark 4.2, support for Derby JDBC datasource is deprecated.
 - Since Spark 4.2, a new default method `mergeWith` has been added to the 
`CustomTaskMetric` interface. The default implementation sums the two metric 
values, which is correct for count-type metrics. Data source connector 
implementations that report non-additive metrics (e.g., maximum, average, 
compression ratio, or gauge values) must override `mergeWith` to provide 
correct merge semantics.
+- Since Spark 4.2, the geospatial `GEOMETRY` and `GEOGRAPHY` types and the 
corresponding `ST_*` functions are enabled. See [Geospatial 
(Geometry/Geography) Types](sql-ref-geospatial-types.html) for additional 
details.

Review Comment:
   I'd suggest dropping this bullet. The 4.1→4.2 entries above it all describe 
behavior changes that affect *existing* workloads (default-on order-independent 
checksums; new `CustomTaskMetric.mergeWith` contract for connectors; Derby JDBC 
deprecation), whereas turning on a new opt-in feature is purely additive:
   
   - `GEOMETRY` / `GEOGRAPHY` are non-reserved keywords in both ANSI and 
non-ANSI mode (`SqlBaseParser.g4:2048-2049, :2456-2457`), so they don't collide 
with existing identifiers.
   - `st_*` registry entries don't change resolution for code that doesn't call 
them.
   - The PR description correctly answers "Does this PR introduce any 
user-facing change? **No.**" — which is at odds with adding a migration bullet, 
since the migration guide is by definition for user-visible changes.
   
   This feels like release-notes territory rather than the migration guide. 
Happy to be talked out of it if there's a specific scenario I'm missing where 
an existing workload changes behavior at 4.2.



##########
docs/sql-ref-datatypes.md:
##########
@@ -95,8 +95,8 @@ Spark SQL and DataFrames support the following data types:
 
 * Spatial types
   Spatial objects as defined in the [OGC Simple Feature 
Access](https://portal.ogc.org/files/?artifact_id=25355) specification.
-  - `GeometryType`: Represents GEOMETRY values—spatial objects in a Cartesian 
coordinate system. The type can be fixed to a single SRID, e.g. 
`geometry(4326)`, or allow mixed SRIDs with `geometry(any)`. Default SRID when 
not specified is 4326 (WGS 84).
-  - `GeographyType`: Represents GEOGRAPHY values—spatial objects in a 
geographic coordinate system (latitude/longitude). Edge interpolation is always 
SPHERICAL. The type can be fixed to a single SRID, e.g. `geography(4326)`, or 
allow mixed SRIDs with `geography(any)`. Default SRID is 4326 (WGS 84).
+  - `GeometryType`: Represents GEOMETRY values, spatial objects in a Cartesian 
coordinate system. The type must be fixed to a single SRID, e.g. 
`geometry(4326)`, or allow mixed SRIDs with `geometry(any)`. In SQL, `GEOMETRY` 
columns must always be declared with an explicit SRID or `ANY`.
+  - `GeographyType`: Represents GEOGRAPHY values, spatial objects in a 
geographic coordinate system (latitude/longitude). Edge interpolation is always 
SPHERICAL. The type must be fixed to a single geographic SRID, e.g. 
`geography(4326)`, or allow mixed SRIDs with `geography(any)`. In SQL, 
`GEOGRAPHY` columns must always be declared with an explicit SRID or `ANY`.

Review Comment:
   Minor wording: the "must be fixed ... or allow mixed SRIDs" parallelism 
reads awkwardly — `must` is a stronger commitment than the `or allow` 
alternative accepts. The original `can be fixed ... or allow ...` was natural 
English and didn't lose any meaning, because the very next sentence already 
states the mandatory part ("In SQL, `GEOMETRY` columns must always be declared 
..."). Suggest reverting that verb.
   
   ```suggestion
     - `GeometryType`: Represents GEOMETRY values, spatial objects in a 
Cartesian coordinate system. The type can be fixed to a single SRID, e.g. 
`geometry(4326)`, or allow mixed SRIDs with `geometry(any)`. In SQL, `GEOMETRY` 
columns must always be declared with an explicit SRID or `ANY`.
     - `GeographyType`: Represents GEOGRAPHY values, spatial objects in a 
geographic coordinate system (latitude/longitude). Edge interpolation is always 
SPHERICAL. The type can be fixed to a single geographic SRID, e.g. 
`geography(4326)`, or allow mixed SRIDs with `geography(any)`. In SQL, 
`GEOGRAPHY` columns must always be declared with an explicit SRID or `ANY`.
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to