Copilot commented on code in PR #2658:
URL: https://github.com/apache/sedona/pull/2658#discussion_r2824419867


##########
common/src/main/java/org/apache/sedona/common/FunctionsProj4.java:
##########
@@ -62,6 +67,93 @@ public class FunctionsProj4 {
   private static final Pattern EPSG_PATTERN =
       Pattern.compile("^EPSG:(\\d+)$", Pattern.CASE_INSENSITIVE);
 
+  /** Name used for the registered URL CRS provider. */
+  private static final String URL_CRS_PROVIDER_NAME = "sedona-url-crs";
+
+  /**
+   * Tracks the currently registered URL CRS provider config (baseUrl + "|" + 
pathTemplate + "|" +
+   * format). Null means no provider registered yet. Uses AtomicReference for 
thread-safe lazy
+   * initialization on executors.
+   */
+  private static final AtomicReference<String> registeredUrlCrsConfig = new 
AtomicReference<>(null);
+
+  /**
+   * Register a URL-based CRS provider with proj4sedona's Defs registry. This 
provider will be
+   * consulted before the built-in provider when resolving EPSG codes.
+   *
+   * <p>This method is safe to call concurrently from multiple threads — it 
uses double-checked
+   * locking so the fast path (already registered with the same config) is 
lock-free, and the
+   * synchronized slow path executes at most once per JVM (or once per config 
change).
+   *
+   * @param baseUrl The base URL of the CRS definition server
+   * @param pathTemplate The URL path template (e.g., 
"/{authority}/{code}.json")
+   * @param format The expected response format: "projjson", "proj", "wkt1", 
or "wkt2"
+   */
+  public static void registerUrlCrsProvider(String baseUrl, String 
pathTemplate, String format) {
+    if (baseUrl == null || baseUrl.isEmpty()) {

Review Comment:
   `registerUrlCrsProvider` returns early when `baseUrl` is null/empty, which 
means a previously-registered `sedona-url-crs` provider will remain active even 
after users set `spark.sedona.crs.url.base` back to empty. This breaks the 
documented/expected "disable" behavior and can cause unexpected remote HTTP 
lookups in later queries within the same executor JVM. Consider treating 
null/empty `baseUrl` as a signal to unregister the provider (if present) and 
reset `registeredUrlCrsConfig` to null in a thread-safe way.
   ```suggestion
       if (baseUrl == null || baseUrl.isEmpty()) {
         // Treat null/empty baseUrl as a request to disable the URL CRS 
provider.
         synchronized (registeredUrlCrsConfig) {
           String current = registeredUrlCrsConfig.get();
           if (current != null) {
             Defs.removeProvider(URL_CRS_PROVIDER_NAME);
             registeredUrlCrsConfig.set(null);
           }
         }
   ```



##########
common/src/main/java/org/apache/sedona/common/FunctionsProj4.java:
##########
@@ -62,6 +67,93 @@ public class FunctionsProj4 {
   private static final Pattern EPSG_PATTERN =
       Pattern.compile("^EPSG:(\\d+)$", Pattern.CASE_INSENSITIVE);
 
+  /** Name used for the registered URL CRS provider. */
+  private static final String URL_CRS_PROVIDER_NAME = "sedona-url-crs";
+
+  /**
+   * Tracks the currently registered URL CRS provider config (baseUrl + "|" + 
pathTemplate + "|" +
+   * format). Null means no provider registered yet. Uses AtomicReference for 
thread-safe lazy
+   * initialization on executors.
+   */
+  private static final AtomicReference<String> registeredUrlCrsConfig = new 
AtomicReference<>(null);
+
+  /**
+   * Register a URL-based CRS provider with proj4sedona's Defs registry. This 
provider will be
+   * consulted before the built-in provider when resolving EPSG codes.
+   *
+   * <p>This method is safe to call concurrently from multiple threads — it 
uses double-checked
+   * locking so the fast path (already registered with the same config) is 
lock-free, and the
+   * synchronized slow path executes at most once per JVM (or once per config 
change).
+   *
+   * @param baseUrl The base URL of the CRS definition server
+   * @param pathTemplate The URL path template (e.g., 
"/{authority}/{code}.json")
+   * @param format The expected response format: "projjson", "proj", "wkt1", 
or "wkt2"
+   */
+  public static void registerUrlCrsProvider(String baseUrl, String 
pathTemplate, String format) {
+    if (baseUrl == null || baseUrl.isEmpty()) {
+      return;
+    }
+
+    String configKey = baseUrl + "|" + pathTemplate + "|" + format;
+
+    // Fast path (lock-free): already registered with the same config.
+    // This handles 99.999%+ of calls with just a volatile read + 
String.equals().
+    if (configKey.equals(registeredUrlCrsConfig.get())) {
+      return;
+    }

Review Comment:
   The `configKey` uses the raw `format` string, but `parseCrsFormat` is 
case-insensitive and also defaults unknown/empty formats to PROJJSON. As a 
result, logically equivalent configs like `format=PROJJSON` vs `projjson` (or 
`null` vs `projjson`) will force unnecessary remove/re-register cycles. 
Canonicalizing `format` (e.g., lowercasing and/or using the parsed enum name) 
when building `configKey` would avoid redundant registrations and reduce churn 
in the Defs registry.



##########
docs/api/sql/CRS-Transformation.md:
##########
@@ -200,6 +200,172 @@ SELECT ST_Transform(
 ) AS transformed_point
 ```
 
+## URL CRS Provider
+
+Since v1.9.0, Sedona supports resolving CRS definitions from a remote HTTP 
server. This is useful when you need custom or internal CRS definitions that 
are not included in the built-in database, or when you want to use your own CRS 
definition service.
+
+When configured, the URL provider is consulted **before** the built-in CRS 
database. If the URL provider returns a valid CRS definition, it is used 
directly. If the URL returns a 404 or an error, Sedona falls back to the 
built-in definitions.
+
+### Hosting CRS definitions
+
+You can host your custom CRS definitions on any HTTP-accessible location. Two 
common approaches:
+
+- **GitHub repository**: Store CRS definition files in a public GitHub repo 
and use the raw content URL. This is the easiest way to get started — no server 
infrastructure required.
+- **Public S3 bucket**: Upload CRS definition files to an Amazon S3 bucket 
with public read access and use the S3 static website URL or CloudFront 
distribution.
+
+Each file should contain a single CRS definition in the format you specify via 
`spark.sedona.crs.url.format` (PROJJSON, PROJ string, WKT1, or WKT2).
+
+### Configuration
+
+Set the following Spark configuration properties when creating your Sedona 
session:
+
+```python
+config = (
+    SedonaContext.builder()
+    .config("spark.sedona.crs.url.base", "https://crs.example.com";)
+    .config("spark.sedona.crs.url.pathTemplate", "/{authority}/{code}.json")
+    .config("spark.sedona.crs.url.format", "projjson")
+    .getOrCreate()
+)
+sedona = SedonaContext.create(config)
+```
+
+With the default path template, resolving `EPSG:4326` will fetch:
+
+```
+https://crs.example.com/epsg/4326.json
+```
+
+Only `spark.sedona.crs.url.base` is required. The other two properties have 
sensible defaults (`/{authority}/{code}.json` and `projjson`).
+
+### Supported response formats
+
+| Format value | Description | Content example |
+|-------------|-------------|----------------|
+| `projjson` | PROJJSON (default) | `{"type": "GeographicCRS", ...}` |
+| `proj` | PROJ string | `+proj=longlat +datum=WGS84 +no_defs` |
+| `wkt1` | OGC WKT1 | `GEOGCS["WGS 84", ...]` |
+| `wkt2` | ISO 19162 WKT2 | `GEOGCRS["WGS 84", ...]` |
+
+### Example: GitHub repository
+
+Suppose you have a GitHub repo `myorg/crs-definitions` with the following 
structure:
+
+```
+crs-definitions/
+  epsg/
+    990001.proj
+    990002.proj
+```
+
+where `epsg/990001.proj` contains a PROJ string like:
+
+```
++proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0 +x_0=0 +y_0=0 +k=1 
+units=m +no_defs
+```
+
+Point Sedona to the raw GitHub content URL:
+
+```python
+config = (
+    SedonaContext.builder()
+    .config(
+        "spark.sedona.crs.url.base",
+        "https://raw.githubusercontent.com/myorg/crs-definitions/main";,
+    )
+    .config("spark.sedona.crs.url.pathTemplate", "/epsg/{code}.proj")
+    .config("spark.sedona.crs.url.format", "proj")
+    .getOrCreate()
+)
+sedona = SedonaContext.create(config)
+
+# Resolves EPSG:990001 from:
+# https://raw.githubusercontent.com/myorg/crs-definitions/main/epsg/990001.proj
+sedona.sql("""
+    SELECT ST_Transform(
+        ST_GeomFromText('POINT(-122.4194 37.7749)'),
+        'EPSG:4326',
+        'EPSG:990001'
+    ) AS transformed_point
+""").show()
+```
+
+### Example: self-hosted CRS server
+
+```python
+config = (
+    SedonaContext.builder()
+    .config("spark.sedona.crs.url.base", "https://crs.mycompany.com";)
+    .config("spark.sedona.crs.url.pathTemplate", "/epsg/{code}.proj")
+    .config("spark.sedona.crs.url.format", "proj")
+    .getOrCreate()
+)
+sedona = SedonaContext.create(config)
+
+# Now ST_Transform will try https://crs.mycompany.com/epsg/3857.proj
+# before falling back to built-in definitions
+sedona.sql("""
+    SELECT ST_Transform(
+        ST_GeomFromText('POINT(-122.4194 37.7749)'),
+        'EPSG:4326',
+        'EPSG:3857'
+    ) AS transformed_point
+""").show()
+```
+
+### Example: custom authority codes
+
+The URL provider is especially useful for custom or internal authority codes 
that are not in any public database. With the default path template 
`/{authority}/{code}.json`, the `{authority}` placeholder is replaced by the 
authority name from the CRS string (lowercased):
+
+```python
+config = (
+    SedonaContext.builder()
+    .config("spark.sedona.crs.url.base", "https://crs.mycompany.com";)
+    .config("spark.sedona.crs.url.format", "proj")
+    .getOrCreate()
+)
+sedona = SedonaContext.create(config)
+
+# Resolves MYORG:1001 from:
+# https://crs.mycompany.com/myorg/1001.json
+sedona.sql("""
+    SELECT ST_Transform(
+        ST_GeomFromText('POINT(-122.4194 37.7749)'),
+        'EPSG:4326',
+        'MYORG:1001'
+    ) AS transformed_point
+""").show()
+```
+
+### Example: using geometry SRID with URL provider
+
+If the geometry already has an SRID set (e.g., via `ST_SetSRID`), you can omit 
the source CRS parameter. The source CRS is derived from the geometry's SRID as 
an EPSG code:
+
+```python
+config = (
+    SedonaContext.builder()
+    .config("spark.sedona.crs.url.base", "https://crs.mycompany.com";)
+    .config("spark.sedona.crs.url.format", "proj")
+    .getOrCreate()
+)
+sedona = SedonaContext.create(config)
+
+# The source CRS is taken from the geometry's SRID (4326 → EPSG:4326).
+# Only the target CRS string is needed.
+sedona.sql("""
+    SELECT ST_Transform(
+        ST_SetSRID(ST_GeomFromText('POINT(-122.4194 37.7749)'), 4326),
+        'EPSG:3857'
+    ) AS transformed_point
+""").show()
+```
+
+### Disabling the URL provider
+
+To disable, omit `spark.sedona.crs.url.base` or set it to an empty string (the 
default).

Review Comment:
   This section says setting `spark.sedona.crs.url.base` to an empty string 
disables the URL provider, but the current registration logic only ever 
registers (it never unregisters when the config is cleared). Either update the 
implementation to actually unregister on disable, or adjust the docs to clarify 
that the provider remains registered for the lifetime of the executor JVM once 
enabled.
   ```suggestion
   To avoid enabling the URL provider, omit `spark.sedona.crs.url.base` or 
leave it as an empty string (the default) when the executor JVM starts.
   Note that once a URL provider has been registered in an executor JVM, later 
clearing this configuration or setting it back to an empty string does **not** 
unregister it; the provider remains active for the lifetime of that executor 
JVM.
   ```



##########
spark/common/src/test/scala/org/apache/sedona/sql/CRSTransformProj4Test.scala:
##########
@@ -855,4 +858,121 @@ class CRSTransformProj4Test extends TestBaseScala {
       assertEquals("All 40 points should transform successfully", 40, 
successCount)
     }
   }
+
+  describe("URL CRS Provider config integration") {
+
+    it("should still transform correctly when URL provider is not configured") 
{
+      // Verify default behavior (no URL provider) still works
+      sparkSession.conf.set("spark.sedona.crs.url.base", "")
+      val result = sparkSession
+        .sql("SELECT ST_Transform(ST_SetSRID(ST_GeomFromWKT('POINT (-122.4194 
37.7749)'), 4326), 'EPSG:4326', 'EPSG:3857')")
+        .first()
+        .getAs[Geometry](0)
+
+      assertNotNull(result)
+      assertEquals(3857, result.getSRID)
+      assertEquals(-13627665.27, result.getCoordinate.x, COORD_TOLERANCE)
+      assertEquals(4547675.35, result.getCoordinate.y, COORD_TOLERANCE)
+    }
+
+    it("should fall back to built-in when URL provider returns nothing") {
+      // Point to a non-existent server — provider will fail, should fall back 
to built-in
+      sparkSession.conf.set("spark.sedona.crs.url.base", "http://127.0.0.1:1";)
+      sparkSession.conf.set("spark.sedona.crs.url.pathTemplate", 
"/epsg/{code}.json")
+      sparkSession.conf.set("spark.sedona.crs.url.format", "projjson")
+      try {
+        val result = sparkSession
+          .sql("SELECT ST_Transform(ST_SetSRID(ST_GeomFromWKT('POINT 
(-122.4194 37.7749)'), 4326), 'EPSG:4326', 'EPSG:3857')")
+          .first()
+          .getAs[Geometry](0)
+
+        // Should succeed via built-in fallback
+        assertNotNull(result)
+        assertEquals(3857, result.getSRID)
+        assertEquals(-13627665.27, result.getCoordinate.x, COORD_TOLERANCE)
+        assertEquals(4547675.35, result.getCoordinate.y, COORD_TOLERANCE)
+      } finally {
+        sparkSession.conf.set("spark.sedona.crs.url.base", "")
+        org.datasyslab.proj4sedona.defs.Defs.removeProvider("sedona-url-crs")
+      }
+    }
+
+    it("should register URL CRS provider when config is set") {
+      sparkSession.conf.set("spark.sedona.crs.url.base", 
"https://test.example.com";)
+      sparkSession.conf.set("spark.sedona.crs.url.pathTemplate", 
"/epsg/{code}.json")
+      sparkSession.conf.set("spark.sedona.crs.url.format", "projjson")
+      try {
+        // Force a transform to trigger provider registration
+        val result = sparkSession
+          .sql("SELECT ST_Transform(ST_SetSRID(ST_GeomFromWKT('POINT 
(-122.4194 37.7749)'), 4326), 'EPSG:4326', 'EPSG:3857')")
+          .first()
+          .getAs[Geometry](0)
+
+        assertNotNull(result)
+
+        // Verify provider was registered
+        val providers = org.datasyslab.proj4sedona.defs.Defs.getProviders
+        val found = providers.stream().anyMatch(p => p.getName == 
"sedona-url-crs")
+        assertTrue("sedona-url-crs provider should be registered", found)
+      } finally {
+        sparkSession.conf.set("spark.sedona.crs.url.base", "")
+        org.datasyslab.proj4sedona.defs.Defs.removeProvider("sedona-url-crs")
+      }
+    }
+
+    it("should transform using local HTTP URL CRS provider with custom CRS") {
+      // Serve a deliberately wrong CRS definition for fake EPSG:990001 that no
+      // built-in provider knows. Uses Mercator with absurd false 
easting/northing.
+      // If the transform succeeds with shifted coordinates, the URL provider 
was used.
+      // If the URL provider didn't work, the transform would fail entirely.
+      val requestCount = new AtomicInteger(0)
+      val server = HttpServer.create(new InetSocketAddress(0), 0)
+      val port = server.getAddress.getPort
+
+      // Web Mercator with intentional 10M/20M false easting/northing
+      val weirdMercator =
+        "+proj=merc +a=6378137 +b=6378137 +lat_ts=0 +lon_0=0" +
+          " +x_0=10000000 +y_0=20000000 +k=1 +units=m +no_defs"
+
+      server.createContext(
+        "/epsg/",
+        exchange => {
+          val path = exchange.getRequestURI.getPath
+          if (path.contains("990001")) {
+            requestCount.incrementAndGet()
+            val body = weirdMercator.getBytes("UTF-8")
+            exchange.sendResponseHeaders(200, body.length)
+            exchange.getResponseBody.write(body)

Review Comment:
   Using `weirdMercator.getBytes("UTF-8")` relies on a string-typed charset 
name. Prefer `StandardCharsets.UTF_8` (or Scala equivalent) to avoid 
checked/legacy charset-name handling and make the encoding choice compile-time 
safe.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to