Tristan Groenewold created SEDONA-744:
-----------------------------------------

             Summary: Sedona Geometry/Geography Spark UDTs not compatible with 
Iceberg v3 native Geometry/Geography types
                 Key: SEDONA-744
                 URL: https://issues.apache.org/jira/browse/SEDONA-744
             Project: Apache Sedona
          Issue Type: Bug
         Environment: "Spark": {
  "id": "Spark",
  "name": "Spark",
  "group": "spark",
  "properties": {
    "SPARK_HOME": {
      "name": "SPARK_HOME",
      "value": "/opt/spark/",
      "type": "string"
    },
    "spark.master": {
      "name": "spark.master",
      "value": "local[*]",
      "type": "string"
    },
    "spark.submit.deployMode": {
      "name": "spark.submit.deployMode",
      "value": "client",
      "type": "string"
    },
    "spark.app.name": {
      "name": "spark.app.name",
      "value": "Zeppelin",
      "type": "string"
    },
    "spark.driver.cores": { "value": "8" },
    "spark.driver.memory": { "value": "32g" },
    "spark.executor.cores": { "value": "8" },
    "spark.executor.memory": { "value": "24g" },
    "spark.executor.instances": { "value": "1" },
    "spark.jars": {
      "value": 
"/opt/spark/jars/iceberg-spark-runtime-3.5_2.12-1.9.2.jar,/opt/spark/jars/sedona-spark-shaded-3.5_2.12-1.8.0.jar,/opt/spark/jars/geotools-wrapper-1.8.0-33.1.jar"
    },
    "zeppelin.spark.useHiveContext": { "value": "true" },
    "spark.sql.extensions": {
      "value": 
"org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.apache.sedona.sql.SedonaSqlExtensions"
    },
    "spark.sql.catalog.iceberg.uri": {
      "value": "thrift://hive-metastore:9083"
    },
    "spark.sql.catalog.iceberg.warehouse": {
      "value": "ofs://omservice/warehouse/iceberg"
    },

    ### 🔒 Trimmed secrets ###
    "spark.hadoop.javax.jdo.option.ConnectionUserName": {
      "value": "***REDACTED***"
    },
    "spark.hadoop.javax.jdo.option.ConnectionURL": {
      "value": "***REDACTED***"
    },
    "spark.hadoop.javax.jdo.option.ConnectionPassword": {
      "value": "***REDACTED***"
    },
    "spark.hive.metastore.uris": {
      "value": "thrift://hive-metastore:9083"
    },
    "spark.sql.hive.metastore.version": { "value": "3.1.3" },
    "spark.sql.hive.metastore.jars": { "value": "maven" },
    "spark.sql.hive.metastore.sharedPrefixes": {
      "value": 
"org.postgresql,org.apache.hadoop.hive.ql.io.parquet,org.apache.hadoop.hive.serde2"
    },
    "spark.sql.hive.convertMetastoreParquet": { "value": "false" },
    "spark.sql.hive.convertMetastoreOrc": { "value": "false" },
    "spark.sql.hive.caseSensitiveInferenceMode": { "value": "NEVER_INFER" },
    "spark.hadoop.fs.ofs.impl": {
      "value": "org.apache.hadoop.fs.ozone.RootedOzoneFileSystem"
    },
    "spark.hadoop.fs.AbstractFileSystem.ofs.impl": {
      "value": "org.apache.hadoop.fs.ozone.OzoneFileSystem"
    },
    "spark.hadoop.fs.defaultFS": { "value": "ofs://omservice/" },
    "spark.sql.warehouse.dir": { "value": "ofs://omservice/warehouse/spark" },
    "spark.executor.heartbeatInterval": { "value": "120s" },
    "spark.shuffle.service.enabled": { "value": "false" },
    "spark.dynamicAllocation.enabled": { "value": "false" },
    "spark.driver.bindAddress": { "value": "0.0.0.0" },
    "spark.network.timeout": { "value": "1200s" },
    "spark.sql.shuffle.partitions": { "value": "50" },
    "spark.sql.adaptive.enabled": { "value": "true" },
    "spark.sql.adaptive.coalescePartitions.enabled": { "value": "true" },
    "spark.storage.blockManagerSlaveTimeoutMs": { "value": "1200000" },
    "spark.shuffle.registration.timeout": { "value": "120000" },
    "spark.shuffle.registration.maxAttempts": { "value": "8" },
    "spark.sql.iceberg.vectorization.enabled": { "value": "true" }
  }
}

            Reporter: Tristan Groenewold
             Fix For: 1.8.0


When using Sedona with Spark 3.5 and Iceberg v3 tables, creating an Iceberg 
table with a {{geometry}} or {{geography}} column fails. Iceberg v3 defines these 
as native types, but Sedona registers them as Spark User-Defined Types (UDTs). 
Spark’s SQL layer rejects UDTs in DDL for Iceberg tables with the error:

```
pyspark.errors.exceptions.captured.UnsupportedOperationException: User-defined types are not supported
```

*Reproduction Code*

```python
%Spark.pyspark

from sedona.spark import *
from sedona.register import SedonaRegistrator
from pyspark.sql.functions import expr

sedona = SedonaContext.create(spark)
SedonaRegistrator.registerAll(spark)

# Create Iceberg v3 table with geometry column
spark.sql("""
CREATE TABLE iceberg.geo.icetable2 (id string, geometry geometry)
USING iceberg
TBLPROPERTIES('format-version'='3')
""")
```

*Observed Behavior*
Fails with {{User-defined types are not supported}}.

*Expected Behavior*
Sedona geometries should be writable/readable as Iceberg v3 native {{geometry}} 
(and eventually {{geography}}) columns.

*Possible Approaches*
 * Align Sedona’s UDT registration with Spark’s logical type system so that 
Iceberg recognizes {{geometry}}/{{geography}} as native types.

 * Provide a type mapping bridge layer: Sedona UDT ↔ Iceberg v3 type.

 * Add explicit serializers/deserializers for Iceberg’s geometry type.

 * Prior community approach (compatible with Spark 3.1, 3.2, and 3.3 only): 
https://github.com/spatialx-project/sedona-iceberg-extension/
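
*Possible Interim Workaround (sketch)*
Until one of the approaches above is available, it may be possible to sidestep the UDT check by storing geometries as WKB in a plain {{binary}} column and converting on write and read. The following is a minimal sketch under that assumption, not a tested fix: the helper names, the {{geom}} column name, and the table and view names in the usage comments are hypothetical and not part of Sedona or Iceberg. {{ST_AsBinary}} and {{ST_GeomFromWKB}} are existing Sedona SQL functions.

```python
# Hypothetical helpers (not part of Sedona or Iceberg): keep the Sedona UDT
# out of the Iceberg DDL by storing geometries as WKB in a binary column.

def wkb_table_ddl(table, geom_col="geom"):
    """Build a CREATE TABLE statement that uses binary instead of geometry."""
    return (
        f"CREATE TABLE {table} (id string, {geom_col} binary) "
        "USING iceberg TBLPROPERTIES('format-version'='3')"
    )


def write_as_wkb_sql(table, source_view, geom_col="geom"):
    """Build an INSERT that serializes Sedona geometries to WKB on write."""
    return (
        f"INSERT INTO {table} "
        f"SELECT id, ST_AsBinary({geom_col}) AS {geom_col} FROM {source_view}"
    )


def read_as_geometry_sql(table, geom_col="geom"):
    """Build a SELECT that restores Sedona geometries from WKB on read."""
    return f"SELECT id, ST_GeomFromWKB({geom_col}) AS {geom_col} FROM {table}"


# Usage in the notebook (assumes a temp view `points` with columns id and geom,
# where geom is a Sedona geometry; both names are illustrative):
# spark.sql(wkb_table_ddl("iceberg.geo.icetable2_wkb"))
# spark.sql(write_as_wkb_sql("iceberg.geo.icetable2_wkb", "points"))
# df = spark.sql(read_as_geometry_sql("iceberg.geo.icetable2_wkb"))
```

This only avoids declaring the UDT in the table schema; the column is opaque bytes to Iceberg, so it does not provide the native {{geometry}} semantics requested above.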



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
