Tristan Groenewold created SEDONA-744:
-----------------------------------------
Summary: Sedona Geometry/Geography Spark UDTs not compatible with
Iceberg v3 native Geometry/Geography types
Key: SEDONA-744
URL: https://issues.apache.org/jira/browse/SEDONA-744
Project: Apache Sedona
Issue Type: Bug
Environment: "Spark": {
"id": "Spark",
"name": "Spark",
"group": "spark",
"properties": {
"SPARK_HOME": {
"name": "SPARK_HOME",
"value": "/opt/spark/",
"type": "string"
},
"spark.master": {
"name": "spark.master",
"value": "local[*]",
"type": "string"
},
"spark.submit.deployMode": {
"name": "spark.submit.deployMode",
"value": "client",
"type": "string"
},
"spark.app.name": {
"name": "spark.app.name",
"value": "Zeppelin",
"type": "string"
},
"spark.driver.cores": { "value": "8" },
"spark.driver.memory": { "value": "32g" },
"spark.executor.cores": { "value": "8" },
"spark.executor.memory": { "value": "24g" },
"spark.executor.instances": { "value": "1" },
"spark.jars": {
"value":
"/opt/spark/jars/iceberg-spark-runtime-3.5_2.12-1.9.2.jar,/opt/spark/jars/sedona-spark-shaded-3.5_2.12-1.8.0.jar,/opt/spark/jars/geotools-wrapper-1.8.0-33.1.jar"
},
"zeppelin.spark.useHiveContext": { "value": "true" },
"spark.sql.extensions": {
"value":
"org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions,org.apache.sedona.sql.SedonaSqlExtensions"
},
"spark.sql.catalog.iceberg.uri": {
"value": "thrift://hive-metastore:9083"
},
"spark.sql.catalog.iceberg.warehouse": {
"value": "ofs://omservice/warehouse/iceberg"
},
### 🔒 Trimmed secrets ###
"spark.hadoop.javax.jdo.option.ConnectionUserName": {
"value": "***REDACTED***"
},
"spark.hadoop.javax.jdo.option.ConnectionURL": {
"value": "***REDACTED***"
},
"spark.hadoop.javax.jdo.option.ConnectionPassword": {
"value": "***REDACTED***"
},
"spark.hive.metastore.uris": {
"value": "thrift://hive-metastore:9083"
},
"spark.sql.hive.metastore.version": { "value": "3.1.3" },
"spark.sql.hive.metastore.jars": { "value": "maven" },
"spark.sql.hive.metastore.sharedPrefixes": {
"value":
"org.postgresql,org.apache.hadoop.hive.ql.io.parquet,org.apache.hadoop.hive.serde2"
},
"spark.sql.hive.convertMetastoreParquet": { "value": "false" },
"spark.sql.hive.convertMetastoreOrc": { "value": "false" },
"spark.sql.hive.caseSensitiveInferenceMode": { "value": "NEVER_INFER" },
"spark.hadoop.fs.ofs.impl": {
"value": "org.apache.hadoop.fs.ozone.RootedOzoneFileSystem"
},
"spark.hadoop.fs.AbstractFileSystem.ofs.impl": {
"value": "org.apache.hadoop.fs.ozone.OzoneFileSystem"
},
"spark.hadoop.fs.defaultFS": { "value": "ofs://omservice/" },
"spark.sql.warehouse.dir": { "value": "ofs://omservice/warehouse/spark" },
"spark.executor.heartbeatInterval": { "value": "120s" },
"spark.shuffle.service.enabled": { "value": "false" },
"spark.dynamicAllocation.enabled": { "value": "false" },
"spark.driver.bindAddress": { "value": "0.0.0.0" },
"spark.network.timeout": { "value": "1200s" },
"spark.sql.shuffle.partitions": { "value": "50" },
"spark.sql.adaptive.enabled": { "value": "true" },
"spark.sql.adaptive.coalescePartitions.enabled": { "value": "true" },
"spark.storage.blockManagerSlaveTimeoutMs": { "value": "1200000" },
"spark.shuffle.registration.timeout": { "value": "120000" },
"spark.shuffle.registration.maxAttempts": { "value": "8" },
"spark.sql.iceberg.vectorization.enabled": { "value": "true" }
}
}
Reporter: Tristan Groenewold
Fix For: 1.8.0
When using Sedona with Spark 3.5 and Iceberg v3 tables, attempts to create an
Iceberg table with {{geometry}} or {{geography}} fail. Iceberg v3 defines these
as native types, but Sedona registers them as Spark User-Defined Types (UDTs).
Spark’s SQL layer rejects UDTs in DDL for Iceberg tables with the error:
```text
pyspark.errors.exceptions.captured.UnsupportedOperationException: User-defined types are not supported
```
*Reproduction Code*
```python
%Spark.pyspark
from sedona.spark import *
from sedona.register import SedonaRegistrator
from pyspark.sql.functions import expr

# `spark` is the SparkSession provided by the Zeppelin interpreter
sedona = SedonaContext.create(spark)
SedonaRegistrator.registerAll(spark)

# Create Iceberg v3 table with geometry column
spark.sql("""
CREATE TABLE iceberg.geo.icetable2 (id string, geometry geometry)
USING iceberg
TBLPROPERTIES('format-version'='3')
""")
```
*Observed Behavior*
Fails with {{User-defined types are not supported}}.
*Expected Behavior*
Sedona geometries should be writable/readable as Iceberg v3 native {{geometry}}
(and eventually {{geography}}) columns.
*Possible Approaches*
* Align Sedona’s UDT registration with Spark’s logical type system so that
Iceberg recognizes {{geometry}}/{{geography}} as native types.
* Provide a type mapping bridge layer: Sedona UDT ↔ Iceberg v3 type.
* Add explicit serializers/deserializers for Iceberg’s geometry type.
* Prior community approach (compatible with Spark 3.1, 3.2, and 3.3 only):
https://github.com/spatialx-project/sedona-iceberg-extension/
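Until one of the approaches above is implemented, a possible interim bridge is to store the geometry as WKB in a plain {{binary}} column and convert at the table boundary. The sketch below is only an illustration of that idea, not native Iceberg v3 geometry support and not an existing Sedona or Iceberg API: the helper names ({{wkb_table_ddl}}, {{write_select}}, {{read_select}}) are hypothetical and only build SQL strings. It assumes the Sedona SQL functions {{ST_AsBinary}} and {{ST_GeomFromWKB}} are registered in the session.

```python
# Hypothetical helpers (not part of Sedona or Iceberg): they only build SQL strings.
# Assumption: the geometry is stored as WKB in a binary column instead of a native type.


def wkb_table_ddl(table: str, geom_col: str = "geometry") -> str:
    """DDL for an Iceberg v3 table whose geometry column is declared as binary (WKB)."""
    return (
        f"CREATE TABLE {table} (id string, {geom_col} binary) "
        "USING iceberg TBLPROPERTIES('format-version'='3')"
    )


def write_select(source_view: str, geom_col: str = "geometry") -> str:
    """SELECT that serializes a Sedona geometry column to WKB before writing."""
    return f"SELECT id, ST_AsBinary({geom_col}) AS {geom_col} FROM {source_view}"


def read_select(table: str, geom_col: str = "geometry") -> str:
    """SELECT that turns the stored WKB back into a Sedona geometry on read."""
    return f"SELECT id, ST_GeomFromWKB({geom_col}) AS {geom_col} FROM {table}"
```

With a live session this might be used as {{spark.sql(wkb_table_ddl("iceberg.geo.icetable_wkb"))}}, followed by {{spark.sql(write_select("points")).writeTo("iceberg.geo.icetable_wkb").append()}}, where the table name and the {{points}} view are placeholders. Iceberg then sees only an opaque binary column, so none of the native geometry semantics from the v3 spec apply; that is the gap this issue asks to close.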
--
This message was sent by Atlassian Jira
(v8.20.10#820010)