[
https://issues.apache.org/jira/browse/SEDONA-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17723350#comment-17723350
]
Jia Yu commented on SEDONA-278:
-------------------------------
[~qmailhos] Just to add my two cents:
Before Sedona 1.4.0, the binary format of the Geometry type in Sedona happened
to be EWKB, which we never described in our public docs. When you stored
geometries directly to disk, the on-disk binary was therefore WKB, and you
could read it back and create geometries using ST_GeomFromWKB. We considered
this format an internal detail, so we changed it in 1.4.0 without thinking
about how the change would affect users.
In Sedona 1.4.0, to accelerate serialization/deserialization, the binary format
of the Geometry type is no longer EWKB. So when you read back geometries that
were stored directly, you cannot create geometries from them using
ST_GeomFromWKB.
So the correct way to store geometries in Delta Lake in all Sedona versions is
to first call ST_AsEWKB to convert the Sedona geometries to WKB format, then
store the table to Delta Lake. When you read the table back, use ST_GeomFromWKB
to create the geometry column.
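As a rough sketch of that round trip in PySpark (not tested against a specific
Sedona release): it assumes `df` is a DataFrame with a Sedona geometry column
named `geom`, that `spark` is a session with the Sedona SQL functions
registered and Delta Lake configured, and that overwriting the target path is
acceptable. The helper names and the `_wkb` column suffix are made up for
illustration.

```python
# Sketch only: `df` is a Spark DataFrame with a Sedona geometry column and
# `spark` is a SparkSession with Sedona SQL functions registered and Delta
# Lake configured. Helper names and the "_wkb" suffix are hypothetical.

def to_wkb_expr(geom_col):
    """SQL expression that serializes a Sedona geometry column to EWKB bytes."""
    return f"ST_AsEWKB({geom_col}) AS {geom_col}_wkb"


def from_wkb_expr(geom_col):
    """SQL expression that rebuilds the geometry column from the stored bytes."""
    return f"ST_GeomFromWKB({geom_col}_wkb) AS {geom_col}"


def write_geometries(df, path, geom_col="geom"):
    # Swap the internal Geometry column for a plain binary column, then save.
    out = df.selectExpr("*", to_wkb_expr(geom_col)).drop(geom_col)
    out.write.format("delta").mode("overwrite").save(path)


def read_geometries(spark, path, geom_col="geom"):
    # Load the binary column and convert it back into a Sedona geometry.
    stored = spark.read.format("delta").load(path)
    return stored.selectExpr("*", from_wkb_expr(geom_col)).drop(f"{geom_col}_wkb")
```

The point of the explicit conversion is that the table on disk only ever holds
a documented binary format, so it does not depend on how a given Sedona version
serializes its Geometry type internally.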
> WKB geometry column in Delta Lake table not recognized as such in functions
> ST_GeomFromWKB or ST_AsText
> -------------------------------------------------------------------------------------------------------
>
> Key: SEDONA-278
> URL: https://issues.apache.org/jira/browse/SEDONA-278
> Project: Apache Sedona
> Issue Type: Bug
> Affects Versions: 1.4.0
> Environment: Databricks Runtime 12.1 with Apache Sedona 1.4.0
> Reporter: Quentin Mailhos
> Priority: Major
> Labels: Binary, WKB
> Fix For: 1.4.1
>
> Attachments: MicrosoftTeams-image (3).png
>
>
> After upgrading to Databricks DBR 12.1 with Apache Sedona 1.4.0, the functions
> ST_GeomFromWKB and ST_AsText fail to read a Well-Known Binary (WKB) type
> column from a Delta Lake table (see the attached screenshot).
> The SQL error message is ambiguous:
> [DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve
> "st_geomfromwkb(geom)" due to data type mismatch: parameter 1 requires
> ("STRING" or "BINARY") type, however, "geom" is of "BINARY" type.;
> Spark error message as follows:
> Caused by: org.apache.spark.sql.AnalysisException: Invalid Spark read type:
> expected optional group geom (LIST) { repeated group list { required int32
> element (INTEGER(8,true)); } } to be list type but found Some(BinaryType)
> Workflow used to work just fine in Databricks DBR 9 LTS with Apache Sedona
> 1.1.0.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)