Kontinuation opened a new pull request, #739:
URL: https://github.com/apache/incubator-sedona/pull/739
## Did you read the Contributor Guide?
- Yes, I have read [Contributor
Rules](https://sedona.apache.org/community/rule/) and [Contributor Development
Guide](https://sedona.apache.org/community/develop/)
## Is this PR related to a JIRA ticket?
- Yes, the URL of the associated JIRA ticket is
https://issues.apache.org/jira/browse/SEDONA-207. The PR name follows the
format `[SEDONA-XXX] my subject`.
## What changes were proposed in this PR?
### New Geometry Serde
The new geometry serde was implemented in
`common/src/main/java/org/apache/sedona/common/geometrySerde/`. The ShapeSerde
used by the Kryo serializer and the WKB-based serde used by `GeometryUDT` were
both replaced by this new serde. Please refer to
[SEDONA-207](https://issues.apache.org/jira/browse/SEDONA-207) for a detailed
explanation of this new geometry serde.
### GeoParquet
GeoParquet stores geometry objects as WKB binary values. WKB happened to be
the old serialization format of `GeometryUDT`, so no special treatment was
needed before. This PR changes the serialization format of `GeometryUDT`, so
geometry values read from GeoParquet files must now be explicitly parsed from
WKB, and geometry values written to GeoParquet files must be explicitly
serialized to WKB.
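
As a rough illustration of the WKB side of that conversion, the sketch below
encodes and decodes a 2D point in WKB using only the Python standard library.
The function names are hypothetical and are not part of Sedona. The new serde
format itself is described in SEDONA-207 and is not reproduced here.

```python
import struct

WKB_POINT = 1  # geometry type code for Point in the WKB specification


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # Layout: 1-byte byte-order flag (1 = little-endian),
    # uint32 geometry type, then two float64 coordinates.
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def point_from_wkb(data: bytes) -> tuple:
    """Decode a 2D WKB point, honouring the byte-order flag."""
    order = "<" if data[0] == 1 else ">"
    geom_type, x, y = struct.unpack_from(order + "Idd", data, 1)
    if geom_type != WKB_POINT:
        raise ValueError(f"expected WKB Point, got type {geom_type}")
    return (x, y)
```

A reader would decode each WKB value along these lines and then re-encode the
geometry in the new `GeometryUDT` format; a writer would do the reverse.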
### GeometryUDT in Python
We've implemented the new serialization format in pure Python. It is 2~3x
slower than `shapely.wkb.loads/dumps`, which will impact the performance of
`collect`, `toPandas` and Python UDFs in PySpark. We'll explore ways to
implement it as a CPython extension to achieve good performance.
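
A slowdown figure like this can be re-checked with a small timing harness. The
sketch below is only an assumption about how one might measure it, not the
method used for the numbers above; `slowdown` is a hypothetical helper name.

```python
import timeit


def slowdown(candidate, baseline, number=10_000, repeat=5):
    """Return how many times slower candidate() is than baseline().

    Both arguments are zero-argument callables. The best of `repeat`
    runs is used for each to reduce noise from other processes.
    """
    t_candidate = min(timeit.repeat(candidate, number=number, repeat=repeat))
    t_baseline = min(timeit.repeat(baseline, number=number, repeat=repeat))
    return t_candidate / t_baseline
```

For example, `candidate` could wrap the pure-Python deserializer applied to a
buffer in the new format, and `baseline` could wrap `shapely.wkb.loads`
applied to the WKB encoding of the same geometry.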
## How was this patch tested?
Unit tests were added to test this patch. This patch was also manually
tested on a Spark standalone cluster.
The geometry serde code for Python was manually tested with shapely 2.0. The
Python unit tests still need to be updated to be compatible with shapely 2.0;
this is left for future work.
## Did this PR include necessary documentation updates?
- No, this PR does not affect any public API, so no documentation changes are
needed.