jiayuasu commented on issue #4455: URL: https://github.com/apache/datafusion-comet/issues/4455#issuecomment-4557003622
Hi, this is Jia from Apache Sedona / Wherobots. We were also a driving force behind the Parquet Geo and Iceberg Geo design and implementation. Here are my 2 cents.

Spark 4.2 will add the following 5 geospatial constructor and accessor functions to enable Parquet and Iceberg read/write ([reference](https://spark.apache.org/docs/4.2.0-preview1/api/sql/index.html#st_asbinary)):

- `ST_AsBinary`
- `ST_GeogFromWKB`
- `ST_GeomFromWKB`
- `ST_SRID`
- `ST_SetSRID`

So if Comet wants to enable Iceberg and Parquet read/write, we should add these 5 functions for Comet + Spark 4.2 compatibility, which unblocks Comet. Even just `ST_GeogFromWKB` and `ST_SRID` are already quite troublesome on their own, so I suggest we wrap SedonaDB for those.

Going beyond these 5 functions leads to two problems:

1. **It changes the direction of Comet.**
2. **Maintaining ST functions is an enormous burden.** Many people underestimate this. It is not as simple as wrapping GEOS, Rust GEO, etc. Three reasons:
   1. **Function behavior differs across libraries.** Do we want to follow SQL MM3, OGC, or PostGIS semantics?
   2. **Lack of domain knowledge in code review.** We have seen this play out in many projects. People push domain-specific functions into the main engine, then nobody can effectively review them or improve their performance. Users resort to other solutions, and the built-in component either lacks maintenance (HIVE Spatial SQL) or gets deprecated (Spark GraphX). Do we want to repeat this?
   3. **If people want something quick for simple use cases, we should enable that via extensions or UDFs.** For example, make it easy to use Comet together with SedonaDB, or wrap Rust GEO or GEOS. The main engine should not bear that burden and risk.
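For readers less familiar with WKB, here is a minimal Python sketch of the round trip that a `ST_GeomFromWKB` / `ST_AsBinary` pair implies, for a single 2D point only. It is not Spark, Comet, or SedonaDB code: `point_to_wkb`, `point_from_wkb`, and the `Geometry` class are hypothetical names made up for illustration. The one thing it tries to show is that plain OGC WKB has no SRID field, so an engine has to carry the SRID next to the bytes, which is roughly the job `ST_SRID` and `ST_SetSRID` describe.

```python
import struct
from dataclasses import dataclass, replace

WKB_POINT = 1  # geometry type code for Point in OGC WKB


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # 1 byte order flag (1 = little-endian), uint32 type code, two float64 coordinates
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def point_from_wkb(wkb: bytes) -> tuple[float, float]:
    """Decode a 2D point from WKB, honoring the byte order flag (hypothetical helper)."""
    endian = "<" if wkb[0] == 1 else ">"
    geom_type, x, y = struct.unpack(endian + "Idd", wkb[1:21])
    if geom_type != WKB_POINT:
        raise ValueError(f"only Point (type 1) is handled in this sketch, got {geom_type}")
    return x, y


@dataclass(frozen=True)
class Geometry:
    """Assumed in-memory value: WKB payload plus an SRID tracked outside the bytes."""
    wkb: bytes
    srid: int = 0


def set_srid(geom: Geometry, srid: int) -> Geometry:
    # Only the metadata changes; the WKB payload is left untouched.
    return replace(geom, srid=srid)


g = set_srid(Geometry(point_to_wkb(1.5, -2.25)), 4326)
print(g.srid, point_from_wkb(g.wkb))
```

Even this toy version ignores everything that makes the real functions hard: other geometry types, Z/M dimensions, EWKB variants, invalid input, and geography semantics.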
