jiayuasu commented on issue #4455:
URL: 
https://github.com/apache/datafusion-comet/issues/4455#issuecomment-4557003622

   Hi, this is Jia from Apache Sedona / Wherobots. We were also a driving force 
behind the Parquet Geo and Iceberg Geo design and implementation.
   
   Here are my two cents.
   
   Spark 4.2 will add the following 5 geospatial constructor and accessor 
functions to enable Parquet and Iceberg read/write 
([reference](https://spark.apache.org/docs/4.2.0-preview1/api/sql/index.html#st_asbinary)):
   
   - `ST_AsBinary`
   - `ST_GeogFromWKB`
   - `ST_GeomFromWKB`
   - `ST_SRID`
   - `ST_SetSRID`
   
   So if Comet wants to enable Iceberg and Parquet read/write, we should add 
these 5 functions for Comet + Spark 4.2 compatibility; that is enough to 
unblock Comet. Even just `ST_GeogFromWKB` and `ST_SRID` are quite troublesome 
to implement on their own, so I suggest we wrap SedonaDB for those.
   
   Going beyond these 5 functions leads to two problems:
   
   1. **It changes the direction of Comet.**
   
   2. **Maintaining ST functions is an enormous burden.** Many people 
underestimate this. It is not as simple as wrapping GEOS, Rust GEO, etc. Three 
reasons:
   
      1. **Function behavior differs across libraries.** Do we want to follow 
SQL MM3, OGC, or PostGIS semantics?
   
      2. **Lack of domain knowledge in code review.** We have seen this play 
out in many projects. People push domain-specific functions into the main 
engine, then nobody can effectively review them or improve their performance. 
Users resort to other solutions, and the built-in component either lacks 
maintenance (Hive Spatial SQL) or gets deprecated (Spark GraphX). Do we want to 
repeat this?
   
      3. **If people want something quick for simple use cases, we should 
enable that via extensions or UDFs.** For example, make it easy to use Comet 
together with SedonaDB, or wrap Rust GEO or GEOS. The main engine should not 
bear that burden and risk.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

