cholmes commented on issue #10260: URL: https://github.com/apache/iceberg/issues/10260#issuecomment-2260700912
> I agree that reducing the amount of options is desired. However, the argument in favour of PROJJSON is biased. It assumes that there is only two options: having PROJ, or having no referencing library at all.

PROJJSON is an open standard, not a reference library. It follows the rich tradition of geojson, georss, vector tiles, STAC, mbtiles, pmtiles, flatgeobuf, zarr, copc and many others in that it has started in the open source community and in real usage, and most have evolved to some form of formal standardization. Yes, PROJJSON right now only has a single implementation, but it is written as a JSON encoding of WKT2:2019, and the goal is to become a standard.

> The third option, having a referencing library other than PROJ (e.g. ESRI, GeoTools, Apache SIS, Proj4J, PyCRS and more that I don't know) seems completely ignored.

No, that's not completely ignored - those just don't yet implement PROJJSON. To me the next step is to push for them to implement it, and to try to find funding to enable that. The twist seems to be that many don't fully implement WKT2:2019. If they have a WKT2 implementation, the parsing from JSON to WKT seems to be fairly easy - it took a day or two to do it for JavaScript. If OGC insists on a CRSJSON that differs too much from PROJJSON, then libraries should be able to parse both and put them into the same WKT2:2019 data model.

> A standard CRS JSON is very likely to happen, just not now. It may be a matter of about 2 years. This delay is the price to pay for better consistency with ISO 19111 and ISO 19115-4.

PROJJSON is not 1.0, and can easily evolve to be completely consistent with how the CRS spec evolves. But we need something that works today, not two years from now. Like I said above, my hope is that PROJJSON can evolve to be consistent with CRSJSON, or that the two can even merge. But if they don't manage to align 100%, then libraries should be able to easily parse both.
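As a rough illustration of what "easy to parse" can mean for a library with no geospatial dependencies, here is a minimal sketch that reads the identifier out of a PROJJSON-style document with only the standard `json` module. The document is a hand-trimmed example (real PROJJSON output carries many more members), and `crs_identifier` is a hypothetical helper name, not part of any library.

```python
import json

# Hand-trimmed PROJJSON-style document, for illustration only.
# Real PROJJSON also carries datum, coordinate_system and other members.
PROJJSON_TEXT = """
{
  "type": "GeographicCRS",
  "name": "WGS 84",
  "id": {"authority": "EPSG", "code": 4326}
}
"""


def crs_identifier(projjson_text):
    """Return 'AUTHORITY:CODE' from the top-level "id" member, or None.

    Hypothetical helper: it only inspects the identifier and does not
    try to interpret the CRS definition itself.
    """
    doc = json.loads(projjson_text)
    ident = doc.get("id")
    if not isinstance(ident, dict):
        return None
    authority = ident.get("authority")
    code = ident.get("code")
    if authority is None or code is None:
        return None
    return f"{authority}:{code}"


print(crs_identifier(PROJJSON_TEXT))
```

A reader like this can recognize a handful of common CRSs by identifier and hand anything else to a full referencing library, if one is available.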
> But we are going in circles: JSON is easier to parse for non-geospatial libraries, but WKT is better supported by all geospatial libraries other than PROJ. It is not obvious to said which side is more important.

If we want geospatial to have a bigger impact on the world than the size of the existing geospatial market, it is clear to me that being easier to parse for non-geospatial libraries is more important. We can't expect every implementation of Iceberg to include geospatial libraries, so we need a smooth 'on-ramp' for implementors to support geospatial without understanding the depths of coordinate reference systems. We have a great start, with just focusing on OGC:CRS84. Having a next step be to just understand a few common CRS's by parsing JSON seems like a good way to meet people 'more than half way'. And then geospatial libraries can evolve to support JSON encoding of CRS's (PROJJSON and/or CRS JSON) - and ideally we in the geospatial community work out that set of recommendations.

For now I think that bit is more important for GeoParquet, where the clear 'native' format to use for Parquet metadata is JSON. And I think we should all work together to get to a path from where we are today to the two year goal - we are loath to do a 2.0 for GeoParquet, but we could consider it if there is clear consensus between the various geospatial communities on the need for a breaking change from PROJJSON.

For Iceberg I do think the best answer is the SPATIAL_REF_SYS table, text from the [core spec](https://portal.ogc.org/files/?artifact_id=25354):

```
6.1.3 Identification of Spatial Reference Systems

Every Geometry Column and every geometric entity is associated with exactly one Spatial Reference System. The Spatial Reference System identifies the coordinate system for all geometric objects stored in the column, and gives meaning to the numeric coordinate values for any geometric object stored in the column.
Examples of commonly used Spatial Reference Systems include “Latitude Longitude” and “UTM Zone 10”.

The SPATIAL_REF_SYS table stores information on each Spatial Reference System in the database. The columns of this table are the Spatial Reference System Identifier (SRID), the Spatial Reference System Authority Name (AUTH_NAME), the Authority Specific Spatial Reference System Identifier (AUTH_SRID) and the Well-known Text description of the Spatial Reference System (SRTEXT). The Spatial Reference System Identifier (SRID) constitutes a unique integer key for a Spatial Reference System within a database.

Interoperability between clients is achieved via the SRTEXT column which stores the Well-known Text representation for a Spatial Reference System.
```

And there are additional details [in the PostGIS docs](https://postgis.net/docs/manual-1.4/ch04.html#spatial_ref_sys) and the [GeoPackage spec](https://www.geopackage.org/spec/#spatial_ref_sys). This allows SRID to be used, but includes a table of all the core WKT values to map to those SRID's, and lets users define their own. I think this means that core Iceberg should not need to know PROJJSON.

I do still believe PROJJSON is the best choice for GeoParquet and Parquet, and we can continue to work together to figure out the best approach there so the entire ecosystem works well.
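To make the SPATIAL_REF_SYS idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The four columns follow the quoted spec text (SRID, AUTH_NAME, AUTH_SRID, SRTEXT); the column types, the abbreviated WKT string, and the helper names `create_spatial_ref_sys` and `srtext_for` are assumptions for illustration, not how Iceberg, PostGIS, or GeoPackage actually define the table.

```python
import sqlite3

# Abbreviated WKT for WGS 84, for illustration only; a real SRTEXT
# value would normally carry the full definition.
WGS84_WKT = (
    'GEOGCS["WGS 84",DATUM["WGS_1984",'
    'SPHEROID["WGS 84",6378137,298.257223563]],'
    'PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]]'
)


def create_spatial_ref_sys(conn):
    """Create the table with the four columns named in the spec text
    and seed it with one well-known entry (column types are assumed)."""
    conn.execute(
        """
        CREATE TABLE spatial_ref_sys (
            srid      INTEGER PRIMARY KEY,  -- unique key within the database
            auth_name TEXT,                 -- e.g. 'EPSG'
            auth_srid INTEGER,              -- identifier assigned by the authority
            srtext    TEXT                  -- Well-known Text description
        )
        """
    )
    conn.execute(
        "INSERT INTO spatial_ref_sys VALUES (?, ?, ?, ?)",
        (4326, "EPSG", 4326, WGS84_WKT),
    )


def srtext_for(conn, srid):
    """Return the WKT stored for an SRID, or None if it is not registered."""
    row = conn.execute(
        "SELECT srtext FROM spatial_ref_sys WHERE srid = ?", (srid,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
create_spatial_ref_sys(conn)
print(srtext_for(conn, 4326))
```

With this shape, a geometry column only needs to carry an integer SRID, and users can register their own systems by inserting additional rows.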