adamaps opened a new issue, #1335:
URL: https://github.com/apache/sedona/issues/1335

   Rather than implicitly setting (or assuming) a Coordinate Reference System 
(CRS) for a Sedona dataframe derived from a shapefile, it would be helpful to 
include the known CRS as a dataframe property or as EWKT geometry.
   
   The CRS can be obtained from the corresponding shapefile .prj file (if one 
exists) by reading the top-level (parent) `ID` element of its WKT CRS 
definition, as specified by the OGC WKT CRS standard 
(https://www.ogc.org/standard/wkt-crs/). If the .prj file does not exist or 
does not follow this standard, no CRS should be retrieved.
   
   The parent ID for this WKT CRS example is `ID["EPSG",7930]` (the nested 
`ID["EPSG",7940]` belongs to the defining transformation, not to the CRS):
   ```
   GEODCRS["ETRF2000",
    DATUM["European Terrestrial Reference Frame 2000",
    ELLIPSOID["GRS 1980",6378137,298.257222101]
    ],
    CS[Cartesian,3],
    AXIS["(X)",geocentricX],
    AXIS["(Y)",geocentricY],
    AXIS["(Z)",geocentricZ],
    LENGTHUNIT["metre",1.0],
    DEFININGTRANSFORMATION["ITRF2000 to ETRF2000 (EUREF)",ID["EPSG",7940]],
   ID["EPSG",7930]
   ]
   ```
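
A minimal sketch of how the parent `ID` could be picked out of such a definition while ignoring nested ones, using only the Python standard library. The function name `top_level_id` is hypothetical and this is not Sedona code. It assumes the WKT uses `ID[...]` elements as in the example above and is otherwise well-formed.

```python
import re
from typing import Optional, Tuple

# Tokens that matter for nesting: quoted strings, brackets, and keywords.
_TOKEN = re.compile(r'"(?:[^"]|"")*"|[\[\]()]|[A-Za-z_]\w*')
# Body of an ID element: a quoted authority, then a quoted or bare code.
_ID_BODY = re.compile(r'\s*"([^"]+)"\s*,\s*"?([^",\]\)\s]+)"?')


def top_level_id(wkt: str) -> Optional[Tuple[str, str]]:
    """Return (authority, code) of the ID directly under the root element, else None."""
    depth = 0
    last_keyword = ""
    for token in _TOKEN.finditer(wkt):
        text = token.group()
        if text.startswith('"'):
            continue  # quoted text may contain brackets, e.g. "(X)"
        if text in "[(":
            # An ID that opens at depth 1 is a direct child of the root CRS element.
            if depth == 1 and last_keyword.upper() == "ID":
                body = _ID_BODY.match(wkt, token.end())
                if body:
                    return body.group(1), body.group(2)
            depth += 1
        elif text in "])":
            depth -= 1
        else:
            last_keyword = text
    return None


# Shortened version of the example above, for illustration only.
wkt = (
    'GEODCRS["ETRF2000",CS[Cartesian,3],AXIS["(X)",geocentricX],'
    'DEFININGTRANSFORMATION["ITRF2000 to ETRF2000 (EUREF)",ID["EPSG",7940]],'
    'ID["EPSG",7930]]'
)
print(top_level_id(wkt))  # ('EPSG', '7930')
```

Tracking bracket depth is what keeps the nested `ID["EPSG",7940]` from being mistaken for the CRS identifier.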
   
   ## Expected behavior
   
   When reading a shapefile from a folder path using the ShapefileReader class 
(any method: `readToGeometryRDD`, `readToPolygonRDD`, `readToPointRDD`, 
`readToLineStringRDD`), include a dataframe property (or return geometry as 
EWKT) that stores the shapefile CRS as defined in the shapefile's .prj file 
(if it exists in the input folder).
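
A rough sketch of the lookup step described above: find the .prj file in the input folder and return its text, or nothing if it is absent. The helper name `read_prj_text` is hypothetical, and the sketch assumes the folder is on a local filesystem. Sedona's reader would need to go through the Hadoop filesystem API for S3 or HDFS paths.

```python
from pathlib import Path
from typing import Optional


def read_prj_text(shapefile_dir: str) -> Optional[str]:
    """Return the text of the first .prj file in the folder, or None if there is none."""
    folder = Path(shapefile_dir)
    # Sidecar extensions are sometimes upper-case, so compare case-insensitively.
    candidates = sorted(
        p for p in folder.iterdir() if p.is_file() and p.suffix.lower() == ".prj"
    )
    if not candidates:
        return None
    return candidates[0].read_text(encoding="utf-8", errors="replace").strip()
```

Returning `None` when no .prj file exists matches the requirement that no CRS is retrieved in that case.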
   
   
   ## Actual behavior
   
   Shapefile geometry is retrieved as WKT without any CRS information, even if 
it exists in the .prj file. This makes it difficult to work with shapefile 
inputs that use different coordinate systems. 
   
   ## Steps to reproduce the problem
   
   The example below transforms geometry to a desired coordinate system 
regardless of the input coordinate system, using the `sedona-1.5.1` supported 
format `ST_Transform (A: Geometry, TargetCRS: String)` as described here: 
https://sedona.apache.org/1.5.0/api/sql/Function/#st_transform.
   
   ```python
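   from sedona.core.formatMapper.shp_read import ShapefileReader
   from sedona.utils.adapter import Adapter

   shp_rdd = ShapefileReader.readToGeometryRDD(SEDONA, shp_file)
   shp_df = Adapter.toDf(shp_rdd, SEDONA)
   shp_df.createOrReplaceTempView("shp_data")
   # Two-argument ST_Transform relies on an SRID already being set on the geometry.
   output_df = SEDONA.sql(
       "select ST_Transform(geometry, 'EPSG:4326') as geometry from shp_data"
   )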
   ```
   
   This returns the following error:
   ```
   Source CRS must be specified. No SRID found on geometry.
   ```
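
For comparison, Sedona also documents a three-argument form, `ST_Transform (A: Geometry, SourceCRS: String, TargetCRS: String)`, which avoids this error but requires the caller to already know the source CRS. The sketch below only builds the SQL text. The helper name `transform_sql` is hypothetical and `'EPSG:3857'` is a placeholder, since the real value would have to be read from the .prj file by hand.

```python
def transform_sql(view: str, source_crs: str, target_crs: str) -> str:
    """Build a query that passes the source CRS explicitly to ST_Transform."""
    return (
        f"select ST_Transform(geometry, '{source_crs}', '{target_crs}') as geometry "
        f"from {view}"
    )


# 'EPSG:3857' is a placeholder source CRS, not a value taken from any shapefile.
query = transform_sql("shp_data", "EPSG:3857", "EPSG:4326")
print(query)
```

Hard-coding the source CRS this way does not scale to a mix of shapefile inputs, which is the motivation for carrying the CRS from the .prj file.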
   
   ## Settings
   
   Sedona version = 1.5.1
   
   Apache Spark version = 3.3.0
   
   API type = Python
   
   Python version = 3.10
   
   Environment = AWS Glue 4.0 using `sedona-spark-shaded-3.0_2.12-1.5.1.jar` 
and `geotools-wrapper-1.5.1-28.2.jar`

