Kristin Cowalcijk created SEDONA-605:
----------------------------------------

             Summary: RS_AsRaster(useGeometryExtent=false) does not work with reference rasters with scaleX/Y < 1
                 Key: SEDONA-605
                 URL: https://issues.apache.org/jira/browse/SEDONA-605
             Project: Apache Sedona
          Issue Type: Bug
    Affects Versions: 1.6.0
            Reporter: Kristin Cowalcijk
         Attachments: zonal_stats_issue.zip

This problem was reported by users on Discord. They found that RS_ZonalStats 
does not work with a raster tile in EPSG:4326. Using the attached data, you can 
see that most of the computed zonal stats are NaN:
{code:python}
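# Assumes `spark` and `sedona` are existing Spark/Sedona sessions.
from sedona.core.formatMapper.shape_parser import ShapefileReader
from sedona.utils.adapter import Adapter

rawDf = spark.read.format("binaryFile").option("pathGlobFilter", "*.tiff").load(
    "zonal_stats_issue/data_andalusia")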
rawDf.createOrReplaceTempView("rawdf")
rasterDf = sedona.sql("""
SELECT
  RS_FromGeoTiff(content) as tile,
  path
FROM rawdf
""")
rasterDf.createOrReplaceTempView("l8imgs")

parcels = ShapefileReader.readToGeometryRDD(sedona, 
"zonal_stats_issue/parcelas")
parcels_df = Adapter.toDf(parcels, sedona)
parcels_df.createOrReplaceTempView("parcels")

features_df = sedona.sql("""
WITH matched_tile AS (
    SELECT path, tile, geometry, idPanel
    FROM l8imgs, parcels
    WHERE ST_Intersects(RS_Envelope(tile), parcels.geometry) OR 
ST_Within(RS_Envelope(tile), parcels.geometry)
)
SELECT path, idPanel, RS_ZonalStats(tile, geometry, 1, 'mean') as stats_mean 
FROM matched_tile
""")
features_df.show(1000, False)  # <-- Lots of NaN here.
{code}
Output:
{code:java}
+----------------------------------------------------+--------------------+------------------+
|path                                                |idPanel             |stats_mean        |
+----------------------------------------------------+--------------------+------------------+
|zonal_stats_issue/data_andalusia/s2_20240604_01.tiff|14:38:0:0:14:9002:2 |NaN               |
|zonal_stats_issue/data_andalusia/s2_20240604_01.tiff|14:38:0:0:14:32:4   |NaN               |
|zonal_stats_issue/data_andalusia/s2_20240604_01.tiff|14:38:0:0:14:32:3   |NaN               |
|zonal_stats_issue/data_andalusia/s2_20240604_01.tiff|14:38:0:0:14:30:2   |NaN               |
|zonal_stats_issue/data_andalusia/s2_20240604_01.tiff|14:38:0:0:14:26:3   |NaN               |
|zonal_stats_issue/data_andalusia/s2_20240604_01.tiff|14:38:0:0:14:27:4   |NaN               |
...
{code}
This problem is caused by incorrect rasterization of the parcel geometries when 
the reference raster has scaleX/scaleY smaller than 1. The code that computes 
the extent of the rasterization result performs a faulty double->int cast, 
which is:

1. Unnecessary when we're using the extent of the reference raster
2. Problematic when handling rasters with non-integral scaleX or scaleY values
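
To illustrate why such a cast can break down for sub-unit pixel sizes, here is a minimal sketch. It is not Sedona's actual code: the function names, the envelope, and the pixel size are all hypothetical, chosen so that the arithmetic is exact in floating point. It only shows that truncating extent bounds to integers can collapse an extent that spans less than one coordinate unit.

```python
import math


def truncated_extent(min_x, max_x, min_y, max_y):
    """Hypothetical faulty step: cast every bound to int (truncates toward zero)."""
    return int(min_x), int(max_x), int(min_y), int(max_y)


def grid_size(min_x, max_x, min_y, max_y, scale_x, scale_y):
    """Number of pixel columns and rows needed to cover the extent."""
    width = math.ceil((max_x - min_x) / abs(scale_x))
    height = math.ceil((max_y - min_y) / abs(scale_y))
    return width, height


# Hypothetical parcel envelope in degrees and a hypothetical pixel size < 1.
envelope = (-4.75, -4.25, 37.25, 37.75)
scale_x, scale_y = 0.0625, -0.0625

exact = grid_size(*envelope, scale_x, scale_y)
truncated = grid_size(*truncated_extent(*envelope), scale_x, scale_y)

print(exact)      # (8, 8)
print(truncated)  # (0, 0): the extent collapsed, so no pixels are covered
```

With an integral pixel size and integral bounds the cast would be harmless, which may be why the issue only shows up for rasters in degree-based coordinate systems such as EPSG:4326.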

This bug affects the following RS functions:
 # {{RS_AsRaster}}
 # {{RS_ZonalStats}}
 # {{RS_ZonalStatsAll}}



