Kristin Cowalcijk created SEDONA-605:
----------------------------------------
             Summary: RS_AsRaster(useGeometryExtent=false) does not work with reference rasters with scaleX/Y < 1
                 Key: SEDONA-605
                 URL: https://issues.apache.org/jira/browse/SEDONA-605
             Project: Apache Sedona
          Issue Type: Bug
    Affects Versions: 1.6.0
            Reporter: Kristin Cowalcijk
         Attachments: zonal_stats_issue.zip

This problem was reported by users on Discord. They found that RS_ZonalStats does not work with a raster tile in EPSG:4326. Using the attached data, you can see that the computed zonal stats are mostly NaN:

{code:python}
# Assumes `spark` and `sedona` are already-initialized session objects.
from sedona.core.formatMapper.shapefileParser import ShapefileReader
from sedona.utils.adapter import Adapter

# Load the GeoTIFF tiles as binary files and decode them into rasters.
rawDf = spark.read.format("binaryFile").option("pathGlobFilter", "*.tiff").load("zonal_stats_issue/data_andalusia")
rawDf.createOrReplaceTempView("rawdf")
rasterDf = sedona.sql("""
    SELECT RS_FromGeoTiff(content) as tile, path FROM rawdf
""")
rasterDf.createOrReplaceTempView("l8imgs")

# Load the parcel geometries from the shapefile.
parcels = ShapefileReader.readToGeometryRDD(sedona, "zonal_stats_issue/parcelas")
parcels_df = Adapter.toDf(parcels, sedona)
parcels_df.createOrReplaceTempView("parcels")

# Match each parcel to the tiles it overlaps, then compute the per-parcel mean of band 1.
features_df = sedona.sql("""
    WITH matched_tile AS (
        SELECT path, tile, geometry, idPanel
        FROM l8imgs, parcels
        WHERE ST_Intersects(RS_Envelope(tile), parcels.geometry)
           OR ST_Within(RS_Envelope(tile), parcels.geometry)
    )
    SELECT path, idPanel, RS_ZonalStats(tile, geometry, 1, 'mean') as stats_mean
    FROM matched_tile
""")
features_df.show(1000, False)  # <-- Lots of NaN here.
{code}

Output:

{code:java}
+----------------------------------------------------+--------------------+------------------+
|path                                                |idPanel             |stats_mean        |
+----------------------------------------------------+--------------------+------------------+
|zonal_stats_issue/data_andalusia/s2_20240604_01.tiff|14:38:0:0:14:9002:2 |NaN               |
|zonal_stats_issue/data_andalusia/s2_20240604_01.tiff|14:38:0:0:14:32:4   |NaN               |
|zonal_stats_issue/data_andalusia/s2_20240604_01.tiff|14:38:0:0:14:32:3   |NaN               |
|zonal_stats_issue/data_andalusia/s2_20240604_01.tiff|14:38:0:0:14:30:2   |NaN               |
|zonal_stats_issue/data_andalusia/s2_20240604_01.tiff|14:38:0:0:14:26:3   |NaN               |
|zonal_stats_issue/data_andalusia/s2_20240604_01.tiff|14:38:0:0:14:27:4   |NaN               |
...
{code}

This problem is caused by incorrect rasterization of the parcel geometries when the reference raster has scaleX/scaleY smaller than 1. There's some bad double->int casting when computing the extent of the result of rasterization, which is:

1. Unnecessary when we're using the extent of the reference raster
2. Problematic when handling rasters with non-integral scaleX or scaleY values

This bug affects the following RS functions:

# {{RS_AsRaster}}
# {{RS_ZonalStats}}
# {{RS_ZonalStatsAll}}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
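
The double->int casting problem described in this issue can be illustrated with a small standalone sketch. This is not the Sedona implementation. The function names, the coordinates, and the scale values below are hypothetical and chosen only to show why truncating world coordinates to integers is harmless when the pixel size is a whole number of map units, but collapses the extent when the pixel size is a fraction of a degree.

```python
import math


def grid_width_truncating(min_x: float, max_x: float, scale_x: float) -> int:
    """Hypothetical buggy variant: world coordinates are cast to int
    before the pixel width is computed, so sub-unit extents are lost."""
    width_world = int(max_x) - int(min_x)  # int() truncates toward zero
    return int(width_world / abs(scale_x))


def grid_width_exact(min_x: float, max_x: float, scale_x: float) -> int:
    """Hypothetical fixed variant: keep doubles until the final rounding."""
    return math.ceil((max_x - min_x) / abs(scale_x))


# Projected CRS in meters with an integral pixel size: both variants agree.
meters_truncating = grid_width_truncating(100.0, 130.0, 30.0)
meters_exact = grid_width_exact(100.0, 130.0, 30.0)

# EPSG:4326-like extent in degrees with a pixel size below 1:
# the truncating variant yields a zero-width grid.
degrees_truncating = grid_width_truncating(-4.75, -4.25, 0.125)
degrees_exact = grid_width_exact(-4.75, -4.25, 0.125)

print(meters_truncating, meters_exact)
print(degrees_truncating, degrees_exact)
```

In the degree-based case the truncating variant loses the whole extent, because both bounds truncate to the same integer. A grid that is empty or misplaced in this way would be consistent with the NaN statistics reported above.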