[PR] [SEDONA-637] Show spatial filters pushed to GeoParquet scans in the query plan; allow disabling spatial filter pushdown [sedona]

via GitHub Mon, 05 Aug 2024 06:52:18 -0700


Kontinuation opened a new pull request, #1540:
URL: https://github.com/apache/sedona/pull/1540


   ## Did you read the Contributor Guide?
   
   - Yes, I have read the [Contributor 
Rules](https://sedona.apache.org/latest-snapshot/community/rule/) and 
[Contributor Development 
Guide](https://sedona.apache.org/latest-snapshot/community/develop/)
   
   ## Is this PR related to a JIRA ticket?
   
   - Yes, the URL of the associated JIRA ticket is 
https://issues.apache.org/jira/browse/SEDONA-637. The PR name follows the 
format `[SEDONA-XXX] my subject`.
   
   ## What changes were proposed in this PR?
   
   Spatial filters pushed down to the GeoParquet scan node are visible in the 
query plan. For example, the following query
   ```python
   df.where("ST_Intersects(geometry, ST_Point(1, 1))").explain()
   ```
   
   produces the following query plan:
   ```
   == Physical Plan ==
   Filter (isnotnull(geometry#218) AND  
**org.apache.spark.sql.sedona_sql.expressions.ST_Intersects**  )
   +- FileScan geoparquet [id#217L,geometry#218,bbox#219] Batched: false, 
DataFilters: [isnotnull(geometry#218),  
**org.apache.spark.sql.sedona_sql.expressions.ST_Intersects**  ], Format: 
GeoParquet with spatial filter [geometry INTERSECTS POINT (1 1)], Location: 
InMemoryFileIndex(1 paths).., PartitionFilters: [], PushedFilters: 
[IsNotNull(geometry)], ReadSchema: 
struct<id:bigint,geometry:binary,bbox:struct<xmin:double,ymin:double,xmax:double,ymax:double>>
   ```
   
   The spatial filter pushed down to the GeoParquet scan is shown in the 
`Format: GeoParquet with spatial filter [...]` entry of the scan node.
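
To check programmatically whether a spatial filter was pushed down, the `Format` entry can be parsed out of the plan text. The following is a minimal sketch, not part of this PR: `extract_spatial_filter` is a hypothetical helper, and it assumes the plan text keeps the `Format: GeoParquet with spatial filter [...]` shape shown above and that the filter text itself contains no `]`.

```python
import re
from typing import Optional

# Hypothetical helper, not part of Sedona or Spark.
# Assumes the "Format: GeoParquet with spatial filter [...]" shape shown above.
_SPATIAL_FILTER = re.compile(r"Format:\s*GeoParquet with spatial filter \[(.*?)\]")


def extract_spatial_filter(plan_text: str) -> Optional[str]:
    """Return the pushed-down spatial filter found in the plan text, or None."""
    # Plan text may have been hard-wrapped (terminal, mail client),
    # so collapse all runs of whitespace before matching.
    flat = " ".join(plan_text.split())
    match = _SPATIAL_FILTER.search(flat)
    return match.group(1) if match else None


# Fragments copied from the two plans in this description.
pushed = (
    "DataFrame plan ... Format: GeoParquet with spatial filter "
    "[geometry INTERSECTS POINT (1 1)], Location: InMemoryFileIndex(1 paths).."
)
not_pushed = "DataFrame plan ... Format: GeoParquet, Location: InMemoryFileIndex(1 paths).."

print(extract_spatial_filter(pushed))
print(extract_spatial_filter(not_pushed))
```

The helper returns `None` when the scan node reports plain `Format: GeoParquet`, which is what the plan looks like when no spatial filter was pushed down.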
   
   Spatial filter push-down can be disabled manually by setting the Spark 
configuration `spark.sedona.geoparquet.spatialFilterPushDown` to `false`:
   
   ```python
   spark.conf.set("spark.sedona.geoparquet.spatialFilterPushDown", "false")
   df.where("ST_Intersects(geometry, ST_Point(1, 1))").explain()
   ```
   
   With push-down disabled, the `Format` entry of the scan node no longer 
reports a spatial filter:
   
   ```
   == Physical Plan ==
   Filter (isnotnull(geometry#218) AND  
**org.apache.spark.sql.sedona_sql.expressions.ST_Intersects**  )
   +- FileScan geoparquet [id#217L,geometry#218,bbox#219] Batched: false, 
DataFilters: [isnotnull(geometry#218),  
**org.apache.spark.sql.sedona_sql.expressions.ST_Intersects**  ], Format: 
GeoParquet, Location: InMemoryFileIndex(1 paths).., PartitionFilters: [], 
PushedFilters: [IsNotNull(geometry)], ReadSchema: 
struct<id:bigint,geometry:binary,bbox:struct<xmin:double,ymin:double,xmax:double,ymax:double>>
   ```
   
   ## How was this patch tested?
   
   Passes the newly added tests.
   
   ## Did this PR include necessary documentation updates?
   
   - Yes, I have updated the documentation.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

