Copilot commented on code in PR #598:
URL: https://github.com/apache/sedona-db/pull/598#discussion_r2794200776
##########
rust/sedona-spatial-join/src/prepare.rs:
##########
@@ -296,8 +297,14 @@ impl SpatialJoinComponentsBuilder {
Ok(build_partitioner)
}
- /// Construct a `SpatialPartitioner` (e.g. Flat) from the statistics of
partitioned build
- /// side for partitioning the probe side.
+ /// The number of partitions above which the probe side uses an RTree
+ /// partitioner instead of a flat (linear-scan) partitioner. Benchmarks
+ /// show the crossover at ~36 partitions; 48 gives a comfortable margin.
Review Comment:
The constant `RTREE_PARTITION_THRESHOLD` lacks inline documentation
explaining the performance trade-offs. Consider adding a doc comment above the
constant that briefly explains this is based on benchmark results showing the
crossover point at ~36 partitions, similar to the detailed comments in the
function below.
```suggestion
/// Threshold for switching the probe side from a flat (linear-scan)
/// partitioner to an RTree-based partitioner. Benchmarks show that the
/// additional RTree build/query overhead only pays off once there are
/// roughly 36 partitions; we use 48 here to provide a conservative
margin
/// for different data distributions and workloads.
```
##########
rust/sedona-spatial-join/bench/partitioning/flat_vs_rtree.rs:
##########
@@ -0,0 +1,86 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Head-to-head benchmark of FlatPartitioner vs RTreePartitioner across
+//! varying partition counts to find the optimal switch point.
+
+mod common;
+
+use std::hint::black_box;
+
+use common::{default_extent, grid_partitions, sample_queries,
QUERY_BATCH_SIZE};
+use criterion::{criterion_group, criterion_main, BenchmarkId, Criterion,
Throughput};
+use sedona_spatial_join::partitioning::{
+ flat::FlatPartitioner, rtree::RTreePartitioner, SpatialPartitioner,
+};
+
+/// Grid dimensions to benchmark. Each produces dim*dim partitions.
+/// 4x4=16, 5x5=25, 6x6=36, 8x8=64, 10x10=100, 16x16=256, 20x20=400
Review Comment:
The comment on line 31-32 explains the partition counts but doesn't clarify
why these specific dimensions were chosen. Consider adding a note that these
dimensions were selected to bracket the crossover point (36 partitions)
observed in the PR description.
```suggestion
/// 4x4=16, 5x5=25, 6x6=36, 8x8=64, 10x10=100, 16x16=256, 20x20=400.
/// These values were chosen to bracket the empirically observed crossover
/// point around 36 partitions (6x6), covering both smaller and larger grids.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]