[GitHub] [sedona] jiayuasu commented on issue #854: Dynamic Index Build Side is Statically Chosen

via GitHub Thu, 08 Jun 2023 15:03:08 -0700


jiayuasu commented on issue #854:
URL: https://github.com/apache/sedona/issues/854#issuecomment-1583448888


   There are 3 important params involved in a spatial join:
   1. Spatial partitioning dominant side: `sedona.join.spatitionside`. Default: LEFT.
   2. Spatial index build side: `sedona.join.indexbuildside`. Default: LEFT. See https://github.com/apache/sedona/blob/master/core/src/main/java/org/apache/sedona/core/joinJudgement/DynamicIndexLookupJudgement.java#L91
   3. Number of partitions of both RDDs used in the join. This defaults to the partition count of the `sedona.join.spatitionside` side, but it is adjusted to a reasonable number when the data is far smaller than the number of partitions.
   
   The best practice is that both the spatial partitioning dominant side and the spatial index build side should be the larger dataset, not the smaller one. To find out which one is larger, compare the counts of the two RDDs. Note that `SpatialRDD.analyze()` already computes the count, so you can use it to determine the dominant side automatically: https://github.com/apache/sedona/blob/master/sql/common/src/main/scala/org/apache/spark/sql/sedona_sql/strategy/join/TraitJoinQueryExec.scala#L59
   
   You can add this automation and make `sedona.join.spatitionside` and `sedona.join.indexbuildside` optional. In other words, the optimizer will determine the two sides automatically unless the user explicitly sets these parameters.
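
   A minimal sketch of that fallback logic, to make the idea concrete. It is not Sedona code: `JoinSideChooser`, `JoinSide`, `JoinSides`, `largerSide` and `resolve` are hypothetical names, the counts are assumed to come from something like `SpatialRDD.analyze()`, and falling back to LEFT on a tie is an assumption chosen to match the current default.

```java
import java.util.Optional;

// Hypothetical helper, not part of Sedona: picks the dominant side from row counts
// and lets explicit user settings take precedence over the automatic choice.
final class JoinSideChooser {

    /** Which input of the join plays a given role. */
    public enum JoinSide { LEFT, RIGHT }

    /** Resolved values for the two side-dependent parameters. */
    public static final class JoinSides {
        public final JoinSide partitionSide;   // role of sedona.join.spatitionside
        public final JoinSide indexBuildSide;  // role of sedona.join.indexbuildside

        JoinSides(JoinSide partitionSide, JoinSide indexBuildSide) {
            this.partitionSide = partitionSide;
            this.indexBuildSide = indexBuildSide;
        }
    }

    /** Returns the side with more rows; ties fall back to LEFT (assumed default). */
    public static JoinSide largerSide(long leftCount, long rightCount) {
        return rightCount > leftCount ? JoinSide.RIGHT : JoinSide.LEFT;
    }

    /**
     * Uses the user's explicit setting for each parameter when present,
     * otherwise the side with the larger count.
     */
    public static JoinSides resolve(long leftCount,
                                    long rightCount,
                                    Optional<JoinSide> userPartitionSide,
                                    Optional<JoinSide> userIndexBuildSide) {
        JoinSide auto = largerSide(leftCount, rightCount);
        return new JoinSides(userPartitionSide.orElse(auto),
                             userIndexBuildSide.orElse(auto));
    }

    public static void main(String[] args) {
        // Nothing set by the user: both roles go to the larger (right) dataset.
        JoinSides auto = resolve(10L, 1_000L, Optional.empty(), Optional.empty());
        System.out.println(auto.partitionSide + " " + auto.indexBuildSide);

        // The user pins the index build side; only the partitioning side is automatic.
        JoinSides pinned = resolve(10L, 1_000L, Optional.empty(), Optional.of(JoinSide.LEFT));
        System.out.println(pinned.partitionSide + " " + pinned.indexBuildSide);
    }
}
```

   Keeping the two parameters as separate `Optional` values means a user can override one of them and still get the automatic choice for the other.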
   
   I will leave the implementation to you. But if you feel this is too hard, 
please let me know.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

