Hi Richard,

The problem you are working on is called a KNN join. A distributed, exact KNN join is very hard to implement, even though an existing paper already describes the algorithm in detail.
I would suggest that you do an approximate KNN join instead. This can be done in two steps:

(1) Do a spatial distance join in Sedona. The distance acts as a cutoff: you only look for the nearest neighbors of a spatial object within that distance.
(2) For each object and its candidate neighbors, perform a KNN check.

You could use the Sedona RDD API for Step 1, then write a little code to implement Step 2.

Thanks,
Jia

On Fri, Jul 22, 2022 at 10:39 AM Chud Muckram <[email protected]> wrote:

> Hi,
>
> I have a large dataset of points on the order of hundreds of millions of
> points and a dataset of lines that is on the order of millions of lines. I
> was looking for a method in Apache Sedona to do the following:
> For every point in my dataset find the distance to the nearest line.
>
> I think somehow using spatial knn to loop over every point would work but I
> don't see any function that does that in the documentation. In the
> documentation spatial knn does it for one query point and a feature
> dataset.
>
> Thanks,
> Richard
>
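To make the two-step idea from the reply above concrete, here is a minimal sketch in plain Python. It does not use Sedona: `distance_join` is a hypothetical brute-force stand-in for the distance join that Sedona would perform in a distributed way, and all names, the sample data, and the `radius` value are assumptions chosen for illustration only. Lines are simplified to single segments.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (2-D tuples)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line and clamp to the segment's ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_join(points, segments, radius):
    """Step 1 (stand-in): yield (point_id, segment_id, distance) for every
    pair whose distance is at most `radius`. Brute force, for illustration."""
    for pid, p in points.items():
        for sid, (a, b) in segments.items():
            d = point_segment_distance(p, a, b)
            if d <= radius:
                yield pid, sid, d


def nearest_per_point(pairs):
    """Step 2: among each point's candidates, keep the closest one (k = 1)."""
    best = {}
    for pid, sid, d in pairs:
        if pid not in best or d < best[pid][1]:
            best[pid] = (sid, d)
    return best


# Hypothetical sample data.
points = {"p1": (0.0, 1.0), "p2": (5.0, 5.0), "p3": (100.0, 100.0)}
segments = {
    "l1": ((-1.0, 0.0), (1.0, 0.0)),
    "l2": ((5.0, 0.0), (5.0, 3.0)),
}

result = nearest_per_point(distance_join(points, segments, radius=3.0))
print(result)
```

Note that a point with no line inside the cutoff (such as `p3` here) does not appear in the result at all. That is where the approximation comes from: the cutoff has to be large enough for the data, or the unmatched points have to be retried with a larger one.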
