Hi Richard,

The problem you are working on is called a KNN join. A distributed, exact
KNN join is very hard to implement, even though an existing paper already
describes the algorithm in detail.

I would suggest an approximate KNN join instead. It can be done in two
steps: (1) Run a spatial distance join in Sedona. The distance acts as a
cutoff: you only look for the nearest neighbors of a spatial object within
that distance. (2) For each object and its candidate neighbors, perform a
KNN check, i.e. rank the candidates by distance and keep the k closest
(k = 1 for your nearest-line case). You could use the Sedona RDD API for
step 1, then write a little bit of code to implement step 2.
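
To make the two steps concrete, here is a minimal sketch in plain Python.
It is not Sedona code: distance_join is a hypothetical brute-force
stand-in for the Sedona distance join in step 1, and all function names,
the sample data, and the radius are my own assumptions for illustration.
It also assumes planar coordinates and lines given as single segments.

```python
import math
from collections import defaultdict


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (planar coords)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        # Degenerate segment: both endpoints coincide.
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment's line and clamp to the segment's extent.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_join(points, lines, radius):
    """Step 1 (stand-in): yield (point_id, line_id, distance) within radius.

    Brute force for illustration only; in practice this is the part you
    would hand to a distributed spatial distance join.
    """
    for pid, p in points.items():
        for lid, (a, b) in lines.items():
            d = point_segment_distance(p, a, b)
            if d <= radius:
                yield pid, lid, d


def nearest_per_point(candidates, k=1):
    """Step 2: for each point, keep the k closest candidate lines."""
    grouped = defaultdict(list)
    for pid, lid, d in candidates:
        grouped[pid].append((d, lid))
    return {pid: sorted(cands)[:k] for pid, cands in grouped.items()}


# Hypothetical sample data.
points = {"p1": (0.0, 0.0), "p2": (5.0, 5.0), "p3": (100.0, 100.0)}
lines = {
    "l1": ((0.0, 1.0), (10.0, 1.0)),
    "l2": ((0.0, -3.0), (10.0, -3.0)),
}

result = nearest_per_point(distance_join(points, lines, radius=5.0), k=1)
for pid in sorted(result):
    print(pid, result[pid])
```

Note that p3 has no line within the radius, so it does not appear in the
result at all. That is the approximate part: points with no candidate
inside the cutoff need a second pass with a larger distance.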

Thanks,
Jia

On Fri, Jul 22, 2022 at 10:39 AM Chud Muckram <[email protected]> wrote:

> Hi,
>
> I have a large dataset of points, on the order of hundreds of millions,
> and a dataset of lines, on the order of millions. I was looking for a
> method in Apache Sedona to do the following:
> For every point in my dataset, find the distance to the nearest line.
>
> I think somehow using spatial KNN to loop over every point would work, but
> I don't see any function that does that in the documentation. In the
> documentation, spatial KNN does it for one query point and a feature
> dataset.
>
> Thanks,
> Richard
>
