Hi Alessandro,

You cannot use Sedona's KNNQuery.SpatialKnnQuery after DistanceJoinQuery. Instead, add your own filtering logic (in a Spark mapPartitions function) on the DistanceJoinQuery result.
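A minimal sketch of the kind of filtering step described above, written in plain Java with no Spark or Sedona calls. It assumes the distance join has already produced, for each query object, the collection of candidates within distance D, and it keeps only the k nearest of them. The class name KnnWithinDistanceSketch, the method kNearest, and the use of double[] {x, y} in place of real geometries are all hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper: not part of Sedona. Points are plain {x, y} arrays
// standing in for the geometries a distance join would return.
class KnnWithinDistanceSketch {

    // Euclidean distance between two 2-D points given as {x, y}.
    static double distance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    // Returns at most k candidates, ordered from nearest to farthest from `query`.
    // `candidates` is assumed to hold only objects already within distance D.
    static List<double[]> kNearest(double[] query, Iterable<double[]> candidates, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        Comparator<double[]> byDistance =
                Comparator.comparingDouble(c -> distance(query, c));
        // Max-heap on distance: the farthest of the current best k sits on top,
        // so it can be evicted as soon as a (k+1)-th candidate is added.
        PriorityQueue<double[]> heap = new PriorityQueue<>(k, byDistance.reversed());
        for (double[] candidate : candidates) {
            heap.add(candidate);
            if (heap.size() > k) {
                heap.poll();
            }
        }
        List<double[]> result = new ArrayList<>(heap);
        result.sort(byDistance);
        return result;
    }
}
```

In a Spark job, a function like kNearest could be called for each (query object, candidates) pair inside a mapPartitions function over the join result; that wiring is omitted here.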
1. Your contribution should cover the RDD API. For now, I cannot think of a SQL syntax that describes the KNN join query.
2. Your contribution should cover both the Scala/Java API and the Python API. The core algorithm will be implemented in Java, in KNNQuery.java, so it works for Scala automatically. For Python support, you need a corresponding wrapper API in Python. You can finish the Java implementation first, then create the PR and consult Paweł Kociński <pawel93kocin...@gmail.com>, who leads the Python API.
3. You can refer to [1] and [2] for compiling and documenting your work. However, you won't be able to publish, since you are not a Sedona committer.

Thanks,
Jia

On Mon, Mar 29, 2021 at 2:42 AM Alessandro Calvio <alexcal...@hotmail.it> wrote:

> Hi all,
>
> thank you for your answer.
>
> It would be very interesting to understand how to implement the solution
> proposed by the paper in Apache Sedona.
>
> Anyway, I think I could try to implement the simplified version you
> proposed. If I understand correctly, it would amount to using the current
> *SpatialKNNQuery* function on the geometries selected by
> *DistanceJoinQuery*, am I right?
>
> Can I refer to links [1], [2] and [3] as a guide to the compilation and
> publishing mechanism? And what about the limits on the contribution's
> scope that I asked about in my previous questions?
>
> Finally, I didn't receive Adam's mail, but yes, the expected output would
> have been the one you described.
>
> Thanks,
> Best regards,
> Alessandro
>
> [1]: https://sedona.incubator.apache.org/community/rule/
> [2]: https://sedona.incubator.apache.org/download/compile/#compile-the-documentation
> [3]: https://sedona.incubator.apache.org/download/publish/
>
> *From: *Jia Yu <jiayu198...@gmail.com>
> *Sent: *Monday, March 29, 2021 07:53
> *To: *dev@sedona.apache.org; alexcal...@hotmail.it; adam...@gmail.com
> *Subject: *Re: Apache Sedona contribution
>
> Hi folks,
>
> Thanks for your proposal. However, Sedona does not have a KNN join query
> because a complete and correct KNN join is very difficult to implement.
>
> Note that the existing spatial partitioning scheme in Sedona cannot yield
> a correct KNN join: once you zip two RDDs together, there is no guarantee
> that, for each point in Partition A of RDD1, you can find its kth nearest
> neighbor in Partition A of RDD2. To implement a correct KNN join, we need
> to find a correct partitioning mechanism. This research problem has been
> studied in this TKDE paper:
> https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7337428&casa_token=PZSM8VwhkwMAAAAA:slOnDt2_70HFwdu81c_7jVRiYcZPj7FPbJ3OvET_g0ApMDDEcg2Fq71CMgYxWrSCdXmjZqACew&tag=1
> We have confirmed that this is the correct solution we want.
>
> Alessandro, if you want to proceed, I suggest that you implement a
> simplified version of KNN join:
>
> For each object in RDD1, within its circle of radius D, find its k nearest
> neighbors in RDD2.
>
> To do so, you can apply a KNN map function after Sedona's
> JoinQuery.DistanceJoinQuery API:
> https://github.com/apache/incubator-sedona/blob/master/core/src/main/java/org/apache/sedona/core/spatialOperator/JoinQuery.java#L289
> or
> https://github.com/apache/incubator-sedona/blob/master/core/src/main/java/org/apache/sedona/core/spatialOperator/JoinQuery.java#L253
>
> Thanks,
> Jia
>
> On Fri, Mar 26, 2021 at 8:35 PM Adam Binford <adam...@gmail.com> wrote:
>
> Out of curiosity, and knowing next to nothing about KNN, what is the
> return value supposed to represent? The K nearest geometries in
> spatialRDD to any geometry in datasetPoint?
>
> Adam
>
> On Fri, Mar 26, 2021, 6:56 AM Alessandro Calvio <alexcal...@hotmail.it>
> wrote:
>
> > Hi,
> > I have a degree in Computer Engineering and I am writing about the
> > possibility of contributing to the Apache Sedona project.
> > During my work I ran into a problem: the KNNQuery operation cannot be
> > performed with a dataset rather than a single point.
> > Hence, the contribution would extend the library with a new signature
> > for SpatialKnnQuery:
> >
> > public static <U extends Geometry, T extends Geometry> List<T>
> > SpatialKnnQuery(
> >     SpatialRDD<T> spatialRDD, SpatialRDD<U> datasetPoint, Integer k,
> >     boolean useIndex
> > )
> >
> > The solution I've tried is similar to the one used for the join query.
> > In short, I subdivide both datasets geographically, zip the partitions
> > together, and finally iterate over each partition, computing the
> > nearest-neighbour query.
> > I'd like to know whether this would be a good proposal for a
> > contribution, and to ask you some questions about the idea:
> >
> >    1. Can the contribution be limited to the RDD API, or should it cover
> >    the SQL API too?
> >    2. Can the contribution be limited to the Scala/Java API, or should
> >    it cover the Python API too?
> >    3. Do the tests need to be run locally, or should I deploy something
> >    like a cluster?
> >
> > This would be my first contribution to an open-source project, so I'm
> > not very experienced with this kind of procedure. I want to be sure that
> > I can develop and submit my solution in a correct environment: where
> > could I find a guide or doc with all the steps to do this, should the
> > proposal be approved?
> >
> > Waiting for a response,
> > Best regards,
> > Alessandro.