Hi all, thank you for your answer. It would be very interesting to understand how to implement the solution proposed by the paper in Apache Sedona. In any case, I think I could try to implement the simplified version you proposed. If I understand correctly, it would amount to applying the current SpatialKNNQuery function to the geometries selected by DistanceJoinQuery, am I right?
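A minimal sketch of that simplified version, assuming plain 2-D points stored as double[] {x, y} and Euclidean distance. The class and method names (RadiusKnnSketch, withinRadius, kNearest, radiusKnn) are hypothetical and are not part of Sedona: withinRadius only stands in for the filtering that DistanceJoinQuery would perform, and kNearest is the per-object map step applied afterwards.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration only: none of these names are part of Apache Sedona.
class RadiusKnnSketch {

    // Euclidean distance between two 2-D points stored as {x, y}.
    static double distance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    // Stand-in for the distance join: candidates within radius d of the query.
    static List<double[]> withinRadius(double[] query, List<double[]> all, double d) {
        List<double[]> result = new ArrayList<>();
        for (double[] p : all) {
            if (distance(query, p) <= d) {
                result.add(p);
            }
        }
        return result;
    }

    // Map step: keep at most k candidates, nearest first.
    static List<double[]> kNearest(double[] query, List<double[]> candidates, int k) {
        return candidates.stream()
                .sorted(Comparator.comparingDouble((double[] c) -> distance(query, c)))
                .limit(k)
                .collect(Collectors.toList());
    }

    // Simplified KNN: the k nearest neighbours of query among points within radius d.
    static List<double[]> radiusKnn(double[] query, List<double[]> all, double d, int k) {
        return kNearest(query, withinRadius(query, all, d), k);
    }

    public static void main(String[] args) {
        List<double[]> dataset = Arrays.asList(
                new double[] {1.0, 0.0},
                new double[] {0.0, 3.0},
                new double[] {0.5, 0.0},
                new double[] {10.0, 10.0});
        double[] query = {0.0, 0.0};
        for (double[] p : radiusKnn(query, dataset, 5.0, 2)) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```

With this approach, an object that has fewer than k candidates inside the radius gets fewer than k results, which is the main difference from a full KNN join.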
Can I use links [1], [2] and [3] as a guide to the compilation and publishing mechanism? And what about limiting the scope of the contribution, as asked in my previous questions? Finally, I didn't receive Adam's email, but yes, the expected output is the one you described.

Thanks,
Best regards,
Alessandro

[1]: https://sedona.incubator.apache.org/community/rule/
[2]: https://sedona.incubator.apache.org/download/compile/#compile-the-documentation
[3]: https://sedona.incubator.apache.org/download/publish/

From: Jia Yu <jiayu198...@gmail.com>
Sent: Monday, 29 March 2021 07:53
To: dev@sedona.apache.org; alexcal...@hotmail.it; adam...@gmail.com
Subject: Re: Apache Sedona contribution

Hi folks,

Thanks for your proposal. However, the reason Sedona does not have a KNN join query is that a complete and correct KNN join is very difficult to implement. Note that the existing spatial partitioning scheme in Sedona cannot yield a correct KNN join, because once you zip two RDDs together, there is no guarantee that for each point in Partition A of RDD1 you can find its kth neighbor in Partition A of RDD2.

To implement a correct KNN join, we need to find a correct partitioning mechanism. This research problem has been studied in this TKDE paper:
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7337428&casa_token=PZSM8VwhkwMAAAAA:slOnDt2_70HFwdu81c_7jVRiYcZPj7FPbJ3OvET_g0ApMDDEcg2Fq71CMgYxWrSCdXmjZqACew&tag=1

We have confirmed that this is the correct solution we want.

Alessandro, if you want to proceed, I would suggest implementing a simplified version of KNN join: for each object in RDD1, find its k nearest neighbors in RDD2 within a circle of radius D around it.
To do so, you can apply a k-nearest-neighbor map function after Sedona's JoinQuery.DistanceJoinQuery API:
https://github.com/apache/incubator-sedona/blob/master/core/src/main/java/org/apache/sedona/core/spatialOperator/JoinQuery.java#L289
or
https://github.com/apache/incubator-sedona/blob/master/core/src/main/java/org/apache/sedona/core/spatialOperator/JoinQuery.java#L253

Thanks,
Jia

On Fri, Mar 26, 2021 at 8:35 PM Adam Binford <adam...@gmail.com> wrote:

Out of curiosity and knowing next to nothing about KNN, what is the return value supposed to represent? The K nearest geometries in spatialRDD to any geometry in dataset point?

Adam

On Fri, Mar 26, 2021, 6:56 AM Alessandro Calvio <alexcal...@hotmail.it> wrote:

> Hi,
> I'm a graduate in Computer Engineering and I am writing about the
> possibility of contributing to the Apache Sedona project.
> During my work I ran into a problem: the KNNQuery operation cannot be
> performed against a dataset, only against a single point.
> Hence, the contribution would enhance the library with a new signature of
> the SpatialKNNQuery:
>
> public static <U extends Geometry, T extends Geometry> List<T> SpatialKnnQuery(
>     SpatialRDD<T> spatialRDD, SpatialRDD<U> datasetPoint, Integer k, boolean useIndex
> )
>
> The solution I've tried is similar to the one used for the join query.
> In a few words, I subdivide both datasets geographically, zip the
> partitions together and finally iterate over each partition, computing the
> nearest-neighbour query.
> I'd like to know whether this could be a good proposal for a contribution,
> and to ask you some questions about the idea:
>
> 1. Can the contribution be limited to the RDD API or should it cover the
> SQL API too?
> 2. Can the contribution be limited to enhancing the Scala/Java API or
> should it cover the Python API too?
> 3. Do the tests need to be run locally, or should I deploy something
> like a cluster?
>
> It would be my first contribution to an open-source project, so I'm not very
> experienced in this kind of procedure. I want to be sure that I can
> develop and submit my solution in a correct environment: where could I find
> a guide or doc with all the steps to do this after a possible approval?
>
> Waiting for a response,
> Best regards,
> Alessandro.