The full citation of that TKDE paper is: Chatzimilioudis, G., Costa, C., Zeinalipour-Yazti, D., Lee, W. C., & Pitoura, E. (2015). Distributed in-memory processing of all k nearest neighbor queries. *IEEE Transactions on Knowledge and Data Engineering*, *28*(4), 925-938.

On Sun, Mar 28, 2021 at 10:52 PM Jia Yu <[email protected]> wrote:

> Hi folks,
>
> Thanks for your proposal. However, the reason why Sedona does not have a
> KNN join query is that a complete and correct KNN join is very difficult
> to implement.
>
> Note that the existing spatial partitioning scheme in Sedona cannot yield
> a correct KNN join because once you zip two RDDs together, there is no
> guarantee that for each point in Partition A of RDD1, you can find its kth
> neighbor in Partition A of RDD2. To implement a correct KNN join, we need
> to find a correct partitioning mechanism. This research problem has been
> studied by this TKDE paper:
> https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7337428&casa_token=PZSM8VwhkwMAAAAA:slOnDt2_70HFwdu81c_7jVRiYcZPj7FPbJ3OvET_g0ApMDDEcg2Fq71CMgYxWrSCdXmjZqACew&tag=1
> We have confirmed that this is the correct solution we want.
>
> Alessandro, if you want to proceed, I would suggest that you implement a
> simplified version of the KNN join, which is:
>
> For each object in RDD1, within its D-radius circle, find its k nearest
> neighbors in RDD2.
>
> To do so, you can apply a KNN map function after the Sedona
> JoinQuery.DistanceJoinQuery API:
> https://github.com/apache/incubator-sedona/blob/master/core/src/main/java/org/apache/sedona/core/spatialOperator/JoinQuery.java#L289
> or
> https://github.com/apache/incubator-sedona/blob/master/core/src/main/java/org/apache/sedona/core/spatialOperator/JoinQuery.java#L253
>
> Thanks,
> Jia
>
> On Fri, Mar 26, 2021 at 8:35 PM Adam Binford <[email protected]> wrote:
>
>> Out of curiosity and knowing next to nothing about KNN, what is the
>> return value supposed to represent? The K nearest geometries in
>> spatialRDD to any geometry in dataset point?
>>
>> Adam
>>
>> On Fri, Mar 26, 2021, 6:56 AM Alessandro Calvio <[email protected]>
>> wrote:
>>
>> > Hi,
>> > I’m a graduate in Computer Engineering and I am writing about the
>> > possibility of contributing to the Apache Sedona project.
>> > During my work I ran into a problem: the KNNQuery operation cannot be
>> > performed with a dataset rather than a single point.
>> > Hence, the contribution would enhance the library with a new signature
>> > of the SpatialKNNQuery:
>> >
>> > public static <U extends Geometry, T extends Geometry> List<T>
>> > SpatialKnnQuery(
>> >     SpatialRDD<T> spatialRDD, SpatialRDD<U> datasetPoint, Integer k,
>> >     boolean useIndex
>> > )
>> >
>> > The solution I’ve tried is similar to the one used for the join query.
>> > In short, I subdivide both datasets geographically, zip the partitions
>> > together, and finally iterate over each partition, computing the
>> > nearest-neighbour query.
>> > I’d like to know whether this would be a good proposal for a
>> > contribution, and to ask you some questions about the idea:
>> >
>> > 1. Can the contribution be limited to the RDD API, or should it cover
>> >    the SQL API too?
>> > 2. Can the contribution be limited to the Scala/Java API, or should it
>> >    cover the Python API too?
>> > 3. Do the tests need to be run locally, or should I deploy something
>> >    like a cluster?
>> >
>> > It would be my first contribution to an open-source project, so I’m
>> > not very experienced in this kind of procedure. I want to be sure that
>> > I can develop and submit my solution in a correct environment: where
>> > could I find a guide or doc with all the steps to do this after a
>> > possible approval?
>> >
>> > Waiting for a response,
>> > Best regards,
>> > Alessandro.
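
For reference, the simplified variant suggested in the thread above (for each object, keep the k nearest neighbors that fall inside its D-radius circle) can be sketched in plain Java. This is only an illustration under stated assumptions: the class BoundedKnn and the method kNearestWithin are hypothetical names, points are bare {x, y} arrays rather than JTS geometries, and the candidate list stands in for the matches that a distance join would already have produced for one query object. It does not call any Sedona API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper, not part of Sedona: radius-bounded KNN over plain 2D points. */
public class BoundedKnn {

    /** Euclidean distance between two points given as {x, y}. */
    public static double distance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    /**
     * For one query point, keep the k nearest candidates that lie within radius d.
     * Assumption: `candidates` plays the role of the matches a distance join
     * returned for this query; here it is just an in-memory list.
     */
    public static List<double[]> kNearestWithin(
            double[] query, List<double[]> candidates, double d, int k) {
        // Step 1: discard candidates outside the D-radius circle.
        List<double[]> inRange = new ArrayList<>();
        for (double[] c : candidates) {
            if (distance(query, c) <= d) {
                inRange.add(c);
            }
        }
        // Step 2: order the survivors by distance to the query point.
        inRange.sort(Comparator.comparingDouble((double[] c) -> distance(query, c)));
        // Step 3: keep at most k of them (fewer if the circle holds fewer than k).
        int limit = Math.max(0, Math.min(k, inRange.size()));
        return new ArrayList<>(inRange.subList(0, limit));
    }
}
```

Calling BoundedKnn.kNearestWithin(query, candidates, d, k) once per query object gives the bounded result. Note that it can return fewer than k neighbors when the circle contains fewer than k candidates, which is exactly why this is a simplification and not the complete KNN join discussed above.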
