Out of curiosity, and knowing next to nothing about KNN: what is the return value supposed to represent? The K nearest geometries in spatialRDD to any geometry in datasetPoint?
Adam

On Fri, Mar 26, 2021, 6:56 AM Alessandro Calvio <[email protected]> wrote:

> Hi,
>
> I’m a graduate in Computer Engineering and I am writing in connection
> with the possibility of contributing to the Apache Sedona project.
>
> During my work I ran into a limitation: the KNNQuery operation cannot be
> performed with a dataset rather than a single point. Hence, the
> contribution would enhance the library with a new signature of the
> SpatialKnnQuery:
>
>     public static <U extends Geometry, T extends Geometry> List<T> SpatialKnnQuery(
>         SpatialRDD<T> spatialRDD, SpatialRDD<U> datasetPoint, Integer k, boolean useIndex
>     )
>
> The solution I’ve tried is similar to the one used for the Join-Query. In
> a few words, I subdivide both datasets geographically, zip the partitions
> together and finally iterate over each partition, computing the nearest
> neighbour query.
>
> I’d like to know if this could be a good proposal for a contribution, and
> to ask you some questions about the idea:
>
>    1. Can the contribution be limited to the RDD API, or should it cover
>    the SQL API too?
>    2. Can the contribution be limited to enhancing the Scala/Java API, or
>    should it cover the Python API too?
>    3. Do the tests need to be run locally, or should I deploy something
>    like a cluster?
>
> It would be my first contribution to an open-source project, so I’m not
> very experienced in this kind of procedure. I want to be sure that I can
> develop and submit my solution in a correct environment: where could I
> find a guide or doc with all the steps to do this after a possible
> approval?
>
> Waiting for a response,
> Best regards,
> Alessandro.
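To make my question concrete, here is a minimal sketch of one possible reading of the per-partition step: for every query point, collect its own k nearest data points. This is only an illustration under my own assumptions. Plain `double[] {x, y}` arrays stand in for geometries, the class and method names (`KnnSketch`, `knn`, `knnPerQuery`) are hypothetical, and none of it is Sedona API or the proposed implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration: double[] {x, y} points stand in for geometries.
// Nothing in this class is part of Sedona or JTS.
final class KnnSketch {

    // Squared Euclidean distance; sufficient for ranking neighbours.
    static double dist2(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        return dx * dx + dy * dy;
    }

    // For ONE query point, return its k nearest data points, closest first.
    static List<double[]> knn(List<double[]> data, double[] query, int k) {
        // Max-heap on distance: the head is the farthest of the current best k.
        PriorityQueue<double[]> heap = new PriorityQueue<>(
                Comparator.comparingDouble((double[] p) -> dist2(p, query)).reversed());
        for (double[] p : data) {
            heap.add(p);
            if (heap.size() > k) {
                heap.poll(); // drop the farthest candidate
            }
        }
        List<double[]> result = new ArrayList<>(heap);
        result.sort(Comparator.comparingDouble((double[] p) -> dist2(p, query)));
        return result;
    }

    // Dataset-vs-dataset reading: one result list per query point,
    // in the same order as the queries.
    static List<List<double[]>> knnPerQuery(List<double[]> data,
                                            List<double[]> queries, int k) {
        List<List<double[]>> out = new ArrayList<>();
        for (double[] q : queries) {
            out.add(knn(data, q, k));
        }
        return out;
    }

    public static void main(String[] args) {
        List<double[]> data = Arrays.asList(
                new double[]{0, 0}, new double[]{1, 0},
                new double[]{5, 5}, new double[]{6, 5});
        List<double[]> queries = Arrays.asList(
                new double[]{0.2, 0}, new double[]{5.4, 5});
        List<List<double[]>> neighbours = knnPerQuery(data, queries, 2);
        for (int i = 0; i < queries.size(); i++) {
            System.out.print(Arrays.toString(queries.get(i)) + " ->");
            for (double[] p : neighbours.get(i)) {
                System.out.print(" " + Arrays.toString(p));
            }
            System.out.println();
        }
    }
}
```

In this reading the natural result is one list per query point, whereas the proposed signature returns a single `List<T>`. Whether that list is meant to be these per-query results flattened together, or something else, is what I am trying to understand.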
