Re: Apache Sedona contribution

Jia Yu Sun, 28 Mar 2021 22:54:44 -0700

The full citation of that TKDE paper is: Chatzimilioudis, G., Costa, C.,
Zeinalipour-Yazti, D., Lee, W. C., & Pitoura, E. (2015). Distributed
in-memory processing of all k nearest neighbor queries. *IEEE transactions
on knowledge and data engineering*, *28*(4), 925-938.


On Sun, Mar 28, 2021 at 10:52 PM Jia Yu <[email protected]> wrote:

> Hi folks,
>
> Thanks for your proposal. However, the reason Sedona does not have a KNN
> join query is that a complete and correct KNN join is very difficult to
> implement.
>
> Note that the existing spatial partitioning scheme in Sedona cannot yield
> a correct KNN join because once you zip two RDDs together, there is no
> guarantee that for each point in Partition A of RDD1, you can find its kth
> neighbor in Partition A of RDD2. To implement a correct KNN join, we need
> to find a correct partitioning mechanism. This research problem has been
> studied by this TKDE paper:
> https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7337428&casa_token=PZSM8VwhkwMAAAAA:slOnDt2_70HFwdu81c_7jVRiYcZPj7FPbJ3OvET_g0ApMDDEcg2Fq71CMgYxWrSCdXmjZqACew&tag=1
> We have confirmed that this is the correct solution we want.
>
> Alessandro, if you want to proceed, I would suggest implementing a
> simplified version of KNN join:
>
> For each object in RDD1, find its k nearest neighbors in RDD2 that lie
> within a circle of radius D around it.
>
> To do so, you can apply a KNN map function after the Sedona
> JoinQuery.DistanceJoinQuery API:
> https://github.com/apache/incubator-sedona/blob/master/core/src/main/java/org/apache/sedona/core/spatialOperator/JoinQuery.java#L289
> or
> https://github.com/apache/incubator-sedona/blob/master/core/src/main/java/org/apache/sedona/core/spatialOperator/JoinQuery.java#L253
>
> Thanks,
> Jia
>
>
>
>
>
>
> On Fri, Mar 26, 2021 at 8:35 PM Adam Binford <[email protected]> wrote:
>
>> Out of curiosity and knowing next to nothing about KNN, what is the return
>> value supposed to represent? The K nearest geometries in spatialRDD to any
>> geometry in dataset point?
>>
>> Adam
>>
>> On Fri, Mar 26, 2021, 6:56 AM Alessandro Calvio <[email protected]>
>> wrote:
>>
>> > Hi,
>> > I’m a graduate in Computer Engineering and I am writing about the
>> > possibility of contributing to the Apache Sedona project.
>> > During my work I ran into a problem: the KNNQuery operation cannot be
>> > performed with a dataset, only with a single point.
>> > Hence, the contribution would enhance the library with a new signature
>> > of SpatialKnnQuery:
>> >
>> > public static <U extends Geometry, T extends Geometry> List<T>
>> > SpatialKnnQuery(
>> >     SpatialRDD<T> spatialRDD, SpatialRDD<U> datasetPoint, Integer k,
>> >     boolean useIndex
>> > )
>> >
>> > The solution I’ve tried is similar to the one used for the join query.
>> > In short, I subdivide both datasets geographically, zip the partitions
>> > together, and finally iterate over each partition, computing the
>> > nearest-neighbour query.
>> > I’d like to know whether this would be a good proposal for a
>> > contribution, and to ask you some questions about the idea:
>> >
>> >   1.  Can the contribution be limited to the RDD API or should it cover
>> > the SQL API too?
>> >   2.  Can the contribution be limited to enhancing the Scala/Java API or
>> > should it cover the Python API too?
>> >   3.  Do the tests need to be run locally, or should I deploy something
>> > like a cluster?
>> >
>> > It would be my first contribution to an open-source project, so I’m not
>> > very experienced in this kind of procedure. I want to be sure that I can
>> > develop and submit my solution in a correct environment: where could I
>> > find a guide or doc with all the steps to do this after a possible
>> > approval?
>> >
>> > Waiting for a response,
>> > Best regards,
>> > Alessandro.
>> >
>>
>
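
A minimal sketch of the per-object step in the simplified KNN join suggested
above: given one object from RDD1 and the candidates from RDD2 that a distance
join would pair with it, keep the k nearest ones inside the D-radius circle.
It assumes plain 2D points stored as {x, y} arrays and Euclidean distance
instead of Sedona or JTS geometry types, and the names RadiusBoundedKnn and
kNearestWithin are hypothetical, not part of Sedona.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class RadiusBoundedKnn {

    // Euclidean distance between two 2D points stored as {x, y}.
    static double distance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    // Returns at most k candidates that lie within radius d of the query,
    // ordered from nearest to farthest.
    static List<double[]> kNearestWithin(double[] query, List<double[]> candidates,
                                         int k, double d) {
        List<double[]> result = new ArrayList<>();
        if (k <= 0) {
            return result;
        }
        Comparator<double[]> byDistance =
                Comparator.comparingDouble(p -> distance(query, p));
        // Max-heap on distance: the head is the farthest candidate kept so far.
        PriorityQueue<double[]> heap = new PriorityQueue<>(byDistance.reversed());
        for (double[] candidate : candidates) {
            if (distance(query, candidate) > d) {
                continue; // outside the D-radius circle
            }
            heap.add(candidate);
            if (heap.size() > k) {
                heap.poll(); // evict the farthest of the k + 1 kept candidates
            }
        }
        result.addAll(heap);
        result.sort(byDistance);
        return result;
    }

    public static void main(String[] args) {
        List<double[]> candidates = new ArrayList<>();
        candidates.add(new double[] {1, 0});
        candidates.add(new double[] {0, 2});
        candidates.add(new double[] {3, 0});
        candidates.add(new double[] {10, 10});
        // k = 2 nearest neighbors of the origin within radius D = 5.
        for (double[] p : kNearestWithin(new double[] {0, 0}, candidates, 2, 5.0)) {
            System.out.println("(" + p[0] + ", " + p[1] + ")");
        }
    }
}
```

In a Spark job, a function like this could be mapped over each (object,
candidates) pair of the distance join result. The result is only correct
within radius D: an object with fewer than k candidates inside the circle
gets fewer than k neighbors back.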
