Re: Apache Sedona contribution

Alessandro Calvio Mon, 29 Mar 2021 03:06:38 -0700

Hi all,
thank you for your answer.
It would be very interesting to understand how to implement the solution 
proposed by the paper in Apache Sedona.
Anyway, I think I could try to implement the simplified version you proposed. 
If I understand correctly, it would amount to applying the current 
SpatialKNNQuery function to the geometries returned by DistanceJoinQuery, am I 
right?

Can I refer to links [1], [2] and [3] as a guide to the compilation and 
publishing process? And what about the limits on the scope of the contribution 
that I asked about in my previous message?

Finally, I didn’t receive Adam’s email, but yes, the expected output would be 
the one you described.

Thanks,
Best regards,
Alessandro

[1]: https://sedona.incubator.apache.org/community/rule/
[2]: https://sedona.incubator.apache.org/download/compile/#compile-the-documentation
[3]: https://sedona.incubator.apache.org/download/publish/

From: Jia Yu<mailto:jiayu198...@gmail.com>
Sent: Monday, 29 March 2021 07:53
To: dev@sedona.apache.org<mailto:dev@sedona.apache.org>; 
alexcal...@hotmail.it<mailto:alexcal...@hotmail.it>; 
adam...@gmail.com<mailto:adam...@gmail.com>
Subject: Re: Apache Sedona contribution

Hi folks,

Thanks for your proposal. However, the reason Sedona does not have a KNN join 
query is that a complete and correct KNN join is very difficult to implement.

Note that the existing spatial partitioning scheme in Sedona cannot yield a 
correct KNN join: once you zip two RDDs together, there is no guarantee that, 
for each point in Partition A of RDD1, you can find its kth neighbor in 
Partition A of RDD2. To implement a correct KNN join, we need a correct 
partitioning mechanism. This research problem has been studied in this TKDE 
paper: 
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7337428&casa_token=PZSM8VwhkwMAAAAA:slOnDt2_70HFwdu81c_7jVRiYcZPj7FPbJ3OvET_g0ApMDDEcg2Fq71CMgYxWrSCdXmjZqACew&tag=1
We have confirmed that this is the correct solution we want.
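
To make the partitioning problem concrete, here is a toy sketch in plain Java. 
It uses no Spark or Sedona code; the class name, the fixed partition boundary 
at x = 5 and the sample points are all made up for illustration. It compares 
the true nearest neighbor of a query point with the nearest neighbor found when 
only the query's own partition is searched, which is all a zipped-partition 
join would see.

```java
import java.util.Arrays;
import java.util.List;

final class PartitionBoundarySketch {
    // Hypothetical partitioner: everything left of x = 5 goes to partition 0,
    // everything else to partition 1. Not Sedona's actual partitioning.
    static int partitionOf(double[] p) {
        return p[0] < 5.0 ? 0 : 1;
    }

    // Euclidean distance between two 2-D points given as {x, y}.
    static double distance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    // Nearest candidate to q. When samePartitionOnly is true, candidates in
    // other partitions are skipped. Returns null if no candidate qualifies.
    static double[] nearest(double[] q, List<double[]> candidates,
                            boolean samePartitionOnly) {
        double[] best = null;
        for (double[] c : candidates) {
            if (samePartitionOnly && partitionOf(c) != partitionOf(q)) {
                continue;
            }
            if (best == null || distance(q, c) < distance(q, best)) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] q = {4.9, 0.0}; // query point, just left of the boundary
        List<double[]> rdd2 = Arrays.asList(
                new double[]{0.0, 0.0},   // same partition as q, but far away
                new double[]{5.1, 0.0});  // other partition, but very close
        System.out.println("global: " + Arrays.toString(nearest(q, rdd2, false)));
        System.out.println("local:  " + Arrays.toString(nearest(q, rdd2, true)));
    }
}
```

In this toy case the global search picks the point at x = 5.1, while the 
partition-local search can only return the point at x = 0, so the local answer 
is wrong for any query point that sits close to a partition boundary.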

Alessandro, if you want to proceed, I would suggest implementing a simplified 
version of KNN join:

For each object in RDD1, find its k nearest neighbors in RDD2 within a circle 
of radius D around it.

To do so, you can apply a KNN map function to the output of the Sedona 
JoinQuery.DistanceJoinQuery API: 
https://github.com/apache/incubator-sedona/blob/master/core/src/main/java/org/apache/sedona/core/spatialOperator/JoinQuery.java#L289
   or 
https://github.com/apache/incubator-sedona/blob/master/core/src/main/java/org/apache/sedona/core/spatialOperator/JoinQuery.java#L253
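
As a rough illustration of the per-object map step only, here is a minimal 
sketch in plain Java. It does not call any Sedona or Spark API: the class 
RadiusKnnSketch and the method kNearestWithin are hypothetical names, points 
are plain {x, y} arrays rather than JTS geometries, and the candidate list 
stands in for whatever the distance join returns for one object. The radius 
check is repeated inside the function so that the sketch is self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class RadiusKnnSketch {
    // Euclidean distance between two 2-D points given as {x, y}.
    static double distance(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }

    // For one query point: keep the candidates within radius d, sort them by
    // distance to the query, and return at most k of them, nearest first.
    static List<double[]> kNearestWithin(double[] query,
                                         List<double[]> candidates,
                                         double d, int k) {
        List<double[]> inRange = new ArrayList<>();
        for (double[] c : candidates) {
            if (distance(query, c) <= d) {
                inRange.add(c);
            }
        }
        inRange.sort(Comparator.comparingDouble(c -> distance(query, c)));
        return new ArrayList<>(inRange.subList(0, Math.min(k, inRange.size())));
    }

    public static void main(String[] args) {
        double[] query = {0.0, 0.0};
        List<double[]> candidates = Arrays.asList(
                new double[]{1.0, 0.0},
                new double[]{3.0, 0.0},
                new double[]{0.0, 2.0},
                new double[]{10.0, 10.0}); // outside the radius used below
        for (double[] p : kNearestWithin(query, candidates, 5.0, 2)) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```

Note that an object with fewer than k neighbors inside the radius gets a 
shorter result list, which is the main way this simplified variant differs 
from a full KNN join.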

Thanks,
Jia

On Fri, Mar 26, 2021 at 8:35 PM Adam Binford 
<adam...@gmail.com<mailto:adam...@gmail.com>> wrote:
Out of curiosity, and knowing next to nothing about KNN, what is the return
value supposed to represent? The K nearest geometries in spatialRDD
to any geometry in datasetPoint?

Adam

On Fri, Mar 26, 2021, 6:56 AM Alessandro Calvio 
<alexcal...@hotmail.it<mailto:alexcal...@hotmail.it>>
wrote:

> Hi,
> I’m a Computer Engineering graduate and I am writing about the
> possibility of contributing to the Apache Sedona project.
> During my work I ran into a problem: the KNNQuery operation cannot be
> performed with a dataset as the query, only with a single point.
> Hence, the contribution would extend the library with a new signature of
> the SpatialKNNQuery:
>
> public static <U extends Geometry, T extends Geometry> List<T>
> SpatialKnnQuery(
>     SpatialRDD<T> spatialRDD, SpatialRDD<U> datasetPoint, Integer k,
>     boolean useIndex
> )
>
> The solution I’ve tried is similar to the one used for the
> Join-Query. In short, I subdivide both datasets geographically, zip
> the partitions together and finally iterate over each partition, computing
> the nearest-neighbour query.
> I’d like to know whether this would be a good proposal for a contribution,
> and to ask you some questions about the idea:
>
>   1.  Can the contribution be limited to the RDD API or should it cover the
> SQL API too?
>   2.  Can the contribution be limited to the Scala/Java API or
> should it cover the Python API too?
>   3.  Can the tests be run locally or should I deploy something
> like a cluster?
>
> It would be my first contribution to an open-source project, so I’m not very
> experienced in this kind of procedure. I want to be sure that I can
> develop and submit my solution in a correct environment: where can I find
> a guide or doc with all the steps to do this after a possible approval?
>
> Waiting for a response,
> Best regards,
> Alessandro.
>
