Hi Alessandro,

You cannot apply Sedona's KNNQuery.SpatialKnnQuery to the output of
DistanceJoinQuery. Instead, add your own filtering logic (for example, in a
Spark mapPartitions function) that runs on the DistanceJoinQuery result.
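
As a rough illustration of that filtering step, here is a minimal sketch in
plain Java. It assumes the join result can be viewed, per query object, as a
list of candidates already within distance D. The Point class and the kNearest
helper are hypothetical stand-ins for this example and are not Sedona APIs.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Sedona: trims the candidates produced by a
// distance join down to the k nearest ones for a single query object.
final class KnnAfterDistanceJoin {

    // Minimal 2D point standing in for a real geometry type (assumption).
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }

        // Euclidean distance to another point.
        double distanceTo(Point other) {
            return Math.hypot(x - other.x, y - other.y);
        }
    }

    // Returns at most k candidates, ordered from nearest to farthest.
    // The input list is copied, so the caller's list is left unchanged.
    static List<Point> kNearest(Point query, List<Point> candidates, int k) {
        List<Point> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((Point p) -> query.distanceTo(p)));
        int limit = Math.max(0, Math.min(k, sorted.size()));
        return new ArrayList<>(sorted.subList(0, limit));
    }
}
```

In a Spark job, kNearest would be called once per (query object, candidates)
pair inside the mapPartitions function. Sorting is the simplest choice here; a
bounded heap would avoid sorting all candidates when k is small.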

1. Your contribution should cover the RDD API. For now, I cannot think of a
SQL syntax that describes the KNN join query.
2. Your contribution should cover both the Scala/Java API and the Python
API. The core algorithm will be implemented in Java, in KNNQuery.java, and it
will then work from Scala automatically. For Python support, you need a
corresponding wrapper API in Python. You can first finish the Java
implementation, then create the PR and consult Paweł Kociński
<pawel93kocin...@gmail.com>, who leads the Python API.
3. You can refer to [1] and [2] for compiling and documenting your work, but
you won't be able to publish, since you are not a Sedona committer.

Thanks,
Jia


On Mon, Mar 29, 2021 at 2:42 AM Alessandro Calvio <alexcal...@hotmail.it>
wrote:

> Hi all,
>
> thank you for your answer.
>
> It would be very interesting to understand how to implement the solution
> proposed by the paper in Apache Sedona.
>
> Anyway, I think I could try to implement the simplified version you
> proposed. If I understand correctly, it would be like using the current
> *SpatialKNNQuery* function on the geometries selected by
> *DistanceJoinQuery*, am I right?
>
>
>
> Can I refer to the links [1], [2] and [3] as a guide to the compilation
> and publishing mechanism? And what about the limits on the contribution
> that I asked about in my previous questions?
>
>
>
> Finally, I didn’t receive Adam's email, but yes, the expected output
> would have been the one you described.
>
>
>
> Thanks,
>
> Best regards,
>
> Alessandro
>
>
>
> [1]: https://sedona.incubator.apache.org/community/rule/
>
> [2]:
> https://sedona.incubator.apache.org/download/compile/#compile-the-documentation
>
> [3]: https://sedona.incubator.apache.org/download/publish/
>
>
>
>
>
> *From: *Jia Yu <jiayu198...@gmail.com>
> *Sent: *Monday, March 29, 2021 07:53
> *To: *dev@sedona.apache.org; alexcal...@hotmail.it; adam...@gmail.com
> *Subject: *Re: Apache Sedona contribution
>
>
>
> Hi folks,
>
>
>
> Thanks for your proposal. However, the reason Sedona does not have a KNN
> join query is that a complete and correct KNN join is very difficult to
> implement.
>
>
>
> Note that the existing spatial partitioning scheme in Sedona cannot produce
> a correct KNN join: once you zip two RDDs together, there is no guarantee
> that for each point in Partition A of RDD1, you can find its kth neighbor
> in Partition A of RDD2. To implement a correct KNN join, we need to find a
> correct partitioning mechanism. This research problem has been studied in
> this TKDE paper:
> https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7337428&casa_token=PZSM8VwhkwMAAAAA:slOnDt2_70HFwdu81c_7jVRiYcZPj7FPbJ3OvET_g0ApMDDEcg2Fq71CMgYxWrSCdXmjZqACew&tag=1
>
> We have confirmed that this is the correct solution we want.
>
>
>
> Alessandro, if you want to proceed, I would suggest implementing a
> simplified version of KNN join, which is:
>
>
>
> For each object in RDD1, find its k nearest neighbors in RDD2 among those
> that lie within a circle of radius D around it.
>
>
>
> To do so, you can apply a k-nearest-neighbor map function after the Sedona
> JoinQuery.DistanceJoinQuery API:
> https://github.com/apache/incubator-sedona/blob/master/core/src/main/java/org/apache/sedona/core/spatialOperator/JoinQuery.java#L289
>  or
> https://github.com/apache/incubator-sedona/blob/master/core/src/main/java/org/apache/sedona/core/spatialOperator/JoinQuery.java#L253
>
>
>
> Thanks,
>
> Jia
>
>
> On Fri, Mar 26, 2021 at 8:35 PM Adam Binford <adam...@gmail.com> wrote:
>
> Out of curiosity and knowing next to nothing about KNN, what is the return
> value supposed to represent? The K nearest geometries in spatialRDD to any
> geometry in datasetPoint?
>
> Adam
>
> On Fri, Mar 26, 2021, 6:56 AM Alessandro Calvio <alexcal...@hotmail.it>
> wrote:
>
> > Hi,
> > I’m a Computer Engineering graduate and I am writing about the
> > possibility of contributing to the Apache Sedona project.
> > During my work I ran into a limitation: the KNNQuery operation cannot be
> > performed against a dataset, only against a single point.
> > Hence, the contribution would enhance the library with a new signature of
> > the SpatialKNNQuery:
> >
> > public static <U extends Geometry, T extends Geometry> List<T> SpatialKnnQuery(
> >     SpatialRDD<T> spatialRDD, SpatialRDD<U> datasetPoint, Integer k, boolean useIndex)
> >
> > The solution I’ve tried is similar to the one used for the join query.
> > In a few words, I partition both datasets spatially, zip the partitions
> > together, and finally iterate over each partition, computing the
> > nearest-neighbour query.
> > I’d like to know whether this would be a good proposal for a contribution,
> > and to ask you some questions about the idea:
> >
> >   1.  Can the contribution be limited to RDD API or should it cover the
> > SQL API too?
> >   2.  Can the contribution be limited to enhance the Scala/Java API or
> > should it cover the Python API too?
> >   3.  Do the tests need to be run locally, or should I deploy something
> > like a cluster?
> >
> > It would be my first contribution to an open-source project, so I’m not
> > very experienced in this kind of procedure. I want to be sure that I can
> > develop and submit my solution in a correct environment: where could I
> > find a guide or doc with all the steps to do this after a possible
> > approval?
> >
> > Waiting for a response,
> > Best regards,
> > Alessandro.
> >
>
>
>
