Hi,

What merge behavior do you want when A~=B, B~=C but A!=C? Should the merge
emit ABC? AB and BC? Something else?
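
To make the two options concrete, here is a toy sketch in plain Python (not Spark). The `similar` function is a hypothetical stand-in for your real merge condition, and the integer records are made up for illustration. "AB and BC" emits one merge per similar pair, while "ABC" merges whole connected components of the similarity graph, here via a small union-find:

```python
from itertools import combinations


def similar(a, b, tol=1):
    # Hypothetical stand-in for the real merge condition:
    # two values are "similar" when they differ by at most `tol`.
    return abs(a - b) <= tol


def pairwise_merges(items, tol=1):
    """Option 'AB and BC': emit one merged pair per similar pair."""
    return [(a, b) for a, b in combinations(items, 2) if similar(a, b, tol)]


def transitive_merges(items, tol=1):
    """Option 'ABC': merge connected components of the similarity graph."""
    parent = list(range(len(items)))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Union the two groups of every similar pair.
    for i, j in combinations(range(len(items)), 2):
        if similar(items[i], items[j], tol):
            parent[find(i)] = find(j)

    # Collect items by the root of their group.
    groups = {}
    for i, item in enumerate(items):
        groups.setdefault(find(i), []).append(item)
    return list(groups.values())


# A=1, B=2, C=3: A~=B and B~=C, but A and C differ by 2.
records = [1, 2, 3, 10]
print(pairwise_merges(records))    # [(1, 2), (2, 3)]
print(transitive_merges(records))  # [[1, 2, 3], [10]]
```

The all-pairs loop is quadratic, so this only illustrates the semantics; it is not something to run on 30M entries as written.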

Best,
Karl

On Sat, Nov 21, 2015 at 5:24 AM OcterA <mroct...@gmail.com> wrote:

> Hello,
>
> I have a set of X records (around 30M entries). I need to run a batch job
> that merges similar records; at the end I expect around X/2 records.
>
> So far I've done the basics: open the files and map them to a usable object,
> but I'm stuck on the merge part...
>
> The merge condition is composed of several conditions:
>
>     A.get*Start*Point == B.get*End*Point
>     Difference between A.getStartDate and B.getStartDate is less than X1 seconds
>     Difference between A.getEndDate and B.getEndDate is less than X2 seconds
>     A.getField1 startsWith B.getField1
>     some more like that...
>
> As a result, I can have A~=B and B~=C but A!=C. As far as I understand
> Spark, this is a problem, because I can't use a hash to greatly reduce the
> scan time...
>
> Do you have any advice for solving my problem, or pointers to methods that
> could help me? Maybe another tool from the Hadoop ecosystem?
>
