
On Wed, May 3, 2017 at 2:59 AM, Bhupesh Chawda <bhup...@datatorrent.com>
wrote:

> The main difference is in the managed state implementations used by the
> two join implementations.
> The advantage mainly comes from the fact that Join Impl 1 uses
> ManagedTimeStateImpl (key buckets + time buckets), while Join Impl 2 is
> based on the other two implementations (each of which has the notion of
> either a key bucket or a time bucket, but not both).
>

How does this difference affect performance and scalability? I think that's
the key question.



>
> I agree that the windowed version addresses a more generic use case. My
> only concern was whether there are use cases or user communities that are
> not familiar with windowed semantics and might prefer the other
> implementation instead. Would that warrant keeping the other
> implementation around?
>

It should be possible to create a module or wrapper if the intention is to
simplify a specific use case.


>
> On Fri, Apr 28, 2017 at 10:09 AM, Thomas Weise <t...@apache.org> wrote:
>
> > There is one more important difference not mentioned:
> >
> > Join Impl 1 doesn't work and Join Impl 2 does :)
> >
> > Can you clarify why a (working) Join Impl 1 would perform better? And if
> > that is the case, how would the amount of work needed to fix 1 stack up
> > against improving 2?
> >
> > Join Impl 2 has greater flexibility due to the generalized windowing. If
> > everything else is the same, I'd prefer that we put our efforts there.
> >
> > Thanks,
> > Thomas
> >
> >
> >
> > On Wed, Apr 26, 2017 at 11:14 PM, Bhupesh Chawda <bhup...@apache.org>
> > wrote:
> >
> > > Hi Community,
> > >
> > > Currently, join support in Malhar is a little fuzzy for the end user.
> > > We have multiple implementations:
> > >
> > >    1. Join Impl 1: inner join implementation, based on managed state
> > >    2. Join Impl 2: merge operator, windowed implementation, based on
> > >    spillable data structures (which are themselves based on managed state)
> > >
> > > The following are the differences between the two:
> > >
> > >    - As the name implies, Join Impl 1 is meant for inner joins, while Join
> > >    Impl 2 has generic support for both inner and outer joins.
> > >    - Join Impl 1 supports sliding time windows and expires old tuples.
> > >    Join Impl 2 requires an understanding of windowing concepts and relies
> > >    on watermarks to function.
> > >    - Judging by the managed state implementations used by Join Impl 1 and
> > >    Join Impl 2, it seems that Join Impl 1 would have a performance
> > >    advantage over Join Impl 2.
> > >
> > > The purpose of this email is to see what can be done to simplify join
> > > usability in Malhar. Here are some options:
> > >
> > >    1. Keep both implementations, with clear documentation of when to use
> > >    each.
> > >    2. Remove Join Impl 1 from Malhar and work on improving the performance
> > >    of Join Impl 2. Note that even though Join Impl 1 addresses a very
> > >    specific use case, that use case is the most common requirement for
> > >    streaming joins.
> > >    3. Any other option?
> > >
> > > Thanks.
> > >
> > > ~ Bhupesh
> > >
> > >
> >
>
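
For readers comparing the two approaches, the sliding-window inner join with tuple expiry described for Join Impl 1, together with the "key buckets + time buckets" layout mentioned for ManagedTimeStateImpl, can be illustrated with a small sketch. This is a minimal Python sketch under stated assumptions, not Malhar code: the class SlidingWindowInnerJoin, its process() method, and the window_ms/bucket_ms parameters are hypothetical. Tuples are assumed to arrive roughly in timestamp order, and nothing here models Malhar's managed state, checkpointing, or the watermark-based windowing of Join Impl 2.

```python
from collections import defaultdict


class SlidingWindowInnerJoin:
    """Hypothetical sketch of a two-stream sliding-window inner join.

    Not Malhar code: state is bucketed by time and then by key, and
    whole time buckets are dropped once no future tuple can match them.
    Assumes tuples arrive (roughly) in timestamp order.
    """

    def __init__(self, window_ms, bucket_ms):
        self.window_ms = window_ms  # max |ts_left - ts_right| for a match
        self.bucket_ms = bucket_ms  # width of one time bucket
        # state[side][time_bucket][key] -> list of (timestamp, value)
        self.state = {
            "left": defaultdict(lambda: defaultdict(list)),
            "right": defaultdict(lambda: defaultdict(list)),
        }
        self.max_ts = None  # highest timestamp seen so far

    def _bucket(self, ts):
        return ts // self.bucket_ms

    def process(self, side, key, ts, value):
        """Store one tuple and return the (key, left, right) pairs it completes."""
        other = "right" if side == "left" else "left"
        matches = []
        # Probe only the time buckets that can overlap [ts - window, ts + window].
        for b in range(self._bucket(ts - self.window_ms),
                       self._bucket(ts + self.window_ms) + 1):
            keyed = self.state[other].get(b)  # .get() does not create buckets
            if keyed is None:
                continue
            for other_ts, other_value in keyed.get(key, []):
                if abs(ts - other_ts) <= self.window_ms:
                    if side == "left":
                        matches.append((key, value, other_value))
                    else:
                        matches.append((key, other_value, value))
        self.state[side][self._bucket(ts)][key].append((ts, value))
        self.max_ts = ts if self.max_ts is None else max(self.max_ts, ts)
        self._expire()
        return matches

    def _expire(self):
        # Every tuple in a bucket below the cutoff is more than window_ms
        # older than max_ts, so an in-order tuple can no longer match it.
        cutoff = self._bucket(self.max_ts - self.window_ms)
        for side_state in self.state.values():
            for b in [b for b in side_state if b < cutoff]:
                del side_state[b]


join = SlidingWindowInnerJoin(window_ms=1000, bucket_ms=500)
print(join.process("left", "a", 100, "L1"))   # []
print(join.process("right", "a", 600, "R1"))  # [('a', 'L1', 'R1')]
print(join.process("right", "b", 700, "R2"))  # [] (no left tuple with key "b")
print(join.process("left", "a", 2500, "L2"))  # [] (R1 is outside the window)
print(sorted(join.state["left"]), sorted(join.state["right"]))  # [5] []
```

Bucketing by time first lets expiry discard whole buckets instead of scanning individual tuples. An outer join, which Join Impl 2 supports, would additionally have to emit unmatched tuples when their bucket expires; this sketch does not do that.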
