There is one more important difference not mentioned:

Join Impl 1 doesn't work and Join Impl 2 does :)

Can you clarify why a (working) Join Impl 1 would perform better? And if
that is the case, how would the amount of work to fix 1 stack up against
improving 2?

Join Impl 2 has greater flexibility due to the generalized windowing. If
everything else is the same, I would prefer that we put our efforts there.

Thanks,
Thomas



On Wed, Apr 26, 2017 at 11:14 PM, Bhupesh Chawda <bhup...@apache.org> wrote:

> Hi Community,
>
> Currently the support for join in Malhar is a little fuzzy for the end user.
> We have multiple implementations -
>
>    1. Join Impl 1 - Inner Join implementation, based on Managed state
>    2. Join Impl 2 - Merge operator, Windowed implementation, based on
>    Spillable structures (based on managed state)
>
> Following are the differences between the two:
>
>    - As the name implies, Join Impl 1 is meant for inner joins, while Join
>    Impl 2 has generic support for inner as well as outer joins.
>    - Join Impl 1 supports sliding time windows with support for expiring
>    old tuples. Join Impl 2 needs understanding of windowing concepts and
>    uses watermarking support for functioning.
>    - By looking at the implementations of managed state used by Join Impl 1
>    and Join Impl 2, it seems like Join Impl 1 would have a performance
>    advantage over Join Impl 2.
>
> The purpose of this email is to see what can be done to simplify the join
> usability in Malhar. Following are some options:
>
>    1. Keep both implementations with clear documentation of the usability
>    for both.
>    2. Remove Join Impl 1 from Malhar and work with Join Impl 2 to improve
>    performance. Note that even though Join Impl 1 addresses a very specific
>    use case, it is the most common requirement in streaming join use cases.
>    3. Any other option?
>
> Thanks.
>
> ~ Bhupesh
>
