There is one more important difference not mentioned: Join Impl 1 doesn't work and Join Impl 2 does :)
Can you clarify why a (working) Join Impl 1 would perform better? And if that is the case, how would the amount of work to fix Join Impl 1 stack up against improving Join Impl 2?

Join Impl 2 has greater flexibility due to the generalized windowing. If everything else is the same, I would prefer that we put our efforts there.

Thanks,
Thomas

On Wed, Apr 26, 2017 at 11:14 PM, Bhupesh Chawda <bhup...@apache.org> wrote:

> Hi Community,
>
> Currently the support for join in Malhar is a little fuzzy for the end user.
> We have multiple implementations -
>
> 1. Join Impl 1 - Inner Join implementation, based on Managed state
> 2. Join Impl 2 - Merge operator, Windowed implementation, based on
>    Spillable structures (based on managed state)
>
> Following are the differences between the two:
>
> - As the name implies, Join Impl 1 is meant for inner joins, while Join
>   Impl 2 has generic support for inner as well as outer joins.
> - Join Impl 1 supports sliding time windows with support for expiring
>   old tuples. Join Impl 2 needs understanding of windowing concepts and
>   uses watermarking support for functioning.
> - By looking at the implementations of managed state used by Join Impl 1
>   and Join Impl 2, it seems like Join Impl 1 would have a performance
>   advantage over Join Impl 2.
>
> The purpose of this email is to see what can be done to simplify the join
> usability in Malhar. Following are some options:
>
> 1. Keep both implementations with clear documentation of the usability
>    for both.
> 2. Remove Join Impl 1 from Malhar and work with Join Impl 2 to improve
>    performance. Note that even though Join Impl 1 addresses a very specific
>    use case, it is the most common requirement in streaming join use cases.
> 3. Any other option?
>
> Thanks.
>
> ~ Bhupesh