The main difference lies in the managed state implementations used by the two join implementations. The advantage comes mainly from the fact that Join Impl 1 uses ManagedTimeStateImpl (key buckets + time buckets), while Join Impl 2 is built on the other two managed state implementations, each of which has the notion of either a key bucket or a time bucket, but not both.
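
To make the "key buckets + time buckets" idea concrete, here is a minimal toy sketch in Java. It is not Malhar's ManagedTimeStateImpl and does not use any Malhar API; the class name KeyTimeBucketStore, its methods, and the in-memory maps are all hypothetical and chosen only for illustration. It assumes that a key is hashed to a key bucket and a timestamp is mapped to a time bucket inside it, so that a lookup with a time hint touches one small slot and expiry can drop whole time buckets at once.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only: hypothetical names, not the Malhar managed state classes. */
class KeyTimeBucketStore {
    private final int numKeyBuckets;
    private final long timeBucketSpanMillis;

    // keyBucketId -> (timeBucketId -> (key -> values)), time buckets kept sorted
    private final Map<Integer, TreeMap<Long, Map<String, List<String>>>> buckets = new HashMap<>();

    KeyTimeBucketStore(int numKeyBuckets, long timeBucketSpanMillis) {
        this.numKeyBuckets = numKeyBuckets;
        this.timeBucketSpanMillis = timeBucketSpanMillis;
    }

    private int keyBucket(String key) {
        return Math.floorMod(key.hashCode(), numKeyBuckets);
    }

    private long timeBucket(long timestampMillis) {
        return Math.floorDiv(timestampMillis, timeBucketSpanMillis);
    }

    /** Stores a value under both its key bucket and its time bucket. */
    void put(String key, long timestampMillis, String value) {
        buckets.computeIfAbsent(keyBucket(key), b -> new TreeMap<>())
               .computeIfAbsent(timeBucket(timestampMillis), t -> new HashMap<>())
               .computeIfAbsent(key, k -> new ArrayList<>())
               .add(value);
    }

    /** A lookup with a time hint reads exactly one key bucket and one time bucket. */
    List<String> get(String key, long timestampMillis) {
        TreeMap<Long, Map<String, List<String>>> byTime = buckets.get(keyBucket(key));
        if (byTime == null) {
            return new ArrayList<>();
        }
        Map<String, List<String>> slot = byTime.get(timeBucket(timestampMillis));
        if (slot == null || !slot.containsKey(key)) {
            return new ArrayList<>();
        }
        return new ArrayList<>(slot.get(key));
    }

    /** Drops every time bucket that ends before the cutoff; returns how many were dropped. */
    int expireBefore(long cutoffMillis) {
        long cutoffBucket = timeBucket(cutoffMillis);
        int dropped = 0;
        for (TreeMap<Long, Map<String, List<String>>> byTime : buckets.values()) {
            int before = byTime.size();
            byTime.headMap(cutoffBucket).clear(); // view of keys strictly below the cutoff
            dropped += before - byTime.size();
        }
        return dropped;
    }
}
```

In this toy model, expiring old tuples never scans individual entries: whole time buckets are removed in one step. A store with only key buckets would have to inspect entries to expire them, and a store with only time buckets would have to search a whole time bucket for a key. That is one plausible reading of where the advantage could come from, not a measurement.
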
I agree that the windowed version addresses a more generic use case. My only concern is whether there are use cases or user communities that are not familiar with windowed semantics and might prefer the other implementation instead. Would that warrant keeping the other implementation around?

~ Bhupesh

_______________________________________________________
Bhupesh Chawda
E: bhup...@datatorrent.com | Twitter: @bhupeshsc
www.datatorrent.com | apex.apache.org

On Fri, Apr 28, 2017 at 10:09 AM, Thomas Weise <t...@apache.org> wrote:

> There is one more important difference not mentioned:
>
> Join Impl 1 doesn't work and Join Impl 2 does :)
>
> Can you clarify why a (working) Join Impl 1 would perform better? And if it
> is the case, how the amount of work fixing 1 would stack up against
> improving 2?
>
> Join Impl 2 has greater flexibility due to the generalized windowing. If
> everything else is same I prefer we put our efforts there.
>
> Thanks,
> Thomas
>
> On Wed, Apr 26, 2017 at 11:14 PM, Bhupesh Chawda <bhup...@apache.org>
> wrote:
>
> > Hi Community,
> >
> > Currently the support for join in Malhar is little fuzzy for the end user.
> > We have multiple implementations -
> >
> > 1. Join Impl 1 - Inner Join implementation, based on Managed state
> > 2. Join Impl 2 - Merge operator, Windowed implementation, based on
> >    Spillable structures (based on managed state)
> >
> > Following are the differences between the two:
> >
> > - As the name implies, Join Impl 1 is meant for inner joins, while Join
> >   Impl 2 has generic support for inner as well as outer joins.
> > - Join Impl 1 supports sliding time windows with support for expiring
> >   old tuples. Join Impl 2 needs understanding of windowing concepts and
> >   uses watermarking support for functioning.
> > - By looking at the implementations of managed state used by Join Impl 1
> >   and Join Impl 2, it seems like Join Impl 1 would have a performance
> >   advantage over Join Impl 2.
> >
> > The purpose of this email is to see what can be done to simplify the join
> > usability in Malhar. Following are some options:
> >
> > 1. Keep both implementations with clear documentation of the usability
> >    for both.
> > 2. Remove Join Impl 1 from Malhar and work with Join Impl 2 to improve
> >    performance. Note that even though Join Impl 1 addresses a very specific
> >    use case, it is the most common requirement in streaming join use cases.
> > 3. Any other option?
> >
> > Thanks.
> >
> > ~ Bhupesh