--> On Wed, May 3, 2017 at 2:59 AM, Bhupesh Chawda <bhup...@datatorrent.com> wrote:
> The main difference is in the implementations of managed state that are
> used in the two join impls.
> The advantage mainly comes from the fact that Join impl 1 uses
> ManagedTimeStateImpl (key buckets + time buckets) while Join impl 2 is
> based on the other two implementations (both with the notion of either a
> key or a time bucket).
>

How does it affect performance and scalability? I think that's the key
question it comes down to.

> I agree that the windowed version addresses a more generic usecase. My only
> concern was are there use cases / user communities which are not familiar
> with the windowed semantics and might prefer the other implementation
> instead? Would that warrant keeping the other implementation around?
>

It should be possible to create a module or wrapper if the intention is to
simplify a specific use case?

> On Fri, Apr 28, 2017 at 10:09 AM, Thomas Weise <t...@apache.org> wrote:
>
> > There is one more important difference not mentioned:
> >
> > Join Impl 1 doesn't work and Join Impl 2 does :)
> >
> > Can you clarify why a (working) Join Impl 1 would perform better? And if
> > it is the case, how the amount of work fixing 1 would stack up against
> > improving 2?
> >
> > Join Impl 2 has greater flexibility due to the generalized windowing. If
> > everything else is same I prefer we put our efforts there.
> >
> > Thanks,
> > Thomas
> >
> > On Wed, Apr 26, 2017 at 11:14 PM, Bhupesh Chawda <bhup...@apache.org>
> > wrote:
> >
> > > Hi Community,
> > >
> > > Currently the support for join in Malhar is little fuzzy for the end
> > > user. We have multiple implementations -
> > >
> > > 1. Join Impl 1 - Inner Join implementation, based on Managed state
> > > 2. Join Impl 2 - Merge operator, Windowed implementation, based on
> > >    Spillable structures (based on managed state)
> > >
> > > Following are the differences between the two:
> > >
> > > - As the name implies, Join Impl 1 is meant for inner joins, while
> > >   Join Impl 2 has generic support for inner as well as outer joins.
> > > - Join Impl 1 supports sliding time windows with support for expiring
> > >   old tuples. Join Impl 2 needs understanding of windowing concepts
> > >   and uses watermarking support for functioning.
> > > - By looking at the implementations of managed state used by Join
> > >   Impl 1 and Join Impl 2, it seems like Join Impl 1 would have a
> > >   performance advantage over Join Impl 2.
> > >
> > > The purpose of this email is to see what can be done to simplify the
> > > join usability in Malhar. Following are some options:
> > >
> > > 1. Keep both implementations with clear documentation of the
> > >    usability for both.
> > > 2. Remove Join Impl 1 from Malhar and work with Join Impl 2 to
> > >    improve performance. Note that even though Join Impl 1 addresses a
> > >    very specific use case, it is the most common requirement in
> > >    streaming join use cases.
> > > 3. Any other option?
> > >
> > > Thanks.
> > >
> > > ~ Bhupesh
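
For readers following the "key buckets + time buckets" point in this thread, the idea can be illustrated with a minimal sketch in plain Java. This is not Malhar code and does not use ManagedTimeStateImpl or any Apex API; the class name TimeBucketedInnerJoin, its methods, and the bucketSpanMillis / bucketsToKeep parameters are hypothetical and chosen only for illustration. It assumes an in-memory inner join in which each side is stored by time bucket first and by key second, so that expiring old tuples amounts to dropping whole buckets.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Illustrative only: an in-memory inner join over two streams whose state is
 * organised as time bucket -> key -> values. All names here are hypothetical.
 */
class TimeBucketedInnerJoin<K, L, R> {
  private final long bucketSpanMillis; // width of one time bucket
  private final int bucketsToKeep;     // number of most recent buckets retained
  private final TreeMap<Long, Map<K, List<L>>> left = new TreeMap<>();
  private final TreeMap<Long, Map<K, List<R>>> right = new TreeMap<>();
  private long latestBucket = Long.MIN_VALUE;

  TimeBucketedInnerJoin(long bucketSpanMillis, int bucketsToKeep) {
    this.bucketSpanMillis = bucketSpanMillis;
    this.bucketsToKeep = bucketsToKeep;
  }

  /** Stores a left tuple and returns its matches among live right tuples. */
  List<Map.Entry<L, R>> addLeft(K key, L value, long eventTimeMillis) {
    List<Map.Entry<L, R>> out = new ArrayList<>();
    if (store(left, key, value, eventTimeMillis)) {
      for (R r : lookup(right, key)) {
        out.add(new SimpleImmutableEntry<>(value, r));
      }
    }
    return out;
  }

  /** Stores a right tuple and returns its matches among live left tuples. */
  List<Map.Entry<L, R>> addRight(K key, R value, long eventTimeMillis) {
    List<Map.Entry<L, R>> out = new ArrayList<>();
    if (store(right, key, value, eventTimeMillis)) {
      for (L l : lookup(left, key)) {
        out.add(new SimpleImmutableEntry<>(l, value));
      }
    }
    return out;
  }

  /** Returns false when the tuple falls in an already expired bucket. */
  private <V> boolean store(TreeMap<Long, Map<K, List<V>>> side, K key, V value, long t) {
    long bucket = Math.floorDiv(t, bucketSpanMillis);
    advance(bucket);
    if (bucket <= latestBucket - bucketsToKeep) {
      return false; // too old: dropped, produces no join output
    }
    side.computeIfAbsent(bucket, b -> new HashMap<>())
        .computeIfAbsent(key, k -> new ArrayList<>())
        .add(value);
    return true;
  }

  /** Moves time forward and drops whole buckets that fell out of the window. */
  private void advance(long bucket) {
    if (bucket > latestBucket) {
      latestBucket = bucket;
      long oldestLive = latestBucket - bucketsToKeep + 1;
      left.headMap(oldestLive).clear();
      right.headMap(oldestLive).clear();
    }
  }

  /** Collects all values for a key across the live time buckets of one side. */
  private <V> List<V> lookup(TreeMap<Long, Map<K, List<V>>> side, K key) {
    List<V> found = new ArrayList<>();
    for (Map<K, List<V>> byKey : side.values()) {
      List<V> values = byKey.get(key);
      if (values != null) {
        found.addAll(values);
      }
    }
    return found;
  }
}
```

In this sketch expiry costs one headMap(...).clear() per side regardless of how many keys the dropped buckets held, which is the kind of saving a combined key and time layout is meant to offer. Whether that carries over to the actual managed state implementations is the open question raised above, and this sketch makes no claim about it.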