Re: [DISCUSS] SEP-4: Adjunct Data Store for Unbounded DataSets

Wei Song Fri, 19 May 2017 16:00:11 -0700

Thanks, Xinyu, for your feedback. With regard to your question: when a new
version of a file becomes available, we would already be in normal
processing mode, so either the connector or the external system would need to
inject an indication to signal the end of the current version and then continue
sending the new version. The adjunct data store would recognize the
indication and build the new version in the background. While the new version is
being built, we continue to process incoming events from the main stream using the
existing version. Once the new version is built, we switch to it, and the old
version can be discarded if desired. It should be seamless from the
processing perspective.
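To make the version switch concrete, here is a minimal single-threaded Java sketch under stated assumptions: the class and method names (`VersionedAdjunctStore`, `onAdjunctRecord`, `onEndOfVersion`, `lookup`) are hypothetical, an in-memory `HashMap` stands in for the real local store, and the end-of-version indication is modeled as a separate callback. It is not the proposal's API. Records of the next version are staged in a separate map while lookups keep hitting the current one, and the end-of-version marker promotes the staged map with a single reference swap.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch of a versioned adjunct store (not the SEP-4 API).
 * Adjunct-stream records are staged into "building"; an end-of-version
 * marker promotes that map to "current". Main-stream lookups only read "current".
 */
class VersionedAdjunctStore<K, V> {
    private Map<K, V> current = new HashMap<>();  // version served to lookups
    private Map<K, V> building = new HashMap<>(); // version being assembled
    private int completedVersions = 0;            // how many versions have been switched in

    /** Called for each data record read from the adjunct stream. */
    public void onAdjunctRecord(K key, V value) {
        building.put(key, value);
    }

    /** Called when the (assumed) end-of-version marker arrives. */
    public void onEndOfVersion() {
        current = building;         // switch to the newly built version
        building = new HashMap<>(); // previous version is dropped (becomes garbage)
        completedVersions++;
    }

    /** Called while processing a main-stream event. */
    public Optional<V> lookup(K key) {
        return Optional.ofNullable(current.get(key));
    }

    public int completedVersions() {
        return completedVersions;
    }

    public static void main(String[] args) {
        VersionedAdjunctStore<String, String> store = new VersionedAdjunctStore<>();
        store.onAdjunctRecord("member-1", "title=Engineer");
        store.onEndOfVersion(); // version 1 complete
        store.onAdjunctRecord("member-1", "title=Manager"); // version 2 streaming in
        // Lookups still see version 1 while version 2 is being built.
        System.out.println(store.lookup("member-1").orElse("missing"));
        store.onEndOfVersion(); // version 2 complete; switch over
        System.out.println(store.lookup("member-1").orElse("missing"));
    }
}
```

Swapping a whole reference, rather than updating the live map in place, keeps main-stream lookups from ever seeing a half-built version; the price is holding two versions at once during the rebuild. This sketch always drops the old version on switch, whereas the reply leaves that optional.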

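The preemption Xinyu asks about below can be illustrated with a simplified Java model of priority-based choosing: whenever several streams have messages ready, the chooser always takes one from the highest-priority stream. This is a hypothetical sketch (`PriorityChooserModel` and its members are invented for illustration); it models only the priority rule, not Samza's actual chooser or its full bootstrapping semantics.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical, simplified model of priority-based message choosing.
 * Among streams with a pending message, the highest priority wins;
 * ties go to the stream registered first. Not Samza's real chooser.
 */
final class PriorityChooserModel {
    static final class Stream {
        final String name;
        final int priority;
        final Deque<String> pending = new ArrayDeque<>();

        Stream(String name, int priority) {
            this.name = name;
            this.priority = priority;
        }
    }

    private final List<Stream> streams = new ArrayList<>();

    Stream register(String name, int priority) {
        Stream s = new Stream(name, priority);
        streams.add(s);
        return s;
    }

    /** Returns "stream:message" for the next message, or null if every stream is empty. */
    String choose() {
        Stream best = null;
        for (Stream s : streams) {
            if (!s.pending.isEmpty() && (best == null || s.priority > best.priority)) {
                best = s;
            }
        }
        return best == null ? null : best.name + ":" + best.pending.poll();
    }

    public static void main(String[] args) {
        PriorityChooserModel chooser = new PriorityChooserModel();
        Stream main = chooser.register("main", 0);
        Stream adjunct = chooser.register("adjunct", Integer.MAX_VALUE);
        main.pending.add("e1");
        adjunct.pending.add("v2-r1");
        adjunct.pending.add("v2-r2");
        main.pending.add("e2");
        String next;
        while ((next = chooser.choose()) != null) {
            System.out.println(next); // adjunct records drain before any main event
        }
    }
}
```

Running `main` prints both adjunct records before either main-stream event, which is why a new version delivered at `Integer.MAX_VALUE` priority could pause main-stream processing. The background rebuild described in the reply avoids that pause for later versions, since main-stream events keep being processed against the existing version; how the proposal assigns priorities after the initial version is not specified in this thread.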

On Fri, May 19, 2017 at 2:19 PM, xinyu liu <[email protected]> wrote:

> Hi, Wei,
>
> +1 on the proposed design. This is going to reduce a lot of the heavy lifting
> that user code needs to do today to bootstrap a data stream into a
> local store. The configs look quite straightforward and easy to set up.
> Overall, the design looks great to me.
>
> I have one question: in the proposal you mentioned "When Samza is running
> in 24x7 mode, the stream for a bounded dataset may deliver multiple
> versions." So after the bootstrap of the initial version is done, what
> will happen when a new version comes? Right now, by default, a bootstrap
> stream is set up with priority INT_MAX, meaning it preempts the processing
> of other streams while the bootstrap is going on. Are we expecting
> pauses when a new version of the adjunct data comes in? Please let me know
> what the plan is for handling this scenario.
>
> Thanks,
> Xinyu
>
> On Tue, May 16, 2017 at 2:15 PM, Navina Ramesh (Apache) <[email protected]> wrote:
>
> > Thanks for trying 3 times, Wei. Sorry about the trouble. Not sure where the
> > problem lies. Looking forward to reviewing your design.
> >
> > Navina
> >
> > On Tue, May 16, 2017 at 8:56 AM, Wei Song <[email protected]> wrote:
> >
> > > Hey everyone,
> > >
> > > I created a proposal for SAMZA-1278
> > > <https://issues.apache.org/jira/browse/SAMZA-1278>, Adjunct Data Store
> > > for Unbounded DataSets, which introduces an automatic mechanism to store
> > > adjunct data for stream tasks.
> > >
> > > https://cwiki.apache.org/confluence/display/SAMZA/Adjunct+Data+Store+for+Unbounded+DataSets
> > >
> > > Please review; comments are welcome!
> > >
> > > For those who are not actively following the master branch, you may have
> > > more questions than others. Feel free to ask them here.
> > >
> > > P.S. This is the 3rd try. I sent this last week, but apparently no one at
> > > LinkedIn received it, so I'm including samza-dev here just to be sure.
> > >
> > > --
> > > Thanks,
> > > -Wei
> > >
> >
>
