Hi, Wei,

+1 on the proposed design. It will remove a lot of the heavy lifting that
user code has to do today to bootstrap a data stream into a local store.
The configs look straightforward and easy to set up. Overall the design
looks great to me.

I have one question: in the proposal you mentioned "When Samza is running
in 24x7 mode, the stream for a bounded dataset may deliver multiple
versions." After the bootstrap of the initial version is done, what happens
when a new version arrives? Right now, by default, a bootstrap stream is set
up with priority INT_MAX, meaning it preempts other streams while the
bootstrap is in progress. Should we expect pauses in processing the other
streams whenever a new version of the adjunct data arrives? Please let me
know what the plan is for handling this scenario.
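
To make the concern concrete, here is a toy sketch of priority-based stream
selection. It is not Samza's actual chooser code: the function choose_next,
the stream names, and the message values are all made up for illustration,
and INT_MAX just stands in for Java's Integer.MAX_VALUE. It only shows that
while the highest-priority stream has pending messages, nothing else gets
picked.

```python
from collections import deque

INT_MAX = 2**31 - 1  # stand-in for Java's Integer.MAX_VALUE


def choose_next(streams):
    """Pop one message from the highest-priority stream that has pending data.

    `streams` maps a (hypothetical) stream name to a (priority, queue) pair.
    Returns (name, message), or None when every queue is empty.
    """
    ready = [(prio, name) for name, (prio, queue) in streams.items() if queue]
    if not ready:
        return None
    _, name = max(ready)  # highest priority wins; ties fall back to name order
    return name, streams[name][1].popleft()


# Hypothetical scenario: a new version of the adjunct data arrives while
# regular input is also pending.
streams = {
    "adjunct-bootstrap": (INT_MAX, deque(["v2-row-1", "v2-row-2"])),
    "page-views": (0, deque(["event-1", "event-2"])),
}

order = []
while (picked := choose_next(streams)) is not None:
    order.append(picked)
```

In this toy model, "page-views" is not processed until the new version on
"adjunct-bootstrap" is fully drained, which is the pause I am asking about.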

Thanks,
Xinyu

On Tue, May 16, 2017 at 2:15 PM, Navina Ramesh (Apache) <nav...@apache.org>
wrote:

> Thanks for trying 3 times, Wei. Sorry about the trouble. Not sure where the
> problem lies. Looking forward to reviewing your design.
>
> Navina
>
> On Tue, May 16, 2017 at 8:56 AM, Wei Song <ws...@linkedin.com> wrote:
>
> > Hey everyone,
> >
> > I created a proposal for SAMZA-1278
> > <https://issues.apache.org/jira/browse/SAMZA-1278>, Adjunct Data Store
> > for Unbounded DataSets, which introduces an automatic mechanism to store
> > adjunct data for stream tasks.
> >
> > https://cwiki.apache.org/confluence/display/SAMZA/Adjunct+Data+Store+for+Unbounded+DataSets
> >
> > Please review; comments are welcome!
> >
> > For those who are not actively following the master branch, you may have
> > more questions than others. Feel free to ask them here.
> >
> > P.S. This is my 3rd try. I sent this last week, but apparently no one at
> > LinkedIn received it, so I am including samza-dev here just to be sure.
> >
> > --
> > Thanks,
> > -Wei
> >
>
