Re: [DISCUSS] Support Scala 2.12

2017-03-30 Thread Prateek Maheshwari
Hi Maksim,

Thanks for the PR comments and updates. Looks good to me too.

- Prateek

On Wed, Mar 29, 2017 at 12:17 PM, Prateek Maheshwari <
pmaheshw...@linkedin.com> wrote:

> Hi Maksim,
>
> I'm in favor of adding Scala 2.12 support as well, thanks for the PR.
> I have a few questions about the way JavaConverter APIs and some of the
> conversions in the PR work. I'll try it out locally and update the PR with
> feedback/questions soon.
>
> Thanks,
> Prateek
>
>
> On Tue, Mar 28, 2017 at 3:01 PM, Maksim Logvinenko 
> wrote:
>
>> Hi guys,
>>
>> As far as I understand, nobody is against having Scala 2.12 support in
>> Samza master. Can we merge the PR then?
>>
>> Best regards,
>> Maxim Logvinenko
>>
>> On 17 March 2017 at 23:42:16, Navina Ramesh (nram...@linkedin.com.invalid)
>> wrote:
>>
>> Thanks for creating the DISCUSS email!
>>
>> This is good. It's a good idea to update to 2.12 since it looks like we are
>> fully backward compatible with older versions. +1 from me.
>>
>> Cheers!
>> Navina
>>
>> On Fri, Mar 17, 2017 at 1:34 PM, Jagadish Venkatraman <
>> jagadish1...@gmail.com> wrote:
>>
>> > Thanks for starting this discussion and the patch. +1 for supporting
>> > Scala 2.12. I assume the changes are fully backwards compatible with
>> > Scala 2.10 and 2.11 (as evidenced by your check-all)?
>> >
>> > Also, another observation is that the generated Samza binaries will have
>> > 2.12 as the suffix for the future release (I think this should be totally
>> > OK).
>> >
>> >
>> > On Fri, Mar 17, 2017 at 1:26 PM, Maksim Logvinenko <
>> > mlogvine...@gmail.com> wrote:
>> >
>> > > Hi guys,
>> > >
>> > > I’ve created a JIRA and already submitted a patch that adds support for
>> > > Scala 2.12. Here is the ticket:
>> > > https://issues.apache.org/jira/browse/SAMZA-1135
>> > > Nothing serious: I’ve removed the JavaConversions usage (because it’s
>> > > marked as deprecated now) and bumped the Kafka and scalatest versions,
>> > > since the previous versions don’t have Scala 2.12 support. I ran
>> > > ./bin/check-all.sh on my laptop and it was successful for all Scala
>> > > versions (2.10, 2.11 and 2.12) and for both YARN versions.
>> > >
>> > > Thanks,
>> > > Maxim Logvinenko
>> > >
>> >
>> >
>> >
>> > --
>> > Jagadish V,
>> > Graduate Student,
>> > Department of Computer Science,
>> > Stanford University
>> >
>>
>>
>>
>> --
>> Navina R.
>>
>
>


Re: [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-03-30 Thread Prateek Maheshwari
Yi, why add 'local' to the method name? Isn't the method called only by the
StreamProcessor to get its own ID? Seems like both 1 & 2 belong in the
method documentation.

- Prateek

On Thu, Mar 30, 2017 at 1:43 PM, Yi Pan  wrote:

> Talked w/ Navina offline and agreed upon:
> 1) JobCoordinator.getLocalProcessorId() to be clear that we are getting
> the
> local processorId
> 2) Document the use case that there might be multiple StreamProcessors in
> the same JVM and ProcessorIdGenerator should implement a counter in this
> case.
>
> So, +1 (binding)
>
> On Thu, Mar 30, 2017 at 1:23 PM, Renato Marroquín Mogrovejo <
> renatoj.marroq...@gmail.com> wrote:
>
> > Hi Navina,
> >
> > Thanks for the great proposal! Having the big proposals documented on
> SEPs
> > is really great to have a good understanding on the system!
> > I have only a clarification question, the proposal states that every
> > containerId is the same as the processorId. So this means that inside a
> > container there will be a single processor? is this related to SAMZA-1080
> > somehow?
> >
> >
> > Best,
> >
> > Renato M.
> >
> > 2017-03-30 20:45 GMT+02:00 Navina Ramesh :
> >
> > > Hi Yi,
> > > Good question. Three reasons:
> > >
> > > 1. In SAMZA-881, we came up with a set of responsibilities for the
> > > JobCoordinator. One of them was to generate/assign processorId. So, it
> > > makes sense to keep getProcessorId() within JobCoordinator interface.
> > > > 2. StreamProcessor was initially introduced as a user-facing API in
> > > > SAMZA-1080. ProcessorId was an argument in the StreamProcessor
> > > > constructor. It was pushing the burden of guaranteeing uniqueness
> > > > among the processors of a job to the user. This was not favorable.
> > > > 3. In general, I think we have consensus that the processorIdGenerator
> > > > is going to be specific to a runtime environment. Hence, it seems more
> > > > appropriate to move it to a lower abstraction layer that deals with the
> > > > underlying execution environment.
> > >
> > > Let me know if you have a different perspective on this.
> > >
> > > Cheers!
> > > Navina
> > >
> > > On Thu, Mar 30, 2017 at 9:42 AM, Yi Pan  wrote:
> > >
> > > > @Navina,
> > > >
> > > > Sorry to chime in late. One question:
> > > > 1. Why is it in JobCoordinator, and why not in StreamProcessor class?
> > > > > Because JobCoordinator provides coordination service across many
> > > > > processors, an interface getProcessorId() in JobCoordinator is
> > > > > confusing regarding which processorId we are getting.
> > > > >
> > > > > Otherwise, the proposal looks good.
> > > > >
> > > > > -Yi
> > > > >
> > > > > On Wed, Mar 29, 2017 at 7:57 PM, Navina Ramesh wrote:
> > > >
> > > > > Good to hear from you, Yan. Thanks! :)
> > > > >
> > > > > On Wed, Mar 29, 2017 at 7:48 PM, Yan Fang 
> > > wrote:
> > > > >
> > > > > > +1 . Thanks for the proposal, Navina. :)
> > > > > >
> > > > > > Fang, Yan
> > > > > > yanfang...@gmail.com
> > > > > >
> > > > > > On Thu, Mar 30, 2017 at 4:24 AM, Prateek Maheshwari <
> > > > > > pmaheshw...@linkedin.com.invalid> wrote:
> > > > > >
> > > > > > > +1 (non binding) from me.
> > > > > > >
> > > > > > > - Prateek
> > > > > > >
> > > > > > > On Tue, Mar 28, 2017 at 2:17 PM, Boris S 
> > wrote:
> > > > > > >
> > > > > > > > +1 Looks good to me.
> > > > > > > >
> > > > > > > > On Tue, Mar 28, 2017 at 2:00 PM, xinyu liu <
> > > xinyuliu...@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 on my side. Very happy to see this proposal. This is a
> > > blocker
> > > > > for
> > > > > > > > > integrating fluent API with StreamProcessor, and hopefully
> we
> > > can
> > > > > get
> > > > > > > it
> > > > > > > > > resolved soon :).
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Xinyu
> > > > > > > > >
> > > > > > > > > On Tue, Mar 28, 2017 at 11:28 AM, Navina Ramesh (Apache) <
> > > > > > > > > nav...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi everyone,
> > > > > > > > > >
> > > > > > > > > > > This is a voting thread for SEP-1: Semantics of ProcessorId in
> > > > > > > > > > > Samza.
> > > > > > > > > > > For reference, here is the wiki link:
> > > > > > > > > > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-1%3A+Semantics+of+ProcessorId+in+Samza
> > > > > > > > > > >
> > > > > > > > > > > Link to discussion mail thread:
> > > > > > > > > > > http://mail-archives.apache.org/mod_mbox/samza-dev/201703.mbox/%3CCANazzuuHiO%3DvZQyFbTiYU-0Sfh3riK%3Dz4j_AdCicQ8rBO%3DXuYQ%40mail.gmail.com%3E
> > > > > > > > > >
> > > > > > > > > > Please vote on this SEP asap. :)
> > > > > > > > > >
> > > > > > > > > > Thanks!
> > > > > > > > > > Navina
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > 

Re: Steps to Upgrading Samza (0.9 to 0.12)

2017-03-30 Thread Jagadish Venkatraman
FYI, I created https://issues.apache.org/jira/browse/SAMZA-1177 to track
this.

We will ensure that we have explicit upgrade documentation for upcoming
releases.

On Thu, Mar 30, 2017 at 2:21 PM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Hi Thomas and Maxim,
>
> Thank you for bringing up this concern and pointing out the gap in the
> documentation. We will ensure that this is documented in a separate upgrade
> page going forward.
>
> Best,
> Jagadish
>
> On Thu, Mar 30, 2017 at 12:02 PM, Navina Ramesh <
> nram...@linkedin.com.invalid> wrote:
>
>> Hi everyone,
>> Apologies for chiming in late again on this issue.
>>
>> > I'm not sure I agree with the policy (removing migration code and
>> > wanting people to upgrade seem at odds to me), but minimally I think we
>> > should not assume people are upgrading to each new Samza version.
>>
>> I agree that we should not assume that people will upgrade by stepping
>> through each version of Samza. However, I don't agree that migration code
>> should never be removed. In terms of project management and maintenance,
>> I think it is common practice (at least in companies, if not in
>> open-source, and I could be wrong too :D ) to keep migration code only
>> for the version it applies to. Maintaining version upgrade/migration code
>> across all future versions adds significant overhead.
>>
>> In this case, this was the first time we tried an "automatic upgrade" from
>> one version to the next (0.9 -> 0.10). We could have done a better job of
>> documenting the upgrade steps with each version. I wish we had heard more
>> outspoken voices in the community sooner rather than later :)
>>
>> Every project takes time to iron out issues related to releases and
>> version upgrades. I am glad that we have so much feedback now. As Yi
>> suggested, the SEP process is a starting step towards documenting our
>> changes across versions. Additionally, we will work on adding a dedicated
>> page for upgrades, and these will be available for all of the *upcoming*
>> versions.
>>
>> Please let us know if you have any other concerns or ideas on how we can
>> improve on our process.
>>
>> @XiaoChuan: Unfortunately, we don't have proper documentation on upgrading
>> Samza across various versions. Like I mentioned before, we will put in
>> extra effort going forward. There aren't any migration/upgrade steps
>> needed for versions after 0.10.*. You should be able to simply upgrade
>> without any issues. The upgrade from 0.9 to 0.10 is an exceptional case.
>> Happy to help you out in case you encounter more issues.
>>
>> Cheers!
>> Navina
>>
>> On Thu, Mar 30, 2017 at 11:04 AM, XiaoChuan Yu 
>> wrote:
>>
>> > Is there some sort of document on how to upgrade Samza through various
>> > versions like the page here for Kafka:
>> > https://kafka.apache.org/documentation/#upgrade ?
>> > Having something like this would be ideal.
>> > On Thu, Mar 30, 2017 at 1:51 PM Thomas Becker 
>> wrote:
>> >
>> > > Thanks for the reply Yi, and I apologize if I came off a bit snarky.
>> > > I'm not sure I agree with the policy (removing migration code and
>> > > wanting people to upgrade seem at odds to me), but minimally I think
>> we
>> > > should not assume people are upgrading to each new Samza version. We
>> > > have done so when features or fixes warrant, and even then on a
>> per-job
>> > > basis, and I would expect this is a common practice.
>> > >
>> > > -Tommy
>> > >
>> > > On Thu, 2017-03-30 at 09:50 -0700, Yi Pan wrote:
>> > > > Hi, Thomas,
>> > > >
>> > > > Sorry to hear that you were hit by the removal of the migration code
>> > > > in Samza 0.11. We removed it following a deprecate-then-remove policy
>> > > > over two versions. We were not aware that people were still using 0.9
>> > > > after we released 0.11, and we were not expecting a direct upgrade
>> > > > from 0.9 to 0.12. The documentation could have captured that better.
>> > > > We are changing the design proposal process so that it is more
>> > > > transparent and open to the whole community, through the newly
>> > > > proposed SEP process. These kinds of breaking changes will go through
>> > > > the SEP discuss-vote process in the future, which should hopefully
>> > > > surface these kinds of concerns earlier.
>> > > >
>> > > > Best!
>> > > >
>> > > > -Yi
>> > > >
>> > > > On Thu, Mar 30, 2017 at 7:45 AM, Thomas Becker 
>> > > > wrote:
>> > > >
>> > > > >
>> > > > > Yes, we were burned by this. The changelog mapping will be
>> > > > > regenerated
>> > > > > instead of migrated and the result will completely hose the job
>> > > > > (because the mapping was not generated deterministically in
>> > > > > previous
>> > > > > versions of Samza). I don't understand why the migration code was
>> > > > > removed but it was, and to the best of my knowledge the necessity
>> > > > > to
>> > > > > not skip version 0.10.0 when upgrading was 

Re: Steps to Upgrading Samza (0.9 to 0.12)

2017-03-30 Thread Jagadish Venkatraman
Hi Thomas and Maxim,

Thank you for bringing up this concern and pointing out the gap in the
documentation. We will ensure that this is documented in a separate upgrade
page going forward.

Best,
Jagadish

On Thu, Mar 30, 2017 at 12:02 PM, Navina Ramesh <
nram...@linkedin.com.invalid> wrote:

> Hi everyone,
> Apologies for chiming in late again on this issue.
>
> > I'm not sure I agree with the policy (removing migration code and wanting
> > people to upgrade seem at odds to me), but minimally I think we should not
> > assume people are upgrading to each new Samza version.
>
> I agree that we should not assume that people will upgrade by stepping
> through each version of Samza. However, I don't agree that migration code
> should never be removed. In terms of project management and maintenance,
> I think it is common practice (at least in companies, if not in
> open-source, and I could be wrong too :D ) to keep migration code only
> for the version it applies to. Maintaining version upgrade/migration code
> across all future versions adds significant overhead.
>
> In this case, this was the first time we tried an "automatic upgrade" from
> one version to the next (0.9 -> 0.10). We could have done a better job of
> documenting the upgrade steps with each version. I wish we had heard more
> outspoken voices in the community sooner rather than later :)
>
> Every project takes time to iron out issues related to releases and version
> upgrades. I am glad that we have so much feedback now. As Yi suggested, the
> SEP process is a starting step towards documenting our changes across
> versions. Additionally, we will work on adding a dedicated page for
> upgrades, and these will be available for all of the *upcoming* versions.
>
> Please let us know if you have any other concerns or ideas on how we can
> improve on our process.
>
> @XiaoChuan: Unfortunately, we don't have proper documentation on upgrading
> Samza across various versions. Like I mentioned before, we will put in
> extra effort going forward. There aren't any migration/upgrade steps
> needed for versions after 0.10.*. You should be able to simply upgrade
> without any issues. The upgrade from 0.9 to 0.10 is an exceptional case.
> Happy to help you out in case you encounter more issues.
>
> Cheers!
> Navina
>
> On Thu, Mar 30, 2017 at 11:04 AM, XiaoChuan Yu 
> wrote:
>
> > Is there some sort of document on how to upgrade Samza through various
> > versions like the page here for Kafka:
> > https://kafka.apache.org/documentation/#upgrade ?
> > Having something like this would be ideal.
> > On Thu, Mar 30, 2017 at 1:51 PM Thomas Becker  wrote:
> >
> > > Thanks for the reply Yi, and I apologize if I came off a bit snarky.
> > > I'm not sure I agree with the policy (removing migration code and
> > > wanting people to upgrade seem at odds to me), but minimally I think we
> > > should not assume people are upgrading to each new Samza version. We
> > > have done so when features or fixes warrant, and even then on a per-job
> > > basis, and I would expect this is a common practice.
> > >
> > > -Tommy
> > >
> > > On Thu, 2017-03-30 at 09:50 -0700, Yi Pan wrote:
> > > > Hi, Thomas,
> > > >
> > > > > Sorry to hear that you were hit by the removal of the migration code
> > > > > in Samza 0.11. We removed it following a deprecate-then-remove policy
> > > > > over two versions. We were not aware that people were still using 0.9
> > > > > after we released 0.11, and we were not expecting a direct upgrade
> > > > > from 0.9 to 0.12. The documentation could have captured that better.
> > > > > We are changing the design proposal process so that it is more
> > > > > transparent and open to the whole community, through the newly
> > > > > proposed SEP process. These kinds of breaking changes will go through
> > > > > the SEP discuss-vote process in the future, which should hopefully
> > > > > surface these kinds of concerns earlier.
> > > >
> > > > Best!
> > > >
> > > > -Yi
> > > >
> > > > On Thu, Mar 30, 2017 at 7:45 AM, Thomas Becker 
> > > > wrote:
> > > >
> > > > >
> > > > > Yes, we were burned by this. The changelog mapping will be
> > > > > regenerated
> > > > > instead of migrated and the result will completely hose the job
> > > > > (because the mapping was not generated deterministically in
> > > > > previous
> > > > > versions of Samza). I don't understand why the migration code was
> > > > > removed but it was, and to the best of my knowledge the necessity
> > > > > to
> > > > > not skip version 0.10.0 when upgrading was not documented, let
> > > > > alone
> > > > > enforced.
> > > > >
> > > > > On Mon, 2017-03-27 at 10:07 -0700, Jagadish Venkatraman wrote:
> > > > > >
> > > > > > Good observation Jake!
> > > > > >
> > > > > > The code for migration was removed in Samza 11. The migration
> > > > > > would
> > > > > > read
> > > > > > change-log offsets from the checkpoint topic and write them to
> > > > 

Re: [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-03-30 Thread Renato Marroquín Mogrovejo
Thanks for the answers Navina!

+1 (non-binding)

2017-03-30 22:32 GMT+02:00 Navina Ramesh :

> Hi Renato,
>
> > Having the big proposals documented on SEPs is really great to have a
> > good understanding on the system!
> I agree. Our previous design process was not being strictly enforced. We
> hope to enforce it going forward as there are major changes coming into the
> next release.
>
> > So this means that inside a container there will be a single processor?
> StreamProcessor is nothing more than a Samza container, along with an
> instance of JobCoordinator in it. Think of it as a thin wrapper around a
> SamzaContainer and a JobCoordinator instance. You can find more details on
> this idea here - https://issues.apache.org/jira/browse/SAMZA-1063
> Going forward, we want a Samza job to consist of one or more
> StreamProcessors, instead of N SamzaContainers and 1 AppMaster.
>
> >  is this related to SAMZA-1080 somehow?
> Yep. SAMZA-1080 introduces StreamProcessor with an almost pass-through
> JobCoordinator. In fact, at LinkedIn, one of the teams is already using
> this API with the StandaloneJobCoordinator and delegating partition
> distribution to the Kafka high-level consumer (since SystemConsumer is
> pluggable in Samza, we have some internal wrappers around the high-level
> consumer). It has been working really well for stateless applications, I
> believe.
>
> Cheers!
> Navina
>
> On Thu, Mar 30, 2017 at 1:23 PM, Renato Marroquín Mogrovejo <
> renatoj.marroq...@gmail.com> wrote:
>
> > Hi Navina,
> >
> > Thanks for the great proposal! Having the big proposals documented on
> SEPs
> > is really great to have a good understanding on the system!
> > I have only a clarification question, the proposal states that every
> > containerId is the same as the processorId. So this means that inside a
> > container there will be a single processor? is this related to SAMZA-1080
> > somehow?
> >
> >
> > Best,
> >
> > Renato M.
> >
> > 2017-03-30 20:45 GMT+02:00 Navina Ramesh :
> >
> > > Hi Yi,
> > > Good question. Three reasons:
> > >
> > > 1. In SAMZA-881, we came up with a set of responsibilities for the
> > > JobCoordinator. One of them was to generate/assign processorId. So, it
> > > makes sense to keep getProcessorId() within JobCoordinator interface.
> > > > 2. StreamProcessor was initially introduced as a user-facing API in
> > > > SAMZA-1080. ProcessorId was an argument in the StreamProcessor
> > > > constructor. It was pushing the burden of guaranteeing uniqueness
> > > > among the processors of a job to the user. This was not favorable.
> > > > 3. In general, I think we have consensus that the processorIdGenerator
> > > > is going to be specific to a runtime environment. Hence, it seems more
> > > > appropriate to move it to a lower abstraction layer that deals with the
> > > > underlying execution environment.
> > >
> > > Let me know if you have a different perspective on this.
> > >
> > > Cheers!
> > > Navina
> > >
> > > On Thu, Mar 30, 2017 at 9:42 AM, Yi Pan  wrote:
> > >
> > > > @Navina,
> > > >
> > > > Sorry to chime in late. One question:
> > > > 1. Why is it in JobCoordinator, and why not in StreamProcessor class?
> > > > > Because JobCoordinator provides coordination service across many
> > > > > processors, an interface getProcessorId() in JobCoordinator is
> > > > > confusing regarding which processorId we are getting.
> > > > >
> > > > > Otherwise, the proposal looks good.
> > > > >
> > > > > -Yi
> > > > >
> > > > > On Wed, Mar 29, 2017 at 7:57 PM, Navina Ramesh wrote:
> > > >
> > > > > Good to hear from you, Yan. Thanks! :)
> > > > >
> > > > > On Wed, Mar 29, 2017 at 7:48 PM, Yan Fang 
> > > wrote:
> > > > >
> > > > > > +1 . Thanks for the proposal, Navina. :)
> > > > > >
> > > > > > Fang, Yan
> > > > > > yanfang...@gmail.com
> > > > > >
> > > > > > On Thu, Mar 30, 2017 at 4:24 AM, Prateek Maheshwari <
> > > > > > pmaheshw...@linkedin.com.invalid> wrote:
> > > > > >
> > > > > > > +1 (non binding) from me.
> > > > > > >
> > > > > > > - Prateek
> > > > > > >
> > > > > > > On Tue, Mar 28, 2017 at 2:17 PM, Boris S 
> > wrote:
> > > > > > >
> > > > > > > > +1 Looks good to me.
> > > > > > > >
> > > > > > > > On Tue, Mar 28, 2017 at 2:00 PM, xinyu liu <
> > > xinyuliu...@gmail.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 on my side. Very happy to see this proposal. This is a
> > > blocker
> > > > > for
> > > > > > > > > integrating fluent API with StreamProcessor, and hopefully
> we
> > > can
> > > > > get
> > > > > > > it
> > > > > > > > > resolved soon :).
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Xinyu
> > > > > > > > >
> > > > > > > > > On Tue, Mar 28, 2017 at 11:28 AM, Navina Ramesh (Apache) <
> > > > > > > > > nav...@apache.org>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi everyone,
> > > > > > > > > >
> > > 

Re: [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-03-30 Thread Yi Pan
Talked w/ Navina offline and agreed upon:
1) JobCoordinator.getLocalProcessorId() to be clear that we are getting the
local processorId
2) Document the use case that there might be multiple StreamProcessors in
the same JVM and ProcessorIdGenerator should implement a counter in this
case.

So, +1 (binding)
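
To illustrate points 1 and 2, here is a rough Java sketch. It is not the actual Samza API: the ProcessorIdGenerator and JobCoordinator interfaces below are simplified, hypothetical stand-ins named after the ones discussed in this thread, and the "prefix-counter" ID format is only an assumption. It shows one way a generator could use a counter so that several StreamProcessors in the same JVM get distinct IDs, with each coordinator exposing only its own local ID.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for the ProcessorIdGenerator discussed in SEP-1.
interface ProcessorIdGenerator {
    String generateProcessorId();
}

// Counter-based generator: every call within one JVM yields a different ID.
final class CountingProcessorIdGenerator implements ProcessorIdGenerator {
    private final String jvmPrefix;                         // e.g. a host name (assumption)
    private final AtomicInteger counter = new AtomicInteger(0);

    CountingProcessorIdGenerator(String jvmPrefix) {
        this.jvmPrefix = jvmPrefix;
    }

    @Override
    public String generateProcessorId() {
        // getAndIncrement is atomic, so concurrent callers never share a value.
        return jvmPrefix + "-" + counter.getAndIncrement();
    }
}

// Hypothetical slice of JobCoordinator showing only the renamed accessor.
interface JobCoordinator {
    String getLocalProcessorId();
}

// Each coordinator asks the generator once and then reports its own (local) ID.
final class SketchJobCoordinator implements JobCoordinator {
    private final String localProcessorId;

    SketchJobCoordinator(ProcessorIdGenerator generator) {
        this.localProcessorId = generator.generateProcessorId();
    }

    @Override
    public String getLocalProcessorId() {
        return localProcessorId;
    }
}
```

Two coordinators sharing one generator instance would get different IDs here; how the generator instance is shared across StreamProcessors in a real deployment is left open.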

On Thu, Mar 30, 2017 at 1:23 PM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hi Navina,
>
> Thanks for the great proposal! Having the big proposals documented on SEPs
> is really great to have a good understanding on the system!
> I have only a clarification question, the proposal states that every
> containerId is the same as the processorId. So this means that inside a
> container there will be a single processor? is this related to SAMZA-1080
> somehow?
>
>
> Best,
>
> Renato M.
>
> 2017-03-30 20:45 GMT+02:00 Navina Ramesh :
>
> > Hi Yi,
> > Good question. Three reasons:
> >
> > 1. In SAMZA-881, we came up with a set of responsibilities for the
> > JobCoordinator. One of them was to generate/assign processorId. So, it
> > makes sense to keep getProcessorId() within JobCoordinator interface.
> > 2. StreamProcessor was initially introduced as a user-facing API in
> > SAMZA-1080. ProcessorId was an argument in the StreamProcessor
> > constructor. It was pushing the burden of guaranteeing uniqueness among
> > the processors of a job to the user. This was not favorable.
> > 3. In general, I think we have consensus that the processorIdGenerator is
> > going to be specific to a runtime environment. Hence, it seems more
> > appropriate to move it to a lower abstraction layer that deals with the
> > underlying execution environment.
> >
> > Let me know if you have a different perspective on this.
> >
> > Cheers!
> > Navina
> >
> > On Thu, Mar 30, 2017 at 9:42 AM, Yi Pan  wrote:
> >
> > > @Navina,
> > >
> > > Sorry to chime in late. One question:
> > > 1. Why is it in JobCoordinator, and why not in StreamProcessor class?
> > > Because JobCoordinator provides coordination service across many
> > > processors, an interface getProcessorId() in JobCoordinator is
> > > confusing regarding which processorId we are getting.
> > >
> > > Otherwise, the proposal looks good.
> > >
> > > -Yi
> > >
> > > On Wed, Mar 29, 2017 at 7:57 PM, Navina Ramesh wrote:
> > >
> > > > Good to hear from you, Yan. Thanks! :)
> > > >
> > > > On Wed, Mar 29, 2017 at 7:48 PM, Yan Fang 
> > wrote:
> > > >
> > > > > +1 . Thanks for the proposal, Navina. :)
> > > > >
> > > > > Fang, Yan
> > > > > yanfang...@gmail.com
> > > > >
> > > > > On Thu, Mar 30, 2017 at 4:24 AM, Prateek Maheshwari <
> > > > > pmaheshw...@linkedin.com.invalid> wrote:
> > > > >
> > > > > > +1 (non binding) from me.
> > > > > >
> > > > > > - Prateek
> > > > > >
> > > > > > On Tue, Mar 28, 2017 at 2:17 PM, Boris S 
> wrote:
> > > > > >
> > > > > > > +1 Looks good to me.
> > > > > > >
> > > > > > > On Tue, Mar 28, 2017 at 2:00 PM, xinyu liu <
> > xinyuliu...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > +1 on my side. Very happy to see this proposal. This is a
> > blocker
> > > > for
> > > > > > > > integrating fluent API with StreamProcessor, and hopefully we
> > can
> > > > get
> > > > > > it
> > > > > > > > resolved soon :).
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Xinyu
> > > > > > > >
> > > > > > > > On Tue, Mar 28, 2017 at 11:28 AM, Navina Ramesh (Apache) <
> > > > > > > > nav...@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > >
> > > > > > > > > This is a voting thread for SEP-1: Semantics of ProcessorId in
> > > > > > > > > Samza.
> > > > > > > > > For reference, here is the wiki link:
> > > > > > > > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-1%3A+Semantics+of+ProcessorId+in+Samza
> > > > > > > > >
> > > > > > > > > Link to discussion mail thread:
> > > > > > > > > http://mail-archives.apache.org/mod_mbox/samza-dev/201703.mbox/%3CCANazzuuHiO%3DvZQyFbTiYU-0Sfh3riK%3Dz4j_AdCicQ8rBO%3DXuYQ%40mail.gmail.com%3E
> > > > > > > > >
> > > > > > > > > Please vote on this SEP asap. :)
> > > > > > > > >
> > > > > > > > > Thanks!
> > > > > > > > > Navina
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Navina R.
> > > >
> > >
> >
> >
> >
> > --
> > Navina R.
> >
>


Re: [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-03-30 Thread Navina Ramesh
Hi Renato,

> Having the big proposals documented on SEPs is really great to have a
> good understanding on the system!
I agree. Our previous design process was not being strictly enforced. We
hope to enforce it going forward as there are major changes coming into the
next release.

> So this means that inside a container there will be a single processor?
StreamProcessor is nothing more than a Samza container, along with an
instance of JobCoordinator in it. Think of it as a thin wrapper around a
SamzaContainer and a JobCoordinator instance. You can find more details on
this idea here - https://issues.apache.org/jira/browse/SAMZA-1063
Going forward, we want a Samza job to consist of one or more
StreamProcessors, instead of N SamzaContainers and 1 AppMaster.
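
As a rough illustration of the "thin wrapper" idea only, here is a Java sketch. None of these types are the real Samza classes: Lifecycle, RecordingComponent and StreamProcessorSketch are hypothetical names, and the start/stop ordering is an assumption rather than something specified in SAMZA-1063.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal lifecycle shared by the pieces below; not a Samza interface.
interface Lifecycle {
    void start();
    void stop();
}

// Stand-in for both the container and the coordinator: it only records calls.
final class RecordingComponent implements Lifecycle {
    private final String name;
    private final List<String> events;

    RecordingComponent(String name, List<String> events) {
        this.name = name;
        this.events = events;
    }

    @Override
    public void start() {
        events.add(name + ":start");
    }

    @Override
    public void stop() {
        events.add(name + ":stop");
    }
}

// The thin wrapper: owns one coordinator and one container and forwards calls.
final class StreamProcessorSketch implements Lifecycle {
    private final Lifecycle jobCoordinator;
    private final Lifecycle container;

    StreamProcessorSketch(Lifecycle jobCoordinator, Lifecycle container) {
        this.jobCoordinator = jobCoordinator;
        this.container = container;
    }

    @Override
    public void start() {
        // Assumed order: coordination comes up before processing begins.
        jobCoordinator.start();
        container.start();
    }

    @Override
    public void stop() {
        // Assumed order: tear down in reverse.
        container.stop();
        jobCoordinator.stop();
    }
}
```

In this picture a job would simply be a collection of one or more such wrappers, each carrying its own coordinator instance.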

>  is this related to SAMZA-1080 somehow?
Yep. SAMZA-1080 introduces StreamProcessor with an almost pass-through
JobCoordinator. In fact, at LinkedIn, one of the teams is already using
this API with the StandaloneJobCoordinator and delegating partition
distribution to the Kafka high-level consumer (since SystemConsumer is
pluggable in Samza, we have some internal wrappers around the high-level
consumer). It has been working really well for stateless applications, I
believe.

Cheers!
Navina

On Thu, Mar 30, 2017 at 1:23 PM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hi Navina,
>
> Thanks for the great proposal! Having the big proposals documented on SEPs
> is really great to have a good understanding on the system!
> I have only a clarification question, the proposal states that every
> containerId is the same as the processorId. So this means that inside a
> container there will be a single processor? is this related to SAMZA-1080
> somehow?
>
>
> Best,
>
> Renato M.
>
> 2017-03-30 20:45 GMT+02:00 Navina Ramesh :
>
> > Hi Yi,
> > Good question. Three reasons:
> >
> > 1. In SAMZA-881, we came up with a set of responsibilities for the
> > JobCoordinator. One of them was to generate/assign processorId. So, it
> > makes sense to keep getProcessorId() within JobCoordinator interface.
> > 2. StreamProcessor was initially introduced as a user-facing API in
> > SAMZA-1080. ProcessorId was an argument in the StreamProcessor
> > constructor. It was pushing the burden of guaranteeing uniqueness among
> > the processors of a job to the user. This was not favorable.
> > 3. In general, I think we have consensus that the processorIdGenerator is
> > going to be specific to a runtime environment. Hence, it seems more
> > appropriate to move it to a lower abstraction layer that deals with the
> > underlying execution environment.
> >
> > Let me know if you have a different perspective on this.
> >
> > Cheers!
> > Navina
> >
> > On Thu, Mar 30, 2017 at 9:42 AM, Yi Pan  wrote:
> >
> > > @Navina,
> > >
> > > Sorry to chime in late. One question:
> > > 1. Why is it in JobCoordinator, and why not in StreamProcessor class?
> > > Because JobCoordinator provides coordination service across many
> > > processors, an interface getProcessorId() in JobCoordinator is
> > > confusing regarding which processorId we are getting.
> > >
> > > Otherwise, the proposal looks good.
> > >
> > > -Yi
> > >
> > > On Wed, Mar 29, 2017 at 7:57 PM, Navina Ramesh wrote:
> > >
> > > > Good to hear from you, Yan. Thanks! :)
> > > >
> > > > On Wed, Mar 29, 2017 at 7:48 PM, Yan Fang 
> > wrote:
> > > >
> > > > > +1 . Thanks for the proposal, Navina. :)
> > > > >
> > > > > Fang, Yan
> > > > > yanfang...@gmail.com
> > > > >
> > > > > On Thu, Mar 30, 2017 at 4:24 AM, Prateek Maheshwari <
> > > > > pmaheshw...@linkedin.com.invalid> wrote:
> > > > >
> > > > > > +1 (non binding) from me.
> > > > > >
> > > > > > - Prateek
> > > > > >
> > > > > > On Tue, Mar 28, 2017 at 2:17 PM, Boris S 
> wrote:
> > > > > >
> > > > > > > +1 Looks good to me.
> > > > > > >
> > > > > > > On Tue, Mar 28, 2017 at 2:00 PM, xinyu liu <
> > xinyuliu...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > +1 on my side. Very happy to see this proposal. This is a
> > blocker
> > > > for
> > > > > > > > integrating fluent API with StreamProcessor, and hopefully we
> > can
> > > > get
> > > > > > it
> > > > > > > > resolved soon :).
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Xinyu
> > > > > > > >
> > > > > > > > On Tue, Mar 28, 2017 at 11:28 AM, Navina Ramesh (Apache) <
> > > > > > > > nav...@apache.org>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > >
> > > > > > > > > This is a voting thread for SEP-1: Semantics of ProcessorId in
> > > > > > > > > Samza.
> > > > > > > > > For reference, here is the wiki link:
> > > > > > > > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-1%3A+Semantics+of+ProcessorId+in+Samza
> > > > > > > > >
> > > > > > > > > Link to discussion mail thread:
> > > > > > > > > 

Re: [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-03-30 Thread Renato Marroquín Mogrovejo
Hi Navina,

Thanks for the great proposal! Having the big proposals documented as SEPs
is really great for getting a good understanding of the system!
I have only one clarification question: the proposal states that every
containerId is the same as the processorId. So does this mean that inside a
container there will be a single processor? Is this related to SAMZA-1080
somehow?


Best,

Renato M.

2017-03-30 20:45 GMT+02:00 Navina Ramesh :

> Hi Yi,
> Good question. Three reasons:
>
> 1. In SAMZA-881, we came up with a set of responsibilities for the
> JobCoordinator. One of them was to generate/assign processorId. So, it
> makes sense to keep getProcessorId() within JobCoordinator interface.
> 2. StreamProcessor was initially introduced as a user-facing API in
> SAMZA-1080. ProcessorId was an argument in the StreamProcessor constructor.
> It pushed the burden of guaranteeing uniqueness among the processors of a
> job onto the user. This was not favorable.
> 3. In general, I think we have consensus that the processorIdGenerator is
> going to be specific to a runtime environment. Hence, it seems more
> appropriate to move it to a lower abstraction layer that deals with the
> underlying execution environment.
>
> Let me know if you have a different perspective on this.
>
> Cheers!
> Navina
>
> On Thu, Mar 30, 2017 at 9:42 AM, Yi Pan  wrote:
>
> > @Navina,
> >
> > Sorry to chime in late. One question:
> > 1. Why is it in JobCoordinator, and why not in StreamProcessor class?
> > Because JobCoordinator provides coordination service across many
> > processors, an interface getProcessorId() in JobCoordinator is confusing
> > regarding to which processorId we are getting.
> >
> > Otherwise, the proposal looks good.
> >
> > -Yi
> >
> > On Wed, Mar 29, 2017 at 7:57 PM, Navina Ramesh wrote:
> >
> > > Good to hear from you, Yan. Thanks! :)
> > >
> > > On Wed, Mar 29, 2017 at 7:48 PM, Yan Fang 
> wrote:
> > >
> > > > +1 . Thanks for the proposal, Navina. :)
> > > >
> > > > Fang, Yan
> > > > yanfang...@gmail.com
> > > >
> > > > On Thu, Mar 30, 2017 at 4:24 AM, Prateek Maheshwari <
> > > > pmaheshw...@linkedin.com.invalid> wrote:
> > > >
> > > > > +1 (non binding) from me.
> > > > >
> > > > > - Prateek
> > > > >
> > > > > On Tue, Mar 28, 2017 at 2:17 PM, Boris S  wrote:
> > > > >
> > > > > > +1 Looks good to me.
> > > > > >
> > > > > > On Tue, Mar 28, 2017 at 2:00 PM, xinyu liu <
> xinyuliu...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > > +1 on my side. Very happy to see this proposal. This is a
> blocker
> > > for
> > > > > > > integrating fluent API with StreamProcessor, and hopefully we
> can
> > > get
> > > > > it
> > > > > > > resolved soon :).
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Xinyu
> > > > > > >
> > > > > > > On Tue, Mar 28, 2017 at 11:28 AM, Navina Ramesh (Apache) <
> > > > > > > nav...@apache.org>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi everyone,
> > > > > > > >
> > > > > > > > This is a voting thread for SEP-1: Semantics of ProcessorId
> in
> > > > Samza.
> > > > > > > > For reference, here is the wiki link:
> > > > > > > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > > > > > > > 1%3A+Semantics+of+ProcessorId+in+Samza
> > > > > > > >
> > > > > > > > Link to discussion mail thread:
> > > > > > > > http://mail-archives.apache.org/mod_mbox/samza-dev/201703.
> > > > > > > > mbox/%3CCANazzuuHiO%3DvZQyFbTiYU-0Sfh3riK%3Dz4j_
> > > > > > > AdCicQ8rBO%3DXuYQ%40mail.
> > > > > > > > gmail.com%3E
> > > > > > > >
> > > > > > > > Please vote on this SEP asap. :)
> > > > > > > >
> > > > > > > > Thanks!
> > > > > > > > Navina
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Navina R.
> > >
> >
>
>
>
> --
> Navina R.
>


[GitHub] samza pull request #105: SAMZA-1176: Make TestJoinOperator unit tests safe f...

2017-03-30 Thread prateekm
GitHub user prateekm opened a pull request:

https://github.com/apache/samza/pull/105

SAMZA-1176: Make TestJoinOperator unit tests safe for concurrent execution

There are occasional failures like the following:
```
joinRetainsMatchedMessagesReverse FAILED
java.lang.AssertionError: expected:<0> but was:<110>
    at org.junit.Assert.fail(Assert.java:91)
    at org.junit.Assert.failNotEquals(Assert.java:645)
    at org.junit.Assert.assertEquals(Assert.java:126)
    at org.junit.Assert.assertEquals(Assert.java:470)
    at org.junit.Assert.assertEquals(Assert.java:454)
    at org.apache.samza.operators.TestJoinOperator.joinRetainsMatchedMessagesReverse(TestJoinOperator.java:174)
```

These are presumably due to concurrent JUnit test execution. This change is 
to isolate these test cases so that they can be run concurrently.
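
As a rough illustration of the kind of isolation described above, the sketch below gives each test its own collector instead of sharing one, so a test that expects zero messages cannot observe the 110 messages another test produces. It is not taken from the pull request: `IsolatedTestSketch`, `Collector`, and `runTest` are hypothetical names, and shared state is only an assumed cause, in line with the "presumably" above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch; none of these names come from the Samza code base.
final class IsolatedTestSketch {

    // Stand-in for whatever a test uses to capture operator output.
    static final class Collector {
        private final List<Integer> received = new ArrayList<>();

        void send(int message) {
            received.add(message);
        }

        int count() {
            return received.size();
        }
    }

    // Each "test" creates its own Collector, so nothing is shared across threads.
    static int runTest(int messagesToSend) {
        Collector collector = new Collector();
        for (int i = 0; i < messagesToSend; i++) {
            collector.send(i);
        }
        return collector.count();
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            // Run a test that sends nothing next to one that sends 110 messages.
            Future<Integer> quiet = pool.submit(() -> runTest(0));
            Future<Integer> busy = pool.submit(() -> runTest(110));
            System.out.println("quiet test saw " + quiet.get() + " messages");
            System.out.println("busy test saw " + busy.get() + " messages");
        } finally {
            pool.shutdown();
        }
    }
}
```

If the collector were a static field instead, the two tests could interleave their writes, and the quiet test's assertion on zero messages would fail intermittently.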


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/prateekm/samza join-tests

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/105.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #105


commit fbb48e49d46d5912d8ece1d36e1d74dd7f11ef0c
Author: Prateek Maheshwari 
Date:   2017-03-30T19:09:22Z

SAMZA-1176: Make TestJoinOperator unit tests safe for concurrent execution




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: Steps to Upgrading Samza (0.9 to 0.12)

2017-03-30 Thread Navina Ramesh
Hi everyone,
Apologies for chiming in late again on this issue.

> I'm not sure I agree with the policy (removing migration code and wanting
people to upgrade seem at odds to me), but minimally I think we should not
assume people are upgrading to each new Samza version.

I agree that we should not assume that people will upgrade by stepping
through each version of Samza. However, I don't agree that migration code
should never be removed. In terms of project management and
maintenance, I think it is common practice (at least in companies, if not
in open source, and I could be wrong too :D ) to keep migration code only
for the version it applies to. Maintaining version upgrade/migration code
across all future versions adds significant overhead.

In this case, this was the first time we tried an "automatic upgrade" from one
version to the other (0.9 -> 0.10). We could have done a better job of
documenting the upgrade steps for each version. I wish we had heard more
outspoken voices in the community sooner rather than later :)

Every project takes time to iron out issues related to releases and version
upgrades. I am glad that we have so much feedback now. As Yi suggested, the
SEP process is a first step towards documenting our changes across
versions. Additionally, we will work on adding a dedicated page for
upgrades, and it will be available for all of the *upcoming* versions.

Please let us know if you have any other concerns or ideas on how we can
improve on our process.

@XiaoChuan: Unfortunately, we don't have proper documentation on upgrading
Samza across various versions. As I mentioned before, we will put in
extra effort going forward. No migration/upgrade steps are
needed for versions after 0.10.*. You should be able to simply upgrade
without any issues. The upgrade from 0.9 to 0.10 is an exceptional case. Happy
to help you out in case you encounter more issues.
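
The upgrade rule discussed in this thread (stop at 0.10 when coming from an older version, otherwise upgrade directly) can be sketched as a small planner. This is only an illustration under stated assumptions: `UpgradePathSketch` and `plan` are hypothetical names, versions are reduced to their minor number (0.9 becomes 9), and 0.10 is assumed to be the only release that carries a one-time migration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; not part of Samza.
final class UpgradePathSketch {

    // Minor version that performed the automatic migration, per the thread (0.9 -> 0.10).
    static final int MIGRATION_MINOR = 10;

    // Returns the versions to deploy, in order, when moving from 0.<fromMinor> to 0.<toMinor>.
    static List<String> plan(int fromMinor, int toMinor) {
        if (toMinor <= fromMinor) {
            throw new IllegalArgumentException("target must be newer than the current version");
        }
        List<String> steps = new ArrayList<>();
        // Coming from before the migration release and going past it: stop there first.
        if (fromMinor < MIGRATION_MINOR && toMinor > MIGRATION_MINOR) {
            steps.add("0." + MIGRATION_MINOR);
        }
        steps.add("0." + toMinor);
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(plan(9, 12));  // [0.10, 0.12]
        System.out.println(plan(10, 12)); // [0.12]
    }
}
```

The intermediate stop matters mainly for jobs with change-logged stores, which is where the thread reports lost offsets when 0.10 is skipped.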

Cheers!
Navina

On Thu, Mar 30, 2017 at 11:04 AM, XiaoChuan Yu  wrote:

> Is there some sort of document on how to upgrade Samza through various
> versions like the page here for Kafka:
> https://kafka.apache.org/documentation/#upgrade ?
> Having something like this would be ideal.
> On Thu, Mar 30, 2017 at 1:51 PM Thomas Becker  wrote:
>
> > Thanks for the reply Yi, and I apologize if I came off a bit snarky.
> > I'm not sure I agree with the policy (removing migration code and
> > wanting people to upgrade seem at odds to me), but minimally I think we
> > should not assume people are upgrading to each new Samza version. We
> > have done so when features or fixes warrant, and even then on a per-job
> > basis, and I would expect this is a common practice.
> >
> > -Tommy
> >
> > On Thu, 2017-03-30 at 09:50 -0700, Yi Pan wrote:
> > > Hi, Thomas,
> > >
> > > Sorry to hear that you were hit by the removal of migration in Samza
> > > 0.11.
> > > The reason we removed it is following a deprecate-removal policy in
> > > two
> > > versions. We are not aware that people still using 0.9 after we
> > > released
> > > 0.11 and were not expecting a direct upgrade from 0.9 to 0.12.
> > > Document can
> > > be better to capture that. We are making changes to the design
> > > proposal
> > > s.t. it is more transparent and open to the whole community, through
> > > the
> > > newly proposed SEP process. These kind of breaking changes will go
> > > through
> > > the SEP discuss-vote process in the future and hopefully capture all
> > > these
> > > kind of concerns earlier.
> > >
> > > Best!
> > >
> > > -Yi
> > >
> > > On Thu, Mar 30, 2017 at 7:45 AM, Thomas Becker 
> > > wrote:
> > >
> > > >
> > > > Yes, we were burned by this. The changelog mapping will be
> > > > regenerated
> > > > instead of migrated and the result will completely hose the job
> > > > (because the mapping was not generated deterministically in
> > > > previous
> > > > versions of Samza). I don't understand why the migration code was
> > > > removed but it was, and to the best of my knowledge the necessity
> > > > to
> > > > not skip version 0.10.0 when upgrading was not documented, let
> > > > alone
> > > > enforced.
> > > >
> > > > On Mon, 2017-03-27 at 10:07 -0700, Jagadish Venkatraman wrote:
> > > > >
> > > > > Good observation Jake!
> > > > >
> > > > > The code for migration was removed in Samza 11. The migration
> > > > > would
> > > > > read
> > > > > change-log offsets from the checkpoint topic and write them to
> > > > > the
> > > > > coordinator stream.
> > > > >
> > > > > If you're using change-logged stores, I'd recommend upgrading
> > > > > from
> > > > > 0.9.1 to
> > > > > 0.10.0 first.
> > > > > > Otherwise, you will lose offsets for change-logged stores.
> > > > >
> > > > > I suspect you should be okay for 0.10.0 to 0.12 upgrade.
> > > > >
> > > > > > On Mon, Mar 27, 2017 at 9:30 AM, Jacob Maes wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > As I recall, 

Re: [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-03-30 Thread Navina Ramesh
Hi Yi,
Good question. Three reasons:

1. In SAMZA-881, we came up with a set of responsibilities for the
JobCoordinator. One of them was to generate/assign processorId. So, it
makes sense to keep getProcessorId() within JobCoordinator interface.
2. StreamProcessor was initially introduced as a user-facing API in
SAMZA-1080. ProcessorId was an argument in the StreamProcessor constructor.
It pushed the burden of guaranteeing uniqueness among the processors of a job
onto the user. This was not favorable.
3. In general, I think we have consensus that the processorIdGenerator is
going to be specific to a runtime environment. Hence, it seems more
appropriate to move it to a lower abstraction layer that deals with the
underlying execution environment.
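
To make the three points above concrete, here is a minimal sketch of the arrangement they describe: the coordinator owns the processorId and the processor asks for it instead of receiving it from the user. Only the names `JobCoordinator`, `StreamProcessor`, and `getProcessorId()` come from the thread. The class shapes, the `StandaloneCoordinator` class, and the UUID-based ID are assumptions for illustration, not the actual Samza API.

```java
import java.util.UUID;

// Hypothetical sketch; the real Samza interfaces may differ.
final class ProcessorIdSketch {

    // Trimmed-down coordinator showing only the method discussed in the thread.
    interface JobCoordinator {
        String getProcessorId();
    }

    // Assumed environment-specific coordinator: the ID is generated here,
    // in the layer that knows about the execution environment.
    static final class StandaloneCoordinator implements JobCoordinator {
        private final String processorId = UUID.randomUUID().toString();

        @Override
        public String getProcessorId() {
            return processorId;
        }
    }

    // The processor no longer takes an ID from the user; it reads it from its coordinator.
    static final class StreamProcessor {
        private final JobCoordinator coordinator;

        StreamProcessor(JobCoordinator coordinator) {
            this.coordinator = coordinator;
        }

        String id() {
            return coordinator.getProcessorId();
        }
    }

    public static void main(String[] args) {
        StreamProcessor first = new StreamProcessor(new StandaloneCoordinator());
        StreamProcessor second = new StreamProcessor(new StandaloneCoordinator());
        System.out.println("first:  " + first.id());
        System.out.println("second: " + second.id());
    }
}
```

In this shape each coordinator instance belongs to exactly one processor, which is one way to read getProcessorId() without ambiguity.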

Let me know if you have a different perspective on this.

Cheers!
Navina

On Thu, Mar 30, 2017 at 9:42 AM, Yi Pan  wrote:

> @Navina,
>
> Sorry to chime in late. One question:
> 1. Why is it in JobCoordinator, and why not in StreamProcessor class?
> Because JobCoordinator provides coordination service across many
> processors, an interface getProcessorId() in JobCoordinator is confusing
> regarding to which processorId we are getting.
>
> Otherwise, the proposal looks good.
>
> -Yi
>
> On Wed, Mar 29, 2017 at 7:57 PM, Navina Ramesh wrote:
>
> > Good to hear from you, Yan. Thanks! :)
> >
> > On Wed, Mar 29, 2017 at 7:48 PM, Yan Fang  wrote:
> >
> > > +1 . Thanks for the proposal, Navina. :)
> > >
> > > Fang, Yan
> > > yanfang...@gmail.com
> > >
> > > On Thu, Mar 30, 2017 at 4:24 AM, Prateek Maheshwari <
> > > pmaheshw...@linkedin.com.invalid> wrote:
> > >
> > > > +1 (non binding) from me.
> > > >
> > > > - Prateek
> > > >
> > > > On Tue, Mar 28, 2017 at 2:17 PM, Boris S  wrote:
> > > >
> > > > > +1 Looks good to me.
> > > > >
> > > > > On Tue, Mar 28, 2017 at 2:00 PM, xinyu liu 
> > > > wrote:
> > > > >
> > > > > > +1 on my side. Very happy to see this proposal. This is a blocker
> > for
> > > > > > integrating fluent API with StreamProcessor, and hopefully we can
> > get
> > > > it
> > > > > > resolved soon :).
> > > > > >
> > > > > > Thanks,
> > > > > > Xinyu
> > > > > >
> > > > > > On Tue, Mar 28, 2017 at 11:28 AM, Navina Ramesh (Apache) <
> > > > > > nav...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > This is a voting thread for SEP-1: Semantics of ProcessorId in
> > > Samza.
> > > > > > > For reference, here is the wiki link:
> > > > > > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > > > > > > 1%3A+Semantics+of+ProcessorId+in+Samza
> > > > > > >
> > > > > > > Link to discussion mail thread:
> > > > > > > http://mail-archives.apache.org/mod_mbox/samza-dev/201703.
> > > > > > > mbox/%3CCANazzuuHiO%3DvZQyFbTiYU-0Sfh3riK%3Dz4j_
> > > > > > AdCicQ8rBO%3DXuYQ%40mail.
> > > > > > > gmail.com%3E
> > > > > > >
> > > > > > > Please vote on this SEP asap. :)
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Navina
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Navina R.
> >
>



-- 
Navina R.


Re: Steps to Upgrading Samza (0.9 to 0.12)

2017-03-30 Thread XiaoChuan Yu
Is there some sort of document on how to upgrade Samza through various
versions like the page here for Kafka:
https://kafka.apache.org/documentation/#upgrade ?
Having something like this would be ideal.
On Thu, Mar 30, 2017 at 1:51 PM Thomas Becker  wrote:

> Thanks for the reply Yi, and I apologize if I came off a bit snarky.
> I'm not sure I agree with the policy (removing migration code and
> wanting people to upgrade seem at odds to me), but minimally I think we
> should not assume people are upgrading to each new Samza version. We
> have done so when features or fixes warrant, and even then on a per-job
> basis, and I would expect this is a common practice.
>
> -Tommy
>
> On Thu, 2017-03-30 at 09:50 -0700, Yi Pan wrote:
> > Hi, Thomas,
> >
> > Sorry to hear that you were hit by the removal of migration in Samza
> > 0.11.
> > The reason we removed it is following a deprecate-removal policy in
> > two
> > versions. We are not aware that people still using 0.9 after we
> > released
> > 0.11 and were not expecting a direct upgrade from 0.9 to 0.12.
> > Document can
> > be better to capture that. We are making changes to the design
> > proposal
> > s.t. it is more transparent and open to the whole community, through
> > the
> > newly proposed SEP process. These kind of breaking changes will go
> > through
> > the SEP discuss-vote process in the future and hopefully capture all
> > these
> > kind of concerns earlier.
> >
> > Best!
> >
> > -Yi
> >
> > On Thu, Mar 30, 2017 at 7:45 AM, Thomas Becker 
> > wrote:
> >
> > >
> > > Yes, we were burned by this. The changelog mapping will be
> > > regenerated
> > > instead of migrated and the result will completely hose the job
> > > (because the mapping was not generated deterministically in
> > > previous
> > > versions of Samza). I don't understand why the migration code was
> > > removed but it was, and to the best of my knowledge the necessity
> > > to
> > > not skip version 0.10.0 when upgrading was not documented, let
> > > alone
> > > enforced.
> > >
> > > On Mon, 2017-03-27 at 10:07 -0700, Jagadish Venkatraman wrote:
> > > >
> > > > Good observation Jake!
> > > >
> > > > The code for migration was removed in Samza 11. The migration
> > > > would
> > > > read
> > > > change-log offsets from the checkpoint topic and write them to
> > > > the
> > > > coordinator stream.
> > > >
> > > > If you're using change-logged stores, I'd recommend upgrading
> > > > from
> > > > 0.9.1 to
> > > > 0.10.0 first.
> > > > > Otherwise, you will lose offsets for change-logged stores.
> > > >
> > > > I suspect you should be okay for 0.10.0 to 0.12 upgrade.
> > > >
> > > > > On Mon, Mar 27, 2017 at 9:30 AM, Jacob Maes wrote:
> > > >
> > > > >
> > > > >
> > > > > As I recall, samza 0.10 introduced the coordinator stream and
> > > > > there
> > > > > was
> > > > > code to do an automatic migration to use that feature. @navina,
> > > > > @yi, do you
> > > > > know if that migration code is still in samza 12?
> > > > >
> > > > > If not, then it's probably better to update from 0.9.1 to
> > > > > 0.10.0
> > > > > and then
> > > > > to 0.12.0. I don't think there were any changes requiring
> > > > > migration
> > > > > between
> > > > > > 0.10 and 0.12, so upgrading directly from 0.10 to 0.12 is
> > > > > probably
> > > > > less of
> > > > > an issue.
> > > > >
> > > > > On Fri, Mar 24, 2017 at 11:05 PM, Jagadish Venkatraman <
> > > > > jagadish1...@gmail.com> wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > Hi Xiaochuan,
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Do I need to upgrade Kafka and/or YARN?
> > > > > > *Yarn version:*
> > > > > >
> > > > > >- Samza 0.12 supports Yarn 2.6.1 and 2.7.1.
> > > > > >- If you already have 2.6.0 installed (as you have said),
> > > > > > I
> > > > > > believe
> > > > > you
> > > > > >
> > > > > >
> > > > > >will be fine. (but I'm not sure)
> > > > > >
> > > > > > *Kafka version: *
> > > > > >
> > > > > >- Samza 0.12 upgraded the version of Kafka to 0.10.
> > > > > >- If your Kafka brokers are on an older version of Kafka,
> > > > > > you
> > > > > > should
> > > > > > >upgrade them to use at least 0.10. Kafka clients are
> > > > > > usually
> > > > > >incompatible with older versions of brokers.
> > > > > >
> > > > > > *Java version: *
> > > > > >
> > > > > >
> > > > > >
> > > > > >- Samza 0.12 binaries are compiled using Java 8.  Hence,
> > > > > > they
> > > > > > cannot
> > > > > be
> > > > > >
> > > > > >
> > > > > >run on older versions of the Java run-time.
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > I'm extremely new to Samza in terms of operations aspect.
> > > > > > > > I'm
> > > > > > > > not sure
> > > > > > what
> > > > > > information would be relevant in this case so please ask
> > > > > > away.
> > > > > >
> > > > > > I'd 

Re: Steps to Upgrading Samza (0.9 to 0.12)

2017-03-30 Thread Thomas Becker
Thanks for the reply Yi, and I apologize if I came off a bit snarky.
I'm not sure I agree with the policy (removing migration code and
wanting people to upgrade seem at odds to me), but minimally I think we
should not assume people are upgrading to each new Samza version. We
have done so when features or fixes warrant, and even then on a per-job
basis, and I would expect this is a common practice.

-Tommy

On Thu, 2017-03-30 at 09:50 -0700, Yi Pan wrote:
> Hi, Thomas,
>
> Sorry to hear that you were hit by the removal of migration in Samza
> 0.11.
> The reason we removed it is following a deprecate-removal policy in
> two
> versions. We are not aware that people still using 0.9 after we
> released
> 0.11 and were not expecting a direct upgrade from 0.9 to 0.12.
> Document can
> be better to capture that. We are making changes to the design
> proposal
> s.t. it is more transparent and open to the whole community, through
> the
> newly proposed SEP process. These kind of breaking changes will go
> through
> the SEP discuss-vote process in the future and hopefully capture all
> these
> kind of concerns earlier.
>
> Best!
>
> -Yi
>
> On Thu, Mar 30, 2017 at 7:45 AM, Thomas Becker 
> wrote:
>
> >
> > Yes, we were burned by this. The changelog mapping will be
> > regenerated
> > instead of migrated and the result will completely hose the job
> > (because the mapping was not generated deterministically in
> > previous
> > versions of Samza). I don't understand why the migration code was
> > removed but it was, and to the best of my knowledge the necessity
> > to
> > not skip version 0.10.0 when upgrading was not documented, let
> > alone
> > enforced.
> >
> > On Mon, 2017-03-27 at 10:07 -0700, Jagadish Venkatraman wrote:
> > >
> > > Good observation Jake!
> > >
> > > The code for migration was removed in Samza 11. The migration
> > > would
> > > read
> > > change-log offsets from the checkpoint topic and write them to
> > > the
> > > coordinator stream.
> > >
> > > If you're using change-logged stores, I'd recommend upgrading
> > > from
> > > 0.9.1 to
> > > 0.10.0 first.
> > > > Otherwise, you will lose offsets for change-logged stores.
> > >
> > > I suspect you should be okay for 0.10.0 to 0.12 upgrade.
> > >
> > > > On Mon, Mar 27, 2017 at 9:30 AM, Jacob Maes wrote:
> > >
> > > >
> > > >
> > > > As I recall, samza 0.10 introduced the coordinator stream and
> > > > there
> > > > was
> > > > code to do an automatic migration to use that feature. @navina,
> > > > @yi, do you
> > > > know if that migration code is still in samza 12?
> > > >
> > > > If not, then it's probably better to update from 0.9.1 to
> > > > 0.10.0
> > > > and then
> > > > to 0.12.0. I don't think there were any changes requiring
> > > > migration
> > > > between
> > > > > 0.10 and 0.12, so upgrading directly from 0.10 to 0.12 is
> > > > probably
> > > > less of
> > > > an issue.
> > > >
> > > > On Fri, Mar 24, 2017 at 11:05 PM, Jagadish Venkatraman <
> > > > jagadish1...@gmail.com> wrote:
> > > >
> > > > >
> > > > >
> > > > > Hi Xiaochuan,
> > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > Do I need to upgrade Kafka and/or YARN?
> > > > > *Yarn version:*
> > > > >
> > > > >- Samza 0.12 supports Yarn 2.6.1 and 2.7.1.
> > > > >- If you already have 2.6.0 installed (as you have said),
> > > > > I
> > > > > believe
> > > > you
> > > > >
> > > > >
> > > > >will be fine. (but I'm not sure)
> > > > >
> > > > > *Kafka version: *
> > > > >
> > > > >- Samza 0.12 upgraded the version of Kafka to 0.10.
> > > > >- If your Kafka brokers are on an older version of Kafka,
> > > > > you
> > > > > should
> > > > > >upgrade them to use at least 0.10. Kafka clients are
> > > > > usually
> > > > >incompatible with older versions of brokers.
> > > > >
> > > > > *Java version: *
> > > > >
> > > > >
> > > > >
> > > > >- Samza 0.12 binaries are compiled using Java 8.  Hence,
> > > > > they
> > > > > cannot
> > > > be
> > > > >
> > > > >
> > > > >run on older versions of the Java run-time.
> > > > >
> > > > >
> > > > > >
> > > > > >
> > > > > > >
> > > > > > >
> > > > > > > I'm extremely new to Samza in terms of operations aspect.
> > > > > > > I'm
> > > > > > > not sure
> > > > > what
> > > > > information would be relevant in this case so please ask
> > > > > away.
> > > > >
> > > > > I'd first start by upgrading the Kafka brokers (assuming
> > > > > you're
> > > > > on Java
> > > > 8+
> > > > >
> > > > >
> > > > > already).
> > > > > Let us know how the migration goes!
> > > > >
> > > > > Thanks,
> > > > > Jagadish
> > > > >
> > > > >
> > > > > > On Fri, Mar 24, 2017 at 8:23 PM, XiaoChuan Yu wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > What are the general steps for upgrading Samza from 0.9 to
> > > > > > 0.12?
> > > > > > Do I need to upgrade Kafka and/or YARN?
> > > > 

Re: Steps to Upgrading Samza (0.9 to 0.12)

2017-03-30 Thread Yi Pan
Hi, Thomas,

Sorry to hear that you were hit by the removal of the migration code in
Samza 0.11. We removed it following our deprecate-then-remove policy, which
spans two versions. We were not aware that people were still using 0.9 after
we released 0.11, and we were not expecting a direct upgrade from 0.9 to 0.12.
The documentation could have captured that better. We are changing the design
proposal process so that it is more transparent and open to the whole
community, through the newly proposed SEP process. Breaking changes of this
kind will go through the SEP discuss-vote process in the future, which will
hopefully surface these kinds of concerns earlier.

Best!

-Yi

On Thu, Mar 30, 2017 at 7:45 AM, Thomas Becker  wrote:

> Yes, we were burned by this. The changelog mapping will be regenerated
> instead of migrated and the result will completely hose the job
> (because the mapping was not generated deterministically in previous
> versions of Samza). I don't understand why the migration code was
> removed but it was, and to the best of my knowledge the necessity to
> not skip version 0.10.0 when upgrading was not documented, let alone
> enforced.
>
> On Mon, 2017-03-27 at 10:07 -0700, Jagadish Venkatraman wrote:
> > Good observation Jake!
> >
> > The code for migration was removed in Samza 11. The migration would
> > read
> > change-log offsets from the checkpoint topic and write them to the
> > coordinator stream.
> >
> > If you're using change-logged stores, I'd recommend upgrading from
> > 0.9.1 to
> > 0.10.0 first.
> > Otherwise, you will lose offsets for change-logged stores.
> >
> > I suspect you should be okay for 0.10.0 to 0.12 upgrade.
> >
> > On Mon, Mar 27, 2017 at 9:30 AM, Jacob Maes 
> > wrote:
> >
> > >
> > > As I recall, samza 0.10 introduced the coordinator stream and there
> > > was
> > > code to do an automatic migration to use that feature. @navina,
> > > @yi, do you
> > > know if that migration code is still in samza 12?
> > >
> > > If not, then it's probably better to update from 0.9.1 to 0.10.0
> > > and then
> > > to 0.12.0. I don't think there were any changes requiring migration
> > > between
> > > > 0.10 and 0.12, so upgrading directly from 0.10 to 0.12 is probably
> > > less of
> > > an issue.
> > >
> > > On Fri, Mar 24, 2017 at 11:05 PM, Jagadish Venkatraman <
> > > jagadish1...@gmail.com> wrote:
> > >
> > > >
> > > > Hi Xiaochuan,
> > > >
> > > > >
> > > > > >
> > > > > > Do I need to upgrade Kafka and/or YARN?
> > > > *Yarn version:*
> > > >
> > > >- Samza 0.12 supports Yarn 2.6.1 and 2.7.1.
> > > >- If you already have 2.6.0 installed (as you have said), I
> > > > believe
> > > you
> > > >
> > > >will be fine. (but I'm not sure)
> > > >
> > > > *Kafka version: *
> > > >
> > > >- Samza 0.12 upgraded the version of Kafka to 0.10.
> > > >- If your Kafka brokers are on an older version of Kafka, you
> > > > should
> > > > >upgrade them to use at least 0.10. Kafka clients are usually
> > > >incompatible with older versions of brokers.
> > > >
> > > > *Java version: *
> > > >
> > > >
> > > >
> > > >- Samza 0.12 binaries are compiled using Java 8.  Hence, they
> > > > cannot
> > > be
> > > >
> > > >run on older versions of the Java run-time.
> > > >
> > > >
> > > > >
> > > > > >
> > > > > > I'm extremely new to Samza in terms of operations aspect. I'm
> > > > > > not sure
> > > > what
> > > > information would be relevant in this case so please ask away.
> > > >
> > > > I'd first start by upgrading the Kafka brokers (assuming you're
> > > > on Java
> > > 8+
> > > >
> > > > already).
> > > > Let us know how the migration goes!
> > > >
> > > > Thanks,
> > > > Jagadish
> > > >
> > > >
> > > > > On Fri, Mar 24, 2017 at 8:23 PM, XiaoChuan Yu wrote:
> > > >
> > > > >
> > > > > Hi,
> > > > >
> > > > > What are the general steps for upgrading Samza from 0.9 to
> > > > > 0.12?
> > > > > Do I need to upgrade Kafka and/or YARN?
> > > > >
> > > > > I don't know how Samza was setup initially but we currently
> > > > > have the
> > > > > following setup:
> > > > >
> > > > > Samza version: 0.9.1
> > > > > YARN version: Hadoop 2.6.0-cdh5.4.8
> > > > > Kafka version: 0.9.0.1
> > > > >
> > > > > I think installation of Kafka and YARN were managed through
> > > > > Puppet.
> > > > > I'm extremely new to Samza in terms of operations aspect. I'm
> > > > > not sure
> > > > what
> > > > >
> > > > > information would be relevant in this case so please ask away.
> > > > >
> > > > > Thanks,
> > > > > Xiaochuan Yu
> > > > >
> > > >
> > > >
> > > > --
> > > > Jagadish V,
> > > > Graduate Student,
> > > > Department of Computer Science,
> > > > Stanford University
> > > >
> >
> >
> --
>
>
> Tommy Becker
>
> Senior Software Engineer
>
> O +1 919.460.4747
>
> tivo.com
>
>
> 
>

Re: [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-03-30 Thread Yi Pan
@Navina,

Sorry to chime in late. One question:
1. Why is it in JobCoordinator rather than in the StreamProcessor class?
Because JobCoordinator provides a coordination service across many
processors, a getProcessorId() method on JobCoordinator makes it unclear
which processorId we are getting.

Otherwise, the proposal looks good.

-Yi

On Wed, Mar 29, 2017 at 7:57 PM, Navina Ramesh  wrote:

> Good to hear from you, Yan. Thanks! :)
>
> On Wed, Mar 29, 2017 at 7:48 PM, Yan Fang  wrote:
>
> > +1 . Thanks for the proposal, Navina. :)
> >
> > Fang, Yan
> > yanfang...@gmail.com
> >
> > On Thu, Mar 30, 2017 at 4:24 AM, Prateek Maheshwari <
> > pmaheshw...@linkedin.com.invalid> wrote:
> >
> > > +1 (non binding) from me.
> > >
> > > - Prateek
> > >
> > > On Tue, Mar 28, 2017 at 2:17 PM, Boris S  wrote:
> > >
> > > > +1 Looks good to me.
> > > >
> > > > On Tue, Mar 28, 2017 at 2:00 PM, xinyu liu 
> > > wrote:
> > > >
> > > > > +1 on my side. Very happy to see this proposal. This is a blocker
> for
> > > > > integrating fluent API with StreamProcessor, and hopefully we can
> get
> > > it
> > > > > resolved soon :).
> > > > >
> > > > > Thanks,
> > > > > Xinyu
> > > > >
> > > > > On Tue, Mar 28, 2017 at 11:28 AM, Navina Ramesh (Apache) <
> > > > > nav...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > Hi everyone,
> > > > > >
> > > > > > This is a voting thread for SEP-1: Semantics of ProcessorId in
> > Samza.
> > > > > > For reference, here is the wiki link:
> > > > > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > > > > > 1%3A+Semantics+of+ProcessorId+in+Samza
> > > > > >
> > > > > > Link to discussion mail thread:
> > > > > > http://mail-archives.apache.org/mod_mbox/samza-dev/201703.
> > > > > > mbox/%3CCANazzuuHiO%3DvZQyFbTiYU-0Sfh3riK%3Dz4j_
> > > > > AdCicQ8rBO%3DXuYQ%40mail.
> > > > > > gmail.com%3E
> > > > > >
> > > > > > Please vote on this SEP asap. :)
> > > > > >
> > > > > > Thanks!
> > > > > > Navina
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Navina R.
>


Re: Steps to Upgrading Samza (0.9 to 0.12)

2017-03-30 Thread Thomas Becker
Yes, we were burned by this. The changelog mapping will be regenerated
instead of migrated and the result will completely hose the job
(because the mapping was not generated deterministically in previous
versions of Samza). I don't understand why the migration code was
removed but it was, and to the best of my knowledge the necessity to
not skip version 0.10.0 when upgrading was not documented, let alone
enforced.
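
A small sketch of why a regenerated mapping can break a job: if the task-to-changelog-partition mapping depends on the order in which tasks happen to be seen, regenerating it can hand a task a partition that holds another task's state. This is an illustration only. `ChangelogMappingSketch` and its methods are hypothetical, and the way older Samza versions actually built the mapping is an assumption, not something stated in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; not the Samza implementation.
final class ChangelogMappingSketch {

    // Assigns changelog partitions in whatever order the task names arrive.
    static Map<String, Integer> byArrivalOrder(List<String> taskNames) {
        Map<String, Integer> mapping = new LinkedHashMap<>();
        for (String task : taskNames) {
            mapping.put(task, mapping.size());
        }
        return mapping;
    }

    // Sorts the task names first, so the same set of tasks always yields the same mapping.
    static Map<String, Integer> bySortedName(List<String> taskNames) {
        List<String> sorted = new ArrayList<>(taskNames);
        Collections.sort(sorted);
        return byArrivalOrder(sorted);
    }

    public static void main(String[] args) {
        // The same three tasks, seen in a different order on two runs.
        List<String> firstRun = Arrays.asList("Partition 0", "Partition 1", "Partition 2");
        List<String> secondRun = Arrays.asList("Partition 2", "Partition 0", "Partition 1");

        System.out.println(byArrivalOrder(firstRun).equals(byArrivalOrder(secondRun))); // false
        System.out.println(bySortedName(firstRun).equals(bySortedName(secondRun)));     // true
    }
}
```

Migrating the stored mapping, rather than regenerating it, sidesteps the problem entirely because the original assignment is preserved.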

On Mon, 2017-03-27 at 10:07 -0700, Jagadish Venkatraman wrote:
> Good observation Jake!
>
> The code for migration was removed in Samza 11. The migration would
> read
> change-log offsets from the checkpoint topic and write them to the
> coordinator stream.
>
> If you're using change-logged stores, I'd recommend upgrading from
> 0.9.1 to
> 0.10.0 first.
> Otherwise, you will lose offsets for change-logged stores.
>
> I suspect you should be okay for 0.10.0 to 0.12 upgrade.
>
> On Mon, Mar 27, 2017 at 9:30 AM, Jacob Maes 
> wrote:
>
> >
> > As I recall, samza 0.10 introduced the coordinator stream and there
> > was
> > code to do an automatic migration to use that feature. @navina,
> > @yi, do you
> > know if that migration code is still in samza 12?
> >
> > If not, then it's probably better to update from 0.9.1 to 0.10.0
> > and then to 0.12.0. I don't think there were any changes requiring
> > migration between 0.10 and 0.12, so upgrading directly from 0.10 to
> > 0.12 is probably less of an issue.
> >
> > On Fri, Mar 24, 2017 at 11:05 PM, Jagadish Venkatraman <
> > jagadish1...@gmail.com> wrote:
> >
> > >
> > > Hi Xiaochuan,
> > >
> > > >
> > > > >
> > > > > Do I need to upgrade Kafka and/or YARN?
> > > > *Yarn version:*
> > > >
> > > >    - Samza 0.12 supports Yarn 2.6.1 and 2.7.1.
> > > >    - If you already have 2.6.0 installed (as you have said), I
> > > >      believe you will be fine (but I'm not sure).
> > >
> > > > *Kafka version:*
> > > >
> > > >    - Samza 0.12 upgraded the version of Kafka to 0.10.
> > > >    - If your Kafka brokers are on an older version of Kafka, you
> > > >      should upgrade them to at least 0.10. Kafka clients are
> > > >      usually incompatible with older versions of brokers.
> > >
> > > > *Java version:*
> > > >
> > > >    - Samza 0.12 binaries are compiled using Java 8. Hence, they
> > > >      cannot be run on older versions of the Java run-time.
> > >
> > >
> > > >
> > > > >
> > > > > > I'm extremely new to Samza in terms of operations aspect. I'm
> > > > > > not sure what information would be relevant in this case so
> > > > > > please ask away.
> > >
> > > > I'd first start by upgrading the Kafka brokers (assuming you're
> > > > on Java 8+ already).
> > > Let us know how the migration goes!
> > >
> > > Thanks,
> > > Jagadish
> > >
> > >
> > > > On Fri, Mar 24, 2017 at 8:23 PM, XiaoChuan Yu wrote:
> > >
> > > >
> > > > Hi,
> > > >
> > > > What are the general steps for upgrading Samza from 0.9 to
> > > > 0.12?
> > > > Do I need to upgrade Kafka and/or YARN?
> > > >
> > > > I don't know how Samza was setup initially but we currently
> > > > have the
> > > > following setup:
> > > >
> > > > Samza version: 0.9.1
> > > > YARN version: Hadoop 2.6.0-cdh5.4.8
> > > > Kafka version: 0.9.0.1
> > > >
> > > > I think installation of Kafka and YARN were managed through
> > > > Puppet.
> > > > > I'm extremely new to Samza in terms of operations aspect. I'm
> > > > > not sure what information would be relevant in this case so
> > > > > please ask away.
> > > >
> > > > Thanks,
> > > > Xiaochuan Yu
> > > >
> > >
> > >
> > > --
> > > Jagadish V,
> > > Graduate Student,
> > > Department of Computer Science,
> > > Stanford University
> > >
>
>
--


Tommy Becker

Senior Software Engineer

O +1 919.460.4747

tivo.com



