Re: [ANNOUNCE] Please welcome Boris Shkolnik to the Samza PMC

2019-06-07 Thread Navina Ramesh
Yaay, Boris! Congrats! 🙂

From: Daniel Nishimura 
Sent: Friday, June 7, 2019 3:40 PM
To: dev@samza.apache.org
Subject: Re: [ANNOUNCE] Please welcome Boris Shkolnik to the Samza PMC

Congrats!

On Fri, Jun 7, 2019 at 3:35 PM Ignacio Solis  wrote:

> Congrats Boris!
>
> On Fri, Jun 7, 2019 at 3:20 PM Bharath Kumara Subramanian <
> codin.mart...@gmail.com> wrote:
>
> > Congratulations Boris!
> >
> > On Fri, Jun 7, 2019 at 3:19 PM Jagadish Venkatraman <
> > jagadish1...@gmail.com>
> > wrote:
> >
> > > Congratulations Boris!
> > >
> > > On Fri, Jun 7, 2019 at 3:15 PM Xinyu Liu 
> wrote:
> > >
> > > > Congrats, Boris!
> > > >
> > > > Xinyu
> > > >
> > > > On Fri, Jun 7, 2019 at 3:13 PM Jakob Homan 
> wrote:
> > > >
> > > > > Howdy all-
> > > > > > I'm very pleased to announce that the Samza PMC has voted Boris
> > > > > > Shkolnik to be a Project Management Committee (PMC) Member.  The PMC
> > > > > > is responsible for the overall health of a project and for voting in
> > > > > > new committers and PMC members, as well as VOTEing on releases.  Over
> > > > > > the past two years, Boris has been a valuable committer on the
> > > > > > project.
> > > > >
> > > > > Congrats Boris!
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jakob
> > > > > on behalf of the Samza PMC
> > > > >
> > > >
> > >
> > >
> > > --
> > > Jagadish V,
> > > Graduate Student,
> > > Department of Computer Science,
> > > Stanford University
> > >
> >
>
>
> --
> Nacho - Ignacio Solis - iso...@igso.net
>


Re: [VOTE] Migration of Samza git repo to gitbox.apache.org

2019-01-23 Thread Navina Ramesh
It looks like a mandatory migration. Why do we need a vote for this?


Thanks!
Navina


From: Pawas Chhokra 
Sent: Wednesday, January 23, 2019 11:50:32 AM
To: dev@samza.apache.org
Subject: [VOTE] Migration of Samza git repo to gitbox.apache.org

Hi all,

This is a call for a vote on migrating the Samza git repo to gitbox.apache.org
at 11 AM, Jan 29, 2019. As mandated by the Apache Infrastructure Team, all git
repositories must be migrated from the git-wip-us.apache.org URL to
gitbox.apache.org, as the old service is being decommissioned.
The vote will be open for 72 hours (ending at 12:00 PM PST Monday,
January 28). You can vote as follows:

[ ] +1 approve

[ ] +0 no opinion

[ ] -1 disapprove (and reason why)

The vote is +1 from my side.

Thanks & Regards,
Pawas Chhokra
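
For anyone updating a local clone after the migration described above, the change amounts to swapping the host in the remote URL. The following is a minimal sketch, assuming the repository path stays the same on the new host. The helper name `to_gitbox` and the `/repos/asf/samza.git` path in the usage comment are illustrative assumptions, not taken from the vote email.

```python
OLD_HOST = "git-wip-us.apache.org"
NEW_HOST = "gitbox.apache.org"


def to_gitbox(remote_url):
    """Swap the old host for the new one; URLs on other hosts are returned unchanged."""
    return remote_url.replace(OLD_HOST, NEW_HOST, 1)


# Hypothetical usage (the path is an assumption):
# to_gitbox("https://git-wip-us.apache.org/repos/asf/samza.git")
```

The rewritten URL could then be applied to an existing clone with `git remote set-url origin <new-url>`.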


Re: [VOTE] SEP-12: Integration Test Framework

2018-05-17 Thread Navina Ramesh
+1 can't wait to have this in! :)


From: Jacob Maes 
Sent: Thursday, May 17, 2018 10:13:57 AM
To: dev@samza.apache.org
Subject: Re: [VOTE] SEP-12: Integration Test Framework

+1

On Thu, May 17, 2018 at 9:56 AM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Thanks Sanil for the proposal. This will go a long way in simplifying
> testing of Samza applications.
>
> +1 (binding)
>
>
>
> On Thu, May 17, 2018 at 9:45 AM, Daniel Nishimura 
> wrote:
>
> > +1
> >
> > Looks great!
> >
> > On Thu, May 17, 2018 at 9:08 AM, Xinyu Liu 
> wrote:
> >
> > > +1
> > >
> > > The proposal looks great to me. Look forward to seeing the
> > implementation.
> > >
> > > Thanks,
> > > Xinyu
> > >
> > > On Wed, May 16, 2018 at 6:12 PM, Sanil Jain 
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > This is a call for a vote for Samza's Integration Test Framework as
> > > > described by:
> > > >
> > > > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-12%3A+Integration+Test+Framework
> > > >
> > > > The vote will be open for 3 days (ending at 6:00PM Monday,
> 05/21/2018).
> > > >
> > > > Link to the discuss mailing thread:
> > > >
> > > > > http://mail-archives.apache.org/mod_mbox/samza-dev/201805.mbox/%3CDM5PR21MB02827A6FA9F47CB8EF99A339A2810%40DM5PR21MB0282.namprd21.prod.outlook.com%3E
> > > >
> > > >
> > > > Please vote:
> > > >
> > > > [ ] +1 approve
> > > >
> > > > [ ] +0 no opinion
> > > >
> > > > [ ] -1 disapprove (and reason why)
> > > >
> > > > Thanks
> > > >
> > >
> >
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>


Re: Welcome Xinyu as new Samza PMC!

2018-01-17 Thread Navina Ramesh
Congratulations, Xinyu!
Thanks for all your contributions; looking forward to more 😊


Cheers!
Navina


From: Yi Pan 
Sent: Wednesday, January 17, 2018 10:26:54 AM
To: dev@samza.apache.org
Subject: Welcome Xinyu as new Samza PMC!

Finally, all the documentation procedures are complete and Xinyu Liu has been
officially promoted to Samza PMC member! This is well deserved given his
continued contributions to the Samza project.

Please join me in welcoming Xinyu as our newest PMC member!

Cheers!

-Yi Pan


Re: [VOTE] Apache Samza 0.14.0 RC5

2017-12-22 Thread Navina Ramesh
+1 on RC5.


Verified signature and checksum. Ran ./bin/check-all.sh


Cheers!
Navina


From: xinyu liu 
Sent: Friday, December 22, 2017 2:50:48 PM
To: dev@samza.apache.org
Subject: [VOTE] Apache Samza 0.14.0 RC5

This is a call for a vote on a release of Apache Samza 0.14.0. Thanks to
everyone who has contributed to this release.

The release candidate can be downloaded from here:
http://home.apache.org/~xinyu/samza-0.14.0-rc5/

The release candidate is signed with pgp key C31D7061, which can be found on
keyservers:
http://pgp.mit.edu/pks/lookup?op=get&search=0x35964389C31D7061

The git tag is release-0.14.0-rc5 and signed with the same pgp key:
https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.14.0-rc5

Test binaries have been published to Maven's staging repository, and are
available here:
https://repository.apache.org/content/repositories/orgapachesamza-1042

61 issues have been resolved as part of this release:
https://issues.apache.org/jira/browse/SAMZA-1519?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.14.0%20AND%20status%20%3D%20Resolved

The vote will be open for 72 hours (ending at 3:00 PM Thursday,
12/28/2017).

Please download the release candidate, check the hashes/signature, build it
and test it, and then please vote:

[ ] +1 approve

[ ] +0 no opinion

[ ] -1 disapprove (and reason why)

Thanks,
Xinyu
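
The "check the hashes/signature" step requested above can be sketched roughly as follows. This is only an illustration: the digest algorithm (SHA-512 here) and the helper names `file_digest` and `checksum_matches` are assumptions, not something the vote email specifies. The signature itself would be checked separately, for example with `gpg --verify`.

```python
import hashlib


def file_digest(path, algorithm="sha512", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(path, expected_hex, algorithm="sha512"):
    """Compare a file's digest with a published hex string (case and whitespace insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

Pass only the hex string as `expected_hex`; if the published checksum file also carries a file name, that part would need to be stripped first.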


[GitHub] samza pull request #379: SAMZA-1523 Cleanup table entries before shutting do...

2017-12-05 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/379

SAMZA-1523 Cleanup table entries before shutting down the processor

Modified `TableUtils#deleteProcessorEntity` to provide an option to 
disable optimistic locking when calling the Azure Table Storage service. 

@sborya @PawasChhokra @nickpan47   Review please? 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza azure-etag-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/379.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #379


commit 28312cb8b91b16349f6d2bc60e9c41ae988b96e0
Author: navina 
Date:   2017-12-05T19:32:09Z

SAMZA-1523 Cleanup table entries before shutting down the processor




---


Re: [DISCUSS] Samza 0.14.0 release

2017-11-28 Thread Navina Ramesh
The list looks awesome! :)


From: Jacob Maes 
Sent: Tuesday, November 28, 2017 10:03:58 AM
To: dev@samza.apache.org
Subject: Re: [DISCUSS] Samza 0.14.0 release

+1

On Mon, Nov 27, 2017 at 8:15 PM, Fred Haifeng Ji 
wrote:

> +1! Thanks Bharath!
>
> Fred
>
> On Mon, Nov 27, 2017 at 11:10 AM, Yi Pan  wrote:
>
> > Thanks for driving this! +1
> >
> > A few minor things that are pending that I think we should pull in:
> > 1) https://issues.apache.org/jira/browse/SAMZA-1459
> > 2) https://github.com/apache/samza/pull/302
> > 3) https://github.com/apache/samza/pull/301
> > 4) https://github.com/apache/samza/pull/286
> > 5) https://issues.apache.org/jira/browse/SAMZA-1406
> > 6) https://issues.apache.org/jira/browse/SAMZA-1356
> > 7) https://github.com/apache/samza/pull/10
> > 8) https://github.com/apache/samza/pull/7
> >
> > Let's pull in the patches that are ready as well.
> >
> > Thanks!
> >
> > On Mon, Nov 27, 2017 at 10:45 AM, Debraj Manna  >
> > wrote:
> >
> > > +1
> > >
> > > On Mon, Nov 27, 2017 at 11:32 PM, xinyu liu 
> > wrote:
> > >
> > > > +1.
> > > >
> > > > Very happy to see a lot of important features added in this release.
> > > >
> > > > Thanks,
> > > > Xinyu
> > > >
> > > > On Mon, Nov 27, 2017 at 10:00 AM, Jagadish Venkatraman <
> > > > jagad...@apache.org>
> > > > wrote:
> > > >
> > > > > +1 from my side.
> > > > >
> > > > > Thank you Bharath for driving the release!
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Nov 27, 2017 at 9:50 AM, Bharath Kumara Subramanian <
> > > > > codin.mart...@gmail.com> wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > >
> > > > > >
> > > > > > > We have added a couple of major features to master since 0.13.1
> > > > > > > that warrant a major release.
> > > > > > >
> > > > > > > Within LinkedIn, some of these features have already been tested
> > > > > > > as part of our test suites. We plan to continue our testing in the
> > > > > > > coming weeks to validate the stability prior to release.
> > > > > > >
> > > > > > > We wanted to kick off the discussion in the open source forum to
> > > > > > > keep the momentum flowing.
> > > > > >
> > > > > >
> > > > > >
> > > > > > Here is the list of features that are part of the new release
> > > > > >
> > > > > > >    - SAMZA-1510 - Samza SQL
> > > > > > >    - SAMZA-1417 - Add support for multistage batch to Samza on Hadoop
> > > > > > >    - SAMZA-1438 - Event-hub connectors for Samza
> > > > > >
> > > > > >
> > > > > >
> > > > > > > We have also worked on stabilizing our 0.13 features. Here are some
> > > > > > > highlights
> > > > > > >
> > > > > > >    - SAMZA-1454, SAMZA-1493 - Add support for durable state for
> > > > > > >      high level API
> > > > > > >    - SAMZA-1417, SAMZA-1330, SAMZA-1289 - Stabilization of
> > > > > > >      ZooKeeper based deployment model
> > > > > > >    - SAMZA-1471, SAMZA-1392, SAMZA-1465 - Performance improvements
> > > > > >
> > > > > >
> > > > > >
> > > > > > You can find the concrete list of the features here
> > > > > >  > > > > > project%20%3D%20samza%20AND%20fixVersion%20%3D%200.14.0%
> > > > > > 20AND%20resolution%20%3D%20fixed>
> > > > > > .
> > > > > >
> > > > > >
> > > > > >
> > > > > > Here is my proposal on our release schedule and timelines.
> > > > > >
> > > > > >1. Create a release candidate with the current 0.14.0 HEAD
> > > > > > >2. Target a release vote in the week of Dec 4th
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thoughts?
> > > > > >
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Bharath
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
>
> --
> Haifeng (Fred)  Ji
>


Re: Historical container logs in YARN UI

2017-10-02 Thread Navina Ramesh
+Jon (Adding Jon to cc)


From: Yi Pan 
Sent: Monday, October 2, 2017 2:10:27 AM
To: dev@samza.apache.org
Subject: Re: Historical container logs in YARN UI

Hi, XiaoChuan,

Our SRE team has been using the timeline server in YARN at LinkedIn to get the
historical container logs in our admin dashboard. @Jon Bringburst, can you
share your experience with configuring the timeline server in YARN?

Thanks a lot!

-Yi

On Sat, Sep 30, 2017 at 1:08 PM, XiaoChuan Yu  wrote:

> Hi,
>
> Is there a way to view historical container logs in YARN UI?
> When I try to view historical logs from the YARN UI right now, I get the
> following message:
> Failed while trying to construct the redirect url to the log server. Log
> Server url may not be configured
> ...
>
> I configured log aggregation and timeline server in YARN.
> I know there's a history server implementation for Map Reduce. Is there a
> similar history server implementation available for Samza?
>
> Thanks,
> Xiaochuan Yu
>


[GitHub] samza pull request #302: Fixing broken link between Yarn Host Affinity and R...

2017-09-25 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/302

Fixing broken link between Yarn Host Affinity and Resource Localizati…

Fixing broken link between Yarn Host Affinity and Resource Localization 
pages under Documentation

Patch needs to be back-ported to 0.13.0 website! 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza website-link-fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/302.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #302


commit 880c917a2d317459b44dff3f8e76c8dec55a012d
Author: navina 
Date:   2017-09-26T00:17:02Z

Fixing broken link between Yarn Host Affinity and Resource Localization 
pages under Documentation




---


[GitHub] samza pull request #301: Samza versioned Release Notes

2017-09-25 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/301

Samza versioned Release Notes

Adding a versioned page for release/upgrade notes. We can start this 
process from the next major version release, aka 0.14.0. 

Please update this page as and when you add new features/configs/API or 
deprecate features/configs/API. Basically, anything that can be useful for 
Samza users trying to upgrade. 

Note: `site.version` is not necessarily same as samza release version. For 
now, I am using it as a placeholder. Hopefully, with the next generation of our 
website, it will be better defined. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza versioning

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/301.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #301


commit 8abeb85c8c0bb51d46a225038c6195c15aaf7627
Author: navina 
Date:   2017-09-25T23:13:31Z

Adding a versioned page for release / upgrade notes




---


Re: Connection timed out error while installing "Hello Samza"

2017-09-14 Thread Navina Ramesh
If they are already installed on your system, I don't think you need to 
re-install them. However, do make sure that there are no ACLs enabled on ZK, 
because kafka may not support them.


> My guess is that even if there are conflicts with earlier installations, it 
> should throw some other error, not a connection timeout

I think your connection timeout was related to when you were trying to download 
samza from apache.


If you can re-use the already installed kafka and zookeeper client, then install 
yarn and follow the remaining steps to build and deploy samza. In any case, 
more logs will be useful for us to debug.


> it could be related either to resolving "localhost" or to a firewall that 
> prevents communication between ports

I meant this could be the reason why ZK is not able to install and/or talk to 
the kafka broker.


It totally stumps me why you can't download samza from your server. Is it 
possible for you to try on another clean host and see if it works or fails for you?


Navina


From: Anantharaman, Srinatha (Contractor) 
Sent: Thursday, September 14, 2017 12:33:26 PM
To: dev@samza.apache.org
Subject: RE: Connection timed out error while installing "Hello Samza"

That is a good catch, Navina. The server on which I am trying to install is a 
Hadoop edge node; it already has a kafka broker and Zookeeper client installed 
for Hadoop.
Does it matter if I am installing it in a separate folder? My guess is that even 
if there are conflicts with earlier installations, it should throw some other 
error, not a connection timeout.

Coming to your point that "it could be related either to resolving "localhost" 
or to a firewall that prevents communication between ports": how can I prove 
that it is a firewall issue? I am able to clone Samza files from Apache git, and 
I can download any external files on this server.

~Sri

-Original Message-
From: Navina Ramesh [mailto:nram...@linkedin.com]
Sent: Thursday, September 14, 2017 2:51 PM
To: dev@samza.apache.org
Subject: Re: Connection timed out error while installing "Hello Samza"

I wonder if this has anything to do with a previous kafka / zookeeper install 
on your box. Just for sanity, try clearing /tmp/zookeeper* and /tmp/kafka* 
before re-trying those steps.


Same as Yi, I strongly suspect issues with your local laptop setup - it could 
be related either to resolving "localhost" or to a firewall that prevents 
communication between ports.


Navina


From: Yi Pan 
Sent: Thursday, September 14, 2017 11:37:46 AM
To: dev@samza.apache.org
Subject: Re: Connection timed out error while installing "Hello Samza"

Hi, Anantharaman,

I just did the same steps as you described in your email and all passed on my 
box. Hence, I strongly suspect that it is related to your local laptop network 
setup.

Could you post all the command line output when you ran the sequence of 
commands?

-Yi

On Thu, Sep 14, 2017 at 11:21 AM, Yi Pan  wrote:

> Hi, Ananarath,
>
> It is very strange that you are seeing this timeout exception that we
> do not see. I am trying to follow the exact steps you did to see
> whether there is anything broken. I will update you this afternoon.
>
> Meanwhile, could you check your hostname setup and firewall
> configuration to see whether your local laptop has blocked access via
> the public IP address to your laptop? Could you verify that your
> localhost is resolved to
> 127.0.0.1 and is accessible?
>
> -Yi
>
> On Thu, Sep 14, 2017 at 11:18 AM, Anantharaman, Srinatha (Contractor)
> < srinatha_ananthara...@comcast.com> wrote:
>
>> Yi,
>>
>> Is there any alternate way to install Samza  Or solution to the
>> connection time out error?
>>
>> Regards,
>> ~Sri
>>
>> From: Anantharaman, Srinatha (Contractor)
>> Sent: Wednesday, September 13, 2017 11:37 AM
>> To: dev@samza.apache.org
>> Subject: RE: Connection timed out error while installing "Hello Samza"
>>
>>
>> Yi,
>>
>>
>>
>> I am trying to build Samza locally by following the steps provided by
>> Navina.
>>
>> As per those steps kafka will be installed after Zookeeper, I am
>> getting Error while starting Zookeeper after it is installed
>>
>>
>>
>>
>>
>> Steps Followed :
>>
>>
>>
>> Yes. You can clone apache/samza locally and build it with:
>>
>>
>>
>>
>>
>> cd 
>>
>>
>>
>> gradle -b bootstrap.gradle
>>
>>
>>
>> ./gradlew clean build -x test
>>
>>
>>
>> ./gradlew publishToMavenLocal  ## This publishes a snapshot version of
>> ## the latest apache/samza into your local maven repo
>>
>>
>

Re: Connection timed out error while installing "Hello Samza"

2017-09-14 Thread Navina Ramesh
I wonder if this has anything to do with a previous kafka / zookeeper install 
on your box. Just for sanity, try clearing /tmp/zookeeper* and /tmp/kafka* 
before re-trying those steps.


Same as Yi, I strongly suspect issues with your local laptop setup - it could 
be related either to resolving "localhost" or to a firewall that prevents 
communication between ports.
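
One way to check the "localhost" half of that suspicion is to ask the resolver directly. The sketch below is only an illustration; the helper name `resolves_to_loopback` is hypothetical, and it reports whether every address a name resolves to is a loopback address (such as 127.0.0.1 or ::1).

```python
import ipaddress
import socket


def resolves_to_loopback(hostname="localhost"):
    """Return True if the name resolves and every resolved address is a loopback address."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False  # the name does not resolve at all
    # Drop any IPv6 scope suffix (e.g. "%eth0") before parsing the address.
    addresses = {info[4][0].split("%")[0] for info in infos}
    return bool(addresses) and all(
        ipaddress.ip_address(addr).is_loopback for addr in addresses
    )
```

If `resolves_to_loopback("localhost")` comes back False, the hosts file or resolver configuration on the box would be the first place to look.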


Navina


From: Yi Pan 
Sent: Thursday, September 14, 2017 11:37:46 AM
To: dev@samza.apache.org
Subject: Re: Connection timed out error while installing "Hello Samza"

Hi, Anantharaman,

I just did the same steps as you described in your email and all passed on
my box. Hence, I strongly suspect that it is related to your local laptop
network setup.

Could you post all the command line output when you ran the sequence of
commands?

-Yi

On Thu, Sep 14, 2017 at 11:21 AM, Yi Pan  wrote:

> Hi, Ananarath,
>
> It is very strange that you are seeing this timeout exception that we do
> not see. I am trying to follow the exact steps you did to see whether there
> is anything broken. I will update you this afternoon.
>
> Meanwhile, could you check your hostname setup and firewall configuration
> to see whether your local laptop has blocked access via the public IP
> address to your laptop? Could you verify that your localhost is resolved to
> 127.0.0.1 and is accessible?
>
> -Yi
>
> On Thu, Sep 14, 2017 at 11:18 AM, Anantharaman, Srinatha (Contractor) <
> srinatha_ananthara...@comcast.com> wrote:
>
>> Yi,
>>
>> Is there any alternate way to install Samza  Or solution to the
>> connection time out error?
>>
>> Regards,
>> ~Sri
>>
>> From: Anantharaman, Srinatha (Contractor)
>> Sent: Wednesday, September 13, 2017 11:37 AM
>> To: dev@samza.apache.org
>> Subject: RE: Connection timed out error while installing "Hello Samza"
>>
>>
>> Yi,
>>
>>
>>
>> I am trying to build Samza locally by following the steps provided by
>> Navina.
>>
>> As per those steps kafka will be installed after Zookeeper, I am getting
>> Error while starting Zookeeper after it is installed
>>
>>
>>
>>
>>
>> Steps Followed :
>>
>>
>>
>> Yes. You can clone apache/samza locally and build it with:
>>
>>
>>
>>
>>
>> cd 
>>
>>
>>
>> gradle -b bootstrap.gradle
>>
>>
>>
>> ./gradlew clean build -x test
>>
>>
>>
>> ./gradlew publishToMavenLocal  ## This publishes a snapshot version of
>> ## the latest apache/samza into your local maven repo
>>
>>
>>
>>
>>
>> Then, head to hello-samza workspace and build again:
>>
>>
>>
>> cd 
>>
>>
>>
>> mvn clean package  ## This should create a build target
>>
>>
>>
>> ./bin/grid install zookeeper
>>
>>
>>
>> ./bin/grid start zookeeper
>>
>>
>>
>> ./bin/grid install kafka
>>
>>
>>
>> ./bin/grid start kafka
>>
>>
>>
>> ./bin/grid install yarn
>>
>>
>>
>> ./bin/grid start yarn
>>
>>
>>
>>
>>
>> mkdir -p deploy/samza
>>
>>
>>
>> tar -xvf ./target/hello-samza-*-SNAPSHOT-dist.tar.gz -C deploy/samza
>>
>>
>>
>>
>>
>> NOTE: BTW, from the above steps I could not execute "gradle -b
>> bootstrap.gradle" since that command does not exist
>>
>>
>>
>> Regards,
>>
>> ~Sri
>>
>>
>>
>> -Original Message-
>> From: Yi Pan [mailto:nickpa...@gmail.com]
>> Sent: Tuesday, September 12, 2017 7:13 PM
>> To: dev@samza.apache.org<mailto:dev@samza.apache.org>
>> Subject: Re: Connection timed out error while installing "Hello Samza"
>>
>>
>>
>> Hi, Anantharam,
>>
>>
>>
>> Could you confirm at which step your setup failed? It seems that your
>> zookeeper server is running. Could you check to see whether your Kafka
>> broker is running? You can either do a telnet localhost 9092 or do a ps
>> auxww | grep kafka to see whether you got any broker running.
>>
>>
>>
>> Sometimes, the Kafka service takes time to start on a single laptop. You
>> can just try to run ./bin/grid start kafka
>>
>>
>>
>> again to see whether the service is up.
>>
>>
>>
>> Thanks!
>>
>>
>>
>> -Yi
>>
>>
>>
>> On Tue, Sep 12, 2017 at 11:38 AM, Anantharaman, Srinatha (Contracto

Re: Connection timed out error while installing "Hello Samza"

2017-09-12 Thread Navina Ramesh
Yes. You can clone apache/samza locally and build it with:


cd 

gradle -b bootstrap.gradle

./gradlew clean build -x test

./gradlew publishToMavenLocal  ## This publishes a snapshot version of the latest apache/samza into your local maven repo


Then, head to hello-samza workspace and build again:

cd 

mvn clean package  ## This should create a build target

./bin/grid install zookeeper

./bin/grid start zookeeper

./bin/grid install kafka

./bin/grid start kafka

./bin/grid install yarn

./bin/grid start yarn


mkdir -p deploy/samza

tar -xvf ./target/hello-samza-*-SNAPSHOT-dist.tar.gz -C deploy/samza


After this, you can follow steps in the tutorial to "Run" the example Samza job.


HTH! Let me know if you need further help.

Navina


From: Anantharaman, Srinatha (Contractor) 
Sent: Tuesday, September 12, 2017 9:21:53 AM
To: dev@samza.apache.org
Subject: RE: Connection timed out error while installing "Hello Samza"

Navina,

Is there any other way we can install Hello Samza?

Regards,
~Sri

-Original Message-
From: Navina Ramesh [mailto:nram...@linkedin.com]
Sent: Tuesday, September 12, 2017 11:42 AM
To: dev@samza.apache.org
Subject: Re: Connection timed out error while installing "Hello Samza"

Ok. I tried again for the "latest" branch in hello-samza and it still works.


> While installing it says "Building samza from master..."

It is expected to build from "master" in apache/samza repo. So, the output line 
is expected.


It is weird that you are unable to connect. Is it possible you are behind a 
firewall or something? Can you try to ping "git.apache.org" ? Or try the setup 
on a different box?


Navina


From: Anantharaman, Srinatha (Contractor) 
Sent: Tuesday, September 12, 2017 8:33:30 AM
To: dev@samza.apache.org
Subject: RE: Connection timed out error while installing "Hello Samza"

Navina,

I tried again but still same error

While installing it says "Building samza from master..."

But after I cloned, I executed "git checkout latest"

Regards,
~Sri


-Original Message-
From: Navina Ramesh [mailto:nram...@linkedin.com]
Sent: Tuesday, September 12, 2017 11:10 AM
To: dev@samza.apache.org
Subject: Re: Connection timed out error while installing "Hello Samza"

Hi Anantharaman,

It looks like a transient connection failure to connect to Apache's git. I 
tried on my host and it seems to be working.

Can you give it another shot?


If it still doesn't work, please let me know if you are running the command 
under the "master" or "latest" branch of samza-hello-samza.


Thanks!

Navina


From: Anantharaman, Srinatha (Contractor) 
Sent: Tuesday, September 12, 2017 7:45:37 AM
To: dev@samza.apache.org
Subject: Connection timed out error while installing "Hello Samza"

Hi,

I am trying to install "Hello Samza" on a single node.
Initially, I installed Kafka, Yarn and Zookeeper using bin/grid install 
kafka/yarn/zookeeper.
When I run bin/grid bootstrap, I get a connection timed out error.
It also reports that kafka, yarn and zookeeper are not installed.

Please find below the error message

[root@codehdplak-po-r19p bin]# cd ..
[root@codehdplak-po-r19p hello-samza]#  bin/grid install kafka
EXECUTING: install kafka
Using previously downloaded file /root/.samza/download/kafka_2.11-0.10.1.1.tgz
[root@codehdplak-po-r19p hello-samza]# bin/grid install yarn
EXECUTING: install yarn
Using previously downloaded file /root/.samza/download/hadoop-2.6.1.tar.gz
[root@codehdplak-po-r19p hello-samza]# bin/grid install zookeeper
EXECUTING: install zookeeper
Using previously downloaded file /root/.samza/download/zookeeper-3.4.3.tar.gz
[root@codehdplak-po-r19p hello-samza]# bin/grid bootstrap
Bootstrapping the system...
EXECUTING: stop kafka
No kafka server to stop
EXECUTING: stop yarn
no resourcemanager to stop
no nodemanager to stop
EXECUTING: stop zookeeper
JMX enabled by default
Using config: 
/app/home/eventsvc/samza-git/hello-samza/deploy/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... no zookeeper to stop (could not find file 
/tmp/zookeeper/zookeeper_server.pid)
EXECUTING: install samza
Building samza from master...
~/.samza/download /app/home/eventsvc/samza-git/hello-samza
Cloning into 'samza'...
fatal: unable to connect to git.apache.org:
git.apache.org[0: 54.84.58.65]: errno=Connection timed out


Could you please help me to resolve this issue?

Regards,
~Sri


Re: Connection timed out error while installing "Hello Samza"

2017-09-12 Thread Navina Ramesh
Ok. I tried again for the "latest" branch in hello-samza and it still works.


> While installing it says "Building samza from master..."

It is expected to build from "master" in apache/samza repo. So, the output line 
is expected.


It is weird that you are unable to connect. Is it possible you are behind a 
firewall or something? Can you try to ping "git.apache.org" ? Or try the setup 
on a different box?
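
A ping only exercises ICMP, so it can succeed even when the port git needs is blocked. The error text in this thread looks like what git prints for the git:// transport, whose default port is 9418, so a plain TCP connect to that port is a closer reproduction. A minimal sketch, with the helper name `can_connect` and the 5-second timeout chosen here purely for illustration:

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS resolution failures.
        return False


# Hypothetical usage (9418 assumes the clone uses the git:// transport):
# can_connect("git.apache.org", 9418)
```

The same helper could be pointed at `localhost` and port 9092 to check whether a Kafka broker is listening, in the spirit of the `telnet localhost 9092` suggestion elsewhere in this thread.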


Navina


From: Anantharaman, Srinatha (Contractor) 
Sent: Tuesday, September 12, 2017 8:33:30 AM
To: dev@samza.apache.org
Subject: RE: Connection timed out error while installing "Hello Samza"

Navina,

I tried again but still same error

While installing it says "Building samza from master..."

But after I cloned, I executed "git checkout latest"

Regards,
~Sri


-Original Message-
From: Navina Ramesh [mailto:nram...@linkedin.com]
Sent: Tuesday, September 12, 2017 11:10 AM
To: dev@samza.apache.org
Subject: Re: Connection timed out error while installing "Hello Samza"

Hi Anantharaman,

It looks like a transient connection failure to connect to Apache's git. I 
tried on my host and it seems to be working.

Can you give it another shot?


If it still doesn't work, please let me know if you are running the command 
under the "master" or "latest" branch of samza-hello-samza.


Thanks!

Navina


From: Anantharaman, Srinatha (Contractor) 
Sent: Tuesday, September 12, 2017 7:45:37 AM
To: dev@samza.apache.org
Subject: Connection timed out error while installing "Hello Samza"

Hi,

I am trying to install "Hello Samza" on a single node.
Initially, I installed Kafka, Yarn and Zookeeper using bin/grid install 
kafka/yarn/zookeeper.
When I run bin/grid bootstrap, I get a connection timed out error.
It also reports that kafka, yarn and zookeeper are not installed.

Please find below the error message

[root@codehdplak-po-r19p bin]# cd ..
[root@codehdplak-po-r19p hello-samza]#  bin/grid install kafka
EXECUTING: install kafka
Using previously downloaded file /root/.samza/download/kafka_2.11-0.10.1.1.tgz
[root@codehdplak-po-r19p hello-samza]# bin/grid install yarn
EXECUTING: install yarn
Using previously downloaded file /root/.samza/download/hadoop-2.6.1.tar.gz
[root@codehdplak-po-r19p hello-samza]# bin/grid install zookeeper
EXECUTING: install zookeeper
Using previously downloaded file /root/.samza/download/zookeeper-3.4.3.tar.gz
[root@codehdplak-po-r19p hello-samza]# bin/grid bootstrap
Bootstrapping the system...
EXECUTING: stop kafka
No kafka server to stop
EXECUTING: stop yarn
no resourcemanager to stop
no nodemanager to stop
EXECUTING: stop zookeeper
JMX enabled by default
Using config: 
/app/home/eventsvc/samza-git/hello-samza/deploy/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... no zookeeper to stop (could not find file 
/tmp/zookeeper/zookeeper_server.pid)
EXECUTING: install samza
Building samza from master...
~/.samza/download /app/home/eventsvc/samza-git/hello-samza
Cloning into 'samza'...
fatal: unable to connect to git.apache.org:
git.apache.org[0: 54.84.58.65]: errno=Connection timed out


Could you please help me to resolve this issue?

Regards,
~Sri


Re: Connection timed out error while installing "Hello Samza"

2017-09-12 Thread Navina Ramesh
Hi Anantharaman,

It looks like a transient connection failure to connect to Apache's git. I 
tried on my host and it seems to be working.

Can you give it another shot?


If it still doesn't work, please let me know if you are running the command 
under the "master" or "latest" branch of samza-hello-samza.


Thanks!

Navina


From: Anantharaman, Srinatha (Contractor) 
Sent: Tuesday, September 12, 2017 7:45:37 AM
To: dev@samza.apache.org
Subject: Connection timed out error while installing "Hello Samza"

Hi,

I am trying to install "Hello Samza" on a single node.
Initially, I installed Kafka, Yarn and Zookeeper using bin/grid install 
kafka/yarn/zookeeper.
When I run bin/grid bootstrap, I get a connection timed out error.
It also reports that kafka, yarn and zookeeper are not installed.

Please find below the error message

[root@codehdplak-po-r19p bin]# cd ..
[root@codehdplak-po-r19p hello-samza]#  bin/grid install kafka
EXECUTING: install kafka
Using previously downloaded file /root/.samza/download/kafka_2.11-0.10.1.1.tgz
[root@codehdplak-po-r19p hello-samza]# bin/grid install yarn
EXECUTING: install yarn
Using previously downloaded file /root/.samza/download/hadoop-2.6.1.tar.gz
[root@codehdplak-po-r19p hello-samza]# bin/grid install zookeeper
EXECUTING: install zookeeper
Using previously downloaded file /root/.samza/download/zookeeper-3.4.3.tar.gz
[root@codehdplak-po-r19p hello-samza]# bin/grid bootstrap
Bootstrapping the system...
EXECUTING: stop kafka
No kafka server to stop
EXECUTING: stop yarn
no resourcemanager to stop
no nodemanager to stop
EXECUTING: stop zookeeper
JMX enabled by default
Using config: 
/app/home/eventsvc/samza-git/hello-samza/deploy/zookeeper/bin/../conf/zoo.cfg
Stopping zookeeper ... no zookeeper to stop (could not find file 
/tmp/zookeeper/zookeeper_server.pid)
EXECUTING: install samza
Building samza from master...
~/.samza/download /app/home/eventsvc/samza-git/hello-samza
Cloning into 'samza'...
fatal: unable to connect to git.apache.org:
git.apache.org[0: 54.84.58.65]: errno=Connection timed out


Could you please help me to resolve this issue?

Regards,
~Sri


Re: [VOTE] SEP-8: Add in-memory system consumer & producer

2017-09-06 Thread Navina Ramesh
Hi Bharath,


Really good design!


  1.  Based on your SEP, you have listed 3 implementation approaches. Do you 
know which one we are choosing? I suspect it is Approach C. Can you please 
confirm and update the SEP?
  2.  Perhaps rename "Test Plan" to "Proposed Usage" or "Usage Example"

Overall, +1 on this. We need this asap!! 😊

Thanks!

Navina


From: xinyu liu 
Sent: Wednesday, September 6, 2017 2:06:45 PM
To: dev@samza.apache.org
Subject: Re: [VOTE] SEP-8: Add in-memory system consumer & producer

+1 on the overall design. This will make testing a lot easier!

Thanks,
Xinyu

On Wed, Sep 6, 2017 at 10:45 AM, Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> Hi all,
>
> Can you please vote for SEP-8?
> You can find the design document here
> <https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=71013043
> >.
>
> Thanks,
> Bharath
>


[VOTE] Apache Samza 0.13.1 RC0

2017-08-22 Thread Navina R (Apache)
Ran check-all on my Mac and it passes.

+1 (binding)

Thanks!
Navina

On Tue, Aug 22, 2017 at 4:20 PM, Fred Haifeng Ji 
wrote:

> +cc nav...@apache.org
>
>
> On Tue, Aug 22, 2017 at 11:19 AM, Fred Haifeng Ji 
> wrote:
>
>> Thanks to those who already tested the RC and voted.
>>
>> Due to the weekend and the eclipse day, we are extending the vote till
>> 1pm Wednesday 8/23.
>>
>> Thanks,
>>
>> Fred
>>
>> On Tue, Aug 22, 2017 at 9:59 AM, Jagadish Venkatraman <
>> jagadish1...@gmail.com> wrote:
>>
>>> Ran check-all.sh, and it succeeded!
>>>
>>> +1 (non binding)
>>>
>>> On Mon, Aug 21, 2017 at 4:34 PM, xinyu liu 
>>> wrote:
>>>
>>> > Built the src, and ran the tests using check-all.sh. Most of the tests ran
>>> > fine. There was a transient test failure (
>>> > https://issues.apache.org/jira/browse/SAMZA-1405), which seems to be caused
>>> > by the testing env (further investigation needed). I reran the tests
>>> > and they passed. Since this test doesn't affect the build itself, I am +1
>>> > (non-binding).
>>> >
>>> > Thanks,
>>> > Xinyu
>>> >
>>> > On Mon, Aug 21, 2017 at 2:24 PM, Yi Pan  wrote:
>>> >
>>> > > Downloaded the source, compiled and ran the integration tests. All passed.
>>> > >
>>> > > +1 (binding) w/ the following minor comments:
>>> > > # Please make a note in the release notes that this version requires JDK
>>> > > 1.8.0.112+ (I have tested w/ JDK 1.8.0.121)
>>> > > # Please make sure that we publish artifacts compiled w/ Scala 2.10,
>>> > > Scala 2.11, and Scala 2.12
>>> > >
>>> > > -Yi
>>> > >
>>> > > On Fri, Aug 18, 2017 at 11:59 AM, Fred Haifeng Ji <
>>> haifeng...@gmail.com>
>>> > > wrote:
>>> > >
>>> > > > This is a call for a vote on a release of Apache Samza 0.13.1. Thanks to
>>> > > > everyone who has contributed to this release.
>>> > > >
>>> > > > The release candidate can be downloaded from here:
>>> > > > http://home.apache.org/~navina/samza-0.13.1-rc0/
>>> > > >
>>> > > >
>>> > > > The release candidate is signed with pgp key A211312E, which can be found
>>> > > > on keyservers:
>>> > > > http://pgp.mit.edu/pks/lookup?op=get&search=0xEDFD8F9AA211312E
>>> > > >
>>> > > >
>>> > > > The git tag is release-0.13.1-rc0 and signed with the same pgp key:
>>> > > > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.13.1-rc0
>>> > > >
>>> > > > Test binaries have been published to Maven's staging repository, and are
>>> > > > available here:
>>> > > > https://repository.apache.org/content/repositories/orgapachesamza-1030/
>>> > > >
>>> > > >
>>> > > > 29 issues were resolved for this release:
>>> > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%2012314526%20AND%20fixVersion%20%3D%2012340845%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
>>> > > >
>>> > > >
>>> > > > The vote will be open for 72 hours (ending at 1:00PM Monday, 08/21/2017).
>>> > > >
>>> > > > Please download the release candidate, check the hashes/signature, build it
>>> > > > and test it, and then please vote:
>>> > > >
>>> > > >
>>> > > > [ ] +1 approve
>>> > > >
>>> > > > [ ] +0 no opinion
>>> > > >
>>> > > > [ ] -1 disapprove (and reason why)
>>> > > >
>>> > > >
>>> > > > --
>>> > > > Fred Ji
>>> > > >
>>> > >
>>> >
>>>
>>>
>>>
>>> --
>>> Jagadish V,
>>> Graduate Student,
>>> Department of Computer Science,
>>> Stanford University
>>>
>>
>>
>>
>> --
>> Haifeng (Fred)  Ji
>>
>
>
>
> --
> Haifeng (Fred)  Ji
>


Re: [VOTE] Apache Samza 0.13.1 RC0

2017-08-22 Thread Navina Ramesh
Ran check-all on Mac. Build looks good.


+1 (binding)


Thanks!
Navina


From: Fred Haifeng Ji 
Sent: Tuesday, August 22, 2017 11:19 AM
To: dev@samza.apache.org
Subject: Re: [VOTE] Apache Samza 0.13.1 RC0

Thanks to those who already tested the RC and voted.

Due to the weekend and the eclipse day, we are extending the vote till 1pm
Wednesday 8/23.

Thanks,

Fred

On Tue, Aug 22, 2017 at 9:59 AM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Ran check-all.sh, and it succeeded!
>
> +1 (non binding)
>
> On Mon, Aug 21, 2017 at 4:34 PM, xinyu liu  wrote:
>
> > Built the src, and ran the tests using check-all.sh. Most of the tests ran
> > fine. There was a transient test failure (
> > https://issues.apache.org/jira/browse/SAMZA-1405), which seems to be caused
> > by the testing env (further investigation needed). I reran the tests
> > and they passed. Since this test doesn't affect the build itself, I am +1
> > (non-binding).
> >
> > Thanks,
> > Xinyu
> >
> > On Mon, Aug 21, 2017 at 2:24 PM, Yi Pan  wrote:
> >
> > > Downloaded the source, compiled and ran the integration tests. All passed.
> > >
> > > +1 (binding) w/ the following minor comments:
> > > # Please make a note in the release notes that this version requires JDK
> > > 1.8.0.112+ (I have tested w/ JDK 1.8.0.121)
> > > # Please make sure that we publish artifacts compiled w/ Scala 2.10,
> > > Scala 2.11, and Scala 2.12
> > >
> > > -Yi
> > >
> > > On Fri, Aug 18, 2017 at 11:59 AM, Fred Haifeng Ji <
> haifeng...@gmail.com>
> > > wrote:
> > >
> > > > This is a call for a vote on a release of Apache Samza 0.13.1. Thanks to
> > > > everyone who has contributed to this release.
> > > >
> > > > The release candidate can be downloaded from here:
> > > > http://home.apache.org/~navina/samza-0.13.1-rc0/
> > > >
> > > >
> > > > The release candidate is signed with pgp key A211312E, which can be found
> > > > on keyservers:
> > > > http://pgp.mit.edu/pks/lookup?op=get&search=0xEDFD8F9AA211312E
> > > >
> > > >
> > > > The git tag is release-0.13.1-rc0 and signed with the same pgp key:
> > > > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.13.1-rc0
> > > >
> > > > Test binaries have been published to Maven's staging repository, and are
> > > > available here:
> > > > https://repository.apache.org/content/repositories/orgapachesamza-1030/
> > > >
> > > >
> > > > 29 issues were resolved for this release:
> > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%2012314526%20AND%20fixVersion%20%3D%2012340845%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
> > > >
> > > >
> > > > The vote will be open for 72 hours (ending at 1:00PM Monday, 08/21/2017).
> > > >
> > > > Please download the release candidate, check the hashes/signature, build it
> > > > and test it, and then please vote:
> > > >
> > > >
> > > > [ ] +1 approve
> > > >
> > > > [ ] +0 no opinion
> > > >
> > > > [ ] -1 disapprove (and reason why)
> > > >
> > > >
> > > > --
> > > > Fred Ji
> > > >
> > >
> >
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>



--
Haifeng (Fred)  Ji


[GitHub] samza pull request #274: SAMZA-1396 TestZkLocalApplicationRunner tests fails...

2017-08-16 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/274

SAMZA-1396 TestZkLocalApplicationRunner tests fails after SAMZA-1385

* Fixes ZkPath issues
* Fixes appname / jobname mismatch

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1396

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/274.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #274


commit 877ab873c4f1943e200a8f7c4e127f02d2a66515
Author: Navina Ramesh 
Date:   2017-08-16T20:32:15Z

SAMZA-1396 TestZkLocalApplication tests fails after SAMZA-1385

commit 16f2c9686cbaa48fa4ae7f73c6a79539d301a7db
Author: Navina Ramesh 
Date:   2017-08-16T20:36:26Z

Removing debug statement




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [Discuss] Samza 0.13.1 release

2017-08-12 Thread Navina Ramesh
Fred,
Thanks for starting the release process. I am unable to open the link you have 
provided, though. It opens the JIRA SAMZA-1165, instead of the entire list of 
0.13.1 bug fixes. Can you please re-check?

Navina

On 8/11/17, 6:13 PM, "ignacio.so...@gmail.com on behalf of Ignacio Solis" 
 wrote:

+1

On Fri, Aug 11, 2017 at 3:52 PM, Jacob Maes  wrote:
> Looks good!
>
> +1
>
> On Thu, Aug 10, 2017 at 6:53 PM, Jagadish Venkatraman <
> jagadish1...@gmail.com> wrote:
>
>> +1 for the release. thanks for the summary and for driving this Fred!
>>
>> On Thu, Aug 10, 2017 at 5:15 PM Fred Haifeng Ji 
>> wrote:
>>
>> > The format was messed up when sent from my yahoo mail to
>> > dev@samza.apache.org. I am resending it from my gmail account. Sorry for
>> > inconvenience!
>> >
>> > Hi all,
>> >
>> > There have been some new features and critical bug fixes added to master
>> > since 0.13.0 release, which makes Samza Standalone features more stable. It
>> > is now good enough to warrant *a new minor release*. We will continue to
>> > test for stability and performance in the next few weeks.
>> >
>> > Here are the main JIRA tickets that will be included in this release (but
>> > not limited to):
>> > SAMZA-1165: Cleanup data created by ZkStandalone in ZK;
>> > SAMZA-1324: Add a metricsreporter lifecycle for JobCoordinator component of
>> > StreamProcessor;
>> > SAMZA-1336: Standalone session expiration propagation;
>> > SAMZA-1337: LocalApplicationRunner needs to support StreamTask;
>> > SAMZA-1339: Add standalone integration tests;
>> > …
>> >
>> > There are also quite a few bug fixes in 0.13.1, *please check the complete
>> > list of changes in 0.13.1 here
>> > <https://issues.apache.org/jira/browse/SAMZA-1165?jql=project%20%3D%2012314526%20AND%20fixVersion%20%3D%2012340845%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC>*.
>> >
>> > Most JIRAs in the list have been completed and merged, with the following
>> > one remaining, but we should try to get it completed before 0.13.1 is
>> > released.
>> > SAMZA-1385: Coordination utils in LocalApplicationRunner uses same Zk node
>> > as ZkJobCoordinatorFactory for leader election
>> > Here's what I propose:
>> > 1. Cut an 0.13.1 release branch.
>> > 2. Work on getting the remaining open JIRA done.
>> > 3. Target a release vote by Aug 18.
>> >
>> > Thoughts?
>> >
>> > Fred
>> >
>> --
>> Sent from my iphone.
>>



-- 
Nacho - Ignacio Solis - iso...@igso.net






Re: Kafka client.id collision

2017-07-20 Thread Navina Ramesh (Apache)
Hi David,

I think this is expected to occur as a warning since we spin up all kafka
clients with the same client-id, which is $job.name + $job.id.

As Jagadish mentioned, it will be great if you can provide us the entire
log so that we can take a look.

As a side note for the Samza contributors, I do believe the container spins
up Kafka clients for each Kafka system defined, even if it is not used.
Iirc, we use `KafkaUtil.getClientId` for generating the client id. Perhaps
it makes sense to append another identifier to the client id (such as the
system name or component name). That way, we won't lose the kafka-client
related metrics and there will be no overlap between the client ids.
Thoughts?
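
A minimal sketch of this suggestion, written in Python for brevity rather than Samza's Java/Scala. The names `Registry`, `base_client_id` and `scoped_client_id`, and the `-` separator, are hypothetical; this is not the actual `KafkaUtil.getClientId` implementation. The id format is modelled on the `samza_producer-wikipedia_feed-1` id reported in this thread, and a plain set stands in for the JMX MBean server, which rejects a second registration under the same name.

```python
class DuplicateNameError(Exception):
    """Stand-in for javax.management.InstanceAlreadyExistsException."""


class Registry:
    """Toy stand-in for an MBean server keyed by client id."""

    def __init__(self):
        self._names = set()

    def register(self, client_id):
        # A second registration under the same id is rejected
        if client_id in self._names:
            raise DuplicateNameError(client_id)
        self._names.add(client_id)


def base_client_id(prefix, job_name, job_id):
    # Hypothetical format: every client of the same job gets the same id
    return f"{prefix}-{job_name}-{job_id}"


def scoped_client_id(prefix, job_name, job_id, system_name):
    # Proposed variant: append the system (or component) name to keep ids distinct
    return f"{base_client_id(prefix, job_name, job_id)}-{system_name}"
```

With the scoped variant, two producers of the same job that target different systems register under different ids, so neither registration is rejected and their metrics stay separate.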

Thanks!
Navina

On Thu, Jul 20, 2017 at 9:13 AM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Can you share the entire log file if that's okay? The warning should be a
> red-herring IMHO.
>
> On Thu, Jul 20, 2017 at 7:50 AM Davide Simoncelli 
> wrote:
>
> > Hi,
> >
> > Thanks for the reply.
> >
> > It is a warning, but the application fails. Here is the logging:
> >
> >
> > 2017-07-20 10:43:06.349 [main] AppInfoParser [INFO] Kafka version : 0.10.1.1
> > 2017-07-20 10:43:06.349 [main] AppInfoParser [INFO] Kafka commitId : f10ef2720b03b247
> > 2017-07-20 10:43:06.351 [main] AppInfoParser [WARN] Error registering AppInfo mbean
> > javax.management.InstanceAlreadyExistsException: kafka.producer:type=app-info,id=samza_producer-wikipedia_feed-1
> >     at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)
> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
> >     at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
> >     at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
> >     at org.apache.kafka.common.utils.AppInfoParser.registerAppInfo(AppInfoParser.java:58)
> >     at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:331)
> >     at org.apache.kafka.clients.producer.KafkaProducer.<init>(KafkaProducer.java:163)
> >     at org.apache.samza.system.kafka.KafkaSystemFactory$$anonfun$3.apply(KafkaSystemFactory.scala:89)
> >     at org.apache.samza.system.kafka.KafkaSystemFactory$$anonfun$3.apply(KafkaSystemFactory.scala:89)
> >     at org.apache.samza.system.kafka.KafkaSystemProducer.send(KafkaSystemProducer.scala:144)
> >     at org.apache.samza.coordinator.stream.CoordinatorStreamSystemProducer.send(CoordinatorStreamSystemProducer.java:113)
> >     at org.apache.samza.coordinator.stream.CoordinatorStreamWriter.sendSetConfigMessage(CoordinatorStreamWriter.java:98)
> >     at org.apache.samza.coordinator.stream.CoordinatorStreamWriter.sendMessage(CoordinatorStreamWriter.java:82)
> >     at org.apache.samza.job.yarn.SamzaYarnAppMasterService.onInit(SamzaYarnAppMasterService.scala:68)
> >     at org.apache.samza.job.yarn.YarnClusterResourceManager.start(YarnClusterResourceManager.java:180)
> >     at org.apache.samza.clustermanager.ContainerProcessManager.start(ContainerProcessManager.java:167)
> >     at org.apache.samza.clustermanager.ClusterBasedJobCoordinator.run(ClusterBasedJobCoordinator.java:154)
> >     at org.apache.samza.clustermanager.ClusterBasedJobCoordinator.main(ClusterBasedJobCoordinator.java:222)
> > 2017-07-20 10:43:06.549 [main] CoordinatorStreamWriter [INFO] Stopping the coordinator stream producer.
> > 2017-07-20 10:43:06.549 [main] CoordinatorStreamSystemProducer [INFO] Stopping coordinator stream producer.
> > 2017-07-20 10:43:06.549 [main] KafkaProducer [INFO] Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms.
> >
> >
> > > On 20 Jul 2017, at 3:16 pm, Jagadish Venkatraman <
> jagadish1...@gmail.com>
> > wrote:
> > >
> > > Hi Davide,
> > >
> > > Is this logged as an error or as a warning?
> > >
> > > IIUC, this warning should not fail the job. It may not cause som

Re: Samza Meetup

2017-07-20 Thread Navina Ramesh (Apache)
No worries. We would love to meet you in person too. Keep an eye out on the
mailing list for the Meetup link.

Cheers!
Navina

On Jul 20, 2017 08:37, "Renato Marroquín Mogrovejo" <
renatoj.marroq...@gmail.com> wrote:

> Thanks Jagadish and Navina!
> I am really interested in attending as I am in the area, it'd be my first
> in-person Samza meetup :D
> But unfortunately I don't have anything to present this time :(
>
>
> Renato M.
>
> 2017-07-18 23:46 GMT-07:00 Navina Ramesh (Apache) :
>
> > Hi Renato,
> >
> > We are planning for mid-August as a tentative target for the next meetup.
> >
> > If you are interested in participating or speaking at the meetup, please
> > let us know.
> >
> > Thanks!
> > Navina
> >
> >
> >
> > On Tue, Jul 18, 2017 at 10:36 AM, Renato Marroquín Mogrovejo <
> > renatoj.marroq...@gmail.com> wrote:
> >
> > > Hi Samza experts and users,
> > >
> > > I was wondering if there is going to be a meetup this summer or when
> the
> > > next one is.
> > > Thanks!
> > >
> > >
> > > Best,
> > >
> > > Renato M.
> > >
> >
>


Re: Samza Meetup

2017-07-18 Thread Navina Ramesh (Apache)
Hi Renato,

We are planning for mid-August as a tentative target for the next meetup.

If you are interested in participating or speaking at the meetup, please
let us know.

Thanks!
Navina



On Tue, Jul 18, 2017 at 10:36 AM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hi Samza experts and users,
>
> I was wondering if there is going to be a meetup this summer or when the
> next one is.
> Thanks!
>
>
> Best,
>
> Renato M.
>


Re: [VOTE] SEP-5: Enable partition expansion of input streams

2017-06-23 Thread Navina Ramesh (Apache)
After a lot of Q&A, let's get this done :)

+1 (binding)

Thanks!
Navina

On Tue, Jun 20, 2017 at 10:31 AM, xinyu liu  wrote:

> +1 (non-binding) on this design.
>
> To me the task-count based groupers should work well in practice for
> partition expansion of system using hash for partitions, e.g. Kafka. It
> will not cause any state transfer between hosts so the runtime cost will be
> minimal. In the future when we support dynamically re-balancing the tasks,
> we can further scale the task count if needed.
>
> Thanks,
> Xinyu
>
> On Mon, Jun 19, 2017 at 9:27 AM, Dong Lin  wrote:
>
> > Hi everyone,
> >
> > Can you please vote for SEP-5? The wiki can be found at
> > *https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > 5%3A+Enable+partition+expansion+of+input+streams
> > <https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > 5%3A+Enable+partition+expansion+of+input+streams>.*
> >
> > Thanks,
> > Dong
> >
>
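
The reasoning in the quoted reply (no state transfer when a hash-partitioned input is expanded) can be illustrated with a small sketch. It is not the SEP-5 implementation: the function names are hypothetical, and it assumes the producer picks a partition as the key hash modulo the partition count and that the task count stays at the original partition count.

```python
def partition_for(key_hash, partition_count):
    """Partition chosen by a hash-mod partitioner (assumed behaviour)."""
    return key_hash % partition_count


def task_for_partition(partition, task_count):
    """Fixed-task-count grouping: fold each partition onto one of the original tasks."""
    return partition % task_count


def keys_stay_on_task(old_partitions, new_partitions, sample=1000):
    """True if every sampled key hash reaches the same task before and after expansion."""
    for key_hash in range(sample):
        # One task per original partition, so the task count equals old_partitions
        task_before = task_for_partition(partition_for(key_hash, old_partitions), old_partitions)
        task_after = task_for_partition(partition_for(key_hash, new_partitions), old_partitions)
        if task_before != task_after:
            return False
    return True
```

Under these assumptions, `keys_stay_on_task(4, 8)` holds because 8 is a multiple of 4, while `keys_stay_on_task(4, 6)` does not, which is why the assumption about how new partitions map to old ones matters.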


Re: [VOTE] SEP-5: Enable partition expansion of input streams

2017-06-23 Thread Navina Ramesh
After a lot of Q&A, let's get this done :)

+1 (binding)

Thanks!
Navina

On Tue, Jun 20, 2017 at 10:31 AM, xinyu liu  wrote:

> +1 (non-binding) on this design.
>
> To me the task-count based groupers should work well in practice for
> partition expansion of system using hash for partitions, e.g. Kafka. It
> will not cause any state transfer between hosts so the runtime cost will be
> minimal. In the future when we support dynamically re-balancing the tasks,
> we can further scale the task count if needed.
>
> Thanks,
> Xinyu
>
> On Mon, Jun 19, 2017 at 9:27 AM, Dong Lin  wrote:
>
> > Hi everyone,
> >
> > Can you please vote for SEP-5? The wiki can be found at
> > *https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > 5%3A+Enable+partition+expansion+of+input+streams
> > <https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > 5%3A+Enable+partition+expansion+of+input+streams>.*
> >
> > Thanks,
> > Dong
> >
>



-- 
Navina R.


Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-23 Thread Navina Ramesh
Yi,
Thanks for summarizing. I think we should deal with further code related
changes/discussions in the PR directly since this SEP has been open for a
while. Let's try to wrap up the discussions by today.

@Dong: Thanks for updating the SEP. I think the TestPlan section is TBD
right now. You can update it whenever you get to it. Thanks a bunch for
your patience!

Cheers!
Navina

On Thu, Jun 22, 2017 at 3:36 PM, Dong Lin  wrote:

> Hey Yi,
>
> Thanks for the detailed comment and the summary!
>
> To address your comments:
>
> 1) The current names are GroupByPartitionWithFixedTaskNum and
> GroupBySystemStreamPartitionWithFixedTaskNum. Instead of
> FixedTasksGroupByPartition and FixedTasksGroupBySystemStreamPartition, how
> about GroupByPartitionFixedTasks and GroupBySystemStreamPartitionFixedTasks?
> The new names are as long as the names you suggested. They seem a bit more
> intuitive because they would be prefixed with the grouper class name of their
> no-fixed-tasks counterpart. I have updated the wiki with the new names. Can you
> let me know if it is OK?
>
> 2) Initially I wanted to design that config and interface later, when we have
> more use cases, so that we can have higher confidence in the interface
> design. But it seems that one common concern with the proposal is about the
> limiting assumption in the old-partition-to-new-partition mapping. I
> have updated the wiki to illustrate the design of this interface and the
> new (and more general) assumption for the input system to use this
> partition expansion. Can you take a look and see if it is reasonable?
>
> 3) Yeah previously Jacob has raised the same concern and the solution is
> exactly the same as you suggested.
>
> Hey everyone,
>
> I have made a non-trivial change to the wiki to illustrate the use of the new
> config and interface for users to specify the new-partition-to-old-partition
> mapping. Can you please help review it?
>
> Thanks,
> Dong
>
>
> On Thu, Jun 22, 2017 at 2:25 AM, Yi Pan  wrote:
>
> > Hi, Dong and everyone,
> >
> > Thanks for the detailed discussion on SEP-5! Really appreciate the
> thorough
> > consideration on this issue. I also noticed that Dong has updated the
> SEP-5
> > wiki to clarify:
> > 1) SEP-5 provides a solution to retain the same number of task/state w/o
> > re-partitioning (as illustrated in the stateful join example)
> > 2) Future work to expand number of tasks need to work together with
> > flexible re-partitioning to provide a complete solution
> >
> > Due to the cost to be paid in task number expansion:
> > 1) additional network I/O and latency in re-partitioning
> > 2) shuffling of the states among tasks
> > The current form of SEP-5 provides an alternative when partition
> expansion
> > in the messaging system is not due to increase of total input rate.
> >
> > The concern on the added complexity in grouper logic is valid. However, the
> > grouper-based solution is not completely unreasonable:
> > 1) Grouper is a public interface and we are already open to customized
> > implementations of groupers, although that is not a main use case
> > 2) Deprecating the existing config-driven grouper takes longer: it has to
> > wait until the fluent API has a better planner that automatically figures out
> > the grouper to use, and until stateful task expansion is automated. Hence, for
> > the foreseeable future, the grouper is still configured by the user.
> >
> > So, in general, I am in favor of the proposed SEP-5, given that it provides
> > a path of least resistance to address some pain points for Samza users, w/o
> > breaking any existing use cases in opt-in mode.
> >
> > Some minor suggestions:
> > 1) The class names are too long. Can we change them to
> > FixedTasksGroupByPartition and FixedTasksGroupBySystemStreamPartition?
> > 2) I am still in favor of configurable partition expansion (i.e. new<->old
> > partition mapping) policy, since it makes this solution more general and
> > not fixed for Kafka. I am OK with default to power-of-2 expansion policy
> > and not introducing new config variable now.
> > 3) In the checkpoint/coordinator topic validation, KafkaCheckpointLogKey
> > class validates the current grouper factory class == the previous grouper
> > factory class in previous checkpoint. We need to make sure that we allow
> > the compatible change from GroupByPartition to FixedTasksGroupByPartition,
> > etc. Since FixedTasksGroupByPartition is a derived interface of
> > GroupByPartition, one possible solution is to check assignable (if current
> > grouper factory class is assignable to the previous gro

[GitHub] samza pull request #230: SAMZA-1340 - StreamProcessor does not propagate con...

2017-06-21 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/230

SAMZA-1340 - StreamProcessor does not propagate container failures from 
StreamTask

Stores the exception passed to 
`SamzaContainerListener#onFailure(Throwable)` in the StreamProcessor.
The `JobCoordinator#stop` callback inspects this stored exception and invokes 
the corresponding StreamProcessorLifecycleListener callback.
It is difficult to add test cases for every path. Suggestions for 
improving the code or tests are welcome.
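
The flow described above can be sketched as follows. This is a simplified Python illustration with hypothetical class and method names, not the code in the pull request: the container failure callback only records the error, and the coordinator stop callback decides which lifecycle callback to fire.

```python
class LifecycleListener:
    """Hypothetical listener that records which callback fired."""

    def __init__(self):
        self.events = []

    def on_shutdown(self):
        self.events.append("shutdown")

    def on_failure(self, error):
        self.events.append(("failure", error))


class Processor:
    """Hypothetical stand-in for the stream processor."""

    def __init__(self, listener):
        self._listener = listener
        self._container_error = None

    def on_container_failure(self, error):
        # Counterpart of SamzaContainerListener#onFailure(Throwable): only remember the error
        self._container_error = error

    def on_coordinator_stop(self):
        # Counterpart of the JobCoordinator#stop callback: pick the listener callback
        if self._container_error is not None:
            self._listener.on_failure(self._container_error)
        else:
            self._listener.on_shutdown()
```

Deferring the decision to the stop callback means the listener is notified once, after shutdown has been triggered, and with the original error when there was one.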

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza LISAMZA-5272

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/230.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #230


commit 839473cf0dea43ba565154c5377b138655db8a5d
Author: Navina Ramesh 
Date:   2017-06-19T22:49:00Z

Adding a streamprocessor test to verify containerexception is getting 
persisted in processor

commit 5475f6a301a37d9e2ede1d2a8e1e319cdc812fcc
Author: Navina Ramesh 
Date:   2017-06-20T01:16:27Z

Removing unused listener in StreamProcessorTestUtils

commit 213bb2e8afff6f6421cff0697b9b5700ba5c70a1
Author: Navina Ramesh 
Date:   2017-06-22T02:28:13Z

Fixing checkstyle






Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-21 Thread Navina Ramesh (Apache)
> But IMO it is the best available solution towards the support of
> partition expansion in comparison to alternative, no?

At this time, relative to the other alternatives you have listed, this is the
path of least effort to solve this problem. I agree with that. :)

> I can merge those two sections or update the statement if the current
> statement has not clearly explained the reason of partition expansion in Kafka.

Given the significance of what you are actually trying to solve, I think it
will be better to have it in points. Let me come find you and we can update
it.

> I have updated wiki and added the task expansion to the Future Work section.
> On the other hand I still keep it in the Rejected Alternative section to
> explain why this future work does not replace the existing proposal in
> SEP-5. Does this sound reasonable?

It is very confusing to me how the same point can be under "Future Work"
and "Rejected Alternative". There is no question about the future work
*replacing* SEP-5. Iiuc, this SEP is a subset for the partition expansion
solution. So, I don't think increasing task count should be a rejected
alternative.

> I am also not sure why a feature needs to be "utmost priority" in order
> to be accepted. Can you explain a bit on that?

I don't think I ever claimed that the feature needs to be of "utmost
priority" to be accepted. I was just stating my opinion.


Thanks!
Navina

On Wed, Jun 21, 2017 at 3:52 PM, Dong Lin  wrote:

> Thanks much for the reply Navina. Please see my reply inline.
>
> On Wed, Jun 21, 2017 at 2:57 PM, Navina Ramesh (Apache)  >
> wrote:
>
> > Thanks to Jake, Dong and Kartik for keeping the discussion going.
> >
> > > Here are the pros and cons of the extra re-partitioning stage in
> > comparison
> > to SEP-5.
> >
> > I think that is good summarization of pros/cons for the repartitioning
> > stage based solution. Can you please include it in your SEP? It seems
> like
> > you already have access. If you are still unable to access the wiki page,
> > feel free to walk over to Samza area and find me!
> >
>
> Sure. I have added this summary to the Alternative Section.
>
>
> >
> > > I think there is always a way for user to mess up their job if they
> > configure the Samza job incorrectly.
> >
> > I don't think Jake or anyone is arguing about an "incorrectly" configured
> > Samza job. The question was towards how easy/difficult it is for users to
> > *not mess* up their job with incorrect configurations.
> >
> > > I also think the assumption made in this SEP is not particularly harder
> > to understand than other existing configs in Samza.
> >
> > I disagree here. Other configs don't require you understand more than one
> > assumption.
> >
> > There is already an overload of configs in Samza and I think we are
> trying
> > to shield it as much as possible from the users (esp. with fluent api).
> > More specifically, we don't want the user to know about the internals of
> > Samza such ssp grouper, taskname grouper etc. Since the proposed solution
> > makes the configuration more complex to understand, it *is a* burden on
> the
> > user.
> >
> > Just because configs are the way it is, it doesn't mean we increase the
> > complexity of it and push the burden on users to manage it correctly. My
> > two cents.
> >
>
> Sure, I agree the proposal requires user to understand the assumption in
> order to expand the partition of the topic. But it is very subjective as to
> whether the added complexity is acceptable or not. If there is better way
> to allow user to expand partition of the input stream without making
> assumption, then we can just do that. The current solution is not perfect.
> But IMO it is the best available solution towards the support of partition
> expansion in comparison to alternative, no?
>
>
> > Here are a few things that I believe are needed for wrapping up the SEP:
> >
> > 1. For the longest time, I thought partition expansion happens in Kafka
> > only when the volume of messages across partitions is too high. Based on
> > this assumption, I would only assume that re-mapping expanded partitions
> to
> > the same task will have adverse effect on the throughput/resource
> > utilization of the processor/container in Samza (for example, disk
> > utilization may increase significantly. With disk quota throttling, it
> > could cause the processor to drop.). However, after speaking with Xinyu,
> it
> > turns out that partition expansion also happens when there is a
> > per-partition data retention limit imposed by Kafka (

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-21 Thread Navina Ramesh (Apache)
Thanks to Jake, Dong and Kartik for keeping the discussion going.

> Here are the pros and cons of the extra re-partitioning stage in
comparison
to SEP-5.

I think that is a good summary of the pros and cons of the solution based on a
repartitioning stage. Can you please include it in your SEP? It seems like
you already have access. If you are still unable to access the wiki page,
feel free to walk over to the Samza area and find me!

> I think there is always a way for user to mess up their job if they
configure the Samza job incorrectly.

I don't think Jake or anyone is arguing about an "incorrectly" configured
Samza job. The question was about how easy or difficult it is for users to
*not mess up* their job with incorrect configurations.

> I also think the assumption made in this SEP is not particularly harder
to understand than other existing configs in Samza.

I disagree here. Other configs don't require you to understand more than one
assumption.

There is already an overload of configs in Samza and I think we are trying
to shield it as much as possible from the users (esp. with the fluent API).
More specifically, we don't want the user to know about the internals of
Samza such as the SSP grouper, task name grouper, etc. Since the proposed
solution makes the configuration more complex to understand, it *is a* burden
on the user.

Just because configs are the way they are doesn't mean we should increase
their complexity and push the burden of managing them correctly onto users.
My two cents.

Here are a few things that I believe are needed for wrapping up the SEP:

1. For the longest time, I thought partition expansion happens in Kafka
only when the volume of messages across partitions is too high. Based on
this assumption, I would only assume that re-mapping expanded partitions to
the same task will have an adverse effect on the throughput/resource
utilization of the processor/container in Samza (for example, disk
utilization may increase significantly; with disk quota throttling, it
could cause the processor to drop). However, after speaking with Xinyu, it
turns out that partition expansion also happens when there is a
per-partition data retention limit imposed by Kafka (not sure if it is only
in LinkedIn or in Kafka open-source as well). Imo, this is the primary
use case that we are trying to solve for in Samza and it is not very
obvious from the SEP.
@Dong, can you please explain *the circumstances under which partition
expansion can happen* under the "Motivation" section? I disagree with the
current motivation, described as -> "This design doc provides a solution to
increase partition number of the input streams of a stateful Samza job
while still ensuring the correctness of Samza job output."
That describes a solution (albeit one not fully delivered through this SEP
alone) rather than a motivation.

2. I think we have consensus that increasing the task number and handling
the state correctly is a good solution for Samza in the long run. In your
rejected alternatives, you mention "However, this feature
alone does not solve the problem of allowing partition expansion.". What
else is required to allow partition expansion? Can you please elaborate on
that in point #1 of the rejected alternatives? If there is still more work
to be done to support partition expansion in Samza, it is worthwhile to
mention it under *Future Work* instead of under "Rejected Alternatives".
Perhaps you were waiting for edit permissions to the wiki. Please make this
change so it is well tracked.

I am still not totally crazy about the proposed solution because it is not
clear who, or which use cases, in open source stand to benefit. I am not
convinced that this problem is of utmost priority for the Samza community
*at this point in time*.

I am on the same page as Jake on this one. Not a +1, just a 0 (if that even
matters).

Thanks!
Navina

On Sun, Jun 18, 2017 at 12:04 AM, Dong Lin  wrote:

> BTW, I will update the SEP-5 wiki with our latest discussion after I have
> got the wiki edit access.
>
> On Sat, Jun 17, 2017 at 11:36 PM, Dong Lin  wrote:
>
> > Thanks everyone for the comment!
> >
> > I am currently leaning towards the current approach. I think Kartik raised
> > a good point that the extra repartitioning stage will also incur additional
> > throughput on Kafka in addition to the potential storage cost. Could other
> > Samza developers also chime in and provide their opinions on this proposal?
> >
> > Since this discussion thread has been open for three weeks, I will
> > initiate a voting thread on Monday if there are no major revision
> > suggestions.
> >
> > Thanks,
> > Dong
> >
> >
> > On Thu, Jun 15, 2017 at 6:32 PM, Kartik Paramasivam <
> > kparamasi...@linkedin.com.invalid> wrote:
> >
> >> Great discussion !
> >>
> >> Here are some more th

Re: [VOTE] Apache Samza 0.13.0 RC6

2017-06-08 Thread Navina Ramesh (Apache)
+1 (binding)

Thanks to everyone for diligently testing out the RCs and getting this
release out!

Cheers!
Navina

On Thu, Jun 8, 2017 at 9:09 AM, Chris Riccomini 
wrote:

> +1 (binding)
>
> On Wed, Jun 7, 2017 at 8:55 AM, Yi Pan  wrote:
>
> > +1 (binding)
> > build and ran all local integration tests on Linux.
> >
> > On Tue, Jun 6, 2017 at 4:01 PM, Boris S  wrote:
> >
> > > +1 (non-binding)
> > > build and tested on Linux (with python 2.7; 2.4 and 3.5 - didn't work)
> > >
> > > On Tue, Jun 6, 2017 at 2:49 PM, Jacob Maes 
> wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Built and tested on both OSX and RHEL with gradle 2.0 and 2.2
> > > respectively.
> > > >
> > > > Also verified the high level API + YARN host affinity on a test job
> > with
> > > 32
> > > > containers.
> > > >
> > > >
> > > >
> > > > On Tue, Jun 6, 2017 at 9:14 AM, xinyu liu 
> > wrote:
> > > >
> > > > > +1 (non-binding).
> > > > >
> > > > > > Downloaded the source tar, built it and ran check-all.sh on RHEL6
> > > > > > with gradle 2.8. All passed.
> > > > >
> > > > > As a side note to Jagadish's comments, the build doesn't work on a
> > > higher
> > > > > gradle version either (gradle 3.5). Seems
> > > "-language:implicitConversions
> > > > > -language:reflectiveCalls" is not a valid build option anymore.
> > > > >
> > > > > Thanks,
> > > > > Xinyu
> > > > >
> > > > > On Mon, Jun 5, 2017 at 10:06 AM, Jagadish Venkatraman <
> > > > > jagadish1...@gmail.com> wrote:
> > > > >
> > > > > > Checked out, ran tests, and all of them pass.
> > > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > I did get an error when running with gradle 2.4:
> > > > > > >>Could not resolve all dependencies for configuration
> > > > > > ':samza-kafka_2.11:compile'. > java.lang.
> > > UnsupportedOperationException
> > > > > (no
> > > > > > error message)
> > > > > >
> > > > > > However, when I used gradle 2.8, it was resolved.
> > > > > >
> > > > > > *gradle wrapper --gradle-version 2.8*
> > > > > >
> > > > > > Best,
> > > > > > Jagadish
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Jun 5, 2017 at 8:37 AM, Jake Maes 
> > wrote:
> > > > > >
> > > > > > > This is a call for a vote on a release of Apache Samza 0.13.0.
> > > Thanks
> > > > > to
> > > > > > > everyone who has contributed to this release. We are very glad
> to
> > > see
> > > > > > some
> > > > > > > new contributors and features in this release.
> > > > > > >
> > > > > > > The release candidate can be downloaded from here:
> > > > > > > http://home.apache.org/~jmakes/samza-0.13.0-rc6/
> > > > > > >
> > > > > > > The release candidate is signed with pgp key 940AFC5A, which
> can
> > be
> > > > > found
> > > > > > > on keyservers:
> > > > > > > > http://pgp.mit.edu/pks/lookup?op=get&search=0x940AFC5A
> > > > > > >
> > > > > > > The git tag is release-0.13.0-rc6 and signed with the same pgp
> > key:
> > > > > > > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> > > > > > > refs/tags/release-0.13.0-rc6
> > > > > > >
> > > > > > > Test binaries have been published to Maven's staging
> repository,
> > > and
> > > > > are
> > > > > > > available here:
> > > > > > > https://repository.apache.org/content/repositories/
> > > > orgapachesamza-1026
> > > > > > >
> > > > > > > 144 issues were resolved for this release:
> > > > > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%
> > > > > > > 20SAMZA%20AND%20fixVersion%20in%20(0.13%2C%200.13.0)%
> > > > > > > 20AND%20status%20in%20(
> > > > > > > Resolved%2C%20Closed)
> > > > > > >
> > > > > > > The vote will be open for 72 hours (ending at 9:00AM Thursday,
> > > > > > 06/08/2017).
> > > > > > >
> > > > > > > Please download the release candidate, check the
> > hashes/signature,
> > > > > build
> > > > > > it
> > > > > > > and test it, and then please vote:
> > > > > > >
> > > > > > >
> > > > > > > [ ] +1 approve
> > > > > > >
> > > > > > > [ ] +0 no opinion
> > > > > > >
> > > > > > > [ ] -1 disapprove (and reason why)
> > > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > --
> > > > > > Jagadish V,
> > > > > > Graduate Student,
> > > > > > Department of Computer Science,
> > > > > > Stanford University
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Wiki Spam

2017-06-07 Thread Navina Ramesh
> Given that admin privs are handed out to PMCs along with explicit
instructions not to change the permissions for the anonymous user, I'd
like to understand what went wrong in this case (with a view to ensuring
it doesn't happen again) before re-enabling admin permissions.

Agreed. Afaik, there are only 2 "active" PMCs in our project and I don't
believe either of us gave permissions to the anonymous user.

> There were also a bunch of people who are neither PMC members nor
committers who had admin privs on your space. I'd very much prefer to
see admin privs limited to active PMC members and committers moving
forwards.

Yes. This was a mistake on our part; we should have been more cautious about
the permissions we grant to contributors. Going forward, we want to correct
these permission grants. We just want to make sure there is an avenue for
us to request permissions.

Thanks!

On Wed, Jun 7, 2017 at 12:47 PM, Mark Thomas  wrote:

> On 07/06/17 18:04, Jagadish Venkatraman wrote:
> > Hi Mark,
> >
> > Thanks for bringing this to our notice.
> >
> >>> This is because someone, going against ASF infrastructure policy,
> > altered the permissions for the anonymous user allowing them write
> > permissions
> >
> > Do we know when this occurred? I presume this was a lapse.
>
> It looks as if it was around the beginning of last month based on the
> dates of the pages I removed.
>
> >
> >>>  A samza-dev user has been created and configured to watch the
> > Samza wiki space for changes
> >
> > Sounds great! Does that mean that notifications for changes in the Samza
> > wiki space will now be sent to this mailing list?
>
> This wasn't working. It looks like those notifications will need to go
> to the commits list. I'll get that changed shortly and see if that fixes
> the problem.
>
> >>>  All users currently assigned permissions on the Samza wiki have had
> all
> > their permissions revoked except for viewing.
> >
> > We will re-assess all permissions, and set them up again.  I'm assuming
> > PMCs will still be able to do this?
>
> Not at the moment. PMC members currently have read access only.
>
> Given that admin privs are handed out to PMCs along with explicit
> instructions not to change the permissions for the anonymous user, I'd
> like to understand what went wrong in this case (with a view to ensuring
> it doesn't happen again) before re-enabling admin permissions.
>
> There were also a bunch of people who are neither PMC members nor
> committers who had admin privs on your space. I'd very much prefer to
> see admin privs limited to active PMC members and committers moving
> forwards.
>
> Mark
>
>
> >
> > Best,
> > Jagadish
> >
> > On Wed, Jun 7, 2017 at 6:13 AM, Mark Thomas <ma...@apache.org> wrote:
> >
> > Dear Samza developer community,
> >
> > It has been brought to the infrastructure team's attention that your
> > wiki [1] is covered in spam. This is because someone, going against
> ASF
> > infrastructure policy, altered the permissions for the anonymous user
> > allowing them write permissions.
> >
> > During the investigation it was noticed that change notifications for
> > your wiki were not being sent to a public mailing list so that the
> > community could monitor all changes to the wiki.
> >
> > Therefore, the following actions have been taken:
> >
> > - All users currently assigned permissions on the Samza wiki have had
> > all their permissions revoked except for viewing.
> >
> > - A samza-dev user has been created and configured to watch the Samza
> > wiki space for changes
> >
> > Additionally, the spam pages will shortly be removed.
> >
> > Mark
> > on behalf of the ASF infrastructure team
> >
> > [1] https://cwiki.apache.org/confluence/display/SAMZA/Apache+Samza
> >
> >
>
>


-- 
Navina R.


Re: Wiki Spam

2017-06-07 Thread Navina Ramesh
Hi Mark,
Thanks for letting us know.

We will re-assess our permissions and set them up. Should we reach out to
Gavin to set them up? It would be great to have one or more of the PMCs have
access to assign permissions to reduce the turnaround time. Please let us
know the procedure.

Thanks!
Navina

On Wed, Jun 7, 2017 at 10:04 AM, Jagadish Venkatraman 
wrote:

> Hi Mark,
>
> Thanks for bringing this to our notice.
>
> >> This is because someone, going against ASF infrastructure policy,
> altered the permissions for the anonymous user allowing them write
> permissions
>
> Do we know when this occurred? I presume this was a lapse.
>
> >>  A samza-dev user has been created and configured to watch the Samza
> wiki
> space for changes
>
> Sounds great! Does that mean that notifications for changes in the Samza
> wiki space will now be sent to this mailing list?
>
> >>  All users currently assigned permissions on the Samza wiki have had all
> their permissions revoked except for viewing.
>
> We will re-assess all permissions, and set them up again.  I'm assuming
> PMCs will still be able to do this?
>
> Best,
> Jagadish
>
> On Wed, Jun 7, 2017 at 6:13 AM, Mark Thomas  wrote:
>
> > Dear Samza developer community,
> >
> > It has been brought to the infrastructure team's attention that your
> > wiki [1] is covered in spam. This is because someone, going against ASF
> > infrastructure policy, altered the permissions for the anonymous user
> > allowing them write permissions.
> >
> > During the investigation it was noticed that change notifications for
> > your wiki were not being sent to a public mailing list so that the
> > community could monitor all changes to the wiki.
> >
> > Therefore, the following actions have been taken:
> >
> > - All users currently assigned permissions on the Samza wiki have had
> > all their permissions revoked except for viewing.
> >
> > - A samza-dev user has been created and configured to watch the Samza
> > wiki space for changes
> >
> > Additionally, the spam pages will shortly be removed.
> >
> > Mark
> > on behalf of the ASF infrastructure team
> >
> > [1] https://cwiki.apache.org/confluence/display/SAMZA/Apache+Samza
> >
>



-- 
Navina R.


Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-05-31 Thread Navina Ramesh (Apache)
Dong,

Thanks for your prompt responses.

>  And usually the underlying system allows user to select arbitrary
partition
number if it supports partition expansion. Do you know any system that does not
meet these two requirement?

I am not aware of a system that won't meet the modulo requirement. I was
mostly questioning the requirement around *Stream Management*, which
expects the expansion of partitions to always happen by doubling the
partition count. That is different from saying "underlying system allows
user to select arbitrary partition number if it supports partition
expansion". Please correct me if I have misunderstood what you meant in
that requirement :)
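
To make the doubling question concrete, here is a minimal sketch, assuming the producer picks a partition as key_hash % partition_count (the modulo requirement discussed above). The function name and the sample counts are hypothetical and are not taken from the SEP or from Samza. It checks whether every key's pre-expansion partition can be recovered by taking its new partition index modulo the old count.

```python
def old_partition_recoverable(old_count, new_count, key_hashes):
    """Return True if, for every key hash, (new partition % old_count)
    equals the partition the key had before expansion.

    Assumes partition = key_hash % partition_count (hypothetical partitioner).
    """
    for h in key_hashes:
        before = h % old_count   # partition before expansion
        after = h % new_count    # partition after expansion
        if after % old_count != before:
            return False
    return True


hashes = range(1000)
doubled = old_partition_recoverable(2, 4, hashes)    # 2 -> 4 partitions
tripled = old_partition_recoverable(2, 6, hashes)    # 2 -> 6 partitions
arbitrary = old_partition_recoverable(2, 3, hashes)  # 2 -> 3 partitions
print(doubled, tripled, arbitrary)
```

Under this assumption the property holds whenever the new count is an integer multiple of the old count, since (h mod kN) mod N = h mod N. Doubling is one such case, whereas going from 2 to 3 partitions breaks it.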

> Regarding your comment of the Motivation section

Thanks for updating it.

> , Kafka consumer can potentially fetch more data in one FetchResponse
with more partitions in the FetchRequest. This is because we limit the
maximum amount of data that can be fetch for a given partition in the
FetchResponse.

That makes sense. I didn't know that you had this reasoning in mind. Thanks
for explaining.

>To answer your question how partition expansion in Kafka impacts the
clients, Kafka consumer is able to automatically detect new partition of
the topic and reassign all (both old and new) partitions across consumers
in the consumer group IF you tell consumer the topic to be subscribed. But
consumer in Samza's container uses another way of subscription.

Got it.

Thanks!
Navina



On Wed, May 31, 2017 at 4:29 PM, Yi Pan  wrote:

> Hi, Dong,
>
> Thanks for the detailed design doc for a long-awaited feature in Samza!
> Really appreciate it! I did a quick pass and have the following comments:
>
> - minor: "limit the maximum size of partition" ==> "limit the maximum size
> of each partition"
> - "However, Samza currently is not able to handle partition expansion of
> the input streams"==>better point out "for stateful jobs". For stateless
> jobs, simply bouncing the job now can pick up the new partitions.
> - "it is possible (e.g. with Kafka) that messages with a given key exist
> in both partition 1 and 3. Because GroupByPartition will assign partition 1
> and 3 to different tasks, messages with the same key may be handled by
> different task/container/process and their state will be stored in
> different changelog partition." The problem statement is not super clear
> here. The issue with stateful jobs is: after GroupByPartition assigns
> partition 1 and 3 to different tasks, the new task handling partition 3
> does not have the previous state to resume the work. e.g. a page-key based
> counter would start from 0 in the new task for a specific key, instead of
> resuming the previous count 50 held by task 1.
> - minor rewording: "the first solution in this doc" ==> "the solution
> proposed in this doc"
> - "Thus additional development work is needed in Kafka to meet this
> requirement" It would be good to link to a KIP if and when it exists
> - Instead of touching/deprecating the interface
> SystemStreamPartitionGrouper, I would recommend to have a different
> implementation class of the interface, which in the constructor of the
> grouper, takes two parameters: a) the previous task number read from the
> coordinator stream; b) the configured new-partition to old-partition
> mapping policy. Then, the grouper's interface method stays the same and the
> behavior of the grouper is more configurable which is good to support a
> broader set of use cases in addition to Kafka's built-in partition
> expansion policies.
> - Minor renaming suggestion to the new grouper class names:
> GroupByPartitionWithFixedTaskNum
> and GroupBySystemStreamPartitionWithFixedTaskNum
>
> Thanks!
>
> - Yi
>
> On Wed, May 31, 2017 at 10:33 AM, Dong Lin  wrote:
>
> > Hey Navina,
> >
> > Thanks much for the comment. Please see my response below.
> >
> > Regarding your biggest gripe with the SEP, I personally think the
> > operational requirements proposed in the KIP are pretty general and could be
> > easily enforced by other systems. The reason is that the modulo operation
> > is pretty standard and the default option when we choose a partition. And
> > usually the underlying system allows the user to select an arbitrary
> > partition number if it supports partition expansion. Do you know any system
> > that does not meet these two requirements?
> >
> > Regarding your comment of the Motivation section, I renamed the first
> > section as "Problem and Goal" and specified that "*The goal of this
> > proposal is to enable partition expansion of the input streams*.". I also
> > put a sentence at the end of the Motivati

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-05-31 Thread Navina Ramesh (Apache)
Hey Dong,

>  I have updated the motivation section to clarify this.

Thanks for updating the motivation. A couple of notes here:

1.
> "The motivation of increasing partition number of Kafka topic includes 1)
limit the maximum size of a partition in order to improve broker
performance and 2) increase throughput of Kafka consumer in the Samza
container."

It's unclear to me how increasing the partition number will increase the
throughput of the Kafka consumer in the container. Theoretically, you will
still be consuming the same amount of data in the container, irrespective
of whether it is coming from one partition or from several expanded
partitions. Can you please explain what you mean by that?

2. I believe the second paragraph under motivation is simply talking about
the scope of the current SEP. It will be easier to read if it states what is
included in this SEP and what is out of scope (for example, whether
expansion for stateful jobs is supported).

> We need to persist the task-to-sspList mapping in the
coordinator stream so that the job can derive the original number of
partitions of each input stream regardless of how many times the partition
has expanded. Does this make sense?

Yes. It does!
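
As a rough illustration of why the persisted mapping is enough, here is a minimal sketch. It assumes a GroupByPartition-style layout in which each task originally owned exactly one partition of a stream, so the original partition count equals the number of tasks that own at least one partition of that stream. The function name, task names, and stream name are hypothetical, and this is not the SEP's or Samza's actual code.

```python
def original_partition_counts(task_to_ssps):
    """Derive each stream's pre-expansion partition count from a persisted
    task -> [(stream, partition), ...] mapping.

    Assumes one task per original partition of a stream, so the count is
    the number of tasks owning at least one partition of that stream.
    """
    tasks_per_stream = {}
    for task, ssps in task_to_ssps.items():
        for stream, _partition in ssps:
            tasks_per_stream.setdefault(stream, set()).add(task)
    return {stream: len(tasks) for stream, tasks in tasks_per_stream.items()}


# Hypothetical mapping after a stream "PageView" grew from 2 to 4 partitions.
persisted = {
    "task-0": [("PageView", 0), ("PageView", 2)],
    "task-1": [("PageView", 1), ("PageView", 3)],
}
counts = original_partition_counts(persisted)
print(counts)
```

Because the number of tasks does not change across expansions in this sketch, the derived count stays the same no matter how many times the stream has been expanded.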

> I am not sure how this is related to the locality though. Can you clarify
your question if I haven't answered your question?

It's not related. I just meant to give an example of yet another
coordinator message that is persisted. Your ssp-to-task mapping is
following a similar pattern for persisting. Just wanted to clarify that.

> Can you let me know if this, together with the answers in the previous
email, addresses all your questions?

Yes. I believe you have addressed most of my questions. Thanks for taking
time to do that.

> Is there specific question you have regarding partition
expansion in Kafka?

I guess my questions are about how partition expansion in Kafka impacts the
clients. Iiuc, partition expansions are done manually in Kafka based on the
bytes-in rate of the partition. Do the existing Kafka clients handle this
expansion automatically? If yes, how does it work? If not, are there plans
to support it in the future?

> Thus user's job should not need to bootstrap key/value store from the
changelog topic.

Why is this discussion relevant here? A key/value store or changelog topic
partition is scoped to a task. Since we are not changing the number of
tasks, I don't think it is necessary to mention it here.

> The new method takes the SystemStreamPartition-to-Task assignment from
the previous job model which can be read from the coordinator stream.

The JobModel is currently not persisted to the coordinator stream. In your
design, you talk about writing separate coordinator messages for ssp-to-task
assignments. Hence, please correct this statement. It is kind of misleading
to the reader.

My biggest gripe with this SEP is that it seems like a tailor-made solution
that relies on the semantics of the Kafka system and yet, we are trying to
masquerade that as operational requirements for other systems interacting
with Samza. (Not to say that this is the first time such a choice is being
made in the Samza design.) I am not seeing how this can be a "general"
solution for all input systems. That's my two cents. I would like to hear
alternative points of view on this from other devs.

Please make sure you have enough eyes on this SEP. If you do, please start
a VOTE thread to approve this SEP.

Thanks!
Navina


On Mon, May 29, 2017 at 12:32 AM, Dong Lin  wrote:

> Hey Navina,
>
> I have updated the wiki based on your suggestion. More specifically, I have
> made the following changes:
>
> - Improved Problem section and Motivation section to describe why we use
> the solution in this proposal instead of tackling the problem of task
> expansion directly.
>
> - Illustrate the design in a way that doesn't bind to Kafka. Kafka is only
> used as an example to illustrate why we want to support partition expansion
> and whether the operational requirement can be supported when Kafka is used
> as the input system. Note that the proposed solution should work for any
> input system that meets the operational requirement described in the wiki.
>
> - Fixed the problem in the figure.
>
> - Added a new class GroupBySystemStreamPartitionFixedTaskNum to the wiki.
> Together with GroupByPartitionFixedTaskNum, it should ensure that we have a
> solution to enable partition expansion for all users that are using
> pre-defined grouper in Samza. Note that those users who use custom grouper
> would need to update their implementation.
>
> Can you let me know if this, together with the answers in the previous
> email, addresses all your questions? Thanks for taking time to review the
> proposal.
>
> Regards,
> Dong
>
>
>
>
>

[VOTE] Apache Samza 0.13.0 RC1

2017-05-24 Thread Navina Ramesh (Apache)
Hi everyone,

This is a call for a vote on a release of Apache Samza 0.13.0. Thanks to
everyone who has contributed to this release. We are very glad to see some
new contributors and features in this release.

The release candidate can be downloaded from here:
http://home.apache.org/~navina/samza-0.13.0-rc1/

The release candidate is signed with pgp key 331C8F69, which can be found
on keyservers:
http://pgp.mit.edu/pks/lookup?op=get&search=0x331C8F69

The git tag is release-0.13.0-rc1 and signed with the same pgp key:
https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.13.0-rc1

Test binaries have been published to Maven's staging repository, and are
available here:
https://repository.apache.org/content/repositories/orgapachesamza-1021

137 issues were resolved for this release:
https://issues.apache.org/jira/issues/?jql=project%20%
3D%20SAMZA%20AND%20fixVersion%20in%20(0.13%2C%200.13.0)%
20AND%20status%20in%20(Resolved%2C%20Closed)

The vote will be open for 3 *working* days (ending at 8:00PM Monday,
05/13/2017). We have an extended deadline this time as it is too close to a
long weekend.

Please download the release candidate, check the hashes/signature, build it
and test it, and then please vote:


[ ] +1 approve

[ ] +0 no opinion

[ ] -1 disapprove (and reason why)

Cheers!
Navina


Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-05-24 Thread Navina Ramesh (Apache)
Thanks for the SEP, Dong. I have a couple of questions to understand your
proposal better:

* Under motivation, you mention that "_We expect this solution to work
similarly with other input system as well._", yet I don't see any
discussion on how it will work with other input systems. That is, what kind
of contract does Samza expect from other input systems? If we are not
planning to provide a generic solution, it might be worth calling it out in
the SEP.

* I understand the partition mapping logic you have proposed. But I think
the example explanation doesn't match the diagram. In the diagram, after
expansion, partition-0 and partition-1 are pointing to bucket 0 and
partition-3 and partition-4 are pointing to bucket 1. I think the former
has to be partition-0 and partition-2, and the latter partition-1 and
partition-3. If I am wrong, please help me understand the logic :)

* I don't know how partition expansion in Kafka works. I am familiar with
how shard splitting happens in Kinesis - there is a hierarchical relation
between the parent and child shards. This way, it will also allow the
shards to be merged back. Iiuc, Kafka only supports partition "expansion",
as opposed to "splits". Can you provide some context or a link related to how
partition expansion works in Kafka?

* Are you only recommending that expansion can be supported for Samza jobs
that use Kafka as the input system **and** configure the SSPGrouper as
GroupByPartitionFixedTaskNum? Sounds to me like this only applies to
GroupByPartition. Please correct me if I am wrong. What is the expectation
for custom SSP Groupers?

* Regarding storing the SSP-to-Task assignment in the coordinator stream:
today, the JobModel encapsulates the data model in Samza, which also includes
**TaskModels**. A TaskModel typically holds the task-to-sspList mapping.
What is the reason for using a separate coordinator stream message,
*SetSSPTaskMapping*? Is it because the JobModel itself is not persisted in
the coordinator stream today? The reason locality exists outside of the
JobModel is that *locality* information is written by each container,
whereas it is consumed only by the leader jobcoordinator/AM. In this case,
the writer and the reader of the mapping information are both the leader
jobcoordinator/AM. So, I want to understand the motivation for this choice.
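
The bucket mapping questioned in the second point above can be sketched as follows. This is a minimal illustration, assuming the grouper assigns partition p to task p % original_count, where original_count is the partition count before the first expansion. The function name is hypothetical and this is not Samza's actual grouper code.

```python
from collections import defaultdict


def group_partitions_fixed_task_num(partition_ids, original_count):
    """Map each partition to a task index via partition % original_count.

    original_count is the number of partitions (and hence tasks) before
    any expansion; this sketch assumes that number is known.
    """
    tasks = defaultdict(list)
    for p in sorted(partition_ids):
        tasks[p % original_count].append(p)
    return dict(tasks)


# Two partitions expanded to four.
assignment = group_partitions_fixed_task_num(range(4), original_count=2)
print(assignment)
```

With two original partitions expanded to four, this puts partitions 0 and 2 in task 0 and partitions 1 and 3 in task 1, which is the pairing suggested above.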

Cheers!
Navina

On Tue, May 23, 2017 at 5:45 PM, Dong Lin  wrote:

> Hi all,
>
> We created SEP-5: Enable partition expansion of input streams. Please find
> the SEP wiki in the link
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> 5%3A+Enable+partition+expansion+of+input+streams
> .
>
> Your feedback is appreciated!
>
> Thanks,
> Dong
>


[GitHub] samza pull request #202: SAMZA-1307 - Fix ZkKeyBuilder null checks for pathP...

2017-05-23 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/202

SAMZA-1307 - Fix ZkKeyBuilder null checks for pathPrefix



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1307

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/202.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #202


commit 923d1f163d25e83fcead323e7b1107872bf161ba
Author: Navina Ramesh 
Date:   2017-05-24T06:10:32Z

SAMZA-1307 - Fix ZkKeyBuilder null checks for pathPrefix




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] samza pull request #201: Updating committer page on samza.apache.org

2017-05-23 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/201

Updating committer page on samza.apache.org

Giving more screen time for currently active Apache Samza committers (no 
JIRA)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza reOrgCommitters

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/201.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #201


commit 53b1c1d2157aee72d35d82809e4226c163a421a2
Author: Navina Ramesh 
Date:   2017-05-23T20:11:50Z

Re-arranging committer list order




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] samza pull request #195: SAMZA-1128 : Remove dependency of debounce timer fr...

2017-05-18 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/195

SAMZA-1128 : Remove dependency of debounce timer from the CoordinationUtils

This patch addresses the following:
* Removes CoordinationUtils#getBarrier and the BarrierForVersionUpgrade interface
* Renames ZkBarrierForVersionUpgrade to ZkBarrier and introduces a listener, 
ZkBarrierListener
* Simplifies the ZkBarrier class and its integration test
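
To illustrate the listener-driven barrier idea in the list above, here is a minimal in-memory sketch. It is not the ZkBarrier or ZkBarrierListener API from the patch: the class name `InMemoryBarrierSketch`, the `State` values, and the `onStateChanged` callback are all assumptions made for illustration, and no ZooKeeper is involved.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory stand-in for a listener-driven barrier; not the ZkBarrier API. */
final class InMemoryBarrierSketch {
  /** Illustrative states only; the enum in the patch may differ. */
  enum State { NEW, DONE, TIMED_OUT }

  /** Illustrative listener with a single callback (name is an assumption). */
  interface Listener {
    void onStateChanged(String version, State state);
  }

  private final String version;
  private final Set<String> expected;
  private final Set<String> joined = new HashSet<>();
  private final Listener listener;
  private State state = State.NEW;

  InMemoryBarrierSketch(String version, Set<String> expected, Listener listener) {
    this.version = version;
    this.expected = new HashSet<>(expected);
    this.listener = listener;
  }

  /** Records a participant; fires the listener once when everyone has joined. */
  synchronized State join(String participant) {
    if (state != State.NEW) {
      return state; // barrier already resolved, late joins change nothing
    }
    if (!expected.contains(participant)) {
      throw new IllegalArgumentException("unexpected participant: " + participant);
    }
    joined.add(participant);
    if (joined.containsAll(expected)) {
      resolve(State.DONE);
    }
    return state;
  }

  /** Called by whoever owns the timeout; has no effect once the barrier is resolved. */
  synchronized State expire() {
    if (state == State.NEW) {
      resolve(State.TIMED_OUT);
    }
    return state;
  }

  private void resolve(State newState) {
    state = newState;
    listener.onStateChanged(version, newState);
  }
}
```

In this sketch the barrier owns no timer of its own: the caller decides when to call `expire()`, which loosely mirrors the goal of keeping timer handling outside the barrier.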

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1128

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/195.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #195


commit 2b40ff2a269b982a6704b206a8aad8408ec0c48b
Author: Navina Ramesh 
Date:   2017-05-15T19:11:29Z

Removing getBarrier from CoordinationUtils

commit d68c7abfa97e3b45073a29fa091b398c013f11f1
Author: Navina Ramesh 
Date:   2017-05-16T00:06:59Z

Renamed waitForBarrier to joinBarrier; timeout to long; Introduced barrier 
listener and state; Removed debounce timer from ZkBarrier; getEphemeralPath 
removed from ZkUtils

commit 1fededa79f7367d59e1a05ef9a77543a13e37ed0
Author: Navina Ramesh 
Date:   2017-05-16T23:14:56Z

Fixing typos

commit 212f00cd9c97af533210f96898104f42d049e29b
Author: Navina Ramesh 
Date:   2017-05-17T02:56:18Z

Barrier Listener works. Timeout test doesn't

commit 4360d1a88c7099474130ae18370781e870959997
Author: Navina Ramesh 
Date:   2017-05-17T22:11:39Z

Fixed unit tests and timeout handling

commit b73204a10bf25407bae802f2a2134b052a4bc31b
Author: Navina Ramesh 
Date:   2017-05-18T00:23:24Z

StateChange handler was failing with class cast exception

commit 10f56c0a5bed62191e8caf7727e1dfb3978e4a85
Author: Navina Ramesh 
Date:   2017-05-18T06:24:46Z

Added BarrierKeyBuilder

commit 3a323adcad16d513b3b131526235d4caed9bd97a
Author: Navina Ramesh 
Date:   2017-05-18T06:38:42Z

Added some documentation

commit bbb310cf33da8ee223d32a1beb9e5d5ee65458df
Author: Navina Ramesh 
Date:   2017-05-18T06:42:15Z

Renamed TestZkBarrierForVersionUpgrade and made some variable name 
refactoring

commit f33e1860cc51cd816ee45b195dcf8e80bc3cbe9a
Author: Navina Ramesh 
Date:   2017-05-18T17:25:48Z

Fixing docs and checkstyle

commit 7e8760356e51db77c1c5af1d3d3cee36e5f3c88f
Author: Navina Ramesh 
Date:   2017-05-18T18:14:45Z

Moved BarrierState enum to Barrier Class

commit 96d795ebd6d3b91f0c6fe301b373e6fbbb502824
Author: Navina Ramesh 
Date:   2017-05-18T18:17:57Z

Removed barrier interface

commit 32cb11206b3f9a1b35b3e383a4f78c0e668aef8f
Author: Navina Ramesh 
Date:   2017-05-18T21:21:41Z

Adding more docs






Re: [VOTE] Apache Samza 0.13.0 RC0

2017-05-17 Thread Navina Ramesh (Apache)
Prateek told me that he sent out a cancel email. I think it didn't reach the
mail-archive. Lately, we have had these kinds of issues where emails
are not reaching our dev list.

On Wed, May 17, 2017 at 2:06 PM, Yi Pan  wrote:

> Hi, all,
>
> Based on the conversation above, can we officially cancel this vote?
>
> Thanks!
>
> -Yi
>
> On Mon, May 15, 2017 at 9:31 AM, Ignacio Solis  wrote:
>
> > Thanks!
> >
> > On Mon, May 15, 2017 at 8:00 AM, Navina Ramesh
> >  wrote:
> > > I will try to get the patch out today. Work doesn't look trivial. I am
> on
> > > it.
> > >
> > > Navina
> > >
> > > On May 14, 2017 23:10, "Ignacio Solis"  wrote:
> > >
> > >> We should hold off until it is solved.  How long will it take to fix
> > this?
> > >>
> > >> On Sun, May 14, 2017 at 10:13 PM, Navina Ramesh (Apache)
> > >>  wrote:
> > >> > I just changed the status of this JIRA to "BLOCKER" -
> > >> > https://issues.apache.org/jira/browse/SAMZA-1128
> > >> >
> > >> > This causes a bug in standalone deployment where any failure in the
> > >> barrier
> > >> > protocol stops the scheduled executorservice. Unfortunately,
> > >> > CoordinationUtils creates its own scheduled executorservice, which
> is
> > >> > incorrect. Scheduled ExecutorService is meant to be the working
> queue
> > for
> > >> > the ZkJobCoordinator. This needs to be fixed. Bharath already ran
> into
> > >> this
> > >> > bug during testing on Friday.
> > >> >
> > >> > veto for this release candidate.
> > >> >
> > >> > @Prateek/Jagadish:
> > >> > I recommend sending a "non-vote, testing release candidate" for this
> > >> > release until we complete all pending tasks (includes docs, tests
> > etc).
> > >> It
> > >> > will also be useful to share the pending tasks and their progress.
> In
> > >> case
> > >> > you have already shared it, I might have missed it since some emails
> > are
> > >> > bouncing off my inbox.
> > >> >
> > >> > Thanks!
> > >> > Navina
> > >> >
> > >> > On Sun, May 14, 2017 at 1:30 PM, Boris S  wrote:
> > >> >
> > >> >> I think we need to add SAMZA-1286 and
> > >> >> SAMZA-1279 to the release .
> > >> >>
> > >> >> On Wed, May 10, 2017 at 7:51 PM, Jagadish Venkatraman <
> > >> jagad...@apache.org
> > >> >> >
> > >> >> wrote:
> > >> >>
> > >> >> > This is a call for a vote on a release of Apache Samza 0.13.0.
> > Thanks
> > >> to
> > >> >> > everyone who has contributed to this release. We are very glad to
> > see
> > >> >> some
> > >> >> > new contributors and features in this release.
> > >> >> >
> > >> >> > The release candidate can be downloaded from here:
> > >> >> > http://home.apache.org/~jagadish/samza-0.13.0-rc0/
> > >> >> >
> > >> >> > The release candidate is signed with pgp key AF81FFBF, which can
> be
> > >> found
> > >> >> > on keyservers:
> > >> >> > http://pgp.mit.edu/pks/lookup?op=get&search=0xAF81FFBF
> > >> >> >
> > >> >> > The git tag is release-0.13.0-rc0 and signed with the same pgp
> key:
> > >> >> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> > >> >> > refs/tags/release-0.13.0-rc0
> > >> >> >
> > >> >> > Test binaries have been published to Maven's staging repository,
> > and
> > >> are
> > >> >> > available here:
> > >> >> > https://repository.apache.org/content/repositories/
> > >> orgapachesamza-1020
> > >> >> >
> > >> >> > 127 issues were resolved for this release:
> > >> >> > https://issues.apache.org/jira/issues/?jql=project%20%
> > >> >> > 3D%20SAMZA%20AND%20fixVersion%20in%20(0.13%2C%200.13.0)%
> > >> >> > 20AND%20status%20in%20(Resolved%2C%20Closed)
> > >> >> >
> > >> >> > The vote will be open for 72 hours (ending at 8:00PM Saturday,
> > >> >> 05/13/2017).
> > >> >> >
> > >> >> > Please download the release candidate, check the
> hashes/signature,
> > >> build
> > >> >> it
> > >> >> > and test it, and then please vote:
> > >> >> >
> > >> >> >
> > >> >> > [ ] +1 approve
> > >> >> >
> > >> >> > [ ] +0 no opinion
> > >> >> >
> > >> >> > [ ] -1 disapprove (and reason why)
> > >> >> >
> > >> >> >
> > >> >> > +1 from my side for the release.
> > >> >> >
> > >> >> > Cheers!
> > >> >> >
> > >> >>
> > >>
> > >>
> > >>
> > >> --
> > >> Nacho - Ignacio Solis - iso...@igso.net
> > >>
> >
> >
> >
> > --
> > Nacho - Ignacio Solis - iso...@igso.net
> >
>


Re: [DISCUSS] SEP-4: Adjunct Data Store for Unbounded DataSets

2017-05-16 Thread Navina Ramesh (Apache)
Thanks for trying 3 times, Wei. Sorry about the trouble. Not sure where the
problem lies. Looking forward to reviewing your design.

Navina

On Tue, May 16, 2017 at 8:56 AM, Wei Song  wrote:

> Hey everyone,
>
> I created a proposal for SAMZA-1278
> <https://issues.apache.org/jira/browse/SAMZA-1278>, Adjunct Data Store
> for Unbounded DataSets, which introduces an automatic mechanism to store
> adjunct data for stream tasks.
>
> https://cwiki.apache.org/confluence/display/SAMZA/Adjunct+Data+Store+for+Unbounded+DataSets
>
> Please review and comments are welcome!
>
> For those who are not actively following the master branch, you may have
> more questions than others. Feel free to ask them here.
>
> P.S. this is the 3rd try, sent this last week, but apparently no one at
> Linkedin has received, including samza-dev here just to be sure.
>
> --
> Thanks,
> -Wei
>


Re: [VOTE] Apache Samza 0.13.0 RC0

2017-05-15 Thread Navina Ramesh
I will try to get the patch out today. The work doesn't look trivial. I am on
it.

Navina

On May 14, 2017 23:10, "Ignacio Solis"  wrote:

> We should hold off until it is solved.  How long will it take to fix this?
>
> On Sun, May 14, 2017 at 10:13 PM, Navina Ramesh (Apache)
>  wrote:
> > I just changed the status of this JIRA to "BLOCKER" -
> > https://issues.apache.org/jira/browse/SAMZA-1128
> >
> > This causes a bug in standalone deployment where any failure in the
> barrier
> > protocol stops the scheduled executorservice. Unfortunately,
> > CoordinationUtils creates its own scheduled executorservice, which is
> > incorrect. Scheduled ExecutorService is meant to be the working queue for
> > the ZkJobCoordinator. This needs to be fixed. Bharath already ran into
> this
> > bug during testing on Friday.
> >
> > veto for this release candidate.
> >
> > @Prateek/Jagadish:
> > I recommend sending a "non-vote, testing release candidate" for this
> > release until we complete all pending tasks (includes docs, tests etc).
> It
> > will also be useful to share the pending tasks and their progress. In
> case
> > you have already shared it, I might have missed it since some emails are
> > bouncing off my inbox.
> >
> > Thanks!
> > Navina
> >
> > On Sun, May 14, 2017 at 1:30 PM, Boris S  wrote:
> >
> >> I think we need to add SAMZA-1286 and
> >> SAMZA-1279 to the release .
> >>
> >> On Wed, May 10, 2017 at 7:51 PM, Jagadish Venkatraman <
> jagad...@apache.org
> >> >
> >> wrote:
> >>
> >> > This is a call for a vote on a release of Apache Samza 0.13.0. Thanks
> to
> >> > everyone who has contributed to this release. We are very glad to see
> >> some
> >> > new contributors and features in this release.
> >> >
> >> > The release candidate can be downloaded from here:
> >> > http://home.apache.org/~jagadish/samza-0.13.0-rc0/
> >> >
> >> > The release candidate is signed with pgp key AF81FFBF, which can be
> found
> >> > on keyservers:
> >> > http://pgp.mit.edu/pks/lookup?op=get&search=0xAF81FFBF
> >> >
> >> > The git tag is release-0.13.0-rc0 and signed with the same pgp key:
> >> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> >> > refs/tags/release-0.13.0-rc0
> >> >
> >> > Test binaries have been published to Maven's staging repository, and
> are
> >> > available here:
> >> > https://repository.apache.org/content/repositories/
> orgapachesamza-1020
> >> >
> >> > 127 issues were resolved for this release:
> >> > https://issues.apache.org/jira/issues/?jql=project%20%
> >> > 3D%20SAMZA%20AND%20fixVersion%20in%20(0.13%2C%200.13.0)%
> >> > 20AND%20status%20in%20(Resolved%2C%20Closed)
> >> >
> >> > The vote will be open for 72 hours (ending at 8:00PM Saturday,
> >> 05/13/2017).
> >> >
> >> > Please download the release candidate, check the hashes/signature,
> build
> >> it
> >> > and test it, and then please vote:
> >> >
> >> >
> >> > [ ] +1 approve
> >> >
> >> > [ ] +0 no opinion
> >> >
> >> > [ ] -1 disapprove (and reason why)
> >> >
> >> >
> >> > +1 from my side for the release.
> >> >
> >> > Cheers!
> >> >
> >>
>
>
>
> --
> Nacho - Ignacio Solis - iso...@igso.net
>


Re: [VOTE] Apache Samza 0.13.0 RC0

2017-05-14 Thread Navina Ramesh (Apache)
I just changed the status of this JIRA to "BLOCKER" -
https://issues.apache.org/jira/browse/SAMZA-1128

This causes a bug in standalone deployment where any failure in the barrier
protocol stops the scheduled ExecutorService. Unfortunately,
CoordinationUtils creates its own scheduled ExecutorService, which is
incorrect. The scheduled ExecutorService is meant to be the working queue for
the ZkJobCoordinator. This needs to be fixed. Bharath already ran into this
bug during testing on Friday.
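
For readers unfamiliar with how a failure can silently stop scheduled work, here is a small sketch in plain Java. It only demonstrates the documented behaviour of `scheduleAtFixedRate`, where an exception thrown by one run suppresses all later runs. Whether this is the exact failure mode behind SAMZA-1128 is an assumption, and the class and method names are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical demo: a periodic task that throws once is never run again. */
final class PeriodicTaskFailureSketch {

  /** Returns how many times the periodic task ran before the executor stopped rescheduling it. */
  static int runsBeforeSuppression() {
    ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    AtomicInteger runs = new AtomicInteger();
    try {
      ScheduledFuture<?> future = scheduler.scheduleAtFixedRate(() -> {
        if (runs.incrementAndGet() == 2) {
          // Stand-in for any unexpected failure inside the scheduled work.
          throw new IllegalStateException("simulated barrier failure");
        }
      }, 0, 10, TimeUnit.MILLISECONDS);
      try {
        future.get(); // blocks until the periodic task fails or is cancelled
      } catch (ExecutionException expected) {
        // The second run threw; the executor will not schedule further runs.
      }
      Thread.sleep(100); // give any (non-existent) later runs a chance to happen
      return runs.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      scheduler.shutdownNow();
    }
  }
}
```

The failure is only visible to code that inspects the returned future, which is why work sharing such an executor can appear to stop without any error being reported.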

-1 (veto) for this release candidate.

@Prateek/Jagadish:
I recommend sending a "non-vote, testing release candidate" for this
release until we complete all pending tasks (includes docs, tests etc). It
will also be useful to share the pending tasks and their progress. In case
you have already shared it, I might have missed it since some emails are
bouncing off my inbox.

Thanks!
Navina

On Sun, May 14, 2017 at 1:30 PM, Boris S  wrote:

> I think we need to add SAMZA-1286 and
> SAMZA-1279 to the release.
>
> On Wed, May 10, 2017 at 7:51 PM, Jagadish Venkatraman wrote:
>
> > This is a call for a vote on a release of Apache Samza 0.13.0. Thanks to
> > everyone who has contributed to this release. We are very glad to see some
> > new contributors and features in this release.
> >
> > The release candidate can be downloaded from here:
> > http://home.apache.org/~jagadish/samza-0.13.0-rc0/
> >
> > The release candidate is signed with pgp key AF81FFBF, which can be found
> > on keyservers:
> > http://pgp.mit.edu/pks/lookup?op=get&search=0xAF81FFBF
> >
> > The git tag is release-0.13.0-rc0 and signed with the same pgp key:
> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.13.0-rc0
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> > https://repository.apache.org/content/repositories/orgapachesamza-1020
> >
> > 127 issues were resolved for this release:
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20in%20(0.13%2C%200.13.0)%20AND%20status%20in%20(Resolved%2C%20Closed)
> >
> > The vote will be open for 72 hours (ending at 8:00PM Saturday, 05/13/2017).
> >
> > Please download the release candidate, check the hashes/signature, build it
> > and test it, and then please vote:
> >
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove (and reason why)
> >
> >
> > +1 from my side for the release.
> >
> > Cheers!
> >
>


[GitHub] samza pull request #179: SAMZA-1276 : Adding customReporters passed by Strea...

2017-05-09 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/179

SAMZA-1276 : Adding customReporters passed by StreamProcessor into 
SamzaContainer

SamzaContainer is not adding custom reporters alongside the class-loaded 
reporters from config. This was missed in 
[SAMZA-1080](https://issues.apache.org/jira/browse/SAMZA-1080). 
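
As a rough picture of the intended behaviour, the sketch below combines reporters loaded from config with reporters passed in programmatically. The `Reporter` interface and the `merge` helper are hypothetical placeholders, not Samza types, and letting a custom reporter win on a name clash is an assumption rather than something stated in this pull request.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: combine config-loaded and programmatically supplied reporters. */
final class ReporterMergeSketch {

  /** Placeholder for a metrics reporter; not a Samza interface. */
  interface Reporter {
    String describe();
  }

  /** Returns a new map holding both sets; on a name clash the custom reporter wins (assumption). */
  static Map<String, Reporter> merge(Map<String, Reporter> fromConfig, Map<String, Reporter> custom) {
    Map<String, Reporter> merged = new LinkedHashMap<>(fromConfig);
    merged.putAll(custom);
    return merged;
  }
}
```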

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1276

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/179.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #179


commit 5c25a1081d8fadc633388082850d18985543d774
Author: Navina Ramesh 
Date:   2017-05-10T00:54:56Z

Adding customReporters passed by StreamProcessor into SamzaContainer






[GitHub] samza pull request #173: SAMZA-1272 : ZkCoordinationUtils deletes the entire...

2017-05-08 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/173

SAMZA-1272 : ZkCoordinationUtils deletes the entire Zk tree on reset

* ZkCoordinationUtils has a reset interface that deletes the entire Zk 
tree. This is not desirable.
* Also fixed flakiness in the unit tests by using a unique barrier name in 
each test. Otherwise, they share the same path on Zk and fail during 
concurrent test execution
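
The unique-name idea from the second bullet could look roughly like the following. The helper name and the path layout are assumptions for illustration only; the actual tests may derive their names differently.

```java
import java.util.UUID;

/** Hypothetical test helper: gives every test its own barrier path. */
final class BarrierTestNames {

  /** Builds a path that is unique per call, so concurrent tests never share a node. */
  static String uniqueBarrierPath(String testName) {
    return "/barrier-" + testName + "-" + UUID.randomUUID();
  }
}
```

Because each call embeds a fresh random UUID, two tests running at the same time against the same ZooKeeper instance would write under different paths.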


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1272

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/173.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #173


commit a2cf394c555ca93a6d654edfb6a0b44273dc7b1b
Author: Navina Ramesh 
Date:   2017-05-09T00:31:18Z

Blah

commit 52da0c76af503e5d6d27ada8011cc78a987e90eb
Author: Navina Ramesh 
Date:   2017-05-09T01:39:49Z

Fixing the unit test for barrier






[GitHub] samza pull request #166: SAMZA-1150 : Handling Error propagation between ZkJ...

2017-05-05 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/166

SAMZA-1150 : Handling Error propagation between ZkJobCoordinator & 
DebounceTimer

* Treats all errors in the JobCoordinator as FATAL and shuts down the 
StreamProcessor
* [Bug] Fixed bug reported in SAMZA-1241
* Introduced a callback to be associated with the timer (same callback for 
every Runnable failure)
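
A minimal sketch of the "one callback for every Runnable failure" idea from the list above, using only the JDK. This is not the DebounceTimer from the patch: `FailureReportingTimerSketch`, its methods, and the `demo` helper are hypothetical names chosen for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical timer that routes every task failure to a single callback. */
final class FailureReportingTimerSketch {
  private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
  private final Consumer<Throwable> onFailure;

  FailureReportingTimerSketch(Consumer<Throwable> onFailure) {
    this.onFailure = onFailure;
  }

  /** Schedules the action; anything it throws is handed to the shared callback. */
  void schedule(Runnable action, long delayMs) {
    executor.schedule(() -> {
      try {
        action.run();
      } catch (Throwable t) {
        onFailure.accept(t); // same callback for every failing Runnable
      }
    }, delayMs, TimeUnit.MILLISECONDS);
  }

  void stop() {
    executor.shutdownNow();
  }

  /** Schedules a failing task and returns the message the callback received. */
  static String demo() {
    AtomicReference<String> seen = new AtomicReference<>("no failure reported");
    CountDownLatch reported = new CountDownLatch(1);
    FailureReportingTimerSketch timer = new FailureReportingTimerSketch(t -> {
      seen.set(t.getMessage());
      reported.countDown();
    });
    try {
      timer.schedule(() -> { throw new IllegalStateException("boom"); }, 10);
      reported.await(2, TimeUnit.SECONDS);
      return seen.get();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } finally {
      timer.stop();
    }
  }
}
```

In a setup like this, the callback is the natural place to treat the error as fatal and trigger shutdown, instead of letting the exception disappear inside the executor.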

**TBD**: some more unit tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1150

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/166.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #166


commit 8996cb56190bdf5da21f9d767b7db7ee062fc1e8
Author: Navina Ramesh 
Date:   2017-04-14T00:20:38Z

Remove containerId from SamzaContainer.apply

commit 6931cc62a3e0908fc1dee48055a0d9f95072746e
Author: Navina Ramesh 
Date:   2017-04-14T00:22:01Z

Removing onBecomingLeader for readability

commit 528086dba116d3a9b2772d493e9430784236c5ca
Author: Navina Ramesh 
Date:   2017-04-14T00:26:01Z

Removing awaitStart

commit eb16378fb0b127563a3f0a92ca1b1249813d60c8
Author: Navina Ramesh 
Date:   2017-04-14T19:12:46Z

Added JobCoordinator Listener. Trying to remove Samzacontainer controller

commit 4d21ee56fafc00daa4c70e2ff275e66f34f3c1a4
Author: Navina Ramesh 
Date:   2017-04-25T01:36:14Z

Adding ProcessorErrorHandler, SamzaContainerStatus, JobCoordinatorListener, 
SamzaContainerListener

commit 38e3f40d2c464dc5b570b175e670060881f554d9
Author: Navina Ramesh 
Date:   2017-04-26T23:01:24Z

Documenting state transitions for SamzaContainer

commit bfe1f17d16cc52c809b084328287c93d39452dfa
Author: Navina Ramesh 
Date:   2017-04-26T23:03:00Z

adding some log lines in LocalContainerRunner

commit 376f6ac2eb21977cd7a958198efe33da75c8afa9
Author: Navina Ramesh 
Date:   2017-04-27T01:36:25Z

Fixed integration test failures by throwing the exception in the listener 
for ThreadJob

commit 38271d69d72a731914c36f502f49be7c7d0ec235
Author: Navina Ramesh 
Date:   2017-04-27T19:42:54Z

Added a few tests in TestSamzaContainer

commit 2df4666319676e56600c523712c54ee6f842fae4
Author: Navina Ramesh 
Date:   2017-04-27T23:38:29Z

Added test for sp.stop()

commit 6f6255e3f34d0aa9f437f0589bf02d0f3451e089
Author: Navina Ramesh 
Date:   2017-04-28T00:23:32Z

Adding setContainerListener explicitly in SamzaContainer

commit 53bb48c453d1e48f978a457d43addc5f32fce3ef
Author: Navina Ramesh 
Date:   2017-04-28T02:38:24Z

Added documentation in JobCoordinator interface

commit 012eeff7d347d5e273d3938f62dedf59b849ea86
Author: Navina Ramesh 
Date:   2017-04-28T02:43:00Z

Removed ProcessorErrorHandler

commit 48cf739fc2f585bb7a21161a66270bb992419f5d
Author: Navina Ramesh 
Date:   2017-04-28T02:45:47Z

Removing commetned out code

commit 6bf290545eb3485c8530995219e42cc134fed10f
Author: Navina Ramesh 
Date:   2017-04-28T05:00:40Z

Adding docs to JobCoordinatorListener

commit db96e5d0dcef3f44d0e986938089c88c1ddebdd7
Author: Navina Ramesh 
Date:   2017-04-28T05:36:02Z

Added javadocs for SamzaContainerListener

commit ca27a8795b85d539122a8fb19fd8fbdc8297634b
Author: Navina Ramesh 
Date:   2017-04-28T06:33:49Z

Cleaning up StreamProcessor code and jobCoordinator docs

commit 442035912f2e60484692570d56be33bf5eb459a3
Author: Navina Ramesh 
Date:   2017-04-28T07:09:36Z

Fixing standaloneJobCoordinator

commit 525ac15cf2e141970fe921214ed3f71cd7dadc37
Author: Navina Ramesh 
Date:   2017-04-28T07:14:04Z

Adding null checks on processorListener

commit 23a6f762b462a8aaf81126c9060587c0de042269
Author: Navina Ramesh 
Date:   2017-04-28T07:32:37Z

Fixing ZkJobCoordinator

commit 22e5ddd5f0217b8254b0def2fc2a2a7f94bfe7ee
Author: Navina Ramesh 
Date:   2017-05-02T22:19:46Z

Removing unused variable in ZkJobCoordinator

commit 4a5800512271b3be7236898373f51b89935014fe
Author: Navina Ramesh 
Date:   2017-05-02T22:38:15Z

Adding javadoc to sp.stop and removing incorrect comment in ZkControllerImpl

commit 6c580d3fdbcd8f02ecff08d296eda1128865e4de
Author: Navina Ramesh 
Date:   2017-05-02T23:32:29Z

Fixing LocalContainerRunner

commit 3a5d1811b4de9a229f5ea4b68cc61b81b69908e6
Author: Navina Ramesh 
Date:   2017-05-03T00:06:58Z

Minor changes to codestyle and docs

commit ee29c0f75f409a11b40ceaa313cfea772772f864
Author: Navina Ramesh 
Date:   2017-05-03T02:23:16Z

Adding IllegalContainerStateException

commit d58d789409d302a4ca024ed2d3455a440dfb61c9
Author: Navina Ramesh 
Date:   2017-05-03T05:32:47Z

Moving pkg private methods after public methods in StreamProcessor

commit edfad1eee572febe5a2b66946c0cfdea19c8f354
Author: Navina Ramesh 
Date:   2017-05-03T05:58:58Z

Fixing comments based on Prateek's suggestion

[GitHub] samza pull request #162: SAMZA-1228 : StreamProcessor should stop JmxServer

2017-05-03 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/162

SAMZA-1228 : StreamProcessor should stop JmxServer

This is not the solution posted in SAMZA-1228. For now, we are moving the 
JmxServer lifecycle to be within the container. Ideally, it should be within 
the StreamProcessor so that the job coordinator can also be associated with the 
same instance. 
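
One way to picture "lifecycle within the container" is a run method that starts the server before the run loop and always stops it afterwards. The sketch below uses a hypothetical `ManagedServer` interface in place of JmxServer, and `runContainer` is not a Samza method.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: the container owns the start/stop of a helper server. */
final class ContainerLifecycleSketch {

  /** Placeholder for a server with an explicit lifecycle; not the JmxServer API. */
  interface ManagedServer {
    void start();
    void stop();
  }

  /** Starts the server, runs the loop, and stops the server even if the loop fails. */
  static void runContainer(ManagedServer server, Runnable runLoop) {
    server.start();
    try {
      runLoop.run();
    } finally {
      server.stop();
    }
  }

  /** Records the order of events when the run loop fails. */
  static List<String> demo() {
    List<String> events = new ArrayList<>();
    ManagedServer server = new ManagedServer() {
      @Override public void start() { events.add("start"); }
      @Override public void stop() { events.add("stop"); }
    };
    try {
      runContainer(server, () -> {
        events.add("run");
        throw new IllegalStateException("simulated run loop failure");
      });
    } catch (IllegalStateException e) {
      events.add("failed");
    }
    return events;
  }
}
```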

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1228

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/162.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #162


commit 13a44dbaa8432125cdc2eebc184fe683b4e615d5
Author: Navina Ramesh 
Date:   2017-05-04T04:44:26Z

SAMZA-1228 : StreamProcessor should stop JmxServer






Re: [VOTE] SEP 3 : Heart-beat mechanism between JobCoordinator and all running containers

2017-05-03 Thread Navina Ramesh
+1 (binding)

Awesome work.

Cheers!
Navina

On Wed, May 3, 2017 at 12:01 PM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> +1 from my side (as described in my previous email).
>
> Thanks for incorporating all feedback from my previous review.
>
> Nice work!
>
> On Wed, May 3, 2017 at 11:46 AM, Abhishek Shivanna 
> wrote:
>
> > Hey everyone,
> >
> > This is the voting thread for SEP 3: Heart-beat mechanism between
> > JobCoordinator and all running containers
> > The Wiki page that discusses the SEP is:
> > https://cwiki.apache.org/confluence/display/SAMZA/SEP-3%3A+Heart-beat+mechanism+between+JobCoordinator+and+all+running+containers
> >
> > Please vote.
> >
> > Thanks,
> > Abhishek
> >
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>



-- 
Navina R.


Re: [DISCUSS] SEP-3: Heart-beat mechanism between JobCoordinator and all running containers

2017-05-03 Thread Navina Ramesh (Apache)
Abhishek,
Thanks for clarifying and updating the SEP.

Cheers!
Navina

On Wed, May 3, 2017 at 8:20 PM, Jagadish Venkatraman  wrote:

> Navina,
>
>
> >> The ContainerHeartbeatMonitor and the ContainerHeartbeatClient are both
> internal
> APIs and have a concrete implementations.
>
> More specifically, both of these are purely internal implementation classes
> (and have nothing to do with any pluggable public API that we expose)
>
> Best,
> Jagadish
>
> On Wed, May 3, 2017 at 7:34 PM, Abhishek Shivanna 
> wrote:
>
> > Hey Navina,
> >
> > Thank you for reviewing the SEP.
> >
> > > Are you planning on exposing this monitor class as a public api? What
> is
> > the significance of doing so?
> >
> > Sorry for the confusion of having implementation details under "public
> > interfaces".
> > The ContainerHeartbeatMonitor and the ContainerHeartbeatClient are both
> > internal APIs
> > and have a concrete implementations.
> >
> > > Is "Execution Container ID" the name of the environmental variable? I
> > don't
> > think environmental variables can contain whitespace??
> >
> > Again, confusion that stemmed from my initial draft. I have fixed the SEP
> > with the actual name in the implementation.
> >
> > > I think the first sentence corresponds to your design. The second one
> is
> > more of an implementation detail. You may want to split it up or just
> > discard one of them. I got confused reading them together because one
> talks
> > about adding to container and the other about the ContainerRunner.
> >
> > Fixed the SEP to make it more clear.
> >
> > Thanks,
> > Abhishek
> >
> >
> > On Wed, May 3, 2017 at 2:08 PM, Navina Ramesh (Apache) <
> nav...@apache.org>
> > wrote:
> >
> > > Hi Abhishek,
> > > I checked your latest proposal in SEP and it looks good to me.
> > >
> > > QQ:
> > > > A new ContainerHeartbeatMonitor class that accepts a
> > > ContainerHeartbeatClient (which has the business logic to make
> heartbeat
> > > checks on the JC endpoint) and a callback.
> > >
> > > Are you planning on exposing this monitor class as a public api? What
> is
> > > the significance of doing so?
> > >
> > > > set an environment variable with the "Execution Container ID" during
> > > container launch. This can be read from the container to make requests
> to
> > > the above endpoint.
> > >
> > > Is "Execution Container ID" the name of the environmental variable? I
> > don't
> > > think environmental variables can contain whitespace??
> > >
> > > > On the container side we start a new thread that periodically polls
> > this
> > > endpoint described above to check if the container is valid. If its
> not,
> > we
> > > shutdown the run loop and raise an error (so that the exit code is non
> 0
> > so
> > > that YARN reschedules the container)
> > > The plan is to setup a monitor in the LocalContainerRunner class that
> > > schedules a thread to check the above endpoint at regular intervals. On
> > > failure the thread modifies state on the LocalContainerRunner to denote
> > > that there was an error. This state is checked during exit in the
> > > LocalContainerRunner to exit with a non-zero code.
> > >
> > > I think the first sentence corresponds to your design. The second one
> is
> > > more of an implementation detail. You may want to split it up or just
> > > discard one of them. I got confused reading them together because one
> > talks
> > > about adding to container and the other about the ContainerRunner.
> > >
> > > Design looks pretty elegant and easily portable.
> > >
> > > Thanks!
> > > Navina
> > >
> > >
> > > On Wed, May 3, 2017 at 9:52 AM, Abhishek Shivanna 
> > > wrote:
> > >
> > > > Hey Jagadish,
> > > >
> > > > Thank you for taking the time to review the design.
> > > > > I agree with moving the heartbeat into the LocalContainerRunner instead
> > > > > of fitting it into the SamzaContainer. I will update the SEP with the new
> > > > > design changes.
> > > > > Also agree with the changes to the configuration and choosing suitable
> > > > > defaults should be good enough.
> > > >
> > > > Thanks,
> > > >

Re: [DISCUSS] SEP-3: Heart-beat mechanism between JobCoordinator and all running containers

2017-05-03 Thread Navina Ramesh
Abhishek,
Thanks for clarifying and updating the SEP.

Cheers!
Navina

On Wed, May 3, 2017 at 8:20 PM, Jagadish Venkatraman  wrote:

> Navina,
>
>
> >> The ContainerHeartbeatMonitor and the ContainerHeartbeatClient are both
> internal APIs and have concrete implementations.
>
> More specifically, both of these are purely internal implementation classes
> (and have nothing to do with any pluggable public API that we expose)
>
> Best,
> Jagadish
>
> On Wed, May 3, 2017 at 7:34 PM, Abhishek Shivanna 
> wrote:
>
> > Hey Navina,
> >
> > Thank you for reviewing the SEP.
> >
> > > Are you planning on exposing this monitor class as a public api? What
> is
> > the significance of doing so?
> >
> > Sorry for the confusion of having implementation details under "public
> > interfaces".
> > The ContainerHeartbeatMonitor and the ContainerHeartbeatClient are both
> > internal APIs and have concrete implementations.
> >
> > > Is "Execution Container ID" the name of the environmental variable? I
> > don't
> > think environmental variables can contain whitespace??
> >
> > Again, confusion that stemmed from my initial draft. I have fixed the SEP
> > with the actual name in the implementation.
> >
> > > I think the first sentence corresponds to your design. The second one
> is
> > more of an implementation detail. You may want to split it up or just
> > discard one of them. I got confused reading them together because one
> talks
> > about adding to container and the other about the ContainerRunner.
> >
> > Fixed the SEP to make it more clear.
> >
> > Thanks,
> > Abhishek
> >
> >
> > On Wed, May 3, 2017 at 2:08 PM, Navina Ramesh (Apache) <
> nav...@apache.org>
> > wrote:
> >
> > > Hi Abhishek,
> > > I checked your latest proposal in SEP and it looks good to me.
> > >
> > > QQ:
> > > > A new ContainerHeartbeatMonitor class that accepts a
> > > ContainerHeartbeatClient (which has the business logic to make
> heartbeat
> > > checks on the JC endpoint) and a callback.
> > >
> > > Are you planning on exposing this monitor class as a public api? What
> is
> > > the significance of doing so?
> > >
> > > > set an environment variable with the "Execution Container ID" during
> > > container launch. This can be read from the container to make requests
> to
> > > the above endpoint.
> > >
> > > Is "Execution Container ID" the name of the environmental variable? I
> > don't
> > > think environmental variables can contain whitespace??
> > >
> > > > > On the container side we start a new thread that periodically polls this
> > > > endpoint described above to check if the container is valid. If it's not, we
> > > > shut down the run loop and raise an error (so that the exit code is non-zero so
> > > > that YARN reschedules the container)
> > > > The plan is to set up a monitor in the LocalContainerRunner class that
> > > schedules a thread to check the above endpoint at regular intervals. On
> > > failure the thread modifies state on the LocalContainerRunner to denote
> > > that there was an error. This state is checked during exit in the
> > > LocalContainerRunner to exit with a non-zero code.
> > >
> > > I think the first sentence corresponds to your design. The second one
> is
> > > more of an implementation detail. You may want to split it up or just
> > > discard one of them. I got confused reading them together because one
> > talks
> > > about adding to container and the other about the ContainerRunner.
> > >
> > > Design looks pretty elegant and easily portable.
> > >
> > > Thanks!
> > > Navina
> > >
> > >
> > > On Wed, May 3, 2017 at 9:52 AM, Abhishek Shivanna 
> > > wrote:
> > >
> > > > Hey Jagadish,
> > > >
> > > > Thank you for taking the time to review the design.
> > > > > I agree with moving the heartbeat into the LocalContainerRunner instead
> > > > > of fitting it into the SamzaContainer. I will update the SEP with the new
> > > > > design changes.
> > > > > Also agree with the changes to the configuration and choosing suitable
> > > > > defaults should be good enough.
> > > >
> > > > Thanks,
> > > >

[GitHub] samza pull request #54: SAMZA-1084 - User thread does not see errors from th...

2017-05-03 Thread navina
Github user navina closed the pull request at:

https://github.com/apache/samza/pull/54


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] SEP-3: Heart-beat mechanism between JobCoordinator and all running containers

2017-05-03 Thread Navina Ramesh (Apache)
Hi Abhishek,
I checked your latest proposal in SEP and it looks good to me.

QQ:
> A new ContainerHeartbeatMonitor class that accepts a
ContainerHeartbeatClient (which has the business logic to make heartbeat
checks on the JC endpoint) and a callback.

Are you planning on exposing this monitor class as a public api? What is
the significance of doing so?

> set an environment variable with the "Execution Container ID" during
container launch. This can be read from the container to make requests to
the above endpoint.

Is "Execution Container ID" the name of the environmental variable? I don't
think environmental variables can contain whitespace??
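
To make the whitespace point concrete, here is a minimal sketch, assuming a
hypothetical variable name (EXECUTION_CONTAINER_ID is made up for
illustration and is not the name used in the SEP or the implementation). It
checks that a name is something a shell can assign (letters, digits and
underscores, not starting with a digit) and reads the value from an
environment map:

```java
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

/** Illustrative only; the variable name below is hypothetical. */
final class ContainerIdEnvSketch {
    // Hypothetical name, chosen only for this example.
    static final String ENV_NAME = "EXECUTION_CONTAINER_ID";

    // Letters, digits and underscores, not starting with a digit.
    private static final Pattern SHELL_SAFE_NAME =
        Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    /** True if the name matches the shell-assignable pattern above. */
    static boolean isShellSafeName(String name) {
        return SHELL_SAFE_NAME.matcher(name).matches();
    }

    /** Returns the container ID from the given environment, if set and non-empty. */
    static Optional<String> readContainerId(Map<String, String> env) {
        return Optional.ofNullable(env.get(ENV_NAME)).filter(id -> !id.isEmpty());
    }
}
```

In a real process the map would be System.getenv(); passing it in keeps the
lookup easy to exercise without touching the actual environment.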

> On the container side we start a new thread that periodically polls this
endpoint described above to check if the container is valid. If it's not, we
shut down the run loop and raise an error (so that the exit code is non-zero so
that YARN reschedules the container)
The plan is to set up a monitor in the LocalContainerRunner class that
schedules a thread to check the above endpoint at regular intervals. On
failure the thread modifies state on the LocalContainerRunner to denote
that there was an error. This state is checked during exit in the
LocalContainerRunner to exit with a non-zero code.

I think the first sentence corresponds to your design. The second one is
more of an implementation detail. You may want to split it up or just
discard one of them. I got confused reading them together because one talks
about adding to container and the other about the ContainerRunner.
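
For readers following along, the polling design quoted above can be sketched
roughly as below. This is a sketch under assumptions, not the
ContainerHeartbeatMonitor from the SEP: the class name, the BooleanSupplier
standing in for the heartbeat client, and the choice to treat a failed check
as an expired heartbeat are all hypothetical.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a periodic heartbeat check with a callback. */
final class HeartbeatMonitorSketch {
    private final BooleanSupplier heartbeatCheck; // stands in for the call to the JC endpoint
    private final Runnable onExpired;             // invoked once when the check fails
    private final long periodMs;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "heartbeat-monitor-sketch");
            t.setDaemon(true); // do not keep the JVM alive just for polling
            return t;
        });

    HeartbeatMonitorSketch(BooleanSupplier heartbeatCheck, Runnable onExpired, long periodMs) {
        this.heartbeatCheck = heartbeatCheck;
        this.onExpired = onExpired;
        this.periodMs = periodMs;
    }

    /** Starts polling immediately and then once every periodMs milliseconds. */
    void start() {
        scheduler.scheduleAtFixedRate(() -> {
            boolean valid;
            try {
                valid = heartbeatCheck.getAsBoolean();
            } catch (RuntimeException e) {
                valid = false; // assumption: a failed check counts as an expired heartbeat
            }
            if (!valid) {
                scheduler.shutdown(); // cancels further periodic runs
                onExpired.run();
            }
        }, 0, periodMs, TimeUnit.MILLISECONDS);
    }

    /** Stops polling without invoking the callback. */
    void stop() {
        scheduler.shutdownNow();
    }
}
```

In the flow described in the quote, the callback would record an error on the
runner and trigger a clean shutdown, and the runner would consult that state
when choosing its exit code. A real monitor would probably want to tolerate
transient failures before giving up.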

Design looks pretty elegant and easily portable.

Thanks!
Navina


On Wed, May 3, 2017 at 9:52 AM, Abhishek Shivanna  wrote:

> Hey Jagadish,
>
> Thank you for taking the time to review the design.
> I agree with moving the heartbeat into the LocalContainerRunner instead
> of fitting it into the SamzaContainer. I will update the SEP with the new
> design changes.
> Also agree with the changes to the configuration and choosing suitable
> defaults should be good enough.
>
> Thanks,
> Abhishek
>
>
>
> On Wed, Apr 26, 2017 at 3:23 PM, Jagadish Venkatraman <
> jagadish1...@gmail.com> wrote:
>
> > Hi Abhishek,
> >
> > Heartbeat between the AM and container has been a long awaited Samza
> > feature. It will go a long way in ensuring our reliability! +1 for this
> > SEP.
> >
> > *High level comments:*
> >
> > Currently, the only use-case for the heartbeat mechanism seems to be when
> > running Samza on Yarn. IMHO, it makes sense to pull the heart beat logic
> > into the *LocalContainerRunner* instead of baking it into the
> > *SamzaContainer* class. Long term, we can re-visit this when we have a
> > pluggable liveness detection mechanism.
> >
> > I'm thinking of a flow like this:
> >
> > There is a separate component (or a thread) inside LocalContainerRunner
> > that periodically polls the coordinator, and determines if it should
> > continue running. If the coordinator determines that the container should
> > not run, the *LocalContainerRunner* cleanly shuts-down the container and
> > the process exits with a non-zero exit status.
> >
> > The following nice properties fall out:
> >
> >- We can remove the proposed config *job.container.validator.enabled*.
> >- We can also remove the proposed *Killable* interface since
> >*SamzaContainer* (and runLoops) don't have to implement *Killable*
> >anymore. The life-cycle is managed by the *LocalContainerRunner* that
> >started it.
> >
> > *On the proposed public interfaces:*
> >
> > *job.container.validator.enabled:* I am not in favor of adding this as a
> > new public config. IIUC, when running Samza jobs on Yarn, we always want
> > the validator/heartbeats to be enabled. OTOH, when running Samza jobs in
> > standalone mode, we currently do not have a pluggable mechanism for
> > heartbeat.
> >
> > *job.container.schedule.ms:* It does
> > seem that we can pick a sensible default, and be done with it (instead of
> > adding a new config)? Is there a reason this needs to be configurable?
> >
> > *On proposed Killable interface: *
> >
> > Not entirely sure we need this new "*Killable*" interface (esp. given that
> > there's currently only one implementation - *SamzaContainer*).
> >
> >- The *LocalContainerRunner* can instead directly invoke shut-down on
> >the *SamzaContainer* when its heart-beat expires. The extra level of
> >indirection (making *SamzaContainer* implement *Killable*) is
> >probably unnecessary IMHO.
> >
> >
> >- 

Re: Review Request 58866: fixed SAMZA-1248. use processor id for stand alone barrier

2017-05-01 Thread Navina Ramesh via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58866/#review173529
---


Ship it!




- Navina Ramesh


On May 1, 2017, 11:24 p.m., Boris Shkolnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58866/
> ---
> 
> (Updated May 1, 2017, 11:24 p.m.)
> 
> 
> Review request for samza and Navina Ramesh.
> 
> 
> Bugs: SAMZA-1248
> https://issues.apache.org/jira/browse/SAMZA-1248
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> use processor id for stand alone barrier
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/java/org/apache/samza/zk/ZkJobCoordinator.java 2535654cee37feeb472517b8673a7bb12b3cc1fc
>   samza-core/src/main/java/org/apache/samza/zk/ZkUtils.java fee840511fbc19da2e19525a97fcfb5812a70a53
>   samza-core/src/test/java/org/apache/samza/zk/TestZkUtils.java b8dc2953ead2fb11fa22db5ec30b19a74a779830
> 
> 
> Diff: https://reviews.apache.org/r/58866/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Boris Shkolnik
> 
>



[GitHub] samza pull request #153: SAMZA-1251 - Remove DebounceTimer dependency from Z...

2017-05-01 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/153

SAMZA-1251 - Remove DebounceTimer dependency from ZkLeaderElector

Fyi: This PR depends on PR #148 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1251

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/153.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #153


commit d81cfdca4b84f02b86ab70657c8c4636e8902b9a
Author: Navina Ramesh 
Date:   2017-04-14T00:20:38Z

Remove containerId from SamzaContainer.apply

commit c4a10242b6e85345ed4515b98ec407435c1fdce1
Author: Navina Ramesh 
Date:   2017-04-14T00:22:01Z

Removing onBecomingLeader for readability

commit 56028361552b37a27991c6cac1f3e00cc3d3a0f2
Author: Navina Ramesh 
Date:   2017-04-14T00:26:01Z

Removing awaitStart

commit fd99fd65fb437afed04240b3971b3cefc1f52f1d
Author: Navina Ramesh 
Date:   2017-04-14T19:12:46Z

Added JobCoordinator Listener. Trying to remove Samzacontainer controller

commit e77aa502df74cedb87c50a8e039135975504381e
Author: Navina Ramesh 
Date:   2017-04-25T01:36:14Z

Adding ProcessorErrorHandler, SamzaContainerStatus, JobCoordinatorListener, 
SamzaContainerListener

commit 3cbf259c1e9fea7a4d24af93a812e75d9947aac8
Author: Navina Ramesh 
Date:   2017-04-26T23:01:24Z

Documenting state transitions for SamzaContainer

commit 679b2f54aa7a39c8dae688f9b446aa9bad9d267f
Author: Navina Ramesh 
Date:   2017-04-26T23:03:00Z

adding some log lines in LocalContainerRunner

commit 3b65cc983d7d734d6fdf2a81cb155fbad0e774b3
Author: Navina Ramesh 
Date:   2017-04-27T01:36:25Z

Fixed integration test failures by throwing the exception in the listener 
for ThreadJob

commit b1b61f58b2e06a1e7f5fc602fe9007d4c1a003a0
Author: Navina Ramesh 
Date:   2017-04-27T19:42:54Z

Added a few tests in TestSamzaContainer

commit a2db96924ebd479e2110fc611c86c3c310336212
Author: Navina Ramesh 
Date:   2017-04-27T23:38:29Z

Added test for sp.stop()

commit bc74cd5670aacfe5c4eae7968973e68f9f700876
Author: Navina Ramesh 
Date:   2017-04-28T00:23:32Z

Adding setContainerListener explicitly in SamzaContainer

commit 07adf3c6ce39a893a0995498bc012cf6c14c43be
Author: Navina Ramesh 
Date:   2017-04-28T02:38:24Z

Added documentation in JobCoordinator interface

commit 78a73540cc0cd84db286737b190c596dcde93d1f
Author: Navina Ramesh 
Date:   2017-04-28T02:43:00Z

Removed ProcessorErrorHandler

commit 5d1b28c6b566ca691a955d94bb1daf29a96737ef
Author: Navina Ramesh 
Date:   2017-04-28T02:45:47Z

Removing commetned out code

commit 42ffc7d6c1d5e657b35d9482df30f0e201bdbb27
Author: Navina Ramesh 
Date:   2017-04-28T05:00:40Z

Adding docs to JobCoordinatorListener

commit 5ff163cf19c85875a0e2a8d85682487186ffc6c5
Author: Navina Ramesh 
Date:   2017-04-28T05:36:02Z

Added javadocs for SamzaContainerListener

commit f3551656037a058aebd62e9f7dacaafeb49d2f94
Author: Navina Ramesh 
Date:   2017-04-28T06:33:49Z

Cleaning up StreamProcessor code and jobCoordinator docs

commit c624d75afd77dd028a4406d6e07d2ef801098b03
Author: Navina Ramesh 
Date:   2017-04-28T07:09:36Z

Fixing standaloneJobCoordinator

commit c116a3c55149a4cca738a66ec925569385568be9
Author: Navina Ramesh 
Date:   2017-04-28T07:14:04Z

Adding null checks on processorListener

commit 6f0715c4944255409bd78fd178c8e9976e60f485
Author: Navina Ramesh 
Date:   2017-04-28T07:32:37Z

Fixing ZkJobCoordinator

commit 73e993d65d8d3a5ce26af581055ac829981b9b7c
Author: Navina Ramesh 
Date:   2017-04-29T01:24:52Z

LeaderElector should explicitly register a listener. Cleaning up the 
ZkLeaderElector implementation

commit 25d0eddfbae2aacbe03751074bcdb035cdf6bd18
Author: Navina Ramesh 
Date:   2017-05-01T18:47:52Z

Moving LeaderElectorListener out of ZkController and refactoring 
ZkJobCoordinator to use LeaderElector as a library w/o ZkControlelr in the 
middle

commit 1a9533da71c92804aef5a4268afe08b92cc6572d
Author: Navina Ramesh 
Date:   2017-05-01T18:53:25Z

Adding some javadocs to ZkController and ZkControllerListener interface

commit c599fd1cef65ecbfe9c60815ace1411558cb4174
Author: Navina Ramesh 
Date:   2017-05-01T20:16:18Z

[Bug-fix] OnBecomeLeader and OnProcessorChange should be queued up under 
the same name






Re: Review Request 58866: fixed SAMZA-1248. use processor id for stand alone barrier

2017-04-28 Thread Navina Ramesh via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58866/#review173409
---




samza-core/src/main/java/org/apache/samza/zk/ZkJobCoordinator.java
Line 65 (original), 65 (patched)
<https://reviews.apache.org/r/58866/#comment246410>

Why is newJobModel useful? Please add some comments as it is not very 
obvious.


- Navina Ramesh


On April 28, 2017, 11:54 p.m., Boris Shkolnik wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/58866/
> ---
> 
> (Updated April 28, 2017, 11:54 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1248
> https://issues.apache.org/jira/browse/SAMZA-1248
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> use processor id for stand alone barrier
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/java/org/apache/samza/zk/ZkJobCoordinator.java 2535654cee37feeb472517b8673a7bb12b3cc1fc
>   samza-core/src/main/java/org/apache/samza/zk/ZkUtils.java fee840511fbc19da2e19525a97fcfb5812a70a53
>   samza-core/src/test/java/org/apache/samza/zk/TestZkUtils.java b8dc2953ead2fb11fa22db5ec30b19a74a779830
> 
> 
> Diff: https://reviews.apache.org/r/58866/diff/1/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Boris Shkolnik
> 
>



Re: Review Request 58851: SAMZA-1212 - Refactor interaction between StreamProcessor, JobCoordinator and SamzaContainer

2017-04-28 Thread Navina Ramesh via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58851/
---

(Updated April 28, 2017, 6:50 p.m.)


Review request for samza and Prateek Maheshwari.


Bugs: SAMZA-1212
https://issues.apache.org/jira/browse/SAMZA-1212


Repository: samza


Description
---

(Same as PR - https://github.com/apache/samza/pull/148)
See SAMZA-1212 for motivation toward this refactoring.

Changes here are:
- Removed awaitStart (blocking) method in StreamProcessor, JobCoordinator and 
SamzaContainer
- Introduced SamzaContainerListener and JobCoordinatorListener interface 
implemented by StreamProcessor
- Introduced SamzaContainerStatus to handle failures and lifecycle using
Listener interfaces
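
The move from a blocking awaitStart to listener callbacks can be sketched
roughly as follows. This is a minimal sketch, not the code under review: the
interface, enum and class names are hypothetical stand-ins, and the real
SamzaContainerListener and SamzaContainerStatus may look quite different.

```java
/** Hypothetical listener; stands in for the listener interfaces named above. */
interface ContainerListenerSketch {
    void onStart();
    void onStop();
    void onFailure(Throwable cause);
}

/** Hypothetical status values for the sketch. */
enum ContainerStatusSketch { NOT_STARTED, STARTED, STOPPED, FAILED }

/** Runs some work and reports lifecycle changes through the listener. */
final class ContainerSketch {
    private volatile ContainerStatusSketch status = ContainerStatusSketch.NOT_STARTED;
    private final ContainerListenerSketch listener;
    private final Runnable work;

    ContainerSketch(ContainerListenerSketch listener, Runnable work) {
        this.listener = listener;
        this.work = work;
    }

    void run() {
        status = ContainerStatusSketch.STARTED;
        listener.onStart(); // the caller is notified instead of blocking on awaitStart
        try {
            work.run();
        } catch (RuntimeException e) {
            status = ContainerStatusSketch.FAILED;
            listener.onFailure(e);
            return;
        }
        status = ContainerStatusSketch.STOPPED;
        listener.onStop();
    }

    ContainerStatusSketch status() {
        return status;
    }
}
```

The party that owns the container (StreamProcessor in the description above)
would implement the listener and react to start, stop and failure events
rather than waiting on a blocking call.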


Diffs
-

  samza-core/src/main/java/org/apache/samza/SamzaContainerStatus.java PRE-CREATION
  samza-core/src/main/java/org/apache/samza/coordinator/JobCoordinator.java af2ef6a0338a0f0ab015e615a5dc213941095801
  samza-core/src/main/java/org/apache/samza/coordinator/JobCoordinatorFactory.java 7f7e1ede822cf16b78e6e753ebc083a17ebf2aca
  samza-core/src/main/java/org/apache/samza/processor/JobCoordinatorListener.java PRE-CREATION
  samza-core/src/main/java/org/apache/samza/processor/SamzaContainerController.java 4af413a14aaa3976f45b0646a3feb745ea3f0e97
  samza-core/src/main/java/org/apache/samza/processor/SamzaContainerListener.java PRE-CREATION
  samza-core/src/main/java/org/apache/samza/processor/StreamProcessor.java 191059443e3d65869207a5f1e11526f97833f468
  samza-core/src/main/java/org/apache/samza/processor/StreamProcessorLifecycleListener.java 7bca074a4d83bb9bc2434b6769ecf39c5694e2f9
  samza-core/src/main/java/org/apache/samza/runtime/LocalContainerRunner.java 80350dfc02b577faf0dce00cf5695c23d202ad9c
  samza-core/src/main/java/org/apache/samza/standalone/StandaloneJobCoordinator.java 0d74fb82590ba6f183905c9b0328b16d88adc0ab
  samza-core/src/main/java/org/apache/samza/standalone/StandaloneJobCoordinatorFactory.java 0faeca917aa5fb12acef9fb539d81a01255a0441
  samza-core/src/main/java/org/apache/samza/zk/ZkBarrierForVersionUpgrade.java 0afd840dc2083dc78b853423f27776d6b5a2538f
  samza-core/src/main/java/org/apache/samza/zk/ZkControllerImpl.java 61f78762a3a1a50687ec00f783685f53d17bd645
  samza-core/src/main/java/org/apache/samza/zk/ZkJobCoordinator.java 2535654cee37feeb472517b8673a7bb12b3cc1fc
  samza-core/src/main/java/org/apache/samza/zk/ZkJobCoordinatorFactory.java a44565c083dc73b0f5d56174d82e9ae62136cf02
  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 8481c92b5666710edd8381526f824daed4dd27c5
  samza-core/src/main/scala/org/apache/samza/job/local/ThreadJobFactory.scala dcef3af45bf5fe139be7744276adaddac3fb3505
  samza-core/src/test/java/org/apache/samza/processor/TestStreamProcessor.java PRE-CREATION
  samza-core/src/test/scala/org/apache/samza/container/TestSamzaContainer.scala 010ff7e85ff1c5e507f3e9fa7d6c196b58d929ab
  samza-core/src/test/scala/org/apache/samza/processor/StreamProcessorTestUtils.scala PRE-CREATION
  samza-kafka/src/test/java/org/apache/samza/system/kafka/TestKafkaSystemAdminJava.java a786468722cc49b4b6c3c67d89a6b09f1be4c939
  samza-test/src/test/java/org/apache/samza/test/processor/TestStreamProcessor.java f37a224f64eec162e60e3a891b257175dbf4ec3c
  samza-test/src/test/scala/org/apache/samza/test/integration/StreamTaskTestUtil.scala 29fb6d3f6e07f356d4a25556221fa76ecdc7bf77


Diff: https://reviews.apache.org/r/58851/diff/1/


Testing
---

unit tests and ./gradlew clean build


Thanks,

Navina Ramesh



Review Request 58851: SAMZA-1212 - Refactor interaction between StreamProcessor, JobCoordinator and SamzaContainer

2017-04-28 Thread Navina Ramesh via Review Board

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/58851/
---

Review request for samza and Prateek Maheshwari.


Repository: samza


Description
---

(Same as PR - https://github.com/apache/samza/pull/148)
See SAMZA-1212 for motivation toward this refactoring.

Changes here are:
- Removed awaitStart (blocking) method in StreamProcessor, JobCoordinator and 
SamzaContainer
- Introduced SamzaContainerListener and JobCoordinatorListener interface 
implemented by StreamProcessor
- Introduced SamzaContainerStatus to handle failures and lifecycle using
Listener interfaces


Diffs
-

  samza-core/src/main/java/org/apache/samza/SamzaContainerStatus.java PRE-CREATION
  samza-core/src/main/java/org/apache/samza/coordinator/JobCoordinator.java af2ef6a0338a0f0ab015e615a5dc213941095801
  samza-core/src/main/java/org/apache/samza/coordinator/JobCoordinatorFactory.java 7f7e1ede822cf16b78e6e753ebc083a17ebf2aca
  samza-core/src/main/java/org/apache/samza/processor/JobCoordinatorListener.java PRE-CREATION
  samza-core/src/main/java/org/apache/samza/processor/SamzaContainerController.java 4af413a14aaa3976f45b0646a3feb745ea3f0e97
  samza-core/src/main/java/org/apache/samza/processor/SamzaContainerListener.java PRE-CREATION
  samza-core/src/main/java/org/apache/samza/processor/StreamProcessor.java 191059443e3d65869207a5f1e11526f97833f468
  samza-core/src/main/java/org/apache/samza/processor/StreamProcessorLifecycleListener.java 7bca074a4d83bb9bc2434b6769ecf39c5694e2f9
  samza-core/src/main/java/org/apache/samza/runtime/LocalContainerRunner.java 80350dfc02b577faf0dce00cf5695c23d202ad9c
  samza-core/src/main/java/org/apache/samza/standalone/StandaloneJobCoordinator.java 0d74fb82590ba6f183905c9b0328b16d88adc0ab
  samza-core/src/main/java/org/apache/samza/standalone/StandaloneJobCoordinatorFactory.java 0faeca917aa5fb12acef9fb539d81a01255a0441
  samza-core/src/main/java/org/apache/samza/zk/ZkBarrierForVersionUpgrade.java 0afd840dc2083dc78b853423f27776d6b5a2538f
  samza-core/src/main/java/org/apache/samza/zk/ZkControllerImpl.java 61f78762a3a1a50687ec00f783685f53d17bd645
  samza-core/src/main/java/org/apache/samza/zk/ZkJobCoordinator.java 2535654cee37feeb472517b8673a7bb12b3cc1fc
  samza-core/src/main/java/org/apache/samza/zk/ZkJobCoordinatorFactory.java a44565c083dc73b0f5d56174d82e9ae62136cf02
  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 8481c92b5666710edd8381526f824daed4dd27c5
  samza-core/src/main/scala/org/apache/samza/job/local/ThreadJobFactory.scala dcef3af45bf5fe139be7744276adaddac3fb3505
  samza-core/src/test/java/org/apache/samza/processor/TestStreamProcessor.java PRE-CREATION
  samza-core/src/test/scala/org/apache/samza/container/TestSamzaContainer.scala 010ff7e85ff1c5e507f3e9fa7d6c196b58d929ab
  samza-core/src/test/scala/org/apache/samza/processor/StreamProcessorTestUtils.scala PRE-CREATION
  samza-kafka/src/test/java/org/apache/samza/system/kafka/TestKafkaSystemAdminJava.java a786468722cc49b4b6c3c67d89a6b09f1be4c939
  samza-test/src/test/java/org/apache/samza/test/processor/TestStreamProcessor.java f37a224f64eec162e60e3a891b257175dbf4ec3c
  samza-test/src/test/scala/org/apache/samza/test/integration/StreamTaskTestUtil.scala 29fb6d3f6e07f356d4a25556221fa76ecdc7bf77


Diff: https://reviews.apache.org/r/58851/diff/1/


Testing
---

unit tests and ./gradlew clean build


Thanks,

Navina Ramesh



[GitHub] samza pull request #148: SAMZA-1212 - Refactor interaction between StreamPro...

2017-04-28 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/148

SAMZA-1212 - Refactor interaction between StreamProcessor, JobCoordinator 
and SamzaContainer

See SAMZA-1212 for motivation toward this refactoring.
Changes here are:
* Removed awaitStart (blocking) method in StreamProcessor, JobCoordinator 
and SamzaContainer
* Introduced SamzaContainerListener and JobCoordinatorListener interface 
implemented by StreamProcessor
* Introduced SamzaContainerStatus to handle failures and lifecycle using
Listener interfaces

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1212

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/148.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #148


commit d81cfdca4b84f02b86ab70657c8c4636e8902b9a
Author: Navina Ramesh 
Date:   2017-04-14T00:20:38Z

Remove containerId from SamzaContainer.apply

commit c4a10242b6e85345ed4515b98ec407435c1fdce1
Author: Navina Ramesh 
Date:   2017-04-14T00:22:01Z

Removing onBecomingLeader for readability

commit 56028361552b37a27991c6cac1f3e00cc3d3a0f2
Author: Navina Ramesh 
Date:   2017-04-14T00:26:01Z

Removing awaitStart

commit fd99fd65fb437afed04240b3971b3cefc1f52f1d
Author: Navina Ramesh 
Date:   2017-04-14T19:12:46Z

Added JobCoordinator Listener. Trying to remove Samzacontainer controller

commit e77aa502df74cedb87c50a8e039135975504381e
Author: Navina Ramesh 
Date:   2017-04-25T01:36:14Z

Adding ProcessorErrorHandler, SamzaContainerStatus, JobCoordinatorListener, 
SamzaContainerListener

commit 3cbf259c1e9fea7a4d24af93a812e75d9947aac8
Author: Navina Ramesh 
Date:   2017-04-26T23:01:24Z

Documenting state transitions for SamzaContainer

commit 679b2f54aa7a39c8dae688f9b446aa9bad9d267f
Author: Navina Ramesh 
Date:   2017-04-26T23:03:00Z

adding some log lines in LocalContainerRunner

commit 3b65cc983d7d734d6fdf2a81cb155fbad0e774b3
Author: Navina Ramesh 
Date:   2017-04-27T01:36:25Z

Fixed integration test failures by throwing the exception in the listener 
for ThreadJob

commit b1b61f58b2e06a1e7f5fc602fe9007d4c1a003a0
Author: Navina Ramesh 
Date:   2017-04-27T19:42:54Z

Added a few tests in TestSamzaContainer

commit a2db96924ebd479e2110fc611c86c3c310336212
Author: Navina Ramesh 
Date:   2017-04-27T23:38:29Z

Added test for sp.stop()

commit bc74cd5670aacfe5c4eae7968973e68f9f700876
Author: Navina Ramesh 
Date:   2017-04-28T00:23:32Z

Adding setContainerListener explicitly in SamzaContainer

commit 07adf3c6ce39a893a0995498bc012cf6c14c43be
Author: Navina Ramesh 
Date:   2017-04-28T02:38:24Z

Added documentation in JobCoordinator interface

commit 78a73540cc0cd84db286737b190c596dcde93d1f
Author: Navina Ramesh 
Date:   2017-04-28T02:43:00Z

Removed ProcessorErrorHandler

commit 5d1b28c6b566ca691a955d94bb1daf29a96737ef
Author: Navina Ramesh 
Date:   2017-04-28T02:45:47Z

Removing commetned out code

commit 42ffc7d6c1d5e657b35d9482df30f0e201bdbb27
Author: Navina Ramesh 
Date:   2017-04-28T05:00:40Z

Adding docs to JobCoordinatorListener

commit 5ff163cf19c85875a0e2a8d85682487186ffc6c5
Author: Navina Ramesh 
Date:   2017-04-28T05:36:02Z

Added javadocs for SamzaContainerListener

commit f3551656037a058aebd62e9f7dacaafeb49d2f94
Author: Navina Ramesh 
Date:   2017-04-28T06:33:49Z

Cleaning up StreamProcessor code and jobCoordinator docs

commit c624d75afd77dd028a4406d6e07d2ef801098b03
Author: Navina Ramesh 
Date:   2017-04-28T07:09:36Z

Fixing standaloneJobCoordinator

commit c116a3c55149a4cca738a66ec925569385568be9
Author: Navina Ramesh 
Date:   2017-04-28T07:14:04Z

Adding null checks on processorListener

commit 6f0715c4944255409bd78fd178c8e9976e60f485
Author: Navina Ramesh 
Date:   2017-04-28T07:32:37Z

Fixing ZkJobCoordinator






[GitHub] samza pull request #146: SAMZA-1224 : Revert job coordinator factory config ...

2017-04-27 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/146

SAMZA-1224 : Revert job coordinator factory config to the old format

We didn't release since adding this config. So, it is ok to change the 
format now.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1224

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/146.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #146


commit 29119b09374383be0c6bae8d0f1b6e3cc9454d57
Author: nramesh 
Date:   2017-04-27T21:31:55Z

SAMZA-1224 : Revert job coordinator factory config to the old format






[GitHub] samza pull request #139: SAMZA-1220 : Add thread name to SamzaContainer shut...

2017-04-25 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/139

SAMZA-1220 : Add thread name to SamzaContainer shutdown hook and prevent 
shutdown deadlock

* SamzaContainerExceptionHandler is written in Java and used by 
LocalContainerRunner.java

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1220

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/139.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #139


commit 6683ea958903b3fc2b1546592db1a7818bf933d9
Author: nramesh 
Date:   2017-04-24T23:51:06Z

Rewriting SamzaContainerExceptionHandler in java and using it in 
LocalContainerRunner






Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-04-21 Thread Navina Ramesh
Hey Yi,
Thanks a lot for your work on this document. I know it must have been
crazy trying to put everything together in a single doc :)

Here are my comments. Sorry about the delay :(

1. It will be useful to set some background for the benefit of the
community members who haven't been following design docs in the JIRAs. Can
you briefly explain the definition of StreamApplication and how it
translates to jobs through the stack?

2. "Problem" section doesn't seem to describe any problem that
ApplicationRunner is solving :) Imo, ApplicationRunner basically provides a
unified programming pattern for the user to execute StreamApplications
defined using fluent-api or task-level API. I think the problem and
motivation section can use a little bit of re-wording.

3. In the "Overview of ApplicationRunner" section:
* How the components within ApplicationRunner interact isn't very obvious
from the overview image. For example, ExecutionPlanner translates a
"StreamApplication" into an "ExecutionPlan" which is essentially a
specification of the DAG. (Please correct me, if I am wrong here!). The
ExecutionPlan is used by the JobRunner to launch Samza jobs.
* The roles of ExecutionPlanner and JobRunner are fairly well-defined.
StreamManager seems like a util class that helps class-load systems and
create streams. The ExecutionPlan will be consumed by JobRunner and
JobRunner will use StreamManager to create intermediate streams, prior to
launching jobs. It doesn't sound like a StreamManager is a "component" of
the ApplicationRunner.
* What is the role of the RuntimeEnvironment? That has not been explained.
Maybe explaining that will fill the gap in understanding for the readers. I
see that you have tried to explain the flow of control in the code using
the sequence diagram. Perhaps, if we can articulate the
roles/responsibilities of the RuntimeEnvironment, there will not be a need
for the control flow diagram.

4. How is the runtime environment defined by the user? Is it configurable?
Answering these questions in the doc will be useful.

5. In the "Interaction between RuntimeEnvironment and ApplicationRunners"
section:
* The Samza container is interacting with the RuntimeEnvironment. Does that
make the RuntimeEnvironment a shared component between the
LocalApplicationRunner and the SamzaContainer? It doesn't seem to be the
case for RemoteApplicationRunner. So, I am confused as to why it is
different.

6. In general, what does "app.class" config represent?  It seems
straightforward when a "StreamApplication" is defined. Is it applicable
when using low-level task api?

7. Interface definitions:
* Perhaps when you implement this, can you specifically call out in the
javadoc whether each method is blocking or not?

8. Minor nit-picking:
* "ApplicationRunners in Different Execution Environments" -> should it be
RuntimeEnvironments, as that is the terminology used in the rest of the
document?
* In the "How this works in standalone deployment" section:
* "Deploy the application to standalone hosts" and *Run run-local-app.sh on
each node to start/stop the local application* are probably just a single
step - Deploy the application to standalone hosts using run-local-app.sh??
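
A minimal sketch of the flow described in comment 3, assuming that reading of the doc is right: a planner turns the application into a plan, and a runner creates the intermediate streams before it launches any job. Every type and name here (PlanSketch, plan, run, the "my-app" stream and job names) is a hypothetical stand-in for illustration and is not the API proposed in the SEP.

```java
import java.util.ArrayList;
import java.util.List;

public class ApplicationRunnerFlowSketch {

    /** Hypothetical stand-in for an ExecutionPlan: the DAG reduced to two lists. */
    static final class PlanSketch {
        final List<String> intermediateStreams;
        final List<String> jobs;

        PlanSketch(List<String> intermediateStreams, List<String> jobs) {
            this.intermediateStreams = intermediateStreams;
            this.jobs = jobs;
        }
    }

    /** Hypothetical planner: one job per stage, one intermediate stream between adjacent stages. */
    static PlanSketch plan(String appName, int stages) {
        List<String> streams = new ArrayList<>();
        List<String> jobs = new ArrayList<>();
        for (int i = 0; i < stages; i++) {
            jobs.add(appName + "-job-" + i);
            if (i < stages - 1) {
                streams.add(appName + "-intermediate-" + i);
            }
        }
        return new PlanSketch(streams, jobs);
    }

    /** Hypothetical runner: creates all intermediate streams first, then launches the jobs. */
    static List<String> run(PlanSketch plan) {
        List<String> actions = new ArrayList<>();
        for (String stream : plan.intermediateStreams) {
            actions.add("create-stream:" + stream);
        }
        for (String job : plan.jobs) {
            actions.add("launch-job:" + job);
        }
        return actions;
    }

    public static void main(String[] args) {
        // Prints the ordered actions for a two-stage application.
        run(plan("my-app", 2)).forEach(System.out::println);
    }
}
```

In this sketch the stream creation is just a helper step inside run(), which is the point of the StreamManager remark in comment 3: it behaves like a utility used by the runner rather than a separate component.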


General question:
It seems like, even with extensive changes to the interfaces/programming
model, we are still class loading the components for the most part. In such a
world, we are not close to integrating with frameworks that already have a
lifecycle model and can provide instantiated objects directly. For example,
in the Samza as a library use-case, it makes sense for the user to provide
a JmxServer or a taskFactory or a custom metricReporter for the
StreamProcessor. One of the motivations for this case was that most
applications are already running within a servlet/jetty container model
with its own lifecycle. If ApplicationRunner(s) is the unified interface,
doesn't that prohibit Samza from being integrated with such frameworks?

Thanks!
Navina

On Thu, Apr 20, 2017 at 10:06 AM, Jacob Maes  wrote:

> Thanks for the SEP!
>
> +1 on introducing these new components
> -1 on the current definition of their roles (see Design feedback below)
>
> *Design*
>
>- If LocalJobRunner and RemoteJobRunner handle the different methods of
>launching a Job, what additional value do the different types of
>ApplicationRunner and RuntimeEnvironment provide? It seems like a red
> flag
>that all 3 would need to change from environment to environment. It
>indicates that they don't have proper modularity. The
> call-sequence-figures
>support this; LocalApplicationRunner and RemoteApplicationRunner make
> the
>same calls and the diagram only varies after jobRunner.start()
>- As far as I can tell, the only difference between Local and Remote
>

Re: [VOTE] Samza Logo

2017-04-18 Thread Navina Ramesh
@Renato:
Thanks for your feedback. Always appreciate help from our contributors :)

@Jagadish:
Thanks for stating your points. It is clear that you are against the
butterfly ones. But are you in support of any of the others? Please vote :)

Thanks!
Navina

On Tue, Apr 18, 2017 at 10:38 AM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> FWIW, I have a contrarian perspective on this one. Here's my 2 cents:
>
> I'm -1 on our logo having anything to do with a butterfly.
>
> - Samza and Kafka are separate top-level projects. I do not think the
> connection to Franz Kafka's novel on "metamorphosis", and the fact that a
> salesman named "Samsa" in the novel was transformed to an "insect" should
> dictate our logo.  Agreed, the butterfly is a cute insect remotely
> relatable(?) to stream processing via a convoluted story.
>
> - For a choice of a mascot, I'd much rather have something that signifies
> scale, sturdiness or swiftness instead of a cute butterfly :-)
>
> - The 2 other non-butterfly logos at least have a "node", "stream",
> "edges", "graph" like feel which I like.
>
> Thanks,
> Jagadish
>
> On Sat, Apr 15, 2017 at 12:24 PM, Jacob Maes  wrote:
>
> > I think I voted the exact opposite to everyone else in this thread.
> >
> > I don't want anything to do with a butterfly. The metaphor is even
> further
> > removed from the Samsa story than a cockroach, so I think we should give
> up
> > on that. I don't want a mascot; we're not building a university football
> > team. And as animals go, the only slower one I can think of is a sloth,
> so
> > I don't feel a butterfly says "scalable stream processing". This,
> combined
> > with my preference to eschew the color red for logos, puts the red
> > butterfly last.
> >
> > The blue butterfly is a little more abstract and formal looking, but
> still
> > a butterfly, so that is second to last.
> >
> > The other 2 are very close, in my opinion.
> >
> > The one with the circles is reminiscent of orbital loops, which gives me
> > the feeling of scale. It also has the dots at varying places along the
> > lines, which to me conveys the different proportions of input/output
> stream
> > sizes/TTLs. And the cyclical shape could also be used for animations
> > portraying the concept of "reprocessing"
> >
> > The one with the "S" dots reminds me of the Kafka logo without the lines.
> > If the lines are the streams and the dots are processing nodes, then I
> > think it's clever for the Samza logo to be a "negative" of the Kafka one.
> > That's not to say samza is any more related to Kafka than it is; but if
> the
> > Kafka logo says "streams" then to me this Samza logo says "processors"
> >
> > My 2 cents.
> >
> > On Fri, Apr 14, 2017 at 11:05 PM, Ignacio Solis  wrote:
> >
> > > You're making me feel bad for linking that one! :-)
> > >
> > > I don't see it as a maze. To me, that one is like circles that turn,
> > > representing the processing. Like cogs on an engine. The little
> > > circles are like the messages. The concentric circles are like the
> > > streams.
> > >
> > > The red butterfly is my second favorite.
> > >
> > > Vote note:
> > >
> > > Once we close voting we'll look at the actual results.  The way the
> > > ranking gets calculated, if you don't vote for a design at all, that
> > > vote does not get factored in. It assumes you have no opinion. So if
> > > somebody votes 5 stars on A and 1 star on B, and a second person only
> > > votes 5 stars on B, then the ranking would be  A-5 stars, B-3 stars.
> > > (or something along those lines).   So if you only vote on the 5 star
> > > ones, you're missing your vote on the ones you don't like.
> > >
> > > So, once we close, we'll see how people voted.
> > >
> > > Nacho
> > >
> > >
> > > On Fri, Apr 14, 2017 at 9:21 PM, Yi Pan  wrote:
> > > > Really? The one with the maze on the left currently is top one? I
> can't
> > > > relate to that either. My favorite was the logo w/ Taiji symbol.
> Since
> > > that
> > > > did not make the top 4, I am voting for the red butterfly one, same
> as
> > > > Navina.
> > > >
> > > > -Yi
> > > >
> > > > On Fri, Apr 14, 2017 at 3:33 PM, Navina Ramesh
>
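
The vote note quoted in this thread boils down to averaging only the ratings that were actually cast, so a missing rating is not counted as a zero. Here is a minimal sketch of that arithmetic, assuming a plain mean (the note itself says "or something along those lines", so the real DesignCrowd formula may differ). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

public class StarRankingSketch {

    /** Mean of the ratings that were cast; designs a voter skipped contribute nothing. */
    static OptionalDouble averageOfCastVotes(List<Integer> stars) {
        return stars.stream().mapToInt(Integer::intValue).average();
    }

    public static void main(String[] args) {
        // Voter 1 rates A=5 and B=1. Voter 2 rates only B=5 and leaves A unrated.
        List<Integer> ratingsForA = Arrays.asList(5);
        List<Integer> ratingsForB = Arrays.asList(1, 5);
        System.out.println("A: " + averageOfCastVotes(ratingsForA).getAsDouble()); // A: 5.0
        System.out.println("B: " + averageOfCastVotes(ratingsForB).getAsDouble()); // B: 3.0
    }
}
```

Under this assumption, rating only your favorite leaves the other designs' averages untouched, which is why voting on the ones you dislike also matters.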

[GitHub] samza pull request #125: SAMZA-1213 - StreamProcessorLifeCycleAware interfac...

2017-04-17 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/125

SAMZA-1213 - StreamProcessorLifeCycleAware interface should not use 
processorId

Refactoring LocalApplicationRunner s.t. each processor has its own listener 
instance, instead of a single listener keeping track of all processors.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1213

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/125.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #125


commit 0c71815c6906300b237ae4b6d8f0ab3ce810ffcb
Author: Navina Ramesh 
Date:   2017-04-18T01:12:36Z

SAMZA-1213 - StreamProcessorLifeCycleAware interface should not use 
processorId






Re: [VOTE] Samza Logo

2017-04-14 Thread Navina Ramesh
I prefer to have open discussions on the official mailing list or JIRA
since it is an open community. It also helps track the discussions.

Fwiw, I am in favor of the red themed butterfly design because:
1. Knowing the origin of the name "Samza" (from Gregor Samsa character in
"Metamorphosis"), it isn't very far-fetched in terms of relating stream
processing to some kind of transformation. Butterfly is probably the
prettiest insect to associate with "metamorphosis", without giving the
impression of a bug :)
2. Red theme ties it with the current logo, although we can improvise on
the gradients.
3. We can have a "mascot" , instead of an abstract symbol.

One comment on the butterfly one - it seems to have only 1 antenna.

-1 for the dots only logo. It feels like a color-blindness test :D
-1 for the blue-based logo - it is just not relatable and it's an extreme
change from the current one.

I couldn't relate to the circular one. What are we trying to portray/imply
here? That we are a bunch of disconnected links?

Thanks!
Navina

On Fri, Apr 14, 2017 at 3:16 PM, Ignacio Solis  wrote:

> Vote directly at design crowd.  But feel free to leave comments here,
> maybe you can try to persuade people or argue for your favorite. :-)
>
> Nacho
>
> On Fri, Apr 14, 2017 at 2:31 PM, Navina Ramesh
>  wrote:
> > Hi Nacho,
> > Do you want us to vote on this mail thread or directly at design crowd?
> >
> > Thanks!
> > Navina
> >
> > On Fri, Apr 14, 2017 at 2:19 PM, Ignacio Solis  wrote:
> >
> >> Hi folks.
> >>
> >> After some feedback and culling, we are down to 4 candidates.  Please
> >> vote on your favorite designs. We will be able to make minor
> >> modifications to the selected logo as we talk to the designer.  We can
> >> always have changes in colors or fonts.
> >>
> >> http://www.designcrowd.com/vote/apachesamzalogo
> >>
> >> This poll will stay open for about a week to collect all votes and
> >> comments.
> >>
> >> For completeness, the relevant JIRA is this:
> >> https://issues.apache.org/jira/browse/SAMZA-
> >>
> >> Cheers,
> >>
> >> Nacho
> >>
> >> --
> >> Nacho - Ignacio Solis - iso...@igso.net
> >>
> >
> >
> >
> > --
> > Navina R.
>
>
>
> --
> Nacho - Ignacio Solis - iso...@igso.net
>



-- 
Navina R.


Re: [VOTE] Samza Logo

2017-04-14 Thread Navina Ramesh
Hi Nacho,
Do you want us to vote on this mail thread or directly at design crowd?

Thanks!
Navina

On Fri, Apr 14, 2017 at 2:19 PM, Ignacio Solis  wrote:

> Hi folks.
>
> After some feedback and culling, we are down to 4 candidates.  Please
> vote on your favorite designs. We will be able to make minor
> modifications to the selected logo as we talk to the designer.  We can
> always have changes in colors or fonts.
>
> http://www.designcrowd.com/vote/apachesamzalogo
>
> This poll will stay open for about a week to collect all votes and
> comments.
>
> For completeness, the relevant JIRA is this:
> https://issues.apache.org/jira/browse/SAMZA-
>
> Cheers,
>
> Nacho
>
> --
> Nacho - Ignacio Solis - iso...@igso.net
>



-- 
Navina R.


[GitHub] samza pull request #121: SAMZA-1208 - IllegalFormatConversionException in Lo...

2017-04-12 Thread navina
GitHub user navina reopened a pull request:

https://github.com/apache/samza/pull/121

SAMZA-1208 - IllegalFormatConversionException in LocalContainerRunner



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1208

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/121.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #121


commit 2ab3fe6386cd7bdd8124ac7e368b8e54d53cb270
Author: Navina Ramesh 
Date:   2017-04-12T22:30:05Z

SAMZA-1208 - IllegalFormatConversionException in LocalContainerRunner






[GitHub] samza pull request #121: SAMZA-1208 - IllegalFormatConversionException in Lo...

2017-04-12 Thread navina
Github user navina closed the pull request at:

https://github.com/apache/samza/pull/121




[GitHub] samza pull request #121: SAMZA-1208 - IllegalFormatConversionException in Lo...

2017-04-12 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/121

SAMZA-1208 - IllegalFormatConversionException in LocalContainerRunner



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1208

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/121.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #121


commit 2ab3fe6386cd7bdd8124ac7e368b8e54d53cb270
Author: Navina Ramesh 
Date:   2017-04-12T22:30:05Z

SAMZA-1208 - IllegalFormatConversionException in LocalContainerRunner






Re: Samza Logo designs

2017-04-10 Thread Navina Ramesh
Hi Nacho,
I rated on the designcrowd website directly. In terms of feedback:
1. I like the concept of using a butterfly to indicate "metamorphosis".
However, a lot of the designs there look like a bow-tie :)
2. I think we should stick with a red theme for the name.

Sorry about the late response.

Thanks!
Navina



On Sun, Apr 9, 2017 at 1:11 PM, Ignacio Solis  wrote:

> I'll leave this open one more day for feedback. Then wait a couple of
> days for redesigns, then do a final round.
>
> On Fri, Apr 7, 2017 at 1:11 PM, Ignacio Solis  wrote:
> > Hi folks.
> >
> > I started a Designcrowd campaign for a logo.  I got some initial
> > designs.  I would like some feedback - voting.
> >
> > This is NOT a final design, just a way to get some more feedback for
> > the designers.  We do want to get a design selected before the next
> > release.
> >
> > Please provide feedback to individual designs if you want (along with
> > voting), and provide general feedback here on this mailing list.
> >
> > Things that would be helpful include design ideas, concepts, themes, etc.
> >
> > http://www.designcrowd.com/vote/samza-logo-poll-phase1
> >
> > For completeness, the relevant JIRA is this:
> > https://issues.apache.org/jira/browse/SAMZA-
> >
> > Cheers,
> >
> > Nacho
> >
> > --
> > Nacho - Ignacio Solis - iso...@igso.net
>
>
>
> --
> Nacho - Ignacio Solis - iso...@igso.net
>



-- 
Navina R.


[GitHub] samza pull request #112: Samza-1187 : TestZkProcessorLatch tests share state...

2017-04-05 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/112

Samza-1187 : TestZkProcessorLatch tests share state causing transient 
failures in CI builds

* Removed testSingleCountdown - didn't understand what was being tested
* The timeout behavior for ZkProcessorLatch has not been documented. Hence, 
I have fixed testLatchExpires as best I could understand. 
* Removed redundant/unused member variables from ZkProcessorLatch

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1187

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/112.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #112


commit 2d2b167b63d91793988f63ac6be44eebf9294c20
Author: navina 
Date:   2017-04-04T21:58:15Z

Fixing testLatchSizeOne

commit 2675d7e1be0eb21849382e456ae2099b53dd02e5
Author: navina 
Date:   2017-04-04T22:18:13Z

Fixed testLatchSizeOneWithTwoParticipants and remove ZkConfig from 
ZkProcessorLatch constructor

commit 923cbb3aec085f39ffe532b49c2ef02f428b6b09
Author: navina 
Date:   2017-04-04T23:14:50Z

Fixed testLatchSizeN

commit efdc14a55f99deea7061ce97adac919a82c7f3e8
Author: navina 
Date:   2017-04-04T23:19:30Z

Changing certain member variables to local variables

commit fc7b23378d32d59e2890522d141505fec66d4179
Author: navina 
Date:   2017-04-04T23:24:25Z

Renaming processorId member variable to participantId

commit 76e6bae0d506f793ea38b5d159a1d48596e98890
Author: navina 
Date:   2017-04-04T23:46:33Z

Fixing testLatchExpires

commit ed62f9d6e764c9f3655ef4b9c147496ab4fca96c
Author: navina 
Date:   2017-04-04T23:49:15Z

Changing sop to log statements

commit 0631d9bc273146fec9c6fb3df897e27a50a51a17
Author: navina 
Date:   2017-04-05T03:09:48Z

Fixing checkstyle

commit 7d39906a963168a67059f822192e2dd15ccfad45
Author: navina 
Date:   2017-04-05T03:19:02Z

Refactoring runnable code

commit 82afa27a62ce5742a1796dd1fb23977b014fefb6
Author: navina 
Date:   2017-04-05T06:56:29Z

adjusting spacing






[RESULT] [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-04-03 Thread Navina Ramesh (Apache)
Hi everyone,

The vote on SEP-1 passes with 7 +1 votes (3 binding) and no -1.

Votes are as follows:
+1 (binding) - Navina Ramesh, Yi Pan, Yan Fang
+1 (non-binding) - Boris Shkolnik, Xinyu Liu, Renato Marroquin Mogrovejo,
Ignacio Solis
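
For anyone who wants to double-check the count, the tally can be reproduced with a small sketch. The voter names and binding flags are taken from the list above. The passing rule in passes() is an assumption modeled on the usual Apache majority-approval convention (at least three binding +1 votes and more binding +1 than -1) and is not something stated in the SEP process itself. All class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class VoteTallySketch {

    /** Counts how many of the recorded +1 votes are binding. */
    static long countBinding(Map<String, Boolean> plusOnes) {
        return plusOnes.values().stream().filter(binding -> binding).count();
    }

    /** Assumed rule: at least three binding +1 votes and more binding +1 than binding -1. */
    static boolean passes(long bindingPlusOnes, long bindingMinusOnes) {
        return bindingPlusOnes >= 3 && bindingPlusOnes > bindingMinusOnes;
    }

    /** The +1 votes listed in the result mail: voter name -> whether the vote is binding. */
    static Map<String, Boolean> sep1Votes() {
        Map<String, Boolean> votes = new LinkedHashMap<>();
        votes.put("Navina Ramesh", true);
        votes.put("Yi Pan", true);
        votes.put("Yan Fang", true);
        votes.put("Boris Shkolnik", false);
        votes.put("Xinyu Liu", false);
        votes.put("Renato Marroquin Mogrovejo", false);
        votes.put("Ignacio Solis", false);
        return votes;
    }

    public static void main(String[] args) {
        Map<String, Boolean> votes = sep1Votes();
        long binding = countBinding(votes);
        // No -1 votes were recorded, so the second argument is 0.
        System.out.println(votes.size() + " +1 votes, " + binding
                + " binding, passes=" + passes(binding, 0));
    }
}
```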

The following are the discuss and vote mail threads:
DISCUSS mail thread -
http://mail-archives.apache.org/mod_mbox/samza-dev/201703.mbox/%3CCANazzuuHiO%3DvZQyFbTiYU-0Sfh3riK%3Dz4j_AdCicQ8rBO%3DXuYQ%40mail.gmail.com%3E

VOTE mail thread -
http://mail-archives.apache.org/mod_mbox/samza-dev/201703.mbox/%3CCANazzutAX23PYv3%2BN%2BGkXbDTrF0kvRG5aHRDifX5rJ%3Din0VtzA%40mail.gmail.com%3E

Thanks to everyone who participated.

Cheers!
Navina


Re: [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-04-03 Thread Navina Ramesh (Apache)
+1 (binding) from me :)

Navina

On Sun, Apr 2, 2017 at 9:31 PM, Ignacio Solis  wrote:

> +1 (non binding)
>
> May this be the first of many SEPs...  I mean just as many as needed. :-)
>
> Nacho
>
> On Sat, Apr 1, 2017 at 1:03 PM, Kartik Paramasivam
>  wrote:
> > +1 (non binding)
> >
> > Great to see the SEP process being followed.
> >
> > cheers
> > Kartik
> >
> > On Thu, Mar 30, 2017 at 1:48 PM, Renato Marroquín Mogrovejo <
> > renatoj.marroq...@gmail.com> wrote:
> >
> >> Thanks for the answers Navina!
> >>
> >> +1 (non-binding)
> >>
> >> 2017-03-30 22:32 GMT+02:00 Navina Ramesh 
> :
> >>
> >> > Hi Renato,
> >> >
> >> > > Having the big proposals documented on SEPs is really great to have
> a
> >> > good understanding on the system!
> >> > I agree. Our previous design process was not being strictly enforced.
> We
> >> > hope to enforce it going forward as there are major changes coming
> into
> >> the
> >> > next release.
> >> >
> >> > > So this means that inside a container there will be a single
> processor?
> >> > StreamProcessor is nothing more than a Samza container, along with an
> >> > instance of JobCoordinator in it. Think about it as a thin-wrapper
> around
> >> > SamzaContainer and JobCoordinator instance. You can find more details
> on
> >> > this idea here - https://issues.apache.org/jira/browse/SAMZA-1063
> >> > Going forward, we want a Samza job to consist of one or more
> >> > StreamProcessors, instead of N SamzaContainers and 1 AppMaster.
> >> >
> >> > >  is this related to SAMZA-1080 somehow?
> >> > Yep. SAMZA-1080 introduces StreamProcessor with an almost pass-through
> >> > JobCoordinator. In fact, at LinkedIn, one of the teams is already
> using
> >> > this API with the StandaloneJobCoordinator and delegating partition
> >> > distribution to kafka high-level consumer (since systemconsumer is
> >> > pluggable in Samza, we have some internal wrappers around high-level
> >> > consumer). It has been working really well for stateless
> applications, I
> >> > believe.
> >> >
> >> > Cheers!
> >> > Navina
> >> >
> >> > On Thu, Mar 30, 2017 at 1:23 PM, Renato Marroquín Mogrovejo <
> >> > renatoj.marroq...@gmail.com> wrote:
> >> >
> >> > > Hi Navina,
> >> > >
> >> > > Thanks for the great proposal! Having the big proposals documented
> on
> >> > SEPs
> >> > > is really great to have a good understanding on the system!
> >> > > I have only a clarification question, the proposal states that every
> >> > > containerId is the same as the processorId. So this means that
> inside a
> >> > > container there will be a single processor? is this related to
> >> SAMZA-1080
> >> > > somehow?
> >> > >
> >> > >
> >> > > Best,
> >> > >
> >> > > Renato M.
> >> > >
> >> > > 2017-03-30 20:45 GMT+02:00 Navina Ramesh
>  >> >:
> >> > >
> >> > > > Hi Yi,
> >> > > > Good question. Three reasons:
> >> > > >
> >> > > > 1. In SAMZA-881, we came up with a set of responsibilities for the
> >> > > > JobCoordinator. One of them was to generate/assign processorId. So, it
> >> > > > makes sense to keep getProcessorId() within JobCoordinator interface.
> >> > > > 2. StreamProcessor was initially introduced as a user-facing API in
> >> > > > SAMZA-1080. ProcessorId was an argument in the StreamProcessor
> >> > > > constructor. It was pushing the burden of guaranteeing uniqueness
> >> > > > among the processors of a job to the user. This was not favorable.
> >> > > > 3. In general, I think we have consensus that the processorIdGenerator
> >> > > > is going to be specific to a runtime environment. Hence, it seems more
> >> > > > appropriate to move it to a lower abstraction layer that deals with the
> >> > > > underlying execution environment.
> >> > > >
> >> > > > Let me know if 

[GitHub] samza pull request #107: SAMZA-1182 - Commenting out some of the flaky tests

2017-03-31 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/107

SAMZA-1182 - Commenting out some of the flaky tests



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1182

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/107.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #107


commit de287bde5c373dad27c897bacebafbd986d5642b
Author: navina 
Date:   2017-03-31T19:17:01Z

Commenting out some of the flaky tests






Re: [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-03-30 Thread Navina Ramesh
Hi Renato,

> Having the big proposals documented on SEPs is really great to have a
good understanding on the system!
I agree. Our previous design process was not being strictly enforced. We
hope to enforce it going forward as there are major changes coming into the
next release.

> So this means that inside a container there will be a single processor?
StreamProcessor is nothing more than a Samza container, along with an
instance of JobCoordinator in it. Think about it as a thin wrapper around
SamzaContainer and JobCoordinator instance. You can find more details on
this idea here - https://issues.apache.org/jira/browse/SAMZA-1063
Going forward, we want a Samza job to consist of one or more
StreamProcessors, instead of N SamzaContainers and 1 AppMaster.
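
A minimal sketch of the "thin wrapper" idea, assuming the SEP-1 arrangement where the coordinator hands out the processorId and the container id is the same value. CoordinatorSketch, ContainerSketch and ProcessorSketch are hypothetical stand-ins and do not mirror the real Samza classes. Only the getProcessorId() name comes from the discussion.

```java
public class StreamProcessorSketch {

    /** Hypothetical coordinator: the only piece that knows how processor ids are generated. */
    interface CoordinatorSketch {
        String getProcessorId();
    }

    /** Hypothetical container stand-in that just remembers its id and running state. */
    static final class ContainerSketch {
        private final String id;
        private boolean running;

        ContainerSketch(String id) {
            this.id = id;
        }

        void start() {
            running = true;
        }

        boolean isRunning() {
            return running;
        }

        String id() {
            return id;
        }
    }

    /** Thin wrapper: owns one coordinator and one container, and never invents its own id. */
    static final class ProcessorSketch {
        private final CoordinatorSketch coordinator;
        private ContainerSketch container;

        ProcessorSketch(CoordinatorSketch coordinator) {
            this.coordinator = coordinator;
        }

        void start() {
            // The container id is whatever the coordinator assigned as the processor id.
            container = new ContainerSketch(coordinator.getProcessorId());
            container.start();
        }

        ContainerSketch container() {
            return container;
        }
    }

    public static void main(String[] args) {
        // "processor-on-host-1" is a made-up id for illustration.
        ProcessorSketch processor = new ProcessorSketch(() -> "processor-on-host-1");
        processor.start();
        System.out.println(processor.container().id() + " running=" + processor.container().isRunning());
    }
}
```

Because the id comes from the coordinator, the user constructing the processor does not carry the burden of keeping ids unique across the job.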

>  is this related to SAMZA-1080 somehow?
Yep. SAMZA-1080 introduces StreamProcessor with an almost pass-through
JobCoordinator. In fact, at LinkedIn, one of the teams is already using
this API with the StandaloneJobCoordinator and delegating partition
distribution to kafka high-level consumer (since systemconsumer is
pluggable in Samza, we have some internal wrappers around high-level
consumer). It has been working really well for stateless applications, I
believe.

Cheers!
Navina

On Thu, Mar 30, 2017 at 1:23 PM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> Hi Navina,
>
> Thanks for the great proposal! Having the big proposals documented on SEPs
> is really great to have a good understanding on the system!
> I have only a clarification question, the proposal states that every
> containerId is the same as the processorId. So this means that inside a
> container there will be a single processor? is this related to SAMZA-1080
> somehow?
>
>
> Best,
>
> Renato M.
>
> 2017-03-30 20:45 GMT+02:00 Navina Ramesh :
>
> > Hi Yi,
> > Good question. Three reasons:
> >
> > 1. In SAMZA-881, we came up with a set of responsibilities for the
> > JobCoordinator. One of them was to generate/assign processorId. So, it
> > makes sense to keep getProcessorId() within JobCoordinator interface.
> > 2. StreamProcessor was initially introduced as a user-facing API in
> > SAMZA-1080. ProcessorId was an argument in the StreamProcessor constructor.
> > It was pushing the burden of guaranteeing uniqueness among the processors
> > of a job to the user. This was not favorable.
> > 3. In general, I think we have consensus that the processorIdGenerator is
> > going to be specific to a runtime environment. Hence, it seems more
> > appropriate to move it to a lower abstraction layer that deals with the
> > underlying execution environment.
> >
> > Let me know if you have a different perspective on this.
> >
> > Cheers!
> > Navina
> >
> > On Thu, Mar 30, 2017 at 9:42 AM, Yi Pan  wrote:
> >
> > > @Navina,
> > >
> > > Sorry to chime in late. One question:
> > > 1. Why is it in JobCoordinator, and why not in StreamProcessor class?
> > > Because JobCoordinator provides coordination service across many
> > > processors, an interface getProcessorId() in JobCoordinator is
> confusing
> > > regarding to which processorId we are getting.
> > >
> > > Otherwise, the proposal looks good.
> > >
> > > -Yi
> > >
> > > On Wed, Mar 29, 2017 at 7:57 PM, Navina Ramesh
> > >  > > > wrote:
> > >
> > > > Good to hear from you, Yan. Thanks! :)
> > > >
> > > > On Wed, Mar 29, 2017 at 7:48 PM, Yan Fang 
> > wrote:
> > > >
> > > > > +1 . Thanks for the proposal, Navina. :)
> > > > >
> > > > > Fang, Yan
> > > > > yanfang...@gmail.com
> > > > >
> > > > > On Thu, Mar 30, 2017 at 4:24 AM, Prateek Maheshwari <
> > > > > pmaheshw...@linkedin.com.invalid> wrote:
> > > > >
> > > > > > +1 (non binding) from me.
> > > > > >
> > > > > > - Prateek
> > > > > >
> > > > > > On Tue, Mar 28, 2017 at 2:17 PM, Boris S 
> wrote:
> > > > > >
> > > > > > > +1 Looks good to me.
> > > > > > >
> > > > > > > On Tue, Mar 28, 2017 at 2:00 PM, xinyu liu <
> > xinyuliu...@gmail.com>
> > > > > > wrote:
> > > > > > >
> > > > > > > > +1 on my side. Very happy to see this proposal. This is a
> > blocker
> > > > for
> > > > > > > > integrating fluent API with StreamProcessor, and hopefully we
> > can
> > &

Re: Steps to Upgrading Samza (0.9 to 0.12)

2017-03-30 Thread Navina Ramesh
Hi everyone,
Apologies for chiming in late again on this issue.

> I'm not sure I agree with the policy (removing migration code and wanting
people to upgrade seem at odds to me), but minimally I think we should not
assume people are upgrading to each new Samza version.

I agree that we should not assume that people will upgrade by stepping
through each version of Samza. However, I don't agree that migration code
should not be removed at all. Thinking in terms of project management and
maintenance, I think it is a common practice (at least in companies, if not
in open-source and I could be wrong too :D ) to keep migration code only
for the version it applies to. It does add significant overhead to maintain
version upgrade/migration code across all future versions.

In this case, this was the first time we tried "automatic upgrade" from one
version to the other (0.9 -> 0.10). We could have done a better job at
documenting the upgrade steps with each version. I wish we had more
outspoken voices in the community sooner than later :)

Every project takes time to iron out issues related to release and version
upgrade. I am glad that we have so much feedback now. As Yi suggested, the
SEP process is a starting step towards documenting our changes across
versions. Additionally, we will work on adding a dedicated page for
upgrades and these will be available for all of the *upcoming* versions.

Please let us know if you have any other concerns or ideas on how we can
improve on our process.

@XiaoChuan: Unfortunately, we don't have proper documentation on upgrading
Samza across various versions. Like I mentioned before, we will put in
extra efforts going forward. There aren't any migration/upgrade steps
needed for versions post 0.10.*. You should be able to simply upgrade
without any issues. Upgrade from 0.9 to 0.10 is an exceptional case. Happy
to help you out in case you encounter more issues.
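
To spell out the rule of thumb from this thread: going from a pre-0.10 version to anything after 0.10 should stop at 0.10 first, and after that a direct upgrade should be fine. Here is a minimal sketch of that rule, assuming versions of the form "0.N" or "0.N.P". UpgradePathSketch and its methods are hypothetical helpers, not part of Samza, and this is no substitute for proper upgrade docs.

```java
import java.util.ArrayList;
import java.util.List;

public class UpgradePathSketch {

    /** Parses the minor number out of a "0.N" or "0.N.P" version string. */
    static int minor(String version) {
        return Integer.parseInt(version.split("\\.")[1]);
    }

    /** Returns the versions to deploy, in order, to get from one version to another. */
    static List<String> upgradePath(String from, String to) {
        List<String> steps = new ArrayList<>();
        // Only the jump across 0.10 needs an intermediate stop, per the discussion above.
        if (minor(from) < 10 && minor(to) > 10) {
            steps.add("0.10");
        }
        steps.add(to);
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(upgradePath("0.9", "0.12"));  // [0.10, 0.12]
        System.out.println(upgradePath("0.10", "0.12")); // [0.12]
    }
}
```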

Cheers!
Navina

On Thu, Mar 30, 2017 at 11:04 AM, XiaoChuan Yu  wrote:

> Is there some sort of document on how to upgrade Samza through various
> versions like the page here for Kafka:
> https://kafka.apache.org/documentation/#upgrade ?
> Having something like this would be ideal.
> On Thu, Mar 30, 2017 at 1:51 PM Thomas Becker  wrote:
>
> > Thanks for the reply Yi, and I apologize if I came off a bit snarky.
> > I'm not sure I agree with the policy (removing migration code and
> > wanting people to upgrade seem at odds to me), but minimally I think we
> > should not assume people are upgrading to each new Samza version. We
> > have done so when features or fixes warrant, and even then on a per-job
> > basis, and I would expect this is a common practice.
> >
> > -Tommy
> >
> > On Thu, 2017-03-30 at 09:50 -0700, Yi Pan wrote:
> > > Hi, Thomas,
> > >
> > > Sorry to hear that you were hit by the removal of migration in Samza
> > > 0.11.
> > > The reason we removed it is that we follow a deprecate-then-remove
> > > policy spanning two versions. We were not aware that people were still
> > > using 0.9 after we released 0.11, and we were not expecting a direct
> > > upgrade from 0.9 to 0.12. The documentation could capture that better.
> > > We are making changes to the design proposal process so that it is more
> > > transparent and open to the whole community, through the newly proposed
> > > SEP process. These kinds of breaking changes will go through the SEP
> > > discuss-vote process in the future, which will hopefully surface these
> > > kinds of concerns earlier.
> > > Best!
> > >
> > > -Yi
> > >
> > > On Thu, Mar 30, 2017 at 7:45 AM, Thomas Becker 
> > > wrote:
> > >
> > > >
> > > > Yes, we were burned by this. The changelog mapping will be
> > > > regenerated
> > > > instead of migrated and the result will completely hose the job
> > > > (because the mapping was not generated deterministically in
> > > > previous
> > > > versions of Samza). I don't understand why the migration code was
> > > > removed but it was, and to the best of my knowledge the necessity
> > > > to
> > > > not skip version 0.10.0 when upgrading was not documented, let
> > > > alone
> > > > enforced.
> > > >
> > > > On Mon, 2017-03-27 at 10:07 -0700, Jagadish Venkatraman wrote:
> > > > >
> > > > > Good observation Jake!
> > > > >
> > > > > The code for migration was removed in Samza 11. The migration
> > > > > would
> > > > > read

Re: [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-03-30 Thread Navina Ramesh
Hi Yi,
Good question. Three reasons:

1. In SAMZA-881, we came up with a set of responsibilities for the
JobCoordinator. One of them was to generate/assign processorId. So, it
makes sense to keep getProcessorId() within the JobCoordinator interface.
2. StreamProcessor was initially introduced as a user-facing API
(SAMZA-1080). ProcessorId was an argument in the StreamProcessor constructor,
which pushed the burden of guaranteeing uniqueness among the processors of a
job onto the user. This was not favorable.
3. In general, I think we have consensus that the processorIdGenerator is
going to be specific to a runtime environment. Hence, it seems more
appropriate to move it to a lower abstraction layer that deals with the
underlying execution environment.
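
To make the three points above concrete, here is a rough sketch of the shape being discussed. It is illustrative only and not the actual patch: the Map-based config, SketchJobCoordinator, SketchStreamProcessor, and the sketch.host key are made-up stand-ins. Only getProcessorId() on JobCoordinator and the idea of a pluggable ProcessorIdGenerator come from the SEP discussion.

```java
import java.util.Map;

// Hypothetical stand-in for the pluggable, environment-specific generator.
interface ProcessorIdGenerator {
    String generateProcessorId(Map<String, String> config);
}

// Simplified JobCoordinator: only the processorId-related method is shown.
interface JobCoordinator {
    String getProcessorId();
}

// Illustrative coordinator that asks the generator for its ID exactly once.
final class SketchJobCoordinator implements JobCoordinator {
    private final String processorId;

    SketchJobCoordinator(Map<String, String> config, ProcessorIdGenerator generator) {
        this.processorId = generator.generateProcessorId(config);
    }

    @Override
    public String getProcessorId() {
        return processorId;
    }
}

// The user-facing processor no longer takes a processorId in its constructor;
// it reads the ID from the coordinator it owns.
final class SketchStreamProcessor {
    private final JobCoordinator coordinator;

    SketchStreamProcessor(JobCoordinator coordinator) {
        this.coordinator = coordinator;
    }

    String describe() {
        return "processor[" + coordinator.getProcessorId() + "]";
    }
}

final class ProcessorIdOwnershipSketch {
    public static void main(String[] args) {
        // Hypothetical generator: derives the ID from a made-up config key.
        ProcessorIdGenerator hostBased =
            config -> config.getOrDefault("sketch.host", "localhost") + "-1";
        JobCoordinator coordinator =
            new SketchJobCoordinator(Map.of("sketch.host", "host-a"), hostBased);
        System.out.println(new SketchStreamProcessor(coordinator).describe());
    }
}
```

The point of the layering is that the user never has to invent a unique ID; whatever runs underneath (YARN, ZooKeeper, something else) decides it.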

Let me know if you have a different perspective on this.

Cheers!
Navina

On Thu, Mar 30, 2017 at 9:42 AM, Yi Pan  wrote:

> @Navina,
>
> Sorry to chime in late. One question:
> 1. Why is it in JobCoordinator, and why not in StreamProcessor class?
> Because JobCoordinator provides coordination service across many
> processors, an interface getProcessorId() in JobCoordinator is confusing
> as to which processorId we are getting.
>
> Otherwise, the proposal looks good.
>
> -Yi
>
> On Wed, Mar 29, 2017 at 7:57 PM, Navina Ramesh
>  > wrote:
>
> > Good to hear from you, Yan. Thanks! :)
> >
> > On Wed, Mar 29, 2017 at 7:48 PM, Yan Fang  wrote:
> >
> > > +1 . Thanks for the proposal, Navina. :)
> > >
> > > Fang, Yan
> > > yanfang...@gmail.com
> > >
> > > On Thu, Mar 30, 2017 at 4:24 AM, Prateek Maheshwari <
> > > pmaheshw...@linkedin.com.invalid> wrote:
> > >
> > > > +1 (non binding) from me.
> > > >
> > > > - Prateek
> > > >
> > > > On Tue, Mar 28, 2017 at 2:17 PM, Boris S  wrote:
> > > >
> > > > > +1 Looks good to me.
> > > > >
> > > > > On Tue, Mar 28, 2017 at 2:00 PM, xinyu liu 
> > > > wrote:
> > > > >
> > > > > > +1 on my side. Very happy to see this proposal. This is a blocker
> > for
> > > > > > integrating fluent API with StreamProcessor, and hopefully we can
> > get
> > > > it
> > > > > > resolved soon :).
> > > > > >
> > > > > > Thanks,
> > > > > > Xinyu
> > > > > >
> > > > > > On Tue, Mar 28, 2017 at 11:28 AM, Navina Ramesh (Apache) <
> > > > > > nav...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > This is a voting thread for SEP-1: Semantics of ProcessorId in
> > > Samza.
> > > > > > > For reference, here is the wiki link:
> > > > > > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > > > > > > 1%3A+Semantics+of+ProcessorId+in+Samza
> > > > > > >
> > > > > > > Link to discussion mail thread:
> > > > > > > http://mail-archives.apache.org/mod_mbox/samza-dev/201703.
> > > > > > > mbox/%3CCANazzuuHiO%3DvZQyFbTiYU-0Sfh3riK%3Dz4j_
> > > > > > AdCicQ8rBO%3DXuYQ%40mail.
> > > > > > > gmail.com%3E
> > > > > > >
> > > > > > > Please vote on this SEP asap. :)
> > > > > > >
> > > > > > > Thanks!
> > > > > > > Navina
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Navina R.
> >
>



-- 
Navina R.


Re: [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-03-29 Thread Navina Ramesh
Good to hear from you, Yan. Thanks! :)

On Wed, Mar 29, 2017 at 7:48 PM, Yan Fang  wrote:

> +1 . Thanks for the proposal, Navina. :)
>
> Fang, Yan
> yanfang...@gmail.com
>
> On Thu, Mar 30, 2017 at 4:24 AM, Prateek Maheshwari <
> pmaheshw...@linkedin.com.invalid> wrote:
>
> > +1 (non binding) from me.
> >
> > - Prateek
> >
> > On Tue, Mar 28, 2017 at 2:17 PM, Boris S  wrote:
> >
> > > +1 Looks good to me.
> > >
> > > On Tue, Mar 28, 2017 at 2:00 PM, xinyu liu 
> > wrote:
> > >
> > > > +1 on my side. Very happy to see this proposal. This is a blocker for
> > > > integrating fluent API with StreamProcessor, and hopefully we can get
> > it
> > > > resolved soon :).
> > > >
> > > > Thanks,
> > > > Xinyu
> > > >
> > > > On Tue, Mar 28, 2017 at 11:28 AM, Navina Ramesh (Apache) <
> > > > nav...@apache.org>
> > > > wrote:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > This is a voting thread for SEP-1: Semantics of ProcessorId in
> Samza.
> > > > > For reference, here is the wiki link:
> > > > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > > > > 1%3A+Semantics+of+ProcessorId+in+Samza
> > > > >
> > > > > Link to discussion mail thread:
> > > > > http://mail-archives.apache.org/mod_mbox/samza-dev/201703.
> > > > > mbox/%3CCANazzuuHiO%3DvZQyFbTiYU-0Sfh3riK%3Dz4j_
> > > > AdCicQ8rBO%3DXuYQ%40mail.
> > > > > gmail.com%3E
> > > > >
> > > > > Please vote on this SEP asap. :)
> > > > >
> > > > > Thanks!
> > > > > Navina
> > > > >
> > > >
> > >
> >
>



-- 
Navina R.


[GitHub] samza pull request #103: SAMZA-1126 - Semantics of processorId in Samza

2017-03-29 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/103

SAMZA-1126 - Semantics of processorId in Samza

Implementation based on 
[SEP-1](https://cwiki.apache.org/confluence/display/SAMZA/SEP-1%3A+Semantics+of+ProcessorId+in+Samza)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1126

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/103.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #103


commit b0b31979ab5da4d83878152058f446c42dbce82f
Author: navina 
Date:   2017-03-29T22:09:05Z

Removing CoordinationService from JobCoordinatorFactory interface

commit d7e649368b9b05ad307368581fc28d396289bfb7
Author: navina 
Date:   2017-03-23T21:22:58Z

Added ProcessorIdGenerator. StreamProcessor tests fail due to containerId 
type mismatch in Containermodel

commit edfac1163cdf7191d83feaf3c11bc27dfd41d3c2
Author: navina 
Date:   2017-03-24T18:41:11Z

First pass at changing to string; ContainerModel has both processorId and 
containerId

commit 94e000127d79a7106bcd3e9c7d9363f0596a3c16
Author: navina 
Date:   2017-03-24T20:49:49Z

Added jobmodel deserialization for compatibility between old and new models

commit 9290483466f4504306201d78522de23a03753855
Author: navina 
Date:   2017-03-24T21:21:49Z

Adding custom deserializer for ContainerModel

commit fea84abe7707f93af0a133d7d30be7dfa4311911
Author: navina 
Date:   2017-03-24T23:50:12Z

updating code in TestSamzaContainer.scala

commit f8b85a52731039996cf9357c7c85adc36e98ae5a
Author: navina 
Date:   2017-03-25T01:57:30Z

Mostly builds. TaskProxy stuff needs to be fixed

commit 17d905ff920c70e4c95d5fd3b5c23ff94c549f25
Author: navina 
Date:   2017-03-27T21:26:06Z

Removed static method from ProcessorIdGenerator and made it a ClassLoader 
util

commit da4566201b834da8aa1dcead594ace3a98332132
Author: navina 
Date:   2017-03-28T18:15:46Z

Changing containerId to string in YarnClusterResourceManager state

commit 987f2ffdb1c581eadc7907fd1050a36d9dacd7cc
Author: navina 
Date:   2017-03-28T18:29:27Z

Fixing some documents

commit 96a104394add60bea554ce6766a4ec2459c8745a
Author: navina 
Date:   2017-03-29T21:23:11Z

Fixed breaking changes after rebase

commit 8d32878d5d29b39c4a5363f9407ef1473efa3fd4
Author: navina 
Date:   2017-03-30T01:06:29Z

Commenting a flaky test and rebasing




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] samza pull request #102: SAMZA-1175 - Removing CoordinationService from JobC...

2017-03-29 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/102

SAMZA-1175 - Removing CoordinationService from JobCoordinatorFactory 
interface

@sborya I have made the changes. Please review. Thanks!

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1175

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/102.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #102


commit b0b31979ab5da4d83878152058f446c42dbce82f
Author: navina 
Date:   2017-03-29T22:09:05Z

Removing CoordinationService from JobCoordinatorFactory interface




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-03-28 Thread Navina Ramesh
Thanks, Xinyu. I have already implemented a draft. Waiting for the voting
to close soon.

Navina

On Tue, Mar 28, 2017 at 2:17 PM, Boris S  wrote:

> +1 Looks good to me.
>
> On Tue, Mar 28, 2017 at 2:00 PM, xinyu liu  wrote:
>
> > +1 on my side. Very happy to see this proposal. This is a blocker for
> > integrating fluent API with StreamProcessor, and hopefully we can get it
> > resolved soon :).
> >
> > Thanks,
> > Xinyu
> >
> > On Tue, Mar 28, 2017 at 11:28 AM, Navina Ramesh (Apache) <
> > nav...@apache.org>
> > wrote:
> >
> > > Hi everyone,
> > >
> > > This is a voting thread for SEP-1: Semantics of ProcessorId in Samza.
> > > For reference, here is the wiki link:
> > > https://cwiki.apache.org/confluence/display/SAMZA/SEP-
> > > 1%3A+Semantics+of+ProcessorId+in+Samza
> > >
> > > Link to discussion mail thread:
> > > http://mail-archives.apache.org/mod_mbox/samza-dev/201703.
> > > mbox/%3CCANazzuuHiO%3DvZQyFbTiYU-0Sfh3riK%3Dz4j_
> > AdCicQ8rBO%3DXuYQ%40mail.
> > > gmail.com%3E
> > >
> > > Please vote on this SEP asap. :)
> > >
> > > Thanks!
> > > Navina
> > >
> >
>



-- 
Navina R.


[VOTE] SEP-1: Semantics of ProcessorId in Samza

2017-03-28 Thread Navina Ramesh (Apache)
Hi everyone,

This is a voting thread for SEP-1: Semantics of ProcessorId in Samza.
For reference, here is the wiki link:
https://cwiki.apache.org/confluence/display/SAMZA/SEP-1%3A+Semantics+of+ProcessorId+in+Samza

Link to discussion mail thread:
http://mail-archives.apache.org/mod_mbox/samza-dev/201703.mbox/%3CCANazzuuHiO%3DvZQyFbTiYU-0Sfh3riK%3Dz4j_AdCicQ8rBO%3DXuYQ%40mail.gmail.com%3E

Please vote on this SEP asap. :)

Thanks!
Navina


Re: Steps to Upgrading Samza (0.9 to 0.12)

2017-03-27 Thread Navina Ramesh (Apache)
@Jake: Yes. We removed the migration code (for 0.9 to 0.10) in the 0.11
release, I believe.

@XiaoChuan: As per Jagadish's recommendation, if you have changelog-backed
stores, you should upgrade from 0.9.1 to 0.10.0 before upgrading to Samza
0.12.0.

I checked with LinkedIn's internal release notes. The most significant
change listed is adding a new configuration *job.coordinator.system*. This
system can be the same as your currently configured checkpoint system
(task.checkpoint.system). I am assuming you are using
KafkaCheckpointManagerFactory. If you are using other custom checkpoint
managers, the migration may be more involved. Please let us know and we can
try to help you out.
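
As a rough illustration of that config change, the sketch below copies a job config and, when job.coordinator.system is missing, fills it in from task.checkpoint.system. The two property names are the ones mentioned above; the helper class and its defaulting behavior are hypothetical and only show the relationship between the two settings. This is not something Samza does for you.

```java
import java.util.HashMap;
import java.util.Map;

final class CoordinatorSystemConfigSketch {
    static final String COORDINATOR_SYSTEM = "job.coordinator.system";
    static final String CHECKPOINT_SYSTEM = "task.checkpoint.system";

    private CoordinatorSystemConfigSketch() {}

    // Returns a copy of the config in which job.coordinator.system is set.
    // If it is absent, it is pointed at the configured checkpoint system.
    static Map<String, String> withCoordinatorSystem(Map<String, String> config) {
        Map<String, String> result = new HashMap<>(config);
        if (!result.containsKey(COORDINATOR_SYSTEM)) {
            String checkpointSystem = result.get(CHECKPOINT_SYSTEM);
            if (checkpointSystem == null) {
                throw new IllegalArgumentException(
                    "Neither " + COORDINATOR_SYSTEM + " nor " + CHECKPOINT_SYSTEM + " is set");
            }
            result.put(COORDINATOR_SYSTEM, checkpointSystem);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> oldConfig = Map.of(CHECKPOINT_SYSTEM, "kafka");
        System.out.println(withCoordinatorSystem(oldConfig).get(COORDINATOR_SYSTEM));
    }
}
```

In a properties file this amounts to adding one line that sets job.coordinator.system to the same system name your checkpoints already use.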

Feel free to email us if you have more questions.

Cheers!
Navina

On Mon, Mar 27, 2017 at 10:07 AM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Good observation Jake!
>
> The code for migration was removed in Samza 11. The migration would read
> change-log offsets from the checkpoint topic and write them to the
> coordinator stream.
>
> If you're using change-logged stores, I'd recommend upgrading from 0.9.1 to
> 0.10.0 first.
> Otherwise, you will lose offsets for change-logged stores.
>
> I suspect you should be okay for 0.10.0 to 0.12 upgrade.
>
> On Mon, Mar 27, 2017 at 9:30 AM, Jacob Maes  wrote:
>
> > As I recall, Samza 0.10 introduced the coordinator stream and there was
> > code to do an automatic migration to use that feature. @navina, @yi, do
> > you know if that migration code is still in Samza 0.12?
> >
> > If not, then it's probably better to update from 0.9.1 to 0.10.0 and then
> > to 0.12.0. I don't think there were any changes requiring migration
> > between 0.10 and 0.12, so upgrading directly from 0.10 to 0.12 is
> > probably less of an issue.
> >
> > On Fri, Mar 24, 2017 at 11:05 PM, Jagadish Venkatraman <
> > jagadish1...@gmail.com> wrote:
> >
> > > Hi Xiaochuan,
> > >
> > > >> Do I need to upgrade Kafka and/or YARN?
> > >
> > > *Yarn version:*
> > >
> > >- Samza 0.12 supports Yarn 2.6.1 and 2.7.1.
> > >- If you already have 2.6.0 installed (as you have said), I believe
> > >you will be fine (but I'm not sure).
> > >
> > > *Kafka version: *
> > >
> > >- Samza 0.12 upgraded the version of Kafka to 0.10.
> > >- If your Kafka brokers are on an older version of Kafka, you should
> > >upgrade them to at least 0.10. Kafka clients are usually
> > >incompatible with older versions of brokers.
> > >
> > > *Java version: *
> > >
> > >
> > >
> > >- Samza 0.12 binaries are compiled using Java 8.  Hence, they cannot
> > be
> > >run on older versions of the Java run-time.
> > >
> > >
> > > >> I'm extremely new to Samza in terms of operations aspect. I'm not
> sure
> > > what
> > > information would be relevant in this case so please ask away.
> > >
> > > I'd first start by upgrading the Kafka brokers (assuming you're on Java
> > 8+
> > > already).
> > > Let us know how the migration goes!
> > >
> > > Thanks,
> > > Jagadish
> > >
> > >
> > > On Fri, Mar 24, 2017 at 8:23 PM, XiaoChuan Yu 
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > What are the general steps for upgrading Samza from 0.9 to 0.12?
> > > > Do I need to upgrade Kafka and/or YARN?
> > > >
> > > > I don't know how Samza was setup initially but we currently have the
> > > > following setup:
> > > >
> > > > Samza version: 0.9.1
> > > > YARN version: Hadoop 2.6.0-cdh5.4.8
> > > > Kafka version: 0.9.0.1
> > > >
> > > > I think installation of Kafka and YARN were managed through Puppet.
> > > > I'm extremely new to Samza in terms of operations aspect. I'm not
> sure
> > > what
> > > > information would be relevant in this case so please ask away.
> > > >
> > > > Thanks,
> > > > Xiaochuan Yu
> > > >
> > >
> > >
> > >
> > > --
> > > Jagadish V,
> > > Graduate Student,
> > > Department of Computer Science,
> > > Stanford University
> > >
> >
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>


Re: [DISCUSS] SEP-1: Semantics of ProcessorId in Samza

2017-03-24 Thread Navina R
>  Seems like they're (usually) going to be used by the framework, are
pretty simple to write, and could probably be written as a common Util
method if we find them repetitive.
Hmm.. To be honest, I didn't see the value of it either. When Xinyu suggested
this change, I assumed it was the newly accepted pattern going forward.
Perhaps, it makes sense when the API is user-facing. In this case, a util
class is simpler to use. Let's get a consensus on this pattern and then, I
will change it.
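
For comparison, this is roughly what the "common Util method" alternative could look like. It is only a sketch under my own assumptions: the class name ClassLoadingSketch and the method name fromClassName are made up for illustration, and it assumes the target class has a public no-arg constructor.

```java
import java.util.List;

final class ClassLoadingSketch {
    private ClassLoadingSketch() {}

    // Loads the named class, instantiates it through its no-arg constructor,
    // and checks that the instance is of the expected type.
    static <T> T fromClassName(String className, Class<T> expectedType) {
        try {
            Class<?> clazz = Class.forName(className);
            Object instance = clazz.getDeclaredConstructor().newInstance();
            return expectedType.cast(instance);
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException(
                "Cannot instantiate " + className + " as " + expectedType.getName(), e);
        }
    }

    public static void main(String[] args) {
        // A JDK class is used here so the sketch runs without Samza on the classpath.
        List<?> list = fromClassName("java.util.ArrayList", List.class);
        System.out.println(list.isEmpty());
    }
}
```

With something like this, each factory interface would not need its own static loader; the framework would call the util with the class name read from config.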

@Xinyu: Is there any specific advantage of using the static interface
method pattern for class loading and instance creation?

Cheers!
Navina




On Fri, Mar 24, 2017 at 11:06 AM, Prateek Maheshwari <
pmaheshw...@linkedin.com> wrote:

> Hi Navina,
>
> 1. Assuming the environment can put the processor ID in the config,
> ProcessorIdGenerator#generateProcessorId(Config config) makes sense.
> Passing all of Config is rather broad, but I don't think we have an
> environment specific subset class for config yet, so should be OK.
>
> 2. I don't yet see the value of putting the class-loading helper default
> methods in multiple public interfaces. Seems like they're (usually) going
> to be used by the framework, are pretty simple to write, and could probably
> be written as a common Util method if we find them repetitive. Maybe skip
> this method for now and add this once we have some clarity on this new
> pattern?
>
> If we keep it, let's name it "ProcessorIdGenerator#fromConfig(Config
> config)" to be consistent with "ApplicationRunner#fromConfig(Config
> config)"?
>
> 3. I think we're in agreement that the Processor doesn't need to have
> access to the ProcessorIdGenerator, only the JobCoordinator does. With
> JobCoordinator only exposing #getProcessorId I think we're good.
>
> +1 from me with these minor changes. Thanks for the proposal!
>
> Best,
> Prateek
>
> On Thu, Mar 23, 2017 at 11:35 PM, Navina R  wrote:
>
>> Hi Prateek,
>> > 1. Do you have any examples of custom processor IDs? Wondering what
>> information/classes ProcessorIdGenerator would need to be able to generate
>> one.
>> Yeah. When I was trying to implement the proposal, I was wondering the
>> same thing as well. However, it might end up being specific to the
>> environment. In case of Yarn, I would expect the AppMaster to still specify
>> the processorID as an environment variable. In case of Rain (which is a
>> LinkedIn-specific deployment framework), it will probably be a combination
>> of sliceId and instanceId. Given these variations, I am not sure what can
>> be generic enough to encompass all this information. Maybe we can pass the
>> config to the instance factory. But let me know if you have other ideas.
>>
>>
>>  > 2. The default "static" getProcessorIdGeneratorFromConfig should be
>> on the ProcessorIdGenerator interface, not the JobCoordinator. Also, prefer
>> removing the fromConfig suffix from the method name and calling it create
>> instead of get? Not sure what the convention here is.
>> You are right. It should be in ProcessorIdGenerator. I can remove the
>> fromConfig suffix. Should I just call it createInstance, similar to the
>> Java apis ? I am not sure what the convention is either.
>>
>> > I think the JobCoordinator should still only have a #getProcessorId
>> method instead of #getProcessorIdGenerator.
>> JobCoordinator still has getProcessorId. If I move the
>> getProcessorIdGenerator helper to ProcessorIdGenerator, it makes sense?
>>
>> > Theoretically, a processor/container doesn't need to generate multiple
>> IDs, it only needs to know its own
>> I believe the reasoning originated from SAMZA-881, where we identified
>> the requirements for running Samza in distributed mode and the
>> responsibilities of each component in a homogeneous stream processing mode.
>> Theoretically speaking, it is the job coordinator that will assign
>> identifiers to its processor. Practically, since it is bound to the runtime
>> environment, it seems appropriate for the job coordinator to generate the
>> id. If you haven't read SAMZA-881, you should give it a read. If we go by
>> the assumption that a leader processor (in simpler terms, a central
>> authority) generates the JobModel, it needs to "know" the identifiers of
>> all processors.
>> An alternative model is the one where the leader spawns the processors
>> and also, "assigns" identifiers for all processors (for example, in Yarn
>> today). The latter model is restrictive in that:
>> 1. it expects the leader to know the number of

Re: [DISCUSS] SEP-1: Semantics of ProcessorId in Samza

2017-03-23 Thread Navina R
Hi Prateek,
> 1. Do you have any examples of custom processor IDs? Wondering what
information/classes ProcessorIdGenerator would need to be able to generate
one.
Yeah. When I was trying to implement the proposal, I was wondering the same
thing as well. However, it might end up being specific to the environment.
In case of Yarn, I would expect the AppMaster to still specify the
processorID as an environment variable. In case of Rain (which is a
LinkedIn-specific deployment framework), it will probably be a combination
of sliceId and instanceId. Given these variations, I am not sure what can
be generic enough to encompass all this information. Maybe we can pass the
config to the instance factory. But let me know if you have other ideas.
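
A rough sketch of those two variations follows, assuming the generator receives the config. Everything named here is hypothetical: the Map stands in for Samza's Config, SKETCH_PROCESSOR_ID is a made-up environment variable, and sketch.slice.id / sketch.instance.id are made-up keys. It only shows that one interface could cover both cases.

```java
import java.util.Map;

// Stand-in for the proposed interface; a plain map replaces Samza's Config.
interface ProcessorIdGenerator {
    String generateProcessorId(Map<String, String> config);
}

// YARN-like case: the ID is handed to the process through its environment.
final class EnvProcessorIdGenerator implements ProcessorIdGenerator {
    static final String ENV_KEY = "SKETCH_PROCESSOR_ID"; // hypothetical variable name
    private final Map<String, String> env;

    EnvProcessorIdGenerator(Map<String, String> env) {
        this.env = env;
    }

    @Override
    public String generateProcessorId(Map<String, String> config) {
        String id = env.get(ENV_KEY);
        if (id == null) {
            throw new IllegalStateException(ENV_KEY + " is not set");
        }
        return id;
    }
}

// Deployment-framework case: the ID is composed from two config values.
final class SliceInstanceProcessorIdGenerator implements ProcessorIdGenerator {
    @Override
    public String generateProcessorId(Map<String, String> config) {
        String slice = config.get("sketch.slice.id");       // hypothetical key
        String instance = config.get("sketch.instance.id"); // hypothetical key
        if (slice == null || instance == null) {
            throw new IllegalArgumentException("slice and instance ids are both required");
        }
        return slice + "-" + instance;
    }
}

final class ProcessorIdGeneratorSketch {
    public static void main(String[] args) {
        // A fixed map is used instead of System.getenv() so the demo always runs.
        ProcessorIdGenerator yarnLike =
            new EnvProcessorIdGenerator(Map.of(EnvProcessorIdGenerator.ENV_KEY, "3"));
        ProcessorIdGenerator sliceBased = new SliceInstanceProcessorIdGenerator();

        System.out.println(yarnLike.generateProcessorId(Map.of()));
        System.out.println(sliceBased.generateProcessorId(
            Map.of("sketch.slice.id", "s2", "sketch.instance.id", "i7")));
    }
}
```

Passing the whole config is broad, but it lets each environment pick whatever it needs without changing the interface.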


 > 2. The default "static" getProcessorIdGeneratorFromConfig should be on
the ProcessorIdGenerator interface, not the JobCoordinator. Also, prefer
removing the fromConfig suffix from the method name and calling it create
instead of get? Not sure what the convention here is.
You are right. It should be in ProcessorIdGenerator. I can remove the
fromConfig suffix. Should I just call it createInstance, similar to the
Java apis ? I am not sure what the convention is either.

> I think the JobCoordinator should still only have a #getProcessorId
method instead of #getProcessorIdGenerator.
JobCoordinator still has getProcessorId. Would it make sense if I moved the
getProcessorIdGenerator helper to ProcessorIdGenerator?

> Theoretically, a processor/container doesn't need to generate multiple
IDs, it only needs to know its own
I believe the reasoning originated from SAMZA-881, where we identified the
requirements for running Samza in distributed mode and the responsibilities
of each component in a homogeneous stream processing mode. Theoretically
speaking, it is the job coordinator that will assign identifiers to its
processor. Practically, since it is bound to the runtime environment, it
seems appropriate for the job coordinator to generate the id. If you
haven't read SAMZA-881, you should give it a read. If we go by the
assumption that a leader processor (in simpler terms, a central authority)
generates the JobModel, it needs to "know" the identifiers of all
processors.
An alternative model is the one where the leader spawns the processors
and also, "assigns" identifiers for all processors (for example, in Yarn
today). The latter model is restrictive in that:
1. it expects the leader to know the number of processors required in the
job
2. the leader processor is different from other processors, thus making a
Samza job a more heterogeneous set of processors

Hope these points make sense. Jagadish can add more, and even correct me if
I got anything wrong here :)

> 1. The SEP uses both ProcessorIdentifier and ProcessorIdGenerator as
synonyms. Let's update to use ProcessorIdGenerator consistently.
Will do

> 2. Minor: 'processor.id' configuration: I'm assuming this still needs to
be unique for each processor in the job? If so, probably worth calling out
in the SEP or configuration docs. We can also document it as deprecated and
a candidate for removal in near future (maybe 0.14?).
Yes. That is still a requirement. I think I updated the document regarding
deprecating and removing it. I will re-check.

Thanks for your comments.

Cheers!
Navina

On Thu, Mar 23, 2017 at 12:04 PM, Prateek Maheshwari <
pmaheshw...@linkedin.com> wrote:

> Hi Navina,
>
> Thanks for SEP-1, looks pretty good to me. A few questions/comments:
>
> Implementation/Interface related:
> 1. Do you have any examples of custom processor IDs? Wondering what
> information/classes ProcessorIdGenerator would need to be able to generate
> one.
> 2. The default "static" getProcessorIdGeneratorFromConfig should be on
> the ProcessorIdGenerator interface, not the JobCoordinator. Also, prefer
> removing the fromConfig suffix from the method name and calling it create
> instead of get? Not sure what the convention here is. @Xinyu, any
> preferences?
> 3. +1 for removing the constructor parameter, but I think the
> JobCoordinator should still only have a #getProcessorId method instead of
> #getProcessorIdGenerator. Theoretically, a processor/container doesn't need
> to generate multiple IDs, it only needs to know its own. @Jagadish, prefer
> the more restrictive API unless you have a use case for the more general
> one in mind.
>
> Documentation/SEP related:
> 1. The SEP uses both ProcessorIdentifier and ProcessorIdGenerator as
> synonyms. Let's update to use ProcessorIdGenerator consistently.
> 2. Minor: 'processor.id' configuration: I'm assuming this still needs to
> be unique for each processor in the job? If so, probably worth calling out
> in the SEP or configuration docs. We can also document it as deprecated and
> a

Re: [DISCUSS] SEP-1: Semantics of ProcessorId in Samza

2017-03-21 Thread Navina Ramesh
Hi everyone,
I have updated the SEP
<https://cwiki.apache.org/confluence/display/SAMZA/SEP-1%3A+Semantics+of+ProcessorId+in+Samza>
based on all the feedback. Feel free to comment.

I will start the [vote] mail thread, if there are no further questions
within the next 24 hours.

Thanks!
Navina

On Tue, Mar 21, 2017 at 10:33 AM, Navina Ramesh (Apache) 
wrote:

> Hi Jagadish,
> Thanks for the suggestion. You are right in that it should be the
> responsibility of the JobCoordinator to assign identifiers.
>
> > I'm only wondering if this logic could instead reside inside the
> Job Coordinator (which is internal to the StreamProcessor) instead of
> relying on something external to it?
>
> I think this is a consequence of our initial StandaloneJobCoordinator,
> which is pretty much a pass-through. I didn't see any usage for
> getProcessorId() and was wondering why we put it in the JobCoordinator
> interface. I think I should keep your design proposal from last year handy
> :) Thanks for pitching in!
>
>
> @All:
> Yesterday, there was a discussion on naming of the configuration used in
> this SEP - whether it should be within the "job" scope or "app" scope
> (introduced by SAMZA-1041
> <https://issues.apache.org/jira/browse/SAMZA-1041>).  Multi-stage feature
> and fluent-api for Samza introduces the notion of "application". Since the
> processorId generator config applies to all jobs within a Samza
> application, we decided to add the config for generator under "app" scope.
> Further details on config scope changes can be found in SAMZA-1120.
> <https://issues.apache.org/jira/browse/SAMZA-1120>
>
> I will send out an update once I change the SEP based on yesterday's
> meeting and Jagadish's idea.
>
> Thanks!
> Navina
>
> On Mon, Mar 20, 2017 at 5:22 PM, Jagadish Venkatraman <
> jagadish1...@gmail.com> wrote:
>
>> Thanks for writing this SEP!
>>
>> Here's an alternate approach instead of taking the "String processorId" as
>> a parameter in the constructor. In my view, the "processorId" could be
>> generated by the StreamProcessor internally (instead of being generated
>> up-stream and passed in). The Job Coordinator API could be as follows:
>>
>>
>> public interface JobCoordinator {
>>
>>   ProcessorIdGenerator getProcessorIDGenerator();
>>
>>   // could be String getProcessorID()
>>
>>   JobModel getJobModel();
>>
>> }
>>
>> public interface ProcessorIdGenerator {
>>
>>   String getProcessorID();
>> }
>>
>>
>> For instance, a Yarn job coordinator can merely parse the ID from config,
>> and return it. A Zk backed implementation of the Job coordinator can agree
>> on IDs using coordination leveraging Zk. One nice property with this
>> approach is that it keeps all logic related to coordination, agreement on
>> the Job Model, leader election (with potentially pluggable components for
>> each) inside the JobCoordinator.
>>
>> To be clear, I'm all for pluggability for ID generation logic that this
>> SEP
>> advocates. I'm only wondering if this logic could instead reside inside
>> the
>> Job Coordinator (which is internal to the StreamProcessor) instead of
>> relying on something external to it?
>>
>> Of course, there may be other considerations around the way the current
>> code is structured that may prevent this. Let me know if you agree with
>> this change.
>>
>> Thanks,
>> Jag
>>
>>
>> On Thu, Mar 16, 2017 at 5:21 PM, Navina Ramesh
>> > > wrote:
>>
>> > > I am working on the ApplicationRunner SEP right now. Will send out the
>> > discussion email once I am done.
>> >
>> > Perfect! :)
>> >
>> > On Thu, Mar 16, 2017 at 5:13 PM, xinyu liu 
>> wrote:
>> >
>> > > Right, the static factory is very simple as you said. It's pretty
>> > > convenient for the client to use.
>> > >
>> > > I am working on the ApplicationRunner SEP right now. Will send out the
>> > > discussion email once I am done.
>> > >
>> > > Thanks,
>> > > Xinyu
>> > >
>> > > On Thu, Mar 16, 2017 at 4:50 PM, Navina Ramesh (Apache) <
>> > nav...@apache.org
>> > > >
>> > > wrote:
>> > >
>> > > > > One minor thing I found is that the name of the config is camel
>> case
>> > > > (*processor.idGenerator.class*). Seems Samza's practice is to use
>> all
>

Re: [DISCUSS] SEP-1: Semantics of ProcessorId in Samza

2017-03-21 Thread Navina Ramesh (Apache)
Hi everyone,
I have updated the SEP
<https://cwiki.apache.org/confluence/display/SAMZA/SEP-1%3A+Semantics+of+ProcessorId+in+Samza>
based on all the feedback. Feel free to comment.

I will start the [vote] mail thread, if there are no further questions
within the next 24 hours.

Thanks!
Navina

On Tue, Mar 21, 2017 at 10:33 AM, Navina Ramesh (Apache) 
wrote:

> Hi Jagadish,
> Thanks for the suggestion. You are right in that it should be the
> responsibility of the JobCoordinator to assign identifiers.
>
> > I'm only wondering if this logic could instead reside inside the
> Job Coordinator (which is internal to the StreamProcessor) instead of
> relying on something external to it?
>
> I think this is a consequence of our initial StandaloneJobCoordinator,
> which is pretty much a pass-through. I didn't see any usage for
> getProcessorId() and was wondering why we put it in the JobCoordinator
> interface. I think I should keep your design proposal from last year handy
> :) Thanks for pitching in!
>
>
> @All:
> Yesterday, there was a discussion on naming of the configuration used in
> this SEP - whether it should be within the "job" scope or "app" scope
> (introduced by SAMZA-1041
> <https://issues.apache.org/jira/browse/SAMZA-1041>). The multi-stage feature
> and fluent API for Samza introduce the notion of "application". Since the
> processorId generator config applies to all jobs within a Samza
> application, we decided to add the config for generator under "app" scope.
> Further details on config scope changes can be found in SAMZA-1120.
> <https://issues.apache.org/jira/browse/SAMZA-1120>
>
> I will send out an update once I change the SEP based on yesterday's
> meeting and Jagadish's idea.
>
> Thanks!
> Navina
>
> On Mon, Mar 20, 2017 at 5:22 PM, Jagadish Venkatraman <
> jagadish1...@gmail.com> wrote:
>
>> Thanks for writing this SEP!
>>
>> Here's an alternate approach instead of taking the "String processorId" as
>> a parameter in the constructor. In my view, the "processorId" could be
>> generated by the StreamProcessor internally (instead of being generated
>> up-stream and passed in). The Job Coordinator API could be as follows:
>>
>>
>> public interface JobCoordinator {
>>
>>   ProcessorIdGenerator getProcessorIdGenerator();
>>
>>   // could be String getProcessorId()
>>
>>   JobModel getJobModel();
>>
>> }
>>
>> public interface ProcessorIdGenerator {
>>
>>   String getProcessorId();
>> }
>>
>>
>> For instance, a YARN job coordinator can merely parse the ID from config,
>> and return it. A Zk backed implementation of the Job coordinator can agree
>> on IDs using coordination leveraging Zk. One nice property with this
>> approach is that it keeps all logic related to coordination, agreement on
>> the Job Model, leader election (with potentially pluggable components for
>> each) inside the JobCoordinator.
>>
>> To be clear, I'm all for pluggability for ID generation logic that this SEP
>> advocates. I'm only wondering if this logic could instead reside inside the
>> Job Coordinator (which is internal to the StreamProcessor) instead of
>> relying on something external to it?
>>
>> Of course, there may be other considerations around the way the current
>> code is structured that may prevent this. Let me know if you agree with
>> this change.
>>
>> Thanks,
>> Jag
>>
>>
>> On Thu, Mar 16, 2017 at 5:21 PM, Navina Ramesh
>> > > wrote:
>>
>> > > I am working on the ApplicationRunner SEP right now. Will send out the
>> > discussion email once I am done.
>> >
>> > Perfect! :)
>> >
>> > On Thu, Mar 16, 2017 at 5:13 PM, xinyu liu 
>> wrote:
>> >
>> > > Right, the static factory is very simple as you said. It's pretty
>> > > convenient for the client to use.
>> > >
>> > > I am working on the ApplicationRunner SEP right now. Will send out the
>> > > discussion email once I am done.
>> > >
>> > > Thanks,
>> > > Xinyu
>> > >
>> > > On Thu, Mar 16, 2017 at 4:50 PM, Navina Ramesh (Apache) <
>> > nav...@apache.org
>> > > >
>> > > wrote:
>> > >
>> > > > > One minor thing I found is that the name of the config is camel
>> case
>> > > > (*processor.idGenerator.class*). Seems Samza's practice is to use
>> all
>

Re: [DISCUSS] SEP-1: Semantics of ProcessorId in Samza

2017-03-21 Thread Navina Ramesh (Apache)
Hi Jagadish,
Thanks for the suggestion. You are right in that it should be the
responsibility of the JobCoordinator to assign identifiers.

> I'm only wondering if this logic could instead reside inside the
Job Coordinator (which is internal to the StreamProcessor) instead of
relying on something external to it?

I think this is a consequence of our initial StandaloneJobCoordinator,
which is pretty much a pass-through. I didn't see any usage for
getProcessorId() and was wondering why we put it in the JobCoordinator
interface. I think I should keep your design proposal from last year handy
:) Thanks for pitching in!
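
For reference, here is a rough sketch of the shape being suggested, with the
JobCoordinator owning ID generation. Only the JobCoordinator and
ProcessorIdGenerator names come from Jagadish's mail (casing normalized);
everything else is an assumption for illustration: JobModel is stubbed as an
empty class, and ConfigBackedJobCoordinator and the "processor.id" key are
hypothetical, not existing Samza code.

```java
import java.util.Map;

// Stub standing in for Samza's JobModel; the real class is much richer.
class JobModel {}

interface ProcessorIdGenerator {
  String getProcessorId();
}

interface JobCoordinator {
  ProcessorIdGenerator getProcessorIdGenerator();

  JobModel getJobModel();
}

// Hypothetical pass-through coordinator: it only reads an ID that something
// else already placed in the config, similar in spirit to the YARN case.
class ConfigBackedJobCoordinator implements JobCoordinator {
  private final Map<String, String> config;

  ConfigBackedJobCoordinator(Map<String, String> config) {
    this.config = config;
  }

  @Override
  public ProcessorIdGenerator getProcessorIdGenerator() {
    // "processor.id" is an illustrative key, not a real Samza config name.
    return () -> config.get("processor.id");
  }

  @Override
  public JobModel getJobModel() {
    return new JobModel();
  }
}
```

A ZK-backed coordinator would presumably return a generator that agrees on
IDs through ZooKeeper instead of reading them from config.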


@All:
Yesterday, there was a discussion on naming of the configuration used in
this SEP - whether it should be within the "job" scope or "app" scope
(introduced by SAMZA-1041 <https://issues.apache.org/jira/browse/SAMZA-1041>).
The multi-stage feature and fluent API for Samza introduce the notion of
"application". Since the processorId generator config applies to all jobs
within a Samza application, we decided to add the config for generator
under "app" scope. Further details on config scope changes can be found in
SAMZA-1120. <https://issues.apache.org/jira/browse/SAMZA-1120>

I will send out an update once I change the SEP based on yesterday's
meeting and Jagadish's idea.

Thanks!
Navina

On Mon, Mar 20, 2017 at 5:22 PM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Thanks for writing this SEP!
>
> Here's an alternate approach instead of taking the "String processorId" as
> a parameter in the constructor. In my view, the "processorId" could be
> generated by the StreamProcessor internally (instead of being generated
> up-stream and passed in). The Job Coordinator API could be as follows:
>
>
> public interface JobCoordinator {
>
>   ProcessorIdGenerator getProcessorIdGenerator();
>
>   // could be String getProcessorId()
>
>   JobModel getJobModel();
>
> }
>
> public interface ProcessorIdGenerator {
>
>   String getProcessorId();
> }
>
>
> For instance, a YARN job coordinator can merely parse the ID from config,
> and return it. A Zk backed implementation of the Job coordinator can agree
> on IDs using coordination leveraging Zk. One nice property with this
> approach is that it keeps all logic related to coordination, agreement on
> the Job Model, leader election (with potentially pluggable components for
> each) inside the JobCoordinator.
>
> To be clear, I'm all for pluggability for ID generation logic that this SEP
> advocates. I'm only wondering if this logic could instead reside inside the
> Job Coordinator (which is internal to the StreamProcessor) instead of
> relying on something external to it?
>
> Of course, there may be other considerations around the way the current
> code is structured that may prevent this. Let me know if you agree with
> this change.
>
> Thanks,
> Jag
>
>
> On Thu, Mar 16, 2017 at 5:21 PM, Navina Ramesh
>  > wrote:
>
> > > I am working on the ApplicationRunner SEP right now. Will send out the
> > discussion email once I am done.
> >
> > Perfect! :)
> >
> > On Thu, Mar 16, 2017 at 5:13 PM, xinyu liu 
> wrote:
> >
> > > Right, the static factory is very simple as you said. It's pretty
> > > convenient for the client to use.
> > >
> > > I am working on the ApplicationRunner SEP right now. Will send out the
> > > discussion email once I am done.
> > >
> > > Thanks,
> > > Xinyu
> > >
> > > On Thu, Mar 16, 2017 at 4:50 PM, Navina Ramesh (Apache) <
> > nav...@apache.org
> > > >
> > > wrote:
> > >
> > > > > > One minor thing I found is that the name of the config is camel case
> > > > > (*processor.idGenerator.class*). Seems Samza's practice is to use all
> > > > > lower case configs with "." delimiter. Do you think we should stick to
> > > > > this convention?
> > > > >
> > > > > I am always torn between the "convention" we have and the better way of
> > > > > doing things. But I don't have strong opinions about it. I can change it.
> > > > >
> > > > > > One more suggestion is to have a static factory method in the
> > > > > ProcessorIdGenerator (Like what we have in ApplicationRunner):
> > > > >
> > > > > I couldn't grasp these requirements from the ApplicationRunner design. It
> > > > > will be great if you can put it out in an SEP :)
> > > >
> > > > I can add the static fac

Re: [DISCUSS] Support Scala 2.12

2017-03-17 Thread Navina Ramesh
Thanks for creating the DISCUSS email!

This is good. It's a good idea to update to 2.12 since it looks like we are
fully backward compatible with older versions. +1 from me.

Cheers!
Navina

On Fri, Mar 17, 2017 at 1:34 PM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Thanks for starting this discussion and the patch. +1 for supporting scala
> 2.12.  I assume the changes are fully backwards compatible with scala 2.10,
> 2.11 (as evidenced by your check-all)?
>
> Also, another observation is that the generated Samza binaries will have
> 2.12 as the suffix for the future release (I think this should be totally OK).
>
>
> On Fri, Mar 17, 2017 at 1:26 PM, Maksim Logvinenko 
> wrote:
>
> > Hi guys,
> >
> > I’ve created a JIRA and already submitted a patch which adds support for Scala
> > 2.12. Here is the ticket: https://issues.apache.org/jira/browse/SAMZA-1135.
> > Nothing serious: I’ve removed JavaConversions usage (because it’s marked as
> > deprecated now) and bumped kafka and scalatest versions since previous
> > versions don’t have scala 2.12 support. I ran ./bin/check-all.sh on my
> > laptop and it was successful for all scala versions (2.10, 2.11 and 2.12)
> > and for both YARN versions.
> >
> > Thanks,
> > Maxim Logvinenko
> >
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>



-- 
Navina R.


Re: [DISCUSS] SEP-1: Semantics of ProcessorId in Samza

2017-03-16 Thread Navina Ramesh
> I am working on the ApplicationRunner SEP right now. Will send out the
discussion email once I am done.

Perfect! :)

On Thu, Mar 16, 2017 at 5:13 PM, xinyu liu  wrote:

> Right, the static factory is very simple as you said. It's pretty
> convenient for the client to use.
>
> I am working on the ApplicationRunner SEP right now. Will send out the
> discussion email once I am done.
>
> Thanks,
> Xinyu
>
> On Thu, Mar 16, 2017 at 4:50 PM, Navina Ramesh (Apache)  >
> wrote:
>
> > > One minor thing I found is that the name of the config is camel case
> > (*processor.idGenerator.class*). Seems Samza's practice is to use all
> > lower case configs with "." delimiter. Do you think we should stick to
> > this convention?
> >
> > I am always torn between the "convention" we have and the better way of
> > doing things. But I don't have strong opinions about it. I can change it.
> >
> > > One more suggestion is to have a static factory method in the
> > ProcessorIdGenerator (Like what we have in ApplicationRunner):
> >
> > I couldn't grasp these requirements from the ApplicationRunner design. It
> > will be great if you can put it out in an SEP :)
> >
> > I can add the static factory method for it. Just to clarify, the static
> > method simply class loads the ProcessorIdGenerator? It uses reflection to
> > create the instance?
> >
> > Thanks!
> > Navina
> >
> >
> >
> > On Thu, Mar 16, 2017 at 4:31 PM, xinyu liu 
> wrote:
> >
> > > The proposal looks great to me! Changing the id type to string will make
> > > sure this can work with other types of clusters that don't support
> > > integer ids. The interface and config provide a pluggable way to have
> > > different id generators for different use cases. One minor thing I found
> > > is that the name of the config is camel case
> > > (*processor.idGenerator.class*). Seems Samza's practice is to use all
> > > lower case configs with "." delimiter. Do you think we should stick to
> > > this convention?
> > >
> > > One more suggestion is to have a static factory method in
> > > the ProcessorIdGenerator (Like what we have in ApplicationRunner):
> > >
> > > static ProcessorIdGenerator fromConfig(Config config) { ... }.
> > >
> > > With this, it will be more convenient for the ApplicationRunner to
> > > construct the generator. What do you think?
> > >
> > > Thanks,
> > > Xinyu
> > >
> > > On Wed, Mar 15, 2017 at 10:59 PM, Navina Ramesh (Apache) <
> > > nav...@apache.org>
> > > wrote:
> > >
> > > > Hi everyone,
> > > > I created a proposal for SAMZA-1126, which addresses the semantics of
> > > > ProcessorId in Samza. For most purposes, ProcessorId is the same as the
> > > > logical id that Samza assigns for each Yarn container. It is primarily
> > > > used in JobModel as a key for the corresponding ContainerModel and also
> > > > in container-level metrics. We are expanding the applicability of
> > > > processorId to be beyond a fixed set of processors.
> > > >
> > > > Please review and comment on this SEP.
> > > >
> > > > For those who are not actively following the master branch, you may
> > > > have more questions than others. Feel free to ask them here.
> > > >
> > > > @Xinyu: Since you are working on SAMZA-1067 and other related
> > > > integration APIs, can you please add an SEP for SAMZA-1067? This will
> > > > help others (and me as well) get on the same page with your design/code.
> > > > Let me know if SEP-1 will work per your design for ApplicationRunner.
> > > >
> > > > Thanks!
> > > > Navina
> > > >
> > >
> >
>



-- 
Navina R.


Re: [DISCUSS] SEP-1: Semantics of ProcessorId in Samza

2017-03-16 Thread Navina Ramesh (Apache)
> One minor thing I found is that the name of the config is camel case
(*processor.idGenerator.class*). Seems Samza's practice is to use all lower
case configs with "." delimiter. Do you think we should stick to this
convention?

I am always torn between the "convention" we have and the better way of
doing things. But I don't have strong opinions about it. I can change it.

> One more suggestion is to have a static factory method in the
ProcessorIdGenerator (Like what we have in ApplicationRunner):

I couldn't grasp these requirements from the ApplicationRunner design. It
will be great if you can put it out in an SEP :)

I can add the static factory method for it. Just to clarify: the static
method simply class-loads the ProcessorIdGenerator? It uses reflection to
create the instance?
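
If that reading is right, a reflection-based factory could look roughly like
the sketch below. This is only an illustration under assumptions: Samza's
Config is replaced by a plain Map<String, String>, the getProcessorId() method
name and the UuidProcessorIdGenerator class are hypothetical, and the key is
the currently proposed processor.idGenerator.class, which may still be renamed.

```java
import java.util.Map;
import java.util.UUID;

interface ProcessorIdGenerator {
  String getProcessorId();

  // Static factory: looks up the class name in config and instantiates it
  // through its no-argument constructor using reflection.
  static ProcessorIdGenerator fromConfig(Map<String, String> config) {
    String key = "processor.idGenerator.class";
    String className = config.get(key);
    if (className == null) {
      throw new IllegalArgumentException("Missing config: " + key);
    }
    try {
      Class<?> clazz = Class.forName(className);
      return (ProcessorIdGenerator) clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException | ClassCastException e) {
      throw new IllegalArgumentException("Cannot instantiate " + className, e);
    }
  }
}

// Hypothetical implementation, present only so the sketch has a class to load.
class UuidProcessorIdGenerator implements ProcessorIdGenerator {
  @Override
  public String getProcessorId() {
    return UUID.randomUUID().toString();
  }
}
```

The caller would then only need
ProcessorIdGenerator.fromConfig(config).getProcessorId(), without knowing
which implementation was configured.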

Thanks!
Navina



On Thu, Mar 16, 2017 at 4:31 PM, xinyu liu  wrote:

> The proposal looks great to me! Changing the id type to string will make
> sure this can work with other types of clusters that don't support
> integer ids. The interface and config provide a pluggable way to have
> different id generators for different use cases. One minor thing I found is
> that the name of the config is camel case (*processor.idGenerator.class*).
> Seems Samza's practice is to use all lower case configs with "." delimiter.
> Do you think we should stick to this convention?
>
> One more suggestion is to have a static factory method in
> the ProcessorIdGenerator (Like what we have in ApplicationRunner):
>
> static ProcessorIdGenerator fromConfig(Config config) { ... }.
>
> With this, it will be more convenient for the ApplicationRunner to
> construct the generator. What do you think?
>
> Thanks,
> Xinyu
>
> On Wed, Mar 15, 2017 at 10:59 PM, Navina Ramesh (Apache) <
> nav...@apache.org>
> wrote:
>
> > Hi everyone,
> > I created a proposal for SAMZA-1126, which addresses the semantics of
> > ProcessorId in Samza. For most purposes, ProcessorId is the same as the
> > logical id that Samza assigns for each Yarn container. It is primarily
> > used in JobModel as a key for the corresponding ContainerModel and also
> > in container-level metrics. We are expanding the applicability of
> > processorId to be beyond a fixed set of processors.
> >
> > Please review and comment on this SEP.
> >
> > For those who are not actively following the master branch, you may have
> > more questions than others. Feel free to ask them here.
> >
> > @Xinyu: Since you are working on SAMZA-1067 and other related integration
> > APIs, can you please add an SEP for SAMZA-1067? This will help others
> > (and me as well) get on the same page with your design/code. Let me know
> > if SEP-1 will work per your design for ApplicationRunner.
> >
> > Thanks!
> > Navina
> >
>


[DISCUSS] SEP-1: Semantics of ProcessorId in Samza

2017-03-15 Thread Navina Ramesh (Apache)
Hi everyone,
I created a proposal for SAMZA-1126, which addresses the semantics of
ProcessorId in Samza. For most purposes, ProcessorId is the same as the logical
id that Samza assigns for each Yarn container. It is primarily used in
JobModel as a key for the corresponding ContainerModel and also in
container-level metrics. We are expanding the applicability of processorId
to be beyond a fixed set of processors.

Please review and comment on this SEP.

For those who are not actively following the master branch, you may have
more questions than others. Feel free to ask them here.

@Xinyu: Since you are working on SAMZA-1067 and other related integration
APIs, can you please add an SEP for SAMZA-1067? This will help others (and
me as well) get on the same page with your design/code. Let me know if
SEP-1 will work per your design for ApplicationRunner.

Thanks!
Navina


Re: [DISCUSS] SAMZA-1141 - Apache Samza Development Process Improvements

2017-03-14 Thread Navina Ramesh
Xinyu,
I considered doing that as an example. But I want to keep SEPs only for
technical discussions, not process-related proposals.

Navina

On Mar 14, 2017 17:23, "xinyu liu"  wrote:

> +1 on this proposal too. Could you actually put this proposal as the first
> SEP (like SEP-0), so it serves as an example of how it will look in
> practice?
>
> Xinyu
>
> On Tue, Mar 14, 2017 at 3:34 PM, Navina Ramesh
>  > wrote:
>
> > Just to clarify: The proposal for code and design process change is
> > attached as a PDF/markdown to the JIRA - SAMZA-1141.
> >
> > Also, please show your support specifically for code and design process.
> > My bad for not calling it out earlier :)
> > Thanks!
> > Navina
> >
> > On Tue, Mar 14, 2017 at 3:30 PM, Jagadish Venkatraman <
> > jagadish1...@gmail.com> wrote:
> >
> > > Thanks for writing this up.
> > >
> > > I'm +1 on this proposal.
> > >
> > >
> > >
> > > On Tue, Mar 14, 2017 at 3:15 PM, Navina Ramesh (Apache) <
> > nav...@apache.org
> > > >
> > > wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > We switched to using Pull Requests for code reviews a few months back.
> > > > Clearly, there are some drawbacks to that model and we are trying to
> > > > address the shortcomings. I have gathered input from some of the
> > > > committers regarding what is missing from the code review process and
> > > > what can be improved. Please take a look and provide feedback.
> > > >
> > > > Additionally, we are considering moving to a KIP/FLIP-like model for
> > > > submitting design proposals (major changes to samza). Lately, there
> > > > have been some major feature discussions that are not documented
> > > > consistently in a centralized location. The proposal in SAMZA-1141
> > > > <https://issues.apache.org/jira/browse/SAMZA-1141> addresses the
> > > > design review process as well. Please review it too. I have already
> > > > created a wiki page
> > > > <https://cwiki.apache.org/confluence/display/SAMZA/Samza+Enhancement+Proposal>
> > > > describing the Samza Enhancement Proposal (SEP) process and an SEP
> > > > template. Going forward, let's start adding all major change
> > > > proposals to the wiki and discuss the design on the mailing list.
> > > >
> > > > Your cooperation is highly appreciated during this period of
> > > > transition in the process :)
> > > >
> > > > Feedback welcome!
> > > >
> > > > Thanks!
> > > > --
> > > > Navina R
> > > >
> > > > PS: Alternative name suggestions for "SEP" are welcome!
> > > >
> > >
> > >
> > >
> > > --
> > > Jagadish V,
> > > Graduate Student,
> > > Department of Computer Science,
> > > Stanford University
> > >
> >
> >
> >
> > --
> > Navina R.
> >
>


Re: [DISCUSS] SAMZA-1141 - Apache Samza Development Process Improvements

2017-03-14 Thread Navina Ramesh
Just to clarify: The proposal for code and design process change is
attached as a PDF/markdown to the JIRA - SAMZA-1141.

Also, please show your support specifically for code and design process. My
bad for not calling it out earlier :)

Thanks!
Navina

On Tue, Mar 14, 2017 at 3:30 PM, Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> Thanks for writing this up.
>
> I'm +1 on this proposal.
>
>
>
> On Tue, Mar 14, 2017 at 3:15 PM, Navina Ramesh (Apache)  >
> wrote:
>
> > Hi everyone,
> >
> > We switched to using Pull Requests for code reviews a few months back.
> > Clearly, there are some drawbacks to that model and we are trying to
> > address the shortcomings. I have gathered input from some of the
> > committers regarding what is missing from the code review process and
> > what can be improved. Please take a look and provide feedback.
> >
> > Additionally, we are considering moving to a KIP/FLIP-like model for
> > submitting design proposals (major changes to samza). Lately, there have
> > been some major feature discussions that are not documented consistently
> > in a centralized location. The proposal in SAMZA-1141
> > <https://issues.apache.org/jira/browse/SAMZA-1141> addresses the design
> > review process as well. Please review it too. I have already created a
> > wiki page
> > <https://cwiki.apache.org/confluence/display/SAMZA/Samza+Enhancement+Proposal>
> > describing the Samza Enhancement Proposal (SEP) process and an SEP
> > template. Going forward, let's start adding all major change proposals to
> > the wiki and discuss the design on the mailing list.
> >
> > Your cooperation is highly appreciated during this period of transition
> > in the process :)
> >
> > Feedback welcome!
> >
> > Thanks!
> > --
> > Navina R
> >
> > PS: Alternative name suggestions for "SEP" are welcome!
> >
>
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>



-- 
Navina R.


[DISCUSS] SAMZA-1141 - Apache Samza Development Process Improvements

2017-03-14 Thread Navina Ramesh (Apache)
Hi everyone,

We switched to using Pull Requests for code reviews a few months back.
Clearly, there are some drawbacks to that model and we are trying to
address the shortcomings. I have gathered input from some of the committers
regarding what is missing from the code review process and what can be improved.
Please take a look and provide feedback.

Additionally, we are considering moving to a KIP/FLIP-like model for
submitting design proposals (major changes to samza). Lately, there have
been some major feature discussions that are not documented consistently in
a centralized location. The proposal in SAMZA-1141
<https://issues.apache.org/jira/browse/SAMZA-1141> addresses the design
review process as well. Please review it too. I have already created a wiki
page
<https://cwiki.apache.org/confluence/display/SAMZA/Samza+Enhancement+Proposal>
describing the Samza Enhancement Proposal (SEP) process and an SEP
template. Going forward, let's start adding all major change proposals to
the wiki and discuss the design on the mailing list.

Your cooperation is highly appreciated during this period of transition in
the process :)

Feedback welcome!

Thanks!
-- 
Navina R

PS: Alternative name suggestions for "SEP" are welcome!


[GitHub] samza pull request #54: SAMZA-1084 - User thread does not see errors from th...

2017-02-15 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/54

SAMZA-1084 - User thread does not see errors from the processor thread

Adding a ProcessorLifecycleCallback for specifying user-defined error handling

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza SAMZA-1084

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/54.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #54


commit f473f73cbb60047e77eac5d715e2dd2b5ef51234
Author: navina 
Date:   2017-02-15T21:32:25Z

SAMZA-1084 - User thread does not see errors from the processor thread when 
using the StreamProcessor API




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [VOTE] Apache Samza 0.12.0 RC2

2017-02-13 Thread Navina Ramesh
I ran check-all against Mac and integration tests on Linux. Looks good with
no concerning issues.

+1 (binding)

Thanks!
Navina

On Fri, Feb 10, 2017 at 9:25 AM, Boris S  wrote:

> I also successfully ran the integration tests on Linux. All passed.
> +1 non-binding
>
> On Wed, Feb 8, 2017 at 4:57 PM, Jacob Maes  wrote:
>
> > Build and integration tests were successful for me.
> >
> > +1 non-binding
> >
> > On Wed, Feb 8, 2017 at 4:48 PM, xinyu liu  wrote:
> >
> > > Ran build, checkAll and integration tests. All passed.
> > >
> > > +1 non-binding.
> > >
> > > Thanks,
> > > Xinyu
> > >
> > > On Wed, Feb 8, 2017 at 4:18 PM, Boris S  wrote:
> > >
> > > > Cloned the release and ran build, test and checkAll.sh
> > > > All passed.
> > > > Verified MD5 and the signature.
> > > > Got warning - "this key is not certified with a trusted signature". I
> > > > guess it is ok.
> > > > +1
> > > >
> > > > On Mon, Feb 6, 2017 at 5:32 PM, Jagadish Venkatraman <
> > > > jagadish1...@gmail.com
> > > > > wrote:
> > > >
> > > > > This is a call for a vote on a release of Apache Samza 0.12.0.
> > > > > Thanks to everyone who has contributed to this release. We are very
> > > > > glad to see some new contributors in this release.
> > > > >
> > > > > The release candidate can be downloaded from here:
> > > > > http://home.apache.org/~jagadish/samza-0.12.0-rc2/
> > > > >
> > > > > The release candidate is signed with pgp key AF81FFBF, which can be
> > > > > found on keyservers:
> > > > > http://pgp.mit.edu/pks/lookup?op=get&search=0xAF81FFBF
> > > > >
> > > > > The git tag is release-0.12.0-rc2 and signed with the same pgp key:
> > > > > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.12.0-rc2
> > > > >
> > > > > Test binaries have been published to Maven's staging repository,
> > > > > and are available here:
> > > > > https://repository.apache.org/content/repositories/orgapachesamza-1018
> > > > >
> > > > > Note that the binaries were built with JDK8 without incident.
> > > > >
> > > > > 26 issues were resolved for this release:
> > > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20in%20(0.12%2C%200.12.0)%20AND%20status%20in%20(Resolved%2C%20Closed)
> > > > >
> > > > > The vote will be open for 72 hours (ends at 6PM Thursday, 02/09/2017).
> > > > >
> > > > > Please download the release candidate, check the hashes/signature,
> > > > > build it and test it, and then please vote:
> > > > >
> > > > > [ ] +1 approve
> > > > >
> > > > > [ ] +0 no opinion
> > > > >
> > > > > [ ] -1 disapprove (and reason why)
> > > > >
> > > > >
> > > > > +1 from my side for the release.
> > > > >
> > > > > Cheers!
> > > > >
> > > > > --
> > > > > Jagadish V,
> > > > > Graduate Student,
> > > > > Department of Computer Science,
> > > > > Stanford University
> > > > >
> > > >
> > >
> >
>



-- 
Navina R.


[GitHub] samza pull request #48: SAMZA-1082 : Implement Leader Election using ZK

2017-02-06 Thread navina
GitHub user navina opened a pull request:

https://github.com/apache/samza/pull/48

SAMZA-1082 : Implement Leader Election using ZK

Simple implementation of leader election recipe along with unit tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/navina/samza LeaderElector

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/48.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #48


commit 37c2c8beddbf61325da80ea190e1f8e70d7c1bbc
Author: navina 
Date:   2017-01-23T21:16:49Z

Extracting files related to LeaderElection

commit aaaf24e2b159b21f28670171b027c2a69ea2737c
Author: navina 
Date:   2017-02-02T01:40:13Z

Adding EmbeddedZookeeper for testing

commit 317cf167ba9d5bd08e9ebe753892c2bd44befeb0
Author: navina 
Date:   2017-02-02T02:14:36Z

Adding tests for ZkKeyBuilder

commit 1734f8f905cafb2b1416422ac8809da30c3a848b
Author: navina 
Date:   2017-02-03T18:51:14Z

Adding tests for ZkUtils

commit 6dd6b8d8e891540ee75f0db5966afe5e7f599dd0
Author: navina 
Date:   2017-02-04T02:43:12Z

Adding tests for ZkLeaderElector






Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-27 Thread Navina Ramesh

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163355
---


Fix it, then Ship it!




Some nits and comments. Otherwise, looks good. Thanks! +1


docs/learn/documentation/versioned/hdfs/consumer.md (line 39)
<https://reviews.apache.org/r/52570/#comment234764>

This line is confusing. Are you implying that I can read from non-avro
formatted files that are in HDFS?
What is the significance of the SingleFileHdfsReader interface? It is not
clear to the reader.



docs/learn/documentation/versioned/hdfs/consumer.md (line 89)
<https://reviews.apache.org/r/52570/#comment234762>

Nit: Can you move the explanation of what advanced partitioning is outside
of the code block?
You can emphasize the reserved term note with **note** once it is outside
the code block.



docs/learn/documentation/versioned/jobs/configuration-table.html (line 1822)
<https://reviews.apache.org/r/52570/#comment234763>

Looks like a typo. It should be "systems.*" instead of "system.*"?


- Navina Ramesh


On Jan. 27, 2017, 5:48 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 27, 2017, 5:48 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html 
> d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>



Re: Review Request 52570: SAMZA-1025: documentation for hdfs system consumer

2017-01-27 Thread Navina Ramesh


> On Jan. 25, 2017, 10:36 p.m., Jagadish Venkatraman wrote:
> > docs/learn/documentation/versioned/hdfs/consumer.md, line 67
> > <https://reviews.apache.org/r/52570/diff/2/?file=1613256#file1613256line67>
> >
> > The relationship between whitelist and blacklist was not very obvious 
> > to me.
> > 
> > Is the behavior that the whitelist is applied first, and the blacklist 
> > is applied to the matched files later? (to determine which files are to be 
> > ignored).
> 
> Hai Lu wrote:
> The order doesn't matter. (X & whitelist) - blacklist == (X - blacklist) 
> & whitelist

This is assuming that whitelist and blacklist are mutually exclusive, right?
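
A quick way to check the claim is the sketch below. It models the file
listing, whitelist and blacklist as plain sets of file names, which is an
assumption made for illustration (the consumer matches patterns, not sets);
FileFilterOrder and its method names are hypothetical.

```java
import java.util.HashSet;
import java.util.Set;

final class FileFilterOrder {
  private FileFilterOrder() {}

  // (files & whitelist) - blacklist
  static Set<String> whitelistThenBlacklist(
      Set<String> files, Set<String> whitelist, Set<String> blacklist) {
    Set<String> result = new HashSet<>(files);
    result.retainAll(whitelist);
    result.removeAll(blacklist);
    return result;
  }

  // (files - blacklist) & whitelist
  static Set<String> blacklistThenWhitelist(
      Set<String> files, Set<String> whitelist, Set<String> blacklist) {
    Set<String> result = new HashSet<>(files);
    result.removeAll(blacklist);
    result.retainAll(whitelist);
    return result;
  }
}
```

Under this set model both orders keep exactly the files that are in the
whitelist and not in the blacklist, so they agree even when the two lists
overlap; a file that appears on both lists is dropped either way.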


- Navina


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52570/#review163011
---


On Jan. 27, 2017, 5:48 p.m., Hai Lu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52570/
> ---
> 
> (Updated Jan. 27, 2017, 5:48 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Bugs: SAMZA-1025
> https://issues.apache.org/jira/browse/SAMZA-1025
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> documentation for hdfs system consumer
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/hdfs/consumer.md PRE-CREATION 
>   docs/learn/documentation/versioned/hdfs/producer.md 
> b0e936f5b0a9c945ea7f02bfc2536ef50f017bf6 
>   docs/learn/documentation/versioned/index.html 
> d0b14ece94341e2cb937cf32db480e69f93303c2 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> ba5ebbc54b5c64f82f35ed781dad7023a8f920e1 
> 
> Diff: https://reviews.apache.org/r/52570/diff/
> 
> 
> Testing
> ---
> 
> N/A
> 
> 
> Thanks,
> 
> Hai Lu
> 
>


