Re: [VOTE] Apache Samza 0.13.1 RC0

2017-08-23 Thread Chinmay Soman
Downloaded source, compiled and ran all tests. All passed. Also verified
signature.

+1 (Binding)

On Wed, Aug 23, 2017 at 11:27 AM, Boris S <bor...@gmail.com> wrote:

> Hi,
> I've downloaded and built it on Linux.
> Ran unit and integration tests. All passed.
> The only issue I've encountered was the version of python.
> Versions below 2.5 and above 3 all failed; 2.7 was the only version that
> worked.
>
> +1 (non binding)
>
> On Fri, Aug 18, 2017 at 11:59 AM, Fred Haifeng Ji <haifeng...@gmail.com>
> wrote:
>
> > This is a call for a vote on a release of Apache Samza 0.13.1. Thanks to
> > everyone who has contributed to this release.
> >
> > The release candidate can be downloaded from here:
> > http://home.apache.org/~navina/samza-0.13.1-rc0/
> >
> >
> > The release candidate is signed with pgp key A211312E, which can be found
> > on keyservers:
> > http://pgp.mit.edu/pks/lookup?op=get&search=0xEDFD8F9AA211312E
> >
> >
> > The git tag is release-0.13.1-rc0 and signed with the same pgp key:
> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> > refs/tags/release-0.13.1-rc0
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> > https://repository.apache.org/content/repositories/orgapachesamza-1030/
> >
> >
> > 29 issues were resolved for this release: https://issues.apache
> > .org/jira/issues/?jql=project%20%3D%2012314526%20AND%20fixVe
> > rsion%20%3D%2012340845%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
> >
> >
> > The vote will be open for 72 hours (ending at 1:00PM Monday, 08/21/2017).
> >
> > Please download the release candidate, check the hashes/signature, build
> it
> > and test it, and then please vote:
> >
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove (and reason why)
> >
> >
> > --
> > Fred Ji
> >
>



-- 
Thanks and regards

Chinmay Soman


Re: [VOTE] Apache Samza 0.12.0 RC2

2017-02-14 Thread Chinmay Soman
Downloaded release (on Mac), checked build test (and checkall).

Verified the pgp key (although with the warning).

+1 Binding.

On Mon, Feb 13, 2017 at 3:26 PM, Renato Marroquín Mogrovejo <
renatoj.marroq...@gmail.com> wrote:

> I also ran check-all against Debian and the build was successful, although I
> saw a bunch of this error:
>
> apache-samza-0.12.0-src/samza-hdfs/src/main/java/org/apache/
> samza/system/hdfs/HdfsSystemConsumer.java:59:
> error: unmappable character for encoding ASCII
>  *
>  
> ?
>
> I don't think this is a blocker; it's just some characters that cannot be
> parsed when doing the scalaCompile task for the samza-hdfs component. Is it
> worth opening a JIRA to fix this?
>
>
> Best,
>
> Renato M.
>
>
> 2017-02-13 22:20 GMT+01:00 Navina Ramesh <nram...@linkedin.com.invalid>:
>
> > I ran check-all against Mac and integration tests on Linux. Looks good
> with
> > no concerning issues.
> >
> > +1 (binding)
> >
> > Thanks!
> > Navina
> >
> > On Fri, Feb 10, 2017 at 9:25 AM, Boris S <bor...@gmail.com> wrote:
> >
> > > I also successfully ran the integration tests on Linux. All passed.
> > > +1 non-binding
> > >
> > > On Wed, Feb 8, 2017 at 4:57 PM, Jacob Maes <jacob.m...@gmail.com>
> wrote:
> > >
> > > > Build and integration tests were successful for me.
> > > >
> > > > +1 non-binding
> > > >
> > > > On Wed, Feb 8, 2017 at 4:48 PM, xinyu liu <xinyuliu...@gmail.com>
> > wrote:
> > > >
> > > > > Ran build, checkAll and integration tests. All passed.
> > > > >
> > > > > +1 non-binding.
> > > > >
> > > > > Thanks,
> > > > > Xinyu
> > > > >
> > > > > On Wed, Feb 8, 2017 at 4:18 PM, Boris S <bor...@gmail.com> wrote:
> > > > >
> > > > > > Cloned the release and ran build, test and checkAll.sh
> > > > > > All passed.
> > > > > > Verified MD5 and the signature.
> > > > > > Got warning - "this key is not certified with a trusted
> > signature". I
> > > > > guess
> > > > > > it is ok.
> > > > > >
> > > > > > +1
> > > > > >
> > > > > > On Mon, Feb 6, 2017 at 5:32 PM, Jagadish Venkatraman <
> > > > > > jagadish1...@gmail.com
> > > > > > > wrote:
> > > > > >
> > > > > > > This is a call for a vote on a release of Apache Samza 0.12.0.
> > > Thanks
> > > > > to
> > > > > > > everyone who has contributed to this release. We are very glad
> to
> > > see
> > > > > > some
> > > > > > > new contributors in this release.
> > > > > > >
> > > > > > > The release candidate can be downloaded from here:
> > > > > > > http://home.apache.org/~jagadish/samza-0.12.0-rc2/
> > > > > > >
> > > > > > > The release candidate is signed with pgp key AF81FFBF, which
> can
> > be
> > > > > found
> > > > > > > on keyservers:
> > > > > > > http://pgp.mit.edu/pks/lookup?op=get&search=0xAF81FFBF
> > > > > > >
> > > > > > > The git tag is release-0.12.0-rc2 and signed with the same pgp
> > key:
> > > > > > > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=
> > > > > > > refs/tags/release-0.12.0-rc2
> > > > > > >
> > > > > > > Test binaries have been published to Maven's staging
> repository,
> > > and
> > > > > are
> > > > > > > available here:
> > > > > > > https://repository.apache.org/content/repositories/
> > > > orgapachesamza-1018
> > > > > > >
> > > > > > > Note that the binaries were built with JDK8 without incident.
> > > > > > >
> > > > > > > 26 issues were resolved for this release:
> > > > > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20S
> > > > > > > AMZA%20AND%20fixVersion%20in%20(0.12%2C%200.12.0)%20AND%20st
> > > > > > > atus%20in%20(Resolved%2C%20Closed)
> > > > > > >
> > > > > > > The vote will be open for 72 hours (ending at 6PM Thursday,
> > > > > > > 02/09/2017).
> > > > > > >
> > > > > > > Please download the release candidate, check the
> > hashes/signature,
> > > > > build
> > > > > > it
> > > > > > > and test it, and then please vote:
> > > > > > >
> > > > > > >
> > > > > > > [ ] +1 approve
> > > > > > >
> > > > > > > [ ] +0 no opinion
> > > > > > >
> > > > > > > [ ] -1 disapprove (and reason why)
> > > > > > >
> > > > > > >
> > > > > > > +1 from my side for the release.
> > > > > > >
> > > > > > > Cheers!
> > > > > > >
> > > > > > > --
> > > > > > > Jagadish V,
> > > > > > > Graduate Student,
> > > > > > > Department of Computer Science,
> > > > > > > Stanford University
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Navina R.
> >
>



-- 
Thanks and regards

Chinmay Soman


Re: Review Request 45912: SAMZA-0.10.0: fix the bug of SamzaObjectMapper

2016-04-08 Thread Chinmay Soman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45912/#review127925
---


Ship it!




- Chinmay Soman


On April 8, 2016, 11:40 p.m., Yuanchi Ning wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45912/
> ---
> 
> (Updated April 8, 2016, 11:40 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> enable the SerializationConfig AUTO_DETECT_GETTERS in SamzaObjectMapper;
> otherwise it will cause JsonSerde to misbehave (it wouldn't discover basic
> types like HashMap.Entry)
> JIRA ticket link: https://issues.apache.org/jira/browse/SAMZA-933
> 
> 
> Diffs
> -
> 
>   
> samza-core/src/main/java/org/apache/samza/serializers/model/SamzaObjectMapper.java
>  717b5dcad2aa22540deb08962bf2833e7dc5baa5 
>   samza-core/src/test/scala/org/apache/samza/serializers/TestJsonSerde.scala 
> 4f1c14ce3838163c5af8c9d076238e0ed32619e1 
> 
> Diff: https://reviews.apache.org/r/45912/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Yuanchi Ning
> 
>



Re: Review Request 45912: SAMZA-0.10.0: fix the bug of SamzaObjectMapper

2016-04-07 Thread Chinmay Soman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45912/#review127702
---



Lets add a unit test for this.

- Chinmay Soman


On April 8, 2016, 12:02 a.m., Yuanchi Ning wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45912/
> ---
> 
> (Updated April 8, 2016, 12:02 a.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> enable the SerializationConfig AUTO_DETECT_GETTERS in SamzaObjectMapper, 
> otherwise it wouldn't discover the basic types like HashMap Entry
> 
> 
> Diffs
> -
> 
>   
> samza-core/src/main/java/org/apache/samza/serializers/model/SamzaObjectMapper.java
>  717b5dcad2aa22540deb08962bf2833e7dc5baa5 
> 
> Diff: https://reviews.apache.org/r/45912/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Yuanchi Ning
> 
>



Re: [DISCUSS] Moving to github/pull-request for code review and check-in

2016-02-20 Thread Chinmay Soman
;>> wrote:
> > >>>>>>>
> > >>>>>>> Hi, all,
> > >>>>>>>
> > >>>>>>> I want to start the discussion on our code review/commit process.
> > >>>>>>>
> > >>>>>>> I felt that our code review and check-in process is a little bit
> > >>>>>>> cumbersome:
> > >>>>>>> - developers need to create RBs and attach diff to JIRA
> > >>>>>>> - committers need to review RBs, dowload diff and apply, then
> push.
> > >>>>>>>
> > >>>>>>> It would be much lighter if we take the pull request only
> approach,
> > >>> as
> > >>>>>>> Kafka already converted to:
> > >>>>>>> - for the developers, the only thing needed is to open a pull
> > >>> request.
> > >>>>>>> - for committers, review and apply patch is from the same PR and
> > >>> merge
> > >>>>>> can
> > >>>>>>> be done directly on remote git repo.
> > >>>>>>>
> > >>>>>>> Of course, there might be some hookup scripts that we will need
> to
> > >>> link
> > >>>>>>> JIRA w/ pull request in github, which Kafka already does. Any
> > >>> comments
> > >>>>>> and
> > >>>>>>> feedbacks are welcome!
> > >>>>>>>
> > >>>>>>> Thanks!
> > >>>>>>>
> > >>>>>>> -Yi
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Navina R.
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> Milinda Pathirage
> > >>>
> > >>> PhD Student | Research Assistant
> > >>> School of Informatics and Computing | Data to Insight Center
> > >>> Indiana University
> > >>>
> > >>> twitter: milindalakmal
> > >>> skype: milinda.pathirage
> > >>> blog: http://milinda.pathirage.org
> > >>>
> > >>
> > >
> > >
> > > --
> > > Sent from my iphone.
> >
> >
>



-- 
Thanks and regards

Chinmay Soman


Re: Review Request 39032: SAMZA-787: task.log4j.system should not be guessed if not configured

2016-02-02 Thread Chinmay Soman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39032/#review117572
---


Ship it!




- Chinmay Soman


On Oct. 6, 2015, 12:44 a.m., Navina Ramesh wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39032/
> ---
> 
> (Updated Oct. 6, 2015, 12:44 a.m.)
> 
> 
> Review request for samza, Yan Fang, Chinmay Soman, Chris Riccomini, and Yi 
> Pan (Data Infrastructure).
> 
> 
> Bugs: SAMZA-787
> https://issues.apache.org/jira/browse/SAMZA-787
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> SAMZA-787: task.log4j.system should not be guessed if not configured
> 
> 
> Diffs
> -
> 
>   docs/learn/documentation/versioned/jobs/configuration-table.html 
> b42c34cf55096467ab20507d82af7d002abd8e1e 
>   docs/learn/documentation/versioned/jobs/logging.md 
> a3bb054c0febabf2d7b6e3e18fa710f93382d744 
>   samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java 
> 209296dd48d64159d14bf20a286d971b54fb710e 
>   
> samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java 
> 894845332f2a08b984c587884ed13f2854b4d492 
>   
> samza-log4j/src/test/java/org/apache/samza/config/TestLog4jSystemConfig.java 
> f7d3cbecfb8901f5fe1a290ce9b08446ce48f4df 
>   
> samza-log4j/src/test/java/org/apache/samza/logging/log4j/TestStreamAppender.java
>  3e812405edf1ad4fca43784b1668ab3ae6ff6475 
> 
> Diff: https://reviews.apache.org/r/39032/diff/
> 
> 
> Testing
> ---
> 
> ./bin/check-all.sh passed
> 
> 
> Thanks,
> 
> Navina Ramesh
> 
>



Re: Question on zero downtime deployment of Samza jobs

2016-01-14 Thread Chinmay Soman
FYI: I've forwarded it to Yi's personal email. Yi: please feel free to
email Peter (CC'ed) or me about any questions. FYI: the real solution would
be to implement standby containers. This solution is an attempt to do the
same.

On Thu, Jan 14, 2016 at 10:17 AM, Yi Pan <nickpa...@gmail.com> wrote:

> It might be the mail list restriction. Could you try sending it to my
> personal email (i.e. nickpa...@gmail.com)?
>
> On Thu, Jan 14, 2016 at 10:15 AM, Chinmay Soman <chinmay.cere...@gmail.com
> >
> wrote:
>
> > It shows as attached in my sent email. That's weird.
> >
> > On Thu, Jan 14, 2016 at 10:14 AM, Yi Pan <nickpa...@gmail.com> wrote:
> >
> > > Hi, Chinmay,
> > >
> > > Did you forget to attach? I did not see the attachment in your last
> > email.
> > > Or is it due to the mail list restriction?
> > >
> > > On Thu, Jan 14, 2016 at 10:12 AM, Chinmay Soman <
> > chinmay.cere...@gmail.com
> > > >
> > > wrote:
> > >
> > > > Sorry for the long delay (I was trying to find the design doc we
> made).
> > > > Please see attachment for the design doc. I'm also CC'ing Peter Huang
> > > (the
> > > > intern) who worked on this.
> > > >
> > > > Disclaimer: Given this was an internship project, we've cut some
> > corners.
> > > > We plan to come up with an improved version in some time.
> > > >
> > > > On Wed, Jan 6, 2016 at 11:31 AM, Yi Pan <nickpa...@gmail.com> wrote:
> > > >
> > > >> Hi, Chinmay,
> > > >>
> > > >> That's awesome! Could you share some design doc of this feature? We
> > > would
> > > >> love to have this feature in LinkedIn as well!
> > > >>
> > > >> -Yi
> > > >>
> > > >> On Wed, Jan 6, 2016 at 10:02 AM, Chinmay Soman <
> > > chinmay.cere...@gmail.com
> > > >> >
> > > >> wrote:
> > > >>
> > > >> > FYI: As part of an Uber internship project, we were working on
> > exactly
> > > >> this
> > > >> > problem. Our approach was to do a rolling restart of all the
> > > containers
> > > >> > wherein we start a "replica" container for each primary container
> > and
> > > >> let
> > > >> > it "catch up" before we do the switch. Of course this doesn't
> > > guarantee
> > > >> > zero downtime, but it does guarantee minimum time to upgrade each
> > such
> > > >> > container.
> > > >> >
> > > >> > The code is still in POC, but we do plan to finish this and make
> > this
> > > >> > available. Let me know if you're interested in trying it out.
> > > >> >
> > > >> > FYI: the sticky container deployment will also minimize the time
> to
> > > >> upgrade
> > > >> > / deploy since majority of the upgrade time is taken up by the
> > > >> container in
> > > >> > reading all the changelog (if any). Upgrade / re-deploy will also
> > > take a
> > > >> > long time if the checkpoint topic is not log compacted (which is
> > true
> > > in
> > > >> > our environment).
> > > >> >
> > > >> > Thanks,
> > > >> > C
> > > >> >
> > > >> > On Wed, Jan 6, 2016 at 9:56 AM, Bae, Jae Hyeon <
> metac...@gmail.com>
> > > >> wrote:
> > > >> >
> > > >> > > Hi Samza devs and users
> > > >> > >
> > > >> > > I know this will be tricky in Samza because Samza Kafka consumer
> > is
> > > >> not
> > > >> > > coordinated externally, but do you have any idea how to deploy
> > samza
> > > >> jobs
> > > >> > > with zero downtime?
> > > >> > >
> > > >> > > Thank you
> > > >> > > Best, Jae
> > > >> > >
> > > >> >
> > > >> >
> > > >> >
> > > >> > --
> > > >> > Thanks and regards
> > > >> >
> > > >> > Chinmay Soman
> > > >> >
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Thanks and regards
> > > >
> > > > Chinmay Soman
> > > >
> > >
> >
> >
> >
> > --
> > Thanks and regards
> >
> > Chinmay Soman
> >
>



-- 
Thanks and regards

Chinmay Soman


Re: Question on zero downtime deployment of Samza jobs

2016-01-14 Thread Chinmay Soman
It shows as attached in my sent email. That's weird.

On Thu, Jan 14, 2016 at 10:14 AM, Yi Pan <nickpa...@gmail.com> wrote:

> Hi, Chinmay,
>
> Did you forget to attach? I did not see the attachment in your last email.
> Or is it due to the mail list restriction?
>
> On Thu, Jan 14, 2016 at 10:12 AM, Chinmay Soman <chinmay.cere...@gmail.com
> >
> wrote:
>
> > Sorry for the long delay (I was trying to find the design doc we made).
> > Please see attachment for the design doc. I'm also CC'ing Peter Huang
> (the
> > intern) who worked on this.
> >
> > Disclaimer: Given this was an internship project, we've cut some corners.
> > We plan to come up with an improved version in some time.
> >
> > On Wed, Jan 6, 2016 at 11:31 AM, Yi Pan <nickpa...@gmail.com> wrote:
> >
> >> Hi, Chinmay,
> >>
> >> That's awesome! Could you share some design doc of this feature? We
> would
> >> love to have this feature in LinkedIn as well!
> >>
> >> -Yi
> >>
> >> On Wed, Jan 6, 2016 at 10:02 AM, Chinmay Soman <
> chinmay.cere...@gmail.com
> >> >
> >> wrote:
> >>
> >> > FYI: As part of an Uber internship project, we were working on exactly
> >> this
> >> > problem. Our approach was to do a rolling restart of all the
> containers
> >> > wherein we start a "replica" container for each primary container and
> >> let
> >> > it "catch up" before we do the switch. Of course this doesn't
> guarantee
> >> > zero downtime, but it does guarantee minimum time to upgrade each such
> >> > container.
> >> >
> >> > The code is still in POC, but we do plan to finish this and make this
> >> > available. Let me know if you're interested in trying it out.
> >> >
> >> > FYI: the sticky container deployment will also minimize the time to
> >> upgrade
> >> > / deploy since majority of the upgrade time is taken up by the
> >> container in
> >> > reading all the changelog (if any). Upgrade / re-deploy will also
> take a
> >> > long time if the checkpoint topic is not log compacted (which is true
> in
> >> > our environment).
> >> >
> >> > Thanks,
> >> > C
> >> >
> >> > On Wed, Jan 6, 2016 at 9:56 AM, Bae, Jae Hyeon <metac...@gmail.com>
> >> wrote:
> >> >
> >> > > Hi Samza devs and users
> >> > >
> >> > > I know this will be tricky in Samza because Samza Kafka consumer is
> >> not
> >> > > coordinated externally, but do you have any idea how to deploy samza
> >> jobs
> >> > > with zero downtime?
> >> > >
> >> > > Thank you
> >> > > Best, Jae
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > Thanks and regards
> >> >
> >> > Chinmay Soman
> >> >
> >>
> >
> >
> >
> > --
> > Thanks and regards
> >
> > Chinmay Soman
> >
>



-- 
Thanks and regards

Chinmay Soman


Re: How to synchronize KeyValueStore and Kafka cleanup

2015-10-02 Thread Chinmay Soman
> Does KV-store consume automatically from a Kafka topic?
Yes - if you've configured a changelog stream for your store

>  Does it consume only on restore()?
It consumes only during container initialization (again, assuming you
have a changelog configured)

> implement the StreamTask job to consume a Kafka topic and call add()
method?
Why wouldn't you want to use a changelog ?
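
For reference, a changelog-backed store is configured along these lines. This is a sketch with placeholder store/topic names; the property names are from the Samza configuration reference and are worth double-checking against your version:

```properties
# Hypothetical store named "my-store"; the changelog topic name is a placeholder.
stores.my-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
stores.my-store.key.serde=string
stores.my-store.msg.serde=json
# With this set, writes are mirrored to the Kafka topic, and the store is
# restored from that topic during container initialization.
stores.my-store.changelog=kafka.my-store-changelog
```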


On Fri, Oct 2, 2015 at 3:09 PM, Bae, Jae Hyeon <metac...@gmail.com> wrote:

> Thanks Yi Pan, I have one more question.
>
> Does KV-store consume automatically from a Kafka topic? Does it consume
> only on restore()? If so, do I have to implement the StreamTask job to
> consume a Kafka topic and call add() method?
>
> On Fri, Oct 2, 2015 at 2:01 PM, Yi Pan <nickpa...@gmail.com> wrote:
>
> > Hi, Jae Hyeon,
> >
> > Good to see you back on the mailing list again! Regarding your
> > questions, please see the answers below:
> >
> > > My KeyValueStore usage is a little bit different from usual cases
> because
> > > >  I have to cache all unique ids for the past six hours, which can be
> > > > configured for the retention usage. Unique ids won't be repeated such
> > as
> > > > timestamp. In this case, log.cleanup.policy=compact will keep growing
> > the
> > > > KeyValueStore size, right?
> > >
> >
> > It will grow as big as the accumulative size of your unique ids.
> >
> >
> > > >
> > > > Can I use Samza KeyValueStore for the topics
> > > > with log.cleanup.policy=delete? If not, what's your recommended way
> for
> > > > state management of non-changelog Kafka topic? If it's possible, how
> > does
> > > > Kafka cleanup remove outdated records in KeyValueStore?
> > >
> >
> > I am not quite sure about your definition of "non-changelog" Kafka
> topics.
> > If you want to retire some of the old records in a KV-store periodically,
> > you will have to run the pruning manually in the window() method in the
> > current release. In the upcoming 0.10 release, we have incorporated
> RocksDB
> > TTL features in the KV-store, which would automatically prune the old
> > entries in the RocksDB store. That said, the upcoming TTL
> > feature is not fully synchronized w/ the Kafka cleanup yet and is an
> > on-going work in the future. The recommendation is to use the TTL feature
> > and set the Kafka changelog to be time-retention based, w/ a retention
> time
> > longer than the RocksDB TTL to ensure no data loss.
> >
> > Hope the above answered your questions.
> >
> > Cheers!
> >
> > -Yi
> >
>



-- 
Thanks and regards

Chinmay Soman


Re: changelog compaction problem

2015-07-29 Thread Chinmay Soman
Just curious,

Can you double check if you have log compaction enabled on your Kafka
brokers ?
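
Just in case it helps: on older Kafka brokers the cleaner thread that services compaction is disabled by default, so compacted topics grow without bound. A minimal broker-side sketch (property name per the Kafka broker configuration docs; check the default for your broker version):

```properties
# server.properties (broker side): enable the log cleaner so that
# compacted topics (cleanup.policy=compact) are actually compacted.
log.cleaner.enable=true
```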

On Wed, Jul 29, 2015 at 8:30 AM, Vladimir Lebedev w...@fastmail.fm wrote:

 Hello,

 I have a problem with changelog in one of my samza jobs grows indefinitely.

 The job is quite simple, it reads messages from the input kafka topic, and
 either creates or updates a key in task-local samza store. Once in a minute
 the window method kicks-in, it iterates over all keys in the store and
 deletes some of them, selecting on the contents of their value.

 Message rate in the input topic is about 3000 messages per second. The input
 topic is partitioned into 48 partitions. The average number of keys kept in
 the store is more or less stable and does not exceed 10,000 keys per task.
 The average size of values is 50 bytes. So I expected the sum of all
 segments' sizes in the kafka data directory for the job's changelog topic
 not to exceed 10,000*50*48 ~= 24MBytes. In fact it is more than 2.5GB
 (after 6 days running from scratch) and it is growing.

 I tried to change default segment size for changelog topic in kafka, and
 it worked a bit - instead of 500Mbyte segments I have now 50Mbyte segments,
 but it did not heal the indefinite data growth problem.

 Moreover, if I stop the job and start it again it cannot restart, it
 breaks right after reading all records from changelog topic.

 Has anybody had a similar problem? How could it be resolved?

 Best regards,
 Vladimir

 --
 Vladimir Lebedev
 w...@fastmail.fm




-- 
Thanks and regards

Chinmay Soman


Re: [DISCUSS] Release 0.10.0

2015-07-29 Thread Chinmay Soman
I can take care of SAMZA-340, SAMZA-683  and will follow up with Luis for
SAMZA-401,2,3,4

On Wed, Jul 29, 2015 at 12:10 AM, Dan danharve...@gmail.com wrote:

 I agree SAMZA-741 for the ElasticSearch producer should be in too so we've
 got a better API as part of that release.

  - Dan


 On 29 July 2015 at 07:51, Yan Fang yanfang...@gmail.com wrote:

  Actually, I also want to include a few patch-available features,
  especially:
 
  1. broadcast stream (SAMZA-676)
  - waiting for review
 
  2. graphite support (SAMZA-340)
  3. meter and histogram (SAMZA-683)
  4. utility (SAMZA-401)
  - 2,3,4 belong to Luis, if he does not have time to update, since
 they
  only need some small changes, we can edit it and get +1 from another
  committer.
 
  5. hdfs producer (SAMZA-693)
  - I am reviewing.
 
  6. upgrade yarn to 2.7.1 (SAMZA-563)
 - though I am reviewing, this ticket is negotiable if we want to put
  into the 0.10.0 release. If we do not, I think, when users enable the
  worker-persisting and container-persisting features, Samza will not be able
  to handle it. (Some classes are only available after yarn 2.5.0 while Samza
  currently only supports yarn 2.4.0.)
 
  7. others: scrooge, class loader isolation, etc.
  - those are waiting for reviewing too.
 
  My opinion is that, if we can clean up all the patch-available tickets,
 it
  will be great. Most of them have been already reviewed more than once.
 So I
  think it should not be very time-consuming to have them in the 0.10.0
  release.
 
  What do you think?
 
  Of course, another must-have is the bug-fix of the Stream Appender. :)
 
  Thanks,
 
  Fang, Yan
  yanfang...@gmail.com
 
  On Tue, Jul 28, 2015 at 10:27 PM, Roger Hoover roger.hoo...@gmail.com
  wrote:
 
   Thanks, Yi.
  
   I propose that we also include SAMZA-741 for Elasticsearch versioning
   support with the new ES producer.  I think it's very close to being
  merged.
  
   Roger
  
  
   On Tue, Jul 28, 2015 at 10:08 PM, Yi Pan nickpa...@gmail.com wrote:
  
Hi, all,
   
I want to start the discussion on the release schedule for 0.10.0.
  There
are a few important features that we plan to release in 0.10.0 and I
  want
to start this thread s.t. we can agree on what to include in 0.10.0
release.
   
There are the following main features added in 0.10.0:
- RocksDB TTL support
- Add CoordinatorStream and disable CheckpointManager
- Elasticsearch Producer
- Host affinity
And other 0.10.0 tickets:
   
   
  
 
 https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20status%20in%20(%22In%20Progress%22%2C%20Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.10.0
   
I propose to cut a 0.10.0 release after we get the following issues
resolved:
- SAMZA-615: Migrate checkpoint from checkpoint topic to Coordinator
   stream
- SAMZA-617: YARN host affinity in Samza
   
Thoughts?
   
Thanks!
   
-Yi
   
  
 




-- 
Thanks and regards

Chinmay Soman


Re: [DISCUSS] Samza 0.9.1 release

2015-06-16 Thread Chinmay Soman
+1

On Tue, Jun 16, 2015 at 11:17 AM, Navina Ramesh 
nram...@linkedin.com.invalid wrote:

 +1 for the release!

 On 6/16/15, 11:03 AM, Yi Pan nickpa...@gmail.com wrote:

 +1 Agreed.
 
 Thanks!
 
 On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang yanfang...@gmail.com wrote:
 
  Agreed on this.
 
  Thanks,
 
  Fang, Yan
  yanfang...@gmail.com
 
  On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang wangg...@gmail.com
  wrote:
 
   Hi all,
  
   We have been running a couple of our jobs against `0.9.1` branch last
  week
   at LinkedIn with some critical bug fixes back-ported, including:
  
   SAMZA-608
   Deserialization error causes SystemConsumers to hang
  
   SAMZA-616
   Shutdown hook does not wait for container to finish
  
   SAMZA-658
   Iterator.remove breaks caching layer
  
   SAMZA-662 / 686
   Samza auto-creates changelog stream without sufficient partitions when
    container number > 1
  
   I am proposing a release vote on the current 0.9.1 branch for these
 bug
   fixes. Thoughts?
  
   -- Guozhang
  
 




-- 
Thanks and regards

Chinmay Soman


Re: improving hello-samza / testing

2015-06-16 Thread Chinmay Soman
We've built a driver program which kinda falls along approach (1) listed in
your email.

The driver program accepts a custom task object and has a way to inject
data - which in turn invokes the process method. For now we're assuming
logical time and use the frequency of process() invocations to deduce when
to invoke the window() method (e.g. invoke window once for every 4 calls
to process).

We've also built our own Incoming and Outgoing envelope - which is just in
the form of a Java List. This is how the result is evaluated (either get
the full result list or define a callback which is invoked every time
collector.send is called). It's still a work in progress and the goal is to
make unit testing along the lines of Storm unit tests.
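
The driver approach above can be sketched in a few lines. This is a toy, self-contained illustration: Collector and Task below are hypothetical stand-ins for Samza's MessageCollector and StreamTask interfaces, not the real org.apache.samza.task API.

```java
import java.util.ArrayList;
import java.util.List;

public class TaskDriverSketch {
    // Hypothetical stand-ins for Samza's MessageCollector / StreamTask;
    // the real interfaces take envelopes, coordinators, and configs.
    interface Collector { void send(String message); }
    interface Task { void process(String message, Collector collector); }

    // Toy task under test: upper-cases every input message.
    static class UpperCaseTask implements Task {
        public void process(String message, Collector collector) {
            collector.send(message.toUpperCase());
        }
    }

    public static void main(String[] args) {
        // The driver injects data straight into process() and captures
        // everything the task sends to the collector in a plain Java List.
        // (A fuller driver would also call window() once every N process()
        // invocations to simulate logical time.)
        List<String> sent = new ArrayList<>();
        Task task = new UpperCaseTask();
        task.process("hello", sent::add);
        task.process("samza", sent::add);
        System.out.println(sent); // prints [HELLO, SAMZA]
    }
}
```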

On Tue, Jun 16, 2015 at 5:17 PM, Chris Riccomini criccom...@apache.org
wrote:

 Hey Tim,

 This is a really good discussion to have. The testing that I've seen with
 Samza falls into three categories:

 1. Instantiate your StreamTask, and mock all params in the process()/init()
 methods.
 2. A mini-integration test that starts ZooKeeper, and Kafka, and feeds
 messages into a topic, and validates it gets messages back out from the
 output topic.
 3. A full blown integration test that uses Zopkio.

 For an example of (2), in practice, have a look at TestStatefulTask:



 https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob;f=samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala;h=ea702a919348305ff95ce0b4ca1996a13aff04ec;hb=HEAD

 As you can see, writing this kind of integration test can be a bit painful.

 (3) is documented here:

   http://samza.apache.org/contribute/tests.html

 Another way to test would be to start a full-blown container using
 ThreadJobFactory/ProcessJobFactory, but use a MockSystemFactory to mock out
 the system consumer/system producer.

 Has anyone else tested Samza in other ways?

 Cheers,
 Chris

 On Tue, Jun 16, 2015 at 11:00 AM, Tim Williams william...@gmail.com
 wrote:

  I'm learning samza by the hello-samza project and notice the lack of
  tests.  Where's a good place to learn how folks are properly testing
  things written with samza?
 
  Thanks,
  --tim
 




-- 
Thanks and regards

Chinmay Soman


Re: Review Request 34011: Add support for a Graphite Metrics Reporter

2015-05-10 Thread Chinmay Soman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34011/#review83169
---



build.gradle
https://reviews.apache.org/r/34011/#comment134045

This shouldn't be in Samza-core. Create a separate module/project called 
samza-graphite (just like samza-log4j / samza-kv)



gradle/dependency-versions.gradle
https://reviews.apache.org/r/34011/#comment134046

This is kinda surprising. We already have a metricsVersion defined ? I 
don't see it being used.



samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteCounter.java
https://reviews.apache.org/r/34011/#comment134047

Please move your changes to a separate module (not in samza-core)



samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteReporterFactory.java
https://reviews.apache.org/r/34011/#comment134049

I think let's pass the container name to the Samza graphite reporter
factory. Might be useful if we want to somehow aggregate metrics across all
tasks in a container in the future.



samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteSnapshot.java
https://reviews.apache.org/r/34011/#comment134050

Add unit for value

Make this static private ?



samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteSnapshot.java
https://reviews.apache.org/r/34011/#comment134052

Why not TimeUnit.MILLISECONDS.toNanos(value) ?

The 'value' is in milliseconds right ?



samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteSnapshot.java
https://reviews.apache.org/r/34011/#comment134051

Make this static private ?



samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteSnapshot.java
https://reviews.apache.org/r/34011/#comment134053

Make it private ?



samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteSnapshot.java
https://reviews.apache.org/r/34011/#comment134054

Samza's Snapshot has a getMax



samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteSnapshot.java
https://reviews.apache.org/r/34011/#comment134056

Make all custom methods as private



samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteSnapshot.java
https://reviews.apache.org/r/34011/#comment134057

Throw an exception ?



samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteTimer.java
https://reviews.apache.org/r/34011/#comment134058

Check if duration <= 0L



samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteTimer.java
https://reviews.apache.org/r/34011/#comment134059

This will create a new object in each invocation. Why not store this as a 
member variable ?



samza-core/src/main/java/org/apache/samza/metrics/reporter/SamzaGraphiteReporter.java
https://reviews.apache.org/r/34011/#comment134078

We should make this similar to the way metric names are constructed in
other reporters (for instance, see JmxReporter).



samza-core/src/main/java/org/apache/samza/metrics/reporter/SamzaGraphiteReporter.java
https://reviews.apache.org/r/34011/#comment134079

Make it private



samza-core/src/main/java/org/apache/samza/metrics/reporter/SamzaGraphiteReporter.java
https://reviews.apache.org/r/34011/#comment134080

Make it private static



samza-core/src/main/java/org/apache/samza/metrics/reporter/SamzaGraphiteReporter.java
https://reviews.apache.org/r/34011/#comment134081

Use this format: getLong(name, defaultValue).
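The defaulted-getter shape the review asks for can be sketched generically; this is a stand-in, not Samza's actual Config class (the key names are hypothetical):

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigDefaults {
    // Generic sketch of getLong(name, defaultValue); Samza's Config class
    // is assumed to offer an equivalent defaulted getter.
    static long getLong(Map<String, String> config, String name, long defaultValue) {
        String raw = config.get(name);
        return raw == null ? defaultValue : Long.parseLong(raw);
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("metrics.graphite.port", "2003"); // hypothetical key

        System.out.println(getLong(config, "metrics.graphite.port", 2003L));     // prints "2003"
        System.out.println(getLong(config, "metrics.graphite.interval", 60L));   // falls back, prints "60"
    }
}
```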



samza-core/src/main/java/org/apache/samza/metrics/reporter/SamzaGraphiteReporter.java
https://reviews.apache.org/r/34011/#comment134082

Include the source as well (look at JmxReporter)



samza-core/src/main/java/org/apache/samza/metrics/reporter/SamzaGraphiteReporter.java
https://reviews.apache.org/r/34011/#comment134083

Include the source as well (look at JmxReporter)

for this and other metric names


- Chinmay Soman


On May 9, 2015, 6:31 a.m., Luis De Pombo wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 https://reviews.apache.org/r/34011/
 ---
 
 (Updated May 9, 2015, 6:31 a.m.)
 
 
 Review request for samza.
 
 
 Repository: samza
 
 
 Description
 ---
 
 Add support for a Graphite Metrics Reporter
 
 
 Diffs
 -
 
   build.gradle ac80a8664180e556ec83e229e04e3d8c56b70506 
   checkstyle/import-control.xml 5f8e103a2e43f96518b20de1c7cbd84e0af24842 
   gradle/dependency-versions.gradle ee6dfc411b7ab90b187df79f109884127953862e 
   
 samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteCounter.java
  PRE-CREATION 
   
 samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteGauge.java 
 PRE-CREATION 
   
 samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteReporterFactory.java
  PRE-CREATION 
   
 samza

Re: Samza serializers jar for 0.9.0 ?

2015-04-29 Thread Chinmay Soman
Oh, never mind - I forgot it was merged into core. Sorry for the noise.

On Wed, Apr 29, 2015 at 12:03 PM, Chinmay Soman chinmay.cere...@gmail.com
wrote:

 I don't see the serializers jar being published in Maven. Yan - know
 anything about this ?

 --
 Thanks and regards

 Chinmay Soman




-- 
Thanks and regards

Chinmay Soman


Samza serializers jar for 0.9.0 ?

2015-04-29 Thread Chinmay Soman
I don't see the serializers jar being published in Maven. Yan - know
anything about this ?

-- 
Thanks and regards

Chinmay Soman


Re: Extra Systems and other extensions.

2015-04-15 Thread Chinmay Soman
+1! I was going to do this for my use case as well. Would love to have
this!

On Wed, Apr 15, 2015 at 9:24 AM, Roger Hoover roger.hoo...@gmail.com
wrote:

 Dan,

 This is great.  Would love to have a common ElasticSearch system producer.

 Cheers,

 Roger

 On Tue, Apr 14, 2015 at 1:34 PM, Dan danharve...@gmail.com wrote:

  Thanks Jakob, I agree they'll be more maintained and tested if they're in
  the main repo so that's great.
 
  I'll sort out Jira's and get some patches of what we've got working now
 out
  for review.
 
   - Dan
 
 
  On 14 April 2015 at 19:56, Jakob Homan jgho...@gmail.com wrote:
 
   Hey Dan-
   I'd love for the Elastic Search stuff to be added to the main code, as
   a separate module.  Keeping these in the main source code keeps them
   more likely to be maintained and correct.
  
    The EnvironmentConfigRewriter can likely go in the same place as the
    ConfigRewriter interface, since it doesn't depend on Kafka as the
    current RegexTopicRewriter does.
  
    If you could open JIRAs for these, that would be great.  Happy to
    shepherd the code in.
   -Jakob
  
   On 14 April 2015 at 11:46, Dan danharve...@gmail.com wrote:
Hey,
   
At state.com we've started to write some generic extensions to Samza
   that
we think would be more generally useful. We've got
an ElasticsearchSystemProducer/Factory to output to an Elasticsearch index
and an EnvironmentConfigRewriter to modify config from environment variables.
   
What's the best way for us to add things like this? Do you want more
modules in the main project or should we just create some separate
   projects
on github?
   
It would be good to get core extensions like these shared, to be tested and
used by more people.
   
Thanks,
Dan
  
 




-- 
Thanks and regards

Chinmay Soman


Re: [VOTE] Apache Samza 0.9.0 RC0

2015-03-29 Thread Chinmay Soman
 The release candidate can be downloaded from here:

 http://people.apache.org/~yanfang/samza-0.9.0-rc0/

 The release candidate is signed with pgp key CAC06239EA00BA80, which is
 included in the repository's KEYS file:

 https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob_plain;f=KEYS;hb=6f5bafb6cd93934781161eb6b1868d11ea347c95

 and can also be found on keyservers:

 http://pgp.mit.edu/pks/lookup?op=get&search=0xCAC06239EA00BA80

 The git tag is release-0.9.0-rc0 and signed with the same pgp key:

 https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=1039f7ede6490f9420dcecd6adc7677b97e78bcf

 Test binaries have been published to Maven's staging repository, and are
 available here:

 https://repository.apache.org/content/repositories/orgapachesamza-1005/

 Note that the binaries were built with JDK6 without incident.

 95 issues were resolved for this release:

 https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.9.0%20AND%20status%20in%20(Resolved%2C%20Closed)

 The vote will be open for 72 hours (ending at 4:00 PM Friday, 03/27/2015).
 Please download the release candidate, check the hashes/signature, build it
 and test it, and then please vote:

 [ ] +1 approve
 [ ] +0 no opinion
 [ ] -1 disapprove (and reason why)

 +1 from my side for the release.

 Fang, Yan
 yanfang...@gmail.com
 +1 (206) 849-4108



-- 
Thanks and regards

Chinmay Soman


Re: cannot be cast to java.lang.String

2015-03-25 Thread Chinmay Soman
I think this is a bit specific to Samza. In the KafkaSystemProducer class,
it does something like this:

envelope.getMessage.asInstanceOf[Array[Byte]]
and not just 'byte[]'. This is why we need to be explicit about the
serialization format.
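The failure mode is ordinary Java casting; here is a minimal illustration (deliberately not using Samza's classes) of why an un-serialized String blows up when the producer casts to a byte array:

```java
import java.nio.charset.StandardCharsets;

public class CastDemo {
    public static void main(String[] args) {
        // What the producer receives when no serde is configured:
        Object message = "hello";

        try {
            byte[] bytes = (byte[]) message; // mirrors asInstanceOf[Array[Byte]]
            System.out.println(bytes.length);
        } catch (ClassCastException e) {
            // The same "java.lang.String cannot be cast to [B" failure
            // seen in this thread.
            System.out.println("ClassCastException caught");
        }

        // With a serde configured, the framework hands the producer real bytes,
        // so the cast succeeds:
        Object serialized = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] bytes = (byte[]) serialized;
        System.out.println(bytes.length); // prints "5"
    }
}
```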


On Wed, Mar 25, 2015 at 3:14 AM, Jordi Blasi Uribarri jbl...@nextel.es
wrote:

 I am using the Kafka command line producer, so I understand that I am
 sending a String.

 bin/kafka-console-producer.sh --broker-list localhost:9092 --topic
 syslog

 What is actually the difference between a string and JSON? Is it just a
 matter of deserialization, or is there any kind of metadata included that
 specifies the content type?

 How do I enable the debug mode?

 Thanks,

 Jordi

 -Mensaje original-
 De: Chinmay Soman [mailto:chinmay.cere...@gmail.com]
 Enviado el: lunes, 23 de marzo de 2015 17:55
 Para: dev@samza.apache.org
 Asunto: Re: cannot be cast to java.lang.String

 Hey Jordi,

 This is because you're sending String and not json in your output topic.
 Try setting string on the output stream as well (if you haven't already).

 If you have done that - then please enable debug mode and attach the log
 somewhere so that we can take a look.

 On Mon, Mar 23, 2015 at 9:52 AM, Jordi Blasi Uribarri jbl...@nextel.es
 wrote:

  Looks like that was one error.
 
  I have set the property like this:
  systems.kafka.streams.syslog.samza.msg.serde=string
 
  But I am still getting the same error. Now I am seeing a different
  thing in the log previous to the exception:
 
  23 mar 2015 05:49:31  INFO KafkaSystemProducer - Creating a new
  producer for system kafka.
  23 mar 2015 05:49:31  INFO ProducerConfig - ProducerConfig values:
  value.serializer = class
  org.apache.kafka.common.serialization.ByteArraySerializer
  key.serializer = class
  org.apache.kafka.common.serialization.ByteArraySerializer
  block.on.buffer.full = true
  retry.backoff.ms = 100
  buffer.memory = 33554432
  batch.size = 16384
  metrics.sample.window.ms = 3
  metadata.max.age.ms = 30
  receive.buffer.bytes = 32768
  timeout.ms = 3
  max.in.flight.requests.per.connection = 1
  bootstrap.servers = [broker01:9092]
  metric.reporters = []
  client.id = samza_producer-samzafroga_job1-1-1427086163149-3
  compression.type = none
  retries = 2147483647
  max.request.size = 1048576
  send.buffer.bytes = 131072
  acks = 1
  reconnect.backoff.ms = 10
  linger.ms = 0
  metrics.num.samples = 2
  metadata.fetch.timeout.ms = 6
 
  23 mar 2015 05:49:31 ERROR SamzaContainer - Caught exception in
  process loop.
  java.lang.ClassCastException: java.lang.String cannot be cast to [B
  at
 
 org.apache.samza.system.kafka.KafkaSystemProducer.send(KafkaSystemProducer.scala:80)
  at
  org.apache.samza.system.SystemProducers.send(SystemProducers.scala:87)
  at
 
 org.apache.samza.task.TaskInstanceCollector.send(TaskInstanceCollector.scala:61)
  at samzafroga.job1.process(job1.java:21)
  at
 
 org.apache.samza.container.TaskInstance$$anonfun$process$1.apply$mcV$sp(TaskInstance.scala:129)
  at
 
 org.apache.samza.container.TaskInstanceExceptionHandler.maybeHandle(TaskInstanceExceptionHandler.scala:54)
  at
  org.apache.samza.container.TaskInstance.process(TaskInstance.scala:128)
  at
 
 org.apache.samza.container.RunLoop$$anonfun$process$2.apply(RunLoop.scala:114)
  at
  org.apache.samza.util.TimerUtils$class.updateTimer(TimerUtils.scala:37)
  at
 org.apache.samza.container.RunLoop.updateTimer(RunLoop.scala:36)
  at org.apache.samza.container.RunLoop.process(RunLoop.scala:100)
  at org.apache.samza.container.RunLoop.run(RunLoop.scala:83)
  at
  org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:549)
  at
  org.apache.samza.job.local.ThreadJob$$anon$1.run(ThreadJob.scala:42)
 
  Thanks.
 
  Jordi
  -Mensaje original-
  De: Chinmay Soman [mailto:chinmay.cere...@gmail.com]
  Enviado el: lunes, 23 de marzo de 2015 17:36
  Para: dev@samza.apache.org
  Asunto: Re: cannot be cast to java.lang.String
 
  Have you tried setting this :
 
  systems.kafka.streams.syslog.samza.msg.serde=string   // And assuming
  you've defined a 'string' serializer in your config
 
  OR
 
  systems.kafka.streams.syslog.samza.msg.serde=json // Depending on the
  corresponding format of your input data
 
  On Mon, Mar 23, 2015 at 9:24 AM, Jordi Blasi Uribarri
  jbl...@nextel.es
  wrote:
 
   Hi,
  
   As I understand it, I am setting kafka as the system name, beste
   as the output topic in the system and syslog as the input topic.
   Both topics syslog and beste are working correctly as I am streaming
   some syslogs to the syslog topic and I am testing beste with an
   internal

Re: New to Samza/Yarn and having Kafka issues

2015-03-23 Thread Chinmay Soman
Oh, I just meant in your job config:

metrics.reporter.jmx.class=org.apache.samza.metrics.reporter.JmxReporterFactory
metrics.reporters=jmx


On Mon, Mar 23, 2015 at 5:13 PM, Ash W Matheson ash.mathe...@gmail.com
wrote:

 I'm assuming I have Jmx defined ... where would that get set?

 On Mon, Mar 23, 2015 at 5:08 PM, Chinmay Soman chinmay.cere...@gmail.com
 wrote:

  Hey Ash,
 
  Can you see your job metrics (if you have the Jmx metrics defined) to see
  if your job is actually doing anything ? My only guess at this point is
 the
  process method is not being called because somehow there's no incoming
  data. I could be totally wrong of course.
 
  On Mon, Mar 23, 2015 at 4:28 PM, Ash W Matheson ash.mathe...@gmail.com
  wrote:
 
   Just to be clear, here's what's changed from the default hello-samza
  repo:
  
   wikipedia-parser.properties==
   task.inputs=kafka.myTopic
   systems.kafka.consumer.zookeeper.connect=
   ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com:2181/
   systems.kafka.consumer.auto.offset.reset=smallest
  
   WikipediaParserStreamTask.java =
 public void process(IncomingMessageEnvelope envelope,
 MessageCollector
   collector, TaskCoordinator coordinator) {
    Map<String, Object> jsonObject = (Map<String, Object>)
    envelope.getMessage();
   WikipediaFeedEvent event = new WikipediaFeedEvent(jsonObject);
  
   try {
 System.out.println(event.getRawEvent());
 // MapString, Object parsedJsonObject =
  parse(event.getRawEvent());
  
 // parsedJsonObject.put(channel, event.getChannel());
 // parsedJsonObject.put(source, event.getSource());
 // parsedJsonObject.put(time, event.getTime());
  
 // collector.send(new OutgoingMessageEnvelope(new
   SystemStream(kafka, wikipedia-edits), parsedJsonObject));
  
   as well as the aforementioned changes to the log4j.xml file.
  
   The data pushed into the 'myTopic' topic is nothing more than a
 sentence.
  
  
   On Mon, Mar 23, 2015 at 4:16 PM, Ash W Matheson 
 ash.mathe...@gmail.com
   wrote:
  
yep, modified log4j.xml to look like this:
   
  <root>
    <priority value="debug" />
    <appender-ref ref="RollingAppender"/>
    <appender-ref ref="jmx" />
  </root>
   
Not sure what you mean by #2.
   
However, I'm running now, not seeing any exceptions, but still not
  seeing
any output from System.out.println(...)
   
On Mon, Mar 23, 2015 at 11:29 AM, Naveen Somasundaram 
nsomasunda...@linkedin.com.invalid wrote:
   
Hey Ash,
   1. Did you happen to modify your log4j.xml ?
   2. Can you print the class path that was printed when
  the
job started ? I am wondering if log4j was not loaded or not present
 in
   the
path where it’s looking for. If you have been using hello samza, it
   should
have pulled it from Maven.
   
Thanks,
Naveen
   
On Mar 22, 2015, at 10:35 AM, Ash W Matheson 
 ash.mathe...@gmail.com
wrote:
   
 Hey all,

 Evaluating Samza currently and am running into some odd issues.

 I'm currently working off the 'hello-samza' repo and trying to
  parse a
 simple kafka topic that I've produced through an extenal java app
(nothing
 other than a series of sentences) and it's failing pretty hard for
  me.
The
 base 'hello-samza' set of apps works fine, but as soon as I change
  the
 configuration to look at a different Kafka/zookeeper I get the
following in
 the userlogs:

 2015-03-22 17:07:09 KafkaSystemAdmin [WARN] Unable to fetch last
   offsets
 for streams [myTopic] due to kafka.common.KafkaException: fetching
   topic
 metadata for topics [Set(myTopic)] from broker
 [ArrayBuffer(id:0,host:redacted,port:9092)] failed. Retrying.


 The modifications are pretty straightforward.  In the
 Wikipedia-parser.properties, I've changed the following:
 task.inputs=kafka.myTopic
 systems.kafka.consumer.zookeeper.connect=redacted:2181/
 systems.kafka.consumer.auto.offset.reset=smallest
 systems.kafka.producer.metadata.broker.list=redacted:9092

 and in the actual java file WikipediaParserStreamTask.java
  public void process(IncomingMessageEnvelope envelope,
   MessageCollector
 collector, TaskCoordinator coordinator) {
Map<String, Object> jsonObject = (Map<String, Object>)
 envelope.getMessage();
WikipediaFeedEvent event = new WikipediaFeedEvent(jsonObject);

try {
System.out.println(event.getRawEvent());

 And then following the compile/extract/run process outlined in the
 hello-samza website.

 Any thoughts?  I've looked online for any 'super simple' examples
 of
 ingesting kafka in samza with very little success.
   
   
   
  
 
 
 
  --
  Thanks and regards
 
  Chinmay Soman
 




-- 
Thanks and regards

Chinmay Soman


Re: New to Samza/Yarn and having Kafka issues

2015-03-23 Thread Chinmay Soman
I changed the systems.kafka.samza.msg.serde=json to 'string' a while back,
but that caused a separate exception.  However that was many, MANY attempts
ago.

This may not work because that will set all serialization formats (input
and output) to json / string. In your case you're inputting string and
outputting json. So you might have to set that explicitly.
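Setting the formats per stream rather than system-wide might look roughly like this in the job config (topic and serde names follow this thread; treat the exact keys as a sketch against your Samza version's docs):

```properties
# Register both serde factories under short names.
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory

# The input topic carries plain strings...
systems.kafka.streams.myTopic.samza.msg.serde=string
# ...while the output topic carries JSON.
systems.kafka.streams.wikipedia-edits.samza.msg.serde=json
```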

On Mon, Mar 23, 2015 at 9:24 PM, Chinmay Soman chinmay.cere...@gmail.com
wrote:

 Since you're producing String data to 'myTopic', can you try setting the
 string serialization in your config ?

 serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory

 systems.kafka.streams.myTopic.samza.msg.serde=string


 On Mon, Mar 23, 2015 at 9:17 PM, Ash W Matheson ash.mathe...@gmail.com
 wrote:

 more info - new exception message:

 Exception in thread main
 org.apache.samza.system.SystemConsumersException: Cannot deserialize an
 incoming message.

 Updated the diff in pastebin with the changes.

 On Mon, Mar 23, 2015 at 8:41 PM, Ash W Matheson ash.mathe...@gmail.com
 wrote:

  Gah!  Yeah, those were gone several revisions ago but didn't get nuked
 in
  the last iteration.
 
  OK, let me do a quick test to see if that was my problem all along.
 
  On Mon, Mar 23, 2015 at 8:38 PM, Navina Ramesh 
  nram...@linkedin.com.invalid wrote:
 
  Hey Ash,
  I was referring to the lines before the try block.
 
  Map<String, Object> jsonObject = (Map<String, Object>)
  envelope.getMessage();
  WikipediaFeedEvent event = new WikipediaFeedEvent(jsonObject);
 
  try {
System.out.println([DWH] should see this);
System.out.println(event.getRawEvent());
  …
 
 
  Did you remove those lines as well?
 
  Navina
 
  On 3/23/15, 8:31 PM, Ash W Matheson ash.mathe...@gmail.com wrote:
 
  Just looking at the diff I posted and it's:
  
  
 1.  try {
 2. -  Map<String, Object> parsedJsonObject = parse(event.getRawEvent());
 3. +  System.out.println([DWH] should see this);
 4. +  System.out.println(event.getRawEvent());
 5. +  // Map<String, Object> parsedJsonObject = parse(event.getRawEvent());
  
  
  I've removed the Map and added two System.out.println calls.  So no,
  there
  shouldn't be any reference to
  MapString, Object parsedJsonObject = parse(event.getRawEvent());
  in the source java file.
  
  
  On Mon, Mar 23, 2015 at 7:42 PM, Ash W Matheson 
 ash.mathe...@gmail.com
  wrote:
  
   I'm in transit right now but if memory serves me everything should
 be
   commented out of that method except for the System.out.println call.
  I'll
   be home shortly and can confirm.
   On Mar 23, 2015 7:28 PM, Navina Ramesh
 nram...@linkedin.com.invalid
  
   wrote:
  
   Hi Ash,
   I just ran wikipedia-parser with your patch. Looks like you have
 set
  the
   message serde correctly in the configs. However, the original code
  still
   converts it into a Map for consumption in the WikipediaFeedEvent.
   I am seeing the following (expected):
  
   2015-03-23 19:17:49 SamzaContainerExceptionHandler [ERROR] Uncaught
   exception in thread (name=main). Exiting process now.
   java.lang.ClassCastException: java.lang.String cannot be cast to
   java.util.Map
at
  
  
 
 
 samza.examples.wikipedia.task.WikipediaParserStreamTask.process(WikipediaParserStreamTask.java:38)
at
  
  
 
 
 org.apache.samza.container.TaskInstance$$anonfun$process$1.apply$mcV$sp(TaskInstance.scala:133)
  
    Did you make the changes to fix this error? Your patch doesn't seem to
    have that.
    Line 38: Map<String, Object> jsonObject = (Map<String, Object>)
    envelope.getMessage();
  
  
  
   Lmk so I can investigate further.
  
   Cheers!
   Navina
  
   On 3/23/15, 6:43 PM, Ash W Matheson ash.mathe...@gmail.com
 wrote:
  
   If anyone's interested, I've posted a diff of the project here:
   http://pastebin.com/6ZW6Y1Vu
   and the python publisher here: http://pastebin.com/2NvTFDFx
   
   if you want to take a stab at it.
   
   On Mon, Mar 23, 2015 at 6:04 PM, Ash W Matheson
  ash.mathe...@gmail.com
   wrote:
   
Ok, so very simple test, all running on a local machine, not
 across
networks and all in the hello-samza repo this time around.
   
I've got the datapusher.py file set up to push data into
 localhost.
  One
event per second.
And a modified hello-samza where I've modified the
WikipediaParserStreamTask.java class to simply read what's
 there.
   
Running them both now and I'm seeing in the stderr files
   
 (deploy/yarn/logs/userlogs/application_X/container_/stderr)
  the
following:
   
Exception in thread main
org.apache.samza.system.SystemConsumersException: Cannot
  deserialize an
incoming message.
at
   
  
  
 
 
 org.apache.samza.system.SystemConsumers.update(SystemConsumers.scala:293)
at org.apache.samza.system.SystemConsumers.org$apache$samza$system$SystemConsumers$$poll(SystemConsumers.scala:260)

Re: Kafka topic naming conventions

2015-03-18 Thread Chinmay Soman
That's what we're doing as well - appending the partition count to the Kafka
topic name. This actually helps keep track of the #partitions for each
topic (since Kafka doesn't have a metadata store yet).

In case of topic expansion - we actually just resort to creating a new
topic. Although that is an overhead - the thought process is that this will
minimize operational errors. Also, this is necessary to do in case we're
doing some kind of joins.
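The convention being described can be sketched as a tiny helper; the naming scheme here (a `.<count>p` suffix) is purely illustrative, not a Samza or Kafka API:

```java
public class TopicNaming {
    // Hypothetical convention: <base>.<partitionCount>p, e.g. "click-events.32p".
    static String topicName(String base, int partitions) {
        return base + "." + partitions + "p";
    }

    // Recover the expected partition count so a consumer or join job can
    // validate it against what Kafka actually reports for the topic.
    static int expectedPartitions(String topic) {
        int dot = topic.lastIndexOf('.');
        String suffix = topic.substring(dot + 1);                       // e.g. "32p"
        return Integer.parseInt(suffix.substring(0, suffix.length() - 1));
    }

    public static void main(String[] args) {
        String topic = topicName("click-events", 32);
        System.out.println(topic);                     // prints "click-events.32p"
        System.out.println(expectedPartitions(topic)); // prints "32"
    }
}
```

Expanding a topic then means creating a new name (a new count suffix) rather than mutating the old one, which is exactly the "create a new topic" trade-off discussed above.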


On Wed, Mar 18, 2015 at 5:59 PM, Jakob Homan jgho...@gmail.com wrote:

 On 18 March 2015 at 17:48, Chris Riccomini criccom...@apache.org wrote:
  One thing I haven't seen, but might be relevant, is including partition
  counts in the topic.

 Yeah, but then if you change the partition count later on, you've got
 incorrect information forever. Or you need to create a new stream,
 which might be a nice forcing function to make sure your join isn't
 screwed up.  There'd need to be something somewhere to enforce that
 though.




-- 
Thanks and regards

Chinmay Soman


Re: Kafka topic naming conventions

2015-03-18 Thread Chinmay Soman
Yeah! It does seem a bit hackish - but I think this approach promises fewer
config/operation errors.

Although I think some of these checks can be built within Samza - assuming
Kafka has a metadata store in the near future - the Samza container can
validate the #topics against this store.

On Wed, Mar 18, 2015 at 6:16 PM, Chris Riccomini criccom...@apache.org
wrote:

 Hey Chinmay,

 Cool, this is good feedback. I didn't think I was *that* crazy. :)

 Cheers,
 Chris

 On Wed, Mar 18, 2015 at 6:10 PM, Chinmay Soman chinmay.cere...@gmail.com
 wrote:

  Thats what we're doing as well - appending partition count to the kafka
  topic name. This actually helps keep track of the #partitions for each
  topic (since Kafka doesn't have a Metadata store yet).
 
  In case of topic expansion - we actually just resort to creating a new
  topic. Although that is an overhead - the thought process is that this
 will
  minimize operational errors. Also, this is necessary to do in case we're
  doing some kind of joins.
 
 
  On Wed, Mar 18, 2015 at 5:59 PM, Jakob Homan jgho...@gmail.com wrote:
 
   On 18 March 2015 at 17:48, Chris Riccomini criccom...@apache.org
  wrote:
One thing I haven't seen, but might be relevant, is including
 partition
counts in the topic.
  
   Yeah, but then if you change the partition count later on, you've got
   incorrect information forever. Or you need to create a new stream,
   which might be a nice forcing function to make sure your join isn't
   screwed up.  There'd need to be something somewhere to enforce that
   though.
  
 
 
 
  --
  Thanks and regards
 
  Chinmay Soman
 




-- 
Thanks and regards

Chinmay Soman


Question on hello-samza (Kafka startup and shutdown)

2015-02-19 Thread Chinmay Soman
Sending to a wider audience to see if anyone else is seeing this issue.

It seems Kafka gets into a weird state every time I do bin/grid stop all (and
then start all).

I keep getting a LeaderNotAvailable exception on the producer side. It
seems this happens every time Kafka hasn't been shut down properly. This
issue goes away if I use the following sequence:

* bin/grid stop kafka
* bin/grid stop zookeeper (after like 5 seconds).

(and then start everything).

Has anyone else seen this ?

-- 
Thanks and regards

Chinmay Soman