Re: [VOTE] Apache Samza 0.13.1 RC0
Downloaded source, compiled and ran all tests. All passed. Also verified the signature.

+1 (Binding)

On Wed, Aug 23, 2017 at 11:27 AM, Boris S <bor...@gmail.com> wrote:
> Hi,
> I've downloaded and built it on Linux. Ran unit and integration tests. All
> passed. The only issue I've encountered was the version of Python: versions
> below 2.5 and above 3 all failed; 2.7 was the only version that worked.
>
> +1 (non-binding)
>
> On Fri, Aug 18, 2017 at 11:59 AM, Fred Haifeng Ji <haifeng...@gmail.com> wrote:
> > This is a call for a vote on a release of Apache Samza 0.13.1. Thanks to
> > everyone who has contributed to this release.
> >
> > The release candidate can be downloaded from here:
> > http://home.apache.org/~navina/samza-0.13.1-rc0/
> >
> > The release candidate is signed with pgp key A211312E, which can be found
> > on keyservers:
> > http://pgp.mit.edu/pks/lookup?op=get&search=0xEDFD8F9AA211312E
> >
> > The git tag is release-0.13.1-rc0 and signed with the same pgp key:
> > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.13.1-rc0
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> > https://repository.apache.org/content/repositories/orgapachesamza-1030/
> >
> > 29 issues were resolved for this release:
> > https://issues.apache.org/jira/issues/?jql=project%20%3D%2012314526%20AND%20fixVersion%20%3D%2012340845%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
> >
> > The vote will be open for 72 hours (ending at 1:00 PM Monday, 08/21/2017).
> >
> > Please download the release candidate, check the hashes/signature, build
> > it and test it, and then please vote:
> >
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove (and reason why)
> >
> > --
> > Fred Ji

--
Thanks and regards

Chinmay Soman
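The checks this vote asks for (verify the signature and hashes, then build and test) can be sketched as below. The key ID A211312E comes from the announcement, but the artifact file names are placeholders, not the actual RC contents; the gpg steps are shown as comments because they need the real RC files.

```shell
# Hedged sketch of release-candidate verification; names are placeholders.
#
#   gpg --keyserver pgp.mit.edu --recv-keys A211312E
#   gpg --verify apache-samza-0.13.1-src.tgz.asc apache-samza-0.13.1-src.tgz
#
# Hash checking works the same for any artifact; demonstrated here on a
# locally created stand-in file:
printf 'stand-in for the RC tarball' > rc-artifact.tgz
md5sum rc-artifact.tgz > rc-artifact.tgz.md5   # what the publisher records
md5sum -c rc-artifact.tgz.md5                  # what the verifier re-checks
```

In the real workflow you would compare against the `.md5`/`.sha1` files published alongside the RC rather than generating them yourself.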
Re: [VOTE] Apache Samza 0.12.0 RC2
Downloaded the release (on Mac), ran the build tests (and check-all). Verified the pgp key (although with the warning).

+1 (Binding)

On Mon, Feb 13, 2017 at 3:26 PM, Renato Marroquín Mogrovejo <renatoj.marroq...@gmail.com> wrote:
> I also ran check-all against Debian and the build was successful, although
> I saw a bunch of this error:
>
>   apache-samza-0.12.0-src/samza-hdfs/src/main/java/org/apache/samza/system/hdfs/HdfsSystemConsumer.java:59:
>   error: unmappable character for encoding ASCII
>
> I don't think they are a blocker; it's some characters not being parseable
> during the scalaCompile task for the samza-hdfs component. Is it worth
> opening a JIRA to fix this?
>
> Best,
> Renato M.
>
> 2017-02-13 22:20 GMT+01:00 Navina Ramesh <nram...@linkedin.com.invalid>:
> > I ran check-all on Mac and the integration tests on Linux. Looks good,
> > with no concerning issues.
> >
> > +1 (binding)
> >
> > Thanks!
> > Navina
> >
> > On Fri, Feb 10, 2017 at 9:25 AM, Boris S <bor...@gmail.com> wrote:
> > > I also successfully ran the integration tests on Linux. All passed.
> > > +1 non-binding
> > >
> > > On Wed, Feb 8, 2017 at 4:57 PM, Jacob Maes <jacob.m...@gmail.com> wrote:
> > > > Build and integration tests were successful for me.
> > > >
> > > > +1 non-binding
> > > >
> > > > On Wed, Feb 8, 2017 at 4:48 PM, xinyu liu <xinyuliu...@gmail.com> wrote:
> > > > > Ran build, checkAll and integration tests. All passed.
> > > > >
> > > > > +1 non-binding.
> > > > >
> > > > > Thanks,
> > > > > Xinyu
> > > > >
> > > > > On Wed, Feb 8, 2017 at 4:18 PM, Boris S <bor...@gmail.com> wrote:
> > > > > > Cloned the release and ran build, test and checkAll.sh.
> > > > > > All passed.
> > > > > > Verified the MD5 and the signature.
> > > > > > Got the warning "this key is not certified with a trusted
> > > > > > signature". I guess it is ok.
> > > > > >
> > > > > > +1
> > > > > >
> > > > > > On Mon, Feb 6, 2017 at 5:32 PM, Jagadish Venkatraman <jagadish1...@gmail.com> wrote:
> > > > > > > This is a call for a vote on a release of Apache Samza 0.12.0.
> > > > > > > Thanks to everyone who has contributed to this release. We are
> > > > > > > very glad to see some new contributors in this release.
> > > > > > >
> > > > > > > The release candidate can be downloaded from here:
> > > > > > > http://home.apache.org/~jagadish/samza-0.12.0-rc2/
> > > > > > >
> > > > > > > The release candidate is signed with pgp key AF81FFBF, which can
> > > > > > > be found on keyservers:
> > > > > > > http://pgp.mit.edu/pks/lookup?op=get&search=0xAF81FFBF
> > > > > > >
> > > > > > > The git tag is release-0.12.0-rc2 and signed with the same pgp key:
> > > > > > > https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.12.0-rc2
> > > > > > >
> > > > > > > Test binaries have been published to Maven's staging repository,
> > > > > > > and are available here:
> > > > > > > https://repository.apache.org/content/repositories/orgapachesamza-1018
> > > > > > >
> > > > > > > Note that the binaries were built with JDK8 without incident.
> > > > > > >
> > > > > > > 26 issues were resolved for this release:
> > > > > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20in%20(0.12%2C%200.12.0)%20AND%20status%20in%20(Resolved%2C%20Closed)
> > > > > > >
> > > > > > > The vote will be open for 72 hours (ending at 6 PM Thursday, 02/09/2017).
> > > > > > >
> > > > > > > Please download the release candidate, check the hashes/signature,
> > > > > > > build it and test it, and then please vote:
> > > > > > >
> > > > > > > [ ] +1 approve
> > > > > > > [ ] +0 no opinion
> > > > > > > [ ] -1 disapprove (and reason why)
> > > > > > >
> > > > > > > +1 from my side for the release.
> > > > > > >
> > > > > > > Cheers!
> > > > > > >
> > > > > > > --
> > > > > > > Jagadish V,
> > > > > > > Graduate Student,
> > > > > > > Department of Computer Science,
> > > > > > > Stanford University

--
Thanks and regards

Chinmay Soman
Re: Review Request 45912: SAMZA-0.10.0: fix the bug of SamzaObjectMapper
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45912/#review127925
---

Ship it!

- Chinmay Soman

On April 8, 2016, 11:40 p.m., Yuanchi Ning wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45912/
> ---
>
> (Updated April 8, 2016, 11:40 p.m.)
>
> Review request for samza.
>
> Repository: samza
>
> Description
> ---
> Enable the SerializationConfig feature AUTO_DETECT_GETTERS in
> SamzaObjectMapper; otherwise JsonSerde misbehaves (it won't discover basic
> types like HashMap.Entry).
> JIRA ticket link: https://issues.apache.org/jira/browse/SAMZA-933
>
> Diffs
> ---
> samza-core/src/main/java/org/apache/samza/serializers/model/SamzaObjectMapper.java 717b5dcad2aa22540deb08962bf2833e7dc5baa5
> samza-core/src/test/scala/org/apache/samza/serializers/TestJsonSerde.scala 4f1c14ce3838163c5af8c9d076238e0ed32619e1
>
> Diff: https://reviews.apache.org/r/45912/diff/
>
> Testing
> ---
>
> Thanks,
> Yuanchi Ning
Re: Review Request 45912: SAMZA-0.10.0: fix the bug of SamzaObjectMapper
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/45912/#review127702
---

Let's add a unit test for this.

- Chinmay Soman

On April 8, 2016, 12:02 a.m., Yuanchi Ning wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/45912/
> ---
>
> (Updated April 8, 2016, 12:02 a.m.)
>
> Review request for samza.
>
> Repository: samza
>
> Description
> ---
> Enable the SerializationConfig feature AUTO_DETECT_GETTERS in
> SamzaObjectMapper; otherwise it won't discover basic types like
> HashMap.Entry.
>
> Diffs
> ---
> samza-core/src/main/java/org/apache/samza/serializers/model/SamzaObjectMapper.java 717b5dcad2aa22540deb08962bf2833e7dc5baa5
>
> Diff: https://reviews.apache.org/r/45912/diff/
>
> Testing
> ---
>
> Thanks,
> Yuanchi Ning
Re: [DISCUSS] Moving to github/pull-request for code review and check-in
> Hi, all,
>
> I want to start the discussion on our code review/commit process.
>
> I feel that our code review and check-in process is a little bit cumbersome:
> - developers need to create RBs and attach diffs to JIRA
> - committers need to review RBs, download the diff and apply it, then push.
>
> It would be much lighter if we took the pull-request-only approach, as
> Kafka has already converted to:
> - for developers, the only thing needed is to open a pull request.
> - for committers, review and applying the patch happen on the same PR, and
>   the merge can be done directly on the remote git repo.
>
> Of course, there might be some hook-up scripts that we will need to link
> JIRA with pull requests in GitHub, which Kafka already has. Any comments
> and feedback are welcome!
>
> Thanks!
>
> -Yi
>
> --
> Navina R.
>
> --
> Milinda Pathirage
>
> PhD Student | Research Assistant
> School of Informatics and Computing | Data to Insight Center
> Indiana University
>
> twitter: milindalakmal
> skype: milinda.pathirage
> blog: http://milinda.pathirage.org
>
> --
> Sent from my iphone.

--
Thanks and regards

Chinmay Soman
Re: Review Request 39032: SAMZA-787: task.log4j.system should not be guessed if not configured
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/39032/#review117572
---

Ship it!

- Chinmay Soman

On Oct. 6, 2015, 12:44 a.m., Navina Ramesh wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/39032/
> ---
>
> (Updated Oct. 6, 2015, 12:44 a.m.)
>
> Review request for samza, Yan Fang, Chinmay Soman, Chris Riccomini, and Yi
> Pan (Data Infrastructure).
>
> Bugs: SAMZA-787
>     https://issues.apache.org/jira/browse/SAMZA-787
>
> Repository: samza
>
> Description
> ---
> SAMZA-787: task.log4j.system should not be guessed if not configured
>
> Diffs
> ---
> docs/learn/documentation/versioned/jobs/configuration-table.html b42c34cf55096467ab20507d82af7d002abd8e1e
> docs/learn/documentation/versioned/jobs/logging.md a3bb054c0febabf2d7b6e3e18fa710f93382d744
> samza-log4j/src/main/java/org/apache/samza/config/Log4jSystemConfig.java 209296dd48d64159d14bf20a286d971b54fb710e
> samza-log4j/src/main/java/org/apache/samza/logging/log4j/StreamAppender.java 894845332f2a08b984c587884ed13f2854b4d492
> samza-log4j/src/test/java/org/apache/samza/config/TestLog4jSystemConfig.java f7d3cbecfb8901f5fe1a290ce9b08446ce48f4df
> samza-log4j/src/test/java/org/apache/samza/logging/log4j/TestStreamAppender.java 3e812405edf1ad4fca43784b1668ab3ae6ff6475
>
> Diff: https://reviews.apache.org/r/39032/diff/
>
> Testing
> ---
> ./bin/check-all.sh passed
>
> Thanks,
> Navina Ramesh
Re: Question on zero downtime deployment of Samza jobs
FYI: I've forwarded it to Yi's personal email. Yi: please feel free to email Peter (CC'ed) or me with any questions.

FYI: the real solution would be to implement standby containers. This approach is an attempt to approximate that.

On Thu, Jan 14, 2016 at 10:17 AM, Yi Pan <nickpa...@gmail.com> wrote:
> It might be the mailing list restriction. Could you try my personal email
> (i.e. nickpa...@gmail.com)?
>
> On Thu, Jan 14, 2016 at 10:15 AM, Chinmay Soman <chinmay.cere...@gmail.com> wrote:
> > It shows as attached in my sent email. That's weird.
> >
> > On Thu, Jan 14, 2016 at 10:14 AM, Yi Pan <nickpa...@gmail.com> wrote:
> > > Hi, Chinmay,
> > >
> > > Did you forget to attach? I did not see the attachment in your last
> > > email. Or is it due to the mailing list restriction?
> > >
> > > On Thu, Jan 14, 2016 at 10:12 AM, Chinmay Soman <chinmay.cere...@gmail.com> wrote:
> > > > Sorry for the long delay (I was trying to find the design doc we
> > > > made). Please see the attachment for the design doc. I'm also CC'ing
> > > > Peter Huang (the intern) who worked on this.
> > > >
> > > > Disclaimer: given this was an internship project, we've cut some
> > > > corners. We plan to come up with an improved version in some time.
> > > >
> > > > On Wed, Jan 6, 2016 at 11:31 AM, Yi Pan <nickpa...@gmail.com> wrote:
> > > > > Hi, Chinmay,
> > > > >
> > > > > That's awesome! Could you share some design doc of this feature?
> > > > > We would love to have this feature at LinkedIn as well!
> > > > >
> > > > > -Yi
> > > > >
> > > > > On Wed, Jan 6, 2016 at 10:02 AM, Chinmay Soman <chinmay.cere...@gmail.com> wrote:
> > > > > > FYI: As part of an Uber internship project, we were working on
> > > > > > exactly this problem. Our approach was to do a rolling restart of
> > > > > > all the containers, wherein we start a "replica" container for
> > > > > > each primary container and let it "catch up" before we do the
> > > > > > switch. Of course this doesn't guarantee zero downtime, but it
> > > > > > does guarantee minimum time to upgrade each such container.
> > > > > >
> > > > > > The code is still a POC, but we do plan to finish this and make
> > > > > > it available. Let me know if you're interested in trying it out.
> > > > > >
> > > > > > FYI: sticky container deployment will also minimize the time to
> > > > > > upgrade / deploy, since the majority of the upgrade time is taken
> > > > > > up by the container reading all the changelog (if any). Upgrade /
> > > > > > re-deploy will also take a long time if the checkpoint topic is
> > > > > > not log compacted (which is true in our environment).
> > > > > >
> > > > > > Thanks,
> > > > > > C
> > > > > >
> > > > > > On Wed, Jan 6, 2016 at 9:56 AM, Bae, Jae Hyeon <metac...@gmail.com> wrote:
> > > > > > > Hi Samza devs and users,
> > > > > > >
> > > > > > > I know this will be tricky in Samza because the Samza Kafka
> > > > > > > consumer is not coordinated externally, but do you have any
> > > > > > > idea how to deploy samza jobs with zero downtime?
> > > > > > >
> > > > > > > Thank you
> > > > > > > Best, Jae

--
Thanks and regards

Chinmay Soman
Re: How to synchronize KeyValueStore and Kafka cleanup
> Does KV-store consume automatically from a Kafka topic?

Yes - if you've configured a changelog stream for your store.

> Does it consume only on restore()?

It consumes only during container initialization (again, assuming you have a changelog configured).

> implement the StreamTask job to consume a Kafka topic and call add() method?

Why wouldn't you want to use a changelog?

On Fri, Oct 2, 2015 at 3:09 PM, Bae, Jae Hyeon <metac...@gmail.com> wrote:
> Thanks Yi Pan, I have one more question.
>
> Does KV-store consume automatically from a Kafka topic? Does it consume
> only on restore()? If so, do I have to implement the StreamTask job to
> consume a Kafka topic and call the add() method?
>
> On Fri, Oct 2, 2015 at 2:01 PM, Yi Pan <nickpa...@gmail.com> wrote:
> > Hi, Jae Hyeon,
> >
> > Good to see you back on the mailing list again! Regarding your
> > questions, please see the answers below:
> >
> > > My KeyValueStore usage is a little bit different from usual cases
> > > because I have to cache all unique ids for the past six hours, which
> > > can be configured for the retention usage. Unique ids won't be
> > > repeated, such as timestamps. In this case, log.cleanup.policy=compact
> > > will keep growing the KeyValueStore size, right?
> >
> > It will grow as big as the accumulated size of your unique ids.
> >
> > > Can I use Samza KeyValueStore for topics with
> > > log.cleanup.policy=delete? If not, what's your recommended way for
> > > state management of a non-changelog Kafka topic? If it's possible, how
> > > does Kafka cleanup remove outdated records in KeyValueStore?
> >
> > I am not quite sure about your definition of "non-changelog" Kafka
> > topics. If you want to retire some of the old records in a KV-store
> > periodically, you will have to run the pruning manually in the window()
> > method in the current release. In the upcoming 0.10 release, we have
> > incorporated the RocksDB TTL feature in the KV-store, which
> > automatically prunes old entries in the RocksDB store. That said, the
> > upcoming TTL feature is not yet fully synchronized with the Kafka
> > cleanup; that is ongoing work. The recommendation is to use the TTL
> > feature and set the Kafka changelog to be time-retention based, with a
> > retention time longer than the RocksDB TTL to ensure no data loss.
> >
> > Hope the above answered your questions.
> >
> > Cheers!
> >
> > -Yi

--
Thanks and regards

Chinmay Soman
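A minimal sketch of what the store/changelog wiring discussed above looks like in a job's properties file. The store name, topic name and TTL value are illustrative placeholders, not from the thread; `rocksdb.ttl.ms` is the 0.10-era TTL setting Yi mentions.

```properties
# Hypothetical store named "my-store"; all names and values are placeholders.
stores.my-store.factory=org.apache.samza.storage.kv.RocksDbKeyValueStorageEngineFactory
stores.my-store.key.serde=string
stores.my-store.msg.serde=string

# Changelog stream: the store is restored from this topic at container init.
stores.my-store.changelog=kafka.my-store-changelog

# RocksDB TTL (0.10+): prune local entries older than 6 hours.
stores.my-store.rocksdb.ttl.ms=21600000

# Per the thread, the changelog topic itself should then be time-retention
# based on the Kafka side, with retention.ms longer than the TTL (e.g. 12h)
# so a restore never misses data that is still live in RocksDB.
```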
Re: changelog compaction problem
Just curious - can you double-check that you have log compaction enabled on your Kafka brokers?

On Wed, Jul 29, 2015 at 8:30 AM, Vladimir Lebedev <w...@fastmail.fm> wrote:
> Hello,
>
> I have a problem with the changelog in one of my samza jobs growing
> indefinitely. The job is quite simple: it reads messages from the input
> kafka topic, and either creates or updates a key in a task-local samza
> store. Once a minute the window method kicks in; it iterates over all keys
> in the store and deletes some of them, selecting on the contents of their
> value.
>
> The message rate in the input topic is about 3000 messages per second. The
> input topic is partitioned into 48 partitions. The average number of keys
> kept in the store is more or less stable and does not exceed 10000 keys per
> task. The average size of the values is 50 bytes. So I expected that the
> sum of all segment sizes in the kafka data directory for the job's
> changelog topic should not exceed 10000*50*48 ~= 24 MBytes. In fact it is
> more than 2.5 GB (after 6 days running from scratch) and it is growing.
>
> I tried to change the default segment size for the changelog topic in
> kafka, and it helped a bit - instead of 500 MByte segments I now have
> 50 MByte segments - but it did not heal the indefinite data-growth problem.
> Moreover, if I stop the job and start it again, it cannot restart; it
> breaks right after reading all records from the changelog topic.
>
> Did somebody have a similar problem? How could it be resolved?
>
> Best regards,
> Vladimir
>
> --
> Vladimir Lebedev
> w...@fastmail.fm

--
Thanks and regards

Chinmay Soman
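Chinmay's question can be checked directly in the broker and topic configuration. A sketch, assuming a Kafka broker of that era (the topic name is a placeholder):

```properties
# Broker side (server.properties): the log cleaner must be running for
# compacted topics to actually shrink; on 0.8.x brokers it was off by default.
log.cleaner.enable=true

# Topic side: a Samza changelog topic needs the compact cleanup policy,
# set at creation time or afterwards via the old-style CLI, e.g.:
#   bin/kafka-topics.sh --zookeeper localhost:2181 --alter \
#     --topic my-job-changelog --config cleanup.policy=compact
```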
Re: [DISCUSS] Release 0.10.0
I can take care of SAMZA-340 and SAMZA-683, and will follow up with Luis for SAMZA-401,2,3,4.

On Wed, Jul 29, 2015 at 12:10 AM, Dan <danharve...@gmail.com> wrote:
> I agree SAMZA-741 for the Elasticsearch producer should be in too, so
> we've got a better API as part of that release.
>
> - Dan
>
> On 29 July 2015 at 07:51, Yan Fang <yanfang...@gmail.com> wrote:
> > Actually, I also want to include a few patch-available features,
> > especially:
> >
> > 1. broadcast stream (SAMZA-676) - waiting for review
> > 2. graphite support (SAMZA-340)
> > 3. meter and histogram (SAMZA-683)
> > 4. utility (SAMZA-401)
> >    - 2, 3, 4 belong to Luis; if he does not have time to update them,
> >      since they only need some small changes, we can edit them and get a
> >      +1 from another committer.
> > 5. hdfs producer (SAMZA-693) - I am reviewing.
> > 6. upgrade yarn to 2.7.1 (SAMZA-563) - though I am reviewing it, this
> >    ticket is negotiable if we want to put it into the 0.10.0 release. If
> >    we do not, I think, when users enable the worker-persisting and
> >    container-persisting features, Samza will not be able to handle it.
> >    (Some classes are only available after yarn 2.5.0, while Samza
> >    currently only supports yarn 2.4.0.)
> > 7. others: scrooge, class loader isolation, etc. - those are waiting for
> >    review too.
> >
> > My opinion is that if we can clean up all the patch-available tickets,
> > it will be great. Most of them have already been reviewed more than
> > once, so I think it should not be very time-consuming to have them in
> > the 0.10.0 release. What do you think?
> >
> > Of course, another must-have is the bug fix for the StreamAppender. :)
> >
> > Thanks,
> > Fang, Yan
> > yanfang...@gmail.com
> >
> > On Tue, Jul 28, 2015 at 10:27 PM, Roger Hoover <roger.hoo...@gmail.com> wrote:
> > > Thanks, Yi. I propose that we also include SAMZA-741 for Elasticsearch
> > > versioning support with the new ES producer. I think it's very close
> > > to being merged.
> > >
> > > Roger
> > >
> > > On Tue, Jul 28, 2015 at 10:08 PM, Yi Pan <nickpa...@gmail.com> wrote:
> > > > Hi, all,
> > > >
> > > > I want to start the discussion on the release schedule for 0.10.0.
> > > > There are a few important features that we plan to release in
> > > > 0.10.0, and I want to start this thread s.t. we can agree on what to
> > > > include in the 0.10.0 release. These are the main features added in
> > > > 0.10.0:
> > > >
> > > > - RocksDB TTL support
> > > > - Add CoordinatorStream and disable CheckpointManager
> > > > - Elasticsearch producer
> > > > - Host affinity
> > > >
> > > > And other 0.10.0 tickets:
> > > > https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20status%20in%20(%22In%20Progress%22%2C%20Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%200.10.0
> > > >
> > > > I propose to cut a 0.10.0 release after we get the following issues
> > > > resolved:
> > > > - SAMZA-615: Migrate checkpoint from checkpoint topic to Coordinator stream
> > > > - SAMZA-617: YARN host affinity in Samza
> > > >
> > > > Thoughts?
> > > >
> > > > Thanks!
> > > >
> > > > -Yi

--
Thanks and regards

Chinmay Soman
Re: [DISCUSS] Samza 0.9.1 release
+1

On Tue, Jun 16, 2015 at 11:17 AM, Navina Ramesh <nram...@linkedin.com.invalid> wrote:
> +1 for the release!
>
> On 6/16/15, 11:03 AM, Yi Pan <nickpa...@gmail.com> wrote:
> > +1 Agreed. Thanks!
> >
> > On Tue, Jun 16, 2015 at 10:15 AM, Yan Fang <yanfang...@gmail.com> wrote:
> > > Agreed on this.
> > >
> > > Thanks,
> > > Fang, Yan
> > > yanfang...@gmail.com
> > >
> > > On Tue, Jun 16, 2015 at 10:14 AM, Guozhang Wang <wangg...@gmail.com> wrote:
> > > > Hi all,
> > > >
> > > > We have been running a couple of our jobs against the `0.9.1` branch
> > > > last week at LinkedIn, with some critical bug fixes back-ported,
> > > > including:
> > > >
> > > > SAMZA-608: Deserialization error causes SystemConsumers to hang
> > > > SAMZA-616: Shutdown hook does not wait for container to finish
> > > > SAMZA-658: Iterator.remove breaks caching layer
> > > > SAMZA-662 / 686: Samza auto-creates changelog stream without
> > > >   sufficient partitions when container number > 1
> > > >
> > > > I am proposing a release vote on the current 0.9.1 branch for these
> > > > bug fixes. Thoughts?
> > > >
> > > > --
> > > > Guozhang

--
Thanks and regards

Chinmay Soman
Re: improving hello-samza / testing
We've built a driver program which kinda falls along approach (1) listed in your email. The driver program accepts a custom task object and has a way to inject data - which in turn invokes the process method.

For now we're assuming logical time and use the frequency of process() invocations to deduce when to invoke the window() method (e.g. invoke window once for every 4 calls to process). We've also built our own Incoming and Outgoing envelopes - which are just in the form of a Java List. The result is evaluated either by getting the full result list, or by defining a callback which is invoked every time collector.send is called.

It's still a work in progress, and the goal is to make unit testing along the lines of Storm unit tests.

On Tue, Jun 16, 2015 at 5:17 PM, Chris Riccomini <criccom...@apache.org> wrote:
> Hey Tim,
>
> This is a really good discussion to have. The testing that I've seen with
> Samza falls into three categories:
>
> 1. Instantiate your StreamTask, and mock all params in the
>    process()/init() methods.
> 2. A mini-integration test that starts ZooKeeper and Kafka, feeds
>    messages into a topic, and validates it gets messages back out from
>    the output topic.
> 3. A full-blown integration test that uses Zopkio.
>
> For an example of (2) in practice, have a look at TestStatefulTask:
> https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob;f=samza-test/src/test/scala/org/apache/samza/test/integration/TestStatefulTask.scala;h=ea702a919348305ff95ce0b4ca1996a13aff04ec;hb=HEAD
>
> As you can see, writing this kind of integration test can be a bit
> painful.
>
> (3) is documented here: http://samza.apache.org/contribute/tests.html
>
> Another way to test would be to start a full-blown container using
> ThreadJobFactory/ProcessJobFactory, but use a MockSystemFactory to mock
> out the system consumer/system producer.
>
> Has anyone else tested Samza in other ways?
>
> Cheers,
> Chris
>
> On Tue, Jun 16, 2015 at 11:00 AM, Tim Williams <william...@gmail.com> wrote:
> > I'm learning samza by the hello-samza project and notice the lack of
> > tests. Where's a good place to learn how folks are properly testing
> > things written with samza?
> >
> > Thanks,
> > --tim

--
Thanks and regards

Chinmay Soman
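The driver idea described at the top of the thread can be sketched in plain Java. Everything here (MiniTask, TaskDriver, the window-every-N-messages rule) is a hypothetical stand-in for the real Samza StreamTask/WindowableTask API, not code from the thread:

```java
// Self-contained sketch of a test driver that injects messages into a task
// and invokes window() on logical time (once per N process() calls), as
// described above. MiniTask/TaskDriver are hypothetical names.
import java.util.ArrayList;
import java.util.List;

public class TaskDriver {
    // Stand-in for StreamTask: a List plays the role of the MessageCollector.
    public interface MiniTask {
        void process(String message, List<String> collector);
        void window(List<String> collector);
    }

    private final MiniTask task;
    private final int windowEvery; // logical time: window() once per N process() calls
    private int processed = 0;
    public final List<String> output = new ArrayList<>();

    public TaskDriver(MiniTask task, int windowEvery) {
        this.task = task;
        this.windowEvery = windowEvery;
    }

    // Inject a message as if it arrived from an input topic.
    public void send(String message) {
        task.process(message, output);
        if (++processed % windowEvery == 0) {
            task.window(output);
        }
    }

    public static void main(String[] args) {
        // Toy task: echo each message; window() emits a running count.
        TaskDriver driver = new TaskDriver(new MiniTask() {
            int count = 0;
            public void process(String m, List<String> out) { count++; out.add(m); }
            public void window(List<String> out) { out.add("count=" + count); }
        }, 2);
        driver.send("a");
        driver.send("b"); // second message triggers window()
        System.out.println(driver.output); // [a, b, count=2]
    }
}
```

A callback-style variant (invoking a listener on every emitted message, as the thread mentions) would just replace the output List with a consumer interface.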
Re: Review Request 34011: Add support for a Graphite Metrics Reporter
---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34011/#review83169
---

build.gradle
https://reviews.apache.org/r/34011/#comment134045
    This shouldn't be in samza-core. Create a separate module/project called
    samza-graphite (just like samza-log4j / samza-kv).

gradle/dependency-versions.gradle
https://reviews.apache.org/r/34011/#comment134046
    This is kinda surprising. We already have a metricsVersion defined? I
    don't see it being used.

samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteCounter.java
https://reviews.apache.org/r/34011/#comment134047
    Please move your changes to a separate module (not in samza-core).

samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteReporterFactory.java
https://reviews.apache.org/r/34011/#comment134049
    I think let's pass the container name to the Samza graphite reporter
    factory. Might be useful if we want to somehow aggregate metrics across
    all tasks in a container in the future.

samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteSnapshot.java
https://reviews.apache.org/r/34011/#comment134050
    Add a unit for value. Make this static private?

samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteSnapshot.java
https://reviews.apache.org/r/34011/#comment134052
    Why not TimeUnit.MILLISECONDS.toNanos(value)? The 'value' is in
    milliseconds, right?

samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteSnapshot.java
https://reviews.apache.org/r/34011/#comment134051
    Make this static private?

samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteSnapshot.java
https://reviews.apache.org/r/34011/#comment134053
    Make it private?

samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteSnapshot.java
https://reviews.apache.org/r/34011/#comment134054
    Samza's Snapshot has a getMax.

samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteSnapshot.java
https://reviews.apache.org/r/34011/#comment134056
    Make all custom methods private.

samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteSnapshot.java
https://reviews.apache.org/r/34011/#comment134057
    Throw an exception?

samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteTimer.java
https://reviews.apache.org/r/34011/#comment134058
    Check if duration >= 0L.

samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteTimer.java
https://reviews.apache.org/r/34011/#comment134059
    This will create a new object on each invocation. Why not store this as
    a member variable?

samza-core/src/main/java/org/apache/samza/metrics/reporter/SamzaGraphiteReporter.java
https://reviews.apache.org/r/34011/#comment134078
    We should make this similar to the way metric names are constructed in
    other reporters (for instance, check JmxReporter).

samza-core/src/main/java/org/apache/samza/metrics/reporter/SamzaGraphiteReporter.java
https://reviews.apache.org/r/34011/#comment134079
    Make it private.

samza-core/src/main/java/org/apache/samza/metrics/reporter/SamzaGraphiteReporter.java
https://reviews.apache.org/r/34011/#comment134080
    Make it private static.

samza-core/src/main/java/org/apache/samza/metrics/reporter/SamzaGraphiteReporter.java
https://reviews.apache.org/r/34011/#comment134081
    Use this format: getLong(name, default_value).

samza-core/src/main/java/org/apache/samza/metrics/reporter/SamzaGraphiteReporter.java
https://reviews.apache.org/r/34011/#comment134082
    Include the source as well (look at JmxReporter).

samza-core/src/main/java/org/apache/samza/metrics/reporter/SamzaGraphiteReporter.java
https://reviews.apache.org/r/34011/#comment134083
    Include the source as well (look at JmxReporter) for this and other
    metric names.

- Chinmay Soman

On May 9, 2015, 6:31 a.m., Luis De Pombo wrote:
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34011/
> ---
>
> (Updated May 9, 2015, 6:31 a.m.)
>
> Review request for samza.
>
> Repository: samza
>
> Description
> ---
> Add support for a Graphite Metrics Reporter
>
> Diffs
> ---
> build.gradle ac80a8664180e556ec83e229e04e3d8c56b70506
> checkstyle/import-control.xml 5f8e103a2e43f96518b20de1c7cbd84e0af24842
> gradle/dependency-versions.gradle ee6dfc411b7ab90b187df79f109884127953862e
> samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteCounter.java PRE-CREATION
> samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteGauge.java PRE-CREATION
> samza-core/src/main/java/org/apache/samza/metrics/reporter/GraphiteReporterFactory.java PRE-CREATION
> samza
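For context, this is roughly how a metrics reporter factory gets wired into a Samza job configuration. The `jmx` entry uses the shipped JmxReporterFactory; the `graphite` class name is taken from the diff above, but the host/port keys are assumptions, not the final SAMZA-340 configuration:

```properties
# Register the reporters the job should run (the names are arbitrary labels).
metrics.reporters=jmx,graphite

# Real, shipped reporter:
metrics.reporter.jmx.class=org.apache.samza.metrics.reporter.JmxReporterFactory

# Hypothetical wiring for the reporter under review; host/port keys are
# placeholders, not confirmed config names.
metrics.reporter.graphite.class=org.apache.samza.metrics.reporter.GraphiteReporterFactory
metrics.reporter.graphite.host=graphite.example.com
metrics.reporter.graphite.port=2003
```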
Re: Samza serializers jar for 0.9.0 ?
Oh never mind - I forgot it was merged into core. Sorry for the noise. On Wed, Apr 29, 2015 at 12:03 PM, Chinmay Soman chinmay.cere...@gmail.com wrote: I don't see the serializers jar being published in Maven. Yan - know anything about this? -- Thanks and regards Chinmay Soman -- Thanks and regards Chinmay Soman
Samza serializers jar for 0.9.0 ?
I don't see the serializers jar being published in Maven. Yan - know anything about this? -- Thanks and regards Chinmay Soman
Re: Extra Systems and other extensions.
+1! I was going to do this for my use case as well. Would love to have this! On Wed, Apr 15, 2015 at 9:24 AM, Roger Hoover roger.hoo...@gmail.com wrote: Dan, This is great. Would love to have a common ElasticSearch system producer. Cheers, Roger On Tue, Apr 14, 2015 at 1:34 PM, Dan danharve...@gmail.com wrote: Thanks Jakob, I agree they'll be more maintained and tested if they're in the main repo, so that's great. I'll sort out JIRAs and get some patches of what we've got working now out for review. - Dan On 14 April 2015 at 19:56, Jakob Homan jgho...@gmail.com wrote: Hey Dan- I'd love for the Elastic Search stuff to be added to the main code, as a separate module. Keeping these in the main source code keeps them more likely to be maintained and correct. The EnvironmentConfigRewriter can likely go in the same place as the ConfigRewriter interface, since it doesn't depend on Kafka as the current RegexTopicRewriter does. If you could open JIRAs for these, that would be great. Happy to shepherd the code in. -Jakob On 14 April 2015 at 11:46, Dan danharve...@gmail.com wrote: Hey, At state.com we've started to write some generic extensions to Samza that we think would be more generally useful. We've got an ElasticsearchSystemProducer/Factory to output to an Elasticsearch index and an EnvironmentConfigRewriter to modify config from environment variables. What's the best way for us to add things like this? Do you want more modules in the main project or should we just create some separate projects on github? It would be good to get core extensions like these shared so they can be tested and used by more people. Thanks, Dan -- Thanks and regards Chinmay Soman
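The EnvironmentConfigRewriter idea above (overriding job config from environment variables) can be sketched in plain Java. This is a hypothetical standalone illustration of the overlay logic, not the actual patch; Samza's real ConfigRewriter interface operates on a Config object rather than a raw Map, and the SAMZA_ prefix convention here is an assumption:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: overlay environment variables of the form
// SAMZA_<KEY> onto the job config, mapping SAMZA_TASK_INPUTS -> task.inputs.
public class EnvConfigRewriterSketch {

    static Map<String, String> rewrite(Map<String, String> config, Map<String, String> env) {
        Map<String, String> out = new HashMap<>(config);
        for (Map.Entry<String, String> e : env.entrySet()) {
            if (e.getKey().startsWith("SAMZA_")) {
                // SAMZA_TASK_INPUTS -> task.inputs
                String key = e.getKey().substring("SAMZA_".length())
                        .toLowerCase().replace('_', '.');
                out.put(key, e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("task.inputs", "kafka.default");
        Map<String, String> env = new HashMap<>();
        env.put("SAMZA_TASK_INPUTS", "kafka.syslog");
        env.put("PATH", "/usr/bin"); // ignored: no SAMZA_ prefix
        System.out.println(rewrite(config, env).get("task.inputs")); // prints kafka.syslog
    }
}
```

In a real rewriter the same loop would read System.getenv() and return a new MapConfig; the point is only that environment values win over file-based config.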
Re: [VOTE] Apache Samza 0.9.0 RC0
here: http://people.apache.org/~yanfang/samza-0.9.0-rc0/ The release candidate is signed with pgp key CAC06239EA00BA80, which is included in the repository's KEYS file: https://git-wip-us.apache.org/repos/asf?p=samza.git;a=blob_plain;f=KEYS;hb=6f5bafb6cd93934781161eb6b1868d11ea347c95 and can also be found on keyservers: http://pgp.mit.edu/pks/lookup?op=get&search=0xCAC06239EA00BA80 The git tag is release-0.9.0-rc0 and signed with the same pgp key: https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=1039f7ede6490f9420dcecd6adc7677b97e78bcf Test binaries have been published to Maven's staging repository, and are available here: https://repository.apache.org/content/repositories/orgapachesamza-1005/ Note that the binaries were built with JDK6 without incident. 95 issues were resolved for this release: https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.9.0%20AND%20status%20in%20(Resolved%2C%20Closed) The vote will be open for 72 hours (ending at 4:00pm Friday, 03/27/2015). Please download the release candidate, check the hashes/signature, build it and test it, and then please vote: [ ] +1 approve [ ] +0 no opinion [ ] -1 disapprove (and reason why) +1 from my side for the release. Fang, Yan yanfang...@gmail.com +1 (206) 849-4108 -- Thanks and regards Chinmay Soman
Re: cannot be cast to java.lang.String
I think this is a bit specific to Samza. In the KafkaSystemProducer class, it does something like this: envelope.getMessage.asInstanceOf[Array[Byte]] and not just 'byte[]'. This is why we need to be explicit about the serialization format. On Wed, Mar 25, 2015 at 3:14 AM, Jordi Blasi Uribarri jbl...@nextel.es wrote: I am using the Kafka command line producer, so I understand that I am sending a String. bin/kafka-console-producer.sh --broker-list localhost:9092 --topic syslog What is actually the difference between a string and a json? Is it just a matter of deserialization or is there any kind of metadata included that specifies the content type? How do I enable the debug mode? Thanks, Jordi -----Original Message----- From: Chinmay Soman [mailto:chinmay.cere...@gmail.com] Sent: Monday, March 23, 2015 17:55 To: dev@samza.apache.org Subject: Re: cannot be cast to java.lang.String Hey Jordi, This is because you're sending String and not json in your output topic. Try setting string on the output stream as well (if you haven't already). If you have done that - then please enable debug mode and attach the log somewhere so that we can take a look. On Mon, Mar 23, 2015 at 9:52 AM, Jordi Blasi Uribarri jbl...@nextel.es wrote: Looks like that was one error. I have set the property like this: systems.kafka.streams.syslog.samza.msg.serde=string But I am still getting the same error. Now I am seeing a different thing in the log previous to the exception: 23 mar 2015 05:49:31 INFO KafkaSystemProducer - Creating a new producer for system kafka.
23 mar 2015 05:49:31 INFO ProducerConfig - ProducerConfig values:
	value.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
	key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
	block.on.buffer.full = true
	retry.backoff.ms = 100
	buffer.memory = 33554432
	batch.size = 16384
	metrics.sample.window.ms = 3
	metadata.max.age.ms = 30
	receive.buffer.bytes = 32768
	timeout.ms = 3
	max.in.flight.requests.per.connection = 1
	bootstrap.servers = [broker01:9092]
	metric.reporters = []
	client.id = samza_producer-samzafroga_job1-1-1427086163149-3
	compression.type = none
	retries = 2147483647
	max.request.size = 1048576
	send.buffer.bytes = 131072
	acks = 1
	reconnect.backoff.ms = 10
	linger.ms = 0
	metrics.num.samples = 2
	metadata.fetch.timeout.ms = 6
23 mar 2015 05:49:31 ERROR SamzaContainer - Caught exception in process loop.
java.lang.ClassCastException: java.lang.String cannot be cast to [B
	at org.apache.samza.system.kafka.KafkaSystemProducer.send(KafkaSystemProducer.scala:80)
	at org.apache.samza.system.SystemProducers.send(SystemProducers.scala:87)
	at org.apache.samza.task.TaskInstanceCollector.send(TaskInstanceCollector.scala:61)
	at samzafroga.job1.process(job1.java:21)
	at org.apache.samza.container.TaskInstance$$anonfun$process$1.apply$mcV$sp(TaskInstance.scala:129)
	at org.apache.samza.container.TaskInstanceExceptionHandler.maybeHandle(TaskInstanceExceptionHandler.scala:54)
	at org.apache.samza.container.TaskInstance.process(TaskInstance.scala:128)
	at org.apache.samza.container.RunLoop$$anonfun$process$2.apply(RunLoop.scala:114)
	at org.apache.samza.util.TimerUtils$class.updateTimer(TimerUtils.scala:37)
	at org.apache.samza.container.RunLoop.updateTimer(RunLoop.scala:36)
	at org.apache.samza.container.RunLoop.process(RunLoop.scala:100)
	at org.apache.samza.container.RunLoop.run(RunLoop.scala:83)
	at org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:549)
	at org.apache.samza.job.local.ThreadJob$$anon$1.run(ThreadJob.scala:42)
Thanks. Jordi -----Original Message----- From: Chinmay Soman [mailto:chinmay.cere...@gmail.com] Sent: Monday, March 23, 2015 17:36 To: dev@samza.apache.org Subject: Re: cannot be cast to java.lang.String Have you tried setting this: systems.kafka.streams.syslog.samza.msg.serde=string // And assuming you've defined a 'string' serializer in your config OR systems.kafka.streams.syslog.samza.msg.serde=json // Depending on the corresponding format of your input data On Mon, Mar 23, 2015 at 9:24 AM, Jordi Blasi Uribarri jbl...@nextel.es wrote: Hi, As I understand it, I am setting kafka as the system name, beste as the output topic in the system and syslog as the input topic. Both topics syslog and beste are working correctly as I am streaming some syslogs to the syslog topic and I am testing beste with an internal
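Putting the fix from this thread together: register a string serde and point both the input and output streams at it, so that what process() emits matches what KafkaSystemProducer tries to serialize. A minimal config sketch using the stream names from this thread (syslog in, beste out):

```properties
serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory
# input stream carries plain strings from the console producer
systems.kafka.streams.syslog.samza.msg.serde=string
# the output stream's serde must match what the task actually sends
systems.kafka.streams.beste.samza.msg.serde=string
```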
Re: New to Samza/Yarn and having Kafka issues
Oh, I just meant in your job config: metrics.reporter.jmx.class=org.apache.samza.metrics.reporter.JmxReporterFactory metrics.reporters=jmx On Mon, Mar 23, 2015 at 5:13 PM, Ash W Matheson ash.mathe...@gmail.com wrote: I'm assuming I have Jmx defined ... where would that get set? On Mon, Mar 23, 2015 at 5:08 PM, Chinmay Soman chinmay.cere...@gmail.com wrote: Hey Ash, Can you see your job metrics (if you have the Jmx metrics defined) to see if your job is actually doing anything? My only guess at this point is the process method is not being called because somehow there's no incoming data. I could be totally wrong of course. On Mon, Mar 23, 2015 at 4:28 PM, Ash W Matheson ash.mathe...@gmail.com wrote: Just to be clear, here's what's changed from the default hello-samza repo: wikipedia-parser.properties== task.inputs=kafka.myTopic systems.kafka.consumer.zookeeper.connect=ec2-xxx-xxx-xxx-xxx.compute-1.amazonaws.com:2181/ systems.kafka.consumer.auto.offset.reset=smallest WikipediaParserStreamTask.java = public void process(IncomingMessageEnvelope envelope, MessageCollector collector, TaskCoordinator coordinator) { Map<String, Object> jsonObject = (Map<String, Object>) envelope.getMessage(); WikipediaFeedEvent event = new WikipediaFeedEvent(jsonObject); try { System.out.println(event.getRawEvent()); // Map<String, Object> parsedJsonObject = parse(event.getRawEvent()); // parsedJsonObject.put("channel", event.getChannel()); // parsedJsonObject.put("source", event.getSource()); // parsedJsonObject.put("time", event.getTime()); // collector.send(new OutgoingMessageEnvelope(new SystemStream("kafka", "wikipedia-edits"), parsedJsonObject)); as well as the aforementioned changes to the log4j.xml file. The data pushed into the 'myTopic' topic is nothing more than a sentence.
On Mon, Mar 23, 2015 at 4:16 PM, Ash W Matheson ash.mathe...@gmail.com wrote: yep, modified log4j.xml to look like this: <root> <priority value="debug" /> <appender-ref ref="RollingAppender"/> <appender-ref ref="jmx" /> </root> Not sure what you mean by #2. However, I'm running now, not seeing any exceptions, but still not seeing any output from System.out.println(...) On Mon, Mar 23, 2015 at 11:29 AM, Naveen Somasundaram nsomasunda...@linkedin.com.invalid wrote: Hey Ash, 1. Did you happen to modify your log4j.xml? 2. Can you print the class path that was printed when the job started? I am wondering if log4j was not loaded or not present in the path where it's looking for it. If you have been using hello-samza, it should have pulled it from Maven. Thanks, Naveen On Mar 22, 2015, at 10:35 AM, Ash W Matheson ash.mathe...@gmail.com wrote: Hey all, Evaluating Samza currently and am running into some odd issues. I'm currently working off the 'hello-samza' repo and trying to parse a simple kafka topic that I've produced through an external java app (nothing other than a series of sentences) and it's failing pretty hard for me. The base 'hello-samza' set of apps works fine, but as soon as I change the configuration to look at a different Kafka/zookeeper I get the following in the userlogs: 2015-03-22 17:07:09 KafkaSystemAdmin [WARN] Unable to fetch last offsets for streams [myTopic] due to kafka.common.KafkaException: fetching topic metadata for topics [Set(myTopic)] from broker [ArrayBuffer(id:0,host:redacted,port:9092)] failed. Retrying. The modifications are pretty straightforward.
In the wikipedia-parser.properties, I've changed the following: task.inputs=kafka.myTopic systems.kafka.consumer.zookeeper.connect=redacted:2181/ systems.kafka.consumer.auto.offset.reset=smallest systems.kafka.producer.metadata.broker.list=redacted:9092 and in the actual java file WikipediaParserStreamTask.java public void process(IncomingMessageEnvelope envelope, MessageCollector collector, TaskCoordinator coordinator) { Map<String, Object> jsonObject = (Map<String, Object>) envelope.getMessage(); WikipediaFeedEvent event = new WikipediaFeedEvent(jsonObject); try { System.out.println(event.getRawEvent()); And then following the compile/extract/run process outlined on the hello-samza website. Any thoughts? I've looked online for any 'super simple' examples of ingesting kafka in samza with very little success. -- Thanks and regards Chinmay Soman -- Thanks and regards Chinmay Soman
Re: New to Samza/Yarn and having Kafka issues
I changed the systems.kafka.samza.msg.serde=json to 'string' a while back, but that caused a separate exception. However that was many, MANY attempts ago. This may not work because that will set all serialization formats (input and output) to json / string. In your case you're inputting string and outputting json. So you might have to set that explicitly. On Mon, Mar 23, 2015 at 9:24 PM, Chinmay Soman chinmay.cere...@gmail.com wrote: Since you're producing String data to 'myTopic', can you try setting the string serialization in your config? serializers.registry.string.class=org.apache.samza.serializers.StringSerdeFactory systems.kafka.streams.myTopic.samza.msg.serde=string On Mon, Mar 23, 2015 at 9:17 PM, Ash W Matheson ash.mathe...@gmail.com wrote: more info - new exception message: Exception in thread "main" org.apache.samza.system.SystemConsumersException: Cannot deserialize an incoming message. Updated the diff in pastebin with the changes. On Mon, Mar 23, 2015 at 8:41 PM, Ash W Matheson ash.mathe...@gmail.com wrote: Gah! Yeah, those were gone several revisions ago but didn't get nuked in the last iteration. OK, let me do a quick test to see if that was my problem all along. On Mon, Mar 23, 2015 at 8:38 PM, Navina Ramesh nram...@linkedin.com.invalid wrote: Hey Ash, I was referring to the lines before the try block. Map<String, Object> jsonObject = (Map<String, Object>) envelope.getMessage(); WikipediaFeedEvent event = new WikipediaFeedEvent(jsonObject); try { System.out.println("[DWH] should see this"); System.out.println(event.getRawEvent()); … Did you remove those lines as well? Navina On 3/23/15, 8:31 PM, Ash W Matheson ash.mathe...@gmail.com wrote: Just looking at the diff I posted and it's: 1. try { 2. - Map<String, Object> parsedJsonObject = parse(event.getRawEvent()); 3. + System.out.println("[DWH] should see this"); 4. + System.out.println(event.getRawEvent()); 5.
+ // Map<String, Object> parsedJsonObject = parse(event.getRawEvent()); I've removed the Map and added two System.out.println calls. So no, there shouldn't be any reference to Map<String, Object> parsedJsonObject = parse(event.getRawEvent()); in the source java file. On Mon, Mar 23, 2015 at 7:42 PM, Ash W Matheson ash.mathe...@gmail.com wrote: I'm in transit right now but if memory serves me everything should be commented out of that method except for the System.out.println call. I'll be home shortly and can confirm. On Mar 23, 2015 7:28 PM, Navina Ramesh nram...@linkedin.com.invalid wrote: Hi Ash, I just ran wikipedia-parser with your patch. Looks like you have set the message serde correctly in the configs. However, the original code still converts it into a Map for consumption in the WikipediaFeedEvent. I am seeing the following (expected): 2015-03-23 19:17:49 SamzaContainerExceptionHandler [ERROR] Uncaught exception in thread (name=main). Exiting process now. java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map at samza.examples.wikipedia.task.WikipediaParserStreamTask.process(WikipediaParserStreamTask.java:38) at org.apache.samza.container.TaskInstance$$anonfun$process$1.apply$mcV$sp(TaskInstance.scala:133) Did you make the changes to fix this error? Your patch doesn't seem to have that. Line 38: Map<String, Object> jsonObject = (Map<String, Object>) envelope.getMessage(); Lmk so I can investigate further. Cheers! Navina On 3/23/15, 6:43 PM, Ash W Matheson ash.mathe...@gmail.com wrote: If anyone's interested, I've posted a diff of the project here: http://pastebin.com/6ZW6Y1Vu and the python publisher here: http://pastebin.com/2NvTFDFx if you want to take a stab at it. On Mon, Mar 23, 2015 at 6:04 PM, Ash W Matheson ash.mathe...@gmail.com wrote: Ok, so very simple test, all running on a local machine, not across networks and all in the hello-samza repo this time around. I've got the datapusher.py file set up to push data into localhost.
One event per second. And a modified hello-samza where I've modified the WikipediaParserStreamTask.java class to simply read what's there. Running them both now and I'm seeing in the stderr files (deploy/yarn/logs/userlogs/application_X/container_/stderr) the following: Exception in thread "main" org.apache.samza.system.SystemConsumersException: Cannot deserialize an incoming message. at org.apache.samza.system.SystemConsumers.update(SystemConsumers.scala:293) at org.apache.samza.system.SystemConsumers.org$apache$samza$system$SystemConsumers$$poll(SystemConsumers.scala:260
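The recurring error in this thread boils down to a cast mismatch: with a string serde configured, envelope.getMessage() returns a String, and the stock wikipedia parser's cast to Map then throws the ClassCastException quoted above. A minimal standalone illustration (hypothetical class, not Samza code):

```java
import java.util.HashMap;
import java.util.Map;

public class SerdeCastDemo {

    // With a string serde configured, this is what process() receives.
    static Object stringSerdePayload() {
        return "a plain sentence";
    }

    // The stock wikipedia parser casts the payload to Map<String, Object>;
    // that cast throws ClassCastException when the payload is actually a String.
    @SuppressWarnings("unchecked")
    static boolean castToMapFails(Object payload) {
        try {
            Map<String, Object> m = (Map<String, Object>) payload;
            m.isEmpty(); // use the map so the cast is exercised
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(castToMapFails(stringSerdePayload()));          // prints true
        System.out.println(castToMapFails(new HashMap<String, Object>())); // prints false
    }
}
```

The takeaway matches Navina's diagnosis: either keep a json serde and send JSON into the topic, or keep the string serde and change the task to cast the message to String.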
Re: Kafka topic naming conventions
That's what we're doing as well - appending the partition count to the Kafka topic name. This actually helps keep track of the #partitions for each topic (since Kafka doesn't have a metadata store yet). In case of topic expansion, we actually just resort to creating a new topic. Although that is an overhead, the thought process is that this will minimize operational errors. Also, this is necessary to do in case we're doing some kind of joins. On Wed, Mar 18, 2015 at 5:59 PM, Jakob Homan jgho...@gmail.com wrote: On 18 March 2015 at 17:48, Chris Riccomini criccom...@apache.org wrote: One thing I haven't seen, but might be relevant, is including partition counts in the topic. Yeah, but then if you change the partition count later on, you've got incorrect information forever. Or you need to create a new stream, which might be a nice forcing function to make sure your join isn't screwed up. There'd need to be something somewhere to enforce that though. -- Thanks and regards Chinmay Soman
Re: Kafka topic naming conventions
Yeah! It does seem a bit hackish - but I think this approach promises fewer config/operation errors. Although I think some of these checks can be built within Samza - assuming Kafka has a metadata store in the near future - the Samza container can validate the #topics against this store. On Wed, Mar 18, 2015 at 6:16 PM, Chris Riccomini criccom...@apache.org wrote: Hey Chinmay, Cool, this is good feedback. I didn't think I was *that* crazy. :) Cheers, Chris On Wed, Mar 18, 2015 at 6:10 PM, Chinmay Soman chinmay.cere...@gmail.com wrote: That's what we're doing as well - appending the partition count to the Kafka topic name. This actually helps keep track of the #partitions for each topic (since Kafka doesn't have a metadata store yet). In case of topic expansion, we actually just resort to creating a new topic. Although that is an overhead, the thought process is that this will minimize operational errors. Also, this is necessary to do in case we're doing some kind of joins. On Wed, Mar 18, 2015 at 5:59 PM, Jakob Homan jgho...@gmail.com wrote: On 18 March 2015 at 17:48, Chris Riccomini criccom...@apache.org wrote: One thing I haven't seen, but might be relevant, is including partition counts in the topic. Yeah, but then if you change the partition count later on, you've got incorrect information forever. Or you need to create a new stream, which might be a nice forcing function to make sure your join isn't screwed up. There'd need to be something somewhere to enforce that though. -- Thanks and regards Chinmay Soman -- Thanks and regards Chinmay Soman
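The naming convention discussed in this thread can be captured by a small helper that recovers the partition count from the topic name itself. A hypothetical sketch; the suffix format (a trailing -N) is an assumption about how a team might standardize it:

```java
// Hypothetical sketch of the convention discussed above: encode the partition
// count as a suffix on the Kafka topic name, e.g. "page-views-16" -> 16.
public class TopicNaming {

    static int partitionCount(String topic) {
        int dash = topic.lastIndexOf('-');
        if (dash < 0 || dash == topic.length() - 1) {
            throw new IllegalArgumentException("no partition suffix in: " + topic);
        }
        return Integer.parseInt(topic.substring(dash + 1));
    }

    public static void main(String[] args) {
        System.out.println(partitionCount("page-views-16")); // prints 16
    }
}
```

A deployment tool (or a Samza container check, once a metadata store exists) could compare this parsed count against the broker's actual partition count and fail fast on a mismatch, which is the operational-safety argument made above.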
Question on hello-samza (Kafka startup and shutdown)
Sending to a wider audience to know if anyone else is also seeing this issue. It seems Kafka gets into a weird state every time I do bin/grid stop all (and then start all). I keep getting a LeaderNotAvailable exception on the producer side. It seems this happens every time Kafka hasn't been shut down properly. This issue goes away if I use the following sequence: * bin/grid stop kafka * bin/grid stop zookeeper (after like 5 seconds). (and then start everything). Has anyone else seen this? -- Thanks and regards Chinmay Soman