Was looking through the code a little and it looks like the BootstrappingChooser could use the list of SSPs passed into it's register() method to figure out which partitions it need to monitor.
I wanted to try to build Samza to play around with it but I'm getting error trying to build off of both the 0.9.0 and 0.9.1 branches. thedude:samza (0.9.1) $ ./gradlew clean build To honour the JVM settings for this build a new JVM will be forked. Please consider using the daemon: http://gradle.org/docs/2.0/userguide/gradle_daemon.html. :clean :samza-api:clean :samza-core_2.10:clean :samza-kafka_2.10:clean UP-TO-DATE :samza-kv-inmemory_2.10:clean UP-TO-DATE :samza-kv-rocksdb_2.10:clean UP-TO-DATE :samza-kv_2.10:clean UP-TO-DATE :samza-log4j:clean UP-TO-DATE :samza-shell:clean UP-TO-DATE :samza-test_2.10:clean UP-TO-DATE :samza-yarn_2.10:clean UP-TO-DATE :assemble UP-TO-DATE :rat Rat report: build/rat/rat-report.html :check :build :samza-api:compileJava :samza-api:processResources UP-TO-DATE :samza-api:classes :samza-api:jar :samza-api:javadoc /Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49: warning: no @param for ssp void setStartingOffset(SystemStreamPartition ssp, String offset); ^ /Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49: warning: no @param for offset void setStartingOffset(SystemStreamPartition ssp, String offset); ^ 2 warnings :samza-api:javadocJar :samza-api:sourcesJar :samza-api:signArchives SKIPPED :samza-api:assemble :samza-api:compileTestJava :samza-api:processTestResources UP-TO-DATE :samza-api:testClasses :samza-api:test :samza-api:check :samza-api:build :samza-core_2.10:compileJava :samza-core_2.10:compileScala [ant:scalac] /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:43: error: object SamzaObjectMapper is not a member of package org.apache.samza.serializers.model [ant:scalac] import org.apache.samza.serializers.model.SamzaObjectMapper [ant:scalac] ^ [ant:scalac] /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:40: error: object TaskModel is not a member of package org.apache.samza.job.model [ant:scalac] import org.apache.samza.job.model.TaskModel [ant:scalac] ^ ... I've got JDK 8 installed. Wondering that makes a difference or not. I'd appreciate any help. Thanks, Roger On Sun, Jun 21, 2015 at 1:02 PM, Roger Hoover <roger.hoo...@gmail.com> wrote: > I think I see what's happening. > > When there are 8 tasks and I set yarn.container.count=8, then each > container is responsible for a single task. However, the > systemStreamLagCounts map ( > https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L77) > and laggingSystemStreamPartitions ( > https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L83) > are configured to track all partitions for the bootstrap topic rather than > just the one partition assigned to this task. > > Later in the log, we see that the task/container completed bootstrap for > it's own partition. > > 2015-06-21 12:28:55 org.apache.samza.system.chooser.BootstrappingChooser > [DEBUG] Bootstrap stream partition is fully caught up: > SystemStreamPartition [kafka, deploy.svc.tlrnsZOYQA6wrwAA4FLqZA, 0] > > but the Bootstrapping Chooser still thinks that the remaining partitions > (assigned to other tasks in other containers) need to be completed. JMX at > this point shows 7 lagging partitions of the 8 original partition count. > > I'm wondering why no one has run into this. Doesn't LinkedIn use > partitioned bootstrapped topics? > > Thanks, > > Roger > > On Sun, Jun 21, 2015 at 12:22 PM, Roger Hoover <roger.hoo...@gmail.com> > wrote: > >> Hi Yan, >> >> I've uploaded a file with TRACE level logging here: >> http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz >> >> I really appreciate your help as this is a critical issue for me. >> >> Thanks, >> >> Roger >> >> On Fri, Jun 19, 2015 at 12:05 PM, Yan Fang <yanfang...@gmail.com> wrote: >> >>> Hi Roger, >>> >>> " but it only spawns one container and still hangs after bootstrap" >>> -- this probably is due to your local machine does not have enough >>> resource for the second container. Because I checked your log file, each >>> container is about 4GB. >>> >>> "When I run it on our YARN cluster with a single container, it works >>> correctly. When I tried it with 5 containers, it gets hung after >>> consuming >>> the bootstrap topic." >>> -- Have you figure it out? I have a looked at your log and also the >>> code. My suspect is that, there is a null enveloper somehow blocking the >>> process. If you can paste the trace level log, it will be more helpful >>> because many logs in chooser are trace level. >>> >>> Thanks, >>> >>> Fang, Yan >>> yanfang...@gmail.com >>> >>> On Thu, Jun 18, 2015 at 5:20 PM, Roger Hoover <roger.hoo...@gmail.com> >>> wrote: >>> >>> > I need some help. I have a job which bootstraps one stream and then is >>> > supposed to read from two. When I run it on our YARN cluster with a >>> single >>> > container, it works correctly. When I tried it with 5 containers, it >>> gets >>> > hung after consuming the bootstrap topic. I ran it with the grid >>> script on >>> > my laptop (Mac OS X) with yarn.container.count=2 but it only spawns one >>> > container and still hangs after bootstrap. >>> > >>> > Debug logs are here: http://pastebin.com/af3KPvju >>> > >>> > I looked at JMX metrics and see: >>> > - Task Metrics - no value for kafka offset of non-bootstrapped stream >>> > - SystemConsumerMetrics >>> > - choose null keeps incrementing >>> > - ssps-needed-by-chooser 1 >>> > - unprocessed-messages 62k >>> > - Bootstrapping Chooser >>> > - lagging partitions 4 >>> > - laggin-batch-streams - 4 >>> > - batch-resets - 0 >>> > >>> > Has anyone seen this or can offer ideas of how to better debug it? >>> > >>> > I'm using Samza 0.9.0 and YARN 2.4.0. >>> > >>> > Thanks! >>> > >>> > Roger >>> > >>> >> >> >