Thanks, Yan. I'll give it a try.
> On Jun 21, 2015, at 10:02 PM, Yan Fang <yanfang...@gmail.com> wrote:
>
> Hi Roger,
>
> I will try to look at the issue tomorrow if my time allows.
>
> First things first:
>
> The build has some unexpected results. A quick fix:
>
> 1. apply https://issues.apache.org/jira/browse/SAMZA-712
> 2. add
>
> sourceSets.main.scala.srcDir "src/main/java"
> sourceSets.main.java.srcDirs = []
>
> at line 126 of build.gradle.
>
> Sorry for the inconvenience.
>
> Thanks,
>
> Fang, Yan
> yanfang...@gmail.com
>
> On Sun, Jun 21, 2015 at 3:55 PM, Roger Hoover <roger.hoo...@gmail.com> wrote:
>
>> Was looking through the code a little and it looks like the
>> BootstrappingChooser could use the list of SSPs passed into its register()
>> method to figure out which partitions it needs to monitor.
>>
>> I wanted to try to build Samza to play around with it but I'm getting errors
>> trying to build off of both the 0.9.0 and 0.9.1 branches.
>>
>> thedude:samza (0.9.1) $ ./gradlew clean build
>>
>> To honour the JVM settings for this build a new JVM will be forked. Please
>> consider using the daemon:
>> http://gradle.org/docs/2.0/userguide/gradle_daemon.html.
>> :clean
>> :samza-api:clean
>> :samza-core_2.10:clean
>> :samza-kafka_2.10:clean UP-TO-DATE
>> :samza-kv-inmemory_2.10:clean UP-TO-DATE
>> :samza-kv-rocksdb_2.10:clean UP-TO-DATE
>> :samza-kv_2.10:clean UP-TO-DATE
>> :samza-log4j:clean UP-TO-DATE
>> :samza-shell:clean UP-TO-DATE
>> :samza-test_2.10:clean UP-TO-DATE
>> :samza-yarn_2.10:clean UP-TO-DATE
>> :assemble UP-TO-DATE
>> :rat
>> Rat report: build/rat/rat-report.html
>> :check
>> :build
>> :samza-api:compileJava
>> :samza-api:processResources UP-TO-DATE
>> :samza-api:classes
>> :samza-api:jar
>> :samza-api:javadoc
>> /Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49:
>> warning: no @param for ssp
>> void setStartingOffset(SystemStreamPartition ssp, String offset);
>> ^
>> /Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49:
>> warning: no @param for offset
>> void setStartingOffset(SystemStreamPartition ssp, String offset);
>> ^
>> 2 warnings
>> :samza-api:javadocJar
>> :samza-api:sourcesJar
>> :samza-api:signArchives SKIPPED
>> :samza-api:assemble
>> :samza-api:compileTestJava
>> :samza-api:processTestResources UP-TO-DATE
>> :samza-api:testClasses
>> :samza-api:test
>> :samza-api:check
>> :samza-api:build
>> :samza-core_2.10:compileJava
>> :samza-core_2.10:compileScala
>> [ant:scalac]
>> /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:43:
>> error: object SamzaObjectMapper is not a member of package
>> org.apache.samza.serializers.model
>> [ant:scalac] import org.apache.samza.serializers.model.SamzaObjectMapper
>> [ant:scalac] ^
>> [ant:scalac]
>> /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:40:
>> error: object TaskModel is not a member of package
>> org.apache.samza.job.model
>> [ant:scalac] import org.apache.samza.job.model.TaskModel
>> [ant:scalac] ^
>> ...
>>
>> I've got JDK 8 installed. Wondering whether that makes a difference or not.
>> I'd appreciate any help.
>>
>> Thanks,
>>
>> Roger
>>
>> On Sun, Jun 21, 2015 at 1:02 PM, Roger Hoover <roger.hoo...@gmail.com> wrote:
>>
>>> I think I see what's happening.
>>>
>>> When there are 8 tasks and I set yarn.container.count=8, then each
>>> container is responsible for a single task. However, the
>>> systemStreamLagCounts map
>>> (https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L77)
>>> and laggingSystemStreamPartitions
>>> (https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L83)
>>> are configured to track all partitions for the bootstrap topic rather than
>>> just the one partition assigned to this task.
>>>
>>> Later in the log, we see that the task/container completed bootstrap for
>>> its own partition.
>>>
>>> 2015-06-21 12:28:55 org.apache.samza.system.chooser.BootstrappingChooser
>>> [DEBUG] Bootstrap stream partition is fully caught up:
>>> SystemStreamPartition [kafka, deploy.svc.tlrnsZOYQA6wrwAA4FLqZA, 0]
>>>
>>> but the Bootstrapping Chooser still thinks that the remaining partitions
>>> (assigned to other tasks in other containers) need to be completed. JMX at
>>> this point shows 7 lagging partitions out of the original 8.
>>>
>>> I'm wondering why no one has run into this. Doesn't LinkedIn use
>>> partitioned bootstrapped topics?
>>>
>>> Thanks,
>>>
>>> Roger
>>>
>>> On Sun, Jun 21, 2015 at 12:22 PM, Roger Hoover <roger.hoo...@gmail.com> wrote:
>>>
>>>> Hi Yan,
>>>>
>>>> I've uploaded a file with TRACE level logging here:
>>>> http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz
>>>>
>>>> I really appreciate your help as this is a critical issue for me.
>>>>
>>>> Thanks,
>>>>
>>>> Roger
>>>>
>>>> On Fri, Jun 19, 2015 at 12:05 PM, Yan Fang <yanfang...@gmail.com> wrote:
>>>>
>>>>> Hi Roger,
>>>>>
>>>>> "but it only spawns one container and still hangs after bootstrap"
>>>>> -- this is probably because your local machine does not have enough
>>>>> resources for the second container. I checked your log file, and each
>>>>> container is about 4GB.
>>>>>
>>>>> "When I run it on our YARN cluster with a single container, it works
>>>>> correctly. When I tried it with 5 containers, it gets hung after
>>>>> consuming the bootstrap topic."
>>>>> -- Have you figured it out? I have looked at your log and also the
>>>>> code. My suspicion is that a null envelope is somehow blocking the
>>>>> process. If you can paste the trace level log, it will be more helpful
>>>>> because many logs in the chooser are trace level.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Fang, Yan
>>>>> yanfang...@gmail.com
>>>>>
>>>>> On Thu, Jun 18, 2015 at 5:20 PM, Roger Hoover <roger.hoo...@gmail.com> wrote:
>>>>>
>>>>>> I need some help. I have a job which bootstraps one stream and then is
>>>>>> supposed to read from two. When I run it on our YARN cluster with a single
>>>>>> container, it works correctly. When I tried it with 5 containers, it gets
>>>>>> hung after consuming the bootstrap topic. I ran it with the grid script on
>>>>>> my laptop (Mac OS X) with yarn.container.count=2 but it only spawns one
>>>>>> container and still hangs after bootstrap.
>>>>>>
>>>>>> Debug logs are here: http://pastebin.com/af3KPvju
>>>>>>
>>>>>> I looked at JMX metrics and see:
>>>>>> - Task Metrics - no value for kafka offset of non-bootstrapped stream
>>>>>> - SystemConsumerMetrics
>>>>>>   - choose null keeps incrementing
>>>>>>   - ssps-needed-by-chooser 1
>>>>>>   - unprocessed-messages 62k
>>>>>> - Bootstrapping Chooser
>>>>>>   - lagging partitions 4
>>>>>>   - lagging-batch-streams - 4
>>>>>>   - batch-resets - 0
>>>>>>
>>>>>> Has anyone seen this, or can anyone offer ideas on how to better debug it?
>>>>>>
>>>>>> I'm using Samza 0.9.0 and YARN 2.4.0.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>> Roger
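
For anyone following the thread: the behaviour Roger describes above (a container that owns one bootstrap partition but keeps waiting on all 8) can be illustrated with a small standalone Java model. This is only a sketch under stated assumptions. The class BootstrapLagSketch and all of its methods are hypothetical names made up for illustration, partitions are modelled as plain integers rather than SystemStreamPartition objects, and none of it is Samza's actual BootstrappingChooser code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical, simplified model of bootstrap lag tracking; NOT Samza's real code.
class BootstrapLagSketch {
    // Partitions this tracker still considers "behind" on the bootstrap stream.
    private final Set<Integer> lagging = new HashSet<>();

    // Seed the lagging set with the partitions the tracker should wait for.
    BootstrapLagSketch(Set<Integer> partitionsToWaitFor) {
        lagging.addAll(partitionsToWaitFor);
    }

    // Called when a partition has been read up to its latest offset.
    void markCaughtUp(int partition) {
        lagging.remove(partition);
    }

    int laggingCount() {
        return lagging.size();
    }

    boolean bootstrapComplete() {
        return lagging.isEmpty();
    }

    // Helper: partition ids 0..count-1, as stream metadata would list them.
    static Set<Integer> allPartitions(int count) {
        Set<Integer> ids = new HashSet<>();
        for (int i = 0; i < count; i++) {
            ids.add(i);
        }
        return ids;
    }

    public static void main(String[] args) {
        // This container owns only partition 0 of an 8-partition bootstrap topic.
        Set<Integer> registered = new HashSet<>();
        registered.add(0);

        // Seeded from stream metadata: waits on partitions it never consumes.
        BootstrapLagSketch fromMetadata = new BootstrapLagSketch(allPartitions(8));
        fromMetadata.markCaughtUp(0);
        System.out.println("metadata-seeded lagging: " + fromMetadata.laggingCount());

        // Seeded from registered partitions only: done once partition 0 is done.
        BootstrapLagSketch fromRegistered = new BootstrapLagSketch(registered);
        fromRegistered.markCaughtUp(0);
        System.out.println("register-seeded complete: " + fromRegistered.bootstrapComplete());
    }
}
```

In this model the metadata-seeded tracker still reports 7 lagging partitions after partition 0 catches up, which matches the JMX figure quoted in the thread, while the tracker seeded only from registered partitions reports the bootstrap as complete. Whether the real fix should take this shape is a question for the Samza code itself.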