Hi Roger, I will try to look at the issue tomorrow if my time allows.
First thing first: The build has some unexpected results. A quick fix: 1. apply https://issues.apache.org/jira/browse/SAMZA-712 2. add sourceSets.main.scala.srcDir "src/main/java" sourceSets.main.java.srcDirs = [] at line 126 of build.gradle. Sorry for the inconvenience. Thanks, Fang, Yan yanfang...@gmail.com On Sun, Jun 21, 2015 at 3:55 PM, Roger Hoover <roger.hoo...@gmail.com> wrote: > Was looking through the code a little and it looks like the > BootstrappingChooser could use the list of SSPs passed into it's register() > method to figure out which partitions it need to monitor. > > I wanted to try to build Samza to play around with it but I'm getting error > trying to build off of both the 0.9.0 and 0.9.1 branches. > > thedude:samza (0.9.1) $ ./gradlew clean build > > To honour the JVM settings for this build a new JVM will be forked. Please > consider using the daemon: > http://gradle.org/docs/2.0/userguide/gradle_daemon.html. > > :clean > > :samza-api:clean > > :samza-core_2.10:clean > > :samza-kafka_2.10:clean UP-TO-DATE > > :samza-kv-inmemory_2.10:clean UP-TO-DATE > > :samza-kv-rocksdb_2.10:clean UP-TO-DATE > > :samza-kv_2.10:clean UP-TO-DATE > > :samza-log4j:clean UP-TO-DATE > > :samza-shell:clean UP-TO-DATE > > :samza-test_2.10:clean UP-TO-DATE > > :samza-yarn_2.10:clean UP-TO-DATE > > :assemble UP-TO-DATE > > :rat > > Rat report: build/rat/rat-report.html > > :check > > :build > > :samza-api:compileJava > > :samza-api:processResources UP-TO-DATE > > :samza-api:classes > > :samza-api:jar > > :samza-api:javadoc > > > /Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49: > warning: no @param for ssp > > void setStartingOffset(SystemStreamPartition ssp, String offset); > > ^ > > > /Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49: > warning: no @param for offset > > void setStartingOffset(SystemStreamPartition ssp, String offset); > > ^ > > 2 warnings > > :samza-api:javadocJar > > :samza-api:sourcesJar > > :samza-api:signArchives SKIPPED > > :samza-api:assemble > > :samza-api:compileTestJava > > :samza-api:processTestResources UP-TO-DATE > > :samza-api:testClasses > > :samza-api:test > > :samza-api:check > > :samza-api:build > > :samza-core_2.10:compileJava > > :samza-core_2.10:compileScala > > [ant:scalac] > > /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:43: > error: object SamzaObjectMapper is not a member of package > org.apache.samza.serializers.model > > [ant:scalac] import org.apache.samza.serializers.model.SamzaObjectMapper > > [ant:scalac] ^ > > [ant:scalac] > > /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:40: > error: object TaskModel is not a member of package > org.apache.samza.job.model > > [ant:scalac] import org.apache.samza.job.model.TaskModel > > [ant:scalac] ^ > > ... > > > I've got JDK 8 installed. Wondering that makes a difference or not. I'd > appreciate any help. > > Thanks, > > Roger > > > > On Sun, Jun 21, 2015 at 1:02 PM, Roger Hoover <roger.hoo...@gmail.com> > wrote: > > > I think I see what's happening. > > > > When there are 8 tasks and I set yarn.container.count=8, then each > > container is responsible for a single task. However, the > > systemStreamLagCounts map ( > > > https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L77 > ) > > and laggingSystemStreamPartitions ( > > > https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L83 > ) > > are configured to track all partitions for the bootstrap topic rather > than > > just the one partition assigned to this task. > > > > Later in the log, we see that the task/container completed bootstrap for > > it's own partition. > > > > 2015-06-21 12:28:55 org.apache.samza.system.chooser.BootstrappingChooser > > [DEBUG] Bootstrap stream partition is fully caught up: > > SystemStreamPartition [kafka, deploy.svc.tlrnsZOYQA6wrwAA4FLqZA, 0] > > > > but the Bootstrapping Chooser still thinks that the remaining partitions > > (assigned to other tasks in other containers) need to be completed. JMX > at > > this point shows 7 lagging partitions of the 8 original partition count. > > > > I'm wondering why no one has run into this. Doesn't LinkedIn use > > partitioned bootstrapped topics? > > > > Thanks, > > > > Roger > > > > On Sun, Jun 21, 2015 at 12:22 PM, Roger Hoover <roger.hoo...@gmail.com> > > wrote: > > > >> Hi Yan, > >> > >> I've uploaded a file with TRACE level logging here: > >> http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz > >> > >> I really appreciate your help as this is a critical issue for me. > >> > >> Thanks, > >> > >> Roger > >> > >> On Fri, Jun 19, 2015 at 12:05 PM, Yan Fang <yanfang...@gmail.com> > wrote: > >> > >>> Hi Roger, > >>> > >>> " but it only spawns one container and still hangs after bootstrap" > >>> -- this probably is due to your local machine does not have enough > >>> resource for the second container. Because I checked your log file, > each > >>> container is about 4GB. > >>> > >>> "When I run it on our YARN cluster with a single container, it works > >>> correctly. When I tried it with 5 containers, it gets hung after > >>> consuming > >>> the bootstrap topic." > >>> -- Have you figure it out? I have a looked at your log and also the > >>> code. My suspect is that, there is a null enveloper somehow blocking > the > >>> process. If you can paste the trace level log, it will be more helpful > >>> because many logs in chooser are trace level. > >>> > >>> Thanks, > >>> > >>> Fang, Yan > >>> yanfang...@gmail.com > >>> > >>> On Thu, Jun 18, 2015 at 5:20 PM, Roger Hoover <roger.hoo...@gmail.com> > >>> wrote: > >>> > >>> > I need some help. I have a job which bootstraps one stream and then > is > >>> > supposed to read from two. When I run it on our YARN cluster with a > >>> single > >>> > container, it works correctly. When I tried it with 5 containers, it > >>> gets > >>> > hung after consuming the bootstrap topic. I ran it with the grid > >>> script on > >>> > my laptop (Mac OS X) with yarn.container.count=2 but it only spawns > one > >>> > container and still hangs after bootstrap. > >>> > > >>> > Debug logs are here: http://pastebin.com/af3KPvju > >>> > > >>> > I looked at JMX metrics and see: > >>> > - Task Metrics - no value for kafka offset of non-bootstrapped stream > >>> > - SystemConsumerMetrics > >>> > - choose null keeps incrementing > >>> > - ssps-needed-by-chooser 1 > >>> > - unprocessed-messages 62k > >>> > - Bootstrapping Chooser > >>> > - lagging partitions 4 > >>> > - laggin-batch-streams - 4 > >>> > - batch-resets - 0 > >>> > > >>> > Has anyone seen this or can offer ideas of how to better debug it? > >>> > > >>> > I'm using Samza 0.9.0 and YARN 2.4.0. > >>> > > >>> > Thanks! > >>> > > >>> > Roger > >>> > > >>> > >> > >> > > >