Re: Samza hung after bootstrapping
Thanks, Yan. I'll give it a try.

On Jun 21, 2015, at 10:02 PM, Yan Fang wrote:

> Hi Roger,
>
> I will try to look at the issue tomorrow if time allows.
>
> First things first: the build has some unexpected results. A quick fix:
>
> 1. Apply https://issues.apache.org/jira/browse/SAMZA-712
> 2. Add
>
>        sourceSets.main.scala.srcDir "src/main/java"
>        sourceSets.main.java.srcDirs = []
>
>    at line 126 of build.gradle.
>
> Sorry for the inconvenience.
>
> Thanks,
>
> Fang, Yan
Re: Samza hung after bootstrapping
Hi Roger,

I will try to look at the issue tomorrow if time allows.

First things first: the build has some unexpected results. A quick fix:

1. Apply https://issues.apache.org/jira/browse/SAMZA-712
2. Add

       sourceSets.main.scala.srcDir "src/main/java"
       sourceSets.main.java.srcDirs = []

   at line 126 of build.gradle.

Sorry for the inconvenience.

Thanks,

Fang, Yan
yanfang...@gmail.com

On Sun, Jun 21, 2015 at 3:55 PM, Roger Hoover wrote:

> Was looking through the code a little and it looks like the
> BootstrappingChooser could use the list of SSPs passed into its register()
> method to figure out which partitions it needs to monitor.
>
> I wanted to try to build Samza to play around with it but I'm getting errors
> trying to build off of both the 0.9.0 and 0.9.1 branches.
>
> thedude:samza (0.9.1) $ ./gradlew clean build
> [...]
> :samza-core_2.10:compileScala
> [ant:scalac] /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:43:
> error: object SamzaObjectMapper is not a member of package org.apache.samza.serializers.model
> [ant:scalac] import org.apache.samza.serializers.model.SamzaObjectMapper
> [ant:scalac]^
> [ant:scalac] /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:40:
> error: object TaskModel is not a member of package org.apache.samza.job.model
> [ant:scalac] import org.apache.samza.job.model.TaskModel
> [ant:scalac]^
> ...
>
> I've got JDK 8 installed. Wondering whether that makes a difference or not.
> I'd appreciate any help.
>
> Thanks,
>
> Roger
Re: Samza hung after bootstrapping
Was looking through the code a little and it looks like the BootstrappingChooser could use the list of SSPs passed into its register() method to figure out which partitions it needs to monitor.

I wanted to try to build Samza to play around with it but I'm getting errors trying to build off of both the 0.9.0 and 0.9.1 branches.

thedude:samza (0.9.1) $ ./gradlew clean build
To honour the JVM settings for this build a new JVM will be forked. Please consider using the daemon: http://gradle.org/docs/2.0/userguide/gradle_daemon.html.
:clean
:samza-api:clean
:samza-core_2.10:clean
:samza-kafka_2.10:clean UP-TO-DATE
:samza-kv-inmemory_2.10:clean UP-TO-DATE
:samza-kv-rocksdb_2.10:clean UP-TO-DATE
:samza-kv_2.10:clean UP-TO-DATE
:samza-log4j:clean UP-TO-DATE
:samza-shell:clean UP-TO-DATE
:samza-test_2.10:clean UP-TO-DATE
:samza-yarn_2.10:clean UP-TO-DATE
:assemble UP-TO-DATE
:rat
Rat report: build/rat/rat-report.html
:check
:build
:samza-api:compileJava
:samza-api:processResources UP-TO-DATE
:samza-api:classes
:samza-api:jar
:samza-api:javadoc
/Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49: warning: no @param for ssp
  void setStartingOffset(SystemStreamPartition ssp, String offset);
  ^
/Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49: warning: no @param for offset
  void setStartingOffset(SystemStreamPartition ssp, String offset);
  ^
2 warnings
:samza-api:javadocJar
:samza-api:sourcesJar
:samza-api:signArchives SKIPPED
:samza-api:assemble
:samza-api:compileTestJava
:samza-api:processTestResources UP-TO-DATE
:samza-api:testClasses
:samza-api:test
:samza-api:check
:samza-api:build
:samza-core_2.10:compileJava
:samza-core_2.10:compileScala
[ant:scalac] /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:43: error: object SamzaObjectMapper is not a member of package org.apache.samza.serializers.model
[ant:scalac] import org.apache.samza.serializers.model.SamzaObjectMapper
[ant:scalac]^
[ant:scalac] /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:40: error: object TaskModel is not a member of package org.apache.samza.job.model
[ant:scalac] import org.apache.samza.job.model.TaskModel
[ant:scalac]^
...

I've got JDK 8 installed. Wondering whether that makes a difference or not. I'd appreciate any help.

Thanks,

Roger

On Sun, Jun 21, 2015 at 1:02 PM, Roger Hoover wrote:

> I think I see what's happening.
>
> When there are 8 tasks and I set yarn.container.count=8, then each
> container is responsible for a single task. However, the
> systemStreamLagCounts map
> (https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L77)
> and laggingSystemStreamPartitions
> (https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L83)
> are configured to track all partitions for the bootstrap topic rather than
> just the one partition assigned to this task.
>
> Later in the log, we see that the task/container completed bootstrap for
> its own partition.
>
> 2015-06-21 12:28:55 org.apache.samza.system.chooser.BootstrappingChooser
> [DEBUG] Bootstrap stream partition is fully caught up:
> SystemStreamPartition [kafka, deploy.svc.tlrnsZOYQA6wrwAA4FLqZA, 0]
>
> but the BootstrappingChooser still thinks that the remaining partitions
> (assigned to other tasks in other containers) need to be completed. JMX at
> this point shows 7 lagging partitions of the 8 original partition count.
>
> I'm wondering why no one has run into this. Doesn't LinkedIn use
> partitioned bootstrapped topics?
>
> Thanks,
>
> Roger
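The suggestion at the top of this message, deriving the monitored partitions from what is passed to register(), might look roughly like the sketch below. It is only an illustration under stated assumptions: `RegisteredLagTracker` and its method signatures are hypothetical, streams are plain strings and partitions plain integers, and none of this reproduces Samza's actual BootstrappingChooser or SystemStreamPartition.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; names and signatures are illustrative, not Samza's API.
class RegisteredLagTracker {
    private final Set<String> bootstrapStreams;
    // Stream name -> partitions registered with this container that still lag.
    private final Map<String, Set<Integer>> lagging = new HashMap<>();

    RegisteredLagTracker(Set<String> bootstrapStreams) {
        this.bootstrapStreams = new HashSet<>(bootstrapStreams);
    }

    // Track only partitions that are actually registered, and only for bootstrap streams.
    void register(String stream, int partition) {
        if (bootstrapStreams.contains(stream)) {
            lagging.computeIfAbsent(stream, s -> new HashSet<>()).add(partition);
        }
    }

    // Remove a partition once it has caught up; forget the stream when none remain.
    void markCaughtUp(String stream, int partition) {
        Set<Integer> remaining = lagging.get(stream);
        if (remaining != null) {
            remaining.remove(partition);
            if (remaining.isEmpty()) {
                lagging.remove(stream);
            }
        }
    }

    boolean bootstrapComplete() {
        return lagging.isEmpty();
    }
}
```

With this bookkeeping, a container that registers only partition 0 of the bootstrap stream reports completion as soon as that one partition catches up, regardless of how many partitions the topic has overall.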
Re: Samza hung after bootstrapping
I think I see what's happening.

When there are 8 tasks and I set yarn.container.count=8, then each container is responsible for a single task. However, the systemStreamLagCounts map (https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L77) and laggingSystemStreamPartitions (https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L83) are configured to track all partitions for the bootstrap topic rather than just the one partition assigned to this task.

Later in the log, we see that the task/container completed bootstrap for its own partition.

2015-06-21 12:28:55 org.apache.samza.system.chooser.BootstrappingChooser [DEBUG] Bootstrap stream partition is fully caught up: SystemStreamPartition [kafka, deploy.svc.tlrnsZOYQA6wrwAA4FLqZA, 0]

but the BootstrappingChooser still thinks that the remaining partitions (assigned to other tasks in other containers) need to be completed. JMX at this point shows 7 lagging partitions of the 8 original partition count.

I'm wondering why no one has run into this. Doesn't LinkedIn use partitioned bootstrapped topics?

Thanks,

Roger

On Sun, Jun 21, 2015 at 12:22 PM, Roger Hoover wrote:

> Hi Yan,
>
> I've uploaded a file with TRACE level logging here:
> http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz
>
> I really appreciate your help as this is a critical issue for me.
>
> Thanks,
>
> Roger
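The hang described in this message can be illustrated with a small simulation. This is a minimal sketch, not Samza code: `LagTracker` and its methods are hypothetical names and partitions are plain integers. It only shows that a lagging set seeded with all 8 partitions of the bootstrap topic never empties when the container consumes just partition 0.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the chooser's lag bookkeeping; not Samza code.
class LagTracker {
    private final Set<Integer> lagging = new HashSet<>();

    // Seed the lagging set with the partitions this tracker believes it must wait for.
    LagTracker(Set<Integer> partitionsToTrack) {
        lagging.addAll(partitionsToTrack);
    }

    // Called when a partition consumed by this container reaches the end of the bootstrap stream.
    void markCaughtUp(int partition) {
        lagging.remove(partition);
    }

    int laggingCount() {
        return lagging.size();
    }

    boolean bootstrapComplete() {
        return lagging.isEmpty();
    }

    // Helper: the set {0, ..., count - 1}.
    static Set<Integer> partitions(int count) {
        Set<Integer> result = new HashSet<>();
        for (int i = 0; i < count; i++) {
            result.add(i);
        }
        return result;
    }
}
```

Seeding the tracker with `LagTracker.partitions(8)` and calling `markCaughtUp(0)` leaves `laggingCount()` at 7 with `bootstrapComplete()` still false, which matches the JMX reading above. Seeding it with only the container's own partition lets bootstrap finish.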
Re: Samza hung after bootstrapping
Hi Yan,

I've uploaded a file with TRACE level logging here: http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz

I really appreciate your help as this is a critical issue for me.

Thanks,

Roger

On Fri, Jun 19, 2015 at 12:05 PM, Yan Fang wrote:

> Hi Roger,
>
> "but it only spawns one container and still hangs after bootstrap"
> -- this is probably because your local machine does not have enough
> resources for the second container. According to your log file, each
> container is about 4GB.
>
> "When I run it on our YARN cluster with a single container, it works
> correctly. When I tried it with 5 containers, it gets hung after consuming
> the bootstrap topic."
> -- Have you figured it out? I have looked at your log and also the
> code. My suspicion is that there is a null envelope somehow blocking the
> process. If you can paste the trace level log, it will be more helpful
> because many logs in the chooser are trace level.
>
> Thanks,
>
> Fang, Yan
Re: Samza hung after bootstrapping
Thank you, Yan. I'll get a trace level log as soon as I can.

On Jun 19, 2015, at 12:05 PM, Yan Fang wrote:

> Hi Roger,
>
> "but it only spawns one container and still hangs after bootstrap"
> -- this is probably because your local machine does not have enough
> resources for the second container. According to your log file, each
> container is about 4GB.
>
> "When I run it on our YARN cluster with a single container, it works
> correctly. When I tried it with 5 containers, it gets hung after consuming
> the bootstrap topic."
> -- Have you figured it out? I have looked at your log and also the
> code. My suspicion is that there is a null envelope somehow blocking the
> process. If you can paste the trace level log, it will be more helpful
> because many logs in the chooser are trace level.
>
> Thanks,
>
> Fang, Yan
Re: Samza hung after bootstrapping
Hi Roger,

"but it only spawns one container and still hangs after bootstrap"
-- this is probably because your local machine does not have enough resources for the second container. According to your log file, each container is about 4GB.

"When I run it on our YARN cluster with a single container, it works correctly. When I tried it with 5 containers, it gets hung after consuming the bootstrap topic."
-- Have you figured it out? I have looked at your log and also the code. My suspicion is that there is a null envelope somehow blocking the process. If you can paste the trace level log, it will be more helpful because many logs in the chooser are trace level.

Thanks,

Fang, Yan
yanfang...@gmail.com

On Thu, Jun 18, 2015 at 5:20 PM, Roger Hoover wrote:

> I need some help. I have a job which bootstraps one stream and then is
> supposed to read from two. When I run it on our YARN cluster with a single
> container, it works correctly. When I tried it with 5 containers, it gets
> hung after consuming the bootstrap topic. I ran it with the grid script on
> my laptop (Mac OS X) with yarn.container.count=2 but it only spawns one
> container and still hangs after bootstrap.
>
> Debug logs are here: http://pastebin.com/af3KPvju
>
> I looked at JMX metrics and see:
> - Task Metrics - no value for kafka offset of non-bootstrapped stream
> - SystemConsumerMetrics
>   - choose null keeps incrementing
>   - ssps-needed-by-chooser 1
>   - unprocessed-messages 62k
> - Bootstrapping Chooser
>   - lagging partitions 4
>   - lagging-batch-streams - 4
>   - batch-resets - 0
>
> Has anyone seen this or can offer ideas of how to better debug it?
>
> I'm using Samza 0.9.0 and YARN 2.4.0.
>
> Thanks!
>
> Roger
Samza hung after bootstrapping
I need some help. I have a job which bootstraps one stream and then is supposed to read from two. When I run it on our YARN cluster with a single container, it works correctly. When I tried it with 5 containers, it hung after consuming the bootstrap topic. I ran it with the grid script on my laptop (Mac OS X) with yarn.container.count=2 but it only spawns one container and still hangs after bootstrap.

Debug logs are here: http://pastebin.com/af3KPvju

I looked at JMX metrics and see:
- Task Metrics - no value for kafka offset of non-bootstrapped stream
- SystemConsumerMetrics
  - choose null keeps incrementing
  - ssps-needed-by-chooser 1
  - unprocessed-messages 62k
- Bootstrapping Chooser
  - lagging partitions 4
  - lagging-batch-streams - 4
  - batch-resets - 0

Has anyone seen this or can offer ideas of how to better debug it?

I'm using Samza 0.9.0 and YARN 2.4.0.

Thanks!

Roger
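For readers who want to capture JMX readings like the ones listed above as text rather than through a console, a generic dump helper is one option. This is a minimal sketch using only the standard `javax.management` API; `MBeanDump` is a hypothetical helper name, and the ObjectName pattern for the Samza metrics is not known from this thread, so it has to be supplied by whoever runs it.

```java
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper, not part of Samza: dumps readable MBean attributes.
class MBeanDump {
    // Returns "objectName#attribute" -> value for every readable attribute
    // of every MBean whose name matches the given ObjectName pattern.
    static Map<String, Object> dump(MBeanServerConnection server, String pattern) {
        Map<String, Object> values = new TreeMap<>();
        try {
            for (ObjectName name : server.queryNames(new ObjectName(pattern), null)) {
                for (MBeanAttributeInfo attr : server.getMBeanInfo(name).getAttributes()) {
                    if (!attr.isReadable()) {
                        continue;
                    }
                    String key = name + "#" + attr.getName();
                    try {
                        values.put(key, server.getAttribute(name, attr.getName()));
                    } catch (Exception e) {
                        // Some attributes throw when read; record that instead of aborting.
                        values.put(key, "unreadable: " + e);
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("JMX query failed for " + pattern, e);
        }
        return values;
    }
}
```

Called with `ManagementFactory.getPlatformMBeanServer()` and the pattern `"java.lang:type=Runtime"`, it lists the local JVM's runtime attributes. To read a running container, the `MBeanServerConnection` would instead come from a remote JMX connection to that process, and the pattern would be whatever domain the container's metrics are registered under.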