Re: Samza hung after bootstrapping

2015-06-21 Thread Roger Hoover
Thanks, Yan.  I'll give it a try.

Sent from my iPhone

> On Jun 21, 2015, at 10:02 PM, Yan Fang  wrote:
> 
> Hi Roger,
> 
> I will try to look at the issue tomorrow if my time allows.
> 
> First thing first:
> 
> The build has some unexpected results. A quick fix:
> 
> 1. apply https://issues.apache.org/jira/browse/SAMZA-712
> 2. add
> 
> sourceSets.main.scala.srcDir "src/main/java" sourceSets.main.java.srcDirs =
> []
> 
> at line 126 of build.gradle.
> 
> Sorry for the inconvenience.
> 
> Thanks,
> 
> Fang, Yan
> yanfang...@gmail.com
> 
> On Sun, Jun 21, 2015 at 3:55 PM, Roger Hoover 
> wrote:
> 
>> Was looking through the code a little and it looks like the
>> BootstrappingChooser could use the list of SSPs passed into it's register()
>> method to figure out which partitions it need to monitor.
>> 
>> I wanted to try to build Samza to play around with it but I'm getting error
>> trying to build off of both the 0.9.0 and 0.9.1 branches.
>> 
>> thedude:samza (0.9.1) $ ./gradlew clean build
>> 
>> To honour the JVM settings for this build a new JVM will be forked. Please
>> consider using the daemon:
>> http://gradle.org/docs/2.0/userguide/gradle_daemon.html.
>> 
>> :clean
>> 
>> :samza-api:clean
>> 
>> :samza-core_2.10:clean
>> 
>> :samza-kafka_2.10:clean UP-TO-DATE
>> 
>> :samza-kv-inmemory_2.10:clean UP-TO-DATE
>> 
>> :samza-kv-rocksdb_2.10:clean UP-TO-DATE
>> 
>> :samza-kv_2.10:clean UP-TO-DATE
>> 
>> :samza-log4j:clean UP-TO-DATE
>> 
>> :samza-shell:clean UP-TO-DATE
>> 
>> :samza-test_2.10:clean UP-TO-DATE
>> 
>> :samza-yarn_2.10:clean UP-TO-DATE
>> 
>> :assemble UP-TO-DATE
>> 
>> :rat
>> 
>> Rat report: build/rat/rat-report.html
>> 
>> :check
>> 
>> :build
>> 
>> :samza-api:compileJava
>> 
>> :samza-api:processResources UP-TO-DATE
>> 
>> :samza-api:classes
>> 
>> :samza-api:jar
>> 
>> :samza-api:javadoc
>> 
>> 
>> /Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49:
>> warning: no @param for ssp
>> 
>>  void setStartingOffset(SystemStreamPartition ssp, String offset);
>> 
>>   ^
>> 
>> 
>> /Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49:
>> warning: no @param for offset
>> 
>>  void setStartingOffset(SystemStreamPartition ssp, String offset);
>> 
>>   ^
>> 
>> 2 warnings
>> 
>> :samza-api:javadocJar
>> 
>> :samza-api:sourcesJar
>> 
>> :samza-api:signArchives SKIPPED
>> 
>> :samza-api:assemble
>> 
>> :samza-api:compileTestJava
>> 
>> :samza-api:processTestResources UP-TO-DATE
>> 
>> :samza-api:testClasses
>> 
>> :samza-api:test
>> 
>> :samza-api:check
>> 
>> :samza-api:build
>> 
>> :samza-core_2.10:compileJava
>> 
>> :samza-core_2.10:compileScala
>> 
>> [ant:scalac]
>> 
>> /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:43:
>> error: object SamzaObjectMapper is not a member of package
>> org.apache.samza.serializers.model
>> 
>> [ant:scalac] import org.apache.samza.serializers.model.SamzaObjectMapper
>> 
>> [ant:scalac]^
>> 
>> [ant:scalac]
>> 
>> /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:40:
>> error: object TaskModel is not a member of package
>> org.apache.samza.job.model
>> 
>> [ant:scalac] import org.apache.samza.job.model.TaskModel
>> 
>> [ant:scalac]^
>> 
>> ...
>> 
>> 
>> I've got JDK 8 installed.  Wondering that makes a difference or not.  I'd
>> appreciate any help.
>> 
>> Thanks,
>> 
>> Roger
>> 
>> 
>> 
>> On Sun, Jun 21, 2015 at 1:02 PM, Roger Hoover 
>> wrote:
>> 
>>> I think I see what's happening.
>>> 
>>> When there are 8 tasks and I set yarn.container.count=8, then each
>>> container is responsible for a single task.  However, the
>>> systemStreamLagCounts map (
>> https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L77
>> )
>>> and laggingSystemStreamPartitions (
>> https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L83
>> )
>>> are configured to track all partitions for the bootstrap topic rather
>> than
>>> just the one partition assigned to this task.
>>> 
>>> Later in the log, we see that the task/container completed bootstrap for
>>> it's own partition.
>>> 
>>> 2015-06-21 12:28:55 org.apache.samza.system.chooser.BootstrappingChooser
>>> [DEBUG] Bootstrap stream partition is fully caught up:
>>> SystemStreamPartition [kafka, deploy.svc.tlrnsZOYQA6wrwAA4FLqZA, 0]
>>> 
>>> but the Bootstrapping Chooser still thinks that the remaining partitions
>>> (assigned to other tasks in other containers) need to be completed.  JMX
>> at
>>> this point shows 7 lagging partitions of the 8 original partition count.
>>> 
>>> I'm wondering why no one has run into this.  Doesn't LinkedIn use
>>> partitioned bootstrapped topics?
>>> 
>>> Thanks,
>>> 
>>> Roger
>>> 
>>> On Sun, Jun 21, 2015 at 12:22 PM, Roger Hoover 
>>> wrote:
>>

Re: Samza hung after bootstrapping

2015-06-21 Thread Yan Fang
Hi Roger,

I will try to look at the issue tomorrow if my time allows.

First thing first:

The build has some unexpected results. A quick fix:

1. apply https://issues.apache.org/jira/browse/SAMZA-712
2. add

sourceSets.main.scala.srcDir "src/main/java" sourceSets.main.java.srcDirs =
[]

at line 126 of build.gradle.

Sorry for the inconvenience.

Thanks,

Fang, Yan
yanfang...@gmail.com

On Sun, Jun 21, 2015 at 3:55 PM, Roger Hoover 
wrote:

> Was looking through the code a little and it looks like the
> BootstrappingChooser could use the list of SSPs passed into it's register()
> method to figure out which partitions it need to monitor.
>
> I wanted to try to build Samza to play around with it but I'm getting error
> trying to build off of both the 0.9.0 and 0.9.1 branches.
>
> thedude:samza (0.9.1) $ ./gradlew clean build
>
> To honour the JVM settings for this build a new JVM will be forked. Please
> consider using the daemon:
> http://gradle.org/docs/2.0/userguide/gradle_daemon.html.
>
> :clean
>
> :samza-api:clean
>
> :samza-core_2.10:clean
>
> :samza-kafka_2.10:clean UP-TO-DATE
>
> :samza-kv-inmemory_2.10:clean UP-TO-DATE
>
> :samza-kv-rocksdb_2.10:clean UP-TO-DATE
>
> :samza-kv_2.10:clean UP-TO-DATE
>
> :samza-log4j:clean UP-TO-DATE
>
> :samza-shell:clean UP-TO-DATE
>
> :samza-test_2.10:clean UP-TO-DATE
>
> :samza-yarn_2.10:clean UP-TO-DATE
>
> :assemble UP-TO-DATE
>
> :rat
>
> Rat report: build/rat/rat-report.html
>
> :check
>
> :build
>
> :samza-api:compileJava
>
> :samza-api:processResources UP-TO-DATE
>
> :samza-api:classes
>
> :samza-api:jar
>
> :samza-api:javadoc
>
>
> /Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49:
> warning: no @param for ssp
>
>   void setStartingOffset(SystemStreamPartition ssp, String offset);
>
>^
>
>
> /Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49:
> warning: no @param for offset
>
>   void setStartingOffset(SystemStreamPartition ssp, String offset);
>
>^
>
> 2 warnings
>
> :samza-api:javadocJar
>
> :samza-api:sourcesJar
>
> :samza-api:signArchives SKIPPED
>
> :samza-api:assemble
>
> :samza-api:compileTestJava
>
> :samza-api:processTestResources UP-TO-DATE
>
> :samza-api:testClasses
>
> :samza-api:test
>
> :samza-api:check
>
> :samza-api:build
>
> :samza-core_2.10:compileJava
>
> :samza-core_2.10:compileScala
>
> [ant:scalac]
>
> /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:43:
> error: object SamzaObjectMapper is not a member of package
> org.apache.samza.serializers.model
>
> [ant:scalac] import org.apache.samza.serializers.model.SamzaObjectMapper
>
> [ant:scalac]^
>
> [ant:scalac]
>
> /Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:40:
> error: object TaskModel is not a member of package
> org.apache.samza.job.model
>
> [ant:scalac] import org.apache.samza.job.model.TaskModel
>
> [ant:scalac]^
>
> ...
>
>
> I've got JDK 8 installed.  Wondering that makes a difference or not.  I'd
> appreciate any help.
>
> Thanks,
>
> Roger
>
>
>
> On Sun, Jun 21, 2015 at 1:02 PM, Roger Hoover 
> wrote:
>
> > I think I see what's happening.
> >
> > When there are 8 tasks and I set yarn.container.count=8, then each
> > container is responsible for a single task.  However, the
> > systemStreamLagCounts map (
> >
> https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L77
> )
> > and laggingSystemStreamPartitions (
> >
> https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L83
> )
> > are configured to track all partitions for the bootstrap topic rather
> than
> > just the one partition assigned to this task.
> >
> > Later in the log, we see that the task/container completed bootstrap for
> > it's own partition.
> >
> > 2015-06-21 12:28:55 org.apache.samza.system.chooser.BootstrappingChooser
> > [DEBUG] Bootstrap stream partition is fully caught up:
> > SystemStreamPartition [kafka, deploy.svc.tlrnsZOYQA6wrwAA4FLqZA, 0]
> >
> > but the Bootstrapping Chooser still thinks that the remaining partitions
> > (assigned to other tasks in other containers) need to be completed.  JMX
> at
> > this point shows 7 lagging partitions of the 8 original partition count.
> >
> > I'm wondering why no one has run into this.  Doesn't LinkedIn use
> > partitioned bootstrapped topics?
> >
> > Thanks,
> >
> > Roger
> >
> > On Sun, Jun 21, 2015 at 12:22 PM, Roger Hoover 
> > wrote:
> >
> >> Hi Yan,
> >>
> >> I've uploaded a file with TRACE level logging here:
> >> http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz
> >>
> >> I really appreciate your help as this is a critical issue for me.
> >>
> >> Thanks,
> >>
> >> Roger
> >>
> >> On Fri, Jun 19, 2015 at 12:05 PM, Yan Fang 
> wrote:
> >>
> >>> Hi Roger,
> >>>
> >>> " but it onl

Re: Samza hung after bootstrapping

2015-06-21 Thread Roger Hoover
Was looking through the code a little and it looks like the
BootstrappingChooser could use the list of SSPs passed into it's register()
method to figure out which partitions it need to monitor.

I wanted to try to build Samza to play around with it but I'm getting error
trying to build off of both the 0.9.0 and 0.9.1 branches.

thedude:samza (0.9.1) $ ./gradlew clean build

To honour the JVM settings for this build a new JVM will be forked. Please
consider using the daemon:
http://gradle.org/docs/2.0/userguide/gradle_daemon.html.

:clean

:samza-api:clean

:samza-core_2.10:clean

:samza-kafka_2.10:clean UP-TO-DATE

:samza-kv-inmemory_2.10:clean UP-TO-DATE

:samza-kv-rocksdb_2.10:clean UP-TO-DATE

:samza-kv_2.10:clean UP-TO-DATE

:samza-log4j:clean UP-TO-DATE

:samza-shell:clean UP-TO-DATE

:samza-test_2.10:clean UP-TO-DATE

:samza-yarn_2.10:clean UP-TO-DATE

:assemble UP-TO-DATE

:rat

Rat report: build/rat/rat-report.html

:check

:build

:samza-api:compileJava

:samza-api:processResources UP-TO-DATE

:samza-api:classes

:samza-api:jar

:samza-api:javadoc

/Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49:
warning: no @param for ssp

  void setStartingOffset(SystemStreamPartition ssp, String offset);

   ^

/Users/rhoover/Work/samza/samza-api/src/main/java/org/apache/samza/task/TaskContext.java:49:
warning: no @param for offset

  void setStartingOffset(SystemStreamPartition ssp, String offset);

   ^

2 warnings

:samza-api:javadocJar

:samza-api:sourcesJar

:samza-api:signArchives SKIPPED

:samza-api:assemble

:samza-api:compileTestJava

:samza-api:processTestResources UP-TO-DATE

:samza-api:testClasses

:samza-api:test

:samza-api:check

:samza-api:build

:samza-core_2.10:compileJava

:samza-core_2.10:compileScala

[ant:scalac]
/Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:43:
error: object SamzaObjectMapper is not a member of package
org.apache.samza.serializers.model

[ant:scalac] import org.apache.samza.serializers.model.SamzaObjectMapper

[ant:scalac]^

[ant:scalac]
/Users/rhoover/Work/samza/samza-core/src/main/scala/org/apache/samza/coordinator/JobCoordinator.scala:40:
error: object TaskModel is not a member of package
org.apache.samza.job.model

[ant:scalac] import org.apache.samza.job.model.TaskModel

[ant:scalac]^

...


I've got JDK 8 installed.  Wondering that makes a difference or not.  I'd
appreciate any help.

Thanks,

Roger



On Sun, Jun 21, 2015 at 1:02 PM, Roger Hoover 
wrote:

> I think I see what's happening.
>
> When there are 8 tasks and I set yarn.container.count=8, then each
> container is responsible for a single task.  However, the
> systemStreamLagCounts map (
> https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L77)
> and laggingSystemStreamPartitions (
> https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L83)
> are configured to track all partitions for the bootstrap topic rather than
> just the one partition assigned to this task.
>
> Later in the log, we see that the task/container completed bootstrap for
> it's own partition.
>
> 2015-06-21 12:28:55 org.apache.samza.system.chooser.BootstrappingChooser
> [DEBUG] Bootstrap stream partition is fully caught up:
> SystemStreamPartition [kafka, deploy.svc.tlrnsZOYQA6wrwAA4FLqZA, 0]
>
> but the Bootstrapping Chooser still thinks that the remaining partitions
> (assigned to other tasks in other containers) need to be completed.  JMX at
> this point shows 7 lagging partitions of the 8 original partition count.
>
> I'm wondering why no one has run into this.  Doesn't LinkedIn use
> partitioned bootstrapped topics?
>
> Thanks,
>
> Roger
>
> On Sun, Jun 21, 2015 at 12:22 PM, Roger Hoover 
> wrote:
>
>> Hi Yan,
>>
>> I've uploaded a file with TRACE level logging here:
>> http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz
>>
>> I really appreciate your help as this is a critical issue for me.
>>
>> Thanks,
>>
>> Roger
>>
>> On Fri, Jun 19, 2015 at 12:05 PM, Yan Fang  wrote:
>>
>>> Hi Roger,
>>>
>>> " but it only spawns one container and still hangs after bootstrap"
>>> -- this probably is due to your local machine does not have enough
>>> resource for the second container. Because I checked your log file, each
>>> container is about 4GB.
>>>
>>> "When I run it on our YARN cluster with a single container, it works
>>> correctly.  When I tried it with 5 containers, it gets hung after
>>> consuming
>>> the bootstrap topic."
>>>-- Have you figure it out? I have a looked at your log and also the
>>> code. My suspect is that, there is a null enveloper somehow blocking the
>>> process. If you can paste the trace level log, it will be more helpful
>>> because many logs in chooser are trace level.
>>>
>>> Thanks,
>>>
>>> Fang, Yan
>>> yanfang...@gmail.com
>>>
>

Re: Samza hung after bootstrapping

2015-06-21 Thread Roger Hoover
I think I see what's happening.

When there are 8 tasks and I set yarn.container.count=8, then each
container is responsible for a single task.  However, the
systemStreamLagCounts map (
https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L77)
and laggingSystemStreamPartitions (
https://github.com/apache/samza/blob/0.9.0/samza-core/src/main/scala/org/apache/samza/system/chooser/BootstrappingChooser.scala#L83)
are configured to track all partitions for the bootstrap topic rather than
just the one partition assigned to this task.

Later in the log, we see that the task/container completed bootstrap for
it's own partition.

2015-06-21 12:28:55 org.apache.samza.system.chooser.BootstrappingChooser
[DEBUG] Bootstrap stream partition is fully caught up:
SystemStreamPartition [kafka, deploy.svc.tlrnsZOYQA6wrwAA4FLqZA, 0]

but the Bootstrapping Chooser still thinks that the remaining partitions
(assigned to other tasks in other containers) need to be completed.  JMX at
this point shows 7 lagging partitions of the 8 original partition count.

I'm wondering why no one has run into this.  Doesn't LinkedIn use
partitioned bootstrapped topics?

Thanks,

Roger

On Sun, Jun 21, 2015 at 12:22 PM, Roger Hoover 
wrote:

> Hi Yan,
>
> I've uploaded a file with TRACE level logging here:
> http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz
>
> I really appreciate your help as this is a critical issue for me.
>
> Thanks,
>
> Roger
>
> On Fri, Jun 19, 2015 at 12:05 PM, Yan Fang  wrote:
>
>> Hi Roger,
>>
>> " but it only spawns one container and still hangs after bootstrap"
>> -- this probably is due to your local machine does not have enough
>> resource for the second container. Because I checked your log file, each
>> container is about 4GB.
>>
>> "When I run it on our YARN cluster with a single container, it works
>> correctly.  When I tried it with 5 containers, it gets hung after
>> consuming
>> the bootstrap topic."
>>-- Have you figure it out? I have a looked at your log and also the
>> code. My suspect is that, there is a null enveloper somehow blocking the
>> process. If you can paste the trace level log, it will be more helpful
>> because many logs in chooser are trace level.
>>
>> Thanks,
>>
>> Fang, Yan
>> yanfang...@gmail.com
>>
>> On Thu, Jun 18, 2015 at 5:20 PM, Roger Hoover 
>> wrote:
>>
>> > I need some help.  I have a job which bootstraps one stream and then is
>> > supposed to read from two.  When I run it on our YARN cluster with a
>> single
>> > container, it works correctly.  When I tried it with 5 containers, it
>> gets
>> > hung after consuming the bootstrap topic.  I ran it with the grid
>> script on
>> > my laptop (Mac OS X) with yarn.container.count=2 but it only spawns one
>> > container and still hangs after bootstrap.
>> >
>> > Debug logs are here: http://pastebin.com/af3KPvju
>> >
>> > I looked at JMX metrics and see:
>> > - Task Metrics - no value for kafka offset of non-bootstrapped stream
>> > -  SystemConsumerMetrics
>> > - choose null keeps incrementing
>> >  - ssps-needed-by-chooser 1
>> >   - unprocessed-messages 62k
>> > - Bootstrapping Chooser
>> >   - lagging partitions 4
>> >   - laggin-batch-streams - 4
>> >   - batch-resets - 0
>> >
>> > Has anyone seen this or can offer ideas of how to better debug it?
>> >
>> > I'm using Samza 0.9.0 and YARN 2.4.0.
>> >
>> > Thanks!
>> >
>> > Roger
>> >
>>
>
>


Re: Samza hung after bootstrapping

2015-06-21 Thread Roger Hoover
Hi Yan,

I've uploaded a file with TRACE level logging here:
http://filebin.ca/261yhsTZcZQZ/samza-container-0.log.gz

I really appreciate your help as this is a critical issue for me.

Thanks,

Roger

On Fri, Jun 19, 2015 at 12:05 PM, Yan Fang  wrote:

> Hi Roger,
>
> " but it only spawns one container and still hangs after bootstrap"
> -- this probably is due to your local machine does not have enough
> resource for the second container. Because I checked your log file, each
> container is about 4GB.
>
> "When I run it on our YARN cluster with a single container, it works
> correctly.  When I tried it with 5 containers, it gets hung after consuming
> the bootstrap topic."
>-- Have you figure it out? I have a looked at your log and also the
> code. My suspect is that, there is a null enveloper somehow blocking the
> process. If you can paste the trace level log, it will be more helpful
> because many logs in chooser are trace level.
>
> Thanks,
>
> Fang, Yan
> yanfang...@gmail.com
>
> On Thu, Jun 18, 2015 at 5:20 PM, Roger Hoover 
> wrote:
>
> > I need some help.  I have a job which bootstraps one stream and then is
> > supposed to read from two.  When I run it on our YARN cluster with a
> single
> > container, it works correctly.  When I tried it with 5 containers, it
> gets
> > hung after consuming the bootstrap topic.  I ran it with the grid script
> on
> > my laptop (Mac OS X) with yarn.container.count=2 but it only spawns one
> > container and still hangs after bootstrap.
> >
> > Debug logs are here: http://pastebin.com/af3KPvju
> >
> > I looked at JMX metrics and see:
> > - Task Metrics - no value for kafka offset of non-bootstrapped stream
> > -  SystemConsumerMetrics
> > - choose null keeps incrementing
> >  - ssps-needed-by-chooser 1
> >   - unprocessed-messages 62k
> > - Bootstrapping Chooser
> >   - lagging partitions 4
> >   - laggin-batch-streams - 4
> >   - batch-resets - 0
> >
> > Has anyone seen this or can offer ideas of how to better debug it?
> >
> > I'm using Samza 0.9.0 and YARN 2.4.0.
> >
> > Thanks!
> >
> > Roger
> >
>


Re: Samza hung after bootstrapping

2015-06-20 Thread Roger Hoover
Thank you, Yan.  I'll get a trace level log as soon as I can.

Sent from my iPhone

> On Jun 19, 2015, at 12:05 PM, Yan Fang  wrote:
> 
> Hi Roger,
> 
> " but it only spawns one container and still hangs after bootstrap"
>-- this probably is due to your local machine does not have enough
> resource for the second container. Because I checked your log file, each
> container is about 4GB.
> 
> "When I run it on our YARN cluster with a single container, it works
> correctly.  When I tried it with 5 containers, it gets hung after consuming
> the bootstrap topic."
>   -- Have you figure it out? I have a looked at your log and also the
> code. My suspect is that, there is a null enveloper somehow blocking the
> process. If you can paste the trace level log, it will be more helpful
> because many logs in chooser are trace level.
> 
> Thanks,
> 
> Fang, Yan
> yanfang...@gmail.com
> 
> On Thu, Jun 18, 2015 at 5:20 PM, Roger Hoover 
> wrote:
> 
>> I need some help.  I have a job which bootstraps one stream and then is
>> supposed to read from two.  When I run it on our YARN cluster with a single
>> container, it works correctly.  When I tried it with 5 containers, it gets
>> hung after consuming the bootstrap topic.  I ran it with the grid script on
>> my laptop (Mac OS X) with yarn.container.count=2 but it only spawns one
>> container and still hangs after bootstrap.
>> 
>> Debug logs are here: http://pastebin.com/af3KPvju
>> 
>> I looked at JMX metrics and see:
>> - Task Metrics - no value for kafka offset of non-bootstrapped stream
>> -  SystemConsumerMetrics
>>- choose null keeps incrementing
>> - ssps-needed-by-chooser 1
>>  - unprocessed-messages 62k
>> - Bootstrapping Chooser
>>  - lagging partitions 4
>>  - laggin-batch-streams - 4
>>  - batch-resets - 0
>> 
>> Has anyone seen this or can offer ideas of how to better debug it?
>> 
>> I'm using Samza 0.9.0 and YARN 2.4.0.
>> 
>> Thanks!
>> 
>> Roger
>> 


Re: Samza hung after bootstrapping

2015-06-19 Thread Yan Fang
Hi Roger,

" but it only spawns one container and still hangs after bootstrap"
-- this probably is due to your local machine does not have enough
resource for the second container. Because I checked your log file, each
container is about 4GB.

"When I run it on our YARN cluster with a single container, it works
correctly.  When I tried it with 5 containers, it gets hung after consuming
the bootstrap topic."
   -- Have you figure it out? I have a looked at your log and also the
code. My suspect is that, there is a null enveloper somehow blocking the
process. If you can paste the trace level log, it will be more helpful
because many logs in chooser are trace level.

Thanks,

Fang, Yan
yanfang...@gmail.com

On Thu, Jun 18, 2015 at 5:20 PM, Roger Hoover 
wrote:

> I need some help.  I have a job which bootstraps one stream and then is
> supposed to read from two.  When I run it on our YARN cluster with a single
> container, it works correctly.  When I tried it with 5 containers, it gets
> hung after consuming the bootstrap topic.  I ran it with the grid script on
> my laptop (Mac OS X) with yarn.container.count=2 but it only spawns one
> container and still hangs after bootstrap.
>
> Debug logs are here: http://pastebin.com/af3KPvju
>
> I looked at JMX metrics and see:
> - Task Metrics - no value for kafka offset of non-bootstrapped stream
> -  SystemConsumerMetrics
> - choose null keeps incrementing
>  - ssps-needed-by-chooser 1
>   - unprocessed-messages 62k
> - Bootstrapping Chooser
>   - lagging partitions 4
>   - laggin-batch-streams - 4
>   - batch-resets - 0
>
> Has anyone seen this or can offer ideas of how to better debug it?
>
> I'm using Samza 0.9.0 and YARN 2.4.0.
>
> Thanks!
>
> Roger
>


Samza hung after bootstrapping

2015-06-18 Thread Roger Hoover
I need some help.  I have a job which bootstraps one stream and then is
supposed to read from two.  When I run it on our YARN cluster with a single
container, it works correctly.  When I tried it with 5 containers, it gets
hung after consuming the bootstrap topic.  I ran it with the grid script on
my laptop (Mac OS X) with yarn.container.count=2 but it only spawns one
container and still hangs after bootstrap.

Debug logs are here: http://pastebin.com/af3KPvju

I looked at JMX metrics and see:
- Task Metrics - no value for kafka offset of non-bootstrapped stream
-  SystemConsumerMetrics
- choose null keeps incrementing
 - ssps-needed-by-chooser 1
  - unprocessed-messages 62k
- Bootstrapping Chooser
  - lagging partitions 4
  - laggin-batch-streams - 4
  - batch-resets - 0

Has anyone seen this or can offer ideas of how to better debug it?

I'm using Samza 0.9.0 and YARN 2.4.0.

Thanks!

Roger