Hi Roger,
"but it only spawns one container and still hangs after bootstrap"
-- This is probably because your local machine does not have enough
resources for the second container. I checked your log file, and each
container is about 4GB.
"When I run it on our YARN cluster with a single container, it works
correctly. When I tried it with 5 containers, it gets hung after consuming
the bootstrap topic."
-- Have you figured it out? I have looked at your log and also the
code. My suspicion is that a null envelope is somehow blocking the
process. If you can paste the trace-level log, it will be more helpful,
because many of the log statements in the chooser are at trace level.
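In case it helps, here is a minimal sketch of how trace logging for the chooser could be turned on. It assumes the job is packaged with a log4j 1.x log4j.xml (as in hello-samza) and that the chooser classes live under org.apache.samza.system.chooser; adjust it if your setup differs.

```xml
<!-- Assumed addition to the job's log4j.xml, placed before the <root> element. -->
<!-- Raises only the chooser package to TRACE so the rest of the log stays readable. -->
<logger name="org.apache.samza.system.chooser">
  <level value="TRACE" />
</logger>
```

If an appender in that file sets a Threshold above TRACE, it would also need to be lowered for these messages to show up.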
Thanks,
Fang, Yan
[email protected]
On Thu, Jun 18, 2015 at 5:20 PM, Roger Hoover <[email protected]>
wrote:
> I need some help. I have a job which bootstraps one stream and then is
> supposed to read from two. When I run it on our YARN cluster with a single
> container, it works correctly. When I tried it with 5 containers, it gets
> hung after consuming the bootstrap topic. I ran it with the grid script on
> my laptop (Mac OS X) with yarn.container.count=2 but it only spawns one
> container and still hangs after bootstrap.
>
> Debug logs are here: http://pastebin.com/af3KPvju
>
> I looked at JMX metrics and see:
> - Task Metrics - no value for kafka offset of non-bootstrapped stream
> - SystemConsumerMetrics
> - choose null keeps incrementing
> - ssps-needed-by-chooser 1
> - unprocessed-messages 62k
> - Bootstrapping Chooser
> - lagging partitions 4
> - lagging-batch-streams - 4
> - batch-resets - 0
>
> Has anyone seen this or can offer ideas of how to better debug it?
>
> I'm using Samza 0.9.0 and YARN 2.4.0.
>
> Thanks!
>
> Roger
>