I need some help. I have a job which bootstraps one stream and then is supposed to read from two. When I run it on our YARN cluster with a single container, it works correctly. When I tried it with 5 containers, it gets hung after consuming the bootstrap topic. I ran it with the grid script on my laptop (Mac OS X) with yarn.container.count=2 but it only spawns one container and still hangs after bootstrap.
Debug logs are here: http://pastebin.com/af3KPvju I looked at JMX metrics and see: - Task Metrics - no value for kafka offset of non-bootstrapped stream - SystemConsumerMetrics - choose null keeps incrementing - ssps-needed-by-chooser 1 - unprocessed-messages 62k - Bootstrapping Chooser - lagging partitions 4 - laggin-batch-streams - 4 - batch-resets - 0 Has anyone seen this or can offer ideas of how to better debug it? I'm using Samza 0.9.0 and YARN 2.4.0. Thanks! Roger