Hello,
I have had problems running my Samza application on the cluster. The
application starts fine and so the main event loop. As soon I start to send
messages to Kafka, Samza doesn’t start the Kafka system consumer (there are no
logs that state that).
The CPU usage for all containers is about 100% even if I stop producers. It is
like the container is stuck and can’t start the consumer. However Samza can set
the offsets for different partitions. For example in a container I see:
o.a.samza.system.kafka.GetOffset - Able to successfully read from offset 0 for
topic and partition [test_topic,40]. Using it to instantiate consumer.
o.a.samza.system.kafka.BrokerProxy - Starting BrokerProxy for
node1.cluster.com:9092
o.a.samza.system.kafka.GetOffset - Validating offset 0 for topic and partition
[test_topic,19]
o.a.samza.system.kafka.GetOffset - Able to successfully read from offset 0 for
topic and partition [test_topic,19]. Using it to instantiate consumer.
o.a.samza.container.SamzaContainer - Entering run loop.
Here is the configuration I use:
{
yarn.container.count=24,
systems.kafka.samza.key.serde=int,
systems.kafka.consumer.zookeeper.connect=localhost:2181/,
serializers.registry.int.class=org.apache.samza.serializers.StringSerdeFactory,
systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory,
task.drop.deserialization.errors=true,
yarn.container.memory.mb=1024,
task.inputs=kafka.test_topic,
job.factory.class=org.apache.samza.job.yarn.YarnJobFactory,
yarn.package.path=hdfs://node1.cluster.com:8020/my-app-0.0.1-dist.tar.gz,
task.class=com.company.test.Task,
systems.kafka.samza.msg.serde=json,
job.name=test,
serializers.registry.json.class=org.apache.samza.serializers.JsonSerdeFactory,
systems.kafka.producer.bootstrap.servers=node1.cluster.com:9092,node2.cluster.com:9092,node3.cluster.com:9092,
}
The application was working before a server crash. I tried to clean all
Zookeeper data and restart everything. Do you have any idea why the consumer
doesn’t work?
Regards
Davide