I'm not sure about the Kafka side. On the Samza side, based on your description ("does not exit nor does it make any progress"), I think the code is stuck in producer.close <https://github.com/apache/samza/blob/master/samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemProducer.scala#L143>. Otherwise, it would throw a SamzaException and the job would exit. Maybe some Kafka experts on this mailing list or on the Kafka mailing list can help.
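One way to test the hypothesis that the container is blocked inside producer.close() is to run the close on a separate thread and bound how long the caller waits for it. This is a minimal Java sketch, not Samza code. BoundedClose, closeWithTimeout and CloseOutcome are hypothetical names, and the sketch assumes the producer can be wrapped as an AutoCloseable (for example `() -> producer.close()`). It does not fix a stuck close. It only turns an indefinite hang into an outcome the caller can log before deciding to exit.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not part of Samza or Kafka): runs a blocking close()
// on a separate daemon thread and reports whether it finished in time.
final class BoundedClose {
    enum CloseOutcome { COMPLETED, FAILED, TIMED_OUT }

    private BoundedClose() {}

    static CloseOutcome closeWithTimeout(AutoCloseable resource, long timeout, TimeUnit unit) {
        // Daemon thread, so a close() that never returns cannot by itself keep the JVM alive.
        ExecutorService executor = Executors.newSingleThreadExecutor(task -> {
            Thread t = new Thread(task, "bounded-close");
            t.setDaemon(true);
            return t;
        });
        try {
            Future<Object> pending = executor.submit(() -> {
                resource.close();
                return null;
            });
            try {
                pending.get(timeout, unit);
                return CloseOutcome.COMPLETED;
            } catch (TimeoutException e) {
                // Best effort: interrupts the worker, but a close() that ignores
                // interrupts stays blocked on its (daemon) thread.
                pending.cancel(true);
                return CloseOutcome.TIMED_OUT;
            } catch (ExecutionException e) {
                // close() finished by throwing; e.getCause() holds the original exception.
                return CloseOutcome.FAILED;
            } catch (InterruptedException e) {
                // The caller was interrupted before close() was seen to finish.
                Thread.currentThread().interrupt();
                return CloseOutcome.TIMED_OUT;
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Note that cancel(true) only interrupts the worker thread. If the underlying close() ignores interrupts, that thread stays blocked, which is why it is created as a daemon thread.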
Fang, Yan
yanfang...@gmail.com

On Tue, Apr 28, 2015 at 5:35 PM, Roger Hoover <roger.hoo...@gmail.com> wrote:

> At error level logging, this was the only entry in the Samza log:
>
> 2015-04-28 14:28:25 KafkaSystemProducer [ERROR] task[Partition 2]
> ssp[kafka,svc.call.w_deploy.c7tH4YaiTQyBEwAAhQzRXw,2] offset[9129395]
> Unable to send message from TaskName-Partition 1 to system kafka
>
> Here is the log from the Kafka broker that was shut down.
>
> http://pastebin.com/afgmLyNF
>
> Thanks,
>
> Roger
>
>
> On Tue, Apr 28, 2015 at 3:49 PM, Yi Pan <nickpa...@gmail.com> wrote:
>
> > Roger, could you paste the full log from the Samza container? If you can
> > figure out which Kafka broker the message was sent to, it would be helpful
> > if we got the log from the broker as well.
> >
> > On Tue, Apr 28, 2015 at 3:31 PM, Roger Hoover <roger.hoo...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I need some help figuring out what's going on.
> > >
> > > I'm running Kafka 0.8.2.1 and Samza 0.9.0 on YARN. All the topics have
> > > a replication factor of 2.
> > >
> > > I'm bouncing the Kafka broker using SIGTERM (with
> > > controlled.shutdown.enable=true). I see the Samza job log this message
> > > and then hang (does not exit nor does it make any progress).
> > >
> > > 2015-04-28 14:28:25 KafkaSystemProducer [ERROR] task[Partition 2]
> > > ssp[kafka,my-topic,2] offset[9129395] Unable to send message from
> > > TaskName-Partition 1 to system kafka
> > >
> > > The Kafka consumer (Druid Real-Time node) on the other side then barfs
> > > on the message:
> > >
> > > Exception in thread "chief-svc-perf" kafka.message.InvalidMessageException:
> > > Message is corrupt (stored crc = 1792882425, computed crc = 3898271689)
> > >     at kafka.message.Message.ensureValid(Message.scala:166)
> > >     at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:101)
> > >     at kafka.consumer.ConsumerIterator.makeNext(ConsumerIterator.scala:33)
> > >     at kafka.utils.IteratorTemplate.maybeComputeNext(IteratorTemplate.scala:66)
> > >     at kafka.utils.IteratorTemplate.hasNext(IteratorTemplate.scala:58)
> > >     at io.druid.firehose.kafka.KafkaEightFirehoseFactory$1.hasMore(KafkaEightFirehoseFactory.java:106)
> > >     at io.druid.segment.realtime.RealtimeManager$FireChief.run(RealtimeManager.java:234)
> > >
> > > My questions are:
> > > 1) What is the right way to bounce a Kafka broker?
> > > 2) Is this a bug in Samza that the job hangs after producer request fails?
> > > Has anyone seen this?
> > >
> > > Thanks,
> > >
> > > Roger
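The consumer-side error comes from a checksum comparison. In the Kafka 0.8 message format each message starts with a CRC32 covering the rest of the message bytes, and the consumer recomputes it when reading (here in Message.ensureValid). This is a simplified Java sketch of that check. It assumes a layout of a 4-byte big-endian CRC followed by the bytes it covers. The real format has further fields after the CRC (magic byte, attributes, key, value), which this sketch lumps into one payload. MessageCrc, seal, computeCrc, storedCrc and isValid are hypothetical names, not Kafka APIs.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Simplified model of a Kafka 0.8-style message: a 4-byte big-endian CRC32
// followed by the bytes it covers. Hypothetical helper, not a Kafka API.
final class MessageCrc {
    static final int CRC_LENGTH = 4;

    private MessageCrc() {}

    // Builds a message: reserve 4 bytes, copy the covered bytes after them,
    // then write the CRC32 of those bytes into the reserved slot.
    static byte[] seal(byte[] covered) {
        byte[] message = new byte[CRC_LENGTH + covered.length];
        System.arraycopy(covered, 0, message, CRC_LENGTH, covered.length);
        ByteBuffer.wrap(message).putInt((int) computeCrc(message));
        return message;
    }

    // CRC32 over everything after the CRC field, as an unsigned 32-bit value.
    static long computeCrc(byte[] message) {
        CRC32 crc = new CRC32();
        crc.update(message, CRC_LENGTH, message.length - CRC_LENGTH);
        return crc.getValue();
    }

    // The checksum written by the producer, read back as an unsigned value.
    static long storedCrc(byte[] message) {
        return ByteBuffer.wrap(message, 0, CRC_LENGTH).getInt() & 0xFFFFFFFFL;
    }

    // Same idea as the consumer-side check: a mismatch means the bytes changed.
    static boolean isValid(byte[] message) {
        return message.length >= CRC_LENGTH && storedCrc(message) == computeCrc(message);
    }
}
```

The stored value is read as an unsigned 32-bit integer. This is consistent with the computed crc = 3898271689 in the stack trace, which is larger than 2^31 and so could not be a signed 32-bit value.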