Re: Message lost after consumer crash in kafka 0.9
Sorry, in fact the test code in the gist does not exactly reproduce the problem we're facing. I'm working on that.

2016-02-02 10:46 GMT+01:00 Han JU:

> Thanks Guozhang for the reply!
>
> If I understand your explanation correctly, the lost messages should then
> be the last ones. But in our use case it is not the last messages that get
> lost. It also does not explain why the behavior differs depending on the
> moment of the `kill -9` (before or after a commit): if a consumer app is
> killed before the first flush/commit, every message is received correctly.
>
> As for the lost messages: our real app flushes state and commits offsets
> regularly (say every 15 minutes). In my test, say I have 45 minutes of data,
> so there are 2 flush/commit points and 3 chunks of flushed data. If one
> consumer app process is killed with `kill -9` after the first flush/commit
> point and I let the remaining app run until the end, I lose messages only
> in the second chunk. The first and third chunks are handled perfectly.
>
> 2016-02-02 0:18 GMT+01:00 Guozhang Wang:
>
> > One thing to add is that by doing this you could possibly get duplicates
> > but not data loss, which obeys Kafka's at-least-once semantics.
> >
> > Guozhang
> >
> > On Mon, Feb 1, 2016 at 3:17 PM, Guozhang Wang wrote:
> >
> > > Hi Han,
> > >
> > > I looked at your test code and actually the error is in this line:
> > >
> > > https://gist.github.com/darkjh/437ac72cdd4b1c4ca2e7#file-kafkabug2-scala-L61
> > >
> > > where you call "commitSync" in the finally block, which will commit
> > > the messages that were returned to you from the poll() call.
> > >
> > > More specifically, suppose your poll() call returned a set of
> > > messages with offsets 0 to 100. From the consumer's point of view, once
> > > they are returned to the user they are considered "consumed", and hence
> > > if you call commitSync after that they will ALL be committed (i.e. the
> > > consumer will commit offset 100). But if you hit an exception / got a
> > > close signal while, say, processing the message with offset 50, and then
> > > call commitSync in the finally block, you will effectively lose
> > > messages 50 to 100.
> > >
> > > Hence, as a user of the consumer, one should only call "commit" if she
> > > is certain that all messages returned from "poll()" have been processed.
> > >
> > > Guozhang
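A minimal sketch of the point Guozhang makes in the quoted reply, written as a toy model in plain Python rather than against the Kafka client. The function name `lost_offsets` and its parameters are hypothetical. It models a single partition, one poll() batch covering offsets 0 to 100, and a failure while handling offset 50, then compares committing the consumer's position with committing only what was actually processed. In the Java client the second variant would presumably use the `commitSync` overload that takes explicit offsets; that mapping is an assumption, not something the thread shows.

```python
def lost_offsets(log_size, crash_at, commit_processed_only):
    """Replay one poll() batch that fails mid-way, then restart from the commit.

    Toy model: one partition holding offsets 0..log_size-1, one batch covering
    the whole log, and a failure while handling offset `crash_at`.
    Returns the offsets that are never processed.
    """
    processed = []
    position = 0          # offset after the last record handed out by poll()
    next_unprocessed = 0  # offset after the last record actually processed
    try:
        batch = range(0, log_size)
        position = batch.stop  # poll() returned, so the position is past the batch
        for offset in batch:
            if offset == crash_at:
                raise RuntimeError("simulated failure")
            processed.append(offset)
            next_unprocessed = offset + 1
    except RuntimeError:
        pass
    finally:
        # The commit in the finally block that the thread discusses.
        committed = next_unprocessed if commit_processed_only else position

    # A restarted consumer resumes from the committed offset.
    processed.extend(range(committed, log_size))
    return sorted(set(range(log_size)) - set(processed))


if __name__ == "__main__":
    print(lost_offsets(101, 50, commit_processed_only=False))
    print(lost_offsets(101, 50, commit_processed_only=True))
```

With the position committed, offsets 50 to 100 are skipped after the restart. With only the processed offset committed, nothing is skipped, at the cost of possible duplicates in a real system.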
Re: Message lost after consumer crash in kafka 0.9
Thanks Guozhang for the reply!

If I understand your explanation correctly, the lost messages should then be the last ones. But in our use case it is not the last messages that get lost. It also does not explain why the behavior differs depending on the moment of the `kill -9` (before or after a commit): if a consumer app is killed before the first flush/commit, every message is received correctly.

As for the lost messages: our real app flushes state and commits offsets regularly (say every 15 minutes). In my test, say I have 45 minutes of data, so there are 2 flush/commit points and 3 chunks of flushed data. If one consumer app process is killed with `kill -9` after the first flush/commit point and I let the remaining app run until the end, I lose messages only in the second chunk. The first and third chunks are handled perfectly.

--
*JU Han*
Software Engineer @ Teads.tv
+33 061960
Message lost after consumer crash in kafka 0.9
Hi,

One of our uses of Kafka requires tolerating arbitrary consumer crashes without losing or duplicating messages. So in our code we manually commit offsets after the consumer state has been successfully persisted.

While prototyping with Kafka 0.9's new consumer API, I found that in some cases Kafka fails to deliver part of the messages to the consumers even though the offsets are handled correctly.

I've made sure that this time everything is the latest on the 0.9.0 branch (d1ff6c7) for both broker and client code.

Test code snippet is here:
https://gist.github.com/darkjh/437ac72cdd4b1c4ca2e7

Test setup:
- 12 partitions
- 2 consumer app processes with 2 consumer threads each
- the producer produces exactly 1.2M messages in about 2 minutes (enough time for us to manually kill -9 a consumer)
- a consumer thread commits offsets after every 80k messages received (to simulate our regular offset commits)
- after all messages are consumed, each consumer thread writes a number to a file indicating how many messages it has received, so all the numbers should sum to exactly 1.2M if everything goes well

Test run:
- run the producer
- run the 2 consumer app processes at the same time
- wait for the first offset commit (first 80k messages received in each consumer thread)
- after the first offset commit, kill -9 one of the consumer apps
- let the other consumer app run until all messages are consumed
- check the files written by the remaining consumer threads

After that, checking the files shows that we did not receive 1.2M messages but roughly 1.04M. The lag on Kafka for this topic is 0.

If you check the logs of the consumer app at DEBUG level, you'll find that the offsets are handled correctly. 30s (the default timeout) after the kill -9 of one consumer app, the remaining consumer app correctly gets assigned all the partitions and starts right from the offsets that the crashed consumer had previously committed. So this makes the message loss quite mysterious to us.

Note that the moment of the kill -9 is important. If we kill -9 one consumer app *before* the first offset commit, everything goes well: all messages are received, nothing is lost. But when it is killed *after* the first offset commit, messages are lost.

I hope the code is clear enough to reproduce the problem. I'm available for any further details needed.

Thanks!

--
*JU Han*
Software Engineer @ Teads.tv
+33 061960
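A possible reading of these numbers, offered as an inference that nobody in the thread confirms: the two threads of the killed process had each received and committed 80k messages, so those messages are never redelivered, and because counts are only written at the end, the killed threads never write theirs. Under that assumption the surviving threads' files would sum to 1.2M minus 2 x 80k. The short arithmetic sketch below checks this against the reported figures; the function name and parameters are hypothetical.

```python
def expected_written_total(produced, commit_every, killed_threads, commits_before_kill):
    """Sum of the counts written by the surviving threads.

    Assumption (not confirmed in the thread): messages committed by the killed
    threads are not redelivered, and their counts are never written to a file.
    """
    committed_by_killed = commit_every * killed_threads * commits_before_kill
    return produced - committed_by_killed


if __name__ == "__main__":
    # Killed after the first commit of each of its 2 threads.
    print(expected_written_total(1_200_000, 80_000, 2, 1))
    # Killed before any commit.
    print(expected_written_total(1_200_000, 80_000, 2, 0))
```

The first case gives 1,040,000, in line with the "roughly 1.04M" reported, and the second gives the full 1.2M, in line with the observation that killing before the first commit loses nothing. This would point at how the test counts messages rather than at the broker, but it remains a guess.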
Re: Stuck consumer with new consumer API in 0.9
Hi Guozhang,

Sorry for that example. The outputs do not come from the same run; I just pasted them to illustrate the problem.
I'll try out what Jason suggests tomorrow and also retry the 0.9.0 branch.

2016-01-25 21:03 GMT+01:00 Rajiv Kurian:

> Thanks Jason. We are using an affected client I guess.
>
> Is there a 0.9.0 client available on maven? My search at
> http://mvnrepository.com/artifact/org.apache.kafka/kafka_2.10 only shows
> the 0.9.0.0 client which seems to have this issue.
>
> Thanks,
> Rajiv
>
> On Mon, Jan 25, 2016 at 11:56 AM, Jason Gustafson wrote:
>
> > Hey Rajiv, the bug was on the client. Here's a link to the JIRA:
> > https://issues.apache.org/jira/browse/KAFKA-2978.
> >
> > -Jason
> >
> > On Mon, Jan 25, 2016 at 11:42 AM, Rajiv Kurian wrote:
> >
> > > Hi Jason,
> > >
> > > Was this a server bug or a client bug?
> > >
> > > Thanks,
> > > Rajiv
> > >
> > > On Mon, Jan 25, 2016 at 11:23 AM, Jason Gustafson wrote:
> > >
> > > > Apologies for the late arrival to this thread. There was a bug in the
> > > > 0.9.0.0 release of Kafka which could cause the consumer to stop
> > > > fetching from a partition after a rebalance. If you're seeing this,
> > > > please check out the 0.9.0 branch of Kafka and see if you can
> > > > reproduce this problem. If you can, then it would be really helpful
> > > > if you file a JIRA with the steps to reproduce.
> > > >
> > > > From Han's initial example, it kind of looks like the problem might
> > > > be in the usage. The consumer lag as shown by the
> > > > kafka-consumer-groups script relies on the last committed position to
> > > > determine lag. To update progress, you need to commit offsets
> > > > regularly. In the gist, offsets are only committed on shutdown or
> > > > when a rebalance occurs. When the group is stable, no progress will
> > > > be seen because there are no commits to update the position.
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Mon, Jan 25, 2016 at 9:09 AM, Ismael Juma wrote:
> > > >
> > > > > Thanks!
> > > > >
> > > > > Ismael
Re: Stuck consumer with new consumer API in 0.9
Issue created: https://issues.apache.org/jira/browse/KAFKA-3146
Re: Stuck consumer with new consumer API in 0.9
Hi Bruno,

Can you tell me a little bit more about that? A seek() in `onPartitionsAssigned`?

Thanks.
Re: Stuck consumer with new consumer API in 0.9
Ok I'll create a JIRA issue on this.

Thanks!

2016-01-23 21:47 GMT+01:00 Bruno Rassaerts:

> +1 here
>
> As a workaround we seek to the current offset, which resets the current
> client's internal state, and everything continues.
>
> Regards,
> Bruno Rassaerts | Freelance Java Developer
>
> > On 23 Jan 2016, at 17:52, Ismael Juma wrote:
> >
> > Hi,
> >
> > Can you please file an issue in JIRA so that we make sure this is
> > investigated?
> >
> > Ismael
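Bruno does not show code for the workaround he describes in this thread. The following is a minimal sketch of what "seek to the current offset" could look like, assuming a consumer object that exposes `assignment()`, `position(partition)` and `seek(partition, offset)`, which are the method names of the Java `KafkaConsumer`. The helper `reseek_assigned` and the `FakeConsumer` stand-in are hypothetical and exist only so the sketch runs without a broker. Where the call belongs (for example inside the rebalance callback) is left open, as it is in the thread.

```python
class FakeConsumer:
    """Stand-in consumer so the sketch runs without a broker; not a Kafka class."""

    def __init__(self, positions):
        self._positions = dict(positions)  # partition -> next offset to fetch
        self.seeks = []                    # record of seek() calls, for inspection

    def assignment(self):
        return set(self._positions)

    def position(self, partition):
        return self._positions[partition]

    def seek(self, partition, offset):
        self.seeks.append((partition, offset))
        self._positions[partition] = offset


def reseek_assigned(consumer):
    """Seek every assigned partition to the position it already has."""
    for partition in consumer.assignment():
        consumer.seek(partition, consumer.position(partition))


if __name__ == "__main__":
    fake = FakeConsumer({("balance", 9): 100, ("balance", 10): 200})
    reseek_assigned(fake)
    print(sorted(fake.seeks))
```

The offsets do not change; the point of the workaround, as Bruno describes it, is only that the seek resets the client's internal fetch state.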
Stuck consumer with new consumer API in 0.9
Hi,

I'm prototyping with the new consumer API of Kafka 0.9 and I'm particularly interested in the `ConsumerRebalanceListener`.

My test setup is the following:
- 5M messages pre-loaded into a single-node Kafka 0.9
- 12 partitions, auto offset commit set to false
- in `onPartitionsRevoked`, commit offsets and flush the local state

The test run is the following:
- launch one process with 2 consumers and let it consume for a while
- launch another process with 2 consumers, which triggers a rebalance, and let these 2 processes run until all messages are consumed

The code is here: https://gist.github.com/darkjh/fe1e5a5387bf13b4d4dd

So at first, the 2 consumers of the first process each got 6 partitions. After the rebalance, each consumer got 3 partitions. This is confirmed by logging inside the `onPartitionsAssigned` callback.

But after the rebalance, one of the 2 consumers of the first process stops receiving messages, even though it has partitions assigned to it:

    balance-1 pulled 7237 msgs ...
    balance-0 pulled 7263 msgs ...
    2016-01-22 15:50:37,533 [INFO] [pool-1-thread-2] o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since the group is rebalancing, try to re-join group.
    balance-1 flush @ 536637
    balance-1 committed offset for List(balance-11, balance-10, balance-9, balance-8, balance-7, balance-6)
    2016-01-22 15:50:37,575 [INFO] [pool-1-thread-1] o.a.k.c.c.i.AbstractCoordinator - Attempt to heart beat failed since the group is rebalancing, try to re-join group.
    balance-0 flush @ 543845
    balance-0 committed offset for List(balance-5, balance-4, balance-3, balance-2, balance-1, balance-0)
    balance-0 got assigned List(balance-5, balance-4, balance-3)
    balance-1 got assigned List(balance-11, balance-10, balance-9)
    balance-1 pulled 3625 msgs ...
    balance-0 pulled 3621 msgs ...
    balance-0 pulled 3631 msgs ...
    balance-0 pulled 3631 msgs ...
    balance-1 pulled 0 msgs ...
    balance-0 pulled 3643 msgs ...
    balance-0 pulled 3643 msgs ...
    balance-1 pulled 0 msgs ...
    balance-0 pulled 3622 msgs ...
    balance-0 pulled 3632 msgs ...
    balance-1 pulled 0 msgs ...
    balance-0 pulled 3637 msgs ...
    balance-0 pulled 3641 msgs ...
    balance-0 pulled 3640 msgs ...
    balance-1 pulled 0 msgs ...
    balance-0 pulled 3632 msgs ...
    balance-0 pulled 3630 msgs ...
    balance-1 pulled 0 msgs ...
    ..

`balance-0` and `balance-1` are the names of the consumer threads. So after the rebalance, thread `balance-1` continues to poll but no messages arrive, even though it got 3 partitions assigned after the rebalance.

Eventually the other 3 consumers pull all of their partitions' messages, and the situation looks like this:

    GROUP, TOPIC, PARTITION, CURRENT OFFSET, LOG END OFFSET, LAG, OWNER
    balance-test, balance, 9, 417467, 417467, 0, consumer-2_/127.0.0.1
    balance-test, balance, 10, 417467, 417467, 0, consumer-2_/127.0.0.1
    balance-test, balance, 11, 417467, 417467, 0, consumer-2_/127.0.0.1
    balance-test, balance, 6, 180269, 417467, 237198, consumer-2_/127.0.0.1
    balance-test, balance, 7, 180036, 417468, 237432, consumer-2_/127.0.0.1
    balance-test, balance, 8, 180197, 417467, 237270, consumer-2_/127.0.0.1
    balance-test, balance, 3, 417467, 417467, 0, consumer-1_/127.0.0.1
    balance-test, balance, 4, 417468, 417468, 0, consumer-1_/127.0.0.1
    balance-test, balance, 5, 417468, 417468, 0, consumer-1_/127.0.0.1
    balance-test, balance, 0, 417467, 417467, 0, consumer-1_/127.0.0.1
    balance-test, balance, 1, 417467, 417467, 0, consumer-1_/127.0.0.1
    balance-test, balance, 2, 417467, 417467, 0, consumer-1_/127.0.0.1

As you can see, partitions 6, 7 and 8 still have messages, but the consumer can't pull them after the rebalance.

I've tried the 0.9.0.0 release, trunk and the 0.9.0 branch, for both server/broker and client.

I hope the code is clear enough to illustrate/reproduce the problem. It's quite a surprise for me because this is the main feature of the new consumer API, but it does not seem to work properly.

Feel free to ask me for any details.

--
*JU Han*
Software Engineer @ Teads.tv
+33 061960
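In the listing from this message, the LAG column is the log end offset minus the current offset. The short sketch below recomputes it from the rows copied out of the message and reports which partitions are behind. The helper name `stuck_partitions` is hypothetical, and the parsing assumes the comma-separated layout shown in the message rather than any guaranteed tool output format.

```python
ROWS = """\
balance-test, balance, 9, 417467, 417467, 0, consumer-2_/127.0.0.1
balance-test, balance, 10, 417467, 417467, 0, consumer-2_/127.0.0.1
balance-test, balance, 11, 417467, 417467, 0, consumer-2_/127.0.0.1
balance-test, balance, 6, 180269, 417467, 237198, consumer-2_/127.0.0.1
balance-test, balance, 7, 180036, 417468, 237432, consumer-2_/127.0.0.1
balance-test, balance, 8, 180197, 417467, 237270, consumer-2_/127.0.0.1
balance-test, balance, 3, 417467, 417467, 0, consumer-1_/127.0.0.1
balance-test, balance, 4, 417468, 417468, 0, consumer-1_/127.0.0.1
balance-test, balance, 5, 417468, 417468, 0, consumer-1_/127.0.0.1
balance-test, balance, 0, 417467, 417467, 0, consumer-1_/127.0.0.1
balance-test, balance, 1, 417467, 417467, 0, consumer-1_/127.0.0.1
balance-test, balance, 2, 417467, 417467, 0, consumer-1_/127.0.0.1
"""


def stuck_partitions(rows):
    """Return {partition: lag} for partitions whose recomputed lag is non-zero."""
    behind = {}
    for line in rows.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        partition, current, log_end = int(fields[2]), int(fields[3]), int(fields[4])
        lag = log_end - current  # recomputed, not read from the LAG column
        if lag > 0:
            behind[partition] = lag
    return behind


if __name__ == "__main__":
    print(stuck_partitions(ROWS))
```

The recomputed values match the reported LAG column, and only partitions 6, 7 and 8 are behind, about 712k messages in total.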
Re: How to reset a consumer-group's offset in kafka 0.9?
Thanks a lot, Guozhang! So currently there's no way to delete a consumer group with the new consumer API? Do you plan to add it in the next versions?

2015-12-31 20:09 GMT+01:00 Guozhang Wang :
> Hello Han,
>
> 1. As Marko mentioned, you can use "seek" in the 0.9 Java consumer to reset
> your consuming offsets. Or, if you are stopping the consumer between your
> test runs, you can also commit() with offset 0 before closing your
> consumer at the end of each test.
>
> 2. In the 0.9 Java consumer, both the consumer registry information and the
> offsets are stored on the Kafka servers instead of in ZK. You can find a
> more detailed description and motivation of this change here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Client+Re-Design
>
> So "AdminUtils.deleteConsumerGroupInZK" will not help in removing the
> consumer registry information.
>
> 3. We have deprecated kafka.tools.ConsumerOffsetChecker in 0.9, see
> "deprecations in 0.9.0" here:
>
> http://kafka.apache.org/documentation.html#upgrade_9_breaking
>
> Instead you can try to use bin/kafka-consumer-groups.sh
> (kafka.admin.ConsumerGroupCommand).
>
> Guozhang
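The two reset options in Guozhang's reply (seek before consuming, or commit offset 0 before closing) can be illustrated with a toy in-memory model. This is only a sketch of the semantics, not the Kafka client API: `ToyBroker`, `consume`, and `commit` are hypothetical names, the log has a single partition, and a group with no committed offset is assumed to start at offset 0.

```python
# Toy in-memory model of consumer-group offsets. NOT the Kafka client API:
# ToyBroker and its methods are hypothetical names used only for illustration.

class ToyBroker:
    def __init__(self, messages):
        self.log = list(messages)   # the log of a single partition
        self.committed = {}         # group id -> committed offset

    def consume(self, group, seek_to=None):
        """Read from the committed offset (or from seek_to) to the end of the
        log, then commit the end position, like a test run that ends cleanly."""
        start = self.committed.get(group, 0) if seek_to is None else seek_to
        batch = self.log[start:]
        self.committed[group] = len(self.log)
        return batch

    def commit(self, group, offset):
        """Overwrite the committed offset of the group."""
        self.committed[group] = offset


broker = ToyBroker(["m0", "m1", "m2"])

first_run = broker.consume("test-group")    # reads everything
second_run = broker.consume("test-group")   # nothing left, offset is at the end

# Option 1: seek to offset 0 before consuming.
third_run = broker.consume("test-group", seek_to=0)

# Option 2: commit offset 0 before "closing", so the next run starts over.
broker.commit("test-group", 0)
fourth_run = broker.consume("test-group")

print(first_run, second_run, third_run, fourth_run)
```

In this model both options make the next run re-read the whole log; the difference is only whether the reset happens at the start of a run (seek) or at the end of the previous one (commit).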
kafka-run-class can not launch ConsumerGroupCommand in 0.9
Hi,

I'm trying to check the offset of a consumer group with the new consumer API, but it seems that kafka-run-class cannot launch `ConsumerGroupCommand`:

bin/kafka-run-class.sh kafka.tools.ConsumerGroupCommand --zookeeper localhost:2181 --group my-group
>> Error: Could not find or load main class kafka.tools.ConsumerGroupCommand

I've tried both the downloaded Kafka distribution for Scala 2.11 and a manually compiled 0.9.0.0 version (with gradle jar + gradle copyDependantLibs).

--
*JU Han*
Software Engineer @ Teads.tv
+33 061960
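For comparison, Guozhang's reply in the offset-reset thread gives the class as kafka.admin.ConsumerGroupCommand (package `admin`, not `tools`). The sketch below only assembles the argument list and prints it. The helper name `consumer_group_command`, the broker address localhost:9092, and the choice of the `--new-consumer`, `--bootstrap-server`, and `--describe` options are assumptions for illustration and should be checked against the installed distribution.

```python
# Builds (but does not run) a command line for the consumer-group tool.
# consumer_group_command is a hypothetical helper; the broker address and
# the selected options are assumptions, not taken from the thread.

def consumer_group_command(group, bootstrap_server="localhost:9092", kafka_home="."):
    return [
        f"{kafka_home}/bin/kafka-run-class.sh",
        "kafka.admin.ConsumerGroupCommand",   # package is admin, not tools
        "--new-consumer",
        "--bootstrap-server", bootstrap_server,
        "--describe",
        "--group", group,
    ]


argv = consumer_group_command("my-group")
print(" ".join(argv))
# To run it against a live cluster: subprocess.run(argv, check=True)
```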
Re: How to reset a consumer-group's offset in kafka 0.9?
Hi Marko,

Yes, we're currently using this on our production Kafka 0.8, but it does not seem to work with the new consumer API in 0.9.

To answer my own question about deleting a consumer group in the new consumer API: it seems that it's currently not possible (there's no delete-related method in `AdminClient` of the new consumer API).

2015-12-30 17:02 GMT+01:00 Marko Bonaći :
> If you want to monitor offsets (ZK or Kafka based), try Quantifind's
> Kafka Offset Monitor.
> If you use Docker, it's as easy as:
>
> docker run -p 8080:8080 -e ZK=zk_hostname:2181 jpodeszwik/kafka-offset-monitor
>
> and then opening a browser to dockerhost:8080.
>
> If not in the Docker mood, use the instructions here:
> https://github.com/quantifind/KafkaOffsetMonitor
>
> Marko Bonaći
> Monitoring | Alerting | Anomaly Detection | Centralized Log Management
> Solr & Elasticsearch Support
> Sematext <http://sematext.com/> | Contact
> <http://sematext.com/about/contact.html>
Re: How to reset a consumer-group's offset in kafka 0.9?
Thanks guys. `seek` seems to be a solution, but it's more cumbersome than in 0.8 because I have to plug some extra code into my consumer abstractions rather than simply deleting a ZK node.

One more question: where does Kafka 0.9 store the consumer-group information? I also tried to delete the consumer group, but `AdminUtils.deleteConsumerGroupInZK` does not seem to work in 0.9. Also, `bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper localhost:2181 --group group-name` seems broken.

Thanks!

2015-12-29 16:46 GMT+01:00 Marko Bonaći :
> I was referring to Dana Powers's answer in the link I posted (to use a
> client API). You can find an example here:
>
> http://kafka.apache.org/090/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
>
> On Tue, Dec 29, 2015 at 4:41 PM, Stevo Slavić wrote:
>
> > Then I guess @Before test, explicitly commit offset of 0.
> >
> > There doesn't seem to be a tool for committing offsets, only for
> > checking/fetching the current offset (see
> > http://kafka.apache.org/documentation.html#operations )
Re: How to reset a consumer-group's offset in kafka 0.9?
Hi Stevo,

But by deleting and recreating the topic, do I also remove the ingested messages?
My use case is that I ingest prepared messages once and run consumer tests multiple times. Between test runs I reset the consumer group's offset so that each run starts from the beginning and consumes all the messages.

2015-12-29 16:19 GMT+01:00 Stevo Slavić :
> Have you considered deleting and recreating the topic used in the test?
> Once the topic is clean, read/poll once - any committed offset should be
> outside of the range, and the consumer should reset its offset.
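Stevo's suggestion relies on what happens when a committed offset falls outside the valid range of the log. A minimal sketch of that decision follows, assuming the new consumer's `auto.offset.reset` setting with the values "earliest", "latest", and "none"; the function `starting_offset` and its parameters are hypothetical, and the real client resolves this internally.

```python
# Sketch of where a consumer starts reading, given a committed offset and the
# current bounds of the log. starting_offset is a hypothetical helper.

def starting_offset(committed, log_start, log_end, auto_offset_reset="earliest"):
    # A committed offset inside [log_start, log_end] is used as-is.
    if committed is not None and log_start <= committed <= log_end:
        return committed
    # Otherwise the reset policy decides.
    if auto_offset_reset == "earliest":
        return log_start
    if auto_offset_reset == "latest":
        return log_end
    raise ValueError("no usable offset and no reset policy")


# Freshly recreated, still empty topic: the old offset 1000 is out of range.
print(starting_offset(1000, log_start=0, log_end=0))
# Topic refilled with 5000 messages before the first poll: 1000 is in range
# again, so no reset happens and the first 1000 messages are skipped.
print(starting_offset(1000, log_start=0, log_end=5000))
```

The second call shows why the order matters in this approach: the consumer has to poll once while the recreated topic is still empty, and the messages then have to be ingested again, which is the cost Han points out.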
How to reset a consumer-group's offset in kafka 0.9?
Hello,

For local testing I need to frequently reset the offset for a consumer group. In 0.8 I just delete the consumer group's ZK node under `/consumers`. But with the redesign in 0.9, how can I achieve the same thing?

Thanks!
Re: Question on 0.9 new consumer API
OK, thanks for the confirmation!

2015-11-12 15:19 GMT+01:00 Grant Henke :
> The new consumer (0.9.0) will not be compatible with older brokers (0.8.2).
> In general you should upgrade brokers before upgrading clients. The old
> clients (0.8.2) will work on the new brokers (0.9.0).
>
> Thanks,
> Grant
>
> --
> Grant Henke
> Software Engineer | Cloudera
> gr...@cloudera.com | twitter.com/gchenke | linkedin.com/in/granthenke
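Grant's answer can be read as a rule of thumb: a client works when the broker is at least as new as the client. The sketch below encodes only that rule and checks it against the two pairings mentioned in this thread. It is not an official compatibility matrix, and `parse_version` and `client_supported` are hypothetical helpers.

```python
# Rule of thumb from the thread: upgrade brokers before clients, i.e. a client
# is expected to work when broker version >= client version.
# This is an illustration only, not an official compatibility matrix.

def parse_version(text):
    """Turn a version string such as "0.8.2" into a comparable tuple (0, 8, 2)."""
    return tuple(int(part) for part in text.split("."))


def client_supported(client_version, broker_version):
    return parse_version(broker_version) >= parse_version(client_version)


new_client_old_broker = client_supported("0.9.0", "0.8.2")
old_client_new_broker = client_supported("0.8.2", "0.9.0")
print(new_client_old_broker, old_client_new_broker)
```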
Question on 0.9 new consumer API
Hello,

Just want to know whether the new consumer API coming with 0.9 will be compatible with 0.8 broker servers. We're looking at the new consumer because the new rebalancing listener is very interesting for one of our use cases.

Another question: if we have to upgrade our brokers to 0.9, will they accept producers on 0.8.2?

Thanks!