Re: Duplicates consumed on rebalance. No compression, autocommit enabled.

2016-07-23 Thread Ewen Cheslack-Postava
I'd suggest using the new consumer (KafkaConsumer) instead of the old consumer. We've
refined the implementation so that even with auto-commit you should get
at-least-once processing in the worst case (and, when there are no failures,
exactly-once). The 0.10.0.0 release should get all of these semantics right.
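A minimal Python sketch of the guarantee described above, under a deliberately simplified model. This is not the Kafka client: `ModelConsumer`, `run` and `max_records` are hypothetical names. The model assumes two rules. Auto-commit happens inside `poll()` and only covers records the application has already processed. A clean rebalance commits before the partition changes hands, while a crash skips that final commit.

```python
class ModelConsumer:
    """Hypothetical model of one partition's consumer (not the real client).

    Records are plain integers equal to their offsets.
    """

    def __init__(self, log, committed):
        self.log = log
        self.position = committed   # next offset to hand to the application
        self.committed = committed  # offset stored for the consumer group
        self.processed = []

    def poll(self, max_records):
        # Assumed auto-commit inside poll(): everything returned by earlier
        # polls has already been processed, so committing the position is safe.
        self.committed = self.position
        batch = self.log[self.position:self.position + max_records]
        self.position += len(batch)
        return batch

    def process(self, batch):
        self.processed.extend(batch)

    def revoke_cleanly(self):
        # Assumed clean rebalance: commit before giving the partition away.
        self.committed = self.position
        return self.committed


def run(log, polls, clean_handover):
    """Consume part of `log` with one consumer, then hand over to a second."""
    first = ModelConsumer(log, committed=0)
    for _ in range(polls):
        first.process(first.poll(max_records=3))
    if clean_handover:
        handover = first.revoke_cleanly()
    else:
        handover = first.committed  # crash: the final commit never happens
    second = ModelConsumer(log, committed=handover)
    while True:
        batch = second.poll(max_records=3)
        if not batch:
            break
        second.process(batch)
    return first.processed + second.processed


if __name__ == "__main__":
    log = list(range(10))
    print(run(log, polls=2, clean_handover=True))
    print(run(log, polls=2, clean_handover=False))
```

In this model the clean handover delivers every offset exactly once. The crash case redelivers offsets 3 to 5 but never skips one, which is the at-least-once worst case.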

-Ewen




Re: Duplicates consumed on rebalance. No compression, autocommit enabled.

2016-07-11 Thread Gerard Klijs
You could set auto.commit.interval.ms to a lower value; in your example it
is 10 seconds, which can be a lot of messages. I don't really see how it
could be prevented any further, since offsets can only be committed by a
consumer for the partitions it is assigned. I do believe there is some
work in progress to make the assignment of partitions to consumers
somewhat sticky.
In that case, when a consumer is assigned the same partitions after the
rebalance as it had before, it should not be necessary to consume the
same data again in those partitions.
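To put a rough number on "a lot of messages", here is a simplified Python model of periodic auto-commit. It assumes a constant consumption rate and ignores fetch batching, and `duplicates_after_rebalance` and its parameters are hypothetical names. Whatever was processed after the last timer-driven commit is consumed again by the partition's next owner, so the duplicate count stays below roughly rate × auto.commit.interval.ms.

```python
def duplicates_after_rebalance(rate_per_sec, commit_interval_ms, rebalance_at_ms):
    """Count records the next owner of a partition consumes a second time.

    Hypothetical model: records are processed at a constant rate, a timer
    commits the current position every `commit_interval_ms`, and at
    `rebalance_at_ms` the partition moves to a consumer that resumes from
    the last committed offset.
    """
    # Offset reached when the partition is taken away.
    processed = rate_per_sec * rebalance_at_ms // 1000
    # Time of the most recent periodic commit before the rebalance.
    last_commit_ms = (rebalance_at_ms // commit_interval_ms) * commit_interval_ms
    committed = rate_per_sec * last_commit_ms // 1000
    # Everything processed after that commit is delivered again.
    return processed - committed


if __name__ == "__main__":
    for interval_ms in (10_000, 1_000, 100):
        dups = duplicates_after_rebalance(1_000, interval_ms, 19_550)
        print(f"auto.commit.interval.ms={interval_ms}: {dups} duplicates")
```

With 1,000 records/s and a rebalance 19.55 s in, the model gives 9,550 duplicates at the 10-second interval, 550 at 1 second and 50 at 100 ms. Shorter intervals shrink the window at the cost of more frequent offset commits.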



Duplicates consumed on rebalance. No compression, autocommit enabled.

2016-07-11 Thread Michael Luban
Using the 0.8.2.1 client.

Is it possible to statistically minimize the possibility of duplication in
this scenario, or has this behavior been corrected in a later client
version? Or is the test flawed?

https://gist.github.com/mluban/03a5c0d9221182e6ddbc37189c4d3eb0