So, we have one other blocker bug in system tests that we're trying to make
sure can safely be removed, so we've had a bit of slack time with this.
Obviously having this all happen very last minute isn't really ideal since
it didn't allow enough time to address the feedback -- Stevo's questions
didn't get a response until today.

I'd really like this fix in because I think it actually seems pretty low
risk, although it would potentially make some consumers on a new cluster be
delayed indefinitely if the user is relying on having fewer than the
default offset.topic.replication.factor. Many users will have there
server.properties files stored elsewhere and not rely on the version we
ship (and they shouldn't -- it contains other dangerous settings like
log.dirs in /tmp). This does mean some users could be negatively affected,
although I have to admit it seems like it'd be an extremely small
population of our users.

All that said, given the late voting and discussion continuing while the
VOTE thread proceeded, I'd like to suggest that if anyone objects to
including in 0.10.2.0, we bump it to 0.10.3.0 and also follow up on
Ismael's question (re: does this need to be in a major version, which
afaict was only followed up by Jeff's comment that he'd rather not wait).
The patch is there, so those that desperately want it can easily
cherry-pick it and build a version themselves if it's that big a problem
for them, else they would see the fix in the next release.

More generally, I think this KIP (and my experience dealing w/ the KIPs +
feature freeze for this release) suggest we need a little more clarity on
how KIPs fit into the release process. I'm going to follow up in a new
release post-mortem thread further but I think we should a) start to
distinguish between feature KIPs and bug fix KIPs and possibly apply
different rules for them (this one is admittedly ambiguous) and b) push the
deadline for feature KIPs to a week before feature freeze (giving a more
realistic review period for all that code) and make bugfix KIPs acceptable
up to a week before code freeze (again, providing a realistic review
period), and c) move our freeze dates off of Fridays since having them
there implicitly gives people 2 extra days to squeeze things in.

-Ewen

On Mon, Jan 30, 2017 at 1:46 PM, Onur Karaman <okara...@linkedin.com.invalid
> wrote:

> I've updated the KIP title to reflect this distinction:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 115%3A+Enforce+offsets.topic.replication.factor+upon+__
> consumer_offsets+auto+topic+creation
>
> On Mon, Jan 30, 2017 at 12:52 AM, Onur Karaman <
> onurkaraman.apa...@gmail.com
> > wrote:
>
> > Regarding Joel's comment:
> > > On Jan 25, 2017, at 9:26 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
> > >
> > > already voted, but one thing worth considering (since this KIP speaks
> of
> > > *enforcement*) is desired behavior if the topic already exists and the
> > > config != existing RF.
> > >
> >
> > The short answer: The first line of the KIP specifically states that it
> > attempts enforcement only upon topic creation and I think anything else
> > should be beyond the scope of this KIP.
> >
> > The long answer:
> > Mismatch between existing RF and the "offsets.topic.replication.factor"
> > config happens with:
> > a. topic creation paths 3-5 as defined in the KIP if the size of the
> > replicas set resulting from AdminUtils != "offsets.topic.replication.
> > factor"
> > b. topic creation path 6
> > c. a config change to the broker's "offsets.topic.replication.factor"
> > d. partition reassignments that expand the RF
> >
> > For all of these scenarios, I believe it all boils down to the intent of
> > the imperfectly named "offsets.topic.replication.factor" and
> > "default.replication.factor" configs. These configs really only represent
> > the RF to be used upon auto topic creation by the broker and are never
> > referenced anywhere else, whether it's "offsets.topic.replication.
> factor"
> > for the __consumer_offsets topic or "default.replication.factor" for any
> > other topic.
> >
> > I think any RF mismatch after topic creation is beyond the scope of this
> > discussion since the configs anyways were not intended to enforce RF
> > anywhere other than upon auto topic creation by the broker.
> >
> > Regarding Stevo's comment:
> > > On Thu, Jan 26, 2017 at 4:26 AM, Stevo Slavić <ssla...@gmail.com>
> wrote:
> > > test cluster. If this problem of offsets.topic.replication.factor not
> > being
> > > enforced others also observed only in their tests only, than I don't
> like
> > > the KIP proposed change, of setting offsets.topic.replication.factor
> to
> > 1
> > > by default. I understand backward compatibility goals of this, but I
> can
> > > imagine late discovered production issues as consequences of this
> change.
> >
> > It's worth noting that the KafkaConfig default for
> > "offsets.topic.replication.factor" is still 3. This KIP merely changes
> the
> > config/server.properties default to 1 so that the quickstart
> > "./bin/kafka-server-start.sh config/server.properties" continues to run
> > smoothly.
> >
> > On Thu, Jan 26, 2017 at 9:34 AM, Colin McCabe <cmcc...@apache.org>
> wrote:
> >
> > > Hi all,
> > >
> > > +1 (non-binding) for KIP-115.
> > >
> > > On Thu, Jan 26, 2017, at 04:26, Stevo Slavić wrote:
> > > > If I understood well, this KIP is trying to solve for the problem of
> > > > offsets.topic.replication.factor not being enforced, particularly in
> > > > context of  "when you have clients or tooling running as the cluster
> is
> > > > getting setup". Assuming that this problem was observed in
> production,
> > so
> > > > in non-testing only conditions, would it make sense to introduce
> > > > additional
> > > > property - min number of alive brokers before offsets topic is
> allowed
> > to
> > > > be created?
> > >
> > > It's an interesting idea, but... is there a user use-case for a
> property
> > > like this?  I'm having a hard time thinking of one, but maybe I missed
> > > something.
> > >
> > > cheers,
> > > Colin
> > >
> > > >
> > > > Currently offsets.topic.replication.factor is used for that purpose,
> > so
> > > > with offsets.topic.replication.factor set to 3 it's enough to have
> > just
> > > 3
> > > > brokers up for offsets topic to be created. Then all replicas of all
> > (by
> > > > default 50) partitions of this topic would be spread out over just
> > these
> > > > 3
> > > > brokers, while eventually entire cluster might be much larger in size
> > and
> > > > would benefit from wider spread of consumer offsets topic partitions
> > > > leadership.
> > > >
> > > > One can achieve wider spread later, manually. But that would first
> have
> > > > to
> > > > be detected, and then use provided CLI/scripts to change replica
> > > > assignment. IMO it would be better if it was possible to configure
> > > > desired
> > > > spread, even if just indirectly through configuring min number of
> alive
> > > > brokers. If not overriden in server.properties, this new property can
> > > > default to offsets.topic.replication.factor
> > > >
> > > > I've been bitten by problem of offsets.topic.replication.factor not
> > > being
> > > > enforced but only in testing, integration tests, it was almost
> > > > unpredictable when offsets topic is ready, test cluster initialized,
> > > > would
> > > > get lots of false failures, unstable tests, but eventually got to
> > > > predictable deterministic test behavior, found ways to fully
> initialize
> > > > test cluster. If this problem of offsets.topic.replication.factor
> not
> > > > being
> > > > enforced others also observed only in their tests only, than I don't
> > like
> > > > the KIP proposed change, of setting offsets.topic.replication.factor
> > to
> > > 1
> > > > by default. I understand backward compatibility goals of this, but I
> > can
> > > > imagine late discovered production issues as consequences of this
> > change.
> > > > So I wouldn't like to trade off production issues probability for
> > testing
> > > > convenience.
> > > >
> > > > Current Kafka documentation has nice note about
> > > > offsets.topic.replication.factor and related behavior. New note
> about
> > > new
> > > > default would have to be a warning in bold and red in docs, and every
> > > > broker should output proper warning in log if configuration for
> > > > offsets.topic.replication.factor is on new proposed default of 1.
> > > >
> > > > Kind regards,
> > > > Stevo Slavic.
> > > >
> > > > On Thu, Jan 26, 2017 at 8:43 AM, James Cheng <wushuja...@gmail.com>
> > > > wrote:
> > > >
> > > > >
> > > > > > On Jan 25, 2017, at 9:26 PM, Joel Koshy <jjkosh...@gmail.com>
> > wrote:
> > > > > >
> > > > > > already voted, but one thing worth considering (since this KIP
> > > speaks of
> > > > > > *enforcement*) is desired behavior if the topic already exists
> and
> > > the
> > > > > > config != existing RF.
> > > > > >
> > > > >
> > > > > Yeah, I'm curious about this too.
> > > > >
> > > > > -James
> > > > >
> > > > > > On Wed, Jan 25, 2017 at 4:30 PM, Dong Lin <lindon...@gmail.com>
> > > wrote:
> > > > > >
> > > > > >> +1
> > > > > >>
> > > > > >> On Wed, Jan 25, 2017 at 4:22 PM, Ismael Juma <ism...@juma.me.uk
> >
> > > wrote:
> > > > > >>
> > > > > >>> An important question is if this needs to wait for a major
> > release
> > > or
> > > > > >> not.
> > > > > >>>
> > > > > >>> Ismael
> > > > > >>>
> > > > > >>> On Thu, Jan 26, 2017 at 12:19 AM, Ismael Juma <
> ism...@juma.me.uk
> > >
> > > > > wrote:
> > > > > >>>
> > > > > >>>> +1 from me too.
> > > > > >>>>
> > > > > >>>> Ismael
> > > > > >>>>
> > > > > >>>> On Thu, Jan 26, 2017 at 12:07 AM, Ewen Cheslack-Postava <
> > > > > >>> e...@confluent.io
> > > > > >>>>> wrote:
> > > > > >>>>
> > > > > >>>>> +1
> > > > > >>>>>
> > > > > >>>>> Since this is an unusual one, I think it's worth pointing out
> > > that
> > > > > the
> > > > > >>> KIP
> > > > > >>>>> notes it is really a bug fix, but since it has compatibility
> > > > > >>> implications
> > > > > >>>>> the KIP was worth it. It was a sort of intentional bug, but
> > > confusing
> > > > > >>> and
> > > > > >>>>> dangerous.
> > > > > >>>>>
> > > > > >>>>> Seems important to fix this ASAP since people are hitting
> this
> > in
> > > > > >>> practice
> > > > > >>>>> and would have to go out of their way to set up monitoring to
> > > catch
> > > > > >> the
> > > > > >>>>> issue.
> > > > > >>>>>
> > > > > >>>>> -Ewen
> > > > > >>>>>
> > > > > >>>>> On Wed, Jan 25, 2017 at 4:02 PM, Jason Gustafson <
> > > ja...@confluent.io
> > > > > >
> > > > > >>>>> wrote:
> > > > > >>>>>
> > > > > >>>>>> +1 from me. The current behavior seems both surprising and
> > > > > >> dangerous.
> > > > > >>>>>>
> > > > > >>>>>> -Jason
> > > > > >>>>>>
> > > > > >>>>>> On Wed, Jan 25, 2017 at 3:58 PM, Onur Karaman <
> > > > > >>>>>> onurkaraman.apa...@gmail.com>
> > > > > >>>>>> wrote:
> > > > > >>>>>>
> > > > > >>>>>>> Hey everyone.
> > > > > >>>>>>>
> > > > > >>>>>>> I made a bug-fix KIP-115 to enforce
> > offsets.topic.replication.
> > > > > >>> factor:
> > > > > >>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > >>>>>>> 115%3A+Enforce+offsets.topic.replication.factor
> > > > > >>>>>>>
> > > > > >>>>>>> Comments are welcome.
> > > > > >>>>>>>
> > > > > >>>>>>> - Onur
> > > > > >>>>>>>
> > > > > >>>>>>
> > > > > >>>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>
> > > > >
> > > > >
> > >
> >
>

Reply via email to