Gwen, KIP write-up looks good. According to the rest of the KIP process proposal, would you like to start a DISCUSS/VOTE thread for it?
Thanks, Neha On Fri, Jan 16, 2015 at 9:37 AM, Ewen Cheslack-Postava <e...@confluent.io> wrote: > Gwen -- KIP write up looks good. Deprecation schedule probably needs to be > more specific, but I think that discussion probably needs to happen after a > solution is agreed upon. > > Jay -- I think "older clients will get a bad error message instead of a > good one" isn't what would be happening with this change. Previously they > wouldn't have received an error and they would have been able to produce > messages. After the change they'll just receive this new error message > which their clients can't possibly handle gracefully since it didn't exist > when the client was written. Whether the acks > 1 setting was actually > accomplishing what they thought doesn't matter. Someone could have > reasonably read the docs on 0.8.1.1, thought acks = 2 is an ok setting for > their applications, set it as a default across a bunch of apps, then follow > the recommended upgrade path of updating brokers to 0.8.2 and all their > apps will start failing on produce requests. > > > On Thu, Jan 15, 2015 at 8:27 PM, Jay Kreps <jay.kr...@gmail.com> wrote: > > > This is a good case to discuss. > > > > Let's figure the general case of how we want to handle errors and get > that > > documented in the protocol. The problem right now is that we give no > > guidance on this. I actually thought Gwen's suggestion made sense on the > > guidance we should have given which is that we will enumerate a set of > > errors and their meaning for each API but it is possible that other > errors > > will occur and they should be handled (maybe poorly) in the same way > > UNKNOWN_ERROR is handled which is our normal escape hatch for things like > > OOMException. > > > > I really do think we shouldn't be dogmatic here: In considering a change > > to errors we should consider the potential ill-effect vs the complexity > of > > yet another protocol version. > > > > In this case I actually am not sure we need to bump the protocol because > > the whole point of the change was to make a setting we think doesn't make > > sense break, right? Well this will break it. It seems like the only > > downside is that older clients will get a bad error message instead of a > > good one. But it isn't like we will have rendered a client unusable, it > is > > just that they will need to change their config. > > > > -Jay > > > > On Thu, Jan 15, 2015 at 6:14 PM, Gwen Shapira <gshap...@cloudera.com> > > wrote: > > > >> I created a KIP for this suggestion: > >> > >> > >> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks > >> > >> Basically documenting what was already discussed here. Comments will > >> be awesome! > >> > >> Gwen > >> > >> On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira <gshap...@cloudera.com> > >> wrote: > >> > The errors are part of the KIP process now, so I think the clients are > >> safe :) > >> > > >> > > >> > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals > >> > > >> > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin <steve.mo...@gmail.com> > >> wrote: > >> >> Agree errors should be part of the protocol > >> >> > >> >>> On Jan 15, 2015, at 17:59, Gwen Shapira <gshap...@cloudera.com> > >> wrote: > >> >>> > >> >>> Hi, > >> >>> > >> >>> I got convinced by Joe and Dana that errors are indeed part of the > >> >>> protocol and can't be randomly added. > >> >>> > >> >>> So, it looks like we need to bump version of ProduceRequest in the > >> >>> following way: > >> >>> Version 0 -> accept acks >1. I think we should keep the existing > >> >>> behavior too (i.e. not replace it with -1) to avoid surprising > >> >>> clients, but I'm willing to hear other opinions. > >> >>> Version 1 -> do not accept acks >1 and return an error. > >> >>> Are we ok with the error I added in KAFKA-1697? We can use something > >> >>> less specific like InvalidRequestParameter. This error can be reused > >> >>> in the future and reduce the need to add errors, but will also be > less > >> >>> clear to the client and its users. Maybe even add the error message > >> >>> string to the protocol in addition to the error code? (since we are > >> >>> bumping versions....) > >> >>> > >> >>> I think maintaining the old version throughout 0.8.X makes sense. > IMO > >> >>> dropping it for 0.9 is feasible, but I'll let client owners help > make > >> >>> that call. > >> >>> > >> >>> Am I missing anything? Should I start a KIP? It seems like a > KIP-type > >> >>> discussion :) > >> >>> > >> >>> Gwen > >> >>> > >> >>> > >> >>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava > >> >>> <e...@confluent.io> wrote: > >> >>>> Gwen, > >> >>>> > >> >>>> I think the only option that wouldn't require a protocol version > >> change is > >> >>>> the one where acks > 1 is converted to acks = -1 since it's the > only > >> one > >> >>>> that doesn't potentially break older clients. The protocol guide > >> says that > >> >>>> the expected upgrade path is servers first, then clients, so old > >> clients, > >> >>>> including non-java clients, that may be using acks > 1 should be > >> able to > >> >>>> work with a new broker version. > >> >>>> > >> >>>> It's more work, but I think dealing with the protocol change is the > >> right > >> >>>> thing to do since it eventually gets us to the behavior I think is > >> better -- > >> >>>> the broker should reject requests with invalid values. I think Joe > >> and I > >> >>>> were basically in agreement. In my mind the major piece missing > from > >> his > >> >>>> description is how long we're going to maintain his "case 0" > >> behavior. It's > >> >>>> impractical to maintain old versions forever, but it sounds like > >> there > >> >>>> hasn't been a decision on how long to maintain them. Maybe that's > >> another > >> >>>> item to add to KIPs -- protocol versions and behavior need to be > >> listed as > >> >>>> deprecated and the earliest version in which they'll be removed > >> should be > >> >>>> specified so users can understand which versions are guaranteed to > be > >> >>>> compatible, even if they're using (well-written) non-java clients. > >> >>>> > >> >>>> -Ewen > >> >>>> > >> >>>> > >> >>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers < > >> dana.pow...@gmail.com> wrote: > >> >>>>> > >> >>>>>> clients don't break on unknown errors > >> >>>>> > >> >>>>> maybe true for the official java clients, but I dont think the > >> assumption > >> >>>>> holds true for community-maintained clients and users of those > >> clients. > >> >>>>> kafka-python generally follows the fail-fast philosophy and raises > >> an > >> >>>>> exception on any unrecognized error code in any server response. > >> in this > >> >>>>> case, kafka-python allows users to set their own required-acks > >> policy when > >> >>>>> creating a producer instance. It is possible that users of > >> kafka-python > >> >>>>> have deployed producer code that uses ack>1 -- perhaps in > production > >> >>>>> environments -- and for those users the new error code will crash > >> their > >> >>>>> producer code. I would not be surprised if the same were true of > >> other > >> >>>>> community clients. > >> >>>>> > >> >>>>> *one reason for the fail-fast approach is that there isn't great > >> >>>>> documentation on what errors to expect for each request / response > >> -- so > >> >>>>> we > >> >>>>> use failures to alert that some error case is not handled > >> properly. and > >> >>>>> because of that, introducing new error cases without bumping the > api > >> >>>>> version is likely to cause those errors to get raised/thrown all > >> the way > >> >>>>> back up to the user. of course we (client maintainers) can fix > the > >> issues > >> >>>>> in the client libraries and suggest users upgrade, but it's not > the > >> ideal > >> >>>>> situation. > >> >>>>> > >> >>>>> > >> >>>>> long-winded way of saying: I agree w/ Joe. > >> >>>>> > >> >>>>> -Dana > >> >>>>> > >> >>>>> > >> >>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira < > >> gshap...@cloudera.com> > >> >>>>> wrote: > >> >>>>> > >> >>>>>> Is the protocol bump caused by the behavior change or the new > error > >> >>>>>> code? > >> >>>>>> > >> >>>>>> 1) IMO, error_codes are data, and clients can expect to receive > >> errors > >> >>>>>> that they don't understand (i.e. unknown errors). AFAIK, clients > >> don't > >> >>>>>> break on unknown errors, they are simple more challenging to > >> debug. If > >> >>>>>> we document the new behavior, then its definitely debuggable and > >> >>>>>> fixable. > >> >>>>>> > >> >>>>>> 2) The behavior change is basically a deprecation - i.e. acks > 1 > >> were > >> >>>>>> never documented, and are not supported by Kafka clients starting > >> with > >> >>>>>> version 0.8.2. I'm not sure this requires a protocol bump either, > >> >>>>>> although its a better case than new error codes. > >> >>>>>> > >> >>>>>> Thanks, > >> >>>>>> Gwen > >> >>>>>> > >> >>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein < > joe.st...@stealth.ly> > >> >>>>>> wrote: > >> >>>>>>> Looping in the mailing list that the client developers live on > >> because > >> >>>>>> they > >> >>>>>>> are all not on dev (though they should be if they want to be > >> helping > >> >>>>>>> to > >> >>>>>>> build the best client libraries they can). > >> >>>>>>> > >> >>>>>>> I whole hardily believe that we need to not break existing > >> >>>>>>> functionality > >> >>>>>> of > >> >>>>>>> the client protocol, ever. > >> >>>>>>> > >> >>>>>>> There are many reasons for this and we have other threads on the > >> >>>>>>> mailing > >> >>>>>>> list where we are discussing that topic (no pun intended) that I > >> don't > >> >>>>>> want > >> >>>>>>> to re-hash here. > >> >>>>>>> > >> >>>>>>> If we change wire protocol functionality OR the binary format > >> (either) > >> >>>>>>> we > >> >>>>>>> must bump version AND treat version as a feature flag with > >> backward > >> >>>>>>> compatibility support until it is deprecated for some time for > >> folks > >> >>>>>>> to > >> >>>>>> deal > >> >>>>>>> with it. > >> >>>>>>> > >> >>>>>>> match version = { > >> >>>>>>> case 0: keepDoingWhatWeWereDoing() > >> >>>>>>> case 1: doNewStuff() > >> >>>>>>> case 2: doEvenMoreNewStuff() > >> >>>>>>> } > >> >>>>>>> > >> >>>>>>> has to be a practice we adopt imho ... I know feature flags can > be > >> >>>>>> construed > >> >>>>>>> as "messy code" but I am eager to hear another (better? > >> different?) > >> >>>>>> solution > >> >>>>>>> to this. > >> >>>>>>> > >> >>>>>>> If we don't do a feature flag like this specifically with this > >> change > >> >>>>>> then > >> >>>>>>> what happens is that someone upgrades their brokers with a > rolling > >> >>>>>> restart > >> >>>>>>> in 0.8.3 and every single one of their producer requests start > to > >> fail > >> >>>>>> and > >> >>>>>>> they have a major production outage. eeeek!!!! > >> >>>>>>> > >> >>>>>>> I do 100% agree that > 1 makes no sense and we *REALLY* need > >> people to > >> >>>>>> start > >> >>>>>>> using 0,1,-1 but we need to-do that in a way that is going to > >> work for > >> >>>>>>> everyone. > >> >>>>>>> > >> >>>>>>> Old producers and consumers must keep working with new brokers > >> and if > >> >>>>>>> we > >> >>>>>> are > >> >>>>>>> not going to support that then I am unclear what the use of > >> "version" > >> >>>>>>> is > >> >>>>>>> based on our original intentions of having it because of the > >> >>>>>>> 0.7=>-0.8. > >> >>>>>> We > >> >>>>>>> said no more breaking changes when we did that. > >> >>>>>>> > >> >>>>>>> - Joe Stein > >> >>>>>>> > >> >>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava < > >> >>>>>> e...@confluent.io> > >> >>>>>>> wrote: > >> >>>>>>>> > >> >>>>>>>> Right, so this looks like it could create an issue similar to > >> what's > >> >>>>>>>> currently being discussed in > >> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649 where users > >> now get > >> >>>>>>>> errors > >> >>>>>>>> under conditions when they previously wouldn't. Old clients > won't > >> >>>>>>>> even > >> >>>>>>>> know > >> >>>>>>>> about the error code, so besides failing they won't even be > able > >> to > >> >>>>>>>> log > >> >>>>>>>> any > >> >>>>>>>> meaningful error messages. > >> >>>>>>>> > >> >>>>>>>> I think there are two options for compatibility: > >> >>>>>>>> > >> >>>>>>>> 1. An alternative change is to remove the ack > 1 code, but > >> silently > >> >>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This isn't the > >> same as > >> >>>>>>>> other > >> >>>>>>>> changes to behavior since the interaction between the client > and > >> >>>>>>>> server > >> >>>>>>>> remains the same, no error codes change, etc. The client might > >> just > >> >>>>>>>> see > >> >>>>>>>> some increased latency since the message might need to be > >> replicated > >> >>>>>>>> to > >> >>>>>>>> more brokers than they requested. > >> >>>>>>>> 2. Split this into two patches, one that bumps the protocol > >> version > >> >>>>>>>> on > >> >>>>>>>> that > >> >>>>>>>> message to include the new error code but maintains both old > (now > >> >>>>>>>> deprecated) and new behavior, then a second that would be > >> applied in > >> >>>>>>>> a > >> >>>>>>>> later release that removes the old protocol + code for handling > >> acks > >> >>>>>> 1. > >> >>>>>>>> > >> >>>>>>>> 2 is probably the right thing to do. If we specify the release > >> when > >> >>>>>> we'll > >> >>>>>>>> remove the deprecated protocol at the time of deprecation it > >> makes > >> >>>>>> things > >> >>>>>>>> a > >> >>>>>>>> lot easier for people writing non-java clients and could give > >> users > >> >>>>>> better > >> >>>>>>>> predictability (e.g. if clients are at most 1 major release > >> behind > >> >>>>>>>> brokers, > >> >>>>>>>> they'll remain compatible but possibly use deprecated > features). > >> >>>>>>>> > >> >>>>>>>> > >> >>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira < > >> gshap...@cloudera.com> > >> >>>>>>>> wrote: > >> >>>>>>>> > >> >>>>>>>>> Hi Kafka Devs, > >> >>>>>>>>> > >> >>>>>>>>> We are working on KAFKA-1697 - remove code related to ack>1 on > >> the > >> >>>>>>>>> broker. Per Neha's suggestion, I'd like to give everyone a > >> heads up > >> >>>>>>>>> on > >> >>>>>>>>> what these changes mean. > >> >>>>>>>>> > >> >>>>>>>>> Once this patch is included, any produce requests that include > >> >>>>>>>>> request.required.acks > 1 will result in an exception. This > >> will be > >> >>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I assume) > and > >> >>>>>>>>> UnknownException in existing versions (sorry, but I can't add > >> error > >> >>>>>>>>> codes retroactively). > >> >>>>>>>>> > >> >>>>>>>>> This behavior is already enforced by 0.8.2 producers (sync and > >> >>>>>>>>> new), > >> >>>>>>>>> but we expect impact on users with older producers that relied > >> on > >> >>>>>>>>> acks > >> >>>>>>>>>> 1 and external clients (i.e python, go, etc). > >> >>>>>>>>> > >> >>>>>>>>> Users who relied on acks > 1 are expected to switch to using > >> acks = > >> >>>>>>>>> -1 > >> >>>>>>>>> and a min.isr parameter than matches their user case. > >> >>>>>>>>> > >> >>>>>>>>> This change was discussed in the past in the context of > >> KAFKA-1555 > >> >>>>>>>>> (min.isr), but let us know if you have any questions or > concerns > >> >>>>>>>>> regarding this change. > >> >>>>>>>>> > >> >>>>>>>>> Gwen > >> >>>>>>>> > >> >>>>>>>> > >> >>>>>>>> > >> >>>>>>>> -- > >> >>>>>>>> Thanks, > >> >>>>>>>> Ewen > >> >>>>>>> > >> >>>>>>> > >> >>>>>>> -- > >> >>>>>>> You received this message because you are subscribed to the > Google > >> >>>>>>> Groups > >> >>>>>>> "kafka-clients" group. > >> >>>>>>> To unsubscribe from this group and stop receiving emails from > it, > >> send > >> >>>>>>> an > >> >>>>>>> email to kafka-clients+unsubscr...@googlegroups.com. > >> >>>>>>> To post to this group, send email to > >> kafka-clie...@googlegroups.com. > >> >>>>>>> Visit this group at > http://groups.google.com/group/kafka-clients. > >> >>>>>>> To view this discussion on the web visit > >> >>>>>> > >> >>>>>> > >> > https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com > >> >>>>>> . > >> >>>>>>> > >> >>>>>>> For more options, visit https://groups.google.com/d/optout. > >> >>>>>> > >> >>>>>> -- > >> >>>>>> You received this message because you are subscribed to the > Google > >> >>>>>> Groups > >> >>>>>> "kafka-clients" group. > >> >>>>>> To unsubscribe from this group and stop receiving emails from it, > >> send > >> >>>>>> an > >> >>>>>> email to kafka-clients+unsubscr...@googlegroups.com. > >> >>>>>> To post to this group, send email to > >> kafka-clie...@googlegroups.com. > >> >>>>>> Visit this group at http://groups.google.com/group/kafka-clients > . > >> >>>>>> To view this discussion on the web visit > >> >>>>>> > >> >>>>>> > >> > https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com > >> >>>>>> . > >> >>>>>> For more options, visit https://groups.google.com/d/optout. > >> >>>> > >> >>>> > >> >>>> > >> >>>> > >> >>>> -- > >> >>>> Thanks, > >> >>>> Ewen > >> > > > > > > > -- > Thanks, > Ewen > -- Thanks, Neha