Gwen -- KIP write up looks good. Deprecation schedule probably needs to be more specific, but I think that discussion probably needs to happen after a solution is agreed upon.
Jay -- I think "older clients will get a bad error message instead of a good one" isn't what would be happening with this change. Previously they wouldn't have received an error and they would have been able to produce messages. After the change they'll just receive this new error message which their clients can't possibly handle gracefully since it didn't exist when the client was written. Whether the acks > 1 setting was actually accomplishing what they thought doesn't matter. Someone could have reasonably read the docs on 0.8.1.1, thought acks = 2 is an ok setting for their applications, set it as a default across a bunch of apps, then follow the recommended upgrade path of updating brokers to 0.8.2 and all their apps will start failing on produce requests. On Thu, Jan 15, 2015 at 8:27 PM, Jay Kreps <jay.kr...@gmail.com> wrote: > This is a good case to discuss. > > Let's figure the general case of how we want to handle errors and get that > documented in the protocol. The problem right now is that we give no > guidance on this. I actually thought Gwen's suggestion made sense on the > guidance we should have given which is that we will enumerate a set of > errors and their meaning for each API but it is possible that other errors > will occur and they should be handled (maybe poorly) in the same way > UNKNOWN_ERROR is handled which is our normal escape hatch for things like > OOMException. > > I really do think we shouldn't be dogmatic here: In considering a change > to errors we should consider the potential ill-effect vs the complexity of > yet another protocol version. > > In this case I actually am not sure we need to bump the protocol because > the whole point of the change was to make a setting we think doesn't make > sense break, right? Well this will break it. It seems like the only > downside is that older clients will get a bad error message instead of a > good one. But it isn't like we will have rendered a client unusable, it is > just that they will need to change their config. > > -Jay > > On Thu, Jan 15, 2015 at 6:14 PM, Gwen Shapira <gshap...@cloudera.com> > wrote: > >> I created a KIP for this suggestion: >> >> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1+-+Remove+support+of+request.required.acks >> >> Basically documenting what was already discussed here. Comments will >> be awesome! >> >> Gwen >> >> On Thu, Jan 15, 2015 at 5:19 PM, Gwen Shapira <gshap...@cloudera.com> >> wrote: >> > The errors are part of the KIP process now, so I think the clients are >> safe :) >> > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals >> > >> > On Thu, Jan 15, 2015 at 5:12 PM, Steve Morin <steve.mo...@gmail.com> >> wrote: >> >> Agree errors should be part of the protocol >> >> >> >>> On Jan 15, 2015, at 17:59, Gwen Shapira <gshap...@cloudera.com> >> wrote: >> >>> >> >>> Hi, >> >>> >> >>> I got convinced by Joe and Dana that errors are indeed part of the >> >>> protocol and can't be randomly added. >> >>> >> >>> So, it looks like we need to bump version of ProduceRequest in the >> >>> following way: >> >>> Version 0 -> accept acks >1. I think we should keep the existing >> >>> behavior too (i.e. not replace it with -1) to avoid surprising >> >>> clients, but I'm willing to hear other opinions. >> >>> Version 1 -> do not accept acks >1 and return an error. >> >>> Are we ok with the error I added in KAFKA-1697? We can use something >> >>> less specific like InvalidRequestParameter. This error can be reused >> >>> in the future and reduce the need to add errors, but will also be less >> >>> clear to the client and its users. Maybe even add the error message >> >>> string to the protocol in addition to the error code? (since we are >> >>> bumping versions....) >> >>> >> >>> I think maintaining the old version throughout 0.8.X makes sense. IMO >> >>> dropping it for 0.9 is feasible, but I'll let client owners help make >> >>> that call. >> >>> >> >>> Am I missing anything? Should I start a KIP? It seems like a KIP-type >> >>> discussion :) >> >>> >> >>> Gwen >> >>> >> >>> >> >>> On Thu, Jan 15, 2015 at 2:31 PM, Ewen Cheslack-Postava >> >>> <e...@confluent.io> wrote: >> >>>> Gwen, >> >>>> >> >>>> I think the only option that wouldn't require a protocol version >> change is >> >>>> the one where acks > 1 is converted to acks = -1 since it's the only >> one >> >>>> that doesn't potentially break older clients. The protocol guide >> says that >> >>>> the expected upgrade path is servers first, then clients, so old >> clients, >> >>>> including non-java clients, that may be using acks > 1 should be >> able to >> >>>> work with a new broker version. >> >>>> >> >>>> It's more work, but I think dealing with the protocol change is the >> right >> >>>> thing to do since it eventually gets us to the behavior I think is >> better -- >> >>>> the broker should reject requests with invalid values. I think Joe >> and I >> >>>> were basically in agreement. In my mind the major piece missing from >> his >> >>>> description is how long we're going to maintain his "case 0" >> behavior. It's >> >>>> impractical to maintain old versions forever, but it sounds like >> there >> >>>> hasn't been a decision on how long to maintain them. Maybe that's >> another >> >>>> item to add to KIPs -- protocol versions and behavior need to be >> listed as >> >>>> deprecated and the earliest version in which they'll be removed >> should be >> >>>> specified so users can understand which versions are guaranteed to be >> >>>> compatible, even if they're using (well-written) non-java clients. >> >>>> >> >>>> -Ewen >> >>>> >> >>>> >> >>>>> On Thu, Jan 15, 2015 at 12:52 PM, Dana Powers < >> dana.pow...@gmail.com> wrote: >> >>>>> >> >>>>>> clients don't break on unknown errors >> >>>>> >> >>>>> maybe true for the official java clients, but I dont think the >> assumption >> >>>>> holds true for community-maintained clients and users of those >> clients. >> >>>>> kafka-python generally follows the fail-fast philosophy and raises >> an >> >>>>> exception on any unrecognized error code in any server response. >> in this >> >>>>> case, kafka-python allows users to set their own required-acks >> policy when >> >>>>> creating a producer instance. It is possible that users of >> kafka-python >> >>>>> have deployed producer code that uses ack>1 -- perhaps in production >> >>>>> environments -- and for those users the new error code will crash >> their >> >>>>> producer code. I would not be surprised if the same were true of >> other >> >>>>> community clients. >> >>>>> >> >>>>> *one reason for the fail-fast approach is that there isn't great >> >>>>> documentation on what errors to expect for each request / response >> -- so >> >>>>> we >> >>>>> use failures to alert that some error case is not handled >> properly. and >> >>>>> because of that, introducing new error cases without bumping the api >> >>>>> version is likely to cause those errors to get raised/thrown all >> the way >> >>>>> back up to the user. of course we (client maintainers) can fix the >> issues >> >>>>> in the client libraries and suggest users upgrade, but it's not the >> ideal >> >>>>> situation. >> >>>>> >> >>>>> >> >>>>> long-winded way of saying: I agree w/ Joe. >> >>>>> >> >>>>> -Dana >> >>>>> >> >>>>> >> >>>>> On Thu, Jan 15, 2015 at 12:07 PM, Gwen Shapira < >> gshap...@cloudera.com> >> >>>>> wrote: >> >>>>> >> >>>>>> Is the protocol bump caused by the behavior change or the new error >> >>>>>> code? >> >>>>>> >> >>>>>> 1) IMO, error_codes are data, and clients can expect to receive >> errors >> >>>>>> that they don't understand (i.e. unknown errors). AFAIK, clients >> don't >> >>>>>> break on unknown errors, they are simple more challenging to >> debug. If >> >>>>>> we document the new behavior, then its definitely debuggable and >> >>>>>> fixable. >> >>>>>> >> >>>>>> 2) The behavior change is basically a deprecation - i.e. acks > 1 >> were >> >>>>>> never documented, and are not supported by Kafka clients starting >> with >> >>>>>> version 0.8.2. I'm not sure this requires a protocol bump either, >> >>>>>> although its a better case than new error codes. >> >>>>>> >> >>>>>> Thanks, >> >>>>>> Gwen >> >>>>>> >> >>>>>> On Thu, Jan 15, 2015 at 10:10 AM, Joe Stein <joe.st...@stealth.ly> >> >>>>>> wrote: >> >>>>>>> Looping in the mailing list that the client developers live on >> because >> >>>>>> they >> >>>>>>> are all not on dev (though they should be if they want to be >> helping >> >>>>>>> to >> >>>>>>> build the best client libraries they can). >> >>>>>>> >> >>>>>>> I whole hardily believe that we need to not break existing >> >>>>>>> functionality >> >>>>>> of >> >>>>>>> the client protocol, ever. >> >>>>>>> >> >>>>>>> There are many reasons for this and we have other threads on the >> >>>>>>> mailing >> >>>>>>> list where we are discussing that topic (no pun intended) that I >> don't >> >>>>>> want >> >>>>>>> to re-hash here. >> >>>>>>> >> >>>>>>> If we change wire protocol functionality OR the binary format >> (either) >> >>>>>>> we >> >>>>>>> must bump version AND treat version as a feature flag with >> backward >> >>>>>>> compatibility support until it is deprecated for some time for >> folks >> >>>>>>> to >> >>>>>> deal >> >>>>>>> with it. >> >>>>>>> >> >>>>>>> match version = { >> >>>>>>> case 0: keepDoingWhatWeWereDoing() >> >>>>>>> case 1: doNewStuff() >> >>>>>>> case 2: doEvenMoreNewStuff() >> >>>>>>> } >> >>>>>>> >> >>>>>>> has to be a practice we adopt imho ... I know feature flags can be >> >>>>>> construed >> >>>>>>> as "messy code" but I am eager to hear another (better? >> different?) >> >>>>>> solution >> >>>>>>> to this. >> >>>>>>> >> >>>>>>> If we don't do a feature flag like this specifically with this >> change >> >>>>>> then >> >>>>>>> what happens is that someone upgrades their brokers with a rolling >> >>>>>> restart >> >>>>>>> in 0.8.3 and every single one of their producer requests start to >> fail >> >>>>>> and >> >>>>>>> they have a major production outage. eeeek!!!! >> >>>>>>> >> >>>>>>> I do 100% agree that > 1 makes no sense and we *REALLY* need >> people to >> >>>>>> start >> >>>>>>> using 0,1,-1 but we need to-do that in a way that is going to >> work for >> >>>>>>> everyone. >> >>>>>>> >> >>>>>>> Old producers and consumers must keep working with new brokers >> and if >> >>>>>>> we >> >>>>>> are >> >>>>>>> not going to support that then I am unclear what the use of >> "version" >> >>>>>>> is >> >>>>>>> based on our original intentions of having it because of the >> >>>>>>> 0.7=>-0.8. >> >>>>>> We >> >>>>>>> said no more breaking changes when we did that. >> >>>>>>> >> >>>>>>> - Joe Stein >> >>>>>>> >> >>>>>>> On Thu, Jan 15, 2015 at 12:38 PM, Ewen Cheslack-Postava < >> >>>>>> e...@confluent.io> >> >>>>>>> wrote: >> >>>>>>>> >> >>>>>>>> Right, so this looks like it could create an issue similar to >> what's >> >>>>>>>> currently being discussed in >> >>>>>>>> https://issues.apache.org/jira/browse/KAFKA-1649 where users >> now get >> >>>>>>>> errors >> >>>>>>>> under conditions when they previously wouldn't. Old clients won't >> >>>>>>>> even >> >>>>>>>> know >> >>>>>>>> about the error code, so besides failing they won't even be able >> to >> >>>>>>>> log >> >>>>>>>> any >> >>>>>>>> meaningful error messages. >> >>>>>>>> >> >>>>>>>> I think there are two options for compatibility: >> >>>>>>>> >> >>>>>>>> 1. An alternative change is to remove the ack > 1 code, but >> silently >> >>>>>>>> "upgrade" requests with acks > 1 to acks = -1. This isn't the >> same as >> >>>>>>>> other >> >>>>>>>> changes to behavior since the interaction between the client and >> >>>>>>>> server >> >>>>>>>> remains the same, no error codes change, etc. The client might >> just >> >>>>>>>> see >> >>>>>>>> some increased latency since the message might need to be >> replicated >> >>>>>>>> to >> >>>>>>>> more brokers than they requested. >> >>>>>>>> 2. Split this into two patches, one that bumps the protocol >> version >> >>>>>>>> on >> >>>>>>>> that >> >>>>>>>> message to include the new error code but maintains both old (now >> >>>>>>>> deprecated) and new behavior, then a second that would be >> applied in >> >>>>>>>> a >> >>>>>>>> later release that removes the old protocol + code for handling >> acks >> >>>>>> 1. >> >>>>>>>> >> >>>>>>>> 2 is probably the right thing to do. If we specify the release >> when >> >>>>>> we'll >> >>>>>>>> remove the deprecated protocol at the time of deprecation it >> makes >> >>>>>> things >> >>>>>>>> a >> >>>>>>>> lot easier for people writing non-java clients and could give >> users >> >>>>>> better >> >>>>>>>> predictability (e.g. if clients are at most 1 major release >> behind >> >>>>>>>> brokers, >> >>>>>>>> they'll remain compatible but possibly use deprecated features). >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> On Wed, Jan 14, 2015 at 3:51 PM, Gwen Shapira < >> gshap...@cloudera.com> >> >>>>>>>> wrote: >> >>>>>>>> >> >>>>>>>>> Hi Kafka Devs, >> >>>>>>>>> >> >>>>>>>>> We are working on KAFKA-1697 - remove code related to ack>1 on >> the >> >>>>>>>>> broker. Per Neha's suggestion, I'd like to give everyone a >> heads up >> >>>>>>>>> on >> >>>>>>>>> what these changes mean. >> >>>>>>>>> >> >>>>>>>>> Once this patch is included, any produce requests that include >> >>>>>>>>> request.required.acks > 1 will result in an exception. This >> will be >> >>>>>>>>> InvalidRequiredAcks in new versions (0.8.3 and up, I assume) and >> >>>>>>>>> UnknownException in existing versions (sorry, but I can't add >> error >> >>>>>>>>> codes retroactively). >> >>>>>>>>> >> >>>>>>>>> This behavior is already enforced by 0.8.2 producers (sync and >> >>>>>>>>> new), >> >>>>>>>>> but we expect impact on users with older producers that relied >> on >> >>>>>>>>> acks >> >>>>>>>>>> 1 and external clients (i.e python, go, etc). >> >>>>>>>>> >> >>>>>>>>> Users who relied on acks > 1 are expected to switch to using >> acks = >> >>>>>>>>> -1 >> >>>>>>>>> and a min.isr parameter than matches their user case. >> >>>>>>>>> >> >>>>>>>>> This change was discussed in the past in the context of >> KAFKA-1555 >> >>>>>>>>> (min.isr), but let us know if you have any questions or concerns >> >>>>>>>>> regarding this change. >> >>>>>>>>> >> >>>>>>>>> Gwen >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> -- >> >>>>>>>> Thanks, >> >>>>>>>> Ewen >> >>>>>>> >> >>>>>>> >> >>>>>>> -- >> >>>>>>> You received this message because you are subscribed to the Google >> >>>>>>> Groups >> >>>>>>> "kafka-clients" group. >> >>>>>>> To unsubscribe from this group and stop receiving emails from it, >> send >> >>>>>>> an >> >>>>>>> email to kafka-clients+unsubscr...@googlegroups.com. >> >>>>>>> To post to this group, send email to >> kafka-clie...@googlegroups.com. >> >>>>>>> Visit this group at http://groups.google.com/group/kafka-clients. >> >>>>>>> To view this discussion on the web visit >> >>>>>> >> >>>>>> >> https://groups.google.com/d/msgid/kafka-clients/CAA7ooCBtH2JjyQsArdx_%3DV25B4O1QJk0YvOu9U6kYt9sB4aqng%40mail.gmail.com >> >>>>>> . >> >>>>>>> >> >>>>>>> For more options, visit https://groups.google.com/d/optout. >> >>>>>> >> >>>>>> -- >> >>>>>> You received this message because you are subscribed to the Google >> >>>>>> Groups >> >>>>>> "kafka-clients" group. >> >>>>>> To unsubscribe from this group and stop receiving emails from it, >> send >> >>>>>> an >> >>>>>> email to kafka-clients+unsubscr...@googlegroups.com. >> >>>>>> To post to this group, send email to >> kafka-clie...@googlegroups.com. >> >>>>>> Visit this group at http://groups.google.com/group/kafka-clients. >> >>>>>> To view this discussion on the web visit >> >>>>>> >> >>>>>> >> https://groups.google.com/d/msgid/kafka-clients/CAHBV8WeUebxi%2B%2BSbjz8E9Yf4u4hkcPJ80Xsj0XTKcTac%3D%2B613A%40mail.gmail.com >> >>>>>> . >> >>>>>> For more options, visit https://groups.google.com/d/optout. >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> -- >> >>>> Thanks, >> >>>> Ewen >> > > -- Thanks, Ewen