Yeah guys, I think this is one where the cure is worse than the disease. Let's just have the two close methods, that is confusing enough. Adding a third won't help. I understand that having a few microseconds before the thread shuts down could be unexpected but I think there is nothing inherently wrong with that.
-Jay On Thu, Mar 19, 2015 at 3:09 PM, Jiangjie Qin <j...@linkedin.com.invalid> wrote: > Hi Neha, > > As Joel said, (1) and (2) provides almost exact same function as far as I > can see. I actually have no strong preference. > The only difference is that (2) provides the close() semantic meaning of > making sure all the resources have gone away, at the cost of adding > another abort() interface. > The simplicity of (1) is also attractive but at a slight cost of clarity. > I am slightly leaning towards (2) but would also be OK if we pursue (1). > > Thanks. > > Jiangjie (Becket) Qin > > > On 3/19/15, 2:46 PM, "Joel Koshy" <jjkosh...@gmail.com> wrote: > > >(1) should work, but as Jun suggested earlier in the thread it is > >slightly misleading. The (intuitive) post-condition of "close" is that > >the producer has shutdown - i.e., its sender thread, closed its > >metrics, serializer/deserializer, etc. That is not necessarily a > >post-condition of "close(0)" although one can contend that if you call > >the method in non-blocking mode (zero timeout) then it is reasonable > >to not expect that post-condition. > > > >So I think that although (2) adds one more API it brings "simplicity > >by virtue of overall clarity". > > > >I would be in favor of (2) but not strongly opposed to (1). > > > >Thanks, > > > >Joel > > > >On Thu, Mar 19, 2015 at 10:05:04AM -0700, Neha Narkhede wrote: > >> I'm in favor of (1) for the sake of simplicity and as Jay mentions to > >> reduce the number of different APIs. Can you explain when (1) does not > >>work? > >> > >> On Wed, Mar 18, 2015 at 2:52 PM, Jay Kreps <jay.kr...@gmail.com> wrote: > >> > >> > Personally I'm in favor of (1) just to reduce the number of different > >>APIs. > >> > People will find the difference between abort and close subtle and > >> > confusing and the only instance where you want it is this somewhat > >>unusual > >> > case you guys are pursuing, right? > >> > > >> > -Jay > >> > > >> > On Wed, Mar 18, 2015 at 2:13 PM, Jiangjie Qin > >><j...@linkedin.com.invalid> > >> > wrote: > >> > > >> > > It looks we have another option and are now deciding between the > >> > following > >> > > two interfaces: > >> > > > >> > > 1. Close() + close(timeout) > >> > > - timeout could be either positive or zero. > >> > > - only close(0) can be called from sender thread > >> > > > >> > > 2. Close() + abort() + close(timeout) > >> > > - timeout can either be positive or zero > >> > > - only abort() can be called from sender thread > >> > > > >> > > - abort() is equivalent to close(0) in 1) but does not join sender > >> > > thread and does not close metrics. > >> > > - Another thread has to call close() or close(timeout) in order > >>to make > >> > > sure the resources in producer are gone. > >> > > > >> > > The tow approach provides the same function we need, the difference > >>is > >> > > approach 2) follows convention of close() and abort(). On the other > >>hand, > >> > > approach 1) saves one interface compared with approach 2) but does > >>not > >> > > follow the convention. > >> > > > >> > > When the two approaches come to user code, it is probably something > >>like > >> > > this: > >> > > > >> > > Try { > >> > > While(!finished) > >> > > Producer.send(record, callback) > >> > > } catch (Exception e) { > >> > > Producer.close(5) > >> > > } > >> > > > >> > > Class CallbackImpl implements Callback { > >> > > onCompletion(RecordMetadata metadata Exception e) { > >> > > If (e != null) > >> > > Abort() / close() > >> > > } > >> > > } > >> > > > >> > > Because the two approach leads to almost the same user code, > >>assuming > >> > > users are always calling producer.close() as a clean up step, > >>personally > >> > I > >> > > prefer approach 2) as it follows convention. > >> > > > >> > > Any thoughts? > >> > > > >> > > Jiangjie (Becket) Qin > >> > > > >> > > > >> > > On 3/17/15, 10:25 AM, "Jiangjie Qin" <j...@linkedin.com> wrote: > >> > > > >> > > >Hi Jun, > >> > > > > >> > > >Yes, as Guozhang said, the main reason we set a flag is because > >>close(0) > >> > > >is expected to be called by sender thread itself. > >> > > >If we want to maintain the semantic meaning of close(), one > >>alternative > >> > is > >> > > >to have an abort() method does the same thing as close(0) except > >> > cleanup. > >> > > >And in close(timeout), after timeout we call abort() and join the > >>sender > >> > > >thread. This was one of the previous proposal. We merged abort to > >> > close(0) > >> > > >because they are almost doing the same thing. But from what you > >> > mentioned, > >> > > >it might make sense to have two separate methods. > >> > > > > >> > > >Thanks. > >> > > > > >> > > >Jiangjie (Becket) Qin > >> > > > > >> > > >On 3/16/15, 10:31 PM, "Guozhang Wang" <wangg...@gmail.com> wrote: > >> > > > > >> > > >>Yeah in this sense the sender thread will not exist immediately > >>in the > >> > > >>close(0) call, but will only terminate after the current response > >>batch > >> > > >>has > >> > > >>been processed, as will the producer instance itself. > >> > > >> > >> > > >>There is a reason for this though: for a clean shutdown the caller > >> > thread > >> > > >>has to wait for the sender thread to join before closing the > >>producer > >> > > >>instance, but this cannot be achieve if close(0) is called by the > >> > sender > >> > > >>thread itself (for example in KAFKA-1659, there is a proposal from > >> > Andrew > >> > > >>Stein on using thread.interrupt and thread.stop, but if it is > >>called by > >> > > >>the > >> > > >>ioThread itself the stop call will fail). Hence we came up with > >>the > >> > flag > >> > > >>approach to let the sender thread to close as soon as it is at the > >> > > >>barrier > >> > > >>of the run loop. > >> > > >> > >> > > >>Guozhang > >> > > >> > >> > > >>On Mon, Mar 16, 2015 at 9:41 PM, Jun Rao <j...@confluent.io> > wrote: > >> > > >> > >> > > >>> Hmm, does that mean that after close(0), the sender thread is > >>not > >> > > >>>necessary > >> > > >>> gone? Normally, after closing an entity, we expect all internal > >> > threads > >> > > >>> associated with the entity are shut down completely. > >> > > >>> > >> > > >>> Thanks, > >> > > >>> > >> > > >>> Jun > >> > > >>> > >> > > >>> On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin > >> > > >>><j...@linkedin.com.invalid> > >> > > >>> wrote: > >> > > >>> > >> > > >>> > Hi Jun, > >> > > >>> > > >> > > >>> > Close(0) will set two flags in sender. Running=false and a > >>newly > >> > > >>>added > >> > > >>> > forceClose=true. It will also set accumulator.closed=true so > >>no > >> > > >>>further > >> > > >>> > producer.send() will succeed. > >> > > >>> > The sender thread will finish executing all the callbacks in > >> > current > >> > > >>> batch > >> > > >>> > of responses, then it will see the forceClose flag. It will > >>just > >> > fail > >> > > >>>all > >> > > >>> > the incomplete batches in the producer and exit. > >> > > >>> > So close(0) is a non-blocking call and sender thread will not > >>try > >> > to > >> > > >>>join > >> > > >>> > itself in close(0). > >> > > >>> > > >> > > >>> > Thanks. > >> > > >>> > > >> > > >>> > Jiangjie (Becket) Qin > >> > > >>> > > >> > > >>> > On 3/16/15, 2:50 PM, "Jun Rao" <j...@confluent.io> wrote: > >> > > >>> > > >> > > >>> > >How does close(0) work if it's called from the sender > >>thread? If > >> > > >>> close(0) > >> > > >>> > >needs to wait for the sender thread to join, wouldn't this > >>cause a > >> > > >>> > >deadlock? > >> > > >>> > > > >> > > >>> > >Thanks, > >> > > >>> > > > >> > > >>> > >Jun > >> > > >>> > > > >> > > >>> > >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin > >> > > >>><j...@linkedin.com.invalid > >> > > >>> > > >> > > >>> > >wrote: > >> > > >>> > > > >> > > >>> > >> Thanks Guozhang. It wouldn’t be as thoroughly considered > >>without > >> > > >>> > >> discussing with you :) > >> > > >>> > >> > >> > > >>> > >> Jiangjie (Becket) Qin > >> > > >>> > >> > >> > > >>> > >> On 3/16/15, 1:07 PM, "Guozhang Wang" <wangg...@gmail.com> > >> > wrote: > >> > > >>> > >> > >> > > >>> > >> >Thanks Jiangjie, > >> > > >>> > >> > > >> > > >>> > >> >After talking to you offline on this, I have been > >>convinced and > >> > > >>> > >>changed my > >> > > >>> > >> >preference to blocking. The immediate shutdown approach > >>does > >> > have > >> > > >>> some > >> > > >>> > >> >unsafeness in some cases. > >> > > >>> > >> > > >> > > >>> > >> >Guozhang > >> > > >>> > >> > > >> > > >>> > >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin > >> > > >>> > >><j...@linkedin.com.invalid > >> > > >>> > >> > > >> > > >>> > >> >wrote: > >> > > >>> > >> > > >> > > >>> > >> >> It looks that the problem we want to solve and the > >>purpose we > >> > > >>>want > >> > > >>> to > >> > > >>> > >> >> achieve is: > >> > > >>> > >> >> If user uses close() in callback, we want to let user be > >> > aware > >> > > >>>that > >> > > >>> > >>they > >> > > >>> > >> >> should use close(0) instead of close() in the callback. > >> > > >>> > >> >> > >> > > >>> > >> >> We have agreed that we will have an error log to inform > >>user > >> > > >>>about > >> > > >>> > >>this > >> > > >>> > >> >> mis-usage. The options differ in the way how we can > >>force > >> > user > >> > > >>>to > >> > > >>> > >>take a > >> > > >>> > >> >> look at that error log. > >> > > >>> > >> >> There are two scenarios: > >> > > >>> > >> >> 1. User does not expect the program to exit. > >> > > >>> > >> >> 2. User expect the program to exit. > >> > > >>> > >> >> > >> > > >>> > >> >> For scenario 1), blocking will probably delay the > >>discovery > >> > of > >> > > >>>the > >> > > >>> > >> >> problem. Calling close(0) exposes the problem quicker. > >>In > >> > this > >> > > >>> > >>scenario > >> > > >>> > >> >> producer just encounter a send failure when running > >>normally. > >> > > >>> > >> >> For scenario 2), blocking will expose the problem quick. > >> > > >>>Calling > >> > > >>> > >> >>close(-1) > >> > > >>> > >> >> might hide the problem. This scenario might include: a) > >>Unit > >> > > >>>test > >> > > >>> > >>for a > >> > > >>> > >> >> send failure. b) Message sending during a close() call > >>from a > >> > > >>>user > >> > > >>> > >> >>thread. > >> > > >>> > >> >> > >> > > >>> > >> >> So as a summary table: > >> > > >>> > >> >> > >> > > >>> > >> >> Scenario 1) > >> > Scenario > >> > > >>>2) > >> > > >>> > >> >> > >> > > >>> > >> >> Blocking Delay problem discovery Guaranteed > >> > > >>>problem > >> > > >>> > >> >>discovery > >> > > >>> > >> >> > >> > > >>> > >> >> Close(-1) Immediate problem discovery Problem > >>might > >> > be > >> > > >>> hidden > >> > > >>> > >> >> > >> > > >>> > >> >> > >> > > >>> > >> >> Personally I prefer blocking because it seems providing > >>more > >> > > >>> > >>guarantees > >> > > >>> > >> >> and safer. > >> > > >>> > >> >> > >> > > >>> > >> >> Thanks. > >> > > >>> > >> >> > >> > > >>> > >> >> Jiangjie (Becket) Qin > >> > > >>> > >> >> > >> > > >>> > >> >> > >> > > >>> > >> >> On 3/16/15, 10:11 AM, "Guozhang Wang" > >><wangg...@gmail.com> > >> > > >>>wrote: > >> > > >>> > >> >> > >> > > >>> > >> >> >HI Jiangjie, > >> > > >>> > >> >> > > >> > > >>> > >> >> >As far as I understand calling close() in the ioThread > >>is > >> > not > >> > > >>> > >>common, > >> > > >>> > >> >>as > >> > > >>> > >> >> >it > >> > > >>> > >> >> >may only trigger when we saw some non-retriable error. > >>Hence > >> > > >>>when > >> > > >>> > >>user > >> > > >>> > >> >>run > >> > > >>> > >> >> >their program it is unlikely that close() will be > >>triggered > >> > > >>>and > >> > > >>> > >>problem > >> > > >>> > >> >> >will be detected. So it seems to me that from the error > >> > > >>>detection > >> > > >>> > >> >>aspect > >> > > >>> > >> >> >these two options seems to be the same as people will > >> > usually > >> > > >>> > >>detect it > >> > > >>> > >> >> >from the producer metrics all dropping to 0. > >> > > >>> > >> >> > > >> > > >>> > >> >> >Guozhang > >> > > >>> > >> >> > > >> > > >>> > >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin > >> > > >>> > >> >><j...@linkedin.com.invalid> > >> > > >>> > >> >> >wrote: > >> > > >>> > >> >> > > >> > > >>> > >> >> >> It seems there are two options we can choose from > >>when > >> > > >>>close() > >> > > >>> is > >> > > >>> > >> >>called > >> > > >>> > >> >> >> from sender thread (callback): > >> > > >>> > >> >> >> 1. Log an error and close the producer using > >>close(-1) > >> > > >>> > >> >> >> 2. Log an error and block. > >> > > >>> > >> >> >> (Throwing an exception will not work because we > >>catch all > >> > > >>>the > >> > > >>> > >> >>exception > >> > > >>> > >> >> >> thrown from user callback. It will just lead to an > >>error > >> > > >>>log.) > >> > > >>> > >> >> >> > >> > > >>> > >> >> >> My concern for the first option is that the producer > >>will > >> > be > >> > > >>> > >>closed > >> > > >>> > >> >>even > >> > > >>> > >> >> >> if we logged and error. I am wondering if some user > >>would > >> > > >>>not > >> > > >>> even > >> > > >>> > >> >>take > >> > > >>> > >> >> >>a > >> > > >>> > >> >> >> look at the log if producer is closed normally. > >>Because > >> > from > >> > > >>>the > >> > > >>> > >> >> >>programs > >> > > >>> > >> >> >> behavior, everything looks good. If that is the > >>case, the > >> > > >>>error > >> > > >>> > >> >>message > >> > > >>> > >> >> >>we > >> > > >>> > >> >> >> logged probably will just be ignored until some day > >>when > >> > > >>>people > >> > > >>> > >>check > >> > > >>> > >> >> >>the > >> > > >>> > >> >> >> log and see it. > >> > > >>> > >> >> >> > >> > > >>> > >> >> >> As for the second option, because producer does not > >>close > >> > > >>>but > >> > > >>> > >>blocks. > >> > > >>> > >> >> >>User > >> > > >>> > >> >> >> will notice this the first time they run the > >>program. They > >> > > >>> > >>probably > >> > > >>> > >> >>will > >> > > >>> > >> >> >> look at the log to see why producer could not be > >>closed > >> > and > >> > > >>>they > >> > > >>> > >>will > >> > > >>> > >> >> >>see > >> > > >>> > >> >> >> the error log we put there. So they will get informed > >> > about > >> > > >>>this > >> > > >>> > >> >> >>mis-usage > >> > > >>> > >> >> >> of close() in sender thread the first time they run > >>the > >> > code > >> > > >>> > >>instead > >> > > >>> > >> >>of > >> > > >>> > >> >> >> some time later. > >> > > >>> > >> >> >> > >> > > >>> > >> >> >> Personally I prefer the second one because it is more > >> > > >>>obvious > >> > > >>> that > >> > > >>> > >> >> >> something was wrong. > >> > > >>> > >> >> >> > >> > > >>> > >> >> >> Jiangjie (Becket) Qin > >> > > >>> > >> >> >> > >> > > >>> > >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang" > >><wangg...@gmail.com> > >> > > >>> wrote: > >> > > >>> > >> >> >> > >> > > >>> > >> >> >> >Yeah I agree we should not silently change the > >>behavior > >> > of > >> > > >>>the > >> > > >>> > >> >>function > >> > > >>> > >> >> >> >with the given parameters; and I would prefer > >> > > >>> > >> >> >>error-logging-and-shutdown > >> > > >>> > >> >> >> >over blocking when close(>0) is used, since as Neha > >> > > >>>suggested > >> > > >>> > >> >>blocking > >> > > >>> > >> >> >> >would also not proceed with sending any data, bu > >>will > >> > just > >> > > >>>let > >> > > >>> > >> >>users to > >> > > >>> > >> >> >> >realize the issue later than sooner. > >> > > >>> > >> >> >> > > >> > > >>> > >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede > >> > > >>> > >><n...@confluent.io> > >> > > >>> > >> >> >>wrote: > >> > > >>> > >> >> >> > > >> > > >>> > >> >> >> >> > > >> > > >>> > >> >> >> >> > And I also agree it is better if we can make > >>producer > >> > > >>>block > >> > > >>> > >>when > >> > > >>> > >> >> >> >> > close() is called from sender thread so user > >>will > >> > > >>>notice > >> > > >>> > >> >>something > >> > > >>> > >> >> >> >>went > >> > > >>> > >> >> >> >> > wrong. > >> > > >>> > >> >> >> >> > >> > > >>> > >> >> >> >> > >> > > >>> > >> >> >> >> This isn't a great experience either. Why can't > >>we just > >> > > >>>throw > >> > > >>> > >>an > >> > > >>> > >> >> >> >>exception > >> > > >>> > >> >> >> >> for a behavior we know is incorrect and we'd like > >>the > >> > > >>>user to > >> > > >>> > >> >>know. > >> > > >>> > >> >> >> >> Blocking as a means of doing that seems wrong and > >> > > >>>annoying. > >> > > >>> > >> >> >> >> > >> > > >>> > >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps > >> > > >>> > >><jay.kr...@gmail.com> > >> > > >>> > >> >> >> wrote: > >> > > >>> > >> >> >> >> > >> > > >>> > >> >> >> >> > Cool. > >> > > >>> > >> >> >> >> > > >> > > >>> > >> >> >> >> > I think blocking is good or alternately > >>throwing an > >> > > >>> exception > >> > > >>> > >> >> >>directly > >> > > >>> > >> >> >> >> from > >> > > >>> > >> >> >> >> > close(). Basically I would just worry about > >>subtly > >> > > >>>doing > >> > > >>> > >> >>something > >> > > >>> > >> >> >> >> slightly > >> > > >>> > >> >> >> >> > different from what the user asked for as it > >>will be > >> > > >>>hard > >> > > >>> to > >> > > >>> > >> >>notice > >> > > >>> > >> >> >> >>that > >> > > >>> > >> >> >> >> > behavior difference. > >> > > >>> > >> >> >> >> > > >> > > >>> > >> >> >> >> > -Jay > >> > > >>> > >> >> >> >> > > >> > > >>> > >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin > >> > > >>> > >> >> >> >><j...@linkedin.com.invalid > >> > > >>> > >> >> >> >> > > >> > > >>> > >> >> >> >> > wrote: > >> > > >>> > >> >> >> >> > > >> > > >>> > >> >> >> >> > > Hi Jay, > >> > > >>> > >> >> >> >> > > > >> > > >>> > >> >> >> >> > > I have modified the KIP as you suggested. I > >>thinks > >> > as > >> > > >>> long > >> > > >>> > >>as > >> > > >>> > >> >>we > >> > > >>> > >> >> >> >>have > >> > > >>> > >> >> >> >> > > consistent define for timeout across Kafka > >> > interface, > >> > > >>> there > >> > > >>> > >> >>would > >> > > >>> > >> >> >> >>be no > >> > > >>> > >> >> >> >> > > problem. And I also agree it is better if we > >>can > >> > make > >> > > >>> > >>producer > >> > > >>> > >> >> >>block > >> > > >>> > >> >> >> >> when > >> > > >>> > >> >> >> >> > > close() is called from sender thread so user > >>will > >> > > >>>notice > >> > > >>> > >> >> >>something > >> > > >>> > >> >> >> >>went > >> > > >>> > >> >> >> >> > > wrong. > >> > > >>> > >> >> >> >> > > > >> > > >>> > >> >> >> >> > > Thanks. > >> > > >>> > >> >> >> >> > > > >> > > >>> > >> >> >> >> > > Jiangjie (Becket) Qin > >> > > >>> > >> >> >> >> > > > >> > > >>> > >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps" > >> > > >>><jay.kr...@gmail.com> > >> > > >>> > >> wrote: > >> > > >>> > >> >> >> >> > > > >> > > >>> > >> >> >> >> > > >Hey Jiangjie, > >> > > >>> > >> >> >> >> > > > > >> > > >>> > >> >> >> >> > > >I think this is going to be very confusing > >>that > >> > > >>> > >> >> >> >> > > > close(0) waits indefinitely and > >> > > >>> > >> >> >> >> > > > close(-1) waits for 0. > >> > > >>> > >> >> >> >> > > >I understand this appears in other apis, but > >>it > >> > is a > >> > > >>> > >>constant > >> > > >>> > >> >> >> >>cause of > >> > > >>> > >> >> >> >> > > >bugs. Let's not repeat that mistake. > >> > > >>> > >> >> >> >> > > > > >> > > >>> > >> >> >> >> > > >Let's make close(0) wait for 0. We don't > >>need a > >> > way > >> > > >>>to > >> > > >>> > >>wait > >> > > >>> > >> >> >> >> indefinitely > >> > > >>> > >> >> >> >> > > >as > >> > > >>> > >> >> >> >> > > >we already have close() so having a magical > >> > constant > >> > > >>>for > >> > > >>> > >> >>that is > >> > > >>> > >> >> >> >> > > >redundant. > >> > > >>> > >> >> >> >> > > > > >> > > >>> > >> >> >> >> > > >Calling close() from the I/O thread was > >>already > >> > > >>>possible > >> > > >>> > >>and > >> > > >>> > >> >> >>would > >> > > >>> > >> >> >> >> block > >> > > >>> > >> >> >> >> > > >indefinitely. I think trying to silently > >>change > >> > the > >> > > >>> > >>behavior > >> > > >>> > >> >>is > >> > > >>> > >> >> >> >> probably > >> > > >>> > >> >> >> >> > > >not right. I.e. if the user calls close() in > >>the > >> > > >>> callback > >> > > >>> > >> >>there > >> > > >>> > >> >> >>is > >> > > >>> > >> >> >> >> > > >actually > >> > > >>> > >> >> >> >> > > >some misunderstanding and they need to think > >>more, > >> > > >>> > >>silently > >> > > >>> > >> >> >>making > >> > > >>> > >> >> >> >> this > >> > > >>> > >> >> >> >> > > >not > >> > > >>> > >> >> >> >> > > >block will hide the problem from them which > >>is the > >> > > >>> > >>opposite > >> > > >>> > >> >>of > >> > > >>> > >> >> >> >>what we > >> > > >>> > >> >> >> >> > > >want. > >> > > >>> > >> >> >> >> > > > > >> > > >>> > >> >> >> >> > > >-Jay > >> > > >>> > >> >> >> >> > > > > >> > > >>> > >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin > >> > > >>> > >> >> >> >> <j...@linkedin.com.invalid > >> > > >>> > >> >> >> >> > > > >> > > >>> > >> >> >> >> > > >wrote: > >> > > >>> > >> >> >> >> > > > > >> > > >>> > >> >> >> >> > > >> Hey Joe & Jay, > >> > > >>> > >> >> >> >> > > >> > >> > > >>> > >> >> >> >> > > >> Thanks for the comments on the voting > >>thread. > >> > > >>>Since it > >> > > >>> > >> >>seems > >> > > >>> > >> >> >>we > >> > > >>> > >> >> >> >> > probably > >> > > >>> > >> >> >> >> > > >> will have more discussion on this, I am > >>just > >> > > >>>replying > >> > > >>> > >>from > >> > > >>> > >> >>the > >> > > >>> > >> >> >> >> > > >>discussion > >> > > >>> > >> >> >> >> > > >> thread here. > >> > > >>> > >> >> >> >> > > >> I’ve updated the KIP page to make it less > >>like > >> > > >>> > >>half-baked, > >> > > >>> > >> >> >> >>apologize > >> > > >>> > >> >> >> >> > for > >> > > >>> > >> >> >> >> > > >> the rush... > >> > > >>> > >> >> >> >> > > >> > >> > > >>> > >> >> >> >> > > >> The contract in current KIP is: > >> > > >>> > >> >> >> >> > > >> 1. close() - wait until all requests > >>either > >> > are > >> > > >>>sent > >> > > >>> > >>or > >> > > >>> > >> >> >>reach > >> > > >>> > >> >> >> >> > request > >> > > >>> > >> >> >> >> > > >> timeout. > >> > > >>> > >> >> >> >> > > >> 2. close(-1, TimeUnit.MILLISECONDS) - > >>close > >> > > >>> > >>immediately > >> > > >>> > >> >> >> >> > > >> 3. close(0, TimeUnit.MILLISECONDS) - > >> > equivalent > >> > > >>>to > >> > > >>> > >> >>close(), > >> > > >>> > >> >> >> >>i.e. > >> > > >>> > >> >> >> >> > Wait > >> > > >>> > >> >> >> >> > > >> until all requests are sent or reach > >>request > >> > > >>>timeout > >> > > >>> > >> >> >> >> > > >> 4. close(5, TimeUnit.MILLISECONDS) - try > >>the > >> > > >>>best to > >> > > >>> > >> >>finish > >> > > >>> > >> >> >> >> sending > >> > > >>> > >> >> >> >> > > >>in 5 > >> > > >>> > >> >> >> >> > > >> milliseconds, if something went wrong, just > >> > > >>>shutdown > >> > > >>> the > >> > > >>> > >> >> >>producer > >> > > >>> > >> >> >> >> > > >>anyway, > >> > > >>> > >> >> >> >> > > >> my callback will handle the failures. > >> > > >>> > >> >> >> >> > > >> > >> > > >>> > >> >> >> >> > > >> About how we define what timeout value > >>stands > >> > for, > >> > > >>>I > >> > > >>> > >> >>actually > >> > > >>> > >> >> >> >> > struggled > >> > > >>> > >> >> >> >> > > >>a > >> > > >>> > >> >> >> >> > > >> little bit when wrote the patch. > >>Intuitively, > >> > > >>>close(0) > >> > > >>> > >> >>should > >> > > >>> > >> >> >> >>mean > >> > > >>> > >> >> >> >> > > >> immediately, however it seems that all the > >> > > >>>existing > >> > > >>> java > >> > > >>> > >> >>class > >> > > >>> > >> >> >> >>have > >> > > >>> > >> >> >> >> > this > >> > > >>> > >> >> >> >> > > >> convention of timeout=0 means no timeout or > >> > never > >> > > >>> > >>timeout > >> > > >>> > >> >> >> >> > > >>(Thread.join(0), > >> > > >>> > >> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma > >>is > >> > > >>>either we > >> > > >>> > >> >>follow > >> > > >>> > >> >> >>the > >> > > >>> > >> >> >> >> > > >> intuition or we follow the convention. > >>What I > >> > > >>>chose is > >> > > >>> > >>to > >> > > >>> > >> >> >>follow > >> > > >>> > >> >> >> >>the > >> > > >>> > >> >> >> >> > > >> convention but document the interface to > >>let > >> > user > >> > > >>>be > >> > > >>> > >>aware > >> > > >>> > >> >>of > >> > > >>> > >> >> >>the > >> > > >>> > >> >> >> >> > usage. > >> > > >>> > >> >> >> >> > > >> The reason is that I think > >>producer.close() is a > >> > > >>> public > >> > > >>> > >> >> >> >>interface so > >> > > >>> > >> >> >> >> > it > >> > > >>> > >> >> >> >> > > >> might be better to follow java convention. > >> > Whereas > >> > > >>> > >> >>selector is > >> > > >>> > >> >> >> >>not a > >> > > >>> > >> >> >> >> > > >> public interface that used by user, so as > >>long > >> > as > >> > > >>>it > >> > > >>> > >>makes > >> > > >>> > >> >> >>sense > >> > > >>> > >> >> >> >>to > >> > > >>> > >> >> >> >> > us, > >> > > >>> > >> >> >> >> > > >>it > >> > > >>> > >> >> >> >> > > >> is less a problem to be different from java > >> > > >>> convention. > >> > > >>> > >> >>That > >> > > >>> > >> >> >>said > >> > > >>> > >> >> >> >> > since > >> > > >>> > >> >> >> >> > > >> consumer.poll(timeout) is also a public > >> > interface, > >> > > >>>I > >> > > >>> > >>think > >> > > >>> > >> >>it > >> > > >>> > >> >> >> >>also > >> > > >>> > >> >> >> >> > makes > >> > > >>> > >> >> >> >> > > >> sense to make producer.close() to have the > >>same > >> > > >>> > >>definition > >> > > >>> > >> >>of > >> > > >>> > >> >> >> >> > > >> consumer.poll(timeout). > >> > > >>> > >> >> >> >> > > >> > >> > > >>> > >> >> >> >> > > >> The main argument for keeping a timeout in > >>close > >> > > >>>would > >> > > >>> > >>be > >> > > >>> > >> >> >> >>separating > >> > > >>> > >> >> >> >> > the > >> > > >>> > >> >> >> >> > > >> close timeout from request timeout, which > >> > probably > >> > > >>> makes > >> > > >>> > >> >> >>sense. I > >> > > >>> > >> >> >> >> > would > >> > > >>> > >> >> >> >> > > >> guess typically the request timeout would > >>be > >> > long > >> > > >>> (e.g. > >> > > >>> > >>60 > >> > > >>> > >> >> >> >>seconds) > >> > > >>> > >> >> >> >> > > >> because we might want to consider retries > >>with > >> > > >>>back > >> > > >>> off > >> > > >>> > >> >>time. > >> > > >>> > >> >> >>If > >> > > >>> > >> >> >> >>we > >> > > >>> > >> >> >> >> > have > >> > > >>> > >> >> >> >> > > >> multiple batches in accumulator, in worst > >>case > >> > > >>>that > >> > > >>> > >>could > >> > > >>> > >> >>take > >> > > >>> > >> >> >> >>up to > >> > > >>> > >> >> >> >> > > >> several minutes to complete all the > >>requests. > >> > But > >> > > >>>when > >> > > >>> > >>we > >> > > >>> > >> >> >>close a > >> > > >>> > >> >> >> >> > > >> producer, we might not want to wait for > >>that > >> > long > >> > > >>>as > >> > > >>> it > >> > > >>> > >> >>might > >> > > >>> > >> >> >> >>cause > >> > > >>> > >> >> >> >> > some > >> > > >>> > >> >> >> >> > > >> other problem like deployment tool timeout. > >> > > >>> > >> >> >> >> > > >> > >> > > >>> > >> >> >> >> > > >> There is also a subtle difference between > >> > > >>> close(timeout) > >> > > >>> > >> >>and > >> > > >>> > >> >> >> >> > > >> flush(timeout). The only purpose for > >>flush() is > >> > to > >> > > >>> write > >> > > >>> > >> >>data > >> > > >>> > >> >> >>to > >> > > >>> > >> >> >> >>the > >> > > >>> > >> >> >> >> > > >> broker, so it makes perfect sense to wait > >>until > >> > > >>> request > >> > > >>> > >> >> >>timeout. > >> > > >>> > >> >> >> >>I > >> > > >>> > >> >> >> >> > think > >> > > >>> > >> >> >> >> > > >> that is why flush(timeout) looks strange. > >>On the > >> > > >>>other > >> > > >>> > >> >>hand, > >> > > >>> > >> >> >>the > >> > > >>> > >> >> >> >>top > >> > > >>> > >> >> >> >> > > >> priority for close() is to close the > >>producer > >> > > >>>rather > >> > > >>> > >>than > >> > > >>> > >> >> >>flush() > >> > > >>> > >> >> >> >> > data, > >> > > >>> > >> >> >> >> > > >>so > >> > > >>> > >> >> >> >> > > >> close(timeout) gives guarantee on bounded > >> > waiting > >> > > >>>for > >> > > >>> > >>its > >> > > >>> > >> >>main > >> > > >>> > >> >> >> >>job. > >> > > >>> > >> >> >> >> > > >> > >> > > >>> > >> >> >> >> > > >> Sorry for the confusion about forceClose > >>flag. > >> > It > >> > > >>>is > >> > > >>> > >>not a > >> > > >>> > >> >> >>public > >> > > >>> > >> >> >> >> > > >> interface. I mentioned it in Proposed > >>Changes > >> > > >>>section > >> > > >>> > >> >>which I > >> > > >>> > >> >> >> >> thought > >> > > >>> > >> >> >> >> > > >>was > >> > > >>> > >> >> >> >> > > >> supposed to provide implementation details. > >> > > >>> > >> >> >> >> > > >> > >> > > >>> > >> >> >> >> > > >> Thanks again for all the comments and > >> > suggestions! > >> > > >>> > >> >> >> >> > > >> > >> > > >>> > >> >> >> >> > > >> Jiangjie (Becket) Qin > >> > > >>> > >> >> >> >> > > >> > >> > > >>> > >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" < > >> > > >>> j...@linkedin.com> > >> > > >>> > >> >> wrote: > >> > > >>> > >> >> >> >> > > >> > >> > > >>> > >> >> >> >> > > >> >The KIP page has been updated per Jay¹s > >> > comments. > >> > > >>> > >> >> >> >> > > >> >I¹d like to initiate the voting process > >>if no > >> > > >>>further > >> > > >>> > >> >> >>comments > >> > > >>> > >> >> >> >>are > >> > > >>> > >> >> >> >> > > >> >received by tomorrow. > >> > > >>> > >> >> >> >> > > >> > > >> > > >>> > >> >> >> >> > > >> >Jiangjie (Becket) Qin > >> > > >>> > >> >> >> >> > > >> > > >> > > >>> > >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps" > >> > > >>><jay.kr...@gmail.com > >> > > >>> > > >> > > >>> > >> >>wrote: > >> > > >>> > >> >> >> >> > > >> > > >> > > >>> > >> >> >> >> > > >> >>Hey Jiangjie, > >> > > >>> > >> >> >> >> > > >> >> > >> > > >>> > >> >> >> >> > > >> >>Can you capture the full motivation and > >>use > >> > > >>>cases > >> > > >>> for > >> > > >>> > >>the > >> > > >>> > >> >> >> >>feature? > >> > > >>> > >> >> >> >> > > >>This > >> > > >>> > >> >> >> >> > > >> >>mentions your interest in having a way of > >> > > >>>aborting > >> > > >>> > >>from > >> > > >>> > >> >> >>inside > >> > > >>> > >> >> >> >>the > >> > > >>> > >> >> >> >> > > >> >>Callback. But it doesn't really explain > >>that > >> > > >>>usage > >> > > >>> or > >> > > >>> > >>why > >> > > >>> > >> >> >>other > >> > > >>> > >> >> >> >> > people > >> > > >>> > >> >> >> >> > > >> >>would want to do that. It also doesn't > >>list > >> > the > >> > > >>> > >>primary > >> > > >>> > >> >>use > >> > > >>> > >> >> >> >>case > >> > > >>> > >> >> >> >> for > >> > > >>> > >> >> >> >> > > >> >>having > >> > > >>> > >> >> >> >> > > >> >>close with a bounded timeout which was to > >> > avoid > >> > > >>> > >>blocking > >> > > >>> > >> >>too > >> > > >>> > >> >> >> >>long > >> > > >>> > >> >> >> >> on > >> > > >>> > >> >> >> >> > > >> >>shutdown. > >> > > >>> > >> >> >> >> > > >> >> > >> > > >>> > >> >> >> >> > > >> >>-Jay > >> > > >>> > >> >> >> >> > > >> >> > >> > > >>> > >> >> >> >> > > >> >> > >> > > >>> > >> >> >> >> > > >> >> > >> > > >>> > >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, > >>Jiangjie Qin > >> > > >>> > >> >> >> >> > > >><j...@linkedin.com.invalid > >> > > >>> > >> >> >> >> > > >> > > >> > > >>> > >> >> >> >> > > >> >>wrote: > >> > > >>> > >> >> >> >> > > >> >> > >> > > >>> > >> >> >> >> > > >> >>> Hi, > >> > > >>> > >> >> >> >> > > >> >>> > >> > > >>> > >> >> >> >> > > >> >>> I just created a KIP for adding a > >> > > >>>close(timeout) > >> > > >>> to > >> > > >>> > >>new > >> > > >>> > >> >> >> >> producer. > >> > > >>> > >> >> >> >> > > >>Most > >> > > >>> > >> >> >> >> > > >> >>>of > >> > > >>> > >> >> >> >> > > >> >>> the previous discussions are in > >>KAFKA-1660 > >> > > >>>where > >> > > >>> > >>Parth > >> > > >>> > >> >> >> >> Brahmbhatt > >> > > >>> > >> >> >> >> > > >>has > >> > > >>> > >> >> >> >> > > >> >>> already done a lot of work. > >> > > >>> > >> >> >> >> > > >> >>> Since this is an interface change so > >>we are > >> > > >>>going > >> > > >>> > >> >>through > >> > > >>> > >> >> >>the > >> > > >>> > >> >> >> >> KIP > >> > > >>> > >> >> >> >> > > >> >>>process. > >> > > >>> > >> >> >> >> > > >> >>> Here is the KIP link: > >> > > >>> > >> >> >> >> > > >> >>> > >> > > >>> > >> >> >> >> > > >> >>> > >> > > >>> > >> >> >> >> > > >> > >> > > >>> > >> >> >> >> > > > >> > > >>> > >> >> >> >> > >> > > >>> > >> >> >> > >> > > >>> > >> >> > >> > > >>> > >> > >> > > >>> > > >> > > >>> > >> > > > >>https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978 > >> > > >>> > >> >> >> >> > > >> >>>2 > >> > > >>> > >> >> >> >> > > >> >>> > >> > > >>> > >> >> >> >> > > >> >>> Thanks. > >> > > >>> > >> >> >> >> > > >> >>> > >> > > >>> > >> >> >> >> > > >> >>> Jiangjie (Becket) Qin > >> > > >>> > >> >> >> >> > > >> >>> > >> > > >>> > >> >> >> >> > > >> > > >> > > >>> > >> >> >> >> > > >> > >> > > >>> > >> >> >> >> > > >> > >> > > >>> > >> >> >> >> > > > >> > > >>> > >> >> >> >> > > > >> > > >>> > >> >> >> >> > > >> > > >>> > >> >> >> >> > >> > > >>> > >> >> >> >> > >> > > >>> > >> >> >> >> > >> > > >>> > >> >> >> >> -- > >> > > >>> > >> >> >> >> Thanks, > >> > > >>> > >> >> >> >> Neha > >> > > >>> > >> >> >> >> > >> > > >>> > >> >> >> > > >> > > >>> > >> >> >> > > >> > > >>> > >> >> >> > > >> > > >>> > >> >> >> >-- > >> > > >>> > >> >> >> >-- Guozhang > >> > > >>> > >> >> >> > >> > > >>> > >> >> >> > >> > > >>> > >> >> > > >> > > >>> > >> >> > > >> > > >>> > >> >> >-- > >> > > >>> > >> >> >-- Guozhang > >> > > >>> > >> >> > >> > > >>> > >> >> > >> > > >>> > >> > > >> > > >>> > >> > > >> > > >>> > >> >-- > >> > > >>> > >> >-- Guozhang > >> > > >>> > >> > >> > > >>> > >> > >> > > >>> > > >> > > >>> > > >> > > >>> > >> > > >> > >> > > >> > >> > > >> > >> > > >>-- > >> > > >>-- Guozhang > >> > > > > >> > > > >> > > > >> > > >> > >> > >> > >> -- > >> Thanks, > >> Neha > > > >