Re: [DISCUSSION] KIP-15 close(timeout) for producer

Jun Rao Mon, 16 Mar 2015 21:43:08 -0700

Hmm, does that mean that after close(0), the sender thread is not necessary
gone? Normally, after closing an entity, we expect all internal threads
associated with the entity are shut down completely.


Thanks,

Jun

On Mon, Mar 16, 2015 at 3:18 PM, Jiangjie Qin <[email protected]>
wrote:

> Hi Jun,
>
> Close(0) will set two flags in sender. Running=false and a newly added
> forceClose=true. It will also set accumulator.closed=true so no further
> producer.send() will succeed.
> The sender thread will finish executing all the callbacks in current batch
> of responses, then it will see the forceClose flag. It will just fail all
> the incomplete batches in the producer and exit.
> So close(0) is a non-blocking call and sender thread will not try to join
> itself in close(0).
>
> Thanks.
>
> Jiangjie (Becket) Qin
>
> On 3/16/15, 2:50 PM, "Jun Rao" <[email protected]> wrote:
>
> >How does close(0) work if it's called from the sender thread? If close(0)
> >needs to wait for the sender thread to join, wouldn't this cause a
> >deadlock?
> >
> >Thanks,
> >
> >Jun
> >
> >On Mon, Mar 16, 2015 at 2:26 PM, Jiangjie Qin <[email protected]>
> >wrote:
> >
> >> Thanks Guozhang. It wouldn’t be as thoroughly considered without
> >> discussing with you :)
> >>
> >> Jiangjie (Becket) Qin
> >>
> >> On 3/16/15, 1:07 PM, "Guozhang Wang" <[email protected]> wrote:
> >>
> >> >Thanks Jiangjie,
> >> >
> >> >After talking to you offline on this, I have been convinced and
> >>changed my
> >> >preference to blocking. The immediate shutdown approach does have some
> >> >unsafeness in some cases.
> >> >
> >> >Guozhang
> >> >
> >> >On Mon, Mar 16, 2015 at 11:50 AM, Jiangjie Qin
> >><[email protected]
> >> >
> >> >wrote:
> >> >
> >> >> It looks that the problem we want to solve and the purpose we want to
> >> >> achieve is:
> >> >> If user uses close() in callback, we want to let user be aware that
> >>they
> >> >> should use close(0) instead of close() in the callback.
> >> >>
> >> >> We have agreed that we will have an error log to inform user about
> >>this
> >> >> mis-usage. The options differ in the way how we can force user to
> >>take a
> >> >> look at that error log.
> >> >> There are two scenarios:
> >> >> 1. User does not expect the program to exit.
> >> >> 2. User expect the program to exit.
> >> >>
> >> >> For scenario 1), blocking will probably delay the discovery of the
> >> >> problem. Calling close(0) exposes the problem quicker. In this
> >>scenario
> >> >> producer just encounter a send failure when running normally.
> >> >> For scenario 2), blocking will expose the problem quick. Calling
> >> >>close(-1)
> >> >> might hide the problem. This scenario might include: a) Unit test
> >>for a
> >> >> send failure. b) Message sending during a close() call from a user
> >> >>thread.
> >> >>
> >> >> So as a summary table:
> >> >>
> >> >>                   Scenario 1)                         Scenario 2)
> >> >>
> >> >> Blocking      Delay problem discovery         Guaranteed problem
> >> >>discovery
> >> >>
> >> >> Close(-1)     Immediate problem discovery     Problem might be hidden
> >> >>
> >> >>
> >> >> Personally I prefer blocking because it seems providing more
> >>guarantees
> >> >> and safer.
> >> >>
> >> >> Thanks.
> >> >>
> >> >> Jiangjie (Becket) Qin
> >> >>
> >> >>
> >> >> On 3/16/15, 10:11 AM, "Guozhang Wang" <[email protected]> wrote:
> >> >>
> >> >> >HI Jiangjie,
> >> >> >
> >> >> >As far as I understand calling close() in the ioThread is not
> >>common,
> >> >>as
> >> >> >it
> >> >> >may only trigger when we saw some non-retriable error. Hence when
> >>user
> >> >>run
> >> >> >their program it is unlikely that close() will be triggered and
> >>problem
> >> >> >will be detected. So it seems to me that from the error detection
> >> >>aspect
> >> >> >these two options seems to be the same as people will usually
> >>detect it
> >> >> >from the producer metrics all dropping to 0.
> >> >> >
> >> >> >Guozhang
> >> >> >
> >> >> >On Mon, Mar 16, 2015 at 9:52 AM, Jiangjie Qin
> >> >><[email protected]>
> >> >> >wrote:
> >> >> >
> >> >> >> It seems there are two options we can choose from when close() is
> >> >>called
> >> >> >> from sender thread (callback):
> >> >> >> 1. Log an error and close the producer using close(-1)
> >> >> >> 2. Log an error and block.
> >> >> >> (Throwing an exception will not work because we catch all the
> >> >>exception
> >> >> >> thrown from user callback. It will just lead to an error log.)
> >> >> >>
> >> >> >> My concern for the first option is that the producer will be
> >>closed
> >> >>even
> >> >> >> if we logged and error. I am wondering if some user would not even
> >> >>take
> >> >> >>a
> >> >> >> look at the log if producer is closed normally. Because from the
> >> >> >>programs
> >> >> >> behavior, everything looks good. If that is the case, the error
> >> >>message
> >> >> >>we
> >> >> >> logged probably will just be ignored until some day when people
> >>check
> >> >> >>the
> >> >> >> log and see it.
> >> >> >>
> >> >> >> As for the second option, because producer does not close but
> >>blocks.
> >> >> >>User
> >> >> >> will notice this the first time they run the program. They
> >>probably
> >> >>will
> >> >> >> look at the log to see why producer could not be closed and they
> >>will
> >> >> >>see
> >> >> >> the error log we put there. So they will get informed about this
> >> >> >>mis-usage
> >> >> >> of close() in sender thread the first time they run the code
> >>instead
> >> >>of
> >> >> >> some time later.
> >> >> >>
> >> >> >> Personally I prefer the second one because it is more obvious that
> >> >> >> something was wrong.
> >> >> >>
> >> >> >> Jiangjie (Becket) Qin
> >> >> >>
> >> >> >> On 3/15/15, 4:27 PM, "Guozhang Wang" <[email protected]> wrote:
> >> >> >>
> >> >> >> >Yeah I agree we should not silently change the behavior of the
> >> >>function
> >> >> >> >with the given parameters; and I would prefer
> >> >> >>error-logging-and-shutdown
> >> >> >> >over blocking when close(>0) is used, since as Neha suggested
> >> >>blocking
> >> >> >> >would also not proceed with sending any data, bu will just let
> >> >>users to
> >> >> >> >realize the issue later than sooner.
> >> >> >> >
> >> >> >> >On Sun, Mar 15, 2015 at 3:25 PM, Neha Narkhede
> >><[email protected]>
> >> >> >>wrote:
> >> >> >> >
> >> >> >> >> >
> >> >> >> >> > And I also agree it is better if we can make producer block
> >>when
> >> >> >> >> > close() is called from sender thread so user will notice
> >> >>something
> >> >> >> >>went
> >> >> >> >> > wrong.
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> This isn't a great experience either. Why can't we just throw
> >>an
> >> >> >> >>exception
> >> >> >> >> for a behavior we know is incorrect and we'd like the user to
> >> >>know.
> >> >> >> >> Blocking as a means of doing that seems wrong and annoying.
> >> >> >> >>
> >> >> >> >> On Sun, Mar 15, 2015 at 11:56 AM, Jay Kreps
> >><[email protected]>
> >> >> >> wrote:
> >> >> >> >>
> >> >> >> >> > Cool.
> >> >> >> >> >
> >> >> >> >> > I think blocking is good or alternately throwing an exception
> >> >> >>directly
> >> >> >> >> from
> >> >> >> >> > close(). Basically I would just worry about subtly doing
> >> >>something
> >> >> >> >> slightly
> >> >> >> >> > different from what the user asked for as it will be hard to
> >> >>notice
> >> >> >> >>that
> >> >> >> >> > behavior difference.
> >> >> >> >> >
> >> >> >> >> > -Jay
> >> >> >> >> >
> >> >> >> >> > On Sat, Mar 14, 2015 at 5:48 PM, Jiangjie Qin
> >> >> >> >><[email protected]
> >> >> >> >> >
> >> >> >> >> > wrote:
> >> >> >> >> >
> >> >> >> >> > > Hi Jay,
> >> >> >> >> > >
> >> >> >> >> > > I have modified the KIP as you suggested. I thinks as long
> >>as
> >> >>we
> >> >> >> >>have
> >> >> >> >> > > consistent define for timeout across Kafka interface, there
> >> >>would
> >> >> >> >>be no
> >> >> >> >> > > problem. And I also agree it is better if we can make
> >>producer
> >> >> >>block
> >> >> >> >> when
> >> >> >> >> > > close() is called from sender thread so user will notice
> >> >> >>something
> >> >> >> >>went
> >> >> >> >> > > wrong.
> >> >> >> >> > >
> >> >> >> >> > > Thanks.
> >> >> >> >> > >
> >> >> >> >> > > Jiangjie (Becket) Qin
> >> >> >> >> > >
> >> >> >> >> > > On 3/14/15, 11:37 AM, "Jay Kreps" <[email protected]>
> >> wrote:
> >> >> >> >> > >
> >> >> >> >> > > >Hey Jiangjie,
> >> >> >> >> > > >
> >> >> >> >> > > >I think this is going to be very confusing that
> >> >> >> >> > > >  close(0) waits indefinitely and
> >> >> >> >> > > >  close(-1) waits for 0.
> >> >> >> >> > > >I understand this appears in other apis, but it is a
> >>constant
> >> >> >> >>cause of
> >> >> >> >> > > >bugs. Let's not repeat that mistake.
> >> >> >> >> > > >
> >> >> >> >> > > >Let's make close(0) wait for 0. We don't need a way to
> >>wait
> >> >> >> >> indefinitely
> >> >> >> >> > > >as
> >> >> >> >> > > >we already have close() so having a magical constant for
> >> >>that is
> >> >> >> >> > > >redundant.
> >> >> >> >> > > >
> >> >> >> >> > > >Calling close() from the I/O thread was already possible
> >>and
> >> >> >>would
> >> >> >> >> block
> >> >> >> >> > > >indefinitely. I think trying to silently change the
> >>behavior
> >> >>is
> >> >> >> >> probably
> >> >> >> >> > > >not right. I.e. if the user calls close() in the callback
> >> >>there
> >> >> >>is
> >> >> >> >> > > >actually
> >> >> >> >> > > >some misunderstanding and they need to think more,
> >>silently
> >> >> >>making
> >> >> >> >> this
> >> >> >> >> > > >not
> >> >> >> >> > > >block will hide the problem from them which is the
> >>opposite
> >> >>of
> >> >> >> >>what we
> >> >> >> >> > > >want.
> >> >> >> >> > > >
> >> >> >> >> > > >-Jay
> >> >> >> >> > > >
> >> >> >> >> > > >On Thu, Mar 12, 2015 at 1:49 AM, Jiangjie Qin
> >> >> >> >> <[email protected]
> >> >> >> >> > >
> >> >> >> >> > > >wrote:
> >> >> >> >> > > >
> >> >> >> >> > > >> Hey Joe & Jay,
> >> >> >> >> > > >>
> >> >> >> >> > > >> Thanks for the comments on the voting thread. Since it
> >> >>seems
> >> >> >>we
> >> >> >> >> > probably
> >> >> >> >> > > >> will have more discussion on this, I am just replying
> >>from
> >> >>the
> >> >> >> >> > > >>discussion
> >> >> >> >> > > >> thread here.
> >> >> >> >> > > >> I’ve updated the KIP page to make it less like
> >>half-baked,
> >> >> >> >>apologize
> >> >> >> >> > for
> >> >> >> >> > > >> the rush...
> >> >> >> >> > > >>
> >> >> >> >> > > >> The contract in current KIP is:
> >> >> >> >> > > >>   1. close() - wait until all requests either are sent
> >>or
> >> >> >>reach
> >> >> >> >> > request
> >> >> >> >> > > >> timeout.
> >> >> >> >> > > >>   2. close(-1, TimeUnit.MILLISECONDS) - close
> >>immediately
> >> >> >> >> > > >>   3. close(0, TimeUnit.MILLISECONDS) - equivalent to
> >> >>close(),
> >> >> >> >>i.e.
> >> >> >> >> > Wait
> >> >> >> >> > > >> until all requests are sent or reach request timeout
> >> >> >> >> > > >>   4. close(5, TimeUnit.MILLISECONDS) - try the best to
> >> >>finish
> >> >> >> >> sending
> >> >> >> >> > > >>in 5
> >> >> >> >> > > >> milliseconds, if something went wrong, just shutdown the
> >> >> >>producer
> >> >> >> >> > > >>anyway,
> >> >> >> >> > > >> my callback will handle the failures.
> >> >> >> >> > > >>
> >> >> >> >> > > >> About how we define what timeout value stands for, I
> >> >>actually
> >> >> >> >> > struggled
> >> >> >> >> > > >>a
> >> >> >> >> > > >> little bit when wrote the patch. Intuitively, close(0)
> >> >>should
> >> >> >> >>mean
> >> >> >> >> > > >> immediately, however it seems that all the existing java
> >> >>class
> >> >> >> >>have
> >> >> >> >> > this
> >> >> >> >> > > >> convention of timeout=0 means no timeout or never
> >>timeout
> >> >> >> >> > > >>(Thread.join(0),
> >> >> >> >> > > >> Object.wait(0), etc.) So here the dilemma is either we
> >> >>follow
> >> >> >>the
> >> >> >> >> > > >> intuition or we follow the convention. What I chose is
> >>to
> >> >> >>follow
> >> >> >> >>the
> >> >> >> >> > > >> convention but document the interface to let user be
> >>aware
> >> >>of
> >> >> >>the
> >> >> >> >> > usage.
> >> >> >> >> > > >> The reason is that I think producer.close() is a public
> >> >> >> >>interface so
> >> >> >> >> > it
> >> >> >> >> > > >> might be better to follow java convention. Whereas
> >> >>selector is
> >> >> >> >>not a
> >> >> >> >> > > >> public interface that used by user, so as long as it
> >>makes
> >> >> >>sense
> >> >> >> >>to
> >> >> >> >> > us,
> >> >> >> >> > > >>it
> >> >> >> >> > > >> is less a problem to be different from java convention.
> >> >>That
> >> >> >>said
> >> >> >> >> > since
> >> >> >> >> > > >> consumer.poll(timeout) is also a public interface, I
> >>think
> >> >>it
> >> >> >> >>also
> >> >> >> >> > makes
> >> >> >> >> > > >> sense to make producer.close() to have the same
> >>definition
> >> >>of
> >> >> >> >> > > >> consumer.poll(timeout).
> >> >> >> >> > > >>
> >> >> >> >> > > >> The main argument for keeping a timeout in close would
> >>be
> >> >> >> >>separating
> >> >> >> >> > the
> >> >> >> >> > > >> close timeout from request timeout, which probably makes
> >> >> >>sense. I
> >> >> >> >> > would
> >> >> >> >> > > >> guess typically the request timeout would be long (e.g.
> >>60
> >> >> >> >>seconds)
> >> >> >> >> > > >> because we might want to consider retries with back off
> >> >>time.
> >> >> >>If
> >> >> >> >>we
> >> >> >> >> > have
> >> >> >> >> > > >> multiple batches in accumulator, in worst case that
> >>could
> >> >>take
> >> >> >> >>up to
> >> >> >> >> > > >> several minutes to complete all the requests. But when
> >>we
> >> >> >>close a
> >> >> >> >> > > >> producer, we might not want to wait for that long as it
> >> >>might
> >> >> >> >>cause
> >> >> >> >> > some
> >> >> >> >> > > >> other problem like deployment tool timeout.
> >> >> >> >> > > >>
> >> >> >> >> > > >> There is also a subtle difference between close(timeout)
> >> >>and
> >> >> >> >> > > >> flush(timeout). The only purpose for flush() is to write
> >> >>data
> >> >> >>to
> >> >> >> >>the
> >> >> >> >> > > >> broker, so it makes perfect sense to wait until request
> >> >> >>timeout.
> >> >> >> >>I
> >> >> >> >> > think
> >> >> >> >> > > >> that is why flush(timeout) looks strange. On the other
> >> >>hand,
> >> >> >>the
> >> >> >> >>top
> >> >> >> >> > > >> priority for close() is to close the producer rather
> >>than
> >> >> >>flush()
> >> >> >> >> > data,
> >> >> >> >> > > >>so
> >> >> >> >> > > >> close(timeout) gives guarantee on bounded waiting for
> >>its
> >> >>main
> >> >> >> >>job.
> >> >> >> >> > > >>
> >> >> >> >> > > >> Sorry for the confusion about forceClose flag. It is
> >>not a
> >> >> >>public
> >> >> >> >> > > >> interface. I mentioned it in Proposed Changes section
> >> >>which I
> >> >> >> >> thought
> >> >> >> >> > > >>was
> >> >> >> >> > > >> supposed to provide implementation details.
> >> >> >> >> > > >>
> >> >> >> >> > > >> Thanks again for all the comments and suggestions!
> >> >> >> >> > > >>
> >> >> >> >> > > >> Jiangjie (Becket) Qin
> >> >> >> >> > > >>
> >> >> >> >> > > >> On 3/10/15, 8:57 PM, "Jiangjie Qin" <[email protected]>
> >> >> wrote:
> >> >> >> >> > > >>
> >> >> >> >> > > >> >The KIP page has been updated per Jay¹s comments.
> >> >> >> >> > > >> >I¹d like to initiate the voting process if no further
> >> >> >>comments
> >> >> >> >>are
> >> >> >> >> > > >> >received by tomorrow.
> >> >> >> >> > > >> >
> >> >> >> >> > > >> >Jiangjie (Becket) Qin
> >> >> >> >> > > >> >
> >> >> >> >> > > >> >On 3/8/15, 9:45 AM, "Jay Kreps" <[email protected]>
> >> >>wrote:
> >> >> >> >> > > >> >
> >> >> >> >> > > >> >>Hey Jiangjie,
> >> >> >> >> > > >> >>
> >> >> >> >> > > >> >>Can you capture the full motivation and use cases for
> >>the
> >> >> >> >>feature?
> >> >> >> >> > > >>This
> >> >> >> >> > > >> >>mentions your interest in having a way of aborting
> >>from
> >> >> >>inside
> >> >> >> >>the
> >> >> >> >> > > >> >>Callback. But it doesn't really explain that usage or
> >>why
> >> >> >>other
> >> >> >> >> > people
> >> >> >> >> > > >> >>would want to do that. It also doesn't list the
> >>primary
> >> >>use
> >> >> >> >>case
> >> >> >> >> for
> >> >> >> >> > > >> >>having
> >> >> >> >> > > >> >>close with a bounded timeout which was to avoid
> >>blocking
> >> >>too
> >> >> >> >>long
> >> >> >> >> on
> >> >> >> >> > > >> >>shutdown.
> >> >> >> >> > > >> >>
> >> >> >> >> > > >> >>-Jay
> >> >> >> >> > > >> >>
> >> >> >> >> > > >> >>
> >> >> >> >> > > >> >>
> >> >> >> >> > > >> >>On Sat, Mar 7, 2015 at 12:25 PM, Jiangjie Qin
> >> >> >> >> > > >><[email protected]
> >> >> >> >> > > >> >
> >> >> >> >> > > >> >>wrote:
> >> >> >> >> > > >> >>
> >> >> >> >> > > >> >>> Hi,
> >> >> >> >> > > >> >>>
> >> >> >> >> > > >> >>> I just created a KIP for adding a close(timeout) to
> >>new
> >> >> >> >> producer.
> >> >> >> >> > > >>Most
> >> >> >> >> > > >> >>>of
> >> >> >> >> > > >> >>> the previous discussions are in KAFKA-1660 where
> >>Parth
> >> >> >> >> Brahmbhatt
> >> >> >> >> > > >>has
> >> >> >> >> > > >> >>> already done a lot of work.
> >> >> >> >> > > >> >>> Since this is an interface change so we are going
> >> >>through
> >> >> >>the
> >> >> >> >> KIP
> >> >> >> >> > > >> >>>process.
> >> >> >> >> > > >> >>> Here is the KIP link:
> >> >> >> >> > > >> >>>
> >> >> >> >> > > >> >>>
> >> >> >> >> > > >>
> >> >> >> >> > >
> >> >> >> >>
> >> >> >>
> >> >>
> >>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=5373978
> >> >> >> >> > > >> >>>2
> >> >> >> >> > > >> >>>
> >> >> >> >> > > >> >>> Thanks.
> >> >> >> >> > > >> >>>
> >> >> >> >> > > >> >>> Jiangjie (Becket) Qin
> >> >> >> >> > > >> >>>
> >> >> >> >> > > >> >
> >> >> >> >> > > >>
> >> >> >> >> > > >>
> >> >> >> >> > >
> >> >> >> >> > >
> >> >> >> >> >
> >> >> >> >>
> >> >> >> >>
> >> >> >> >>
> >> >> >> >> --
> >> >> >> >> Thanks,
> >> >> >> >> Neha
> >> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >> >
> >> >> >> >--
> >> >> >> >-- Guozhang
> >> >> >>
> >> >> >>
> >> >> >
> >> >> >
> >> >> >--
> >> >> >-- Guozhang
> >> >>
> >> >>
> >> >
> >> >
> >> >--
> >> >-- Guozhang
> >>
> >>
>
>

Re: [DISCUSSION] KIP-15 close(timeout) for producer

Reply via email to