Hi, Radai,

3. Having a gauge of MemoryAvailable is useful. One issue with it, though,
is that if one only collects the metric, say, every minute, one doesn't know
what happened in between. We could additionally track the fraction of time
during which requested memory can't be served: every time a request can't
be honored, we mark the starting time in the memory pool; every time a
request is honored, we end the interval. We can then expose that accumulated
fraction of time as a Rate (similar to RequestHandlerAvgIdlePercent
in KafkaRequestHandlerPool). This will be a value between 0 and 1; the
higher the value, the higher the memory pressure.
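
A minimal sketch of what that accumulator could look like (hypothetical
names; the real pool would call recordBlocked()/recordUnblocked() around its
allocation path):

    import java.util.concurrent.atomic.AtomicLong;

    // accumulates total nanos during which the pool couldn't serve requests;
    // sampling blockedNanos() at metric-window boundaries and dividing the
    // delta by the window length yields the 0-1 "memory pressure" fraction
    public class MemoryPressureTracker {
        private final AtomicLong accumulatedBlockedNanos = new AtomicLong();
        private long blockedSince = -1; // -1 means "not currently blocked"

        // called by the pool when an allocation request can't be honored
        public synchronized void recordBlocked() {
            if (blockedSince < 0)
                blockedSince = System.nanoTime();
        }

        // called by the pool when requests can be honored again
        public synchronized void recordUnblocked() {
            if (blockedSince >= 0) {
                accumulatedBlockedNanos.addAndGet(System.nanoTime() - blockedSince);
                blockedSince = -1;
            }
        }

        // total blocked time so far, including any still-open blocked interval
        public synchronized long blockedNanos() {
            long base = accumulatedBlockedNanos.get();
            return blockedSince < 0 ? base : base + (System.nanoTime() - blockedSince);
        }
    }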

Thanks,

Jun

On Fri, Nov 18, 2016 at 8:35 AM, radai <radai.rosenbl...@gmail.com> wrote:

> Hi Jun,
>
> 3. Will (also :-) ) do. Do you have ideas for appropriate names/metrics?
> I'm thinking along the lines of "MemoryAvailable" (current snapshot value
> from the pool) and "Throttles" (some moving average of how often throttling
> due to lack of memory kicks in). Maybe also "BuffersOutstanding"?
>
> On Thu, Nov 17, 2016 at 7:01 PM, Jun Rao <j...@confluent.io> wrote:
>
> > Hi, Radai,
> >
> > 2. Yes, on the server side, the timeout is hardcoded at 300ms. That's not
> > too bad. We can just leave it as it is.
> >
> > 3. Another thing. Do we plan to expose some JMX metrics so that we can
> > monitor if there is any memory pressure in the pool?
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Nov 17, 2016 at 8:57 AM, radai <radai.rosenbl...@gmail.com> wrote:
> >
> > > Hi Jun,
> > >
> > > 1. Will do.
> > >
> > > 2. True, for several reasons:
> > >    2.1. Which selector? There's a single pool but 16 selectors (typical
> > > at LinkedIn; num.network.threads defaults to 3).
> > >    2.2. Even if I could figure out which selector (all?), the better
> > > thing to do would be to resume reading not when any memory becomes
> > > available (because in the worst case it's not enough for anything) but
> > > when some "low watermark" of available memory is hit - so mute at 100%
> > > memory used, unmute when back down to 90%? (a sketch follows below)
> > >    2.3. On the broker side (which is the current concern for my patch)
> > > this max wait time is hardcoded at 300 ms (SocketServer.Processor.poll()),
> > > which I think is acceptable, and definitely neither arbitrary nor
> > > configurable.
> > >
> > >    If you still think this needs to be addressed (and you are right that
> > > in the general case the timeout param could be arbitrary), I can implement
> > > the watermark approach + pool.waitForLowWatermark(timeout) or something,
> > > and make Selector.poll() wait for the low watermark at the end of poll()
> > > if no work has been done (so as not to wait on memory needlessly for
> > > requests that don't require it, as Rajini suggested earlier).
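> > >
> > > A minimal sketch of the watermark idea (hypothetical helper; the names
> > > and the 90% threshold are illustrative, not from the patch):
> > >
> > >     // mute reads at 100% memory used; unmute only once usage falls back
> > >     // below a low watermark, so we don't unmute when barely any memory
> > >     // has been freed
> > >     public class WatermarkGate {
> > >         private final long capacity;
> > >         private final long lowWatermarkBytes; // e.g. 90% of capacity
> > >
> > >         public WatermarkGate(long capacity, double lowWatermarkFraction) {
> > >             this.capacity = capacity;
> > >             this.lowWatermarkBytes = (long) (capacity * lowWatermarkFraction);
> > >         }
> > >
> > >         public boolean shouldMute(long availableBytes) {
> > >             return availableBytes == 0; // pool fully exhausted
> > >         }
> > >
> > >         public boolean shouldUnmute(long availableBytes) {
> > >             return capacity - availableBytes <= lowWatermarkBytes;
> > >         }
> > >     }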
> > >
> > > On Wed, Nov 16, 2016 at 9:04 AM, Jun Rao <j...@confluent.io> wrote:
> > >
> > > > Hi, Radai,
> > > >
> > > > Thanks for the updated proposal. +1 overall. A couple of comments below.
> > > >
> > > > 1. Our current convention is to avoid using getters. Could you change
> > > > getSize and getAvailableMemory accordingly? Also, "size" is a bit
> > > > ambiguous; could we use something like "capacity"?
> > > >
> > > > 2. This is more of an implementation detail: I didn't see any code to
> > > > wake up the selector when memory is released from the pool. For example,
> > > > suppose that all socket keys are muted since the pool is full. The
> > > > selector.poll() call will wait for the timeout, which could be arbitrarily
> > > > long. Now, if some memory is released, it seems that we should wake up the
> > > > selector early instead of waiting for the timeout.
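> > > >
> > > > For illustration, releasing memory could nudge the selector out of its
> > > > poll() wait along these lines (a sketch only; assumes the network thread
> > > > registers its java.nio Selector with the pool):
> > > >
> > > >     import java.nio.ByteBuffer;
> > > >     import java.nio.channels.Selector;
> > > >     import java.util.concurrent.atomic.AtomicLong;
> > > >
> > > >     // a pool that wakes a registered selector when memory is returned,
> > > >     // so muted channels can be unmuted without waiting out the timeout
> > > >     public class WakingPool {
> > > >         private final AtomicLong availableMemory;
> > > >         private volatile Selector selectorToWake;
> > > >
> > > >         public WakingPool(long capacity) {
> > > >             this.availableMemory = new AtomicLong(capacity);
> > > >         }
> > > >
> > > >         public void wakeOnRelease(Selector selector) {
> > > >             this.selectorToWake = selector;
> > > >         }
> > > >
> > > >         public void release(ByteBuffer buffer) {
> > > >             availableMemory.addAndGet(buffer.capacity());
> > > >             Selector s = selectorToWake;
> > > >             if (s != null)
> > > >                 s.wakeup(); // Selector.wakeup() is safe from any thread
> > > >         }
> > > >     }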
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Mon, Nov 14, 2016 at 11:41 AM, Rajini Sivaram <
> > > > rajinisiva...@googlemail.com> wrote:
> > > >
> > > > > +1
> > > > >
> > > > > Thank you for the KIP, Radai.
> > > > >
> > > > > On Mon, Nov 14, 2016 at 6:07 PM, Mickael Maison <mickael.mai...@gmail.com> wrote:
> > > > >
> > > > > > +1. We've also been hit by OOMs on the broker because we were not able
> > > > > > to properly bound its memory usage.
> > > > > >
> > > > > > On Mon, Nov 14, 2016 at 5:56 PM, radai <radai.rosenbl...@gmail.com> wrote:
> > > > > > > @rajini - fixed the hasBytesBuffered() method. Also updated poll() so
> > > > > > > that no latency is added for picking up data stuck in SSL buffers (the
> > > > > > > timeout is set to 0, just like with immediately connected keys and
> > > > > > > staged receives). Thank you for pointing these out.
> > > > > > > Added SSL (re)testing to the KIP testing plan.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Nov 14, 2016 at 7:24 AM, Rajini Sivaram <
> > > > > > > rajinisiva...@googlemail.com> wrote:
> > > > > > >
> > > > > > >> Open point 1. I would just retain the current value that specifies
> > > > > > >> queued.max.bytes as a long and not as a % of heap, since it is simple
> > > > > > >> and easy to use, and it keeps it consistent with other ".bytes" configs.
> > > > > > >>
> > > > > > >> Point 3. SSL buffers - I am not quite sure the implementation looks
> > > > > > >> correct. hasBytesBuffered() is checking position() of buffers == 0. And
> > > > > > >> the code checks this only when poll with a timeout returns (adding a
> > > > > > >> delay when there is nothing else to read).
> > > > > > >> But since this and open point 2 (optimization) are implementation
> > > > > > >> details, they can be looked at during PR review.
> > > > > > >>
> > > > > > >> It will be good to add SSL testing to the test plan as well, since
> > > > > > >> there is additional code to test for SSL.
> > > > > > >>
> > > > > > >>
> > > > > > >> On Fri, Nov 11, 2016 at 9:03 PM, radai <radai.rosenbl...@gmail.com> wrote:
> > > > > > >>
> > > > > > >> > OK, I've made the following changes:
> > > > > > >> >
> > > > > > >> > 1. memory.pool.class.name has been removed.
> > > > > > >> > 2. The code now only uses SimpleMemoryPool. The GC variant is left
> > > > > > >> > (unused) as a development aid and cannot be set via configuration.
> > > > > > >> > 3. I've resolved the issue of stale data getting stuck in intermediate
> > > > > > >> > (SSL) buffers.
> > > > > > >> > 4. The default value for queued.max.bytes is -1, so off by default.
> > > > > > >> > Any <= 0 value is interpreted as off by the underlying code.
> > > > > > >> >
> > > > > > >> > Open points:
> > > > > > >> >
> > > > > > >> > 1. The Kafka config framework doesn't allow a value to be either long
> > > > > > >> > or double, so in order to pull off the queued.max.bytes = 1000000 or
> > > > > > >> > queued.max.bytes = 0.3 thing I'd need to define the config as type
> > > > > > >> > string, which is ugly to me. Do we want to support setting
> > > > > > >> > queued.max.bytes to a % of heap? If so, by way of making
> > > > > > >> > queued.max.bytes of type string, or by way of a 2nd config param (with
> > > > > > >> > the resulting either/all/combination validation)? My personal opinion
> > > > > > >> > is string, because I think a single queued.max.bytes with overloaded
> > > > > > >> > meaning is more understandable to users. I'll await other people's
> > > > > > >> > opinions before doing anything (a sketch of the overloaded parsing
> > > > > > >> > follows after this list).
> > > > > > >> > 2. I still need to evaluate Rajini's optimization. Sounds doable.
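> > > > > > >> >
> > > > > > >> > A sketch of what the overloaded string parsing could look like (a
> > > > > > >> > hypothetical helper, not part of the patch):
> > > > > > >> >
> > > > > > >> >     public class QueuedMaxBytes {
> > > > > > >> >         // parse queued.max.bytes given as either an absolute byte
> > > > > > >> >         // count ("1000000") or a fraction of max heap ("0.3");
> > > > > > >> >         // any value <= 0 means the bound is off
> > > > > > >> >         public static long parse(String value) {
> > > > > > >> >             double parsed = Double.parseDouble(value.trim());
> > > > > > >> >             if (parsed > 0 && parsed < 1)
> > > > > > >> >                 return (long) (parsed * Runtime.getRuntime().maxMemory());
> > > > > > >> >             return (long) parsed;
> > > > > > >> >         }
> > > > > > >> >     }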
> > > > > > >> >
> > > > > > >> > Asides:
> > > > > > >> >
> > > > > > >> > 1. I think you guys misunderstood the intent behind the GC pool. It
> > > > > > >> > was never meant to be a magic pool that automatically releases buffers
> > > > > > >> > (because, just as Rajini stated, the performance implications would be
> > > > > > >> > horrible). It was meant to catch leaks early. Since that is indeed a
> > > > > > >> > dev-only concern it won't ever get used in production.
> > > > > > >> > 2. I said this on some other KIP discussion: I think the nice thing
> > > > > > >> > about the pool API is that it "scales" from just keeping a memory bound
> > > > > > >> > to actually re-using buffers without changing the calling code. I think
> > > > > > >> > actually pooling large buffers will have a significant performance
> > > > > > >> > impact, but that's outside the scope of this KIP. At that point I think
> > > > > > >> > more pool implementations (that actually pool) would be written. I
> > > > > > >> > agree with the ideal of exposing as few knobs as possible, but
> > > > > > >> > switching pools (or pool params) for tuning may happen at some later
> > > > > > >> > point.
> > > > > > >> >
> > > > > > >> >
> > > > > > >> >
> > > > > > >> > On Fri, Nov 11, 2016 at 11:44 AM, Rajini Sivaram <
> > > > > > >> > rajinisiva...@googlemail.com> wrote:
> > > > > > >> >
> > > > > > >> > > 13. At the moment, I think channels are not muted if:
> > > > > > >> > >     channel.receive != null && channel.receive.buffer != null
> > > > > > >> > > This mutes all channels that aren't holding onto an incomplete
> > > > > > >> > > buffer. They may or may not have read the 4-byte size.
> > > > > > >> > >
> > > > > > >> > > I was thinking you could avoid muting channels if:
> > > > > > >> > >     channel.receive == null || channel.receive.size.remaining()
> > > > > > >> > > This will not mute channels that are holding onto a buffer (as
> > > > > > >> > > above). In addition, it will not mute channels that haven't read the
> > > > > > >> > > 4-byte size. A client that is closed gracefully while the pool is
> > > > > > >> > > full will not be muted in this case, and the server can process the
> > > > > > >> > > close without waiting for the pool to free up. Once the 4-byte size
> > > > > > >> > > is read, the channel will be muted if the pool is still out of memory
> > > > > > >> > > - for each channel, at most one failed read attempt would be made
> > > > > > >> > > while the pool is out of memory. I think this would also delay muting
> > > > > > >> > > of SSL channels, since they can continue to read into their (already
> > > > > > >> > > allocated) network buffers and unwrap the data, and block only when
> > > > > > >> > > they need to allocate a buffer from the pool.
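> > > > > > >> > >
> > > > > > >> > > Expressed as a predicate, roughly (a sketch; the NetworkReceive
> > > > > > >> > > accessors are assumed for illustration, not taken from the patch):
> > > > > > >> > >
> > > > > > >> > >     // keep a channel unmuted unless it has read its 4-byte size
> > > > > > >> > >     // prefix but couldn't yet get a payload buffer from the pool
> > > > > > >> > >     boolean shouldStayUnmuted(NetworkReceive receive) {
> > > > > > >> > >         return receive == null                // nothing in progress
> > > > > > >> > >             || receive.size.remaining() > 0   // size not fully read
> > > > > > >> > >             || receive.buffer != null;        // buffer already held
> > > > > > >> > >     }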
> > > > > > >> > >
> > > > > > >> > > On Fri, Nov 11, 2016 at 6:00 PM, Jay Kreps <j...@confluent.io> wrote:
> > > > > > >> > >
> > > > > > >> > > > Hey Radai,
> > > > > > >> > > >
> > > > > > >> > > > +1 on deprecating and eventually removing the old config. The
> > > > > > >> > > > intention was absolutely bounding memory usage. I think having two
> > > > > > >> > > > ways of doing this, one that gives a crisp bound on memory and one
> > > > > > >> > > > that is hard to reason about, is pretty confusing. I think people
> > > > > > >> > > > will really appreciate having one config which instead lets them
> > > > > > >> > > > directly control the thing they actually care about (memory).
> > > > > > >> > > >
> > > > > > >> > > > I also want to second Jun's concern on the complexity of the
> > > > > > >> > > > self-GCing memory pool. I wrote the memory pool for the producer. In
> > > > > > >> > > > that area the pooling of messages is the single biggest factor in
> > > > > > >> > > > the performance of the client, so I believed it was worth some
> > > > > > >> > > > sophistication/complexity if there was a performance payoff. All the
> > > > > > >> > > > same, the complexity of that code has made it VERY hard to keep
> > > > > > >> > > > correct (it gets broken roughly every other time someone makes a
> > > > > > >> > > > change). Over time I came to feel a lot less proud of my cleverness.
> > > > > > >> > > > I learned something interesting reading your self-GCing memory pool,
> > > > > > >> > > > but I wonder if the complexity is worth the payoff in this case?
> > > > > > >> > > >
> > > > > > >> > > > Philosophically we've tried really hard to avoid needlessly
> > > > > > >> > > > "pluggable" implementations. That is, when there is a temptation to
> > > > > > >> > > > give a config that plugs in different Java classes at run time for
> > > > > > >> > > > implementation choices, we should instead think of how to give the
> > > > > > >> > > > user the good behavior automatically. I think the use case for
> > > > > > >> > > > configuring the GCing pool would be if you discovered a bug in which
> > > > > > >> > > > memory leaked. But this isn't something the user should have to
> > > > > > >> > > > think about, right? If there is a bug we should find and fix it.
> > > > > > >> > > >
> > > > > > >> > > > -Jay
> > > > > > >> > > >
> > > > > > >> > > > On Fri, Nov 11, 2016 at 9:21 AM, radai <radai.rosenbl...@gmail.com> wrote:
> > > > > > >> > > >
> > > > > > >> > > > > Jun's #1 + Rajini's #11 - the new config param is to enable
> > > > > > >> > > > > changing the pool implementation class. As I said in my response to
> > > > > > >> > > > > Jun, I will make the default pool impl be the simple one, and this
> > > > > > >> > > > > param is to allow a user (more likely a dev) to change it.
> > > > > > >> > > > > Both the simple pool and the "GC pool" make basically just an
> > > > > > >> > > > > AtomicLong.get() + (hashmap.put for GC) call before returning a
> > > > > > >> > > > > buffer. There is absolutely no dependency on GC times in allocating
> > > > > > >> > > > > (or not). The extra background thread in the GC pool is forever
> > > > > > >> > > > > asleep unless there are bugs (== leaks), so the extra cost is
> > > > > > >> > > > > basically nothing (backed by benchmarks). Let me re-iterate - ANY
> > > > > > >> > > > > BUFFER ALLOCATED MUST ALWAYS BE RELEASED - the GC pool should not
> > > > > > >> > > > > rely on GC for reclaiming buffers. It's a bug detector, not a
> > > > > > >> > > > > feature, and is definitely not intended to hide bugs - the exact
> > > > > > >> > > > > opposite - it's meant to expose them sooner. I've cleaned up the
> > > > > > >> > > > > docs to avoid this confusion. I also like the fail-on-leak idea.
> > > > > > >> > > > > Will do.
> > > > > > >> > > > > As for the gap between pool size and heap size - that's a valid
> > > > > > >> > > > > argument. Maybe also allow sizing the pool as a % of heap size? So
> > > > > > >> > > > > queued.max.bytes = 1000000 for 1MB and queued.max.bytes = 0.25 for
> > > > > > >> > > > > 25% of available heap?
> > > > > > >> > > > >
> > > > > > >> > > > > Jun's 2.2 - queued.max.bytes + socket.request.max.bytes still
> > > > > > >> > > > > holds, assuming the SSL-related buffers are small. The largest
> > > > > > >> > > > > weakness in this claim has to do with decompression rather than
> > > > > > >> > > > > anything SSL-related. So yes, there is an O(#ssl connections *
> > > > > > >> > > > > sslEngine packet size) component, but I think it's small. Again -
> > > > > > >> > > > > decompression should be the concern.
> > > > > > >> > > > >
> > > > > > >> > > > > Rajini's #13 - interesting optimization. The problem is there's no
> > > > > > >> > > > > knowing in advance what the _next_ request to come out of a socket
> > > > > > >> > > > > is, so this would mute just those sockets that are 1. mutable and
> > > > > > >> > > > > 2. have a buffer-demanding request for which we could not allocate
> > > > > > >> > > > > a buffer. The downside is that, as-is, this would cause the
> > > > > > >> > > > > busy-loop on poll() that the mutes were supposed to prevent - or
> > > > > > >> > > > > code would need to be added to ad-hoc mute a connection that was
> > > > > > >> > > > > so-far unmuted but has now generated a memory-demanding request?
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > >
> > > > > > >> > > > > On Fri, Nov 11, 2016 at 5:02 AM, Rajini Sivaram <
> > > > > > >> > > > > rajinisiva...@googlemail.com> wrote:
> > > > > > >> > > > >
> > > > > > >> > > > > > Radai,
> > > > > > >> > > > > >
> > > > > > >> > > > > > 11. The KIP talks about a new server configuration parameter
> > > > > > >> > > > > > *memory.pool.class.name* which is not in the implementation. Is
> > > > > > >> > > > > > it still the case that the pool will be configurable?
> > > > > > >> > > > > >
> > > > > > >> > > > > > 12. Personally I would prefer not to have a garbage-collected
> > > > > > >> > > > > > pool that hides bugs as well. Apart from the added code
> > > > > > >> > > > > > complexity and the extra thread to handle collections, I am also
> > > > > > >> > > > > > concerned about the non-deterministic nature of GC timings. The
> > > > > > >> > > > > > KIP introduces delays in processing requests based on the
> > > > > > >> > > > > > configuration parameter *queued.max.bytes*. This is unrelated to
> > > > > > >> > > > > > the JVM heap size, and hence the pool can be full when there is
> > > > > > >> > > > > > no pressure on the JVM to garbage collect. The KIP does not
> > > > > > >> > > > > > prevent other timeouts in the broker (eg. consumer session
> > > > > > >> > > > > > timeout) because it is relying on the pool to be managed in a
> > > > > > >> > > > > > deterministic, timely manner. Since a garbage-collected pool
> > > > > > >> > > > > > cannot provide that guarantee, wouldn't it be better to run tests
> > > > > > >> > > > > > with a GC-pool that perhaps fails with a fatal error if it
> > > > > > >> > > > > > encounters a buffer that was not released?
> > > > > > >> > > > > >
> > > > > > >> > > > > > 13. The implementation currently mutes all channels that don't
> > > > > > >> > > > > > have a receive buffer allocated. Would it make sense to mute only
> > > > > > >> > > > > > the channels that need a buffer (i.e. allow channels to read the
> > > > > > >> > > > > > 4-byte size, which is not read using the pool) so that a normal
> > > > > > >> > > > > > client connection close() is handled even when the pool is full?
> > > > > > >> > > > > > Since the extra 4 bytes may already be allocated for some
> > > > > > >> > > > > > connections, the total request memory has to take into account
> > > > > > >> > > > > > 4 * numConnections bytes anyway.
> > > > > > >> > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > > > On Thu, Nov 10, 2016 at 11:51 PM, Jun Rao <j...@confluent.io> wrote:
> > > > > > >> > > > > >
> > > > > > >> > > > > > > Hi, Radai,
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > 1. Yes, I am concerned about the trickiness of having to deal
> > > > > > >> > > > > > > with weak refs. I think it's simpler to just have the simple
> > > > > > >> > > > > > > version instrumented with enough debug/trace logging and do
> > > > > > >> > > > > > > enough stress testing. Since we still have queued.max.requests,
> > > > > > >> > > > > > > one can always fall back to that if a memory leak issue is
> > > > > > >> > > > > > > identified. We could also label the feature as beta if we don't
> > > > > > >> > > > > > > think this is production ready.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > 2.2 I am just wondering whether, after we fix that issue, the
> > > > > > >> > > > > > > claim that the request memory is bounded by queued.max.bytes +
> > > > > > >> > > > > > > socket.request.max.bytes is still true.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > 5. OK, leaving the default as -1 is fine then.
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Thanks,
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > Jun
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > On Wed, Nov 9, 2016 at 6:01 PM, radai <radai.rosenbl...@gmail.com> wrote:
> > > > > > >> > > > > > >
> > > > > > >> > > > > > > > Hi Jun,
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > Thank you for taking the time to review this.
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > 1. Short version - yes, the concern is bugs, but the cost is
> > > > > > >> > > > > > > > tiny and worth it, and it's a common pattern. Long version:
> > > > > > >> > > > > > > >    1.1 Detecting these types of bugs (leaks) cannot be easily
> > > > > > >> > > > > > > > done with simple testing, but requires stress/stability tests
> > > > > > >> > > > > > > > that run for a long time (long enough to hit OOM, depending on
> > > > > > >> > > > > > > > leak size and available memory). This is why some sort of leak
> > > > > > >> > > > > > > > detector is "standard practice". For example, look at netty
> > > > > > >> > > > > > > > (http://netty.io/wiki/reference-counted-objects.html#leak-detection-levels)
> > > > > > >> > > > > > > > - they have way more complicated built-in leak detection
> > > > > > >> > > > > > > > enabled by default. As a concrete example - during development
> > > > > > >> > > > > > > > I did not properly dispose of an in-progress
> > > > > > >> > > > > > > > KafkaChannel.receive when a connection was abruptly closed, and
> > > > > > >> > > > > > > > I only found it because of the log msg printed by the pool.
> > > > > > >> > > > > > > >    1.2 I have a benchmark suite showing the performance cost of
> > > > > > >> > > > > > > > the GC pool is absolutely negligible -
> > > > > > >> > > > > > > > https://github.com/radai-rosenblatt/kafka-benchmarks/tree/master/memorypool-benchmarks
> > > > > > >> > > > > > > >    1.3 As for the complexity of the impl - it's just ~150 lines
> > > > > > >> > > > > > > > and pretty straightforward. I think the main issue is that not
> > > > > > >> > > > > > > > many people are familiar with weak refs and ref queues.
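> > > > > > >> > > > > > > >    For anyone unfamiliar, the mechanism is roughly this (a
> > > > > > >> > > > > > > > compressed sketch, not the actual ~150-line impl;
> > > > > > >> > > > > > > > identityHashCode collisions are ignored here for brevity):
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >     import java.lang.ref.ReferenceQueue;
> > > > > > >> > > > > > > >     import java.lang.ref.WeakReference;
> > > > > > >> > > > > > > >     import java.nio.ByteBuffer;
> > > > > > >> > > > > > > >     import java.util.Map;
> > > > > > >> > > > > > > >     import java.util.concurrent.ConcurrentHashMap;
> > > > > > >> > > > > > > >     import java.util.concurrent.atomic.AtomicLong;
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >     // a buffer GC'd without release() surfaces on the reference
> > > > > > >> > > > > > > >     // queue, where a daemon thread reclaims its capacity and
> > > > > > >> > > > > > > >     // loudly reports the leak
> > > > > > >> > > > > > > >     public class LeakDetectingPool {
> > > > > > >> > > > > > > >         private static final class Tracked extends WeakReference<ByteBuffer> {
> > > > > > >> > > > > > > >             final int key;  // System.identityHashCode of the buffer
> > > > > > >> > > > > > > >             final int size;
> > > > > > >> > > > > > > >             Tracked(ByteBuffer b, ReferenceQueue<ByteBuffer> q) {
> > > > > > >> > > > > > > >                 super(b, q);
> > > > > > >> > > > > > > >                 this.key = System.identityHashCode(b);
> > > > > > >> > > > > > > >                 this.size = b.capacity();
> > > > > > >> > > > > > > >             }
> > > > > > >> > > > > > > >         }
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >         private final AtomicLong available;
> > > > > > >> > > > > > > >         private final ReferenceQueue<ByteBuffer> gcQueue = new ReferenceQueue<>();
> > > > > > >> > > > > > > >         private final Map<Integer, Tracked> outstanding = new ConcurrentHashMap<>();
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >         public LeakDetectingPool(long capacity) {
> > > > > > >> > > > > > > >             this.available = new AtomicLong(capacity);
> > > > > > >> > > > > > > >             Thread watchdog = new Thread(this::drainLeaks, "pool-leak-watchdog");
> > > > > > >> > > > > > > >             watchdog.setDaemon(true); // asleep forever unless there are leaks
> > > > > > >> > > > > > > >             watchdog.start();
> > > > > > >> > > > > > > >         }
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >         public ByteBuffer tryAllocate(int size) {
> > > > > > >> > > > > > > >             if (available.addAndGet(-size) < 0) {
> > > > > > >> > > > > > > >                 available.addAndGet(size); // undo - not enough memory
> > > > > > >> > > > > > > >                 return null;
> > > > > > >> > > > > > > >             }
> > > > > > >> > > > > > > >             ByteBuffer buffer = ByteBuffer.allocate(size);
> > > > > > >> > > > > > > >             Tracked ref = new Tracked(buffer, gcQueue);
> > > > > > >> > > > > > > >             outstanding.put(ref.key, ref);
> > > > > > >> > > > > > > >             return buffer;
> > > > > > >> > > > > > > >         }
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >         public void release(ByteBuffer buffer) {
> > > > > > >> > > > > > > >             Tracked ref = outstanding.remove(System.identityHashCode(buffer));
> > > > > > >> > > > > > > >             if (ref != null)
> > > > > > >> > > > > > > >                 ref.clear(); // a cleared ref is never enqueued: no false report
> > > > > > >> > > > > > > >             available.addAndGet(buffer.capacity());
> > > > > > >> > > > > > > >         }
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >         private void drainLeaks() {
> > > > > > >> > > > > > > >             while (true) {
> > > > > > >> > > > > > > >                 try {
> > > > > > >> > > > > > > >                     Tracked leaked = (Tracked) gcQueue.remove(); // blocks
> > > > > > >> > > > > > > >                     outstanding.remove(leaked.key);
> > > > > > >> > > > > > > >                     available.addAndGet(leaked.size); // reclaim the capacity
> > > > > > >> > > > > > > >                     System.err.println("LEAK: " + leaked.size + " bytes never release()d");
> > > > > > >> > > > > > > >                 } catch (InterruptedException e) {
> > > > > > >> > > > > > > >                     return;
> > > > > > >> > > > > > > >                 }
> > > > > > >> > > > > > > >             }
> > > > > > >> > > > > > > >         }
> > > > > > >> > > > > > > >     }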
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >    How about making the pool impl class a config param
> > > > > > >> > > > > > > > (generally good going forward), making the default the simple
> > > > > > >> > > > > > > > pool, and keeping the GC one as a dev/debug/triage aid?
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > 2. The KIP itself doesn't specifically treat SSL at all - it's
> > > > > > >> > > > > > > > an implementation detail. As for my current patch, it has some
> > > > > > >> > > > > > > > minimal treatment of SSL - just enough to not mute SSL sockets
> > > > > > >> > > > > > > > mid-handshake - but the code in SslTransportLayer still
> > > > > > >> > > > > > > > allocates buffers itself. It is my understanding that
> > > > > > >> > > > > > > > netReadBuffer/appReadBuffer shouldn't grow beyond 2 x
> > > > > > >> > > > > > > > sslEngine.getSession().getPacketBufferSize(), which I assume to
> > > > > > >> > > > > > > > be small. They are also long-lived (they live for the duration
> > > > > > >> > > > > > > > of the connection), which makes them a poor fit for pooling.
> > > > > > >> > > > > > > > The bigger fish to fry, I think, is decompression - you could
> > > > > > >> > > > > > > > read a 1MB blob into a pool-provided buffer and then decompress
> > > > > > >> > > > > > > > it into 10MB of heap allocated on the spot :-) Also, the SSL
> > > > > > >> > > > > > > > code is extremely tricky.
> > > > > > >> > > > > > > >    2.2 Just to make sure, you're talking about Selector.java:
> > > > > > >> > > > > > > > while ((networkReceive = channel.read()) != null)
> > > > > > >> > > > > > > > addToStagedReceives(channel, networkReceive); ? If so you're
> > > > > > >> > > > > > > > right, and I'll fix that (probably by something similar to
> > > > > > >> > > > > > > > immediatelyConnectedKeys, not sure yet).
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > 3. isOutOfMemory is self-explanatory (and I'll add javadocs and
> > > > > > >> > > > > > > > update the wiki). isLowOnMem is basically the point where I
> > > > > > >> > > > > > > > start randomizing the selection key handling order to avoid
> > > > > > >> > > > > > > > potential starvation. It's rather arbitrary, and now that I
> > > > > > >> > > > > > > > think of it, it should probably not exist and be entirely
> > > > > > >> > > > > > > > contained in Selector (where the shuffling takes place). Will
> > > > > > >> > > > > > > > fix.
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > 4. Will do.
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > 5. I prefer -1 or 0 as an explicit "OFF" (or basically anything
> > > > > > >> > > > > > > > <= 0). Long.MAX_VALUE would still create a pool that would
> > > > > > >> > > > > > > > still waste time tracking resources. I don't really mind,
> > > > > > >> > > > > > > > though, if you have a preferred magic value for off.
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > On Wed, Nov 9, 2016 at 9:28 AM, Jun Rao <j...@confluent.io> wrote:
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > > > > Hi, Radai,
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > Thanks for the KIP. Some comments below.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > 1. The KIP says "to facilitate faster implementation (as a
> > > > > > >> > > > > > > > > safety net) the pool will be implemented in such a way that
> > > > > > >> > > > > > > > > memory that was not release()ed (but still garbage collected)
> > > > > > >> > > > > > > > > would be detected and "reclaimed". this is to prevent "leaks"
> > > > > > >> > > > > > > > > in case of code paths that fail to release() properly.". What
> > > > > > >> > > > > > > > > are the cases that could cause memory leaks? If we are
> > > > > > >> > > > > > > > > concerned about bugs, it seems that it's better to just do
> > > > > > >> > > > > > > > > more testing to make sure the usage of the simple
> > > > > > >> > > > > > > > > implementation (SimpleMemoryPool) is solid, instead of adding
> > > > > > >> > > > > > > > > more complicated logic (GarbageCollectedMemoryPool) to hide
> > > > > > >> > > > > > > > > the potential bugs.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > 2. I am wondering how much this KIP covers the SSL channel
> > > > > > >> > > > > > > > > implementation.
> > > > > > >> > > > > > > > > 2.1 SslTransportLayer maintains netReadBuffer, netWriteBuffer,
> > > > > > >> > > > > > > > > and appReadBuffer per socket. Should that memory be accounted
> > > > > > >> > > > > > > > > for in the memory pool?
> > > > > > >> > > > > > > > > 2.2 One tricky thing with SSL is that during a
> > > > > > >> > > > > > > > > KafkaChannel.read(), it's possible for multiple
> > > > > > >> > > > > > > > > NetworkReceives to be returned, since multiple requests' data
> > > > > > >> > > > > > > > > could be encrypted together by SSL. To deal with this, we
> > > > > > >> > > > > > > > > stash those NetworkReceives in Selector.stagedReceives and
> > > > > > >> > > > > > > > > give them back to the poll() call one NetworkReceive at a
> > > > > > >> > > > > > > > > time. What this means is that, if we stop reading from a
> > > > > > >> > > > > > > > > KafkaChannel in the middle because the memory pool is full,
> > > > > > >> > > > > > > > > this channel's key may never get selected for reads (even
> > > > > > >> > > > > > > > > after the read interest is turned on), but there is still
> > > > > > >> > > > > > > > > pending data for the channel, which will never get processed.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > 3. The code has the following two methods in MemoryPool, which
> > > > > > >> > > > > > > > > are not described in the KIP. Could you explain in the wiki
> > > > > > >> > > > > > > > > how they are used? (A sketch of the pool surface under
> > > > > > >> > > > > > > > > discussion follows after point 5.)
> > > > > > >> > > > > > > > > isLowOnMemory()
> > > > > > >> > > > > > > > > isOutOfMemory()
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > 4. Could you also describe in the KIP, at a high level, how
> > > > > > >> > > > > > > > > the read interest bit for the socket is turned on/off with
> > > > > > >> > > > > > > > > respect to the MemoryPool?
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > 5. Should queued.max.bytes default to -1 or Long.MAX_VALUE?
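> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > For reference, a rough sketch of the MemoryPool surface as
> > > > > > >> > > > > > > > > discussed in this thread (method names per the discussion, not
> > > > > > >> > > > > > > > > necessarily the final API):
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > >     import java.nio.ByteBuffer;
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > >     public interface MemoryPool {
> > > > > > >> > > > > > > > >         ByteBuffer tryAllocate(int sizeBytes); // null if it can't be served
> > > > > > >> > > > > > > > >         void release(ByteBuffer buffer);       // must always be called, once
> > > > > > >> > > > > > > > >         long size();             // total capacity (getSize pre-rename)
> > > > > > >> > > > > > > > >         long availableMemory();  // unallocated bytes (getAvailableMemory)
> > > > > > >> > > > > > > > >         boolean isLowOnMemory(); // selector starts shuffling key order
> > > > > > >> > > > > > > > >         boolean isOutOfMemory(); // nothing available; channels get muted
> > > > > > >> > > > > > > > >     }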
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > Thanks,
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > Jun
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > On Mon, Nov 7, 2016 at 1:08 PM, radai <radai.rosenbl...@gmail.com> wrote:
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > > > > Hi,
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > I would like to initiate a vote on KIP-72:
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-72%3A+Allow+putting+a+bound+on+memory+consumed+by+Incoming+requests
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > The KIP allows specifying a limit on the amount of memory
> > > > > > >> > > > > > > > > > allocated for reading incoming requests into. This is useful
> > > > > > >> > > > > > > > > > for "sizing" a broker and avoiding OOMEs under heavy load (as
> > > > > > >> > > > > > > > > > actually happens occasionally at LinkedIn).
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > I believe I've addressed most (all?) concerns brought up
> > > > > > >> > > > > > > > > > during the discussion.
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > To the best of my understanding this vote is about the goal
> > > > > > >> > > > > > > > > > and public-facing changes related to the new proposed
> > > > > > >> > > > > > > > > > behavior, but as for the implementation, I have the code up
> > > > > > >> > > > > > > > > > here:
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > https://github.com/radai-rosenblatt/kafka/tree/broker-memory-pool-with-muting
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > and I've stress-tested it to work properly (meaning it chugs
> > > > > > >> > > > > > > > > > along and throttles under loads that would DOS 0.10.1.0 code).
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > I also believe that the primitives and "pattern"s introduced
> > > > > > >> > > > > > > > > > in this KIP (namely the notion of a buffer pool, and
> > > > > > >> > > > > > > > > > retrieving from / releasing to said pool instead of allocating
> > > > > > >> > > > > > > > > > memory) are generally useful beyond the scope of this KIP,
> > > > > > >> > > > > > > > > > both for performance issues (allocating lots of short-lived
> > > > > > >> > > > > > > > > > large buffers is a performance bottleneck) and for other areas
> > > > > > >> > > > > > > > > > where memory limits are a problem (KIP-81).
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > Thank you,
> > > > > > >> > > > > > > > > >
> > > > > > >> > > > > > > > > > Radai.
> > > > > > >> > > > > > > > >
> > > > > > >> > > > > > > >
> > > > > > >> > > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > > >
> > > > > > >> > > > > > --
> > > > > > >> > > > > > Regards,
> > > > > > >> > > > > >
> > > > > > >> > > > > > Rajini
> > > > > > >> > > > > >
> > > > > > >> > > > >
> > > > > > >> > > >
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > >
> > > > > > >> > > --
> > > > > > >> > > Regards,
> > > > > > >> > >
> > > > > > >> > > Rajini
> > > > > > >> > >
> > > > > > >> >
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >> --
> > > > > > >> Regards,
> > > > > > >>
> > > > > > >> Rajini
> > > > > > >>
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Regards,
> > > > >
> > > > > Rajini
> > > > >
> > > >
> > >
> >
>
