In my opinion a lot of Kafka configuration options were added using the "minimal diff" approach, which results in very nuanced and complicated configs being required to achieve some goal indirectly. Case in point: timeouts.
The goal here is to control the memory requirement. The first config was the maximum size of a single request; now the proposal is to control the number of those in flight. That is inaccurate (you don't know the actual size and must over-estimate), would hurt throughput in case of over-estimation, and also fails to completely achieve the goal (what about decompression?).

I think a memory pool, in combination with Jay's proposal to only pick up from the socket conditionally when memory is available, is the correct approach. It deals with the problem directly and would result in a simpler and more understandable configuration (a single property for max memory consumption).

In the future the accuracy of the limit can be improved by, for example, declaring both the compressed _AND UNCOMPRESSED_ sizes up front, so that we can pick up from the socket when we have enough memory to decompress as well. This would obviously be a wire format change and outside the scope here, but my point is that it could be done without adding any new configs.

On Mon, Oct 31, 2016 at 10:25 AM, Joel Koshy <jjkosh...@gmail.com> wrote:

> Agreed with this approach.
> One detail to be wary of is that since we multiplex various other requests
> (e.g., heartbeats, offset commits, metadata, etc.) over the client that
> connects to the coordinator this could delay some of these critical
> requests. Realistically I don't think it will be an issue except in extreme
> scenarios where someone sets the memory limit to be unreasonably low.
>
> Thanks,
>
> Joel
>
> On Sun, Oct 30, 2016 at 12:32 PM, Jun Rao <j...@confluent.io> wrote:
>
> > Hi, Mickael,
> >
> > I agree with others that it's better to be able to control the bytes the
> > consumer can read from sockets, instead of limiting the fetch requests.
> > KIP-72 has a proposal to bound the memory size at the socket selector
> > level. Perhaps that can be leveraged in this KIP too.
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-72%3A+Allow+putting+a+bound+on+memory+consumed+by+Incoming+requests
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Oct 27, 2016 at 3:23 PM, Jay Kreps <j...@confluent.io> wrote:
> >
> > > This is a good observation on limiting total memory usage. If I understand
> > > the proposal I think it is that the consumer client would stop sending
> > > fetch requests once a certain number of in-flight fetch requests is met. I
> > > think a better approach would be to always issue one fetch request to each
> > > broker immediately, allow the server to process that request, and send data
> > > back to the local machine where it would be stored in the socket buffer (up
> > > to that buffer size). Instead of throttling the requests sent, the consumer
> > > should ideally throttle the responses read from the socket buffer at any
> > > given time. That is, in a single poll call, rather than reading from every
> > > single socket it should just read until it has a given amount of memory
> > > used then bail out early. It can come back and read more from the other
> > > sockets after those messages are processed.
> > >
> > > The advantage of this approach is that you don't incur the additional
> > > latency.
> > >
> > > -Jay
> > >
> > > On Mon, Oct 10, 2016 at 6:41 AM, Mickael Maison <mickael.mai...@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I would like to discuss the following KIP proposal:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-81%3A+Max+in-flight+fetches
> > > >
> > > >
> > > > Feedback and comments are welcome.
> > > > Thanks !
> > > >
> > > > Mickael
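
To make the memory-pool idea from the top of this message concrete, here is a minimal Python sketch, not Kafka's actual implementation. `MemoryPool`, `poll_once`, and the broker names are hypothetical, and the sketch assumes each response's size is known before its body is read (for example from a length prefix). Each poll picks up a pending response only if the pool can cover it; anything else stays in the socket buffer until memory is released.

```python
from collections import deque


class MemoryPool:
    """Hypothetical fixed-size memory budget (illustrative, not a Kafka class)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.available = capacity_bytes

    def try_allocate(self, size):
        """Reserve `size` bytes if they fit in the remaining budget."""
        if size > self.available:
            return False
        self.available -= size
        return True

    def release(self, size):
        """Return bytes to the pool once the records have been processed."""
        self.available = min(self.capacity, self.available + size)


def poll_once(sockets, pool):
    """Read at most one pending response per socket, only while memory allows.

    `sockets` maps a broker name to a deque of pending response sizes, standing
    in for data that is already waiting in each socket buffer.
    """
    picked = []
    for broker, pending in sockets.items():
        if pending and pool.try_allocate(pending[0]):
            picked.append((broker, pending.popleft()))
        # Otherwise the response stays in the socket buffer for a later poll.
    return picked


pool = MemoryPool(capacity_bytes=100)
sockets = {
    "broker-1": deque([60]),
    "broker-2": deque([60]),
    "broker-3": deque([30]),
}

first = poll_once(sockets, pool)   # broker-2 is skipped: only 40 bytes remain
for _, size in first:
    pool.release(size)             # the application has processed the records
second = poll_once(sockets, pool)  # broker-2 is picked up now
```

With a 100-byte budget, the first poll reads the 60- and 30-byte responses and leaves the second 60-byte response in its socket buffer; it is read on the next poll once memory has been released. The only tunable in the sketch is the pool capacity, which is the "single property" argued for above.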