See my answers below.

On Thu, Sep 10, 2020 at 5:03 PM Gwen Shapira <g...@confluent.io> wrote:
> There is another option of doing the splitting on the server and hiding
> this from the clients. My personal (and highly controversial) take is
> that Kafka clients could use less complexity rather than more. They are
> incredibly difficult to reason about as is. But maybe this
> splitting/merging won't be that bad - multi-part messages are well
> understood in general.

Sure, let's decide on the approach first and then figure out where we should put it. As Tim mentioned, the alternative is to have a separate file per message on the broker side, read and written in a streaming way.

> On Thu, Sep 10, 2020 at 7:51 AM Gwen Shapira <g...@confluent.io> wrote:
>
> > There is also another approach (harder to design, but may be easier to
> > use and maintain), which is to make Kafka handle large messages better
> > and allow users to set higher limits - for example, can Kafka provide
> > really high throughput with 1GB messages? Some systems do it well.

It is definitely possible, although it seems complex to build a system that is efficient for both small (MB) and large (GB) messages. What systems do you have in mind?

> > I don't know where the slowdowns happen, but perhaps it is one of these?
> > 1. Java GC used to be a problem, but maybe we didn't try with newer GC
> > and simple tuning will solve it?

I'm afraid it would affect the optimizations made for normal messages in the page and write caches: when a large chunk gets in, it inevitably pushes the smaller ones out.

> > 2. We have head-of-line blocking issue on the queue. There are
> > approaches to solve that too.

One option is to set reasonable limits on multi-part messages, to bound the time spent blocking.

> > I'd love to see more exploration on what exactly is the problem we are
> > facing (and is it still an issue? Becket's work is a few years old now.)

It is definitely an issue, especially for users of managed Kafka services like Confluent Cloud.
They are forced to use either a reference-based approach or chunking. For those who deploy Kafka themselves, there are options to increase the message size limit, at the price of slowing down the broker.

A.

> > On Thu, Sep 10, 2020 at 12:21 AM Alexander Sibiryakov <sibirya...@scrapinghub.com> wrote:
> >
> >> Hey Ben, thanks for the link. My proposal is partially based on
> >> Becket's ideas, but I haven't reached out to him directly.
> >>
> >> +Becket
> >> Hi Becket, would you mind having a look at my proposal (link is in the
> >> first message)?
> >>
> >> A.
> >>
> >> On Tue, Sep 8, 2020 at 12:35 PM Ben Stopford <b...@confluent.io> wrote:
> >>
> >> > LinkedIn had something like this. Becket did a talk on it a few years
> >> > ago. It would be interesting to know what became of it and if there
> >> > were lessons learned.
> >> > https://www.youtube.com/watch?v=ZrbaXDYUZY8
> >> >
> >> > On Fri, 4 Sep 2020 at 08:17, Alexander Sibiryakov <sibirya...@scrapinghub.com> wrote:
> >> >
> >> > > Hello,
> >> > >
> >> > > I would like to get your opinions on this KIP idea.
> >> > >
> >> > > In short, it will allow transferring messages bigger than the size
> >> > > allowed by the broker.
> >> > >
> >> > > https://docs.google.com/document/d/1cKBNxPkVVENly9YczYXsVDVWwrCdRq3G_cja5s2QDQg/edit?usp=sharing
> >> > >
> >> > > If all that makes sense, I'll create a full-fledged KIP document
> >> > > and expand the idea.
> >> > >
> >> > > Thanks,
> >> > > A.
> >> >
> >> > --
> >> >
> >> > Ben Stopford
> >
> > --
> > Gwen Shapira
> > Engineering Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter | blog
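P.S. To make the splitting/merging idea concrete, here is a rough client-side sketch, independent of any Kafka client library. All names (`split_message`, `Reassembler`, the header fields) are hypothetical, not an existing API; a real implementation would carry the headers as record headers and key all chunks of one logical message to the same partition so they arrive in order.

```python
import hashlib
import math
from typing import Dict, Iterator, Optional, Tuple

CHUNK_SIZE = 1024 * 1024  # illustrative per-record limit, roughly the broker default


def split_message(msg_id: str, payload: bytes,
                  chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[dict, bytes]]:
    """Yield (headers, chunk) pairs for one logical message."""
    total = max(1, math.ceil(len(payload) / chunk_size))
    checksum = hashlib.sha256(payload).hexdigest()
    for i in range(total):
        headers = {"msg_id": msg_id, "chunk": i, "total": total, "sha256": checksum}
        yield headers, payload[i * chunk_size:(i + 1) * chunk_size]


class Reassembler:
    """Consumer side: buffer chunks per msg_id, return the payload once complete."""

    def __init__(self) -> None:
        self._parts: Dict[str, Dict[int, bytes]] = {}

    def add(self, headers: dict, chunk: bytes) -> Optional[bytes]:
        parts = self._parts.setdefault(headers["msg_id"], {})
        parts[headers["chunk"]] = chunk
        if len(parts) < headers["total"]:
            return None  # still waiting for more chunks
        payload = b"".join(parts[i] for i in range(headers["total"]))
        del self._parts[headers["msg_id"]]
        if hashlib.sha256(payload).hexdigest() != headers["sha256"]:
            raise ValueError("corrupt multi-part message")
        return payload
```

Note the buffer is keyed by chunk index, so reassembly still works if chunks interleave with other messages; the checksum guards against a partially compacted or truncated sequence.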