See my answers below,

On Thu, Sep 10, 2020 at 5:03 PM Gwen Shapira <g...@confluent.io> wrote:

> There is another option of doing the splitting on the server and hiding
> this from the clients. My personal (and highly controversial) take is that
> Kafka clients could use less complexity rather than more. They are
> incredibly difficult to reason about as is.  But maybe this
> splitting/merging won't be that bad - multi-part messages are well
> understood in general.
>
Sure, let's decide on the approach first and then figure out where it should
live. As Tim mentioned, the alternative is to have a separate file per message
on the broker side, read and written in a streaming way.


> On Thu, Sep 10, 2020 at 7:51 AM Gwen Shapira <g...@confluent.io> wrote:
>
> > There is also another approach (harder to design, but may be easier to use
> > and maintain), which is to make Kafka handle large messages better and
> > allow users to set higher limits - for example, can Kafka provide really
> > high throughput with 1GB messages? Some systems do it well.
>
It is definitely possible, although it seems complex to build a system that is
efficient for both small (MB) and large (GB) messages.
What systems do you have in mind?

>
> > I don't know where the slowdowns happen, but perhaps it is one of these?
> > 1. Java GC used to be a problem, but maybe we didn't try with newer GC and
> > simple tuning will solve it?
>
I'm afraid it will hurt the optimizations the page and write caches provide
for normal messages: when a large chunk lands there, it will inevitably push
the smaller ones out.

> > 2. We have a head-of-line blocking issue on the queue. There are approaches
> > to solve that too.
>
One of the options is to set reasonable limits on multi-part messages to
bound the blocking time.
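For illustration, such a limit could be enforced on the producing side roughly
like this (both constants are assumptions, not existing Kafka settings):

public class MultiPartLimits {
    // Hypothetical caps: a single multi-part message may not occupy a partition
    // with more than MAX_CHUNKS chunks of CHUNK_SIZE bytes each.
    static final int MAX_CHUNKS = 64;
    static final int CHUNK_SIZE = 1 << 20; // 1 MiB per chunk

    static void validate(long payloadSize) {
        long chunks = (payloadSize + CHUNK_SIZE - 1) / CHUNK_SIZE;
        if (chunks > MAX_CHUNKS) {
            throw new IllegalArgumentException("Payload of " + payloadSize
                    + " bytes would need " + chunks + " chunks, more than the allowed "
                    + MAX_CHUNKS);
        }
    }
}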

>
> > I'd love to see more exploration on what exactly is the problem we are
> > facing (and is it still an issue? Becket's work is a few years old now.)
>
It is definitely still an issue, especially for users of managed Kafka
services like Confluent Cloud, who are forced to use either a reference-based
approach or chunking.
Those who deploy Kafka themselves can raise the message size limit, at the
price of slowing down the broker.
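For context, the reference-based workaround usually looks something like the
claim-check sketch below; BlobStore is a hypothetical placeholder for S3/GCS
or any other object store, not a real API.

import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.nio.charset.StandardCharsets;
import java.util.UUID;

public class ReferenceBasedProducer {
    public interface BlobStore {
        void put(String key, byte[] payload); // hypothetical upload call
    }

    public static void send(Producer<String, byte[]> producer, BlobStore store,
                            String topic, byte[] largePayload) {
        // Upload the oversized payload outside of Kafka...
        String blobKey = "kafka-large/" + UUID.randomUUID();
        store.put(blobKey, largePayload);

        // ...and publish only a small reference that easily fits under
        // message.max.bytes.
        producer.send(new ProducerRecord<>(topic,
                blobKey.getBytes(StandardCharsets.UTF_8)));
    }
}

The downside is that consumers then depend on a second system for retrieval,
retention and access control, which is exactly what a native multi-part
mechanism would avoid.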

A.

> >
> > On Thu, Sep 10, 2020 at 12:21 AM Alexander Sibiryakov <
> > sibirya...@scrapinghub.com> wrote:
> >
> >> Hey Ben, thanks for the link. My proposal is partially based on Becket's
> >> ideas, but I haven't reached out to him directly.
> >>
> >> +Becket
> >> Hi Becket, would you mind having a look at my proposal (the link is in the
> >> first message)?
> >>
> >> A.
> >>
> >> On Tue, Sep 8, 2020 at 12:35 PM Ben Stopford <b...@confluent.io> wrote:
> >>
> >> > LinkedIn had something like this. Becket did a talk on it a few years
> >> > ago. It would be interesting to know what became of it and if there were
> >> > lessons learned.
> >> > https://www.youtube.com/watch?v=ZrbaXDYUZY8
> >> >
> >> > On Fri, 4 Sep 2020 at 08:17, Alexander Sibiryakov <
> >> > sibirya...@scrapinghub.com> wrote:
> >> >
> >> > > Hello,
> >> > >
> >> > > I would like to get your opinions on this KIP idea.
> >> > >
> >> > > In short, it will allow transferring messages bigger than the size
> >> > > allowed by the broker.
> >> > >
> >> > >
> >> > >
> >> > > https://docs.google.com/document/d/1cKBNxPkVVENly9YczYXsVDVWwrCdRq3G_cja5s2QDQg/edit?usp=sharing
> >> > >
> >> > > If all that makes sense, I'll create a full-fledged KIP document and
> >> > > expand the idea.
> >> > >
> >> > > Thanks,
> >> > > A.
> >> > >
> >> >
> >> >
> >> > --
> >> >
> >> > Ben Stopford
> >> >
> >>
> >
> >
> > --
> > Gwen Shapira
> > Engineering Manager | Confluent
> > 650.450.2760 | @gwenshap
> > Follow us: Twitter | blog
> >
>
>
> --
> Gwen Shapira
> Engineering Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>
