Re: [DISCUSS] KIP idea: Support of multipart messages

2020-09-11 Thread Alexander Sibiryakov
Hello,

I don't have a firm answer to this. Ideally, the bigger the better. For our
use case (working with web data), hundreds of megabytes should be enough.
But there could be other use cases, such as transferring binary or media
files.

There is another dimension to consider: the proportion of large messages in
the stream. In our case it is less than 0.5%. If that proportion is high
enough, I assume Kafka would be out of consideration.
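To put the 0.5% figure in perspective for a chunking approach, here is a back-of-the-envelope sketch. The workload numbers (1,000,000 messages, 100 MiB large messages, 1 MiB chunks) are hypothetical, and `fragment_share` is a made-up helper for illustration only:

```python
import math

MiB = 1024 * 1024


def fragment_share(total_msgs, large_fraction, large_size, chunk_size):
    """Estimate how many log records are fragments of large messages.

    Assumes every large message is split into ceil(large_size / chunk_size)
    chunks and every other message is stored as a single record.
    Returns (fragment_count, small_count, fragment_share_of_records).
    """
    large_msgs = round(total_msgs * large_fraction)
    small_msgs = total_msgs - large_msgs
    fragments = large_msgs * math.ceil(large_size / chunk_size)
    records = fragments + small_msgs
    return fragments, small_msgs, (fragments / records if records else 0.0)


# Hypothetical workload: 0.5% of 1,000,000 messages are 100 MiB each.
frags, small, share = fragment_share(1_000_000, 0.005, 100 * MiB, 1 * MiB)
print(frags, small, f"{share:.1%}")  # prints: 500000 995000 33.4%
```

Under those assumptions roughly a third of the records in the log would be fragments, even though only 0.5% of logical messages are large.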

A.

On Thu, Sep 10, 2020 at 7:02 PM Ismael Juma  wrote:

> Thanks for the KIP. I think the main question is what's the upper bound for
> message size you are looking to support. Is it hundreds of MBs, GBs, tens
> of GBs, something else? That would inform the options.
>
> Ismael


Re: [DISCUSS] KIP idea: Support of multipart messages

2020-09-11 Thread Alexander Sibiryakov
See my answers below,

On Thu, Sep 10, 2020 at 5:03 PM Gwen Shapira  wrote:

> There is another option of doing the splitting on the server and hiding
> this from the clients. My personal (and highly controversial) take is that
> Kafka clients could use less complexity rather than more. They are
> incredibly difficult to reason about as is.  But maybe this
> splitting/merging won't be that bad - multi-part messages are well
> understood in general.
>
Sure, let's decide on the approach first and then figure out where we
should put it. As Tim mentioned, the alternative is to have a separate file
per message on the broker side, which is read and written in a streaming way.
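A minimal sketch of what reading and writing a per-message file "in a streaming way" could mean, assuming a hypothetical layout of one file per message id in a broker-side directory. `store_streaming`, `read_streaming` and the 64 KiB block size are illustrative names and values, not existing Kafka code. The point is that memory use is bounded by one chunk at a time rather than by the message size:

```python
import hashlib
import os
import tempfile

BLOCK_SIZE = 64 * 1024  # hypothetical I/O block size


def store_streaming(chunks, directory, message_id):
    """Write an iterable of byte chunks to <directory>/<message_id>.bin.

    Only the current chunk is held in memory, so memory use is bounded by the
    chunk size rather than the total message size. Returns (path, size, sha256).
    """
    path = os.path.join(directory, f"{message_id}.bin")
    digest = hashlib.sha256()
    size = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            digest.update(chunk)
            size += len(chunk)
    return path, size, digest.hexdigest()


def read_streaming(path, block_size=BLOCK_SIZE):
    """Yield the stored message back in fixed-size blocks."""
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            yield block


with tempfile.TemporaryDirectory() as broker_dir:
    incoming = (os.urandom(BLOCK_SIZE) for _ in range(16))  # 1 MiB, produced lazily
    path, size, sha = store_streaming(incoming, broker_dir, "msg-42")
    check = hashlib.sha256()
    for block in read_streaming(path):
        check.update(block)
    print(size, check.hexdigest() == sha)  # prints: 1048576 True
```

A real broker would also have to sanitize message ids used as file names, fsync, and delete files on retention, all of which the sketch omits.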


> On Thu, Sep 10, 2020 at 7:51 AM Gwen Shapira  wrote:
>
> > There is also another approach (harder to design, but may be easier to use
> > and maintain), which is to make Kafka handle large messages better and
> > allow users to set higher limits - for example, can Kafka provide really
> > high throughput with 1GB messages? Some systems do it well.
>
It is definitely possible, although it seems complex to build a system
that is efficient for both small (MB) and large (GB) messages.
What systems do you have in mind?

>
> > I don't know where the slowdowns happen, but perhaps it is one of these?
> > 1. Java GC used to be a problem, but maybe we didn't try with newer GC and
> > simple tuning will solve it?
>
I'm afraid it will hurt the optimizations made for normal messages in the
page and write caches. When a large chunk gets there, it will inevitably
push out smaller ones.
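The eviction concern can be illustrated with a toy byte-bounded LRU cache. This is a deliberate simplification (the Linux page cache uses active/inactive lists, not a plain LRU), and `ByteLRU` plus the 64 MiB / 1 KiB / 32 MiB sizes are hypothetical:

```python
from collections import OrderedDict

KiB = 1024
MiB = 1024 * KiB


class ByteLRU:
    """Toy LRU cache bounded by total bytes, standing in for the page cache."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.entries = OrderedDict()  # key -> size, least recently used first

    def get(self, key):
        """Return True on a hit and mark the entry as most recently used."""
        if key not in self.entries:
            return False
        self.entries.move_to_end(key)
        return True

    def put(self, key, size):
        """Cache an entry, evicting least recently used ones; return evicted keys."""
        if size > self.capacity:
            raise ValueError("entry larger than the whole cache")
        if key in self.entries:
            self.used -= self.entries.pop(key)
        evicted = []
        while self.used + size > self.capacity:
            old_key, old_size = self.entries.popitem(last=False)
            self.used -= old_size
            evicted.append(old_key)
        self.entries[key] = size
        self.used += size
        return evicted


# Hypothetical sizes: a 64 MiB cache full of 1 KiB messages, then one 32 MiB chunk.
cache = ByteLRU(64 * MiB)
for i in range(64 * 1024):
    cache.put(("small", i), KiB)
evicted = cache.put(("large", 0), 32 * MiB)
print(len(evicted))  # prints: 32768 (small messages pushed out by one write)
```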

> > 2. We have head-of-line blocking issue on the queue. There are approaches
> > to solve that too.
>
One option is to set reasonable limits on multipart messages to reduce
the blocking time.

>
> > I'd love to see more exploration on what exactly is the problem we are
> > facing (and is it still an issue? Becket's work is a few years old now.)
>
It is definitely an issue, especially for users of managed Kafka
services like Confluent Cloud, who are forced to use either a
reference-based approach or chunking.
For those who deploy Kafka themselves, there is the option to increase the
maximum message size, at the price of slowing down the broker.

A.



Re: [DISCUSS] KIP idea: Support of multipart messages

2020-09-10 Thread Tim Fox



On 2020/09/04 07:17:36, Alexander Sibiryakov wrote:
> Hello,
> 
> I would like to get your opinions on this KIP idea.
> 
> In short, it will allow transferring messages larger than the broker
> allows.
> 
> https://docs.google.com/document/d/1cKBNxPkVVENly9YczYXsVDVWwrCdRq3G_cja5s2QDQg/edit?usp=sharing
> 
> If all that makes sense, I'll create a full-fledged KIP document and expand
> the idea.
> 
> Thanks,
> A.
> 


Hi Alexander,

It's an interesting proposal. I've previously worked on brokers which
support sending very large messages, up to 8 GiB, with the server running in
only 50 MB of RAM:
https://activemq.apache.org/components/artemis/documentation/2.6.0/large-messages.html

There are generally two approaches that can be used to accomplish this:

1) Split the large message into smaller chunks at the producer and send
them over the same mechanism as normal messages. Reassemble the large message
from the chunks at the consumer.

2) The client streams the large message out of band to disk on the server
(not stored in the log with other messages), then sends a unique id for the
message in the actual message. When the consumer receives the small message
with the unique id, it initiates a streaming download out of band from the
server.
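A minimal sketch of approach 1), chunking at the producer and reassembly at the consumer, using plain Python objects instead of a real Kafka client. The `Chunk` fields (`message_id`, `index`, `total`) and the `split`/`Reassembler` names are hypothetical; in Kafka such metadata could plausibly travel in record headers:

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    message_id: str  # hypothetical header: groups the chunks of one message
    index: int       # position of this chunk within the message
    total: int       # number of chunks in the message
    payload: bytes


def split(payload, max_chunk_bytes, message_id=None):
    """Producer side: cut a payload into chunks of at most max_chunk_bytes."""
    if max_chunk_bytes <= 0:
        raise ValueError("max_chunk_bytes must be positive")
    message_id = message_id or uuid.uuid4().hex
    total = max(1, -(-len(payload) // max_chunk_bytes))  # ceiling division
    return [
        Chunk(message_id, i, total,
              payload[i * max_chunk_bytes:(i + 1) * max_chunk_bytes])
        for i in range(total)
    ]


class Reassembler:
    """Consumer side: buffer chunks per message id until all parts arrive."""

    def __init__(self):
        self.pending = {}  # message_id -> {index: payload}

    def add(self, chunk):
        """Return the full payload once the last missing chunk arrives, else None."""
        parts = self.pending.setdefault(chunk.message_id, {})
        parts[chunk.index] = chunk.payload  # a duplicate (e.g. a retry) overwrites
        if len(parts) < chunk.total:
            return None
        del self.pending[chunk.message_id]
        return b"".join(parts[i] for i in range(chunk.total))


a = split(b"A" * 10, 4, message_id="a")  # 3 chunks: 4 + 4 + 2 bytes
b = split(b"B" * 5, 4, message_id="b")   # 2 chunks: 4 + 1 bytes
log = [a[0], b[0], a[1], b[1], a[2]]     # chunks of two messages interleaved
r = Reassembler()
for chunk in log:
    whole = r.add(chunk)
    if whole is not None:
        print(chunk.message_id, len(whole))  # prints "b 5", then "a 10"
```

The sketch assumes all chunks of one message land in the same partition in order (for example by giving them the same record key with the default partitioner). It ignores offset management: a consumer must not commit past the first chunk of a message it has not fully reassembled, or a restart would lose the earlier chunks.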

There are pros and cons to both approaches:

1) Means you don't need any special machinery for large messages on the
broker; things just work as is.

2) Is more efficient, as you don't have to encode everything in little
chunks and send them through the machinery of the broker. You can open a
socket to the server, and the server can slam it to disk using sendfile, so
this can be very fast and CPU-efficient.

1) Can mean that small messages get lost in the log and become slow to
retrieve when there are millions of fragments of large messages in the same
log.
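The message flow of approach 2) can be sketched as follows: the payload goes to an out-of-band store, and only a small record carrying its unique id goes through the log. `OutOfBandStore`, `produce_large`, `consume_large` and the `large_message_id` field are hypothetical names, and the store is kept in memory purely for brevity (the approach itself would stream to disk on the server):

```python
import io
import json
import uuid


class OutOfBandStore:
    """Hypothetical server-side store for large payloads, keyed by a unique id.

    In-memory here for brevity; approach 2) would stream these bytes to disk
    outside the log.
    """

    def __init__(self, block_size=64 * 1024):
        self.block_size = block_size
        self._blobs = {}

    def upload(self, stream):
        """Read a file-like object block by block and return its new id."""
        blob_id = uuid.uuid4().hex
        parts = []
        for block in iter(lambda: stream.read(self.block_size), b""):
            parts.append(block)
        self._blobs[blob_id] = b"".join(parts)
        return blob_id

    def download(self, blob_id):
        """Yield the stored payload back in blocks."""
        data = self._blobs[blob_id]
        for start in range(0, len(data), self.block_size):
            yield data[start:start + self.block_size]


def produce_large(store, stream):
    """Upload out of band, then build the small in-band message carrying the id."""
    return json.dumps({"large_message_id": store.upload(stream)}).encode()


def consume_large(store, message):
    """On receipt of the small message, download the payload it refers to."""
    blob_id = json.loads(message)["large_message_id"]
    return b"".join(store.download(blob_id))


store = OutOfBandStore()
payload = b"x" * 1_000_000
small_message = produce_large(store, io.BytesIO(payload))
assert consume_large(store, small_message) == payload
print(len(small_message))  # only this small record would go through the log
```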

BTW, the concept of streaming messages has been around at least since JMS
(1998, iirc)!


Re: [DISCUSS] KIP idea: Support of multipart messages

2020-09-10 Thread Ismael Juma
Thanks for the KIP. I think the main question is what's the upper bound for
message size you are looking to support. Is it hundreds of MBs, GBs, tens
of GBs, something else? That would inform the options.

Ismael

On Thu, Sep 10, 2020 at 8:03 AM Gwen Shapira  wrote:

> There is another option of doing the splitting on the server and hiding
> this from the clients. My personal (and highly controversial) take is that
> Kafka clients could use less complexity rather than more. They are
> incredibly difficult to reason about as is.  But maybe this
> splitting/merging won't be that bad - multi-part messages are well
> understood in general.


Re: [DISCUSS] KIP idea: Support of multipart messages

2020-09-10 Thread Gwen Shapira
There is another option: doing the splitting on the server and hiding
it from the clients. My personal (and highly controversial) take is that
Kafka clients could use less complexity rather than more. They are
incredibly difficult to reason about as is. But maybe this
splitting/merging won't be that bad - multi-part messages are well
understood in general.

On Thu, Sep 10, 2020 at 7:51 AM Gwen Shapira  wrote:

> There is also another approach (harder to design, but may be easier to use
> and maintain), which is to make Kafka handle large messages better and
> allow users to set higher limits - for example, can Kafka provide really
> high throughput with 1GB messages? Some systems do it well.
>
> I don't know where the slowdowns happen, but perhaps it is one of these?
> 1. Java GC used to be a problem, but maybe we didn't try with newer GC and
> simple tuning will solve it?
> 2. We have head-of-line blocking issue on the queue. There are approaches
> to solve that too.
>
> I'd love to see more exploration on what exactly is the problem we are
> facing (and is it still an issue? Becket's work is a few years old now.)


-- 
Gwen Shapira
Engineering Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Re: [DISCUSS] KIP idea: Support of multipart messages

2020-09-10 Thread Gwen Shapira
There is also another approach (harder to design, but may be easier to use
and maintain), which is to make Kafka handle large messages better and
allow users to set higher limits - for example, can Kafka provide really
high throughput with 1GB messages? Some systems do it well.

I don't know where the slowdowns happen, but perhaps it is one of these?
1. Java GC used to be a problem, but maybe we haven't tried newer GCs, and
simple tuning would solve it?
2. We have a head-of-line blocking issue on the queue. There are approaches
to solve that too.

I'd love to see more exploration of what exactly the problem we are facing
is (and is it still an issue? Becket's work is a few years old now.)
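To make the head-of-line blocking point concrete, here is a toy model of a single FIFO link. The numbers (100 MiB/s, one 1 GiB message, ten 1 KiB messages, 1 MiB chunks) are hypothetical, and the model ignores batching, replication, fetch-size limits and per-chunk overhead:

```python
MiB = 1024 * 1024
BANDWIDTH = 100 * MiB  # hypothetical link throughput, bytes per second


def completion_times(queue, bandwidth=BANDWIDTH):
    """Send (name, size) items one after another over a single FIFO link.

    Returns {name: time the item's last byte is sent}. Chunks sharing a name
    overwrite each other, so a chunked message finishes with its last chunk.
    """
    t = 0.0
    done = {}
    for name, size in queue:
        t += size / bandwidth
        done[name] = t
    return done


large = [("large", 1024 * MiB)]                    # one 1 GiB message
small = [(f"small-{i}", 1024) for i in range(10)]  # ten 1 KiB messages behind it

# Whole message first: every small message waits for the full 1 GiB.
fifo = completion_times(large + small)

# Same bytes, but the large message is cut into 1 MiB chunks and a small
# message is allowed to go between chunks.
interleaved = []
for i in range(1024):
    interleaved.append(("large", MiB))
    if i < len(small):
        interleaved.append(small[i])
chunked = completion_times(interleaved)

print(f"small-0: {fifo['small-0']:.3f}s whole vs {chunked['small-0']:.3f}s chunked")
# prints: small-0: 10.240s whole vs 0.010s chunked
```

In this model the large message finishes at essentially the same time either way; only the waiting time of the small messages changes.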

On Thu, Sep 10, 2020 at 12:21 AM Alexander Sibiryakov <
sibirya...@scrapinghub.com> wrote:

> Hey Ben, thanks for the link. My proposal is partially based on Becket's
> ideas, but I haven't reached out to him directly.
>
> +Becket
> Hi Becket, would you mind to have a look at my proposal (link is in the
> first message) ?
>
> A.




Re: [DISCUSS] KIP idea: Support of multipart messages

2020-09-10 Thread Alexander Sibiryakov
Hey Ben, thanks for the link. My proposal is partially based on Becket's
ideas, but I haven't reached out to him directly.

+Becket
Hi Becket, would you mind having a look at my proposal (the link is in
the first message)?

A.

On Tue, Sep 8, 2020 at 12:35 PM Ben Stopford  wrote:

> LinkedIn had something like this. Becket did a talk on it a few years ago.
> It would be interesting to know what became of it and if there were lessons
> learned.
> https://www.youtube.com/watch?v=ZrbaXDYUZY8


Re: [DISCUSS] KIP idea: Support of multipart messages

2020-09-08 Thread Ben Stopford
LinkedIn had something like this. Becket did a talk on it a few years ago.
It would be interesting to know what became of it and if there were lessons
learned.
https://www.youtube.com/watch?v=ZrbaXDYUZY8

On Fri, 4 Sep 2020 at 08:17, Alexander Sibiryakov <
sibirya...@scrapinghub.com> wrote:

> Hello,
>
> I would like to get your opinions on this KIP idea.
>
> In short, it will allow transferring messages larger than the broker
> allows.
>
>
> https://docs.google.com/document/d/1cKBNxPkVVENly9YczYXsVDVWwrCdRq3G_cja5s2QDQg/edit?usp=sharing
>
> If all that makes sense, I'll create a full-fledged KIP document and expand
> the idea.
>
> Thanks,
> A.
>


-- 

Ben Stopford