Re: Request to be added in contributors list of Apache Kafka

2021-10-05 Thread Matthias J. Sax

No problem :)

You should be all set.

On 10/5/21 7:18 PM, Nandini Nelson wrote:

Hello Matthias,

My username is nandininelson.
Sorry, I forgot to include it in the initial message and later replied to the
automated message from the ezmlm program.

Regards,
Nandini


On Tue, Oct 5, 2021 at 11:44 AM Matthias J. Sax  wrote:


Did you already create a JIRA account? (It's self-service.)

After you have created your account, please share your user name so we
can grant you contributor permissions.


-Matthias

On 10/4/21 8:47 PM, Nandini Nelson wrote:

Hello Kafka Community,

I would like to contribute to Apache Kafka and want to pick up some issues.
I read here
<https://cwiki.apache.org/confluence/display/KAFKA/Reporting+Issues+in+Apache+Kafka>
that I need to be added to the contributors list in order to start working
on a jira.
Request you to add me to the list.

Thanks & Regards,
Nandini







Re: Request to be added in contributors list of Apache Kafka

2021-10-05 Thread Nandini Nelson
Hello Matthias,

My username is nandininelson.
Sorry, I forgot to include it in the initial message and later replied to the
automated message from the ezmlm program.

Regards,
Nandini


On Tue, Oct 5, 2021 at 11:44 AM Matthias J. Sax  wrote:

> Did you already create a JIRA account? (It's self-service.)
>
> After you have created your account, please share your user name so we
> can grant you contributor permissions.
>
>
> -Matthias
>
> On 10/4/21 8:47 PM, Nandini Nelson wrote:
> > Hello Kafka Community,
> >
> > I would like to contribute to Apache Kafka and want to pick up some issues.
> > I read here
> > <https://cwiki.apache.org/confluence/display/KAFKA/Reporting+Issues+in+Apache+Kafka>
> > that I need to be added to the contributors list in order to start working
> > on a jira.
> > Request you to add me to the list.
> >
> > Thanks & Regards,
> > Nandini
> >
>


Re: kafka streams commit.interval.ms for at-least-once too high

2021-10-05 Thread Matthias J. Sax

- By producer configs, I hope you mean batching and other settings that
will hold off producing of events. Correct me if I'm wrong.


Correct.


- Not sure what you mean by throughput here; which configuration would
dictate that?


I referred to input-topic throughput. With higher/lower throughput you 
might get output data sooner/later, depending on your producer configs.



- Do you mean here that Kafka Streams internally handles waiting on
processing and offset commits of events that are already consumed and being
processed by the streams instance?


If a rebalance/shutdown is triggered, Kafka Streams will stop processing 
new records and just finish processing all in-flight records. 
Afterwards, a commit happens right away for all fully processed records.



-Matthias
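[To make the producer configs in question concrete: Kafka Streams forwards any config with the `producer.` prefix to its internal producers, and the batching settings below are the ones that hold output back. A sketch with illustrative values, not recommendations:]

```properties
# Kafka Streams passes configs with the "producer." prefix to its
# internal producers. Larger linger.ms / batch.size hold records back
# longer before they are sent downstream (values illustrative only):
producer.linger.ms=100
producer.batch.size=65536
```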


On 10/5/21 8:35 AM, Pushkar Deole wrote:

Matthias,

On your response "For at-least-once, you would still get output
continuously, depending on throughput and producer configs":
- Not sure what you mean by throughput here; which configuration would
dictate that?
- By producer configs, I hope you mean batching and other settings that
will hold off producing of events. Correct me if I'm wrong.

On your response "For regular rebalances/restarts, a longer commit interval
has no impact because offsets would be committed right away":
- Do you mean here that Kafka Streams internally handles waiting on
processing and offset commits of events that are already consumed and being
processed by the streams instance?

On Tue, Oct 5, 2021 at 11:43 AM Matthias J. Sax  wrote:


The main motivation for a shorter commit interval for EOS is
end-to-end latency. A topology can consist of multiple sub-topologies,
and the end-to-end latency for the EOS case is roughly commit-interval
times number-of-subtopologies.

For regular rebalances/restarts, a longer commit interval has no impact,
because for a regular rebalance/restart, offsets are committed
right away to guarantee a clean hand-off. Only in case of failure can a
longer commit interval lead to a larger number of duplicates (of
course, only for at-least-once guarantees).

For at-least-once, you would still get output continuously, depending on
throughput and producer configs; only offsets are committed every 30
seconds by default. This continuous output is also the reason why there
is no latency impact for at-least-once with a longer commit interval.

Besides the impact on latency, there is also a throughput impact: a
longer commit interval provides higher throughput.


-Matthias


On 10/4/21 7:31 AM, Pushkar Deole wrote:

Hi All,

I am looking into commit.interval.ms in Kafka Streams, which is the
time interval at which Streams commits offsets for source topics.
However, for the exactly-once guarantee the default is 100 ms, whereas
for at-least-once it is 30000 ms (i.e. 30 sec).
Why is there such a huge difference between the two guarantees, and what
does it mean to have this interval as high as 30 seconds? Does it also
increase the probability of duplicates in case of application restarts
or partition rebalances?
Does it mean that the streams application would also publish events to
the destination topic only at this interval, i.e. with a delay in
publishing events to the destination topic?







Re: kafka streams commit.interval.ms for at-least-once too high

2021-10-05 Thread Pushkar Deole
Matthias,

On your response "For at-least-once, you would still get output
continuously, depending on throughput and producer configs":
- Not sure what you mean by throughput here; which configuration would
dictate that?
- By producer configs, I hope you mean batching and other settings that
will hold off producing of events. Correct me if I'm wrong.

On your response "For regular rebalances/restarts, a longer commit interval
has no impact because offsets would be committed right away":
- Do you mean here that Kafka Streams internally handles waiting on
processing and offset commits of events that are already consumed and being
processed by the streams instance?

On Tue, Oct 5, 2021 at 11:43 AM Matthias J. Sax  wrote:

> The main motivation for a shorter commit interval for EOS is
> end-to-end latency. A topology can consist of multiple sub-topologies,
> and the end-to-end latency for the EOS case is roughly commit-interval
> times number-of-subtopologies.
>
> For regular rebalances/restarts, a longer commit interval has no impact,
> because for a regular rebalance/restart, offsets are committed
> right away to guarantee a clean hand-off. Only in case of failure can a
> longer commit interval lead to a larger number of duplicates (of
> course, only for at-least-once guarantees).
>
> For at-least-once, you would still get output continuously, depending on
> throughput and producer configs; only offsets are committed every 30
> seconds by default. This continuous output is also the reason why there
> is no latency impact for at-least-once with a longer commit interval.
>
> Besides the impact on latency, there is also a throughput impact: a
> longer commit interval provides higher throughput.
>
>
> -Matthias
>
>
> On 10/4/21 7:31 AM, Pushkar Deole wrote:
> > Hi All,
> >
> > I am looking into commit.interval.ms in Kafka Streams, which is the
> > time interval at which Streams commits offsets for source topics.
> > However, for the exactly-once guarantee the default is 100 ms, whereas
> > for at-least-once it is 30000 ms (i.e. 30 sec).
> > Why is there such a huge difference between the two guarantees, and what
> > does it mean to have this interval as high as 30 seconds? Does it also
> > increase the probability of duplicates in case of application restarts
> > or partition rebalances?
> > Does it mean that the streams application would also publish events to
> > the destination topic only at this interval, i.e. with a delay in
> > publishing events to the destination topic?
> >
>


Re: question about mm2 on consumer group offset mirroring

2021-10-05 Thread Calvin Chen

Hi Ryanne

Thanks for the reply, it helps, thank you very much!

-Calvin

> On Sep 30, 2021, at 22:52, Ryanne Dolan wrote:
> 
> Hey Calvin, the property you're looking for is
> emit.checkpoint.interval.seconds. That's how often MM will write
> checkpoints, which includes consumer group offsets.
> 
> Ryanne
> 
>> On Thu, Sep 30, 2021, 9:18 AM Calvin Chen  wrote:
>> 
>> Hi all
>> 
>> I have a question about MirrorMaker 2 (MM2), on consumer group offset
>> mirroring: how long does it take MM2 to detect a consumer group offset
>> change and mirror it to the remote Kafka consumer group?
>> 
>> I have my MM2 config defined as below:
>> 
>> 
>> {{ kafka01_name }}->{{ kafka02_name }}.sync.group.offsets.enabled = true
>> {{ kafka02_name }}->{{ kafka01_name }}.sync.group.offsets.enabled = true
>> 
>> refresh.topics.interval.seconds=10
>> refresh.groups.interval.seconds=10
>> 
>> so I would expect the consumer group offset mirroring to happen roughly
>> every 10 seconds, but during testing I see that sometimes the consumer
>> group offset mirroring is quick and sometimes it takes minutes, so I would
>> like to know how offsets are mirrored and why there is a time difference. Thanks!
>> 
>> -Calvin
>> 


Re: Request to be added in contributors list of Apache Kafka

2021-10-05 Thread Matthias J. Sax

Did you already create a JIRA account? (It's self-service.)

After you have created your account, please share your user name so we 
can grant you contributor permissions.



-Matthias

On 10/4/21 8:47 PM, Nandini Nelson wrote:

Hello Kafka Community,

I would like to contribute to Apache Kafka and want to pick up some issues.
I read here
<https://cwiki.apache.org/confluence/display/KAFKA/Reporting+Issues+in+Apache+Kafka>
that I need to be added to the contributors list in order to start working
on a jira.
Request you to add me to the list.

Thanks & Regards,
Nandini



Re: kafka streams commit.interval.ms for at-least-once too high

2021-10-05 Thread Matthias J. Sax
The main motivation for a shorter commit interval for EOS is
end-to-end latency. A topology can consist of multiple sub-topologies,
and the end-to-end latency for the EOS case is roughly commit-interval
times number-of-subtopologies.

For regular rebalances/restarts, a longer commit interval has no impact,
because for a regular rebalance/restart, offsets are committed
right away to guarantee a clean hand-off. Only in case of failure can a
longer commit interval lead to a larger number of duplicates (of
course, only for at-least-once guarantees).

For at-least-once, you would still get output continuously, depending on
throughput and producer configs; only offsets are committed every 30
seconds by default. This continuous output is also the reason why there
is no latency impact for at-least-once with a longer commit interval.

Besides the impact on latency, there is also a throughput impact: a
longer commit interval provides higher throughput.



-Matthias
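[To make the defaults discussed above concrete, a minimal Streams config sketch; the values shown are the documented defaults, spelled out explicitly (exactly_once_v2 is the name as of Kafka 3.0; older releases use exactly_once):]

```properties
# at-least-once is the default guarantee: offsets are committed every 30 seconds
processing.guarantee=at_least_once
commit.interval.ms=30000

# for exactly-once the default commit interval drops to 100 ms, since
# EOS end-to-end latency is roughly commit-interval * number-of-subtopologies
#processing.guarantee=exactly_once_v2
#commit.interval.ms=100
```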


On 10/4/21 7:31 AM, Pushkar Deole wrote:

Hi All,

I am looking into commit.interval.ms in Kafka Streams, which is the
time interval at which Streams commits offsets for source topics.
However, for the exactly-once guarantee the default is 100 ms, whereas
for at-least-once it is 30000 ms (i.e. 30 sec).
Why is there such a huge difference between the two guarantees, and what
does it mean to have this interval as high as 30 seconds? Does it also
increase the probability of duplicates in case of application restarts
or partition rebalances?
Does it mean that the streams application would also publish events to
the destination topic only at this interval, i.e. with a delay in
publishing events to the destination topic?