RE: [QUESTION] What to do about conflicted KIP numbers?

2024-06-17 Thread Welch, Matt
Thanks for the input Matthias,

I guess we will keep things as they are, to avoid adding yet another layer on 
top.
Eric has already taken the next available number and it's recorded on the wiki 
so I suppose there's no additional work required here.

Regards,
-Matt

-Original Message-
From: Matthias J. Sax  
Sent: Friday, June 14, 2024 3:35 PM
To: dev@kafka.apache.org
Subject: Re: [QUESTION] What to do about conflicted KIP numbers?

I don't think that there is an official guideline.

Personally, I would suggest that the corresponding KIP owners agree on who is 
keeping the conflicting number and who is changing it.

For the one changing the number, I would propose starting a new DISCUSS 
thread using the new number, to keep the KIP threads separate.

Not sure if there is a better way to handle this... Just an idea of how I would 
do it.


Not sure if we can improve the wiki instructions to make the race 
condition less likely? It seems this would happen if two people look at 
the next KIP number, let's say X, but don't bump it right away to X+1 and 
publish their KIP with X a few hours/days later without verifying that X 
is still the next available KIP number.


-Matthias


On 6/14/24 3:10 PM, Welch, Matt wrote:
> Hi Kafka devs,
> 
> I submitted a KIP last week and encountered a KIP-process race condition 
> where my KIP number was consumed by another dev without updating the wiki 
> page containing KIPs: Kafka Improvement Proposals 
> <https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals>
> 
> There are now at least three separate dev-list threads referencing this 
> conflicted KIP number so I'm concerned that discussion around this number 
> will now be permanently confusing due to the conflict and multiple concurrent 
> unrelated threads referencing the same KIP number.  I've intentionally kept 
> the KIP numbers out of this email to prevent yet another thread referencing 
> them.
> 
> While I'm happy to keep going with my existing KIP number, I was wondering if 
> I should "abandon" it and create a new one.
> This solution seems like it could create extra confusion, however, so what is 
> the best course of action here?
> 
> Thanks,
> Matt
> 


Re: [QUESTION] What to do about conflicted KIP numbers?

2024-06-14 Thread Matthias J. Sax

I don't think that there is an official guideline.

Personally, I would suggest that the corresponding KIP owners agree on who 
is keeping the conflicting number and who is changing it.


For the one changing the number, I would propose starting a new 
DISCUSS thread using the new number, to keep the KIP threads separate.


Not sure if there is a better way to handle this... Just an idea of how I 
would do it.



Not sure if we can improve the wiki instructions to make the race 
condition less likely? It seems this would happen if two people look at 
the next KIP number, let's say X, but don't bump it right away to X+1 and 
publish their KIP with X a few hours/days later without verifying that X 
is still the next available KIP number.



-Matthias


On 6/14/24 3:10 PM, Welch, Matt wrote:

Hi Kafka devs,

I submitted a KIP last week and encountered a KIP-process race condition where my KIP 
number was consumed by another dev without updating the wiki page containing KIPs: 
Kafka Improvement Proposals - Apache Kafka - Apache Software 
Foundation

There are now at least three separate dev-list threads referencing this conflicted 
KIP number so I'm concerned that discussion around this number will now be 
permanently confusing due to the conflict and multiple concurrent unrelated 
threads referencing the same KIP number.  I've intentionally kept the KIP 
numbers out of this email to prevent yet another thread referencing them.

While I'm happy to keep going with my existing KIP number, I was wondering if I should 
"abandon" it and create a new one.
This solution seems like it could create extra confusion, however, so what is 
the best course of action here?

Thanks,
Matt



Re: [Question] About Kafka producer design decision making

2024-01-02 Thread Ismael Juma
I should also clarify that Chia-Ping  took it over from me originally and
improved the proposal significantly. I think he also got busy and hasn't
been able to spend time on it for a while though.

Ismael

On Tue, Jan 2, 2024 at 4:08 PM Ismael Juma  wrote:

> I had written a KIP a while back:
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100829459
>
> Happy for someone to pick it up and drive it forward.
>
> Ismael
>
> On Tue, Jan 2, 2024 at 2:00 PM Justine Olshan 
> wrote:
>
>> Hey folks --
>>  I think this is a good conversation. Given we plan to drop support for
>> Java 8 in 4.0 this seems like a good time to consider this change. Perhaps
>> we should file a JIRA ticket and maybe create a KIP to discuss?
>>
>> One thing we should consider however, is if we want some of these
>> operations to be asynchronous. I know I got into some tricky areas of
>> Kafka
>> recently that assumed operations completed in a given order so it is good
>> to confirm such changes are safe.
>>
>> Looking forward to further discussion,
>> Justine
>>
>> On Tue, Jan 2, 2024 at 1:14 PM Philip Nee  wrote:
>>
>> > hey sean - a lot of uses of the Futures are in the public API and
>> therefore
>> > take voting/effort to be changed.  i don't know any reason for
>> > intentionally avoiding the use of CompletableFuture, however, others
>> might
>> > know more than I do.
>> >
>> > thanks,
>> > P
>> >
>> > On Tue, Nov 14, 2023 at 1:27 AM 신수웅(Sean Sin) 
>> > wrote:
>> >
>> > > Dear Apache Kafka Developers,
>> > >
>> > > I'm a 4-year SWE in South Korea.
>> > > I have some questions while looking at the Kafka Producer API.
>> > >
>> > > *Why Use "Future" and Not "CompletableFuture"?*
>> > >
>> > > In the case of "Future", blocking occurs when calling "*get()*", so I
>> > > thought "CompletableFuture" would be better when doing more
>> > > asynchronous operations.
>> > >
>> > > I looked at the Java API document
>> > > <https://kafka.apache.org/36/javadoc/org/apache/kafka/common/KafkaFuture.html#thenApply(org.apache.kafka.common.KafkaFuture.BaseFunction)>
>> > > based on the latest version, version 3.6.x.
>> > >
>> > > If you look at that version, you can see that the KafkaFuture object
>> > > provides the "toCompletionStage()" method, which can convert
>> > > "KafkaFuture" to "CompletableFuture".
>> > >
>> > > In response to this, I think the initial design decision considered
>> > > compatibility issues under JDK 1.8 and the learning curve for
>> > > developers when introducing CompletableFuture, but I wonder if this
>> > > is correct.
>> > >
>> > > In addition, I wonder if it is recommended to use the
>> > > "toCompletionStage()" method to write more non-blocking code if we
>> > > assume JDK 1.8 or higher.
>> > >
>> > > Thanks.
>> > > Su-Ung Shin.
>> > >
>> >
>>
>


Re: [Question] About Kafka producer design decision making

2024-01-02 Thread Ismael Juma
I had written a KIP a while back:

https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=100829459

Happy for someone to pick it up and drive it forward.

Ismael

On Tue, Jan 2, 2024 at 2:00 PM Justine Olshan 
wrote:

> Hey folks --
>  I think this is a good conversation. Given we plan to drop support for
> Java 8 in 4.0 this seems like a good time to consider this change. Perhaps
> we should file a JIRA ticket and maybe create a KIP to discuss?
>
> One thing we should consider however, is if we want some of these
> operations to be asynchronous. I know I got into some tricky areas of Kafka
> recently that assumed operations completed in a given order so it is good
> to confirm such changes are safe.
>
> Looking forward to further discussion,
> Justine
>
> On Tue, Jan 2, 2024 at 1:14 PM Philip Nee  wrote:
>
> > hey sean - a lot of uses of the Futures are in the public API and
> therefore
> > take voting/effort to be changed.  i don't know any reason for
> > intentionally avoiding the use of CompletableFuture, however, others
> might
> > know more than I do.
> >
> > thanks,
> > P
> >
> > On Tue, Nov 14, 2023 at 1:27 AM 신수웅(Sean Sin) 
> > wrote:
> >
> > > Dear Apache Kafka Developers,
> > >
> > > I'm a 4-year SWE in South Korea.
> > > I have some questions while looking at the Kafka Producer API.
> > >
> > > *Why Use "Future" and Not "CompletableFuture"?*
> > >
> > > In the case of "Future", blocking occurs when calling "*get()*", so I
> > > thought "CompletableFuture" would be better when doing more
> > > asynchronous operations.
> > >
> > > I looked at the Java API document
> > > <https://kafka.apache.org/36/javadoc/org/apache/kafka/common/KafkaFuture.html#thenApply(org.apache.kafka.common.KafkaFuture.BaseFunction)>
> > > based on the latest version, version 3.6.x.
> > >
> > > If you look at that version, you can see that the KafkaFuture object
> > > provides the "toCompletionStage()" method, which can convert
> > > "KafkaFuture" to "CompletableFuture".
> > >
> > > In response to this, I think the initial design decision considered
> > > compatibility issues under JDK 1.8 and the learning curve for
> > > developers when introducing CompletableFuture, but I wonder if this
> > > is correct.
> > >
> > > In addition, I wonder if it is recommended to use the
> > > "toCompletionStage()" method to write more non-blocking code if we
> > > assume JDK 1.8 or higher.
> > >
> > > Thanks.
> > > Su-Ung Shin.
> > >
> >
>


Re: [Question] About Kafka producer design decision making

2024-01-02 Thread Justine Olshan
Hey folks --
 I think this is a good conversation. Given we plan to drop support for
Java 8 in 4.0 this seems like a good time to consider this change. Perhaps
we should file a JIRA ticket and maybe create a KIP to discuss?

One thing we should consider however, is if we want some of these
operations to be asynchronous. I know I got into some tricky areas of Kafka
recently that assumed operations completed in a given order so it is good
to confirm such changes are safe.

Looking forward to further discussion,
Justine

On Tue, Jan 2, 2024 at 1:14 PM Philip Nee  wrote:

> hey sean - a lot of uses of the Futures are in the public API and therefore
> take voting/effort to be changed.  i don't know any reason for
> intentionally avoiding the use of CompletableFuture, however, others might
> know more than I do.
>
> thanks,
> P
>
> On Tue, Nov 14, 2023 at 1:27 AM 신수웅(Sean Sin) 
> wrote:
>
> > Dear Apache Kafka Developers,
> >
> > I'm a 4-year SWE in South Korea.
> > I have some questions while looking at the Kafka Producer API.
> >
> > *Why Use "Future" and Not "CompletableFuture"?*
> >
> > In the case of "Future", blocking occurs when calling "*get()*", so I
> > thought "CompletableFuture" would be better when doing more asynchronous
> > operations.
> >
> > I looked at the Java API document
> > <https://kafka.apache.org/36/javadoc/org/apache/kafka/common/KafkaFuture.html#thenApply(org.apache.kafka.common.KafkaFuture.BaseFunction)>
> > based on the latest version, version 3.6.x.
> >
> > If you look at that version, you can see that the KafkaFuture object
> > provides the "toCompletionStage()" method, which can convert "KafkaFuture"
> > to "CompletableFuture".
> >
> > In response to this, I think the initial design decision considered
> > compatibility issues under JDK 1.8 and the learning curve for developers
> > when introducing CompletableFuture, but I wonder if this is correct.
> >
> > In addition, I wonder if it is recommended to use the "toCompletionStage()"
> > method to write more non-blocking code if we assume JDK 1.8 or higher.
> >
> > Thanks.
> > Su-Ung Shin.
> >
>


Re: [Question] About Kafka producer design decision making

2024-01-02 Thread Philip Nee
hey sean - a lot of uses of the Futures are in the public API and therefore
take voting/effort to be changed.  i don't know any reason for
intentionally avoiding the use of CompletableFuture, however, others might
know more than I do.

thanks,
P

On Tue, Nov 14, 2023 at 1:27 AM 신수웅(Sean Sin)  wrote:

> Dear Apache Kafka Developers,
>
> I'm a 4-year SWE in South Korea.
> I have some questions while looking at the Kafka Producer API.
>
> *Why Use "Future" and Not "CompletableFuture"?*
>
> In the case of "Future", blocking occurs when calling "*get()*", so I
> thought "CompletableFuture" would be better when doing more asynchronous
> operations.
>
> I looked at the Java API document
> <https://kafka.apache.org/36/javadoc/org/apache/kafka/common/KafkaFuture.html#thenApply(org.apache.kafka.common.KafkaFuture.BaseFunction)>
> based on the latest version, version 3.6.x.
>
> If you look at that version, you can see that the KafkaFuture object
> provides the "toCompletionStage()" method, which can convert "KafkaFuture"
> to "CompletableFuture".
>
> In response to this, I think the initial design decision considered
> compatibility issues under JDK 1.8 and the learning curve for developers
> when introducing CompletableFuture, but I wonder if this is correct.
>
> In addition, I wonder if it is recommended to use the "toCompletionStage()"
> method to write more non-blocking code if we assume JDK 1.8 or higher.
>
> Thanks.
> Su-Ung Shin.
>
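
The API surface discussed in this thread can be seen with a short sketch: the
Admin client returns KafkaFuture, and its toCompletionStage() method bridges
into the CompletableFuture world (the producer's send(), by contrast, still
returns a plain java.util.concurrent.Future<RecordMetadata>). The bootstrap
address and the describeCluster() call below are illustrative assumptions only.

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.KafkaFuture;
import org.apache.kafka.common.Node;

import java.util.Collection;
import java.util.Properties;
import java.util.concurrent.CompletableFuture;

public class KafkaFutureBridgeSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed local broker
        try (Admin admin = Admin.create(props)) {
            // Admin results are exposed as KafkaFuture rather than CompletableFuture.
            KafkaFuture<Collection<Node>> nodes = admin.describeCluster().nodes();
            // toCompletionStage() bridges into CompletableFuture without blocking on get().
            CompletableFuture<Collection<Node>> cf =
                nodes.toCompletionStage().toCompletableFuture();
            cf.thenAccept(ns -> System.out.println("Cluster has " + ns.size() + " nodes"))
              .join(); // block here only so the example terminates cleanly
        }
    }
}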


Re: question

2023-09-22 Thread Luke Chen
Hi 殿杰

In short, we don't support it now.
But welcome to submit a PR to fix the gap.
You can check this ticket for more information:
https://issues.apache.org/jira/browse/KAFKA-7025

Thanks.
Luke

On Sat, Sep 23, 2023 at 2:14 AM shidian...@mxnavi.com 
wrote:

>
> hello,
>
> I'm working in Kafka development. Now, I have a question. Does Kafka
> support an Android client?
>
>
>
>
> 石殿杰
> Technology Center
> Email: shidian...@mxnavi.com
> Phone: 18341724011
>


Re: [QUESTION] What is the difference between sequence and offset for a Record?

2023-08-09 Thread tison
Thanks for your reply!

"Normalization" may not be the right term. What I want to refer to is:

appendInfo.setLastOffset(offset.value - 1)

which underneath updates the base offset field (in record batch) but not
the offset delta of each record.

Best,
tison.


Justine Olshan  于2023年8月8日周二 00:43写道:

> The sequence summary looks right to me.
> For log normalization, are you referring to compaction? The segment's first
> and last offsets might change, but a batch keeps its offsets when
> compaction occurs.
>
> Hope that helps.
> Justine
>
> On Mon, Aug 7, 2023 at 8:59 AM Matthias J. Sax  wrote:
>
> > > but the base offset may change during log normalizing.
> >
> > Not sure what you mean by "normalization" but offsets are immutable, so
> > they don't change. (To be fair, I am not an expert on brokers, so not
> > sure how this works in detail when log compaction kicks in).
> >
> > > This field is given by the producer and the broker should only read it.
> >
> > Sounds right. The point is that the broker has an "expected"
> > value for it, and if the provided value does not match the expected one,
> > the write is rejected to begin with.
> >
> >
> > -Matthias
> >
> > On 8/7/23 6:35 AM, tison wrote:
> > > Hi Matthias and Justine,
> > >
> > > Thanks for your reply!
> > >
> > > I can summarize the answer as -
> > >
> > > Record offset = base offset + offset delta. This field is calculated by
> > the
> > > broker and the delta won't change but the base offset may change during
> > log
> > > normalizing.
> > > Record sequence = base sequence + (offset) delta. This field is given
> by
> > > the producer and the broker should only read it.
> > >
> > > Is it correct?
> > >
> > > I implement the manipulation part of base offset following this
> > > understanding at [1].
> > >
> > > Best,
> > > tison.
> > >
> > > [1]
> > >
> >
> https://github.com/tisonkun/kafka-api/blob/d080ab7e4b57c0ab0182e0b254333f400e616cd2/simplesrv/src/lib.rs#L391-L394
> > >
> > >
> > > Justine Olshan  于2023年8月2日周三 04:19写道:
> > >
> > >> For what it's worth -- the sequence number is not calculated
> > >> "baseOffset/baseSequence + offset delta" but rather by monotonically
> > >> increasing for a given epoch. If the epoch is bumped, we reset back to
> > >> zero.
> > >> This may mean that the offset and sequence may match, but do not
> > strictly
> > >> need to be the same. The sequence number will also always come from
> the
> > >> client and be in the produce records sent to the Kafka broker.
> > >>
> > >> As for offsets, there is some code in the log layer that maintains the
> > log
> > >> end offset and assigns offsets to the records. The produce handling on
> > the
> > >> leader should typically assign the offset.
> > >> I believe you can find that code here:
> > >>
> > >>
> >
> https://github.com/apache/kafka/blob/b9a45546a7918799b6fb3c0fe63b56f47d8fcba9/core/src/main/scala/kafka/log/UnifiedLog.scala#L766
> > >>
> > >> Justine
> > >>
> > >> On Tue, Aug 1, 2023 at 11:38 AM Matthias J. Sax 
> > wrote:
> > >>
> > >>> The _offset_ is the position of the record in the partition.
> > >>>
> > >>> The _sequence number_ is a unique ID that allows broker to
> de-duplicate
> > >>> messages. It requires the producer to implement the idempotency
> > protocol
> > >>> (part of Kafka transactions); thus, sequence numbers are optional and
> > as
> > >>> long as you don't want to support idempotent writes, you don't need
> to
> > >>> worry about them. (If you want to dig into details, checkout KIP-98
> > that
> > >>> is the original KIP about Kafka TX).
> > >>>
> > >>> HTH,
> > >>> -Matthias
> > >>>
> > >>> On 8/1/23 2:19 AM, tison wrote:
> >  Hi,
> > 
> >  I'm writing a Kafka API Rust codec library[1] to understand how
> Kafka
> >  models its concepts and how the core business logic works.
> > 
> >  During implementing the codec for Records[2], I saw a pair of
> fields
> >  "sequence" and "offset". Both of them are calculated by
> >  baseOffset/baseSequence + offset delta. Then I'm a bit confused how
> to
> > >>> deal
> >  with them properly - what's the difference between these two
> concepts
> >  logically?
> > 
> >  Also, to understand how the core business logic works, I write a
> > simple
> >  server based on my codec library, and observe that the server may
> need
> > >> to
> >  update offset for records produced. How does Kafka set the correct
> > >> offset
> >  for each produced records? And how does Kafka maintain the
> calculation
> > >>> for
> >  offset and sequence during these modifications?
> > 
> >  I'll appreciate if anyone can answer the question or give some
> > insights
> > >>> :D
> > 
> >  Best,
> >  tison.
> > 
> >  [1] https://github.com/tisonkun/kafka-api
> >  [2] https://kafka.apache.org/documentation/#messageformat
> > 
> > >>>
> > >>
> > >
> >
>


Re: [QUESTION] about topic dimension quota

2023-08-08 Thread hudeqi
In fact, I have implemented a bytesIn/bytesOut limit at the topic level. 
I don't know the community's attitude towards this feature, so I don't know 
whether I need to propose a KIP to contribute it.

best,
hudeqi


 -Original Message-
 From: hudeqi 16120...@bjtu.edu.cn
 Sent: 2023-08-08 21:10:39 (Tuesday)
 To: dev@kafka.apache.org
 Cc: 
 Subject: [QUESTION] about topic dimension quota
 


Re: [QUESTION] What is the difference between sequence and offset for a Record?

2023-08-07 Thread Justine Olshan
The sequence summary looks right to me.
For log normalization, are you referring to compaction? The segment's first
and last offsets might change, but a batch keeps its offsets when
compaction occurs.

Hope that helps.
Justine

On Mon, Aug 7, 2023 at 8:59 AM Matthias J. Sax  wrote:

> > but the base offset may change during log normalizing.
>
> Not sure what you mean by "normalization" but offsets are immutable, so
> they don't change. (To be fair, I am not an expert on brokers, so not
> sure how this works in detail when log compaction kicks in).
>
> > This field is given by the producer and the broker should only read it.
>
> Sounds right. The point is that the broker has an "expected"
> value for it, and if the provided value does not match the expected one,
> the write is rejected to begin with.
>
>
> -Matthias
>
> On 8/7/23 6:35 AM, tison wrote:
> > Hi Matthias and Justine,
> >
> > Thanks for your reply!
> >
> > I can summarize the answer as -
> >
> > Record offset = base offset + offset delta. This field is calculated by
> the
> > broker and the delta won't change but the base offset may change during
> log
> > normalizing.
> > Record sequence = base sequence + (offset) delta. This field is given by
> > the producer and the broker should only read it.
> >
> > Is it correct?
> >
> > I implement the manipulation part of base offset following this
> > understanding at [1].
> >
> > Best,
> > tison.
> >
> > [1]
> >
> https://github.com/tisonkun/kafka-api/blob/d080ab7e4b57c0ab0182e0b254333f400e616cd2/simplesrv/src/lib.rs#L391-L394
> >
> >
> > Justine Olshan  于2023年8月2日周三 04:19写道:
> >
> >> For what it's worth -- the sequence number is not calculated
> >> "baseOffset/baseSequence + offset delta" but rather by monotonically
> >> increasing for a given epoch. If the epoch is bumped, we reset back to
> >> zero.
> >> This may mean that the offset and sequence may match, but do not
> strictly
> >> need to be the same. The sequence number will also always come from the
> >> client and be in the produce records sent to the Kafka broker.
> >>
> >> As for offsets, there is some code in the log layer that maintains the
> log
> >> end offset and assigns offsets to the records. The produce handling on
> the
> >> leader should typically assign the offset.
> >> I believe you can find that code here:
> >>
> >>
> https://github.com/apache/kafka/blob/b9a45546a7918799b6fb3c0fe63b56f47d8fcba9/core/src/main/scala/kafka/log/UnifiedLog.scala#L766
> >>
> >> Justine
> >>
> >> On Tue, Aug 1, 2023 at 11:38 AM Matthias J. Sax 
> wrote:
> >>
> >>> The _offset_ is the position of the record in the partition.
> >>>
> >>> The _sequence number_ is a unique ID that allows broker to de-duplicate
> >>> messages. It requires the producer to implement the idempotency
> protocol
> >>> (part of Kafka transactions); thus, sequence numbers are optional and
> as
> >>> long as you don't want to support idempotent writes, you don't need to
> >>> worry about them. (If you want to dig into details, checkout KIP-98
> that
> >>> is the original KIP about Kafka TX).
> >>>
> >>> HTH,
> >>> -Matthias
> >>>
> >>> On 8/1/23 2:19 AM, tison wrote:
>  Hi,
> 
>  I'm writing a Kafka API Rust codec library[1] to understand how Kafka
>  models its concepts and how the core business logic works.
> 
>  During implementing the codec for Records[2], I saw a pair of fields
>  "sequence" and "offset". Both of them are calculated by
>  baseOffset/baseSequence + offset delta. Then I'm a bit confused how to
> >>> deal
>  with them properly - what's the difference between these two concepts
>  logically?
> 
>  Also, to understand how the core business logic works, I write a
> simple
>  server based on my codec library, and observe that the server may need
> >> to
>  update offset for records produced. How does Kafka set the correct
> >> offset
>  for each produced records? And how does Kafka maintain the calculation
> >>> for
>  offset and sequence during these modifications?
> 
>  I'll appreciate if anyone can answer the question or give some
> insights
> >>> :D
> 
>  Best,
>  tison.
> 
>  [1] https://github.com/tisonkun/kafka-api
>  [2] https://kafka.apache.org/documentation/#messageformat
> 
> >>>
> >>
> >
>


Re: [QUESTION] What is the difference between sequence and offset for a Record?

2023-08-07 Thread Matthias J. Sax

> but the base offset may change during log normalizing.


Not sure what you mean by "normalization" but offsets are immutable, so 
they don't change. (To be fair, I am not an expert on brokers, so not 
sure how this works in detail when log compaction kicks in).



> This field is given by the producer and the broker should only read it.


Sounds right. The point is that the broker has an "expected" 
value for it, and if the provided value does not match the expected one, 
the write is rejected to begin with.



-Matthias

On 8/7/23 6:35 AM, tison wrote:

Hi Matthias and Justine,

Thanks for your reply!

I can summarize the answer as -

Record offset = base offset + offset delta. This field is calculated by the
broker and the delta won't change but the base offset may change during log
normalizing.
Record sequence = base sequence + (offset) delta. This field is given by
the producer and the broker should only read it.

Is it correct?

I implement the manipulation part of base offset following this
understanding at [1].

Best,
tison.

[1]
https://github.com/tisonkun/kafka-api/blob/d080ab7e4b57c0ab0182e0b254333f400e616cd2/simplesrv/src/lib.rs#L391-L394


Justine Olshan  于2023年8月2日周三 04:19写道:


For what it's worth -- the sequence number is not calculated
"baseOffset/baseSequence + offset delta" but rather by monotonically
increasing for a given epoch. If the epoch is bumped, we reset back to
zero.
This may mean that the offset and sequence may match, but do not strictly
need to be the same. The sequence number will also always come from the
client and be in the produce records sent to the Kafka broker.

As for offsets, there is some code in the log layer that maintains the log
end offset and assigns offsets to the records. The produce handling on the
leader should typically assign the offset.
I believe you can find that code here:

https://github.com/apache/kafka/blob/b9a45546a7918799b6fb3c0fe63b56f47d8fcba9/core/src/main/scala/kafka/log/UnifiedLog.scala#L766

Justine

On Tue, Aug 1, 2023 at 11:38 AM Matthias J. Sax  wrote:


The _offset_ is the position of the record in the partition.

The _sequence number_ is a unique ID that allows broker to de-duplicate
messages. It requires the producer to implement the idempotency protocol
(part of Kafka transactions); thus, sequence numbers are optional and as
long as you don't want to support idempotent writes, you don't need to
worry about them. (If you want to dig into details, checkout KIP-98 that
is the original KIP about Kafka TX).

HTH,
-Matthias

On 8/1/23 2:19 AM, tison wrote:

Hi,

I'm writing a Kafka API Rust codec library[1] to understand how Kafka
models its concepts and how the core business logic works.

During implementing the codec for Records[2], I saw a pair of fields
"sequence" and "offset". Both of them are calculated by
baseOffset/baseSequence + offset delta. Then I'm a bit confused about how to
deal with them properly - what's the difference between these two concepts
logically?

Also, to understand how the core business logic works, I wrote a simple
server based on my codec library, and observed that the server may need to
update the offset for produced records. How does Kafka set the correct
offset for each produced record? And how does Kafka maintain the
calculation for offset and sequence during these modifications?

I'd appreciate it if anyone can answer the question or give some insights :D


Best,
tison.

[1] https://github.com/tisonkun/kafka-api
[2] https://kafka.apache.org/documentation/#messageformat









Re: [QUESTION] What is the difference between sequence and offset for a Record?

2023-08-07 Thread tison
Hi Matthias and Justine,

Thanks for your reply!

I can summarize the answer as -

Record offset = base offset + offset delta. This field is calculated by the
broker and the delta won't change but the base offset may change during log
normalizing.
Record sequence = base sequence + (offset) delta. This field is given by
the producer and the broker should only read it.

Is it correct?

I implement the manipulation part of base offset following this
understanding at [1].

Best,
tison.

[1]
https://github.com/tisonkun/kafka-api/blob/d080ab7e4b57c0ab0182e0b254333f400e616cd2/simplesrv/src/lib.rs#L391-L394


Justine Olshan  于2023年8月2日周三 04:19写道:

> For what it's worth -- the sequence number is not calculated
> "baseOffset/baseSequence + offset delta" but rather by monotonically
> increasing for a given epoch. If the epoch is bumped, we reset back to
> zero.
> This may mean that the offset and sequence may match, but do not strictly
> need to be the same. The sequence number will also always come from the
> client and be in the produce records sent to the Kafka broker.
>
> As for offsets, there is some code in the log layer that maintains the log
> end offset and assigns offsets to the records. The produce handling on the
> leader should typically assign the offset.
> I believe you can find that code here:
>
> https://github.com/apache/kafka/blob/b9a45546a7918799b6fb3c0fe63b56f47d8fcba9/core/src/main/scala/kafka/log/UnifiedLog.scala#L766
>
> Justine
>
> On Tue, Aug 1, 2023 at 11:38 AM Matthias J. Sax  wrote:
>
> > The _offset_ is the position of the record in the partition.
> >
> > The _sequence number_ is a unique ID that allows broker to de-duplicate
> > messages. It requires the producer to implement the idempotency protocol
> > (part of Kafka transactions); thus, sequence numbers are optional and as
> > long as you don't want to support idempotent writes, you don't need to
> > worry about them. (If you want to dig into details, checkout KIP-98 that
> > is the original KIP about Kafka TX).
> >
> > HTH,
> >-Matthias
> >
> > On 8/1/23 2:19 AM, tison wrote:
> > > Hi,
> > >
> > > I'm writing a Kafka API Rust codec library[1] to understand how Kafka
> > > models its concepts and how the core business logic works.
> > >
> > > During implementing the codec for Records[2], I saw a pair of fields
> > > "sequence" and "offset". Both of them are calculated by
> > > baseOffset/baseSequence + offset delta. Then I'm a bit confused how to
> > deal
> > > with them properly - what's the difference between these two concepts
> > > logically?
> > >
> > > Also, to understand how the core business logic works, I write a simple
> > > server based on my codec library, and observe that the server may need
> to
> > > update offset for records produced. How does Kafka set the correct
> offset
> > > for each produced records? And how does Kafka maintain the calculation
> > for
> > > offset and sequence during these modifications?
> > >
> > > I'll appreciate if anyone can answer the question or give some insights
> > :D
> > >
> > > Best,
> > > tison.
> > >
> > > [1] https://github.com/tisonkun/kafka-api
> > > [2] https://kafka.apache.org/documentation/#messageformat
> > >
> >
>


Re: [QUESTION] What is the difference between sequence and offset for a Record?

2023-08-01 Thread Justine Olshan
For what it's worth -- the sequence number is not calculated
"baseOffset/baseSequence + offset delta" but rather by monotonically
increasing for a given epoch. If the epoch is bumped, we reset back to zero.
This may mean that the offset and sequence may match, but do not strictly
need to be the same. The sequence number will also always come from the
client and be in the produce records sent to the Kafka broker.

As for offsets, there is some code in the log layer that maintains the log
end offset and assigns offsets to the records. The produce handling on the
leader should typically assign the offset.
I believe you can find that code here:
https://github.com/apache/kafka/blob/b9a45546a7918799b6fb3c0fe63b56f47d8fcba9/core/src/main/scala/kafka/log/UnifiedLog.scala#L766

Justine

On Tue, Aug 1, 2023 at 11:38 AM Matthias J. Sax  wrote:

> The _offset_ is the position of the record in the partition.
>
> The _sequence number_ is a unique ID that allows broker to de-duplicate
> messages. It requires the producer to implement the idempotency protocol
> (part of Kafka transactions); thus, sequence numbers are optional and as
> long as you don't want to support idempotent writes, you don't need to
> worry about them. (If you want to dig into details, checkout KIP-98 that
> is the original KIP about Kafka TX).
>
> HTH,
>-Matthias
>
> On 8/1/23 2:19 AM, tison wrote:
> > Hi,
> >
> > I'm writing a Kafka API Rust codec library[1] to understand how Kafka
> > models its concepts and how the core business logic works.
> >
> > During implementing the codec for Records[2], I saw a pair of fields
> > "sequence" and "offset". Both of them are calculated by
> > baseOffset/baseSequence + offset delta. Then I'm a bit confused how to
> deal
> > with them properly - what's the difference between these two concepts
> > logically?
> >
> > Also, to understand how the core business logic works, I write a simple
> > server based on my codec library, and observe that the server may need to
> > update offset for records produced. How does Kafka set the correct offset
> > for each produced records? And how does Kafka maintain the calculation
> for
> > offset and sequence during these modifications?
> >
> > I'll appreciate if anyone can answer the question or give some insights
> :D
> >
> > Best,
> > tison.
> >
> > [1] https://github.com/tisonkun/kafka-api
> > [2] https://kafka.apache.org/documentation/#messageformat
> >
>


Re: [QUESTION] What is the difference between sequence and offset for a Record?

2023-08-01 Thread Matthias J. Sax

The _offset_ is the position of the record in the partition.

The _sequence number_ is a unique ID that allows the broker to de-duplicate 
messages. It requires the producer to implement the idempotency protocol 
(part of Kafka transactions); thus, sequence numbers are optional, and as 
long as you don't want to support idempotent writes, you don't need to 
worry about them. (If you want to dig into the details, check out KIP-98, 
which is the original KIP about Kafka TX.)


HTH,
  -Matthias

On 8/1/23 2:19 AM, tison wrote:

Hi,

I'm writing a Kafka API Rust codec library[1] to understand how Kafka
models its concepts and how the core business logic works.

During implementing the codec for Records[2], I saw a pair of fields
"sequence" and "offset". Both of them are calculated by
baseOffset/baseSequence + offset delta. Then I'm a bit confused about how to
deal with them properly - what's the difference between these two concepts
logically?

Also, to understand how the core business logic works, I wrote a simple
server based on my codec library, and observed that the server may need to
update the offset for produced records. How does Kafka set the correct offset
for each produced record? And how does Kafka maintain the calculation for
offset and sequence during these modifications?

I'd appreciate it if anyone can answer the question or give some insights :D

Best,
tison.

[1] https://github.com/tisonkun/kafka-api
[2] https://kafka.apache.org/documentation/#messageformat
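
The relationship discussed in this thread can be sketched in a few lines. The
class and method names below are illustrative only (they are not Kafka's real
classes); the point, per the replies above, is that the base offset is assigned
by the broker's log layer at append time, while the base sequence comes from
the producer and is only validated by the broker.

// Illustrative sketch only; not Kafka's actual record-batch classes.
class RecordBatchHeader {
    long baseOffset;   // assigned by the broker's log layer when the batch is appended
    int baseSequence;  // assigned by the producer; the broker only validates it
}

class RecordMath {
    // The record's position in the partition.
    static long recordOffset(RecordBatchHeader batch, int offsetDelta) {
        return batch.baseOffset + offsetDelta;
    }

    // The per-record sequence used for idempotent de-duplication; it is derived
    // from the producer-supplied base sequence, not recomputed by the broker.
    static int recordSequence(RecordBatchHeader batch, int offsetDelta) {
        return batch.baseSequence + offsetDelta;
    }

    public static void main(String[] args) {
        RecordBatchHeader batch = new RecordBatchHeader();
        batch.baseOffset = 100L; // hypothetical values for illustration
        batch.baseSequence = 7;
        System.out.println(recordOffset(batch, 2));   // 102
        System.out.println(recordSequence(batch, 2)); // 9
    }
}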



Re: Question ❓

2023-05-10 Thread Matthias J. Sax

Partitions are not for different users.

If you want to isolate users, you would do it at the topic level. You 
could use ACLs to grant access to different topics: 
https://kafka.apache.org/documentation/#security_authz



-Matthias

On 5/9/23 11:11 AM, влад тасканов wrote:


Hi. I recently started studying Kafka and have a question. Is it possible 
to have a separate queue for each user? As I understand it, there would be a broker 
with different topics, and each topic would have a number of partitions equal to the 
number of users. If yes, can you link to an example or explanation? Google didn't 
help me.
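
A minimal sketch of the topic-level isolation suggested above, using the Admin
client's ACL API. The bootstrap address, topic name, and principal are
assumptions for illustration, and the cluster must have an authorizer
configured for ACLs to take effect.

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

import java.util.List;
import java.util.Properties;

public class PerUserTopicAclSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        try (Admin admin = Admin.create(props)) {
            // Allow principal User:alice to read only her own topic.
            AclBinding binding = new AclBinding(
                new ResourcePattern(ResourceType.TOPIC, "alice-events", PatternType.LITERAL),
                new AccessControlEntry("User:alice", "*",
                    AclOperation.READ, AclPermissionType.ALLOW));
            admin.createAcls(List.of(binding)).all().get();
        }
    }
}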


Re: Question ❓

2023-05-10 Thread hudeqi
The current request queue is a single, undifferentiated queue. In fact, there will be many 
performance problems when the business scenarios of a single cluster become 
complicated. We could isolate not only by user but also by request category; 
this is just my idea.

best,
hudeqi


 -Original Message-
 From: "влад тасканов" 
 Sent: 2023-05-10 02:11:21 (Wednesday)
 To: dev@kafka.apache.org
 Cc: 
 Subject: Question ❓
 


Re: Question: CI Pipeline in Kafka GitHub Pull Requests & Github Actions Usage

2023-05-05 Thread aaron ai
Dear David:

I greatly appreciate your welcome and the explanation that you offered
regarding the Jenkins system!

I will try to familiarize myself with the standards and contribution
process of the Kafka community, and look forward to making some
contributions in the coming days.

Best regards,
Aaron

On Thu, May 4, 2023 at 11:47 PM David Arthur
 wrote:

> Hello, Aaron and welcome to the project!
>
> If you look towards the bottom of a Pull Request there will be a section
> reporting the status of the Checks for the latest commit.
> There is a "Details" link that takes you to the Jenkins job for that PR.
>
> The default Jenkins view is the new UI (called Blue Ocean I think?). The
> "Classic" Jenkins view is also available by clicking the button near the
> top:
>
> [image: image.png]
>
> From the Classic view it's pretty straightforward to download the console
> log for the job to see what went wrong.
>
> Our build steps are defined in the Jenkinsfile
>  checked into the
> repository. We do have a compile + static analysis step that happens before
> running tests.
>
> -David
>
> On Thu, May 4, 2023 at 5:34 AM aaron ai  wrote:
>
>> Hey folks,
>>
>> I hope this email finds you well. My name is Aaron Ai, and I'm a beginner
>> interested in the Kafka community. I am eager to contribute to Kafka's
>> development and be part of this amazing project.
>>
>> However, I have noticed some confusion surrounding the CI pipeline in
>> Kafka
>> GitHub pull requests. It appears that the builds often fail, and the
>> results are not very clear to analyse. How can I determine if a pull
>> request build is okay for now, given the current CI pipeline situation?
>> Furthermore, is it a good idea to use Github Action to help with some
>> static checks? If so, I would be more than happy to contribute. By the
>> way,
>> is there any plan to migrate to Github Actions in the near future?
>>
>> I eagerly await your response and appreciate any feedback you might have
>> on
>> this matter. Thank you for your time and consideration.
>>
>> Best regards,
>> Aaron
>>
>
>
> --
> -David
>


Re: Question: CI Pipeline in Kafka GitHub Pull Requests & Github Actions Usage

2023-05-04 Thread David Arthur
Hello, Aaron and welcome to the project!

If you look towards the bottom of a Pull Request there will be a section
reporting the status of the Checks for the latest commit.
There is a "Details" link that takes you to the Jenkins job for that PR.

The default Jenkins view is the new UI (called Blue Ocean I think?). The
"Classic" Jenkins view is also available by clicking the button near the
top:

[image: image.png]

From the Classic view it's pretty straightforward to download the console
log for the job to see what went wrong.

Our build steps are defined in the Jenkinsfile
 checked into the
repository. We do have a compile + static analysis step that happens before
running tests.

-David

On Thu, May 4, 2023 at 5:34 AM aaron ai  wrote:

> Hey folks,
>
> I hope this email finds you well. My name is Aaron Ai, and I'm a beginner
> interested in the Kafka community. I am eager to contribute to Kafka's
> development and be part of this amazing project.
>
> However, I have noticed some confusion surrounding the CI pipeline in Kafka
> GitHub pull requests. It appears that the builds often fail, and the
> results are not very clear to analyse. How can I determine if a pull
> request build is okay for now, given the current CI pipeline situation?
> Furthermore, is it a good idea to use Github Action to help with some
> static checks? If so, I would be more than happy to contribute. By the way,
> is there any plan to migrate to Github Actions in the near future?
>
> I eagerly await your response and appreciate any feedback you might have on
> this matter. Thank you for your time and consideration.
>
> Best regards,
> Aaron
>


-- 
-David


Re: question

2022-10-10 Thread deng ziming
Hello jincheng,
Kafka provides public Java Producer/Consumer/Admin APIs, so you can access
Kafka if you are using Java to develop an Android app, even though it's not
common. For example, you can develop an Android app that fetches Kafka
metadata or sends bury-point event logs to Kafka; however, a better solution is
to do these jobs on a server and access that server from your app.

--
Best,
Ziming

On Tue, Oct 11, 2022 at 12:41 AM 关锦成 <865617...@qq.com.invalid> wrote:

> Does kafka support Android client?
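
Since the Java client is the suggested route, here is a minimal producer sketch
of the kind of event-log use mentioned above. The broker address, topic, and
payload are purely illustrative assumptions; running this from Android itself
would additionally require a network path from the device to the brokers.

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class EventLogProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092"); // assumed reachable broker
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // try-with-resources closes the producer and flushes any buffered records.
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Hypothetical topic and payload for a bury-point style event.
            producer.send(new ProducerRecord<>("app-events", "device-123", "button_clicked"));
        }
    }
}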


Re: Question about the Log Compaction

2022-03-02 Thread Jun Rao
Hi, Liang,

Currently, we store the MD5 of the record key in OffsetMap. Since it has a
large domain (16 bytes), we assume there is no collision there.

Thanks,

Jun

On Wed, Mar 2, 2022 at 1:20 AM 阮良  wrote:

> Hi all
>
> I am confused about the Log Compaction logic, which uses an OffsetMap
> to deduplicate the log. In my opinion, when there is a hash conflict,
> data may be lost.
> E.g.: Record1(key1, offset1), Record2(key2, offset2)
> Condition: hash(key1) == hash(key2) && (offset1 < offset2)
> *Result: Record1 will be removed by mistake*
>
>
>- Did I misunderstand the implementation logic? Please give me some
>guidance, thank you very much
>
>
> *1: The OffsetMap put logic does not deal with hash collisions; if
> hash(key1) == hash(key2), key1 will be overwritten*
>
>
>
>
> 2: the logic of retaining records
>
>
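
A toy sketch of the idea described above: the cleaner's OffsetMap keys entries
by the 16-byte MD5 of the record key, and a record survives compaction only if
no later offset was recorded for the same digest. This is illustrative only
(the broker's real OffsetMap is an open-addressing structure, not a Java
HashMap) and it assumes, as stated above, that MD5 collisions do not occur.

import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Toy model of the log cleaner's de-duplication map (not the real implementation).
class Md5OffsetMapSketch {
    private final Map<ByteBuffer, Long> latestOffsetByDigest = new HashMap<>();
    private final MessageDigest md5;

    Md5OffsetMapSketch() throws NoSuchAlgorithmException {
        this.md5 = MessageDigest.getInstance("MD5");
    }

    // Remember the latest offset seen for this key's MD5 digest.
    void put(byte[] key, long offset) {
        latestOffsetByDigest.merge(ByteBuffer.wrap(md5.digest(key)), offset, Math::max);
    }

    // A record is retained only if its offset is at least the latest offset
    // recorded for its key's digest; a digest collision would wrongly drop a
    // record, which is why the 16-byte domain is assumed to be collision-free.
    boolean shouldRetain(byte[] key, long offset) {
        Long latest = latestOffsetByDigest.get(ByteBuffer.wrap(md5.digest(key)));
        return latest == null || offset >= latest;
    }
}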


Re: Question on kafka connect's incremental rebalance algorithm

2021-04-05 Thread Luke Chen
Hi Ahmed,
I think this bug KAFKA-12495
 is the issue you
described, which is under code review now.
If not, please open another JIRA ticket to track it.

Thanks.
Luke

On Tue, Apr 6, 2021 at 4:18 AM Thouqueer Ahmed <
thouqueer.ah...@maplelabs.com> wrote:

> Hi,
>   What would happen when new worker joins after the
>   What would happen when a new worker joins after the
> synchronization barrier?
> As per code -> performTaskAssignment function of IncrementalAssignor ->
> Boolean canRevoke is false when it is called during the 2nd rebalance.
> Hence revocation is skipped and only assignment is done.
> This would lead to imbalance in #tasks/#connectors.
>
> How is this case handled ?
>
> Thanks,
> Ahmed
>
>


Re: Question Regarding Offset Behavior When Calling Poll()

2020-10-08 Thread Zhen Zhang
Thank you.

Regards,
Zhen Zhang
Software Engineer
[image: Twilio] 
MOBILE (949) 771-6073
EMAIL zzh...@twilio.com


On Wed, Oct 7, 2020 at 11:35 PM Matthias J. Sax  wrote:

> I guess it's a question of the client implementation and you cannot
> infer the behavior from the Java client.
>
> I would _assume_ that the offset is _not_ advanced. But the behavior is
> not defined by Kafka itself; it's up to the client implementation
> itself. Thus, only the client's docs or community can clarify.
>
> Sorry that I cannot provide a better answer.
>
> -Matthias
>
>
> On 9/25/20 1:03 AM, Zhen Zhang wrote:
> > Hi,
> >
> > Sorry for the late reply, let me clarify on this.
> >
> > I am developing using Golang so I used a library based on librdkafka, and
> > there's one function, ReadMessage(), which is a wrapper on top of the
> poll()
> > function, except that it will only poll one message(record) at a time and
> > return either one of the following,
> >
> > 1. (msg, nil) -> normal situation with no error;
> > 2. (nil, err) -> err is a Kafka timeout error;
> > 3. (nil, err) -> err is general error;
> > 4. (msg, err) -> err is partition-specific error.
> >
> > When I was browsing the javadocs
> >
> https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> ,
> > I noticed the concept of `Offsets and Consumer Position`, so I am
> wondering
> > that in the event of the situation 2, 3 and 4, will the position still
> > increment by 1 even if what I get from ReadMessage() is an error?
> >
> > I tried to read the docs but didn't find anything useful I can relate
> to, I
> > tried to ask for help from the contributors of the library I was using
> but
> > didn't get an answer, not sure if this question is too obvious or too
> > noobie lolol. Since that library I used is an implementation of the Kafka
> > protocol, I decided to ask it here and I sincerely hope I can get some
> > insights from a Kafka guru.
> >
> > Thanks,
> > Zhen Zhang
> > Software Engineer
> > [image: Twilio] 
> > MOBILE (949) 771-6073
> > EMAIL zzh...@twilio.com
> >
> >
> > On Wed, Sep 23, 2020 at 9:45 AM Matthias J. Sax 
> wrote:
> >
> >> I guess it depends where the exception comes from? Can you clarify?
> >>
> >> -Matthias
> >>
> >> On 9/23/20 12:53 AM, Zhen Zhang wrote:
> >>> Hi there,
> >>>
> >>> I am new to Kafka and I would like to get some clarifications for a
> >> newbie
> >>> question,
> >>>
> >>> Let's say if I have set up my consumer's "enable.auto.commit" to false,
> >> and
> >>> then poll the records one at a time. So when calling poll(), starting
> >> from
> >>> offset 0, if any exception is thrown, should I expect to get the record
> >> at
> >>> offset 0 or offset 1 when I call poll() again? The reason I'm asking
> for
> >>> this is bc in the Kafka Doc, it says that,
> >>> "The position of the consumer gives the offset of the next record that
> >> will
> >>> be given out. It will be one larger than the highest offset the
> consumer
> >>> has seen in that partition. It automatically advances every time the
> >>> consumer receives messages in a call to poll(Duration)."
> >>>
> >>> But in my described situation above, an exception is thrown, I'm not
> sure
> >>> if this is counted as a successful poll (meaning that the next poll()
> >> will
> >>> give the next record) or a failed one (meaning that the next poll()
> will
> >>> give the same record again).
> >>>
> >>> I would really appreciate it for your help.
> >>>
> >>> Thanks,
> >>> Zhen Zhang
> >>> Software Engineer
> >>> [image: Twilio] 
> >>> MOBILE (949) 771-6073
> >>> EMAIL zzh...@twilio.com
> >>>
> >>
> >
>
>


Re: Question Regarding Offset Behavior When Calling Poll()

2020-10-08 Thread Matthias J. Sax
I guess it's a question of the client implementation and you cannot
infer the behavior from the Java client.

I would _assume_ that the offset is _not_ advanced. But the behavior is
not defined by Kafka itself; it's up to the client implementation
itself. Thus, only the client's docs or community can clarify.

Sorry that I cannot provide a better answer.

-Matthias


On 9/25/20 1:03 AM, Zhen Zhang wrote:
> Hi,
> 
> Sorry for the late reply, let me clarify on this.
> 
> I am developing using Golang so I used a library based on librdkafka, and
> there's one function, ReadMessage(), which is a wrapper on top of the poll()
> function, except that it will only poll one message(record) at a time and
> return either one of the following,
> 
> 1. (msg, nil) -> normal situation with no error;
> 2. (nil, err) -> err is a Kafka timeout error;
> 3. (nil, err) -> err is general error;
> 4. (msg, err) -> err is partition-specific error.
> 
> When I was browsing the javadocs
> https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html,
> I noticed the concept of `Offsets and Consumer Position`, so I am wondering
> that in the event of the situation 2, 3 and 4, will the position still
> increment by 1 even if what I get from ReadMessage() is an error?
> 
> I tried to read the docs but didn't find anything useful I can relate to, I
> tried to ask for help from the contributors of the library I was using but
> didn't get an answer, not sure if this question is too obvious or too
> noobie lolol. Since that library I used is an implementation of the Kafka
> protocol, I decided to ask it here and I sincerely hope I can get some
> insights from a Kafka guru.
> 
> Thanks,
> Zhen Zhang
> Software Engineer
> [image: Twilio] 
> MOBILE (949) 771-6073
> EMAIL zzh...@twilio.com
> 
> 
> On Wed, Sep 23, 2020 at 9:45 AM Matthias J. Sax  wrote:
> 
>> I guess it depends where the exception comes from? Can you clarify?
>>
>> -Matthias
>>
>> On 9/23/20 12:53 AM, Zhen Zhang wrote:
>>> Hi there,
>>>
>>> I am new to Kafka and I would like to get some clarifications for a
>> newbie
>>> question,
>>>
>>> Let's say if I have set up my consumer's "enable.auto.commit" to false,
>> and
>>> then poll the records one at a time. So when calling poll(), starting
>> from
>>> offset 0, if any exception is thrown, should I expect to get the record
>> at
>>> offset 0 or offset 1 when I call poll() again? The reason I'm asking for
>>> this is bc in the Kafka Doc, it says that,
>>> "The position of the consumer gives the offset of the next record that
>> will
>>> be given out. It will be one larger than the highest offset the consumer
>>> has seen in that partition. It automatically advances every time the
>>> consumer receives messages in a call to poll(Duration)."
>>>
>>> But in my described situation above, an exception is thrown, I'm not sure
>>> if this is counted as a successful poll (meaning that the next poll()
>> will
>>> give the next record) or a failed one (meaning that the next poll() will
>>> give the same record again).
>>>
>>> I would really appreciate it for your help.
>>>
>>> Thanks,
>>> Zhen Zhang
>>> Software Engineer
>>> [image: Twilio] 
>>> MOBILE (949) 771-6073
>>> EMAIL zzh...@twilio.com
>>>
>>
> 





Re: Question Regarding Offset Behavior When Calling Poll()

2020-09-25 Thread Zhen Zhang
Hi,

Sorry for the late reply, let me clarify on this.

I am developing using Golang so I used a library based on librdkafka, and
there's one function, ReadMessage(), which is a wrapper on top of the poll()
function, except that it will only poll one message(record) at a time and
return either one of the following,

1. (msg, nil) -> normal situation with no error;
2. (nil, err) -> err is a Kafka timeout error;
3. (nil, err) -> err is general error;
4. (msg, err) -> err is partition-specific error.

When I was browsing the javadocs
https://kafka.apache.org/26/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html,
I noticed the concept of `Offsets and Consumer Position`, so I am wondering
that in the event of the situation 2, 3 and 4, will the position still
increment by 1 even if what I get from ReadMessage() is an error?

I tried to read the docs but didn't find anything useful I can relate to, I
tried to ask for help from the contributors of the library I was using but
didn't get an answer, not sure if this question is too obvious or too
noobie lolol. Since that library I used is an implementation of the Kafka
protocol, I decided to ask it here and I sincerely hope I can get some
insights from a Kafka guru.

Thanks,
Zhen Zhang
Software Engineer
[image: Twilio] 
MOBILE (949) 771-6073
EMAIL zzh...@twilio.com


On Wed, Sep 23, 2020 at 9:45 AM Matthias J. Sax  wrote:

> I guess it depends where the exception comes from? Can you clarify?
>
> -Matthias
>
> On 9/23/20 12:53 AM, Zhen Zhang wrote:
> > Hi there,
> >
> > I am new to Kafka and I would like to get some clarifications for a
> newbie
> > question,
> >
> > Let's say if I have set up my consumer's "enable.auto.commit" to false,
> and
> > then poll the records one at a time. So when calling poll(), starting
> from
> > offset 0, if any exception is thrown, should I expect to get the record
> at
> > offset 0 or offset 1 when I call poll() again? The reason I'm asking for
> > this is bc in the Kafka Doc, it says that,
> > "The position of the consumer gives the offset of the next record that
> will
> > be given out. It will be one larger than the highest offset the consumer
> > has seen in that partition. It automatically advances every time the
> > consumer receives messages in a call to poll(Duration)."
> >
> > But in my described situation above, an exception is thrown, I'm not sure
> > if this is counted as a successful poll (meaning that the next poll()
> will
> > give the next record) or a failed one (meaning that the next poll() will
> > give the same record again).
> >
> > I would really appreciate it for your help.
> >
> > Thanks,
> > Zhen Zhang
> > Software Engineer
> > [image: Twilio] 
> > MOBILE (949) 771-6073
> > EMAIL zzh...@twilio.com
> >
>


Re: Question Regarding Offset Behavior When Calling Poll()

2020-09-23 Thread Matthias J. Sax
I guess it depends where the exception comes from? Can you clarify?

-Matthias

On 9/23/20 12:53 AM, Zhen Zhang wrote:
> Hi there,
> 
> I am new to Kafka and I would like to get some clarifications for a newbie
> question,
> 
> Let's say if I have set up my consumer's "enable.auto.commit" to false, and
> then poll the records one at a time. So when calling poll(), starting from
> offset 0, if any exception is thrown, should I expect to get the record at
> offset 0 or offset 1 when I call poll() again? The reason I'm asking for
> this is bc in the Kafka Doc, it says that,
> "The position of the consumer gives the offset of the next record that will
> be given out. It will be one larger than the highest offset the consumer
> has seen in that partition. It automatically advances every time the
> consumer receives messages in a call to poll(Duration)."
> 
> But in my described situation above, an exception is thrown, I'm not sure
> if this is counted as a successful poll (meaning that the next poll() will
> give the next record) or a failed one (meaning that the next poll() will
> give the same record again).
> 
> I would really appreciate it for your help.
> 
> Thanks,
> Zhen Zhang
> Software Engineer
> [image: Twilio] 
> MOBILE (949) 771-6073
> EMAIL zzh...@twilio.com
> 
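
For comparison with the Go client discussed in this thread, here is a minimal
sketch of the Java consumer pattern that the quoted javadoc describes:
enable.auto.commit set to false, poll, process, then commit explicitly. The
broker address, group id, and topic are assumptions for illustration; as noted
above, this says nothing about how a librdkafka-based client advances its
position when an error is returned.

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ManualCommitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "example-group");           // assumed group id
        props.put("enable.auto.commit", "false");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("example-topic")); // assumed existing topic
            while (true) {
                // The position advances only for records actually returned by poll();
                // with auto-commit off, nothing is committed until commitSync() below.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
                consumer.commitSync(); // commit only after successful processing
            }
        }
    }
}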


Re: Question around release plans for Apache Kafka 2.3.2 and 2.4.2

2020-06-24 Thread Gokul Ramanan Subramanian
Thanks Gwen and Ismael.

On Wed, Jun 24, 2020 at 6:52 PM Gwen Shapira  wrote:

> Kafka follows a "fully compatible model" and new features are always
> optional. We have extensive compatibility testing to make sure rolling
> upgrades are safe and painless.
> This investment in compatibility allows us to reduce the effort needed in
> maintaining old versions (Security CVE is the obvious exception, as well as
> data-loss bugs).
>
> Not every OSS project has the same model, but it has worked for Kafka for
> years now (since the awful upgrade from 0.7 to 0.8 in 2014).
>
> Gwen
>
>
> On Wed, Jun 24, 2020 at 10:40 AM Gokul Ramanan Subramanian <
> gokul24...@gmail.com> wrote:
>
> > I agree that is an option, but is there any reason to not have a 2.3.2
> and
> > 2.4.2 released? If so, it would be nice to know about these reasons.
> > Appreciate your time on this.
> >
> > On Wed, Jun 24, 2020 at 6:35 PM Ismael Juma  wrote:
> >
> > > I think there's some confusion here. You can upgrade to AK 2.5.0 and
> > > completely ignore ZK and TLS. It's completely optional.
> > >
> > > Ismael
> > >
> > > On Wed, Jun 24, 2020 at 10:20 AM Gokul Ramanan Subramanian <
> > > gokul24...@gmail.com> wrote:
> > >
> > > > Any updates on Kafka 2.3.2 and 2.4.2? Given the complexity of
> migrating
> > > > from non-encrypted TLS to encrypted TLS connection for ZooKeeper, it
> > > would
> > > > be nice to have a bug-free version of 2.3 and 2.4.
> > > >
> > > > Is there a technical reason why we hesitate to get these versions
> out?
> > Or
> > > > is it that no one has got around to it?
> > > >
> > > > On Wed, Jun 24, 2020 at 3:19 PM Sankalp Bhatia <
> > > sankalpbhati...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thanks Ismael for the response.
> > > > >
> > > > > For our clusters running 2.3.1 and 2.4.1, we saw some issues which
> > had
> > > > > 2.3.2 and 2.4.2 as the fix versions. I looked at 2.5.0 but since it
> > > > > introduces some major changes like support for ZK encryption and a
> > few
> > > > > others, I was wondering if we should choose a smaller upgrade in
> such
> > > > cases
> > > > > as we don't really require the new features in 2.5 and above right
> > now.
> > > > >
> > > > > -
> > > > > Sankalp
> > > > >
> > > > > On Wed, 24 Jun 2020 at 14:23, Ismael Juma 
> wrote:
> > > > >
> > > > > > Hi Sankalp,
> > > > > >
> > > > > > Is there a reason why you cannot upgrade to Apache Kafka 2.5.0
> > > instead?
> > > > > We
> > > > > > are working on the 2.5.1 release, which would be the recommended
> > > > release.
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Wed, Jun 24, 2020 at 6:18 AM Sankalp Bhatia <
> > > > > sankalpbhati...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > I would like to know if there are any plans to release a 2.3.2
> > and
> > > > > 2.4.2
> > > > > > > versions for Apache Kafka in the near future. I see there are
> > some
> > > > > issues
> > > > > > > marked as fixed in these two versions
> > > > > > > (https://tinyurl.com/ycdpz5cb).
> > > > > > > However, I could not find a branch/tag corresponding to these
> > > > versions
> > > > > in
> > > > > > > the github repository (https://github.com/apache/kafka).
> > > > > > >
> > > > > > > Also, It would be great if someone can help me understand or
> > share
> > > > any
> > > > > > > documentation around the release processes (specifically on
> when
> > we
> > > > > > decide
> > > > > > > to release a new bug fix version like the ones mentioned
> above.)
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Sankalp
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
>
> --
> Gwen Shapira
> Engineering Manager | Confluent
> 650.450.2760 | @gwenshap
> Follow us: Twitter | blog
>


Re: Question around release plans for Apache Kafka 2.3.2 and 2.4.2

2020-06-24 Thread Gwen Shapira
Kafka follows a "fully compatible model" and new features are always
optional. We have extensive compatibility testing to make sure rolling
upgrades are safe and painless.
This investment in compatibility allows us to reduce the effort needed in
maintaining old versions (security CVEs are the obvious exception, as well as
data-loss bugs).

Not every OSS project has the same model, but it has worked for Kafka for
years now (since the awful upgrade from 0.7 to 0.8 in 2014).

Gwen


On Wed, Jun 24, 2020 at 10:40 AM Gokul Ramanan Subramanian <
gokul24...@gmail.com> wrote:

> I agree that is an option, but is there any reason to not have a 2.3.2 and
> 2.4.2 released? If so, it would be nice to know about these reasons.
> Appreciate your time on this.
>
> On Wed, Jun 24, 2020 at 6:35 PM Ismael Juma  wrote:
>
> > I think there's some confusion here. You can upgrade to AK 2.5.0 and
> > completely ignore ZK and TLS. It's completely optional.
> >
> > Ismael
> >
> > On Wed, Jun 24, 2020 at 10:20 AM Gokul Ramanan Subramanian <
> > gokul24...@gmail.com> wrote:
> >
> > > Any updates on Kafka 2.3.2 and 2.4.2? Given the complexity of migrating
> > > from non-encrypted TLS to encrypted TLS connection for ZooKeeper, it
> > would
> > > be nice to have a bug-free version of 2.3 and 2.4.
> > >
> > > Is there a technical reason why we hesitate to get these versions out?
> Or
> > > is it that no one has got around to it?
> > >
> > > On Wed, Jun 24, 2020 at 3:19 PM Sankalp Bhatia <
> > sankalpbhati...@gmail.com>
> > > wrote:
> > >
> > > > Thanks Ismael for the response.
> > > >
> > > > For our clusters running 2.3.1 and 2.4.1, we saw some issues which
> had
> > > > 2.3.2 and 2.4.2 as the fix versions. I looked at 2.5.0 but since it
> > > > introduces some major changes like support for ZK encryption and a
> few
> > > > others, I was wondering if we should choose a smaller upgrade in such
> > > cases
> > > > as we don't really require the new features in 2.5 and above right
> now.
> > > >
> > > > -
> > > > Sankalp
> > > >
> > > > On Wed, 24 Jun 2020 at 14:23, Ismael Juma  wrote:
> > > >
> > > > > Hi Sankalp,
> > > > >
> > > > > Is there a reason why you cannot upgrade to Apache Kafka 2.5.0
> > instead?
> > > > We
> > > > > are working on the 2.5.1 release, which would be the recommended
> > > release.
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Wed, Jun 24, 2020 at 6:18 AM Sankalp Bhatia <
> > > > sankalpbhati...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I would like to know if there are any plans to release a 2.3.2
> and
> > > > 2.4.2
> > > > > > versions for Apache Kafka in the near future. I see there are
> some
> > > > issues
> > > > > > marked as fixed in these two versions
> > > > > > (https://tinyurl.com/ycdpz5cb).
> > > > > > However, I could not find a branch/tag corresponding to these
> > > versions
> > > > in
> > > > > > the github repository (https://github.com/apache/kafka).
> > > > > >
> > > > > > Also, It would be great if someone can help me understand or
> share
> > > any
> > > > > > documentation around the release processes (specifically on when
> we
> > > > > decide
> > > > > > to release a new bug fix version like the ones mentioned above.)
> > > > > >
> > > > > > Thanks,
> > > > > > Sankalp
> > > > > >
> > > > >
> > > >
> > >
> >
>


-- 
Gwen Shapira
Engineering Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter | blog


Re: Question around release plans for Apache Kafka 2.3.2 and 2.4.2

2020-06-24 Thread Gokul Ramanan Subramanian
I agree that is an option, but is there any reason not to have 2.3.2 and
2.4.2 released? If so, it would be nice to know the reasons.
Appreciate your time on this.

On Wed, Jun 24, 2020 at 6:35 PM Ismael Juma  wrote:

> I think there's some confusion here. You can upgrade to AK 2.5.0 and
> completely ignore ZK and TLS. It's completely optional.
>
> Ismael
>
> On Wed, Jun 24, 2020 at 10:20 AM Gokul Ramanan Subramanian <
> gokul24...@gmail.com> wrote:
>
> > Any updates on Kafka 2.3.2 and 2.4.2? Given the complexity of migrating
> > from non-encrypted TLS to encrypted TLS connection for ZooKeeper, it
> would
> > be nice to have a bug-free version of 2.3 and 2.4.
> >
> > Is there a technical reason why we hesitate to get these versions out? Or
> > is it that no one has got around to it?
> >
> > On Wed, Jun 24, 2020 at 3:19 PM Sankalp Bhatia <
> sankalpbhati...@gmail.com>
> > wrote:
> >
> > > Thanks Ismael for the response.
> > >
> > > For our clusters running 2.3.1 and 2.4.1, we saw some issues which had
> > > 2.3.2 and 2.4.2 as the fix versions. I looked at 2.5.0 but since it
> > > introduces some major changes like support for ZK encryption and a few
> > > others, I was wondering if we should choose a smaller upgrade in such
> > cases
> > > as we don't really require the new features in 2.5 and above right now.
> > >
> > > -
> > > Sankalp
> > >
> > > On Wed, 24 Jun 2020 at 14:23, Ismael Juma  wrote:
> > >
> > > > Hi Sankalp,
> > > >
> > > > Is there a reason why you cannot upgrade to Apache Kafka 2.5.0
> instead?
> > > We
> > > > are working on the 2.5.1 release, which would be the recommended
> > release.
> > > >
> > > > Ismael
> > > >
> > > > On Wed, Jun 24, 2020 at 6:18 AM Sankalp Bhatia <
> > > sankalpbhati...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I would like to know if there are any plans to release a 2.3.2 and
> > > 2.4.2
> > > > > versions for Apache Kafka in the near future. I see there are some
> > > issues
> > > > > marked as fixed in these two versions
> > > > > (https://tinyurl.com/ycdpz5cb).
> > > > > However, I could not find a branch/tag corresponding to these
> > versions
> > > in
> > > > > the github repository (https://github.com/apache/kafka).
> > > > >
> > > > > Also, It would be great if someone can help me understand or share
> > any
> > > > > documentation around the release processes (specifically on when we
> > > > decide
> > > > > to release a new bug fix version like the ones mentioned above.)
> > > > >
> > > > > Thanks,
> > > > > Sankalp
> > > > >
> > > >
> > >
> >
>


Re: Question around release plans for Apache Kafka 2.3.2 and 2.4.2

2020-06-24 Thread Ismael Juma
I think there's some confusion here. You can upgrade to AK 2.5.0 and
completely ignore ZK and TLS. It's completely optional.

Ismael

On Wed, Jun 24, 2020 at 10:20 AM Gokul Ramanan Subramanian <
gokul24...@gmail.com> wrote:

> Any updates on Kafka 2.3.2 and 2.4.2? Given the complexity of migrating
> from non-encrypted TLS to encrypted TLS connection for ZooKeeper, it would
> be nice to have a bug-free version of 2.3 and 2.4.
>
> Is there a technical reason why we hesitate to get these versions out? Or
> is it that no one has got around to it?
>
> On Wed, Jun 24, 2020 at 3:19 PM Sankalp Bhatia 
> wrote:
>
> > Thanks Ismael for the response.
> >
> > For our clusters running 2.3.1 and 2.4.1, we saw some issues which had
> > 2.3.2 and 2.4.2 as the fix versions. I looked at 2.5.0 but since it
> > introduces some major changes like support for ZK encryption and a few
> > others, I was wondering if we should choose a smaller upgrade in such
> cases
> > as we don't really require the new features in 2.5 and above right now.
> >
> > -
> > Sankalp
> >
> > On Wed, 24 Jun 2020 at 14:23, Ismael Juma  wrote:
> >
> > > Hi Sankalp,
> > >
> > > Is there a reason why you cannot upgrade to Apache Kafka 2.5.0 instead?
> > We
> > > are working on the 2.5.1 release, which would be the recommended
> release.
> > >
> > > Ismael
> > >
> > > On Wed, Jun 24, 2020 at 6:18 AM Sankalp Bhatia <
> > sankalpbhati...@gmail.com>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I would like to know if there are any plans to release a 2.3.2 and
> > 2.4.2
> > > > versions for Apache Kafka in the near future. I see there are some
> > issues
> > > > marked as fixed in these two versions
> > > > (https://tinyurl.com/ycdpz5cb).
> > > > However, I could not find a branch/tag corresponding to these
> versions
> > in
> > > > the github repository (https://github.com/apache/kafka).
> > > >
> > > > Also, It would be great if someone can help me understand or share
> any
> > > > documentation around the release processes (specifically on when we
> > > decide
> > > > to release a new bug fix version like the ones mentioned above.)
> > > >
> > > > Thanks,
> > > > Sankalp
> > > >
> > >
> >
>


Re: Question around release plans for Apache Kafka 2.3.2 and 2.4.2

2020-06-24 Thread Gokul Ramanan Subramanian
Any updates on Kafka 2.3.2 and 2.4.2? Given the complexity of migrating
from unencrypted to TLS-encrypted ZooKeeper connections, it would
be nice to have a bug-free version of 2.3 and 2.4.

Is there a technical reason why we hesitate to get these versions out? Or
is it that no one has got around to it?

On Wed, Jun 24, 2020 at 3:19 PM Sankalp Bhatia 
wrote:

> Thanks Ismael for the response.
>
> For our clusters running 2.3.1 and 2.4.1, we saw some issues which had
> 2.3.2 and 2.4.2 as the fix versions. I looked at 2.5.0 but since it
> introduces some major changes like support for ZK encryption and a few
> others, I was wondering if we should choose a smaller upgrade in such cases
> as we don't really require the new features in 2.5 and above right now.
>
> -
> Sankalp
>
> On Wed, 24 Jun 2020 at 14:23, Ismael Juma  wrote:
>
> > Hi Sankalp,
> >
> > Is there a reason why you cannot upgrade to Apache Kafka 2.5.0 instead?
> We
> > are working on the 2.5.1 release, which would be the recommended release.
> >
> > Ismael
> >
> > On Wed, Jun 24, 2020 at 6:18 AM Sankalp Bhatia <
> sankalpbhati...@gmail.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > I would like to know if there are any plans to release a 2.3.2 and
> 2.4.2
> > > versions for Apache Kafka in the near future. I see there are some
> issues
> > > marked as fixed in these two versions
> > > (https://tinyurl.com/ycdpz5cb).
> > > However, I could not find a branch/tag corresponding to these versions
> in
> > > the github repository (https://github.com/apache/kafka).
> > >
> > > Also, It would be great if someone can help me understand or share any
> > > documentation around the release processes (specifically on when we
> > decide
> > > to release a new bug fix version like the ones mentioned above.)
> > >
> > > Thanks,
> > > Sankalp
> > >
> >
>


Re: Question around release plans for Apache Kafka 2.3.2 and 2.4.2

2020-06-24 Thread Sankalp Bhatia
Thanks Ismael for the response.

For our clusters running 2.3.1 and 2.4.1, we saw some issues which had
2.3.2 and 2.4.2 as the fix versions. I looked at 2.5.0 but since it
introduces some major changes like support for ZK encryption and a few
others, I was wondering if we should choose a smaller upgrade in such cases
as we don't really require the new features in 2.5 and above right now.

-
Sankalp

On Wed, 24 Jun 2020 at 14:23, Ismael Juma  wrote:

> Hi Sankalp,
>
> Is there a reason why you cannot upgrade to Apache Kafka 2.5.0 instead? We
> are working on the 2.5.1 release, which would be the recommended release.
>
> Ismael
>
> On Wed, Jun 24, 2020 at 6:18 AM Sankalp Bhatia 
> wrote:
>
> > Hi All,
> >
> > I would like to know if there are any plans to release a 2.3.2 and 2.4.2
> > versions for Apache Kafka in the near future. I see there are some issues
> > marked as fixed in these two versions
> > (https://tinyurl.com/ycdpz5cb).
> > However, I could not find a branch/tag corresponding to these versions in
> > the github repository (https://github.com/apache/kafka).
> >
> > Also, It would be great if someone can help me understand or share any
> > documentation around the release processes (specifically on when we
> decide
> > to release a new bug fix version like the ones mentioned above.)
> >
> > Thanks,
> > Sankalp
> >
>


Re: Question around release plans for Apache Kafka 2.3.2 and 2.4.2

2020-06-24 Thread Ismael Juma
Hi Sankalp,

Is there a reason why you cannot upgrade to Apache Kafka 2.5.0 instead? We
are working on the 2.5.1 release, which would be the recommended release.

Ismael

On Wed, Jun 24, 2020 at 6:18 AM Sankalp Bhatia 
wrote:

> Hi All,
>
> I would like to know if there are any plans to release a 2.3.2 and 2.4.2
> versions for Apache Kafka in the near future. I see there are some issues
> marked as fixed in these two versions
> (https://tinyurl.com/ycdpz5cb).
> However, I could not find a branch/tag corresponding to these versions in
> the github repository (https://github.com/apache/kafka).
>
> Also, It would be great if someone can help me understand or share any
> documentation around the release processes (specifically on when we decide
> to release a new bug fix version like the ones mentioned above.)
>
> Thanks,
> Sankalp
>


Re: Question about log flusher real frequency

2020-03-09 Thread Fares Oueslati
Hi Alexandre,

Thank you for your quick answer.

I want to monitor it because I'm trying to find out why our
existing Kafka cluster is configured to flush data every 10 milliseconds!
(The people who configured it are no longer around to ask.)

As that value seems really low to me, I was trying to understand and
monitor the "flush behaviour".
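
One way to confirm what the brokers are actually running with is to read the
flush-related settings through the admin client, which also shows where each
value comes from. A minimal sketch (the bootstrap address and broker id are
placeholders, not values from this cluster):

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class FlushConfigCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-host:9092"); // placeholder
        try (AdminClient admin = AdminClient.create(props)) {
            // Broker id "0" is a placeholder; repeat for each broker in the cluster.
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "0");
            Config config = admin.describeConfigs(Collections.singleton(broker))
                                 .all().get().get(broker);
            // Print the flush-related settings and the source of each value
            // (static broker config, dynamic override, or default).
            config.entries().stream()
                  .filter(e -> e.name().startsWith("log.flush."))
                  .forEach(e -> System.out.println(e.name() + " = " + e.value()
                          + " (" + e.source() + ")"));
        }
    }
}
```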

Fares

On Mon, Mar 9, 2020 at 5:24 PM Alexandre Dupriez <
alexandre.dupr...@gmail.com> wrote:

> Hi Fares,
>
> On Linux kernels, you can use the property "dirty_writeback_centisecs"
> [1] to configure the period between executions of kswapd, which does
> this "sync" job. The period is usually set to 30 seconds.
> There are few exceptions where Kafka explicitly forces a sync (via the
> force() method from the I/O API of the JDK), e.g. when a segment is
> rolled or Kafka shutting down.
>
> The page writeback activity from your kernel is monitorable at
> different levels of granularity and depending on the instrumentation
> you are willing to use.
>
> Why would you want to monitor this activity in the first place? Do you
> want to know exactly *when* your data is on the disk?
>
> [1] https://www.kernel.org/doc/Documentation/sysctl/vm.txt
>
> Le lun. 9 mars 2020 à 15:58, Fares Oueslati  a
> écrit :
> >
> > Hello,
> >
> > By default, both log.flush.interval.ms and log.flush.interval.messages
> are
> > set to Long.MAX_VALUE.
> >
> > As I understand, it makes Kafka flush log to disk (fsync) only depends on
> > file system.
> >
> > Is there any simple way to monitor that frequency ?
> >
> > Is there a rule of thumb to estimate that value depending on the os ?
> >
> > Thank you guys !
> > Fares
>


Re: Question about log flusher real frequency

2020-03-09 Thread Alexandre Dupriez
Hi Fares,

On Linux kernels, you can use the property "dirty_writeback_centisecs"
[1] to configure the period between executions of kswapd, which does
this "sync" job. The period is usually set to 30 seconds.
There are a few exceptions where Kafka explicitly forces a sync (via the
force() method from the I/O API of the JDK), e.g. when a segment is
rolled or Kafka is shutting down.

The page writeback activity of your kernel is monitorable at
different levels of granularity, depending on the instrumentation
you are willing to use.
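
On the Kafka broker side specifically, the explicit flush activity is also
observable over JMX. A rough sketch, assuming JMX is enabled on the broker and
that the standard LogFlushStats bean is present (the host, port, bean name and
attribute names are assumptions, not something taken from this thread):

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class LogFlushRateProbe {
    public static void main(String[] args) throws Exception {
        // Assumes the broker was started with JMX enabled, e.g. JMX_PORT=9999.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker-host:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Timer over explicit log flushes; the exact bean name is an assumption
            // based on the broker's standard Yammer metrics.
            ObjectName flushStats = new ObjectName(
                    "kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs");
            System.out.println("flush count        = " + mbs.getAttribute(flushStats, "Count"));
            System.out.println("flush rate (1 min) = " + mbs.getAttribute(flushStats, "OneMinuteRate"));
            System.out.println("flush time mean ms = " + mbs.getAttribute(flushStats, "Mean"));
        }
    }
}
```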

Why would you want to monitor this activity in the first place? Do you
want to know exactly *when* your data is on the disk?

[1] https://www.kernel.org/doc/Documentation/sysctl/vm.txt

Le lun. 9 mars 2020 à 15:58, Fares Oueslati  a écrit :
>
> Hello,
>
> By default, both log.flush.interval.ms and log.flush.interval.messages are
> set to Long.MAX_VALUE.
>
> As I understand, it makes Kafka flush log to disk (fsync) only depends on
> file system.
>
> Is there any simple way to monitor that frequency ?
>
> Is there a rule of thumb to estimate that value depending on the os ?
>
> Thank you guys !
> Fares


Re: question about offsetSync

2019-11-01 Thread Xu Jianhai
In my opinion:
condition 4 means downstream data loss because the master went down without syncing in
time, so the downstream producer ends up with a smaller downstreamOffset.
condition 3 means the upstream broker went down without syncing in time.
condition 1 is the initial state.
So condition 2 could perhaps be written as `downstreamTargetOffset -
lastSyncDownstreamOffset >= maxOffsetLag`, meaning the offset has not been synced for a
long time. Should we rewrite it?
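
For discussion, here is a self-contained transcription of the quoted update()
logic with each branch labelled by its literal reading, plus a tiny driver
showing when it reports that a sync should be emitted (maxOffsetLag = 100 is
just an example value):

```java
public class PartitionStateSim {
    static final long MAX_OFFSET_LAG = 100L;

    static long lastSyncUpstreamOffset = -1L;
    static long lastSyncDownstreamOffset = -1L;
    static long previousUpstreamOffset = -1L;
    static long previousDownstreamOffset = -1L;

    // Literal transcription of the quoted update() method.
    static boolean update(long upstreamOffset, long downstreamOffset) {
        boolean shouldSyncOffsets = false;
        long upstreamStep = upstreamOffset - lastSyncUpstreamOffset;
        long downstreamTargetOffset = lastSyncDownstreamOffset + upstreamStep;
        if (lastSyncDownstreamOffset == -1L                                    // 1: never synced yet
                || downstreamOffset - downstreamTargetOffset >= MAX_OFFSET_LAG // 2: drifted past the lag bound
                || upstreamOffset - previousUpstreamOffset != 1L               // 3: upstream offsets not contiguous
                || downstreamOffset < previousDownstreamOffset) {              // 4: downstream offset moved backwards
            lastSyncUpstreamOffset = upstreamOffset;
            lastSyncDownstreamOffset = downstreamOffset;
            shouldSyncOffsets = true;
        }
        previousUpstreamOffset = upstreamOffset;
        previousDownstreamOffset = downstreamOffset;
        return shouldSyncOffsets;
    }

    public static void main(String[] args) {
        System.out.println(update(0, 0));     // true  (condition 1)
        System.out.println(update(1, 1));     // false (in step, within lag)
        System.out.println(update(5, 5));     // true  (condition 3: upstream jumped by 4)
        System.out.println(update(6, 200));   // true  (condition 2: downstream far ahead of target)
        System.out.println(update(7, 150));   // true  (condition 4: downstream moved backwards)
    }
}
```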

On Sat, Nov 2, 2019 at 12:41 AM Xu Jianhai  wrote:

> Hi:
> I am engineer from China, I review kafka mirror latest impl, but I do
> not figure out the reason why PartitionState update like that:
> ```
>
> // true if we should emit an offset sync
> boolean update(long upstreamOffset, long downstreamOffset) {
> boolean shouldSyncOffsets = false;
> long upstreamStep = upstreamOffset - lastSyncUpstreamOffset;
> long downstreamTargetOffset = lastSyncDownstreamOffset + upstreamStep;
> if (lastSyncDownstreamOffset == -1L
> || downstreamOffset - downstreamTargetOffset >= maxOffsetLag
> || upstreamOffset - previousUpstreamOffset != 1L
> || downstreamOffset < previousDownstreamOffset) {
> lastSyncUpstreamOffset = upstreamOffset;
> lastSyncDownstreamOffset = downstreamOffset;
> shouldSyncOffsets = true;
> }
> previousUpstreamOffset = upstreamOffset;
> previousDownstreamOffset = downstreamOffset;
> return shouldSyncOffsets;
> }
> }
>
> ```
> I can not know why the condition is like that.
> 1. lastSyncDownstreamOffset == -1L: never sync, so call sync method
> 2. downstreamOffset - downstreamTargetOffset >= maxOffsetLag: offset is
> not accurate, so sync. but why use maxOffsetLag? why not >0: meaning not
> accurate
> 3. upstreamOffset - previousUpstreamOffset != 1L: meaning why?
> 4. downstreamOffset < previousDownstreamOffset: meaning why?
>
>
>
>
>
>
>


Re: Question on performance data for Kafka vs NATS

2019-03-22 Thread Adam Bellemare
One more thing to note:

You are looking at regular, base NATS. On its own, it is not a direct 1-1
comparison to Kafka because it lacks things like data retention, clustering
and replication. Instead, you would want to compare it to NATS-Streaming
(https://github.com/nats-io/nats-streaming-server). You can find a number
of more recent articles and comparisons by a simple web search.

With that being said, this is likely not the best venue for an in-depth
discussion on tradeoffs between the two (especially since I see you're
spanning two very large mailing lists).

Adam




On Fri, Mar 22, 2019 at 1:34 AM Hans Jespersen  wrote:

> Thats a 4.5 year old benchmark and it was run with a single broker node
> and only 1 producer and 1 consumer all running on a single MacBookPro.
> Definitely not the target production environment for Kafka.
>
> -hans
>
> > On Mar 21, 2019, at 11:43 AM, M. Manna  wrote:
> >
> > HI All,
> >
> > https://nats.io/about/
> >
> > this shows a general comparison of sender/receiver throughputs for NATS
> and
> > other messaging system including our favourite Kafka.
> >
> > It appears that Kafka, despite taking the 2nd place, has a very low
> > throughput. My question is, where does Kafka win over NATS? is it the
> > unique partitioning and delivery semantics? Or, is it something else.
> >
> > From what I can see, NATS has traditional pub/sub and queuing. But it
> > doesn't look like there is any proper retention system built for this.
> >
> > Has anyone come across this already?
> >
> > Thanks,
>


Re: Question on performance data for Kafka vs NATS

2019-03-21 Thread Hans Jespersen
That's a 4.5-year-old benchmark, and it was run with a single broker node and
only 1 producer and 1 consumer, all running on a single MacBook Pro. Definitely
not the target production environment for Kafka.

-hans

> On Mar 21, 2019, at 11:43 AM, M. Manna  wrote:
> 
> HI All,
> 
> https://nats.io/about/
> 
> this shows a general comparison of sender/receiver throughputs for NATS and
> other messaging system including our favourite Kafka.
> 
> It appears that Kafka, despite taking the 2nd place, has a very low
> throughput. My question is, where does Kafka win over NATS? is it the
> unique partitioning and delivery semantics? Or, is it something else.
> 
> From what I can see, NATS has traditional pub/sub and queuing. But it
> doesn't look like there is any proper retention system built for this.
> 
> Has anyone come across this already?
> 
> Thanks,


Re: Question about connector rebalancing

2018-09-05 Thread Gwen Shapira
Can you use stand-alone mode in that case?

On Wed, Sep 5, 2018 at 7:12 PM, Chen He  wrote:

> Hi Kafka experts
>
> I have a question about connector rebalancing issue. Why don't we make it
> option, I mean have a parameter that turn on/off it instead of having it as
> a must?
>
> We can have a parameter like: "connector.rebalancing.enable" parameter and
> make it as "true" by default. It allows users to turn it off if they want.
>
> There are some cases that connector rebalancing is not needed.
>
> Regards!
>
> Chen
>



-- 
*Gwen Shapira*
Product Manager | Confluent
650.450.2760 | @gwenshap
Follow us: Twitter  | blog



Re: Question Regarding Apache Kafka replication

2018-08-14 Thread Jason Gustafson
Hi Shibha,

I have a KIP in the works which proposes to do this. Stay tuned.

-Jason

On Tue, Aug 14, 2018 at 4:43 PM, Shibha Malik  wrote:

> Hi Team,
>
> I wanted to know as to why Kafka does not support read by consumers from a
> replica of a partition which is not a leader of the partition ( but one of
> the partition on ISR ) ?
>
> Thanks,
> Shibha Malik
>


Re: Question about issues of Kafka release version 1.1.1

2018-07-23 Thread Ismael Juma
Seems like you're right Lambdaliu. Rajini/Jason, can you please check and
update the JIRAs?

Ismael

On Mon, Jul 23, 2018 at 7:09 AM lambdaliu(刘少波) 
wrote:

> Hi team,
>
> I Have downloaded the source release of kafka version 1.1.1 and found the
> JIRA
> issues KAFKA-6911 and KAFKA-6809 listed in the release notes but it's PR
> looks
> like doesn't contain in the source release. Is this a valid situation?
> Should we
> create a JIRA issue to trace it?
>
> Regards,
> Lambdaliu(Shaobo Liu)
> 
>


Re: Question about developer documentation

2018-02-27 Thread Ewen Cheslack-Postava
The web page is more about general project info and might be of interest to
people beyond just developers. But I agree the wiki landing page could use
some updating, and more than just the developer section: we're missing
several releases, the oldest ones are listed at the top, etc.

-Ewen

On Fri, Feb 9, 2018 at 10:53 AM, Ray Chiang  wrote:

> There is some documentation for developers at:
>
>   http://kafka.apache.org/project
>
> There's also another set of links at the bottom of this wiki page:
>
>   https://cwiki.apache.org/confluence/display/KAFKA/Index
>
> There's some minor duplication of information, but it's definitely not
> quite presented in a clean "step by step" manner.
>
> I think it could benefit from a reorganization of how the information is
> presented.  Before I start making suggestions, does anyone have any
> thoughts on the subject?
>
> -Ray
>
>


Re: Question Kafka Reassign partitions tool

2017-12-26 Thread Sagar
Hi Todd,

That worked. Basically I used a python library to read the leaders of a
topic/partition from the following path on zookeeper:
/brokers/topics/[topic]/partitions/[partitionId]/state.

Then I was able to get the correct leader and assign the replica set so
that the current leader remains the first entry in that list, basically to
keep the preferred replica logic intact.
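
For what it's worth, a rough Java equivalent of that approach (the topic name,
broker ids and partition count are taken from the earlier messages as examples;
the state znode is assumed to hold a JSON payload with a "leader" field):

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ReassignmentJsonBuilder {
    public static void main(String[] args) throws Exception {
        String topic = "cric-engine.engine.eng_fow";            // example topic from this thread
        List<Integer> allBrokers = Arrays.asList(92, 51, 101);  // example broker ids
        int partitions = 5;

        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("10.0.4.165:2181", 30000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        ObjectMapper mapper = new ObjectMapper();
        List<Map<String, Object>> entries = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            String path = "/brokers/topics/" + topic + "/partitions/" + p + "/state";
            // The state znode is assumed to carry JSON with a "leader" field.
            byte[] data = zk.getData(path, false, null);
            int leader = mapper.readTree(new String(data, StandardCharsets.UTF_8))
                               .get("leader").asInt();

            List<Integer> replicas = new ArrayList<>();
            replicas.add(leader);                               // current leader stays first (preferred replica)
            for (int b : allBrokers) {
                if (b != leader) replicas.add(b);               // then the remaining brokers
            }
            Map<String, Object> entry = new LinkedHashMap<>();
            entry.put("topic", topic);
            entry.put("partition", p);
            entry.put("replicas", replicas);
            entries.add(entry);
        }
        zk.close();

        Map<String, Object> reassignment = new LinkedHashMap<>();
        reassignment.put("version", 1);
        reassignment.put("partitions", entries);
        System.out.println(mapper.writeValueAsString(reassignment));
    }
}
```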

Thanks for the help.

Thanks!
Sagar.

On Fri, Dec 22, 2017 at 10:08 PM, Todd Palino  wrote:

> Yes, the replicas are stored in Zookeeper, so you can iterate over the
> information there to build a view of the cluster that you can use. If you
> want an example for this, take a look at the code for kafka-assigner in
> https://github.com/linkedin/kafka-tools. Or you can just use that tool to
> adjust replication factors and balance partitions.
>
> -Todd
>
>
> On Fri, Dec 22, 2017 at 9:21 AM, Sagar  wrote:
>
> > Hi Todd,
> >
> > Thanks for the reply. Problem is I have about 160 topics(5 partitions for
> > each) for which I need to increase the replication factors for. So, I
> would
> > have to find the current leader for each of the partitions and hand code
> > the json which would become tedious.
> >
> > The partition leader info is stored in zookeeper? If yes, then can I
> > probably use that get the current json and then build the json which can
> be
> > fed to the tool. I tried to search using the zk shell but couldn't
> find...
> >
> > Sagar.
> >
> > On Fri, Dec 22, 2017 at 7:45 PM, Todd Palino  wrote:
> >
> > > Preferred replica election is naive. It will always follow the order of
> > the
> > > replicas as they are set. So if you want to set the default leader,
> just
> > > make it the first replica in the list for the partition. We build the
> > JASON
> > > this way all the time.
> > >
> > > -Todd
> > >
> > >
> > > On Dec 22, 2017 6:46 AM, "Sagar"  wrote:
> > >
> > > Hi,
> > >
> > > Had a question on Kafka reassign partitions tool.
> > >
> > > We have a 3 node cluster but our replication factor is set to 1. So we
> > have
> > > been looking to increase it to 3 for HA.
> > >
> > > I tried the tool on a couple of topics and it increases the replication
> > > factor alright. Also it doesn't change the leader as the leader still
> is
> > in
> > > the RAR.
> > >
> > > This is how I run it:
> > >
> > > Json which is used:
> > >
> > > {"version":1,"partitions":[
> > >
> > > {"topic":"cric-engine.engine.eng_fow","partition":3,"
> > > replicas":[92,51,101]}]}
> > >
> > > Earlier config for the topic
> > >
> > > kafka-topics --describe --topic cric-engine.engine.eng_fow --zookeeper
> > > 10.0.4.165:2181,10.0.5.139:2181,10.0.6.106:2181
> > > Topic:cric-engine.engine.eng_fow PartitionCount:5 ReplicationFactor:3
> > > Configs:
> > > Topic: cric-engine.engine.eng_fow Partition: 0 Leader: 101 Replicas:
> > > 92,51,101 Isr: 101,51,92
> > > Topic: cric-engine.engine.eng_fow Partition: 1 Leader: 51 Replicas:
> > > 92,51,101 Isr: 51,101,92
> > > Topic: cric-engine.engine.eng_fow Partition: 2 Leader: 92 Replicas: 92
> > Isr:
> > > 92
> > > Topic: cric-engine.engine.eng_fow Partition: 3 Leader: 101 Replicas:
> 101
> > > Isr:
> > > 101
> > > Topic: cric-engine.engine.eng_fow Partition: 4 Leader: 51 Replicas: 51
> > Isr:
> > > 51
> > >
> > > After running:
> > >
> > >  kafka-reassign-partitions --reassignment-json-file
> > > increase-replication-factor.json --execute --zookeeper 10.0.4.165:2181
> ,
> > > 10.0.5.139:2181,10.0.6.106:2181
> > >
> > > partitions 3 Replicas increase:
> > >
> > > kafka-topics --describe --topic cric-engine.engine.eng_fow --zookeeper
> > > 10.0.4.165:2181,10.0.5.139:2181,10.0.6.106:2181
> > > Topic:cric-engine.engine.eng_fow PartitionCount:5 ReplicationFactor:3
> > > Configs:
> > > Topic: cric-engine.engine.eng_fow Partition: 0 Leader: 101 Replicas:
> > > 92,51,101 Isr: 101,51,92
> > > Topic: cric-engine.engine.eng_fow Partition: 1 Leader: 51 Replicas:
> > > 92,51,101 Isr: 51,101,92
> > > Topic: cric-engine.engine.eng_fow Partition: 2 Leader: 92 Replicas: 92
> > Isr:
> > > 92
> > > Topic: cric-engine.engine.eng_fow Partition: 3 Leader: 101 Replicas:
> > > 92,51,101 Isr: 101,51,92
> > > Topic: cric-engine.engine.eng_fow Partition: 4 Leader: 51 Replicas: 51
> > Isr:
> > > 51
> > >
> > > What I wanted to know is that does it affect the preferred replica? If
> > you
> > > see the Replicas, all of them are now 92,51,101 even though the leader
> > has
> > > remained the same from before. So, if any of the broker goes down or we
> > > run kafka-preferred-replica-election.sh, wouldn't it move all the
> > leaders
> > > to broker 92? Is my assesment correct?
> > >
> > > If yes, then is there a way I can still do this operation by getting
> > leader
> > > for a partition first, then adding it to the replica list and then
> > building
> > > the json dynamically?
> > >
> > > Thanks!
> > > Sagar.
> > >
> >
>
>
>
> --
> *Todd Palino*
> Senior 

Re: Question Kafka Reassign partitions tool

2017-12-22 Thread Todd Palino
Yes, the replicas are stored in Zookeeper, so you can iterate over the
information there to build a view of the cluster that you can use. If you
want an example for this, take a look at the code for kafka-assigner in
https://github.com/linkedin/kafka-tools. Or you can just use that tool to
adjust replication factors and balance partitions.

-Todd


On Fri, Dec 22, 2017 at 9:21 AM, Sagar  wrote:

> Hi Todd,
>
> Thanks for the reply. Problem is I have about 160 topics(5 partitions for
> each) for which I need to increase the replication factors for. So, I would
> have to find the current leader for each of the partitions and hand code
> the json which would become tedious.
>
> The partition leader info is stored in zookeeper? If yes, then can I
> probably use that get the current json and then build the json which can be
> fed to the tool. I tried to search using the zk shell but couldn't find...
>
> Sagar.
>
> On Fri, Dec 22, 2017 at 7:45 PM, Todd Palino  wrote:
>
> > Preferred replica election is naive. It will always follow the order of
> the
> > replicas as they are set. So if you want to set the default leader, just
> > make it the first replica in the list for the partition. We build the
> JASON
> > this way all the time.
> >
> > -Todd
> >
> >
> > On Dec 22, 2017 6:46 AM, "Sagar"  wrote:
> >
> > Hi,
> >
> > Had a question on Kafka reassign partitions tool.
> >
> > We have a 3 node cluster but our replication factor is set to 1. So we
> have
> > been looking to increase it to 3 for HA.
> >
> > I tried the tool on a couple of topics and it increases the replication
> > factor alright. Also it doesn't change the leader as the leader still is
> in
> > the RAR.
> >
> > This is how I run it:
> >
> > Json which is used:
> >
> > {"version":1,"partitions":[
> >
> > {"topic":"cric-engine.engine.eng_fow","partition":3,"
> > replicas":[92,51,101]}]}
> >
> > Earlier config for the topic
> >
> > kafka-topics --describe --topic cric-engine.engine.eng_fow --zookeeper
> > 10.0.4.165:2181,10.0.5.139:2181,10.0.6.106:2181
> > Topic:cric-engine.engine.eng_fow PartitionCount:5 ReplicationFactor:3
> > Configs:
> > Topic: cric-engine.engine.eng_fow Partition: 0 Leader: 101 Replicas:
> > 92,51,101 Isr: 101,51,92
> > Topic: cric-engine.engine.eng_fow Partition: 1 Leader: 51 Replicas:
> > 92,51,101 Isr: 51,101,92
> > Topic: cric-engine.engine.eng_fow Partition: 2 Leader: 92 Replicas: 92
> Isr:
> > 92
> > Topic: cric-engine.engine.eng_fow Partition: 3 Leader: 101 Replicas: 101
> > Isr:
> > 101
> > Topic: cric-engine.engine.eng_fow Partition: 4 Leader: 51 Replicas: 51
> Isr:
> > 51
> >
> > After running:
> >
> >  kafka-reassign-partitions --reassignment-json-file
> > increase-replication-factor.json --execute --zookeeper 10.0.4.165:2181,
> > 10.0.5.139:2181,10.0.6.106:2181
> >
> > partitions 3 Replicas increase:
> >
> > kafka-topics --describe --topic cric-engine.engine.eng_fow --zookeeper
> > 10.0.4.165:2181,10.0.5.139:2181,10.0.6.106:2181
> > Topic:cric-engine.engine.eng_fow PartitionCount:5 ReplicationFactor:3
> > Configs:
> > Topic: cric-engine.engine.eng_fow Partition: 0 Leader: 101 Replicas:
> > 92,51,101 Isr: 101,51,92
> > Topic: cric-engine.engine.eng_fow Partition: 1 Leader: 51 Replicas:
> > 92,51,101 Isr: 51,101,92
> > Topic: cric-engine.engine.eng_fow Partition: 2 Leader: 92 Replicas: 92
> Isr:
> > 92
> > Topic: cric-engine.engine.eng_fow Partition: 3 Leader: 101 Replicas:
> > 92,51,101 Isr: 101,51,92
> > Topic: cric-engine.engine.eng_fow Partition: 4 Leader: 51 Replicas: 51
> Isr:
> > 51
> >
> > What I wanted to know is that does it affect the preferred replica? If
> you
> > see the Replicas, all of them are now 92,51,101 even though the leader
> has
> > remained the same from before. So, if any of the broker goes down or we
> > run kafka-preferred-replica-election.sh, wouldn't it move all the
> leaders
> > to broker 92? Is my assesment correct?
> >
> > If yes, then is there a way I can still do this operation by getting
> leader
> > for a partition first, then adding it to the replica list and then
> building
> > the json dynamically?
> >
> > Thanks!
> > Sagar.
> >
>



-- 
*Todd Palino*
Senior Staff Engineer, Site Reliability
Data Infrastructure Streaming



linkedin.com/in/toddpalino


Re: Question Kafka Reassign partitions tool

2017-12-22 Thread Sagar
Hi Todd,

Thanks for the reply. The problem is I have about 160 topics (5 partitions
each) for which I need to increase the replication factor. So I would
have to find the current leader for each of the partitions and hand-code
the JSON, which would become tedious.

Is the partition leader info stored in ZooKeeper? If yes, can I
use that to get the current state and then build the JSON that can be
fed to the tool? I tried to search using the zk shell but couldn't find it...

Sagar.

On Fri, Dec 22, 2017 at 7:45 PM, Todd Palino  wrote:

> Preferred replica election is naive. It will always follow the order of the
> replicas as they are set. So if you want to set the default leader, just
> make it the first replica in the list for the partition. We build the JASON
> this way all the time.
>
> -Todd
>
>
> On Dec 22, 2017 6:46 AM, "Sagar"  wrote:
>
> Hi,
>
> Had a question on Kafka reassign partitions tool.
>
> We have a 3 node cluster but our replication factor is set to 1. So we have
> been looking to increase it to 3 for HA.
>
> I tried the tool on a couple of topics and it increases the replication
> factor alright. Also it doesn't change the leader as the leader still is in
> the RAR.
>
> This is how I run it:
>
> Json which is used:
>
> {"version":1,"partitions":[
>
> {"topic":"cric-engine.engine.eng_fow","partition":3,"
> replicas":[92,51,101]}]}
>
> Earlier config for the topic
>
> kafka-topics --describe --topic cric-engine.engine.eng_fow --zookeeper
> 10.0.4.165:2181,10.0.5.139:2181,10.0.6.106:2181
> Topic:cric-engine.engine.eng_fow PartitionCount:5 ReplicationFactor:3
> Configs:
> Topic: cric-engine.engine.eng_fow Partition: 0 Leader: 101 Replicas:
> 92,51,101 Isr: 101,51,92
> Topic: cric-engine.engine.eng_fow Partition: 1 Leader: 51 Replicas:
> 92,51,101 Isr: 51,101,92
> Topic: cric-engine.engine.eng_fow Partition: 2 Leader: 92 Replicas: 92 Isr:
> 92
> Topic: cric-engine.engine.eng_fow Partition: 3 Leader: 101 Replicas: 101
> Isr:
> 101
> Topic: cric-engine.engine.eng_fow Partition: 4 Leader: 51 Replicas: 51 Isr:
> 51
>
> After running:
>
>  kafka-reassign-partitions --reassignment-json-file
> increase-replication-factor.json --execute --zookeeper 10.0.4.165:2181,
> 10.0.5.139:2181,10.0.6.106:2181
>
> partitions 3 Replicas increase:
>
> kafka-topics --describe --topic cric-engine.engine.eng_fow --zookeeper
> 10.0.4.165:2181,10.0.5.139:2181,10.0.6.106:2181
> Topic:cric-engine.engine.eng_fow PartitionCount:5 ReplicationFactor:3
> Configs:
> Topic: cric-engine.engine.eng_fow Partition: 0 Leader: 101 Replicas:
> 92,51,101 Isr: 101,51,92
> Topic: cric-engine.engine.eng_fow Partition: 1 Leader: 51 Replicas:
> 92,51,101 Isr: 51,101,92
> Topic: cric-engine.engine.eng_fow Partition: 2 Leader: 92 Replicas: 92 Isr:
> 92
> Topic: cric-engine.engine.eng_fow Partition: 3 Leader: 101 Replicas:
> 92,51,101 Isr: 101,51,92
> Topic: cric-engine.engine.eng_fow Partition: 4 Leader: 51 Replicas: 51 Isr:
> 51
>
> What I wanted to know is that does it affect the preferred replica? If you
> see the Replicas, all of them are now 92,51,101 even though the leader has
> remained the same from before. So, if any of the broker goes down or we
> run kafka-preferred-replica-election.sh, wouldn't it move all the leaders
> to broker 92? Is my assesment correct?
>
> If yes, then is there a way I can still do this operation by getting leader
> for a partition first, then adding it to the replica list and then building
> the json dynamically?
>
> Thanks!
> Sagar.
>


Re: Question Kafka Reassign partitions tool

2017-12-22 Thread Todd Palino
Preferred replica election is naive. It will always follow the order of the
replicas as they are set. So if you want to set the default leader, just
make it the first replica in the list for the partition. We build the JSON
this way all the time.

-Todd


On Dec 22, 2017 6:46 AM, "Sagar"  wrote:

Hi,

Had a question on Kafka reassign partitions tool.

We have a 3 node cluster but our replication factor is set to 1. So we have
been looking to increase it to 3 for HA.

I tried the tool on a couple of topics and it increases the replication
factor alright. Also it doesn't change the leader as the leader still is in
the RAR.

This is how I run it:

Json which is used:

{"version":1,"partitions":[

{"topic":"cric-engine.engine.eng_fow","partition":3,"
replicas":[92,51,101]}]}

Earlier config for the topic

kafka-topics --describe --topic cric-engine.engine.eng_fow --zookeeper
10.0.4.165:2181,10.0.5.139:2181,10.0.6.106:2181
Topic:cric-engine.engine.eng_fow PartitionCount:5 ReplicationFactor:3
Configs:
Topic: cric-engine.engine.eng_fow Partition: 0 Leader: 101 Replicas:
92,51,101 Isr: 101,51,92
Topic: cric-engine.engine.eng_fow Partition: 1 Leader: 51 Replicas:
92,51,101 Isr: 51,101,92
Topic: cric-engine.engine.eng_fow Partition: 2 Leader: 92 Replicas: 92 Isr:
92
Topic: cric-engine.engine.eng_fow Partition: 3 Leader: 101 Replicas: 101
Isr:
101
Topic: cric-engine.engine.eng_fow Partition: 4 Leader: 51 Replicas: 51 Isr:
51

After running:

 kafka-reassign-partitions --reassignment-json-file
increase-replication-factor.json --execute --zookeeper 10.0.4.165:2181,
10.0.5.139:2181,10.0.6.106:2181

partitions 3 Replicas increase:

kafka-topics --describe --topic cric-engine.engine.eng_fow --zookeeper
10.0.4.165:2181,10.0.5.139:2181,10.0.6.106:2181
Topic:cric-engine.engine.eng_fow PartitionCount:5 ReplicationFactor:3
Configs:
Topic: cric-engine.engine.eng_fow Partition: 0 Leader: 101 Replicas:
92,51,101 Isr: 101,51,92
Topic: cric-engine.engine.eng_fow Partition: 1 Leader: 51 Replicas:
92,51,101 Isr: 51,101,92
Topic: cric-engine.engine.eng_fow Partition: 2 Leader: 92 Replicas: 92 Isr:
92
Topic: cric-engine.engine.eng_fow Partition: 3 Leader: 101 Replicas:
92,51,101 Isr: 101,51,92
Topic: cric-engine.engine.eng_fow Partition: 4 Leader: 51 Replicas: 51 Isr:
51

What I wanted to know is: does it affect the preferred replica? If you
see the Replicas, all of them are now 92,51,101 even though the leader has
remained the same as before. So, if any of the brokers goes down or we
run kafka-preferred-replica-election.sh, wouldn't it move all the leaders
to broker 92? Is my assessment correct?

If yes, then is there a way I can still do this operation by getting leader
for a partition first, then adding it to the replica list and then building
the json dynamically?

Thanks!
Sagar.


Re: Question about fromLogical method in connect/api/src/main/java/org/apache/kafka/connect/data/Date.java

2017-10-30 Thread Dongjin Lee
CharSyam,

As I understand it, Date#fromLogical assumes that the given Date
object is created under the UTC timezone - its unit test, DateTest, creates all
related GregorianCalendar objects with TimeZone.getTimeZone("UTC"). So if
all the Date objects given as the parameter are UTC-oriented, this
implementation doesn't cause any problem. So: have you ever experienced the case
of a non-UTC Date object? I inspected all classes implementing the Transformation
interface[^1], but all the classes related to Date or Time[^2] were using
UTC-oriented Date objects. (Check the `TimeZone.getTimeZone("UTC")` clause
in the code.) If there is a logical crack, it will be worth a fix.
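
To make the UTC assumption concrete, here is a minimal sketch of building a
value the way DateTest does (UTC calendar, zeroed time fields) and passing it
through fromLogical/toLogical. It is illustrative only; the specific date is
arbitrary:

```java
import java.util.Calendar;
import java.util.TimeZone;
import org.apache.kafka.connect.data.Date;

public class DateLogicalExample {
    public static void main(String[] args) {
        // Build the java.util.Date the way DateTest does: UTC calendar, time fields zeroed.
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
        cal.clear();                                      // zero hour/minute/second/millisecond
        cal.set(2017, Calendar.OCTOBER, 30);
        java.util.Date value = cal.getTime();

        int days = Date.fromLogical(Date.SCHEMA, value);  // days since the epoch
        java.util.Date roundTrip = Date.toLogical(Date.SCHEMA, days);
        System.out.println(days + " -> " + roundTrip);

        // A Date built from a non-UTC calendar with the same wall-clock fields would in
        // general carry a non-zero UTC time of day, which is the non-UTC case discussed
        // in this thread and would make fromLogical throw a DataException.
    }
}
```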

Thanks,
Dongjin

[^1]:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-66:+Single+Message+Transforms+for+Kafka+Connect
[^2]: TimestampRouter, TimestampConverter.

On Mon, Oct 30, 2017 at 4:45 PM, CharSyam  wrote:

> Hi here.
>
> I have a question about fromLogical method in
> connect/api/src/main/java/org/apache/kafka/connect/data/Date.java
>
> below code does these step.
> 1. calendar with UTC timezone
> 2. set value
> 3. check hour, minute, second, millisecond are 0. if not throw
> DataException.
>
> ```
> public static int fromLogical(Schema schema, java.util.Date value) {
> if (schema.name() == null || !(schema.name().equals(LOGICAL_NAME)))
> throw new DataException("Requested conversion of Date object
> but the schema does not match.");
> Calendar calendar = Calendar.getInstance(UTC);
> calendar.setTime(value);
> if (calendar.get(Calendar.HOUR_OF_DAY) != 0 ||
> calendar.get(Calendar.MINUTE) != 0 ||
> calendar.get(Calendar.SECOND) != 0 ||
> calendar.get(Calendar.MILLISECOND) != 0) {
> throw new DataException("Kafka Connect Date type should not
> have any time fields set to non-zero values.");
> }
> long unixMillis = calendar.getTimeInMillis();
> return (int) (unixMillis / MILLIS_PER_DAY);
> }
> ```
>
> but, if value is not made by UTC, for if we read value from other
> Timezone, the Date can be represented other value, because timezone
> will make difference.
>
> so I think this code cause failure when different timezone.
>
> I think revising with timezone info is a way to solve this.
> ```
> public static int fromLogical(Schema schema, java.util.Date value) {
> if (schema.name() == null || !(schema.name().equals(LOGICAL_NAME)))
> throw new DataException("Requested conversion of Date object
> but the schema does not match.");
> //Using DefaultTimeZone
> Calendar calendar = Calendar.getInstance(TimeZone.getDefault());
> calendar.setTime(value);
> if (calendar.get(Calendar.HOUR_OF_DAY) != 0 ||
> calendar.get(Calendar.MINUTE) != 0 ||
> calendar.get(Calendar.SECOND) != 0 ||
> calendar.get(Calendar.MILLISECOND) != 0) {
> throw new DataException("Kafka Connect Date type should not
> have any time fields set to non-zero values.");
> }
>
> //Calculate timezone difference as reviseMillis
>long reviseMillis = calendar.get(Calendar.ZONE_OFFSET) +
> calendar.get(Calendar.DST_OFFSET) / 60;
>
> //revising with reviseMillis
> long unixMillis = calendar.getTimeInMillis() + reviseMillis;
> return (int) (unixMillis / MILLIS_PER_DAY);
> }
>
> public static java.util.Date toLogical(Schema schema, int value) {
> if (schema.name() == null || !(schema.name().equals(LOGICAL_NAME)))
> throw new DataException("Requested conversion of Date object
> but the schema does not match.");
>
> //Calculate timezone difference as reviseMillis
>Calendar calendar = Calendar.getInstance(TimeZone.getDefault());
> long reviseMillis = calendar.get(Calendar.ZONE_OFFSET) +
> calendar.get(Calendar.DST_OFFSET) / 60;
> //revising with reviseMillis
>return new java.util.Date(value * MILLIS_PER_DAY - reviseMillis);
> }
> ```
>
> But I wonder, is there some reason to force this work?
> to check not to have any time fields set to non-zero values?
>
> What do you think guys?
>



-- 
*Dongjin Lee*

*A hitchhiker in the mathematical world.*

*github:  github.com/dongjinleekr
linkedin: kr.linkedin.com/in/dongjinleekr
slideshare:
www.slideshare.net/dongjinleekr
*


Re: Question regarding Producer and Duplicates

2016-09-05 Thread Florian Hussonnois
Thanks Guozhang,

So increasing batch.size can lead to more duplicates in case of failure.

Also, when you said "The broker will accept a batch of records as a whole
or reject them": for example, if a producer request contains two batches,
a first one for Topic A / Partition 0 and a second one for Topic B /
Partition 1, does that mean batch 1 will fail if the acknowledgments for batch 2
cannot be satisfied? Is a producer request refused as a whole, or is only
one batch within a request refused?

Thanks,


2016-08-31 20:53 GMT+02:00 Guozhang Wang :

> Hi Florian,
>
> The broker will accept a batch of records as a whole or reject them as a
> whole unless it encounters an IOException while trying to append the
> messages, which will be treated as a fatal error anyways.
>
> Duplicates usually happen when the whole batch is accepted but the ack was
> not delivered in time, and hence it was re-tried.
>
>
> Guozhang
>
>
> On Tue, Aug 30, 2016 at 2:45 AM, Florian Hussonnois  >
> wrote:
>
> > Hi all,
> >
> > I am using kafka_2.11-0.10.0.1, my understanding is that the producer API
> > batches records per partition to send efficient requests. We can
> configure
> > batch.size to increase the throughtput.
> >
> > However, in case of failure all records within the batch failed ? If that
> > is true,  does that mean that increasing batch.size can also increase the
> > number of duplicates in case of retries ?
> >
> > Thanks,
> >
> > Florian.
> >
>
>
>
> --
> -- Guozhang
>



-- 
Florian HUSSONNOIS


Re: Question regarding Producer and Duplicates

2016-08-31 Thread Guozhang Wang
Hi Florian,

The broker will accept a batch of records as a whole or reject them as a
whole unless it encounters an IOException while trying to append the
messages, which will be treated as a fatal error anyways.

Duplicates usually happen when the whole batch is accepted but the ack was
not delivered in time, and hence it was re-tried.
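
For reference, the settings being discussed look roughly like this in producer
code (the broker address and topic name are placeholders); with retries
enabled, a batch whose ack is lost is re-sent as a whole, which is where the
duplicates come from:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class BatchingProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker-host:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // Bigger batches mean fewer requests, but a retried request re-sends
        // the whole batch, so more records can be duplicated per retry.
        props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);
        props.put(ProducerConfig.RETRIES_CONFIG, 3);
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("my-topic", "key", "value"));
        }
    }
}
```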


Guozhang


On Tue, Aug 30, 2016 at 2:45 AM, Florian Hussonnois 
wrote:

> Hi all,
>
> I am using kafka_2.11-0.10.0.1, my understanding is that the producer API
> batches records per partition to send efficient requests. We can configure
> batch.size to increase the throughtput.
>
> However, in case of failure all records within the batch failed ? If that
> is true,  does that mean that increasing batch.size can also increase the
> number of duplicates in case of retries ?
>
> Thanks,
>
> Florian.
>



-- 
-- Guozhang


Re: question in regard to KIP-30

2015-08-17 Thread Joe Stein
I don't think it makes sense to change the core default implementation with
KIP-30. There is too much risk, both in stability and in increasing the time to
get it done and available for folks that want to try Kafka without
Zookeeper.

It would be interesting to see how that implementation would work alongside
the others too; if that is something folks want to support in
their environment, I encourage that.

I will be sending out a discuss thread on KIP-30 hopefully in the next day
or so, so we can go back and forth on the motivations and purposes behind it
and whatever technical details are required too, of course.

~ Joe Stein

On Mon, Aug 17, 2015 at 9:29 AM, Sergiy Yevtushenko 
sergiy.yevtushe...@gmail.com wrote:

 Hi,

 Are there any plans to work on this improvement?
 As a possible core for default implementation it might worth to consider
 https://github.com/belaban/jgroups-raft .
 It already contains RAFT consensus algorithm implementation. Adding
 something like distributed hash map for shared metadata should not be an
 issue.

 Regards,
 Sergiy.



Re: question in regard to KIP-30

2015-08-17 Thread Sergiy Yevtushenko
Well, from my point of view it at least makes sense to abstract out the
implementation. Right now any attempt to replace ZK with something else requires
significant effort.
So even providing a pluggable interface for the consensus and metadata
would be extremely helpful and would enable other people to provide a
replacement for the ZK-based implementation. There are many situations where ZK
is too inconvenient. For example, ZK is very inconvenient for embedding.
This in turn makes embedding Kafka very inconvenient too (while Kafka
itself is quite convenient in this regard).

Also, jgroups-raft is only one implementation of Raft. There are several
others which could potentially be used for this purpose.

2015-08-17 16:42 GMT+03:00 Joe Stein joe.st...@stealth.ly:

 I don't think it makes sense to change the core default implementation with
 KIP-30. Too much risk both in stability and in increasing the time to
 getting it done and available for folks that want to try Kafka without
 Zookeeper.

 It would be interesting to see how that implementation would work along
 with the others too if that would be something folks wanted to support in
 their environment I encourage that.

 I will be sending out a discuss thread on KIP-30 hopefully in the next day
 or so we can go back and forth on the motivations and purposes behind it
 and whatever technical details are required too of course.

 ~ Joe Stein

 On Mon, Aug 17, 2015 at 9:29 AM, Sergiy Yevtushenko 
 sergiy.yevtushe...@gmail.com wrote:

  Hi,
 
  Are there any plans to work on this improvement?
  As a possible core for default implementation it might worth to consider
  https://github.com/belaban/jgroups-raft .
  It already contains RAFT consensus algorithm implementation. Adding
  something like distributed hash map for shared metadata should not be an
  issue.
 
  Regards,
  Sergiy.
 



Re: Question about sub-projects and project merging

2015-07-13 Thread Greg Stein
Hi Jay,

Looking at your question, I see the Apache Samza and Apache Kafka
*communities* have little overlap (*). The Board looks at communities, and
their overlap or lack thereof. Smushing two communities under one TLP is
what we have historically called an umbrella TLP, and we discourage that.
Communities should be allowed to operate independently.

If you have *one* community, then one TLP makes sense.

If you have *two* communities, then increase the overlap. When they look
like one community, and that one community votes to merge TLPs ... then ask
for that.

Cheers,
-g

(*) 2 common PMC members, 3 common committers.


On Mon, Jul 13, 2015 at 12:37 AM, Jay Kreps jay.kr...@gmail.com wrote:

 Hey board members,

 There is a longish thread on the Apache Samza mailing list on the
 relationship between Kafka and Samza and whether they wouldn't make a lot
 more sense as a single project. This raised some questions I was hoping to
 get advice on.

 Discussion thread (warning: super long, I attempt to summarize relevant
 bits below):

 http://mail-archives.apache.org/mod_mbox/samza-dev/201507.mbox/%3ccabyby7d_-jcxj7fizsjuebjedgbep33flyx3nrozt0yeox9...@mail.gmail.com%3E

 Anyhow, some people thought Apache has lot's of sub-projects, that would
 be a graceful way to step in the right direction. At that point others
 popped up and said, sub-projects are discouraged by the board.

 I'm not sure if we understand technically what a subproject is, but I
 think it means a second repo/committership under the same PMC.

 A few questions:
 - Is that what a sub-project is?
 - Are they discouraged? If so, why?
 - Assuming it makes sense in this case what is the process for making one?
 - Putting aside sub-projects as a mechanism what are examples where
 communities merged successfully? We were pointed towards Lucene/SOLR. Are
 there others?

 Relevant background info:
 - Samza depends on Kafka, but not vice versa
 - There is some overlap in committers but not extensive (3/11 Samza
 committers are also Kafka committers)

 Thanks for the advice!

 -Jay






Re: Question about concurrency during Log config change

2015-03-16 Thread Andrii Biletskyi
Jay,

Thanks for the quick response. Yes, this might not be that harmful for users,
I'm not sure about that. But it definitely looks like a data race. Your
solution is simple and should work; it's hard to tell promptly when it comes to
concurrency.

Initially I was looking through this code to understand whether we can
inherit this approach for the global broker config. In that case your solution
will be harder to implement, since we access the broker's config in many
different places. But that's another story.

Thanks,
Andrii Biletskyi

On Mon, Mar 16, 2015 at 5:56 PM, Jay Kreps jay.kr...@gmail.com wrote:

 You are correct. Each read will be a valid value but there is no guarantee
 that subsequent reads will read from the same config. I don't think that is
 a problem, do you? If we want to strengthen the guarantee we can grab the
 config once in the method
val config = log.config
 and then do however many accesses against that variable which will remain
 constant even if the config is updated in the course of the method.

 -Jay

 On Mon, Mar 16, 2015 at 8:50 AM, Andrii Biletskyi 
 andrii.bilets...@stealth.ly wrote:

  Hi all,
 
  I was looking through the code related to dynamic Log config change
  feature and
  noticed the way we deal with concurrency there. I have a question about
 it.
 
  The Log class holds volatile LogConfig property, almost all methods in
  Log.scala are
  synchronized on private lock object. But the code in TopicConfigManager
  (
 
 
 https://github.com/apache/kafka/blob/7130da90a9ee9e6fb4beb2a2a6ab05c06c9bfac4/core/src/main/scala/kafka/server/TopicConfigManager.scala#L108
  )
  which substitutes Log's logConfig is not synchronized.
 
  Code execution example:
  Thread 1: Log.append - Log:288 config.*maxMessageSize* is accessed
 
 
 https://github.com/apache/kafka/blob/7130da90a9ee9e6fb4beb2a2a6ab05c06c9bfac4/core/src/main/scala/kafka/log/Log.scala#L288
 
  Thread 2: handles log config change - TopicConfigManager:108 (see above)
  substitutes
  log's config - changes *maxMessageSize* and *segmentSize*
 
  Thread 1: Log.append Log:299 - code accesses config.*segmentSize* and
  pickups updated
  config setting
 
 
 https://github.com/apache/kafka/blob/7130da90a9ee9e6fb4beb2a2a6ab05c06c9bfac4/core/src/main/scala/kafka/log/Log.scala#L299
 
  So looks like we accessed object in partial state - in scope of one
  procedure
  (Log.append) we took one setting from the old state (maxMessageSize), and
  the other
  one from the updated state.
 
  Methods in Log are synchronized, as mentioned above. But logConfig is
 only
  volatile
  which solves visibility problems but doesn't prevent it from being
 changed
  in other
  thread, as I understand.
 
  Am I missing something here?
 
  Thanks,
  Andrii Biletskyi
 



Re: Question about concurrency during Log config change

2015-03-16 Thread Jay Kreps
You are correct. Each read will be a valid value but there is no guarantee
that subsequent reads will read from the same config. I don't think that is
a problem, do you? If we want to strengthen the guarantee we can grab the
config once in the method
   val config = log.config
and then do however many accesses against that variable which will remain
constant even if the config is updated in the course of the method.
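
As an illustration of that read-once pattern (plain Java, not the actual
Log.scala code): every check in the method goes against one local snapshot of
the volatile field, so a concurrent config swap can never produce a mixed view
within a single append.

```java
public class LogSketch {
    static final class LogConfig {
        final int maxMessageSize;
        final int segmentSize;
        LogConfig(int maxMessageSize, int segmentSize) {
            this.maxMessageSize = maxMessageSize;
            this.segmentSize = segmentSize;
        }
    }

    private volatile LogConfig config = new LogConfig(1_000_000, 1_073_741_824);

    void updateConfig(LogConfig newConfig) {   // what TopicConfigManager does, conceptually
        this.config = newConfig;
    }

    void append(byte[] message) {
        LogConfig snapshot = config;           // single volatile read
        if (message.length > snapshot.maxMessageSize) {
            throw new IllegalArgumentException("message too large");
        }
        maybeRoll(snapshot.segmentSize);       // same snapshot, not a fresh read
    }

    private void maybeRoll(int segmentSize) { /* roll decision would go here */ }

    public static void main(String[] args) {
        new LogSketch().append(new byte[128]);
    }
}
```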

-Jay

On Mon, Mar 16, 2015 at 8:50 AM, Andrii Biletskyi 
andrii.bilets...@stealth.ly wrote:

 Hi all,

 I was looking through the code related to dynamic Log config change
 feature and
 noticed the way we deal with concurrency there. I have a question about it.

 The Log class holds volatile LogConfig property, almost all methods in
 Log.scala are
 synchronized on private lock object. But the code in TopicConfigManager
 (

 https://github.com/apache/kafka/blob/7130da90a9ee9e6fb4beb2a2a6ab05c06c9bfac4/core/src/main/scala/kafka/server/TopicConfigManager.scala#L108
 )
 which substitutes Log's logConfig is not synchronized.

 Code execution example:
 Thread 1: Log.append - Log:288 config.*maxMessageSize* is accessed

 https://github.com/apache/kafka/blob/7130da90a9ee9e6fb4beb2a2a6ab05c06c9bfac4/core/src/main/scala/kafka/log/Log.scala#L288

 Thread 2: handles log config change - TopicConfigManager:108 (see above)
 substitutes
 log's config - changes *maxMessageSize* and *segmentSize*

 Thread 1: Log.append Log:299 - code accesses config.*segmentSize* and
 pickups updated
 config setting

 https://github.com/apache/kafka/blob/7130da90a9ee9e6fb4beb2a2a6ab05c06c9bfac4/core/src/main/scala/kafka/log/Log.scala#L299

 So looks like we accessed object in partial state - in scope of one
 procedure
 (Log.append) we took one setting from the old state (maxMessageSize), and
 the other
 one from the updated state.

 Methods in Log are synchronized, as mentioned above. But logConfig is only
 volatile
 which solves visibility problems but doesn't prevent it from being changed
 in other
 thread, as I understand.

 Am I missing something here?

 Thanks,
 Andrii Biletskyi



Re: Question about ZookeeperConsumerConnector

2014-11-11 Thread Jun Rao
Hi, Jiangjie,

Thanks for the investigation. Yes, this seems like a real issue.

1. It doesn't seem that we need to put the shutdownCommand back into the
queue. Once an iterator receives a shutdownCommand, it will be in a Done
state and will remain in that state forever.

2. Yes, we just need to get the unique set of queues and put in a
shutdownCommand
per queue.
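
A self-contained sketch of that dedup idea (illustrative only, not the actual
patch; the map stands in for topicThreadIdAndQueues and the marker object for
shutdownCommand):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class ShutdownFanout {
    private static final Object SHUTDOWN_COMMAND = new Object();

    static void sendShutdownToAllQueues(Map<String, BlockingQueue<Object>> topicThreadIdAndQueues)
            throws InterruptedException {
        // Deduplicate first so each queue gets exactly one shutdown marker,
        // which keeps put() from blocking on a capacity-1 queue.
        Set<BlockingQueue<Object>> distinctQueues =
                Collections.newSetFromMap(new IdentityHashMap<>());
        distinctQueues.addAll(topicThreadIdAndQueues.values());

        for (BlockingQueue<Object> queue : distinctQueues) {
            queue.clear();
            queue.put(SHUTDOWN_COMMAND);   // at most one marker per queue
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Object> shared = new LinkedBlockingQueue<>(1);
        Map<String, BlockingQueue<Object>> queues = new HashMap<>();
        queues.put("topicA-0", shared);
        queues.put("topicB-0", shared);    // two topics sharing the same queue
        sendShutdownToAllQueues(queues);
        System.out.println("markers in queue: " + shared.size());   // prints 1
    }
}
```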

Jun


On Mon, Nov 10, 2014 at 7:27 PM, Becket Qin becket@gmail.com wrote:

 Hi,

 We encountered a production issue recently that Mirror Maker could not
 properly shutdown because ZookeeperConsumerConnector is blocked on
 shutdown(). After looking into the code, we found 2 issues that caused this
 problem.

 1. After consumer iterator receives the shutdownCommand, It puts the
 shutdownCommand back into the data chunk queue. Is there any reason for
 doing this?

 2. In ZookeeperConsumerConnector shutdown(), we could potentially put
  multiple shutdownCommands into the same data chunk queue, provided the
 topics are sharing the same data chunk queue in topicThreadIdAndQueues.
 (KAFKA-1764 is opened)

  In our case, we only have 1 consumer stream for all the topics, and the data
  chunk queue capacity is set to 1. The execution sequence causing the problem
  is as below:
  1. ZookeeperConsumerConnector shutdown() is called; it tries to put a
  shutdownCommand for each queue in topicThreadIdAndQueues. Since we only
  have 1 queue, multiple shutdownCommands will be put into the queue.
  2. In sendShutdownToAllQueues(), between queue.clear() and
  queue.put(shutdownCommand), the consumer iterator receives the shutdownCommand
  and puts it back into the data chunk queue. After that,
 ZookeeperConsumerConnector tries to put another shutdownCommand into the
 data chunk queue but will block forever.


 The thread stack trace is as below:

 Thread-23 #58 prio=5 os_prio=0 tid=0x7ff440004800 nid=0x40a waiting
 on condition [0x7ff4f0124000]
java.lang.Thread.State: WAITING (parking)
 at sun.misc.Unsafe.park(Native Method)
 - parking to wait for  0x000680b96bf0 (a
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 at
 java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
 at

 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
 at
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:350)
 at

 kafka.consumer.ZookeeperConsumerConnector$$anonfun$sendShutdownToAllQueues$1.apply(ZookeeperConsumerConnector.scala:262)
 at

 kafka.consumer.ZookeeperConsumerConnector$$anonfun$sendShutdownToAllQueues$1.apply(ZookeeperConsumerConnector.scala:259)
 at scala.collection.Iterator$class.foreach(Iterator.scala:727)
 at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
 at
 scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
 at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
 at

 kafka.consumer.ZookeeperConsumerConnector.sendShutdownToAllQueues(ZookeeperConsumerConnector.scala:259)
 at

 kafka.consumer.ZookeeperConsumerConnector.liftedTree1$1(ZookeeperConsumerConnector.scala:199)
 at

 kafka.consumer.ZookeeperConsumerConnector.shutdown(ZookeeperConsumerConnector.scala:192)
 - locked 0x000680dd5848 (a java.lang.Object)
 at

 kafka.tools.MirrorMaker$$anonfun$cleanShutdown$1.apply(MirrorMaker.scala:185)
 at

 kafka.tools.MirrorMaker$$anonfun$cleanShutdown$1.apply(MirrorMaker.scala:185)
 at scala.collection.immutable.List.foreach(List.scala:318)
 at kafka.tools.MirrorMaker$.cleanShutdown(MirrorMaker.scala:185)
 at kafka.tools.MirrorMaker$$anon$1.run(MirrorMaker.scala:169)


 Thanks.

 Jiangjie (Becket) Qin



Re: Question about ZookeeperConsumerConnector

2014-11-11 Thread Guozhang Wang
Thanks Jun!

Jiangjie, could you file a JIRA? Thanks.

Guozhang

On Tue, Nov 11, 2014 at 9:27 AM, Jun Rao jun...@gmail.com wrote:

 Hi, Jiangjie,

 Thanks for the investigation. Yes, this seems like a real issue.

 1. It doesn't seem that we need to put the shutdownCommand back into the
 queue. Once an iterator receives a shutdownCommand, it will be in a Done
 state and will remain in that state forever.

 2. Yes, we just need to get the unique set of queues and put in a
 shutdownCommand
 per queue.

 Jun


 On Mon, Nov 10, 2014 at 7:27 PM, Becket Qin becket@gmail.com wrote:

  Hi,
 
   We encountered a production issue recently where Mirror Maker could not
   properly shut down because ZookeeperConsumerConnector is blocked on
   shutdown(). After looking into the code, we found 2 issues that caused
   this problem.
 
   1. After the consumer iterator receives the shutdownCommand, it puts the
  shutdownCommand back into the data chunk queue. Is there any reason for
  doing this?
 
  2. In ZookeeperConsumerConnector shutdown(), we could potentially put
   multiple shutdownCommands into the same data chunk queue, provided the
  topics are sharing the same data chunk queue in topicThreadIdAndQueues.
  (KAFKA-1764 is opened)
 
   In our case, we only have 1 consumer stream for all the topics, and the data
   chunk queue capacity is set to 1. The execution sequence causing the problem
   is as below:
   1. ZookeeperConsumerConnector shutdown() is called; it tries to put a
   shutdownCommand for each queue in topicThreadIdAndQueues. Since we only
   have 1 queue, multiple shutdownCommands will be put into the queue.
   2. In sendShutdownToAllQueues(), between queue.clear() and
   queue.put(shutdownCommand), the consumer iterator receives the
   shutdownCommand and puts it back into the data chunk queue. After that,
  ZookeeperConsumerConnector tries to put another shutdownCommand into the
  data chunk queue but will block forever.
 
 
  The thread stack trace is as below:
 
  Thread-23 #58 prio=5 os_prio=0 tid=0x7ff440004800 nid=0x40a waiting
  on condition [0x7ff4f0124000]
 java.lang.Thread.State: WAITING (parking)
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for  0x000680b96bf0 (a
  java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
  at
  java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
  at
 
 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
  at
 
 java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:350)
  at
 
 
 kafka.consumer.ZookeeperConsumerConnector$$anonfun$sendShutdownToAllQueues$1.apply(ZookeeperConsumerConnector.scala:262)
  at
 
 
 kafka.consumer.ZookeeperConsumerConnector$$anonfun$sendShutdownToAllQueues$1.apply(ZookeeperConsumerConnector.scala:259)
  at scala.collection.Iterator$class.foreach(Iterator.scala:727)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
  at
  scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
  at
 
 
 kafka.consumer.ZookeeperConsumerConnector.sendShutdownToAllQueues(ZookeeperConsumerConnector.scala:259)
  at
 
 
 kafka.consumer.ZookeeperConsumerConnector.liftedTree1$1(ZookeeperConsumerConnector.scala:199)
  at
 
 
 kafka.consumer.ZookeeperConsumerConnector.shutdown(ZookeeperConsumerConnector.scala:192)
  - locked 0x000680dd5848 (a java.lang.Object)
  at
 
 
 kafka.tools.MirrorMaker$$anonfun$cleanShutdown$1.apply(MirrorMaker.scala:185)
  at
 
 
 kafka.tools.MirrorMaker$$anonfun$cleanShutdown$1.apply(MirrorMaker.scala:185)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at kafka.tools.MirrorMaker$.cleanShutdown(MirrorMaker.scala:185)
  at kafka.tools.MirrorMaker$$anon$1.run(MirrorMaker.scala:169)
 
 
  Thanks.
 
  Jiangjie (Becket) Qin
 




-- 
-- Guozhang


Re: Question regarding topic creation

2014-10-05 Thread Jun Rao
In 0.8.x, a topic doesn't necessarily get created on every broker. Each
topic can have one or more partitions. Each partition can have one or more
replicas. In general, the replicas are spread over the brokers in a
cluster. The producer obtains the metadata of a topic to determine which
broker to send a message to.
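As a toy illustration of how the spreading works (not Kafka's actual
assignment algorithm, which also randomizes the starting broker and shifts
replicas): with 10 brokers and a topic of 4 partitions at replication factor
2, only a handful of brokers end up holding any replica of that topic.

    // Toy round-robin spread of partition replicas over brokers; illustrative only.
    val brokers = (0 until 10).toVector
    val partitions = 4
    val replicationFactor = 2

    val assignment: Map[Int, Seq[Int]] =
      (0 until partitions).map { p =>
        p -> (0 until replicationFactor).map(r => brokers((p + r) % brokers.size))
      }.toMap
    // partition -> replica brokers:
    //   0 -> Seq(0, 1), 1 -> Seq(1, 2), 2 -> Seq(2, 3), 3 -> Seq(3, 4)
    // Brokers 5-9 hold nothing for this topic; the producer finds the leader of
    // each partition from the topic metadata.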

Thanks,

Jun

On Fri, Oct 3, 2014 at 8:09 AM, Sreenivasulu Nallapati 
sreenu.nallap...@gmail.com wrote:

 Hello,

  I am new to Kafka and trying to evaluate it for one of our use cases. I have a
  basic question regarding topics.

  Assume we have a 10-broker cluster. If I create a topic, will it be created
  on all 10 brokers?
  If not, how is it managed internally if we try to send a message to that
  particular topic? Assume we created a producer with all brokers.

 Thanks in advance.


 Thanks
 Sreeni



Re: question about message sets and produce requests/responses

2014-07-23 Thread Joel Koshy
Not sure if I followed the question correctly, but S1 and S2 would be
message-sets to different partitions of T - the producer request
object contains a map of topic-partition -> messageset. So each
partition would get its own ack. The response object is just a map of
partition -> response status, so the order of acks in the response is
the response status map's order.
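A rough sketch of that shape with made-up case classes (these are simplified
stand-ins, not the actual kafka.api request/response types):

    // Illustrative stand-ins only, not the real wire-format classes.
    case class TopicAndPartition(topic: String, partition: Int)
    case class MessageSet(messages: Seq[Array[Byte]])

    case class ProducerRequest(
      requiredAcks: Int,
      timeoutMs: Int,
      data: Map[TopicAndPartition, MessageSet])   // one message set per partition

    case class ProducerResponse(
      status: Map[TopicAndPartition, Int])        // one error code ("ack") per partition

    // S1 and S2 live under different partition keys of topic T, so each gets
    // its own entry in the response map; there is no per-message-set ordering.
    val request = ProducerRequest(
      requiredAcks = 1,
      timeoutMs = 3000,
      data = Map(
        TopicAndPartition("T", 0) -> MessageSet(Seq("s1".getBytes)),
        TopicAndPartition("T", 1) -> MessageSet(Seq("s2".getBytes))))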

Joel

On Wed, Jul 23, 2014 at 02:09:40PM -0700, Dave Peterson wrote:
 Hello,
 
 I have a question about produce requests and responses.
 Suppose I create a produce request consisting of a single topic T
 which contains two message sets S1 and S2, with S1 preceding
 S2.  In other words, the request looks like this:
 
 ( request header ( T ( S1, S2 ) ) )
 
 Is it possible that I may get a response in which the order of
 the ACKs for the individual message sets is reversed?  In other
 words, a response that looks like this:
 
 ( response header ( T ( ack for S2, ack for S1 ) ) )
 
 I know that if a request contains multiple topics, then the topics
 may arrive out of order in the response.  So I was wondering if
 ACKs for individual message sets within a topic can likewise
 arrive out of order in the response.
 
 Thanks,
 Dave



Re: question about message sets and produce requests/responses

2014-07-23 Thread Dave Peterson
Ok, that's all I need to know.  Thanks!


On Wed, Jul 23, 2014 at 4:33 PM, Joel Koshy jjkosh...@gmail.com wrote:

 Not sure if I followed the question correctly, but S1 and S2 would be
 message-sets to different partitions of T - the producer request
 object contains a map of topic-partition -> messageset. So each
 partition would get its own ack. The response object is just a map of
 partition -> response status, so the order of acks in the response is
 the response status map's order.

 Joel

 On Wed, Jul 23, 2014 at 02:09:40PM -0700, Dave Peterson wrote:
  Hello,
 
  I have a question about produce requests and responses.
  Suppose I create a produce request consisting of a single topic T
  which contains two message sets S1 and S2, with S1 preceding
  S2.  In other words, the request looks like this:
 
  ( request header ( T ( S1, S2 ) ) )
 
  Is it possible that I may get a response in which the order of
  the ACKs for the individual message sets is reversed?  In other
  words, a response that looks like this:
 
  ( response header ( T ( ack for S2, ack for S1 ) ) )
 
  I know that if a request contains multiple topics, then the topics
  may arrive out of order in the response.  So I was wondering if
  ACKs for individual message sets within a topic can likewise
  arrive out of order in the response.
 
  Thanks,
  Dave




Re: question about message sets and produce requests/responses

2014-07-23 Thread Joel Koshy
Also, to be clear, by ack I mean an error code as described here:
https://cwiki.apache.org/confluence/display/KAFKA/A+Guide+To+The+Kafka+Protocol#AGuideToTheKafkaProtocol-ProduceResponse

On Wed, Jul 23, 2014 at 04:33:29PM -0700, Joel Koshy wrote:
 Not sure if I followed the question correctly, but S1 and S2 would be
 message-sets to different partitions of T - the producer request
 object contains a map of topic-partition -> messageset. So each
 partition would get its own ack. The response object is just a map of
 partition -> response status, so the order of acks in the response is
 the response status map's order.
 
 Joel
 
 On Wed, Jul 23, 2014 at 02:09:40PM -0700, Dave Peterson wrote:
  Hello,
  
  I have a question about produce requests and responses.
  Suppose I create a produce request consisting of a single topic T
  which contains two message sets S1 and S2, with S1 preceding
  S2.  In other words, the request looks like this:
  
  ( request header ( T ( S1, S2 ) ) )
  
  Is it possible that I may get a response in which the order of
  the ACKs for the individual message sets is reversed?  In other
  words, a response that looks like this:
  
  ( response header ( T ( ack for S2, ack for S1 ) ) )
  
  I know that if a request contains multiple topics, then the topics
  may arrive out of order in the response.  So I was wondering if
  ACKs for individual message sets within a topic can likewise
  arrive out of order in the response.
  
  Thanks,
  Dave
 



Re: Question on Kafka .8 / Logstash client

2014-06-13 Thread István
I am not sure what you are trying to achieve, but here is something that may help:

https://github.com/joekiller/logstash-kafka

On Thu, Jun 12, 2014 at 11:06 AM, Sutanu Das sd2...@att.com wrote:
 Hi Kafka Users,


 1.   Is there a plugin/client for Logstash for Kafka 0.8?


 2.   Is there any example of using Kafka with Logstash?



 I'm not on the mailing list yet, so I'll really appreciate an email response -
 thanks for all your contributions to the Kafka project.



-- 
the sun shines for all


Re: Question about Kafka code contribution guidelines

2013-03-28 Thread Jun Rao
All committers have to file the following agreement:
http://www.apache.org/licenses/icla.txt

A contributor submits patches to a jira. The patches will be reviewed and, if
accepted, they will be checked in by a committer. If a contributor makes
significant contributions, he/she will be nominated as a committer.

All contributed code has the Apache license.

Thanks,

Jun

On Thu, Mar 28, 2013 at 12:20 PM, Roshanak Afsarifard 
rafsarif...@actuate.com wrote:

 Dear Kafka development team,

 First of all thank you for all your efforts on Kafka. We are
 considering using Kafka in our software.
 In order to do so, we need your help to understand the processes in place
 to manage Intellectual Property, code ownership, and provenance in this
 project.

 Can you please answer the following questions:


 1.   What is the process to make someone a committer to this project,
 with the ability to check in code to this project? Does the person have to
 agree to certain terms of use? If so, please point me to the terms of use
 document.



 2.   Did all the people who contributed code agree to contribute code
 under the existing license?



 3.   Outside the committers, how are code contributions from the
 community accepted? Do external code contributors have to accept the same
 terms as above, before their code is integrated into the project?



 4.   Does anyone else have rights to the code that has been
 contributed to this project?


 Thank you in advance.

 Roshanak Afsarifard
 Director, Engineering
 Actuate Corporation
 2207 Bridgepointe Parkway, Suite 500
 San Mateo, CA  94404
 Tel: (650) 645-3913  Fax: (650) 645-3700
 Email: rafsarif...@actuate.com
 www.actuate.com




Re: Question about offsets

2013-01-30 Thread David Arthur

Thanks, Jay. Makes sense

+1 for documentation. I think in general the wiki is a bit disorganized, which
makes it hard to know what is current info vs. old proposals.


-David

On 1/30/13 12:01 AM, Jay Kreps wrote:

Yes, offsets are now logical 0,1,2,3...

Some details on this change are here:
https://cwiki.apache.org/confluence/display/KAFKA/Keyed+Messages+Proposal
https://issues.apache.org/jira/browse/KAFKA-506
https://issues.apache.org/jira/browse/KAFKA-631

There were some previous threads on this that discussed the benefits. The
main goals included:
1. Allow key-based retention policy in addition to time-based retention.
This allows the topic to retain the latest value for each message key for
data streams that contain updates.
2. Fix the semantics of compressed messages (which are weird in 0.7 because
the compressed messages essentially have no offset and are unaddressable
hence commit() in the middle of a compressed message set will give
duplicates!).
3. Allow odd iteration patterns (skipping ahead N messages, reading
messages backwards, etc)

It is also just nicer for humans and less prone to weird errors.

To answer your questions:

1. Lookups are done using a memory mapped index file kept with the log that
maps logical offsets to byte offsets. This index is sparse (need not
contain every message). Once the begin and end position in the log segment
are found, the data transfer happens the same as before.

2. The rule is that all messages have offsets stored with the messages.
These are assigned by the server. So in your example, the offsets of B1, B2
will be sequentially assigned and the wrapper message will have the offset
of the last message in the compressed set.

As a result in 0.8 requesting any offset in the compressed message set will
yield that compressed messageset (you need the whole thing to decompress).
This means the java client now works correctly with commit() and compressed
message sets--if you commit in the middle of a compressed message set and
restart from that point you will read the compressed set again, throw away
the messages with offset lower than your consumer's offset, and then start
consuming each message after that.

Hope that makes sense.

We really need to get this stuff documented since it is all floating around
on various wikis and JIRAs.

-Jay


On Tue, Jan 29, 2013 at 8:38 PM, David Arthur mum...@gmail.com wrote:


Fiddling with my Python client on 0.8, I noticed something has changed with
offsets.

It seems that instead of a byte offset in the log file, the offset is now
a logical one. I had a few questions about this:

1) How is the byte offset determined by the broker? Since messages are not
fixed width, does it use an index or do a simple binary search?
2) Regarding compressed MessageSets, how are the offsets incremented?
Suppose I have the following MessageSet

MessageSet A
  - Message A1, normal message
  - Message A2, normal message
  - Message A3, compressed MessageSet B
  - MessageSet B
 - Message B1
 - Message B2

Assuming we start from 0, message A1 gets offset of 0, A2 of 1. Now I am
unclear how the numbering goes. Would A3 get offset 3, and B1 - 4 B2 - 5?
Or are the offsets inside a compressed MessageSet not used?

Is it possible to request a message inside a compressed message set?

Also what about nested compression sets, what if Message B3 is itself a
compressed MessageSet (not that it makes sense, just curious what would
happen).

Thanks!
-David








Re: Question about offsets

2013-01-29 Thread Jay Kreps
Yes, offsets are now logical 0,1,2,3...

Some details on this change are here:
https://cwiki.apache.org/confluence/display/KAFKA/Keyed+Messages+Proposal
https://issues.apache.org/jira/browse/KAFKA-506
https://issues.apache.org/jira/browse/KAFKA-631

There were some previous threads on this that discussed the benefits. The
main goals included:
1. Allow key-based retention policy in addition to time-based retention.
This allows the topic to retain the latest value for each message key for
data streams that contain updates.
2. Fix the semantics of compressed messages (which are weird in 0.7 because
the compressed messages essentially have no offset and are unaddressable
hence commit() in the middle of a compressed message set will give
duplicates!).
3. Allow odd iteration patterns (skipping ahead N messages, reading
messages backwards, etc)

It is also just nicer for humans and less prone to weird errors.

To answer your questions:

1. Lookups are done using a memory mapped index file kept with the log that
maps logical offsets to byte offsets. This index is sparse (need not
contain every message). Once the begin and end position in the log segment
are found, the data transfer happens the same as before.
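A toy illustration of that kind of sparse lookup (not the actual OffsetIndex
implementation): binary-search the index for the largest indexed offset at or
below the requested one, then scan the log forward from that byte position.

    // Illustrative sketch only, not Kafka's code.
    // entries: (logicalOffset, bytePosition) pairs sorted by logicalOffset,
    // covering only some of the messages in the segment.
    def lookupBytePosition(entries: IndexedSeq[(Long, Long)], target: Long): Long = {
      var lo = 0
      var hi = entries.length - 1
      var pos = 0L                       // fall back to the start of the segment
      while (lo <= hi) {
        val mid = (lo + hi) / 2
        if (entries(mid)._1 <= target) { // best "largest offset <= target" so far
          pos = entries(mid)._2
          lo = mid + 1
        } else {
          hi = mid - 1
        }
      }
      pos                                // scan forward from here to find `target`
    }

    // lookupBytePosition(Vector((0L, 0L), (100L, 57342L), (200L, 113190L)), 137L) == 57342L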

2. The rule is that all messages have offsets stored with the messages.
These are assigned by the server. So in your example, the offsets of B1, B2
will be sequentially assigned and the wrapper message will have the offset
of the last message in the compressed set.

As a result in 0.8 requesting any offset in the compressed message set will
yield that compressed messageset (you need the whole thing to decompress).
This means the java client now works correctly with commit() and compressed
message sets--if you commit in the middle of a compressed message set and
restart from that point you will read the compressed set again, throw away
the messages with offset lower than your consumer's offset, and then start
consuming each message after that.
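A small sketch of that consumer-side behavior with toy types (illustrative
only, not the real consumer iterator code):

    case class Msg(offset: Long, value: String)

    // After decompressing the fetched set, drop every message whose
    // server-assigned offset is below the offset we want to resume from.
    def resumeFrom(decompressedSet: Seq[Msg], consumerOffset: Long): Seq[Msg] =
      decompressedSet.filter(_.offset >= consumerOffset)

    // Committing in the middle of a compressed set just means re-reading it and
    // skipping the already-consumed prefix on restart:
    // resumeFrom(Seq(Msg(10, "a"), Msg(11, "b"), Msg(12, "c")), 12) == Seq(Msg(12, "c"))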

Hope that makes sense.

We really need to get this stuff documented since it is all floating around
on various wikis and JIRAs.

-Jay


On Tue, Jan 29, 2013 at 8:38 PM, David Arthur mum...@gmail.com wrote:

 Fiddling with my Python client on 0.8, I noticed something has changed with
 offsets.

 It seems that instead of a byte offset in the log file, the offset is now
 a logical one. I had a few questions about this:

 1) How is the byte offset determined by the broker? Since messages are not
 fixed width, does it use an index or do a simple binary search?
 2) Regarding compressed MessageSets, how are the offsets incremented?
 Suppose I have the following MessageSet

 MessageSet A
  - Message A1, normal message
  - Message A2, normal message
  - Message A3, compressed MessageSet B
  - MessageSet B
 - Message B1
 - Message B2

 Assuming we start from 0, message A1 gets offset of 0, A2 of 1. Now I am
 unclear how the numbering goes. Would A3 get offset 3, and B1 - 4 B2 - 5?
 Or are the offsets inside a compressed MessageSet not used?

 Is it possible to request a message inside a compressed message set?

 Also what about nested compression sets, what if Message B3 is itself a
 compressed MessageSet (not that it makes sense, just curious what would
 happen).

 Thanks!
 -David