[jira] [Resolved] (KAFKA-13865) Fix ResponseSendTimeMs metric in RequestChannel.scala was removed repeatedly

2022-05-02 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-13865.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

>  Fix ResponseSendTimeMs metric  in RequestChannel.scala  was removed 
> repeatedly
> ---
>
> Key: KAFKA-13865
> URL: https://issues.apache.org/jira/browse/KAFKA-13865
> Project: Kafka
>  Issue Type: Bug
>Reporter: zhaobo
>Priority: Minor
> Fix For: 3.3.0
>
>
> ResponseSendTimeMs metric was removed at line 576, but we removed it again 
> at line 578.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [kafka-clients] Re: [VOTE] 3.2.0 RC0

2022-05-02 Thread Bruno Cadonna

Hi Guozhang,

Thank you for your message!

The performance regression did not find its way to the 3.2 branch. In 
other words, the commit that is fixed by your commit is not on branch 3.2.


Best,
Bruno


On 27.04.22 21:36, Guozhang Wang wrote:

Hi Bruno,

Could I also have this commit (
https://github.com/apache/kafka/commit/e026384ffb3170a2e71053a4163db58b9bd8fba6)
in the next release candidate? It's fixing a performance regression that
was just introduced, and not yet released in older versions.


Guozhang

On Tue, Apr 26, 2022 at 11:01 AM Jun Rao  wrote:


Hi, Bruno.

Thanks for the reply. Your understanding is correct. This is a regression
introduced only in the 3.2 branch.

Sorry for the late notice.

Jun

On Tue, Apr 26, 2022 at 10:04 AM Bruno Cadonna  wrote:


Hi Jun,

Thank you for your message!

Now I see how this issue was introduced in 3.2.0. The fix for the bug
described in KAFKA-12841 introduced it, right? I initially understood
that the PR you want to include is the fix for the bug described in
KAFKA-12841 which dates back to 2.6.

I think that classifies as a regression.

I will abort the voting and create a new release candidate.

Best,
Bruno

On 26.04.22 18:09, 'Jun Rao' via kafka-clients wrote:

Hi, Bruno,

Could we include https://github.com/apache/kafka/pull/12064 in 3.2.0? This fixes an
issue introduced in 3.2.0 where in some of the error cases, the

producer

interceptor is called twice for the same record.

Thanks,

Jun
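
(Background for readers: a producer interceptor is user-supplied code that the
producer invokes once per record on the send path. The sketch below is
hypothetical and not from the PR; an interceptor like it would over-count
under the bug Jun describes, since onSend() fires twice for the same record:)

import java.util.Map;
import org.apache.kafka.clients.producer.ProducerInterceptor;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

public class CountingInterceptor implements ProducerInterceptor<String, String> {
    private long sent = 0;

    @Override
    public ProducerRecord<String, String> onSend(ProducerRecord<String, String> record) {
        sent++; // must run exactly once per record; the bug made it run twice
        return record;
    }

    @Override
    public void onAcknowledgement(RecordMetadata metadata, Exception exception) {}

    @Override
    public void close() {
        System.out.println("records sent: " + sent);
    }

    @Override
    public void configure(Map<String, ?> configs) {}
}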

On Tue, Apr 26, 2022 at 6:34 AM Bruno Cadonna  wrote:

 Hi all,

 This is a gentle reminder to vote for the first candidate for
 release of
 Apache Kafka 3.2.0.

 I added the 3.2 documentation to the kafka site. That means
 https://kafka.apache.org/32/documentation.html works now.

 A successful system tests run can be found here:
 https://jenkins.confluent.io/job/system-test-kafka/job/3.2/24/

 Thank you to Michal for voting on the release candidate.

 Best,
 Bruno

 On 15.04.22 21:05, Bruno Cadonna wrote:
  > Hello Kafka users, developers and client-developers,
  >
  > This is the first candidate for release of Apache Kafka 3.2.0.
  >
  > * log4j 1.x is replaced with reload4j (KAFKA-9366)
  > * StandardAuthorizer for KRaft (KIP-801)
  > * Send a hint to the partition leader to recover the partition
 (KIP-704)
  > * Top-level error code field in DescribeLogDirsResponse (KIP-784)
  > * kafka-console-producer writes headers and null values (KIP-798 and
  > KIP-810)
  > * JoinGroupRequest and LeaveGroupRequest have a reason attached
 (KIP-800)
  > * Static membership protocol lets the leader skip assignment
 (KIP-814)
  > * Rack-aware standby task assignment in Kafka Streams (KIP-708)
  > * Interactive Query v2 (KIP-796, KIP-805, and KIP-806)
  > * Connect APIs list all connector plugins and retrieve their
  > configuration (KIP-769)
  > * TimestampConverter SMT supports different unix time precisions
 (KIP-808)
  > * Connect source tasks handle producer exceptions (KIP-779)
  >
  > Release notes for the 3.2.0 release:
  > https://home.apache.org/~cadonna/kafka-3.2.0-rc0/RELEASE_NOTES.html
  >
  > *** Please download, test and vote by Monday, April 25, 9am CEST
  >
  > Kafka's KEYS file containing PGP keys we use to sign the release:
  > https://kafka.apache.org/KEYS
  >
  > * Release artifacts to be voted upon (source and binary):
  > https://home.apache.org/~cadonna/kafka-3.2.0-rc0/
  >
  > * Maven artifacts to be voted upon:
  > https://repository.apache.org/content/groups/staging/org/apache/kafka/
  >
  > * Javadoc:
  > https://home.apache.org/~cadonna/kafka-3.2.0-rc0/javadoc/
  >
  > * Tag to be voted upon (off 3.2 branch) is the 3.2.0 tag:
  > https://github.com/apache/kafka/releases/tag/3.2.0-rc0
  >
  > * Documentation (not yet ported to kafka-site):
  > https://kafka.apache.org/32/documentation.html
  >
  > * Protocol:
  > https://kafka.apache.org/32/protocol.html
  >
  > * Successful Jenkins builds for the 3.2 branch:
  > I'll share a link once the builds complete
  >
  >
  > /**
  >
  

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #900

2022-05-02 Thread Apache Jenkins Server
See 




Re: Re: [Discuss] KIP-581: Value of optional null field which has default value

2022-05-02 Thread Mickael Maison
Hi Cheng Pan,

Thanks for raising this KIP, this would be a useful improvement!

You never started a VOTE thread for this KIP. Are you interested in
finishing up this work?
If so, based on the discussion, I think you can open a VOTE thread as
described in 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#KafkaImprovementProposals-Process
If not, it's not a problem, someone else can volunteer to pick it up.

Please let us know.

Thanks,
Mickael
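
(For context, KIP-581 targets the JSON converter's handling of optional fields
with default values. Below is a minimal sketch of exercising that behavior
with the "accept.optional.null" property; the property name comes from the
discussion below and is a proposal, not a released config:)

import java.util.HashMap;
import java.util.Map;
import org.apache.kafka.connect.data.Schema;
import org.apache.kafka.connect.data.SchemaBuilder;
import org.apache.kafka.connect.json.JsonConverter;

public class OptionalNullExample {
    public static void main(String[] args) {
        JsonConverter converter = new JsonConverter();
        Map<String, Object> config = new HashMap<>();
        config.put("schemas.enable", true);
        // Proposed by KIP-581 (name subject to the VOTE): when true, an
        // optional null field stays null instead of becoming its default.
        config.put("accept.optional.null", true);
        converter.configure(config, false); // false = value converter

        // An optional string field with a default value of "fallback".
        Schema schema = SchemaBuilder.string().optional().defaultValue("fallback").build();

        // Today this serializes the default; with the KIP it would stay null.
        byte[] json = converter.fromConnectData("my-topic", schema, null);
        System.out.println(new String(json));
    }
}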

On Sat, Aug 8, 2020 at 11:05 AM Ruslan Gibaiev  wrote:
>
> Hello guys.
> Proposed PR seems to be fixing the issue in a backward-compatible way.
> Let's please move forward with it. Would be great to see it included in the 
> next Kafka release.
> Thank you
>
> On 2020/07/29 02:49:07, "379377...@qq.com" <379377...@qq.com> wrote:
> > Hi Chris,
> >
> > Thanks for your good suggestion; the KIP document and draft PR have been 
> > updated, please review again.
> >
> > And I found that, due to my misoperation, the mail thread has been broken; no 
> > idea how to fix it.
> >
> >
> >
> >
> > Thanks
> > Cheng Pan
> >
> > From: Christopher Egerton
> > Date: 2020-05-04 10:53
> > To: dev
> > Subject: Re: [Discuss] KIP-581: Value of optional null field which has 
> > default value
> > Hi Cheng,
> >
> > I think refactoring that method should be fine (if maybe a little painful);
> > the method itself is private and all places that it's invoked directly are
> > either package-private or non-static, so it shouldn't affect any of the
> > public methods of the JSON converter to change "convertToConnect" to be
> > non-static. Even if it did, the only parts of the JSON converter that are
> > public API (and therefore officially subject to concerns about
> > compatibility) are the methods it implements that satisfy the "Converter"
> > and "HeaderConverter" interfaces.
> >
> > Would you mind explicitly specifying in the KIP that the new property will
> > be added for the JSON converter only, and that it will affect both
> > serialization and deserialization?
> >
> > Cheers,
> >
> > Chris
> >
> > On Tue, Apr 28, 2020 at 10:52 AM 379377944 <379377...@qq.com> wrote:
> >
> > > Hi Chris,
> > >
> > >
> > > Thanks for your reminder, the original implementation is deprecated, I just
> > > updated the JIRA with the new
> > > PR link:  https://github.com/apache/kafka/pull/8575
> > >
> > >
> > > As for question 2), I agree with you that we should consider both
> > > serialization and deserialization, and as you said, I only implement the
> > > serialization now. This is because the original serde implementation is not
> > > symmetrical: the convertToConnect is a static method and can’t access the
> > > fields of the JsonConverter
> > > instance, maybe I should do some refactoring to implement the
> > > deserialization.
> > >
> > >
> > > Thanks,
> > > Cheng Pan
> > >  Original Message
> > > Sender: Christopher Egerton
> > > Recipient: dev
> > > Date: Wednesday, Apr 15, 2020 02:28
> > > Subject: Re: [Discuss] KIP-581: Value of optional null field which has
> > > default value
> > >
> > >
> > > Hi Cheng,
> > >
> > > Thanks for the KIP! I really appreciate the care that was taken
> > > to ensure backwards compatibility for existing users, and the minimal
> > > changes to public interface that are suggested to address this. I have two
> > > quick requests for clarification:
> > >
> > > 1) Where is the proposed "accept.optional.null" property going to apply?
> > > It's hinted that it'll take effect on the JSON converter but not actually
> > > called out anywhere.
> > >
> > > 2) Assuming this takes effect on the JSON converter, is the intent to alter
> > > the semantics for both serialization and deserialization? The code snippet
> > > from the JSON converter that's included in the KIP comes from the
> > > "convertToJson" method, which is used for serialization. However, based on
> > > https://github.com/apache/kafka/blob/ea47a885b1fe47dfb87c1dc86db1b0e7eb8a273c/connect/json/src/main/java/org/apache/kafka/connect/json/JsonConverter.java#L712-L713
> > > it looks like the converter also inserts the default value for
> > > optional-but-null data during deserialization.
> > >
> > > Thanks again for the KIP!
> > >
> > > Cheers, Chris
> > >
> > > On Wed, Mar 18, 2020 at 12:00 AM Cheng Pan <379377...@qq.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'd like to use this thread to discuss KIP-581: Value of optional null
> > > > field which has default value, please see detail at:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-581%3A+Value+of+optional+null+field+which+has+default+value
> > > >
> > > > There are some previous discussion at:
> > > > https://github.com/apache/kafka/pull/7112
> > > >
> > > > I'm a beginner for apache project, please let me know if I did anything
> > > > wrong.
> > > >
> > > > Best regards,
> > > > Cheng Pan
> >


[Discuss] KIP-832: Allow creating a producer/consumer using a producer/consumer config

2022-05-02 Thread lqjacklee
Hi Bruno and frosiere,

Thanks for raising this KIP, this would be a useful improvement!

I advise just exposing Map and Properties to the user, so that the parameters
will be consistent with the old ones.

The advice is like:

1. Consumer API:

public KafkaConsumer(Map<String, Object> configs,
                     Deserializer<K> keyDeserializer,
                     Deserializer<V> valueDeserializer,
                     List<ConsumerInterceptor<K, V>> interceptors)

2. Producer API:

public KafkaProducer(Map<String, Object> configs,
                     Serializer<K> keySerializer,
                     Serializer<V> valueSerializer,
                     List<ProducerInterceptor<K, V>> interceptors)
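
A hypothetical usage sketch of the proposed consumer variant; the
four-argument public constructor and the LoggingInterceptor class below
illustrate the proposal and are not existing API:

import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerInterceptor;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ProposedConsumerUsage {

    // A trivial interceptor instance to pass in; with the proposed
    // constructor it could carry injected dependencies (the KIP's use case).
    static class LoggingInterceptor implements ConsumerInterceptor<String, String> {
        @Override
        public ConsumerRecords<String, String> onConsume(ConsumerRecords<String, String> records) {
            System.out.println("Polled " + records.count() + " records");
            return records;
        }
        @Override
        public void onCommit(Map<TopicPartition, OffsetAndMetadata> offsets) {}
        @Override
        public void close() {}
        @Override
        public void configure(Map<String, ?> configs) {}
    }

    public static void main(String[] args) {
        Map<String, Object> configs = new HashMap<>();
        configs.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        configs.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");

        // Hypothetical: this overload exists only in the proposal above.
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(
                configs,
                new StringDeserializer(),
                new StringDeserializer(),
                List.of(new LoggingInterceptor()));
        consumer.close();
    }
}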

Any idea?

Please let me know.
Thanks,
lqjack


Re: Possible Bug: kafka-reassign-partitions causing the data retention time to be reset

2022-05-02 Thread Fares Oueslati
Hello Lqjacklee,

Is there any news on this please, especially regarding my last message?

Do you think it is possible to modify the segment files manually with touch
-a -m -t 203801181205.09 my_segment_file? I could keep the original
modification date of the files before the move and update them afterwards.

Thanks a lot!
Fares
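
(In code terms, that workaround amounts to capturing each segment file's
modification time before the move and restoring it afterwards. A sketch
only, given the safety concerns already raised:)

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Sketch: remember a segment file's mtime before the reassignment and put
// it back afterwards (the equivalent of `touch -m -t ...`).
class PreserveModificationTime {
    static FileTime remember(Path segmentFile) throws IOException {
        return Files.getLastModifiedTime(segmentFile);
    }

    static void restore(Path segmentFile, FileTime originalMtime) throws IOException {
        Files.setLastModifiedTime(segmentFile, originalMtime);
    }
}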

On Fri, Apr 29, 2022 at 3:24 AM Fares Oueslati 
wrote:

> Thanks for your help!
>
> I'm not sure how that would help me though. I'm not actually trying to
> decommission a Kafka broker.
> I would like to move all the data from one disk (log.dir) to another
> within the same broker while keeping the original modification time of the
> moved segment files.
> After that I would like to delete the disk, not the broker.
>
> Kind Regards,
> Fares
>
> On Thu, Apr 28, 2022 at 7:05 PM lqjacklee  wrote:
>
>> The resource (https://mike.seid.io/blog/decommissiong-a-kafka-node.html)
>> may help you.
>> I have created (https://issues.apache.org/jira/browse/KAFKA-13860) to
>> replay the case .
>>
>> On Thu, Apr 28, 2022 at 10:33 PM Fares Oueslati 
>> wrote:
>>
>>> Hello,
>>>
>>> I'm not sure how to report this properly but I didn't get any answer in
>>> the
>>> user mailing list.
>>>
>>> In order to remove a disk in a JBOD setup, I moved all data from one disk
>>> to another on every Kafka broker using kafka-reassign-partitions, then I
>>> went through some weird behaviour.
>>> Basically, the disk storage kept increasing even though there is no
>>> change in the bytes-in metric per broker.
>>> After investigation, I’ve seen that all segment log files in the new
>>> log.dir had a modification date set to the moment when the move had been
>>> done.
>>> So I guess the process applying the retention policy (log cleaner?) uses
>>> that timestamp to check whether the segment file should be deleted or
>>> not.
>>> So I ended up with a lot more data than we were supposed to store, since
>>> we
>>> are basically doubling the retention time of all the freshly moved data.
>>>
>>> This seems to me to be a buggy behavior of the command, is it possible to
>>> create a JIRA to track and eventually fix this?
>>> The only option I see to fix it is to keep the modification date before
>>> moving the data and applying it manually afterwards for every segment
>>> file, touching those files manually doesn't seem very safe imho.
>>>
>>> Thanks
>>> Fares Oueslati
>>>
>>


Re: [DISCUSS] KIP-782: Expandable batch size in producer

2022-05-02 Thread Jack Vanlightly
The current configs are hard to use for the Kafka user and a little inflexible, 
so I am pleased to see the discussion.

Ultimately we want flexibility. We don't want to force users to understand the 
underlying implementation/protocol and we want the producer to handle high or 
low throughput efficiently - with the right amount of inflight data for the 
conditions.

Given each producer by default has limited inflight requests per connection 
then we need extra flexibility in the batch sizing.

The way I would like to be able to approach this as a user is: 1) I use 
linger.ms to bound my client-side produce latency, and 2) I use request.size and 
max.in.flight.requests.per.connection to calculate my upper-bound inflight data. 
As a user, I don't have to think about anything else (no edge cases or 
caveats). The producer handles the batching intelligently, trying to fully 
utilise the request sizes while respecting the linger.

I understand Jun's point about increasing the default batch size. The only 
downside I see is having to juggle the request vs batch size to avoid delaying 
batches because we can only fit so many big batches in a request. That is a 
calculation we force on the user and reduces flexibility.

As I understand the proposed change, each ProducerBatch would be physically 
made up of a list of ByteBuffers that allows for dynamically increasing the 
batch size in increments of batch.initial.size.

The documented benefit is that we can send large batches when there is lots of 
data ready to be sent, while also not having to wait for linger.ms when there 
is less data. Comparing this to Jun's suggested approach, it looks the same 
except that we get size-triggered batches instead of linger-triggered ones when 
throughput is low. How big an advantage is that?

An alternative dynamic sizing strategy is one based on fairness. Batches are 
dynamically sized in a fair way that more fairly distributes the data of each 
partition across the request.

Messages are added to small sub-batches (say 4kb or 16kb) in the accumulator. 
When draining the sub-batches in the Sender, the logic selects sub-batches by 
creation order repeatedly until constraints are met (like request.size). All 
sub-batches of the same partition are grouped into a single batch for sending. 

This way the user can set a high batch.size (it could even default to the request 
size) as the final batches will be fairly distributed across partitions and we 
should get good utilisation of the request.size (while still respecting linger).

Example with request.size=128kb, 16kb sub-batches, 1 broker and picking 
sub-batches in creation order. None have reached linger in this example for 
simplicity.

Accumulator
p0: [sb1{t1},sb2{t2},sb3{t3},sb4{t4},sb5{t5},sb6{t6},sb7{t7},sb8{t8}]
p1: [sb1{t1}]
p2: [sb1{t3}]
p3: [sb1{t7}]

Request 1
p0: [sb1-sb5]
p1: [sb1]
p2: [sb1]

Request 2 (with no additional data having had arrived)
p0: [sb6-sb8]
p3: [sb1]
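
To make the selection step concrete, here is a minimal sketch of the drain
logic under the above assumptions (SubBatch and drainInCreationOrder are
hypothetical names, not the producer's actual internals):

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of a small (e.g. 16KB) sub-batch in the accumulator.
record SubBatch(String partition, long createdAtMs, int sizeBytes) {}

class FairDrainer {
    // Pick sub-batches oldest-first until the request budget is exhausted;
    // the caller then merges the picked sub-batches of each partition into
    // a single batch, as described above.
    static List<SubBatch> drainInCreationOrder(List<SubBatch> accumulator,
                                               int requestSizeBytes) {
        List<SubBatch> byAge = new ArrayList<>(accumulator);
        byAge.sort(Comparator.comparingLong(SubBatch::createdAtMs));

        List<SubBatch> selected = new ArrayList<>();
        int remaining = requestSizeBytes;
        for (SubBatch sb : byAge) {
            if (sb.sizeBytes() > remaining) {
                break; // request is full (simplification: stop at first misfit)
            }
            selected.add(sb);
            remaining -= sb.sizeBytes();
        }
        return selected;
    }
}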

The downside is a decent refactoring. There would need to be changes to the 
housekeeping of how callbacks are managed, for example. There might be more 
impact that I am not aware of; I'm fairly new to this code.

Thanks
Jack

On 2021/12/13 19:15:37 Jun Rao wrote:
> Hi, Lucas,
> 
> Thanks for the reply. It would be useful to summarize the benefits of a
> separate batch.max.size. To me, it's not clear why a user would want two
> different batch sizes. In your example, I can understand why a user would
> want to form a batch with a 5ms linger. But why would a user prefer 16K
> batches with 5ms linger, if say 256K is deemed best for throughput?
> 
> Thanks,
> 
> Jun
> 
> On Fri, Dec 10, 2021 at 4:35 PM Lucas Bradstreet 
> wrote:
> 
> > Hi Jun,
> >
> > One difference compared to increasing the default batch size is that users
> > may actually prefer smaller batches but it makes much less sense to
> > accumulate many small batches if a batch is already sending.
> >
> > For example, imagine a user that prefer 16K batches with 5ms linger.
> > Everything is functioning normally and 16KB batches are being sent. Then
> > there's a 500ms blip for that broker. Do we want to continue to accumulate
> > 16KB batches, each of which requires a round trip, or would we prefer to
> > accumulate larger batches while sending is blocked?
> >
> > I'm not hugely against increasing the default batch.size in general, but
> > batch.max.size does seem to have some nice properties.
> >
> > Thanks,
> >
> > Lucas
> >
> > On Fri, Dec 10, 2021 at 9:42 AM Jun Rao  wrote:
> >
> > > Hi, Artem, Luke,
> > >
> > > Thanks for the reply.
> > >
> > > 11. If we get rid of batch.max.size and increase the default batch.size,
> > > it's true the behavior is slightly different than before. However, does
> > > that difference matter to most of our users? In your example, if a user
> > > sets linger.ms to 100ms and thinks 256KB is good for throughput, does it
> > > matter to deliver any batch smaller than 256KB before 100ms? I also find
> > it
> > > a bit hard to explain to

Re: [DISCUSS] KIP-830: Allow disabling JMX Reporter

2022-05-02 Thread Mickael Maison
Hi Federico,

Thanks for taking a look.

I think the method you suggest is what I included in the rejected
alternatives section. The reason I rejected it is because it would
force people to change their configuration in order to preserve the
current behavior, hence a breaking change.

Thanks,
Mickael

On Fri, Apr 22, 2022 at 6:26 PM Federico Valeri  wrote:
>
> Hi Mickael, what about setting the default value to JmxReporter while
> also maintaining the old behavior? When a user is setting
> metric.reporters without explicitly including the JmxReporter, we
> could simply print a warning message which says "The automatic
> addition of the JmxReporter is deprecated. Add it explicitly to the
> metric.reporters list if needed.", or something like that. That way we
> can avoid adding a new property, and drop the implicit setting in the
> next major release.
>
> On Wed, Apr 13, 2022 at 11:49 AM Mickael Maison
>  wrote:
> >
> > Hi,
> >
> > I've created a small KIP to allow disabling JmxReporter:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-830%3A+Allow+disabling+JMX+Reporter
> >
> > Let me know if you have any feedback.
> >
> > Thanks,
> > Mickael


Re: [DISCUSS] KIP-830: Allow disabling JMX Reporter

2022-05-02 Thread Ismael Juma
Hi Mickael,

I think Federico is suggesting that we would do this in AK 4.0 and hence
the breaking change (with the appropriate deprecation period) would be
possible. I personally prefer that option too. Is there an urgent need to
do this before 4.0?

Ismael

On Mon, May 2, 2022 at 8:58 AM Mickael Maison 
wrote:

> HI Federico,
>
> Thanks for taking a look.
>
> I think the method you suggest is what I included in the rejected
> alternatives section. The reason I rejected it is because it would
> force people to change their configuration in order to preserve the
> current behavior, hence a breaking change.
>
> Thanks,
> Mickael
>
> On Fri, Apr 22, 2022 at 6:26 PM Federico Valeri 
> wrote:
> >
> > Hi Mickael, what about setting the default value to JmxReporter while
> > also maintaining the old behavior? When a user is setting
> > metric.reporters without explicitly including the JmxReporter, we
> > could simply print a warning message which says "The automatic
> > addition of the JmxReporter is deprecated. Add it explicitly to the
> > metric.reporters list if needed.", or something like that. That way we
> > can avoid adding a new property, and drop the implicit setting in the
> > next major release.
> >
> > On Wed, Apr 13, 2022 at 11:49 AM Mickael Maison
> >  wrote:
> > >
> > > Hi,
> > >
> > > I've created a small KIP to allow disabling JMXreporter:
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-830%3A+Allow+disabling+JMX+Reporter
> > >
> > > Let me know if you have any feedback.
> > >
> > > Thanks,
> > > Mickael
>


Re: [DISCUSS] KIP-827: Expose logdirs total and usable space via Kafka API

2022-05-02 Thread Mickael Maison
Hi Divij,

The new fields default to -1 in the protocol too. So in case a broker
hits an Exception accessing a logdir and sends an error response back,
the client will get -1. For this reason I think UNKNOWN_SPACE is
slightly better than UNSUPPORTED.

Thanks,
Mickael
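
(Background for the sizes being discussed: a minimal sketch of gathering them
via the JDK FileStore API mentioned below, with the Long.MAX_VALUE capping
described in the KIP. Class and method names are illustrative, not the actual
broker code:)

import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;

class LogDirSpace {
    // Query total/usable bytes for a log dir, capping at Long.MAX_VALUE
    // since FileStore can report negative values on very large volumes
    // (see JDK-8162520, linked in the thread below).
    static long[] totalAndUsableBytes(Path logDir) throws IOException {
        FileStore store = Files.getFileStore(logDir);
        return new long[] {
            cap(store.getTotalSpace()),
            cap(store.getUsableSpace())
        };
    }

    private static long cap(long bytes) {
        return bytes < 0 ? Long.MAX_VALUE : bytes;
    }
}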

On Fri, Apr 22, 2022 at 9:35 AM Tom Bentley  wrote:
>
> Hi Mickael,
>
> Thanks for the KIP, I can see this would be useful.
>
> I guess you could have used optional tagged fields, rather than bumping the
> version, but then again I don't see it being particularly advantageous in
> this case either.
>
> Kind regards,
>
> Tom
>
> On Tue, 19 Apr 2022 at 10:23, Divij Vaidya  wrote:
>
> > I have a minor suggestion below but overall KIP looks good to me to start a
> > vote.
> >
> > *Reg#6* Would you consider replacing UNKNOWN_SPACE with UNSUPPORTED?
> > UNSUPPORTED tells the user explicitly that the value is missing due to
> > client/server version mismatch whereas with UNKNOWN_SPACE, the user is left
> > wondering whether it is a problem with underlying storage not providing
> > space information or something else.
> >
> > Divij Vaidya
> >
> >
> >
> > On Fri, Apr 15, 2022 at 3:40 PM Mickael Maison 
> > wrote:
> >
> > > Hi Luke,
> > >
> > > 7. I've updated the KIP to clarify these sizes are in bytes.
> > >
> > > Thanks,
> > > Mickael
> > >
> > > On Fri, Apr 15, 2022 at 12:16 PM Luke Chen  wrote:
> > > >
> > > > Hi Mickael,
> > > >
> > > > Thanks for the KIP!
> > > > This is a good improvement.
> > > >
> > > > (3) +1 for not adding the number of files in the directory. Counting
> > the
> > > > file numbers should be slow.
> > > > (7) Could you make the fields clear in `DescribeLogDirsResponse`, to
> > > > mention the returned number is size in Byte (or not?)
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > > On Fri, Apr 15, 2022 at 5:27 PM Mickael Maison <mickael.mai...@gmail.com>
> > > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Thanks for the feedback.
> > > > >
> > > > > 3. Yes that's right. Also the number of file descriptors is really
> > not
> > > > > a property of log directories. Administrators typically tracked that
> > > > > count per process and for the whole operating system.
> > > > >
> > > > > 5. That's a good point, I've updated the KIP to mention sizes will be
> > > > > capped to Long.MAX_VALUE even if the actual storage is larger.
> > > > >
> > > > > 6. Brokers would never return UNKNOWN_SPACE. When new clients query
> > > > > older brokers via the admin API, the admin client will use
> > > > > UNKNOWN_SPACE to indicate these values weren't provided by brokers.
> > > > >
> > > > > Thanks,
> > > > > Mickael
> > > > >
> > > > > > On Fri, Apr 8, 2022 at 5:00 PM Divij Vaidya 
> > > > > > wrote:
> > > > > >
> > > > > > Thanks for replying. I still have a few lingering
> > questions/comments.
> > > > > >
> > > > > > *Reg#1* Understood. I checked and the underlying system call is
> > > statvfs
> > > > > for
> > > > > > unix systems which should be ok to call here.
> > > > > > *Reg#2* Fair point. I checked again and yes, log.dir always means
> > > local
> > > > > > storage even when tiered storage is enabled.
> > > > > > *Reg#3* The rationale for adding these new (size) fields to the
> > > > > > `DescribeLogDirs` is to allow the administrator to monitor or
> > perhaps
> > > > > take
> > > > > > automated action based on results. Doesn't monitoring the number of
> > > file
> > > > > > descriptors fall in the same category of use cases? I am assuming
> > > that we
> > > > > > want to add the size information in the API response because JVM
> > > makes it
> > > > > > possible to get this information in a platform agnostic manner
> > which
> > > is
> > > > > not
> > > > > > true for open file descriptors, correct?
> > > > > > *Reg#4* Agree.
> > > > > > *New#5*: As an FYI, Java FileStore API breaks on large storage
> > sizes.
> > > > > See:
> > > > > > https://bugs.openjdk.java.net/browse/JDK-8162520. ElasticSearch
> > has
> > > been
> > > > > > hit by these limitations in the past. For JDK 11, you will probably
> > > have
> > > > > to
> > > > > > add defensive checks such as
> > > > > >
> > > > >
> > >
> > https://github.com/opensearch-project/OpenSearch/blob/b74d71fb747cc2873d4c2ffae825944da4d06e1b/server/src/main/java/org/opensearch/monitor/fs/FsProbe.java#L148
> > > > > .
> > > > > > The documentation of the API mentioned in KIP will also be modified
> > > to
> > > > > > account for this edge case.
> > > > > > *New#6*: Can you please provide an example where the return for
> > these
> > > > > APIs
> > > > > > would be UNKNOWN_SPACE? Doesn't JVM guarantee that this API will
> > > > > definitely
> > > > > > return results (else it throws an IOException)? I would propose
> > that
> > > we
> > > > > get
> > > > > > rid of default since JVM guarantees that this would work on all
> > > > > platforms.
> > > > > > If it doesn't then it's a bug and should be uncovered via an
> > > exception.
> > > > > >
> > > > > > Also, I wou

Re: [DISCUSS] KIP-660: Pluggable ReplicaPlacer

2022-05-02 Thread Mickael Maison
Hi,

If there are no further comments, I'll start a vote in the next few days.

Thanks,
Mickael
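
(For readers new to the thread, a rough sketch of the pluggable interface
shape under discussion. The place() method and ReplicaPlacementException are
named in the messages below; the parameter list here is an illustrative
guess, not the KIP's actual signature:)

import java.util.List;

// Hypothetical shape of the plugin configured via replica.placer.class.name.
interface ReplicaPlacer {
    // Returns, for each new partition, the ordered list of broker ids that
    // should host its replicas. Argument sanity checks (e.g. replication
    // factor vs. registered brokers) happen in the controller before this
    // is called, per the discussion below.
    List<List<Integer>> place(int partitionCount,
                              short replicationFactor,
                              List<Integer> registeredBrokerIds)
            throws ReplicaPlacementException;
}

class ReplicaPlacementException extends Exception {
    ReplicaPlacementException(String message) {
        super(message);
    }
}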

On Wed, Mar 30, 2022 at 3:51 AM Luke Chen  wrote:
>
> Hi Mickael,
>
> Thanks for the update.
> It answered my questions!
>
> Thank you.
> Luke
>
> On Wed, Mar 30, 2022 at 12:09 AM Mickael Maison 
> wrote:
>
> > Hi Luke,
> >
> > Thanks for the feedback.
> >
> > 1. Thanks, fixed!
> > 2. Yes that's right. It's the same behavior for topic policies
> > 3. I've added details about how the mentioned scenarios could be
> > addressed. The information required to make such decisions is not part
> > of the Kafka cluster metadata so an external input is necessary. This
> > KIP does not propose a specific mechanism for doing it.
> >
> > I hope this answers your questions.
> >
> > Thanks,
> > Mickael
> >
> >
> > On Tue, Mar 29, 2022 at 5:42 PM Mickael Maison 
> > wrote:
> > >
> > > Hi Ryanne,
> > >
> > > That's a good point!
> > >
> > > There's no value in having all implementations perform the same sanity
> > > checks. If the replication factor is < 1 or larger than the current
> > > number of registered brokers, the controller should directly throw
> > > InvalidReplicationFactorException and not call the ReplicaPlacer. I've
> > > updated the KIP so the place() method now only throws
> > > ReplicaPlacementException.
> > >
> > > Thanks,
> > > Mickael
> > >
> > >
> > >
> > > On Mon, Mar 28, 2022 at 6:20 PM Ryanne Dolan 
> > wrote:
> > > >
> > > > Wondering about InvalidReplicationFactorException. Why would an
> > > > implementation throw this? Given the information passed to the method,
> > > > seems like this could only be thrown if there were obviously invalid
> > > > arguments, like a negative number or zero. Can we just guarantee such
> > > > invalid arguments aren't passed in?
> > > >
> > > > Ryanne
> > > >
> > > > On Sat, Mar 26, 2022, 8:51 AM Luke Chen  wrote:
> > > >
> > > > > Hi Mickael,
> > > > >
> > > > > Thanks for the KIP!
> > > > > It's indeed a pain point for the Kafka admins.
> > > > >
> > > > > I have some comments:
> > > > > 1. Typo in motivation section: When administrators [when to] remove
> > brokers
> > > > > from a cluster,
> > > > > 2. If different `replica.placer.class.name` configs are set in all
> > > > > controllers, I think only the config for  "active controller" will
> > take
> > > > > effect, right?
> > > > > 3. Could you explain more about how the proposal fixes some
> > scenarios you
> > > > > listed, ex: the new added broker case. How could we know the broker
> > is new
> > > > > added? I guess it's by checking the broker load via some metrics
> > > > > dynamically, right?
> > > > >
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > On Fri, Mar 18, 2022 at 10:30 AM Ryanne Dolan 
> > > > > wrote:
> > > > >
> > > > > > Thanks Mickael, this makes sense to me! I've been wanting
> > something like
> > > > > > this in order to decommission a broker without new partitions
> > getting
> > > > > > accidentally assigned to it.
> > > > > >
> > > > > > Ryanne
> > > > > >
> > > > > > On Thu, Mar 17, 2022, 5:56 AM Mickael Maison <mickael.mai...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I'd like to start a new discussion on KIP-660. I originally
> > wrote this
> > > > > > > KIP in 2020 and the initial discussion
> > > > > > > (
> > https://lists.apache.org/thread/xn7xyb74nyt281brto4x28r9rzxm4lp9)
> > > > > > > raised some concerns especially around KRaft (which did not
> > exist at
> > > > > > > that time) and scalability.
> > > > > > >
> > > > > > > Since then, we got a new KRaft controller so I've been able to
> > revisit
> > > > > > > this KIP. I kept the KIP number as it's essentially the same
> > idea, but
> > > > > > > the proposal is significantly different:
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-660%3A+Pluggable+ReplicaPlacer
> > > > > > >
> > > > > > > Please take a look and let me know if you have any feedback.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Mickael
> > > > > > >
> > > > > >
> > > > >
> >


Re: [DISCUSS] KIP-830: Allow disabling JMX Reporter

2022-05-02 Thread Mickael Maison
Hi Ismael,

Thanks for the feedback.
Kafka 4.0 still feels a bit far away. This is not really urgent so I
guess we could wait and save us adding a new configuration setting.

Thanks,
Mickael

On Mon, May 2, 2022 at 6:12 PM Ismael Juma  wrote:
>
> Hi Mickael,
>
> I think Federico is suggesting that we would do this in AK 4.0 and hence
> the breaking change (with the appropriate deprecation period) would be
> possible. I personally prefer that option too. Is there an urgent need to
> do this before 4.0?
>
> Ismael
>
> On Mon, May 2, 2022 at 8:58 AM Mickael Maison 
> wrote:
>
> > HI Federico,
> >
> > Thanks for taking a look.
> >
> > I think the method you suggest is what I included in the rejected
> > alternatives section. The reason I rejected it is because it would
> > force people to change their configuration in order to preserve the
> > current behavior, hence a breaking change.
> >
> > Thanks,
> > Mickael
> >
> > On Fri, Apr 22, 2022 at 6:26 PM Federico Valeri 
> > wrote:
> > >
> > > Hi Mickael, what about setting the default value to JmxReporter while
> > > also maintaining the old behavior? When a user is setting
> > > metric.reporters without explicitly including the JmxReporter, we
> > > could simply print a warning message which says "The automatic
> > > addition of the JmxReporter is deprecated. Add it explicitly to the
> > > metric.reporters list if needed.", or something like that. That way we
> > > can avoid adding a new property, and drop the implicit setting in the
> > > next major release.
> > >
> > > On Wed, Apr 13, 2022 at 11:49 AM Mickael Maison
> > >  wrote:
> > > >
> > > > Hi,
> > > >
> > > > I've created a small KIP to allow disabling JMXreporter:
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-830%3A+Allow+disabling+JMX+Reporter
> > > >
> > > > Let me know if you have any feedback.
> > > >
> > > > Thanks,
> > > > Mickael
> >


[jira] [Created] (KAFKA-13866) Support more advanced time retention policies

2022-05-02 Thread Matthias J. Sax (Jira)
Matthias J. Sax created KAFKA-13866:
---

 Summary: Support more advanced time retention policies
 Key: KAFKA-13866
 URL: https://issues.apache.org/jira/browse/KAFKA-13866
 Project: Kafka
  Issue Type: Improvement
  Components: config, core, log cleaner
Reporter: Matthias J. Sax


The time-based retention policy compares the record timestamp to broker wall-clock 
time. Those semantics are questionable and also lead to issues for data 
reprocessing: if one wants to re-process data older than the retention time, it's 
not possible, as brokers expire those records aggressively and users need to 
increase the retention time accordingly.

Especially for Kafka Streams, we have seen many cases where users got bit by the 
current behavior.

It would be best if Kafka tracked _two_ timestamps per record: the record 
event-time (as brokers do currently), plus the log append-time (which is 
currently only tracked if the topic is configured with "append-time" tracking, 
but the issue is that this overwrites the producer-provided record event-time).

Tracking both timestamps would allow setting a pure wall-clock time retention 
policy plus a pure event-time retention policy:
 * Wall-clock time: keep data (at least) X days after writing
 * Event-time: keep (at max) X days worth of event-time data

Comparing wall-clock time to wall-clock time and event-time to event-time 
provides much cleaner semantics. The idea is to combine both policies and only 
expire data if both policies trigger.

For the event-time policy, the broker would need to track "stream time" as the 
max event-timestamp it has seen per partition (similar to how Kafka Streams 
tracks "stream time" client side).

Note the difference between "at least" and "at max" above: for the 
data-reprocessing case, the max-based event-time policy avoids the broker 
keeping a huge history.

The details of how wall-clock/event-time and min/max policies could be combined 
would be part of a KIP discussion. For example, it might also be useful to 
have the following policy: keep at least X days worth of event-time history no 
matter how long the data is already stored (i.e., there would only be an 
event-time based expiration but no wall-clock time one). It could also be 
combined with a wall-clock time expiration: delete data only after it's at 
least X days old and stored for at least Y days.
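
A minimal sketch of the combined expiration check described above (all names
and the policy wiring are hypothetical):

// Expire a segment only if both the wall-clock and the event-time policy
// have triggered, per the proposal above.
class CombinedRetention {
    static boolean shouldExpireSegment(long segmentAppendTimeMs,
                                       long segmentMaxEventTimeMs,
                                       long partitionStreamTimeMs, // max event time seen
                                       long nowMs,
                                       long wallClockRetentionMs,
                                       long eventTimeRetentionMs) {
        // Wall-clock policy: keep data (at least) X days after writing.
        boolean wallClockExpired =
            nowMs - segmentAppendTimeMs > wallClockRetentionMs;
        // Event-time policy: keep (at max) X days worth of event time,
        // measured against the partition's "stream time".
        boolean eventTimeExpired =
            partitionStreamTimeMs - segmentMaxEventTimeMs > eventTimeRetentionMs;
        return wallClockExpired && eventTimeExpired;
    }
}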



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #901

2022-05-02 Thread Apache Jenkins Server
See 




[DISCUSS] KIP-832 Allow creating a producer/consumer using a producer/consumer config

2022-05-02 Thread François Rosière
KIP link:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211882578


Re: [DISCUSS] KIP-830: Allow disabling JMX Reporter

2022-05-02 Thread Ismael Juma
The other option would be to add a deprecated config that then goes away in
AK 4.0. Something like "auto.include.jmx.reporter" and set it to `true` by
default. In 4.0, the default would effectively switch to `false` and the
config would be removed. Anyone with a config that would _change_ after
this config goes away would get a warning. For people who want to remove
the jmx reporter, they would set this to `false`.

Ismael
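
(To make the proposed transition concrete, a sketch using the config name
floated above; "auto.include.jmx.reporter" is a proposal, not a released
setting:)

import java.util.Properties;

public class JmxReporterOptOut {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Proposed: defaults to "true" until AK 4.0, so JmxReporter keeps
        // being added implicitly; set "false" to opt out right away.
        props.put("auto.include.jmx.reporter", "false");
        // The reporters listed here would then be the only ones loaded.
        props.put("metric.reporters", "com.example.MyMetricsReporter"); // hypothetical class
        System.out.println(props);
    }
}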

On Mon, May 2, 2022 at 9:35 AM Mickael Maison 
wrote:

> Hi Ismael,
>
> Thanks for the feedback.
> Kafka 4.0 still feels a bit far away. This is not really urgent so I
> guess we could wait and save us adding a new configuration setting.
>
> Thanks,
> Mickael
>
> On Mon, May 2, 2022 at 6:12 PM Ismael Juma  wrote:
> >
> > Hi Mickael,
> >
> > I think Federico is suggesting that we would do this in AK 4.0 and hence
> > the breaking change (with the appropriate deprecation period) would be
> > possible. I personally prefer that option too. Is there an urgent need to
> > do this before 4.0?
> >
> > Ismael
> >
> > On Mon, May 2, 2022 at 8:58 AM Mickael Maison 
> > wrote:
> >
> > > HI Federico,
> > >
> > > Thanks for taking a look.
> > >
> > > I think the method you suggest is what I included in the rejected
> > > alternatives section. The reason I rejected it is because it would
> > > force people to change their configuration in order to preserve the
> > > current behavior, hence a breaking change.
> > >
> > > Thanks,
> > > Mickael
> > >
> > > On Fri, Apr 22, 2022 at 6:26 PM Federico Valeri 
> > > wrote:
> > > >
> > > > Hi Mickael, what about setting the default value to JmxReporter while
> > > > also maintaining the old behavior? When a user is setting
> > > > metric.reporters without explicitly including the JmxReporter, we
> > > > could simply print a warning message which says "The automatic
> > > > addition of the JmxReporter is deprecated. Add it explicitly to the
> > > > metric.reporters list if needed.", or something like that. That way
> we
> > > > can avoid adding a new property, and drop the implicit setting in the
> > > > next major release.
> > > >
> > > > On Wed, Apr 13, 2022 at 11:49 AM Mickael Maison
> > > >  wrote:
> > > > >
> > > > > Hi,
> > > > >
> > > > > I've created a small KIP to allow disabling JMXreporter:
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-830%3A+Allow+disabling+JMX+Reporter
> > > > >
> > > > > Let me know if you have any feedback.
> > > > >
> > > > > Thanks,
> > > > > Mickael
> > >
>


[jira] [Created] (KAFKA-13867) Improve JavaDoc for MetadataVersion.java

2022-05-02 Thread Colin McCabe (Jira)
Colin McCabe created KAFKA-13867:


 Summary: Improve JavaDoc for MetadataVersion.java
 Key: KAFKA-13867
 URL: https://issues.apache.org/jira/browse/KAFKA-13867
 Project: Kafka
  Issue Type: Improvement
Reporter: Colin McCabe






--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #902

2022-05-02 Thread Apache Jenkins Server
See 




Re: Possible Bug: kafka-reassign-partitions causing the data retention time to be reset

2022-05-02 Thread Luke Chen
Hi Fares,

> So I guess the process applying the retention policy (log cleaner?) uses
that timestamp to check whether the segment file should be deleted or not.
So I ended up with a lot more data than we were supposed to store, since we
are basically doubling the retention time of all the freshly moved data.

No, Kafka uses the largest timestamp each segment has to determine if it
exceeds the retention period. But if the log is corrupted or for some
reason it can't find the largest timestamp, it'll use the file modified date.
(You can check here.)

If you've confirmed the log retention doesn't work as expected after log dir
movement, you can create a JIRA ticket for this issue. That must be a bug.
If possible, please upload the broker logs and the log files with log
records (to help identify whether there's log corruption) to the JIRA.

Thank you.
Luke
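
(In code terms, the rule above is roughly the following; an illustrative
sketch, not the broker's actual implementation:)

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Retention uses the segment's largest record timestamp and only falls
// back to the file's modified time when no valid timestamp is available.
class SegmentRetentionCheck {
    static boolean shouldDelete(long largestRecordTimestampMs,
                                Path segmentFile,
                                long nowMs,
                                long retentionMs) throws IOException {
        long effectiveTs = largestRecordTimestampMs > 0
                ? largestRecordTimestampMs
                : Files.getLastModifiedTime(segmentFile).toMillis();
        return nowMs - effectiveTs > retentionMs;
    }
}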

On Mon, May 2, 2022 at 10:22 PM Fares Oueslati 
wrote:

> Hello Lqjacklee,
>
> Is there any news on this please, especially regarding my last message?
>
> Do you think it is possible to modify the segment files manually with touch
> -a -m -t 203801181205.09 my_sement_file? I could keep the original
> modification date of the files before the move and update them afterwards.
>
> Thanks a lot!
> Fares
>
> On Fri, Apr 29, 2022 at 3:24 AM Fares Oueslati 
> wrote:
>
> > Thanks for your help!
> >
> > I'm not sure how that would help me though. I'm not actually trying to
> > decommission a Kafka broker.
> > I would like to move all the data from one disk (log.dir) to another
> > within the same broker while keeping the original modification time of
> the
> > moved segment files.
> > After that I would like to delete the disk, not the broker.
> >
> > Kind Regards,
> > Fares
> >
> > On Thu, Apr 28, 2022 at 7:05 PM lqjacklee  wrote:
> >
> >> The resource (https://mike.seid.io/blog/decommissiong-a-kafka-node.html
> )
> >> may help you.
> >> I have created (https://issues.apache.org/jira/browse/KAFKA-13860) to
> >> replay the case .
> >>
> >> On Thu, Apr 28, 2022 at 10:33 PM Fares Oueslati <
> oueslati.fa...@gmail.com>
> >> wrote:
> >>
> >>> Hello,
> >>>
> >>> I'm not sure how to report this properly but I didn't get any answer in
> >>> the
> >>> user mailing list.
> >>>
> >>> In order to remove a disk in a JBOD setup, I moved all data from one
> disk
> >>> to another on every Kafka broker using kafka-reassign-partitions, then
> I
> >>> went through some weird behaviour.
> >>> Basically, the disk storage kept increasing even though there is no
> >>> change
> >>> on bytes in metric per broker.
> >>> After investigation, I’ve seen that all segment log files in the new
> >>> log.dir had a modification date set to the moment when the move had
> been
> >>> done.
> >>> So I guess the process applying the retention policy (log cleaner?)
> uses
> >>> that timestamp to check whether the segment file should be deleted or
> >>> not.
> >>> So I ended up with a lot more data than we were supposed to store,
> since
> >>> we
> >>> are basically doubling the retention time of all the freshly moved
> data.
> >>>
> >>> This seems to me to be a buggy behavior of the command, is it possible
> to
> >>> create a JIRA to track and eventually fix this?
> >>> The only option I see to fix it is to keep the modification date before
> >>> moving the data and applying it manually afterwards for every segment
> >>> file, touching those files manually doesn't seem very safe imho.
> >>>
> >>> Thanks
> >>> Fares Oueslati
> >>>
> >>
>