Re: [DISCUSS] Single broker failures causing offline partitions

2024-09-11 Thread Haruki Okada
Hi Kamal,

> Is the leader election automated to find the replica with the highest
offset and latest epoch?
> And, finding the eligible replica manually, will increase the outage
mitigation time

That's right, but this procedure is not yet automated in our operation,
because we found that "choosing the replica with the highest offset and latest
epoch" is still NOT safe and may lose committed messages in rare edge cases
where a preferred leader switch happens around the same time (which might be
the case in our deployment, because we enable CruiseControl).

We used formal methods to prove this.
-
https://speakerdeck.com/line_developers/the-application-of-formal-methods-in-kafka-reliability-engineering
- https://github.com/ocadaruma/kafka-spec

So I believe we need KIP-966.
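
For reference, below is a minimal, hypothetical Java sketch of the selection
rule described in the runbook quoted below (the ReplicaState record and its
fields are illustrative; in practice the values come from DumpLogSegments
output and the leader-epoch checkpoint). Note that, as mentioned above, even
this epoch check does not cover every edge case, which is why KIP-966 is
needed.

import java.util.OptionalInt;

// Hypothetical helper, not part of Kafka: encode the runbook's rule for
// picking the "eligible" replica among the two survivors of an offline
// partition.
public class EligibleReplicaRule {
    public record ReplicaState(int brokerId, int lastLeaderEpoch, long logEndOffset) {}

    public static OptionalInt eligibleReplica(ReplicaState a, ReplicaState b) {
        // Offsets are only comparable when both replicas ended on the same
        // leader epoch; otherwise "higher offset" may not mean "has all
        // committed messages".
        if (a.lastLeaderEpoch() != b.lastLeaderEpoch()) {
            return OptionalInt.empty(); // needs manual inspection
        }
        int winner = a.logEndOffset() >= b.logEndOffset() ? a.brokerId() : b.brokerId();
        return OptionalInt.of(winner);
    }
}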

Sep 11, 2024 (Wed) 15:55 Kamal Chandraprakash :

> Hi Haruki,
>
> We are also interested in this issue.
>
> > The problem is how to identify such "eligible" replicas.
>
> Is the leader election automated to find the replica with the highest
> offset and latest epoch?
> If yes, could you please open a PR for it?
>
> When a broker goes down, it might be serving leadership for 1000s of
> partitions.
> And, finding the eligible replica manually, will increase the outage
> mitigation time
> as the producers/consumers are blocked when there are offline partitions.
>
> --
> Kamal
>
>
> On Wed, Sep 11, 2024 at 3:57 AM Haruki Okada  wrote:
>
> > Hi Martin,
> >
> > Thank you for bringing up this issue.
> >
> > We have suffered from this "single-broker failure causing unavailable partition"
> > issue due to disk failures for years too! We use HDDs, and HDDs easily develop
> > high disk latency (tens of seconds or more) on a disk glitch, which often blocks
> > request-handler threads, making the broker unable to handle fetch requests and
> > kicking followers out of the ISR.
> >
> > I believe solving the issue fundamentally is impossible unless we stop
> > relying on an external quorum (either KRaft or ZK) for failure
> > detection/leader election and move to quorum-based data replication, which
> > is not currently planned in Kafka.
> >
> > Let me share some of our experiences on how to address this problem.
> >
> > ## Proactive disk replacement / broker removal
> > This is kind of dumb solution but we monitor disk health (e.g. Physical
> > disk error counts under RAID) and replace disks or remove brokers
> > proactively before it gets worse.
> >
> > ## Mitigate the impact of disk failures on broker functionality
> > In the first place, Kafka relies heavily on the page cache, so disk latency
> > is not expected to impact the broker this much.
> > We found some call paths that amplify the impact of disk latency, and we
> > fixed them.
> >
> > - https://github.com/apache/kafka/pull/14289
> >   * This is a heuristic to mitigate KAFKA-7504, the network-thread blocking
> >     issue on catch-up reads, which may impact many clients (including
> >     followers).
> >   * Not merged upstream yet, but we have run this patch in production for
> >     years.
> > - https://github.com/apache/kafka/pull/14242
> >   * This is a patch to stop calling fsync under Log#lock; otherwise all
> >     request-handler threads can easily be exhausted due to lock contention
> >     while one thread is executing fsync (disk latency directly impacts
> >     fsync latency).
> >
> > ## Prepare an offline-partition-handling runbook as the last resort
> > Even with the above efforts, unavailable partitions may still occur, so we
> > prepared a (manual) runbook for such situations.
> > Essentially, this is a procedure to do KIP-966 manually.
> >
> > We use acks=all and min.insync.replicas=2 on all partitions, which means
> > there should be one "eligible" (i.e. having all committed messages) replica
> > even after a partition goes offline.
> > The problem is how to identify such "eligible" replicas.
> >
> > If we can still log in to the last leader, we can just check whether the
> > log suffix matches (using the DumpLogSegments tool...).
> > What if the last leader fails completely and we cannot log in?
> > In this case, we check the remaining two replicas' log segments and pick
> > the one with the longer log as the "eligible" replica, as long as they
> > have the same leader epoch.
> > (NOTE: Checking the leader epoch is necessary because, if the leader was
> > changing around the time of the incident, the "replica with the higher
> > offset" and the "replica with all committed messages" may not match.)
> >
> >
> > Hope these help you.
> >

Re: [DISCUSS] KIP-1066: Mechanism to cordon brokers and log directories

2024-07-04 Thread Haruki Okada
Hi,

Thank you for the KIP.
The motivation makes sense to me.

I have a few questions:

- [nits] "AlterPartitions request" in Error handling section is
"AlterPartitionReassignments request" actually, right?
- Don't we need to include cordoned information in DescribeLogDirs response
too? Some tools (e.g. CruiseControl) need to have a way to know which
broker/log-dirs are cordoned to generate partition reassignment proposal.

Thanks,

Jul 4, 2024 (Thu) 22:57 Mickael Maison :

> Hi,
>
> I'd like to start a discussion on KIP-1066 that introduces a mechanism
> to cordon log directories and brokers.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1066%3A+Mechanism+to+cordon+brokers+and+log+directories
>
> Thanks,
> Mickael
>


-- 

Okada Haruki
ocadar...@gmail.com



[jira] [Created] (KAFKA-17061) KafkaController takes long time to connect to newly added broker after registration on large cluster

2024-07-02 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-17061:


 Summary: KafkaController takes long time to connect to newly added 
broker after registration on large cluster
 Key: KAFKA-17061
 URL: https://issues.apache.org/jira/browse/KAFKA-17061
 Project: Kafka
  Issue Type: Improvement
Reporter: Haruki Okada
 Attachments: image-2024-07-02-17-22-06-100.png, 
image-2024-07-02-17-24-11-861.png

h2. Environment
 * Kafka version: 3.3.2
 * Cluster: 200~ brokers
 * Total num partitions: 40k
 * ZK-based cluster

h2. Phenomenon

When a broker left the cluster due to a long GC stop-the-world (STW) pause and came 
back after a while, the controller took 6 seconds to connect to the broker after its 
znode registration, which caused significant message delivery delay.
{code:java}
[2024-06-22 23:59:38,202] INFO [Controller id=1] Newly added brokers: 2, 
deleted brokers: , bounced brokers: , all live brokers: 1,... 
(kafka.controller.KafkaController)
[2024-06-22 23:59:38,203] DEBUG [Channel manager on controller 1]: Controller 1 
trying to connect to broker 2 (kafka.controller.ControllerChannelManager)
[2024-06-22 23:59:38,205] INFO [RequestSendThread controllerId=1] Starting 
(kafka.controller.RequestSendThread)
[2024-06-22 23:59:38,205] INFO [Controller id=1] New broker startup callback 
for 2 (kafka.controller.KafkaController)
[2024-06-22 23:59:44,524] INFO [RequestSendThread controllerId=1] Controller 1 
connected to broker-2:9092 (id: 2 rack: rack-2) for sending state change 
requests (kafka.controller.RequestSendThread)
{code}
h2. Analysis

From the flamegraph at that time, we can see that the 
[liveBrokerIds|https://github.com/apache/kafka/blob/3.3.2/core/src/main/scala/kafka/controller/ControllerContext.scala#L217]
 calculation takes significant time.

!image-2024-07-02-17-24-11-861.png|width=541,height=303!

Since no concurrent modification of liveBrokerEpochs is expected, we can simply 
cache the result to improve the performance.
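
As a rough illustration of the idea (the actual code is Scala in ControllerContext; 
the class and field names below are illustrative, not the real ones), the derived 
set can be cached and invalidated whenever the underlying map changes:

{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class LiveBrokerCache {
    private final Map<Integer, Long> liveBrokerEpochs = new HashMap<>();
    private Set<Integer> cachedLiveBrokerIds; // recomputed lazily

    Set<Integer> liveBrokerIds() {
        if (cachedLiveBrokerIds == null) {
            // Rebuild only after a mutation instead of on every call,
            // which keeps the controller's hot path cheap.
            cachedLiveBrokerIds = new HashSet<>(liveBrokerEpochs.keySet());
        }
        return cachedLiveBrokerIds;
    }

    void addLiveBroker(int brokerId, long epoch) {
        liveBrokerEpochs.put(brokerId, epoch);
        cachedLiveBrokerIds = null; // invalidate cache
    }

    void removeLiveBroker(int brokerId) {
        liveBrokerEpochs.remove(brokerId);
        cachedLiveBrokerIds = null; // invalidate cache
    }
}
{code}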





Re: Build hanging

2024-06-07 Thread Haruki Okada
Oh, thank you for identifying the root cause quickly, Apoorv.

I changed the assignee of KAFKA-16916 to you.


Thanks,

Jun 8, 2024 (Sat) 10:13 Apoorv Mittal :

> Hi,
> Please find the fix for the issue:
> https://github.com/apache/kafka/pull/16249
>
> Regards,
> Apoorv Mittal
> +44 7721681581
>
>
> On Sat, Jun 8, 2024 at 2:09 AM Sophie Blee-Goldman 
> wrote:
>
> > Thanks for jumping on this guys, and nice find Haruki
> >
> > I agree with Luke that we should disable the offending tests ASAP to
> > unblock other things, and file a 3.8 blocker to investigate this further.
> > Thanks for the PR Haruki and thanks for filing the ticket Luke -- I
> marked
> > it as a blocker for 3.8.0 so this doesn't slip.
> >
> > By the way -- someone else was looking into this on the side and
> mentioned
> > another test they were suspicious of as well. It's
> >
> >
> SaslClientsWithInvalidCredentialsTest.testKafkaAdminClientWithAuthenticationFailure
> >
> > Does one of you have time to look into this test as well? Don't want to
> > have to wait another 8 hours for your PR to hit the timeout and abort
> > again. If not, I can spend a minute looking at this other test a bit
> later
> > tonight
> >
> > On Fri, Jun 7, 2024 at 6:05 PM Haruki Okada  wrote:
> >
> > > Opened the PR (https://github.com/apache/kafka/pull/16248)
> > > Let's see if CI runs properly.
> > >
> > > Thanks,
> > >
> > > 2024年6月8日(土) 10:01 Luke Chen :
> > >
> > > > > Let's disable for now to unblock builds, and revert later if we
> can't
> > > > solve
> > > > it until code freeze?
> > > >
> > > > That's exactly what I meant to do.
> > > > I've opened KAFKA-16916 <
> > > https://issues.apache.org/jira/browse/KAFKA-16916
> > > > >
> > > > for this issue and assigned to you.
> > > > Welcome to unassign yourselves if you don't have time to fix the
> > > > adminClient behavior change issue.
> > > > But, let's disable it first.
> > > >
> > > > Thanks.
> > > > Luke
> > > >
> > > >
> > > > On Sat, Jun 8, 2024 at 8:55 AM Haruki Okada 
> > wrote:
> > > >
> > > > > Hi Luke,
> > > > >
> > > > > I see, but since this is likely due to AdminClient's behavior
> change,
> > > we
> > > > > need to fix it anyways not only disabling test before 3.8 release.
> > > > > Let's disable for now to unblock builds, and revert later if we
> can't
> > > > solve
> > > > > it until code freeze?
> > > > >
> > > > > 2024年6月8日(土) 9:31 Luke Chen :
> > > > >
> > > > > > Hi Haruki,
> > > > > >
> > > > > > Thanks for identifying this blocking test.
> > > > > > Could you help quickly open a PR to disable this test to unblock
> > the
> > > CI
> > > > > > build?
> > > > > >
> > > > > > Thanks.
> > > > > > Luke
> > > > > >
> > > > > > On Sat, Jun 8, 2024 at 8:20 AM Haruki Okada  >
> > > > wrote:
> > > > > >
> > > > > > > Hi
> > > > > > >
> > > > > > > I found that the hanging can be reproduced locally.
> > > > > > > The blocking test is
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> "org.apache.kafka.common.security.authenticator.ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials".
> > > > > > > It started to block after this commit (
> > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/commit/c01279b92acefd9135089588319910bac79bfd4c
> > > > > > > )
> > > > > > >
> > > > > > > Thanks,
> > > > > > >
> > > > > > > 2024年6月8日(土) 8:30 Sophie Blee-Goldman :
> > > > > > >
> > > > > > > > Seems like the build is currently broken -- specifically, a test is
> > > > > > > > hanging

Re: Build hanging

2024-06-07 Thread Haruki Okada
Opened the PR (https://github.com/apache/kafka/pull/16248)
Let's see if CI runs properly.

Thanks,

Jun 8, 2024 (Sat) 10:01 Luke Chen :

> > Let's disable for now to unblock builds, and revert later if we can't
> solve
> it until code freeze?
>
> That's exactly what I meant to do.
> I've opened KAFKA-16916 <https://issues.apache.org/jira/browse/KAFKA-16916
> >
> for this issue and assigned to you.
> Welcome to unassign yourselves if you don't have time to fix the
> adminClient behavior change issue.
> But, let's disable it first.
>
> Thanks.
> Luke
>
>
> On Sat, Jun 8, 2024 at 8:55 AM Haruki Okada  wrote:
>
> > Hi Luke,
> >
> > I see, but since this is likely due to AdminClient's behavior change, we
> > need to fix it anyways not only disabling test before 3.8 release.
> > Let's disable for now to unblock builds, and revert later if we can't
> solve
> > it until code freeze?
> >
> > 2024年6月8日(土) 9:31 Luke Chen :
> >
> > > Hi Haruki,
> > >
> > > Thanks for identifying this blocking test.
> > > Could you help quickly open a PR to disable this test to unblock the CI
> > > build?
> > >
> > > Thanks.
> > > Luke
> > >
> > > On Sat, Jun 8, 2024 at 8:20 AM Haruki Okada 
> wrote:
> > >
> > > > Hi
> > > >
> > > > I found that the hanging can be reproduced locally.
> > > > The blocking test is
> > > >
> > > >
> > >
> >
> "org.apache.kafka.common.security.authenticator.ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials".
> > > > It started to block after this commit (
> > > >
> > > >
> > >
> >
> https://github.com/apache/kafka/commit/c01279b92acefd9135089588319910bac79bfd4c
> > > > )
> > > >
> > > > Thanks,
> > > >
> > > > 2024年6月8日(土) 8:30 Sophie Blee-Goldman :
> > > >
> > > > > Seems like the build is currently broken -- specifically, a test is
> > > > hanging
> > > > > and causing it to abort after 7+ hours. There are many examples in
> > the
> > > > > current PRs, such as
> > > > >
> > > > > Timed out after almost 8 hours:
> > > > > 1.
> > > > >
> > > > >
> > > >
> > >
> >
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16238/1/pipeline/
> > > > > 2.
> > > > >
> > > > >
> > > >
> > >
> >
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16201/15/pipeline
> > > > >
> > > > > Still running after 6+ hours:
> > > > > 1.
> > > > >
> > > > >
> > > >
> > >
> >
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16236/3/pipeline/
> > > > >
> > > > > It's pretty difficult to tell which test is hanging but it seems
> like
> > > one
> > > > > of the commits in the last 1-2 days is the likely culprit. If
> anyone
> > > has
> > > > an
> > > > > idea of what may have caused this or is actively investigating,
> > please
> > > > let
> > > > > everyone know.
> > > > >
> > > > > Needless to say, this is rather urgent given the upcoming 3.8 code
> > > > freeze.
> > > > >
> > > > > Thanks,
> > > > > Sophie
> > > > >
> > > >
> > > >
> > > > --
> > > > 
> > > > Okada Haruki
> > > > ocadar...@gmail.com
> > > > 
> > > >
> > >
> >
> >
> > --
> > 
> > Okada Haruki
> > ocadar...@gmail.com
> > 
> >
>


-- 

Okada Haruki
ocadar...@gmail.com



Re: Build hanging

2024-06-07 Thread Haruki Okada
Hi Luke,

I see, but since this is likely due to an AdminClient behavior change, we
need to actually fix it anyway, not just disable the test, before the 3.8 release.
Let's disable it for now to unblock builds, and revert later if we can't solve
it before the code freeze?

Jun 8, 2024 (Sat) 9:31 Luke Chen :

> Hi Haruki,
>
> Thanks for identifying this blocking test.
> Could you help quickly open a PR to disable this test to unblock the CI
> build?
>
> Thanks.
> Luke
>
> On Sat, Jun 8, 2024 at 8:20 AM Haruki Okada  wrote:
>
> > Hi
> >
> > I found that the hanging can be reproduced locally.
> > The blocking test is
> >
> >
> "org.apache.kafka.common.security.authenticator.ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials".
> > It started to block after this commit (
> >
> >
> https://github.com/apache/kafka/commit/c01279b92acefd9135089588319910bac79bfd4c
> > )
> >
> > Thanks,
> >
> > 2024年6月8日(土) 8:30 Sophie Blee-Goldman :
> >
> > > Seems like the build is currently broken -- specifically, a test is
> > hanging
> > > and causing it to abort after 7+ hours. There are many examples in the
> > > current PRs, such as
> > >
> > > Timed out after almost 8 hours:
> > > 1.
> > >
> > >
> >
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16238/1/pipeline/
> > > 2.
> > >
> > >
> >
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16201/15/pipeline
> > >
> > > Still running after 6+ hours:
> > > 1.
> > >
> > >
> >
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16236/3/pipeline/
> > >
> > > It's pretty difficult to tell which test is hanging but it seems like
> one
> > > of the commits in the last 1-2 days is the likely culprit. If anyone
> has
> > an
> > > idea of what may have caused this or is actively investigating, please
> > let
> > > everyone know.
> > >
> > > Needless to say, this is rather urgent given the upcoming 3.8 code
> > freeze.
> > >
> > > Thanks,
> > > Sophie
> > >
> >
> >
> > --
> > 
> > Okada Haruki
> > ocadar...@gmail.com
> > 
> >
>


-- 

Okada Haruki
ocadar...@gmail.com



Re: Build hanging

2024-06-07 Thread Haruki Okada
Hi

I found that the hanging can be reproduced locally.
The blocking test is
"org.apache.kafka.common.security.authenticator.ClientAuthenticationFailureTest.testAdminClientWithInvalidCredentials".
It started to block after this commit (
https://github.com/apache/kafka/commit/c01279b92acefd9135089588319910bac79bfd4c
)

Thanks,

Jun 8, 2024 (Sat) 8:30 Sophie Blee-Goldman :

> Seems like the build is currently broken -- specifically, a test is hanging
> and causing it to abort after 7+ hours. There are many examples in the
> current PRs, such as
>
> Timed out after almost 8 hours:
> 1.
>
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16238/1/pipeline/
> 2.
>
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16201/15/pipeline
>
> Still running after 6+ hours:
> 1.
>
> https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-16236/3/pipeline/
>
> It's pretty difficult to tell which test is hanging but it seems like one
> of the commits in the last 1-2 days is the likely culprit. If anyone has an
> idea of what may have caused this or is actively investigating, please let
> everyone know.
>
> Needless to say, this is rather urgent given the upcoming 3.8 code freeze.
>
> Thanks,
> Sophie
>


-- 

Okada Haruki
ocadar...@gmail.com



Re: [DISCUSS] KIP-1042 support for wildcard when creating new acls

2024-05-03 Thread Haruki Okada
Hi, Murali.

First, could you add KIP-1042 to the index (
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals)
as well so that everyone can find it easily?

I took a look at the KIP, and I have two questions:

1. Though the new MATCH resource pattern type may reduce the effort of
adding ACLs in some cases, do you have any concrete use case in mind? (When
prefixed ACLs were introduced in KIP-290, there was a use case of using them
to implement multi-tenancy.)

2. As you may know, ACL lookup is on the hot path, where performance is very
important (
https://github.com/apache/kafka/blob/240243b91d69c2b65b5e456065fdcce90c121b04/core/src/main/scala/kafka/security/authorizer/AclAuthorizer.scala#L539).
Do you have any ideas on how to update `matchingAcls` to support MATCH-type
ACLs without introducing a performance issue?


Thanks,

May 3, 2024 (Fri) 19:51 Claude Warren, Jr :

> As I wrote in [1], the ACL evaluation algorithm needs to be specified with
> respect to the specificity of the pattern so that we know exactly which if
> *-accounts-* takes precedence over nl-accounts-* or visa versa.
>
> I think that we should spell out that precedence is evaluated as follows:
>
> 1. Remove patterns that do not match
> 2. More specific patterns take precedence over less specific patterns
> 3. for patterns of the same precedence DENY overrides ALLOW
>
> Determining specificity:
>
> Specificity is based on the Levenshtein distance between the pattern and
> the text being evaluated. The lower the distance the more specific the
> rule.
> Using the topic name: nl-accounts-localtopic we can evaluate the
> Levenshtein distance for various patterns.
> nl-accounts-localtopic = 0
> *-accounts-localtopic = 2
> nl-accounts-local* = 5
> *-accounts-local* = 7
> nl-accounts-* = 10
> *-accounts-* = 12
>
> In the special case of matching principles User matches are more specific
> than Group matches.
>
> I don't know if this should be added to KIP-1042 or presented as a new KIP.
>
> Claude
>
> [1] https://lists.apache.org/thread/0l88tkbxq3ol9rnx0ljnmswj5y6pho1f
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
> >
>
> On Fri, May 3, 2024 at 12:18 PM Claude Warren  wrote:
>
> > Took me awhile to find it but the link to the KIP is
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1042%3A+Support+for+wildcard+when+creating+new+acls
> >
> > On Fri, May 3, 2024 at 10:13 AM Murali Basani 
> > wrote:
> >
> > > Hello,
> > >
> > > I'd like to propose a suggestion to our resource patterns in Kafka
> ACLs.
> > >
> > > Currently, when adding new ACLs in Kafka, we have two types of resource
> > > patterns for topics:
> > >
> > >- LITERAL
> > >- PREFIXED
> > >
> > > However, when it comes to listing or removing ACLs, we have a couple
> more
> > > options:
> > >
> > >- MATCH
> > >- ANY (will match any pattern type)
> > >
> > >
> > > If we can extend creating acls as well with 'MATCH' pattern type, it
> > would
> > > be very beneficial. Even though this kind of acl should be created with
> > > utmost care, it will help organizations streamline their ACL management
> > > processes.
> > >
> > > Example scenarios :
> > >
> > > Let's say we need to create ACLs for the following six topics:
> > > nl-accounts-localtopic, nl-accounts-remotetopic,
> de-accounts-localtopic,
> > > de-accounts-remotetopic, cz-accounts-localtopic,
> cz-accounts-remotetopic
> > >
> > > Currently, we achieve this using existing functionality by creating
> three
> > > prefixed ACLs as shown below:
> > >
> > > kafka-acls --bootstrap-server localhost:9092 \
> > > > --add \
> > > > --allow-principal
> > > >
> > >
> >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > \
> > > > --producer \
> > > > --topic nl-accounts- \
> > > > --resource-pattern-type prefixed
> > >
> > >
> > > kafka-acls --bootstrap-server localhost:9092 \
> > > > --add \
> > > > --allow-principal
> > > >
> > >
> >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > \
> > > > --producer \
> > > > --topic de-accounts- \
> > > > --resource-pattern-type prefixed
> > >
> > >
> > > kafka-acls --bootstrap-server localhost:9092 \
> > > > --add \
> > > > --allow-principal
> > > >
> > >
> >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > \
> > > > --producer \
> > > > --topic cz-accounts- \
> > > > --resource-pattern-type prefixed
> > >
> > >
> > > However, if we had the 'MATCH' pattern type available, we could
> > accomplish
> > > this with a single ACL, as illustrated here:
> > >
> > > kafka-acls --bootstrap-server localhost:9092 \
> > > > --add \
> > > > --allow-principal
> > > >
> > >
> >
> User:CN=serviceuser,OU=ServiceUsers,O=Unknown,L=Unknown,ST=Unknown,C=Unknown
> > > > \
> > > > --producer \
> > > > --topic *-accounts-* \
> > > > --resource-pattern-type match
> > >
> > >
> > >
> > > This patte

[jira] [Created] (KAFKA-16541) Potential leader epoch checkpoint file corruption on OS crash

2024-04-11 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-16541:


 Summary: Potential leader epoch checkpoint file corruption on OS 
crash
 Key: KAFKA-16541
 URL: https://issues.apache.org/jira/browse/KAFKA-16541
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Haruki Okada
Assignee: Haruki Okada


Pointed out by [~junrao] on 
[GitHub|https://github.com/apache/kafka/pull/14242#discussion_r1556161125]

[A patch for KAFKA-15046|https://github.com/apache/kafka/pull/14242] got rid of 
the fsync of the leader-epoch checkpoint file in some paths for performance reasons.

However, since the checkpoint file is now flushed to the device asynchronously by 
the OS, its content could be corrupted if the OS suddenly crashes (e.g. due to power 
failure or kernel panic) in the middle of a flush.

A corrupted checkpoint file could prevent the Kafka broker from starting up.
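
For context, here is a minimal, generic Java sketch of the usual "atomic checkpoint 
write" pattern (illustrative only, not the actual Kafka CheckpointFile code); the 
fsync highlighted below is the step whose removal opens the corruption window:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

class AtomicCheckpointWrite {
    static void write(Path target, byte[] content) throws IOException {
        Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
        try (FileChannel ch = FileChannel.open(tmp,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            ch.write(ByteBuffer.wrap(content));
            // Without this fsync, the bytes may only be in the page cache; an OS
            // crash (power failure, kernel panic) can then leave a torn file even
            // though the rename below has already made it visible.
            ch.force(true);
        }
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
    }
}
{code}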





[jira] [Created] (KAFKA-16393) SslTransportLayer doesn't implement write(ByteBuffer[], int, int) correctly

2024-03-20 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-16393:


 Summary: SslTransportLayer doesn't implement write(ByteBuffer[], 
int, int) correctly
 Key: KAFKA-16393
 URL: https://issues.apache.org/jira/browse/KAFKA-16393
 Project: Kafka
  Issue Type: Improvement
Reporter: Haruki Okada


As of Kafka 3.7.0, SslTransportLayer.write(ByteBuffer[], int, int) is 
implemented as follows:

{code:java}
public long write(ByteBuffer[] srcs, int offset, int length) throws IOException {
    ...
    int i = offset;
    while (i < length) {
        if (srcs[i].hasRemaining() || hasPendingWrites()) {

{code}

The loop index starts at `offset` and ends at `length`.
However, this isn't correct, because the end index should be `offset + length`.

Let's say we have a ByteBuffer array of length 5 and call this method with 
offset = 3, length = 1.

With the current code, `write(srcs, 3, 1)` doesn't attempt any write, because the 
loop condition is immediately false.

For now, this method seems to be called only with offset = 0 and length = 
srcs.length in the Kafka code base, so it isn't causing any problem. Still, we 
should fix it, because it could introduce a subtle bug if the method is called 
with different args in the future.
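
For reference, a small self-contained sketch of the corrected index arithmetic (it 
only demonstrates the loop bound, not the actual SSL write logic):

{code:java}
import java.nio.ByteBuffer;

class GatheringWriteSketch {
    // Iterate over srcs[offset] .. srcs[offset + length - 1]; note the end
    // bound is offset + length, not length.
    static long totalRemaining(ByteBuffer[] srcs, int offset, int length) {
        long total = 0;
        int end = offset + length;
        for (int i = offset; i < end; i++) {
            total += srcs[i].remaining();
        }
        return total;
    }
}
{code}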





Re: [DISCUSS] Minimum constraint for segment.ms

2024-03-14 Thread Haruki Okada
Hi, Divij.

This isn't about a config default value/constraint change, but I found a
behavior discrepancy in the max.block.ms config, which may cause a breaking
change if we change the behavior.
The details are described in the ticket:
https://issues.apache.org/jira/browse/KAFKA-16372

What do you think?

Mar 14, 2024 (Thu) 13:09 Kamal Chandraprakash :

> One use case I see for setting the `segment.bytes` to 1 is to delete all
> the records from the topic.
> We can mention about it in the doc to use the `kafka-delete-records` API
> instead.
>
>
>
>
> On Wed, Mar 13, 2024 at 6:59 PM Divij Vaidya 
> wrote:
>
> > + users@kafka
> >
> > Hi users of Apache Kafka
> >
> > With the upcoming 4.0 release, we have an opportunity to improve the
> > constraints and default values for various Kafka configurations.
> >
> > We are soliciting your feedback and suggestions on configurations where
> the
> > default values and/or constraints should be adjusted. Please reply in
> this
> > thread directly.
> >
> > --
> > Divij Vaidya
> > Apache Kafka PMC
> >
> >
> >
> > On Wed, Mar 13, 2024 at 12:56 PM Divij Vaidya 
> > wrote:
> >
> > > Thanks for the discussion folks. I have started a KIP
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1030%3A+Change+constraints+and+default+values+for+various+configurations
> > > to keep track of the changes that we are discussion. Please consider
> this
> > > as a collaborative work-in-progress KIP and once it is ready to be
> > > published, we can start a discussion thread on it.
> > >
> > > I am also going to start a thread to solicit feedback from users@
> > mailing
> > > list as well.
> > >
> > > --
> > > Divij Vaidya
> > >
> > >
> > >
> > > On Wed, Mar 13, 2024 at 12:55 PM Christopher Shannon <
> > > christopher.l.shan...@gmail.com> wrote:
> > >
> > >> I think it's a great idea to raise a KIP to look at adjusting defaults
> > and
> > >> minimum/maximum config values for version 4.0.
> > >>
> > >> As pointed out, the minimum values for segment.ms and segment.bytes
> > don't
> > >> make sense and would probably bring down a cluster pretty quickly if
> set
> > >> that low, so version 4.0 is a good time to fix it and to also look at
> > the
> > >> other configs as well for adjustments.
> > >>
> > >> On Wed, Mar 13, 2024 at 4:39 AM Sergio Daniel Troiano
> > >>  wrote:
> > >>
> > >> > hey guys,
> > >> >
> > >> > Regarding to num.recovery.threads.per.data.dir: I agree, in our
> > company
> > >> we
> > >> > use the number of vCPUs to do so as this is not competing with ready
> > >> > cluster traffic.
> > >> >
> > >> >
> > >> > On Wed, 13 Mar 2024 at 09:29, Luke Chen  wrote:
> > >> >
> > >> > > Hi Divij,
> > >> > >
> > >> > > Thanks for raising this.
> > >> > > The valid minimum value 1 for `segment.ms` is completely
> > >> unreasonable.
> > >> > > Similarly for `segment.bytes`, `metadata.log.segment.ms`,
> > >> > > `metadata.log.segment.bytes`.
> > >> > >
> > >> > > In addition to that, there are also some config default values
> we'd
> > >> like
> > >> > to
> > >> > > propose to change in v4.0.
> > >> > > We can collect more comments from the community, and come out
> with a
> > >> KIP
> > >> > > for them.
> > >> > >
> > >> > > 1. num.recovery.threads.per.data.dir:
> > >> > > The current default value is 1. But the log recovery is happening
> > >> before
> > >> > > brokers are in ready state, which means, we should use all the
> > >> available
> > >> > > resource to speed up the log recovery to bring the broker to ready
> > >> state
> > >> > > soon. Default value should be... maybe 4 (to be decided)?
> > >> > >
> > >> > > 2. Other configs might be able to consider to change the default,
> > but
> > >> > open
> > >> > > for comments:
> > >> > >2.1. num.replica.fetchers: default is 1, but that's not enough
> > when
> > >> > > there are multiple partitions in the cluster
> > >> > >2.2. `socket.send.buffer.bytes`/`socket.receive.buffer.bytes`:
> > >> > > Currently, we set 100kb as default value, but that's not enough
> for
> > >> > > high-speed network.
> > >> > >
> > >> > > Thank you.
> > >> > > Luke
> > >> > >
> > >> > >
> > >> > > On Tue, Mar 12, 2024 at 1:32 AM Divij Vaidya <
> > divijvaidy...@gmail.com
> > >> >
> > >> > > wrote:
> > >> > >
> > >> > > > Hey folks
> > >> > > >
> > >> > > > Before I file a KIP to change this in 4.0, I wanted to
> understand
> > >> the
> > >> > > > historical context for the value of the following setting.
> > >> > > >
> > >> > > > Currently, segment.ms minimum threshold is set to 1ms [1].
> > >> > > >
> > >> > > > Segments are expensive. Every segment uses multiple file
> > descriptors
> > >> > and
> > >> > > > it's easy to run out of OS limits when creating a large number
> of
> > >> > > segments.
> > >> > > > Large number of segments also delays log loading on startup
> > because
> > >> of
> > >> > > > expensive operations such as iterating through all directories &
> > >> > > > conditionally loading all producer state.
> > >> > > >
> > >> 

[jira] [Created] (KAFKA-16372) max.block.ms behavior inconsistency with javadoc and the config description

2024-03-14 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-16372:


 Summary: max.block.ms behavior inconsistency with javadoc and the 
config description
 Key: KAFKA-16372
 URL: https://issues.apache.org/jira/browse/KAFKA-16372
 Project: Kafka
  Issue Type: Bug
  Components: clients
Reporter: Haruki Okada


As of Kafka 3.7.0, the javadoc of 
[KafkaProducer.send|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L956]
 states that it throws TimeoutException when max.block.ms is exceeded on buffer 
allocation or initial metadata fetch.

It's also stated in the [max.block.ms config 
description|https://kafka.apache.org/37/documentation.html#producerconfigs_buffer.memory].

However, I found that this is not the case, because TimeoutException extends 
ApiException, and KafkaProducer.doSend catches ApiException and [wraps it as a 
FutureFailure|https://github.com/apache/kafka/blob/3.7.0/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L1075-L1086]
 instead of throwing it.

I wonder whether this is a bug or a documentation error.

This discrepancy seems to have existed since 0.9.0.0, when max.block.ms was introduced.
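
For illustration, a minimal sketch of how a caller observes the behavior (the 
bootstrap servers, topic name, and config values below are placeholders): instead of 
send() throwing TimeoutException directly, the exception surfaces through the 
returned Future.

{code:java}
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.errors.TimeoutException;

public class MaxBlockMsExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("max.block.ms", "100");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            Future<RecordMetadata> future =
                    producer.send(new ProducerRecord<>("test-topic", "value"));
            try {
                future.get();
            } catch (ExecutionException e) {
                // One might expect send() above to have thrown TimeoutException
                // itself; in practice it is reported here, wrapped in the future.
                if (e.getCause() instanceof TimeoutException) {
                    System.out.println("max.block.ms exceeded: " + e.getCause());
                }
            }
        }
    }
}
{code}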





Re: fine-gained acls

2023-12-01 Thread Haruki Okada
Hi.

KafkaConsumer can subscribe to topics by pattern:
https://kafka.apache.org/36/javadoc/org/apache/kafka/clients/consumer/Consumer.html#subscribe(java.util.regex.Pattern)
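
For example (the consumer configs and the topic-naming pattern below are just
placeholders for illustration):

import java.util.Properties;
import java.util.regex.Pattern;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PatternSubscribeExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "central-controller");      // placeholder
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        // Assuming one topic per cluster named like "cluster-<id>-events",
        // the central controller can subscribe to all of them by regex:
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Pattern.compile("cluster-.*-events"));
    }
}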

Dec 1, 2023 (Fri) 22:05 Chunlin Yang :

> Hi team,
>
> My use case is I have a central controller to manage tens of thousands of
> clusters. Each cluster can receive and send the message via Kafka. but each
> cluster can only consume its own message. The controller can consume all
> the messages from each cluster.
>
> I checked the Kafka document and know that there is no limitation for Kafka
> topics and the Kafka provides the ACLs per topic so my idea is to create
> one topic per cluster. but it seems Kafka cannot support subscript topics
> with wildcard. Is that true?
>
> I guess I do not use Kafka correctly. Could you share your best practices
> which can address my case? Thanks in advance.
>


-- 

Okada Haruki
ocadar...@gmail.com



[jira] [Created] (KAFKA-15924) Flaky test - QuorumControllerTest.testFatalMetadataReplayErrorOnActive

2023-11-29 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-15924:


 Summary: Flaky test - 
QuorumControllerTest.testFatalMetadataReplayErrorOnActive
 Key: KAFKA-15924
 URL: https://issues.apache.org/jira/browse/KAFKA-15924
 Project: Kafka
  Issue Type: Bug
Reporter: Haruki Okada


[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/15/tests]

 
{code:java}
Error
org.opentest4j.AssertionFailedError: expected:  
but was: 
Stacktrace
org.opentest4j.AssertionFailedError: expected:  
but was: 
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at 
app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182)
at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177)
at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1141)
at 
app//org.apache.kafka.controller.QuorumControllerTest.testFatalMetadataReplayErrorOnActive(QuorumControllerTest.java:1132)
at 
java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
 Method)
at 
java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base@11.0.16.1/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base@11.0.16.1/java.lang.reflect.Method.invoke(Method.java:566)
at 
app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
at 
app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
at 
app//org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
at 
app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
at 
app

[jira] [Created] (KAFKA-15921) Flaky test - SaslScramSslEndToEndAuthorizationTest.testAuthentications

2023-11-28 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-15921:


 Summary: Flaky test - 
SaslScramSslEndToEndAuthorizationTest.testAuthentications
 Key: KAFKA-15921
 URL: https://issues.apache.org/jira/browse/KAFKA-15921
 Project: Kafka
  Issue Type: Bug
Reporter: Haruki Okada


[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/]
{code:java}
Error
org.opentest4j.AssertionFailedError: expected: <0> but was: <1>
Stacktrace
org.opentest4j.AssertionFailedError: expected: <0> but was: <1>
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at 
app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:166)
at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:161)
at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:628)
at 
app//kafka.api.SaslScramSslEndToEndAuthorizationTest.testAuthentications(SaslScramSslEndToEndAuthorizationTest.scala:92)
at 
java.base@17.0.7/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
at 
java.base@17.0.7/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
at 
java.base@17.0.7/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base@17.0.7/java.lang.reflect.Method.invoke(Method.java:568)
at 
app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
at 
app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
at 
app//org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:94)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
at 
app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTa

[jira] [Created] (KAFKA-15920) Flaky test - PlaintextConsumerTest.testCoordinatorFailover

2023-11-28 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-15920:


 Summary: Flaky test - PlaintextConsumerTest.testCoordinatorFailover
 Key: KAFKA-15920
 URL: https://issues.apache.org/jira/browse/KAFKA-15920
 Project: Kafka
  Issue Type: Bug
Reporter: Haruki Okada


[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/]
{code:java}
Error
org.opentest4j.AssertionFailedError: expected: <0> but was: <1>
Stacktrace
org.opentest4j.AssertionFailedError: expected: <0> but was: <1>
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at 
app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:150)
at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:145)
at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:527)
at 
app//kafka.api.AbstractConsumerTest.ensureNoRebalance(AbstractConsumerTest.scala:326)
at 
app//kafka.api.BaseConsumerTest.testCoordinatorFailover(BaseConsumerTest.scala:109)
at 
java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native
 Method)
at 
java.base@11.0.16.1/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base@11.0.16.1/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base@11.0.16.1/java.lang.reflect.Method.invoke(Method.java:566)
at 
app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
at 
app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestTemplateMethod(TimeoutExtension.java:94)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
at 
app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(Node

[jira] [Created] (KAFKA-15919) Flaky test - BrokerLifecycleManagerTest.testAlwaysSendsAccumulatedOfflineDirs

2023-11-28 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-15919:


 Summary: Flaky test - 
BrokerLifecycleManagerTest.testAlwaysSendsAccumulatedOfflineDirs
 Key: KAFKA-15919
 URL: https://issues.apache.org/jira/browse/KAFKA-15919
 Project: Kafka
  Issue Type: Bug
Reporter: Haruki Okada


[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/]
{code:java}
Error
org.opentest4j.AssertionFailedError: expected:  
but was: 
Stacktrace
org.opentest4j.AssertionFailedError: expected:  
but was: 
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at 
app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at 
app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182)
at 
app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177)
at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1141)
at 
app//kafka.server.BrokerLifecycleManagerTest.testAlwaysSendsAccumulatedOfflineDirs(BrokerLifecycleManagerTest.scala:236)
at 
java.base@21.0.1/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
at java.base@21.0.1/java.lang.reflect.Method.invoke(Method.java:580)
at 
app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
at 
app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
at 
app//org.junit.jupiter.engine.extension.SameThreadTimeoutInvocation.proceed(SameThreadTimeoutInvocation.java:45)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
at 
app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
at 
app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
at 
app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
at 
app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
at 
app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
at 
app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
at 
app//org.junit.platform.engine.support.hierarchical.NodeTestTask.execute(NodeTestTask.java:95)
at java.base@21.0.1/java.util.ArrayList.forEach(ArrayList.java:1596)
at 
app

[jira] [Created] (KAFKA-15918) Flaky test - OffsetsApiIntegrationTest. testResetSinkConnectorOffsets

2023-11-28 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-15918:


 Summary: Flaky test - OffsetsApiIntegrationTest. 
testResetSinkConnectorOffsets
 Key: KAFKA-15918
 URL: https://issues.apache.org/jira/browse/KAFKA-15918
 Project: Kafka
  Issue Type: Bug
Reporter: Haruki Okada


[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/]

 
{code:java}
Error
org.opentest4j.AssertionFailedError: Condition not met within timeout 3. 
Sink connector consumer group offsets should catch up to the topic end offsets 
==> expected:  but was: 
Stacktrace
org.opentest4j.AssertionFailedError: Condition not met within timeout 3. 
Sink connector consumer group offsets should catch up to the topic end offsets 
==> expected:  but was: 
at 
org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
at 
org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
at org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63)
at org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36)
at org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210)
at 
org.apache.kafka.test.TestUtils.lambda$waitForCondition$3(TestUtils.java:331)
at 
org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:379)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:328)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:312)
at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:302)
at 
org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.verifyExpectedSinkConnectorOffsets(OffsetsApiIntegrationTest.java:917)
at 
org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.resetAndVerifySinkConnectorOffsets(OffsetsApiIntegrationTest.java:725)
at 
org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testResetSinkConnectorOffsets(OffsetsApiIntegrationTest.java:672)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)

[jira] [Created] (KAFKA-15917) Flaky test - OffsetsApiIntegrationTest. testAlterSinkConnectorOffsetsZombieSinkTasks

2023-11-28 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-15917:


 Summary: Flaky test - OffsetsApiIntegrationTest. 
testAlterSinkConnectorOffsetsZombieSinkTasks
 Key: KAFKA-15917
 URL: https://issues.apache.org/jira/browse/KAFKA-15917
 Project: Kafka
  Issue Type: Bug
Reporter: Haruki Okada


[https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14242/14/tests/]

 

 
{code:java}
Error
java.lang.AssertionError: 
Expected: a string containing "zombie sink task"
 but: was "Could not alter connector offsets. Error response: 
{"error_code":500,"message":"Failed to alter consumer group offsets for 
connector test-connector"}"
Stacktrace
java.lang.AssertionError: 
Expected: a string containing "zombie sink task"
 but: was "Could not alter connector offsets. Error response: 
{"error_code":500,"message":"Failed to alter consumer group offsets for 
connector test-connector"}"
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:8)
at 
org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testAlterSinkConnectorOffsetsZombieSinkTasks(OffsetsApiIntegrationTest.java:431)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at 
org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
at 
org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40)
at 
org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60)
at 
org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
at 
org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
at 
org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
at 
org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker$2.run(TestWorker.java:176)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.executeAndMaintainThreadName(TestWorker.java:129)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:100)
at 
org.gradle.api.internal.tasks.testing.worker.TestWorker.execute(TestWorker.java:60)
at 
org.grad

KAFKA-15567 Review request

2023-10-14 Thread Haruki Okada
Hi!

I found that ReplicaFetcherThreadBenchmark (and a few other benchmarks) is
not working now and reported it as KAFKA-15567.

I submitted a patch for that in https://github.com/apache/kafka/pull/14513.

Could anyone review the PR?


Thanks,

--

Okada Haruki
ocadar...@gmail.com



[jira] [Created] (KAFKA-15567) ReplicaFetcherThreadBenchmark is not working

2023-10-09 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-15567:


 Summary: ReplicaFetcherThreadBenchmark is not working
 Key: KAFKA-15567
 URL: https://issues.apache.org/jira/browse/KAFKA-15567
 Project: Kafka
  Issue Type: Improvement
Reporter: Haruki Okada
Assignee: Haruki Okada


* ReplicaFetcherThreadBenchmark is not working as of current trunk 
(https://github.com/apache/kafka/tree/c223a9c3761f796468ccfdae9e177e764ab6a965)

 
{code:java}
% jmh-benchmarks/jmh.sh ReplicaFetcherThreadBenchmark
(snip)
java.lang.NullPointerException
    at kafka.server.metadata.ZkMetadataCache.<init>(ZkMetadataCache.scala:89)
    at kafka.server.MetadataCache.zkMetadataCache(MetadataCache.scala:120)
    at 
org.apache.kafka.jmh.fetcher.ReplicaFetcherThreadBenchmark.setup(ReplicaFetcherThreadBenchmark.java:220)
    at 
org.apache.kafka.jmh.fetcher.jmh_generated.ReplicaFetcherThreadBenchmark_testFetcher_jmhTest._jmh_tryInit_f_replicafetcherthreadbenchmark0_G(ReplicaFetcherThreadBenchmark_testFetcher_jmhTest.java:448)
    at 
org.apache.kafka.jmh.fetcher.jmh_generated.ReplicaFetcherThreadBenchmark_testFetcher_jmhTest.testFetcher_AverageTime(ReplicaFetcherThreadBenchmark_testFetcher_jmhTest.java:164)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
Method)
    at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at 
org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:527)
    at 
org.openjdk.jmh.runner.BenchmarkHandler$BenchmarkTask.call(BenchmarkHandler.java:504)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829) {code}
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Complete Kafka replication protocol description

2023-09-10 Thread Haruki Okada
Hi Jack,

Thank you for the great work, not only the spec but also for the
comprehensive documentation about the replication.
Actually, I wrote some TLA+ specs to verify unclean leader election behavior
before, so I will double-check my understanding against your complete spec :)


Thanks,

2023年9月10日(日) 21:42 David Jacot :

> Hi Jack,
>
> This is great! Thanks for doing it. I will look into it when I have a bit
> of time, likely after Current.
>
> Would you be interested in contributing it to the main repository? That
> would be a great contribution to the project. Having it there would allow
> the community to maintain it while changes to the protocol are made. That
> would also pave the way for having other specs in the future (e.g. new
> consumer group protocol).
>
> Best,
> David
>
> Le dim. 10 sept. 2023 à 12:45, Jack Vanlightly  a
> écrit :
>
> > Hi all,
> >
> > As part of my work on formally verifying different parts of Apache Kafka
> > and working on KIP-966 I have built up a lot of knowledge about how the
> > replication protocol works. Currently it is mostly documented across
> > various KIPs and in the code itself. I have written a complete protocol
> > description (with KIP-966 changes applied) which is inspired by the
> precise
> > but accessible style and language of the Raft paper. The idea is that it
> > makes it easier for contributors and anyone else interested in the
> protocol
> > to learn how it works, the fundamental properties it has and how those
> > properties are supported by the various behaviors and conditions.
> >
> > It currently resides next to the TLA+ specification itself in my
> > kafka-tlaplus repository. I'd be interested to receive feedback from the
> > community.
> >
> >
> >
> https://github.com/Vanlightly/kafka-tlaplus/blob/main/kafka_data_replication/kraft/kip-966/description/0_kafka_replication_protocol.md
> >
> > Thanks
> > Jack
> >
>


-- 

Okada Haruki
ocadar...@gmail.com



Re: [DISCUSS] KIP-936: Throttle number of active PIDs

2023-06-06 Thread Haruki Okada
Hi, Omnia.

Thanks for the KIP.
The feature sounds indeed helpful and the strategy to use bloom-filter
looks good.

I have three questions:

1. Do you have any memory-footprint estimation
for TimeControlledBloomFilter?
* If I read the KIP correctly, a TimeControlledBloomFilter will be
allocated per KafkaPrincipal, so the size should be kept reasonably small,
considering clusters that have a large number of users. (A rough sizing
sketch follows below.)
* i.e. What false-positive rate do you plan to choose as the default?
2. What do you think about rotating windows on produce-request arrival
instead of in the scheduler?
* If we do rotation in scheduler threads, my concern is potential
scheduler-thread occupation, which could delay other background tasks.
3. Why is the default producer.id.quota.window.size.seconds 1 hour?
* Unlike other quota types (1 second)
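
For reference on question 1, a rough back-of-the-envelope sizing using the
standard bloom-filter formulas (the per-principal PID count and false-positive
rate below are assumptions for illustration, not values from the KIP):

{code:java}
// Rough bloom-filter sizing for question 1 (illustrative only; the KIP's
// TimeControlledBloomFilter may size itself differently).
// Standard formulas: bits m = -n * ln(p) / (ln 2)^2, hashes k = (m / n) * ln 2.
public class BloomFilterSizing {
    public static void main(String[] args) {
        long pidsPerPrincipal = 10_000;   // assumed number of active PIDs tracked per principal
        double falsePositiveRate = 0.01;  // assumed 1% false-positive rate

        double bits = -pidsPerPrincipal * Math.log(falsePositiveRate) / (Math.log(2) * Math.log(2));
        double hashes = Math.ceil((bits / pidsPerPrincipal) * Math.log(2));

        System.out.printf("~%.0f bits (~%.1f KiB) and %.0f hash functions per principal window%n",
                bits, bits / 8 / 1024, hashes);
        // Roughly 96k bits (~12 KiB) per window here; multiplied by the number of
        // principals and retained time windows, that aggregate is what question 1 asks about.
    }
}
{code}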

Thanks,

2023年6月6日(火) 23:55 Omnia Ibrahim :

> Hi everyone,
> I want to start the discussion of the KIP-936 to throttle the number of
> active PIDs per KafkaPrincipal. The proposal is here
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-936%3A+Throttle+number+of+active+PIDs
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-936%3A+Throttle+number+of+active+PIDs
> >
>
> Thanks for your time and feedback.
> Omnia
>


-- 

Okada Haruki
ocadar...@gmail.com



[jira] [Created] (KAFKA-15046) Produce performance issue under high disk load

2023-05-31 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-15046:


 Summary: Produce performance issue under high disk load
 Key: KAFKA-15046
 URL: https://issues.apache.org/jira/browse/KAFKA-15046
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 3.3.2
Reporter: Haruki Okada
 Attachments: image-2023-06-01-12-46-30-058.png, 
image-2023-06-01-12-52-40-959.png, image-2023-06-01-12-54-04-211.png, 
image-2023-06-01-12-56-19-108.png

* Phenomenon:
 ** !image-2023-06-01-12-46-30-058.png|width=259,height=236!
 ** Producer response time 99%ile got quite bad when we performed replica 
reassignment on the cluster
 *** RequestQueue scope was significant
 ** Also, request-time throttling happened almost all the time. This caused 
producers to delay sending messages at the time of the incident.
 ** At the time of the incident, the disk I/O latency was higher than usual due to 
the high load from replica reassignment.
 *** !image-2023-06-01-12-56-19-108.png|width=255,height=128!
 * Analysis:
 ** The request-handler utilization was much higher than usual.
 *** !image-2023-06-01-12-52-40-959.png|width=278,height=113!
 ** Also, thread time utilization was much higher than usual for almost all users
 *** !image-2023-06-01-12-54-04-211.png|width=276,height=110!
 ** From taking jstack several times, in most of them we found that a 
request handler was doing fsync to flush ProducerState, while other 
request handlers were waiting on Log#lock to append messages.

 *** 
{code:java}
"data-plane-kafka-request-handler-14" #166 daemon prio=5 os_prio=0 
cpu=51264789.27ms elapsed=599242.76s tid=0x7efdaeba7770 nid=0x1e704 
runnable  [0x7ef9a12e2000]
   java.lang.Thread.State: RUNNABLE
at sun.nio.ch.FileDispatcherImpl.force0(java.base@11.0.17/Native Method)
at 
sun.nio.ch.FileDispatcherImpl.force(java.base@11.0.17/FileDispatcherImpl.java:82)
at 
sun.nio.ch.FileChannelImpl.force(java.base@11.0.17/FileChannelImpl.java:461)
at 
kafka.log.ProducerStateManager$.kafka$log$ProducerStateManager$$writeSnapshot(ProducerStateManager.scala:451)
at 
kafka.log.ProducerStateManager.takeSnapshot(ProducerStateManager.scala:754)
at kafka.log.UnifiedLog.roll(UnifiedLog.scala:1544)
- locked <0x00060d75d820> (a java.lang.Object)
at kafka.log.UnifiedLog.maybeRoll(UnifiedLog.scala:1523)
- locked <0x00060d75d820> (a java.lang.Object)
at kafka.log.UnifiedLog.append(UnifiedLog.scala:919)
- locked <0x00060d75d820> (a java.lang.Object)
at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:760)
at 
kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1170)
at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1158)
at 
kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:956)
at 
kafka.server.ReplicaManager$$Lambda$2379/0x000800b7c040.apply(Unknown 
Source)
at 
scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
at 
scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
at scala.collection.mutable.HashMap.map(HashMap.scala:35)
at 
kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:944)
at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:602)
at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:666)
at kafka.server.KafkaApis.handle(KafkaApis.scala:175)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:75)
at java.lang.Thread.run(java.base@11.0.17/Thread.java:829) {code}

 ** Also, there were a bunch of logs showing that writing producer snapshots took 
hundreds of milliseconds.
 *** 
{code:java}
...
[2023-05-01 11:08:36,689] INFO [ProducerStateManager partition=xxx-4] Wrote 
producer snapshot at offset 1748817854 with 8 producer ids in 809 ms. 
(kafka.log.ProducerStateManager)
[2023-05-01 11:08:37,319] INFO [ProducerStateManager partition=yyy-34] Wrote 
producer snapshot at offset 247996937813 with 0 producer ids in 547 ms. 
(kafka.log.ProducerStateManager)
[2023-05-01 11:08:38,887] INFO [ProducerStateManager partition=zzz-9] Wrote 
producer snapshot at offset 226222355404 with 0 producer ids in 576 ms. 
(kafka.log.ProducerStateManager)
... {code}

 * From the analysis, we summarized the issue as below:

 ** 1. Disk write latency got worse due to the replica reassignment
 *** We already use a replication quota, and lowering the quota further may not 
be acceptable because it would make the assignment duration too long
 ** 2. ProducerStateManager#takeSnapshot started to take time due to fsync 
latency
 *** This is done at every log segment roll.
 *** In our case, the broker hosts hundreds of partition leaders with high 
load, so log rolls occur very frequently.
 ** 3. During ProducerStateM
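
As a minimal, hypothetical sketch of the contention pattern from the analysis above
(one request handler running fsync while holding the per-log lock, so every other
handler appending to the same log stalls behind it), the following is illustrative
only; class and method names are made up and are not Kafka code:

{code:java}
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of the contention described above: fsync is slow under
// disk pressure, and because it runs while holding the per-log lock, every
// other request-handler thread appending to the same log blocks behind it.
public class FsyncUnderLockSketch {
    private final Object logLock = new Object();    // stands in for the per-log lock

    void appendAsLeader(byte[] records) {
        synchronized (logLock) {
            appendToActiveSegment(records);
            if (shouldRoll()) {
                // Problem: the snapshot fsync happens inside the lock, so a
                // single slow fsync (hundreds of ms) stalls all appends.
                takeProducerSnapshotWithFsync();
            }
        }
    }

    void appendToActiveSegment(byte[] records) { /* write to page cache, fast */ }

    boolean shouldRoll() { return true; }           // frequent rolls amplify the issue

    void takeProducerSnapshotWithFsync() {
        try {
            TimeUnit.MILLISECONDS.sleep(800);       // simulate high fsync latency
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
{code}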

Re: [DISCUSS] KIP-926: introducing acks=min.insync.replicas config

2023-05-11 Thread Haruki Okada
Hi, Luke.

Though this proposal definitely looks interesting, as others pointed out,
the leader election implementation would be the hard part.

And I think even LEO-based election is not safe; it could easily cause
silent loss of committed data.

Let's say we have replicas A,B,C and A is the leader initially, and
min.insync.replicas = 2.

- 1. Initial
* A(leo=0), B(leo=0), C(leo=0)
- 2. Produce a message to A
* A(leo=1), B(leo=0), C(leo=0)
- 3. Another producer produces a message to A (i.e. as a different batch)
* A(leo=2), B(leo=0), C(leo=0)
- 4. C replicates the first batch. offset=1 is committed (by
acks=min.insync.replicas)
* A(leo=2), B(leo=0), C(leo=1)
- 5. A loses ZK session (or broker session timeout in KRaft)
- 6. The controller (regardless of ZK/KRaft) doesn't store LEOs itself, so it
needs to interact with each replica. It detects that C has the largest LEO and
decides to elect C as the new leader
- 7. Before leader-election is performed, B replicates offset=1,2 from A.
offset=2 is committed
* This is possible because even if A lost its ZK session, A could keep
handling fetch requests for a while.
- 8. Controller elects C as the new leader. B truncates its offset.
offset=2 is lost silently.
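
To make the ordering above concrete, here is a tiny, purely illustrative replay
of the scenario (not Kafka code):

{code:java}
import java.util.HashMap;
import java.util.Map;

// Purely illustrative replay of the scenario above (not Kafka code).
// With replicas A, B, C and min.insync.replicas=2, offset 2 becomes "committed"
// once B catches up, but C (which never saw it) has already been chosen as the
// new leader, so B truncates and committed data is silently lost.
public class LeoElectionScenario {
    public static void main(String[] args) {
        Map<String, Integer> leo = new HashMap<>(Map.of("A", 0, "B", 0, "C", 0));

        leo.put("A", 1);          // step 2: first batch appended to leader A
        leo.put("A", 2);          // step 3: second batch appended to A
        leo.put("C", 1);          // step 4: C replicates the first batch -> offset 1 committed
        // step 5: A loses its ZK session (or broker session in KRaft)
        // step 6: the controller queries LEOs and picks C (1) over B (0)
        String newLeader = "C";
        leo.put("B", 2);          // step 7: B catches up from A before the election lands
        // offset 2 now exists on 2 replicas (A, B), so it counts as committed

        leo.put("B", leo.get(newLeader));   // step 8: B truncates to C's LEO; offset 2 is gone
        System.out.println("New leader: " + newLeader + ", LEOs after truncation: " + leo);
    }
}
{code}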

I have a feeling that we need quorum-based data replication, as Divij
pointed out.


2023年5月11日(木) 22:33 David Jacot :

> Hi Luke,
>
> > Yes, on second thought, I think the new leader election is required to
> work
> for this new acks option. I'll think about it and open another KIP for it.
>
> It can't be in another KIP as it is required for your proposal to work.
> This is also an important part to discuss as it requires the controller to
> do more operations on leader changes.
>
> Cheers,
> David
>
> On Thu, May 11, 2023 at 2:44 PM Luke Chen  wrote:
>
> > Hi Ismael,
> > Yes, on second thought, I think the new leader election is required to
> work
> > for this new acks option. I'll think about it and open another KIP for
> it.
> >
> > Hi Divij,
> > Yes, I agree with all of them. I'll think about it and let you know how
> we
> > can work together.
> >
> > Hi Alexandre,
> > > 100. The KIP makes one statement which may be considered critical:
> > "Note that in acks=min.insync.replicas case, the slow follower might
> > be easier to become out of sync than acks=all.". Would you have some
> > data on that behaviour when using the new ack semantic? It would be
> > interesting to analyse and especially look at the percentage of time
> > the number of replicas in ISR is reduced to the configured
> > min.insync.replicas.
> >
> > The comparison data would be interesting. I can have a test when
> available.
> > But this KIP will be deprioritized because there should be a
> pre-requisite
> > KIP for it.
> >
> > > A (perhaps naive) hypothesis would be that the
> > new ack semantic indeed provides better produce latency, but at the
> > cost of precipitating the slowest replica(s) out of the ISR?
> >
> > Yes, it could be.
> >
> > > 101. I understand the impact on produce latency, but I am not sure
> > about the impact on durability. Is your durability model built against
> > the replication factor or the number of min insync replicas?
> >
> > Yes, and also the new LEO-based leader election (not proposed yet).
> >
> > > 102. Could a new type of replica which would not be allowed to enter
> > the ISR be an alternative? Such replica could attempt replication on a
> > best-effort basis and would provide the permanent guarantee not to
> > interfere with foreground traffic.
> >
> > You mean a backup replica, which will never become leader (in-sync),
> right?
> > That's an interesting thought and might be able to become a workaround
> with
> > the existing leader election. Let me think about it.
> >
> > Hi qiangLiu,
> >
> > > It's a good point that add this config and get better P99 latency, but
> is
> > this changing the meaning of "in sync replicas"? consider a situation
> with
> > "replica=3 acks=2", when two broker fail and left only the broker that
> > does't have the message, it is in sync, so will be elected as leader,
> will
> > it cause a NOT NOTICED lost of acked messages?
> >
> > Yes, it will, so the `min.insync.replicas` config in the broker/topic
> level
> > should be set correctly. In your example, it should be set to 2, so that
> > when 2 replicas down, no new message write will be successful.
> >
> >
> > Thank you.
> > Luke
> >
> >
> > On Thu, May 11, 2023 at 12:16 PM 67 <6...@gd67.com> wrote:
> >
> > > Hi Luke,
> > >
> > >
> > > It's a good point that add this config and get better P99 latency, but
> is
> > > this changing the meaning of "in sync replicas"? consider a situation
> > with
> > > "replica=3 acks=2", when two broker fail and left only the broker that
> > > does't have the message, it is in sync, so will be elected as leader,
> > will
> > > it cause a *NOT NOTICED* lost of acked messages?
> > >
> > > qiangLiu
> > >
> > >
> > > 在2023年05月10 12时58分,"Ismael Juma"写道:
> > >
> > >
> > > Hi Luke,
> > >
> > 

Re: [DISCUSS] KIP-913: add new method to provide possibility for accelerate first record's sending

2023-03-08 Thread Haruki Okada
Thanks for your explanation.

> Thus, The code will look strange to call partitionsFor(topic)

Hmm, is that so?
As the javadoc for `partitionsFor` states, it is simply for getting
partition metadata for the topic, so I think nothing is strange about
using it to "warm up" metadata.
(And if so, `getCluster` also looks strange.)

> print some useful information when startup

This sounds like trying to solve a different problem from the
initial motivation (warmup metadata).
For this particular problem, we can just use Admin API's corresponding
methods.

> partitionsFor(topic) using the maxBlockTimeMs

I can understand the intention, though: this is just a "block time" for the
caller thread, and the metadata request is sent separately on the Sender
thread, so we can just retry calling partitionsFor until the metadata
response is eventually received and we get successful metadata, even when an
individual call times out.
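
A minimal sketch of that retry-based warmup (the class name, topic, and deadline
here are just examples):

{code:java}
import java.time.Duration;
import java.time.Instant;
import java.util.List;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.errors.TimeoutException;

// Minimal sketch: warm up metadata at startup by retrying Producer#partitionsFor
// until it succeeds or an overall deadline passes. Names and values are examples.
public final class MetadataWarmup {
    public static List<PartitionInfo> warmup(Producer<?, ?> producer, String topic, Duration deadline) {
        Instant giveUpAt = Instant.now().plus(deadline);
        while (true) {
            try {
                return producer.partitionsFor(topic);  // blocks up to max.block.ms per attempt
            } catch (TimeoutException e) {
                if (Instant.now().isAfter(giveUpAt)) {
                    throw e;                           // give up after the overall deadline
                }
                // the metadata fetch keeps progressing on the Sender thread; just try again
            }
        }
    }
}
{code}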

So I'm not sure if it makes sense to add a new API or timeout config (for
KIP-912); KafkaProducer already has many timeout parameters, so it could
introduce additional complexity.

2023年3月9日(木) 12:18 jian fu :

> Hi Haruki Okada:
>
> There is another KIP912 discussion related to this one. Welcome to give
> some comments/suggestions.
>
> Thanks.
>
> I think if the 912 is done. New method getCluster can use the new config
> directly with time parameter removed.
>
> WDTY?
>
> [DISCUSS] KIP-912: Support decreasing send's block time without worrying
> about metadata's fetch-Apache Mail Archives
> <https://lists.apache.org/thread/jq2fb8ylwxb2ojgvo5qdc57mgrmvxybj>
>
> jian fu  于2023年3月9日周四 11:11写道:
>
> > Hi Okada Haruki:
> >
> > Thanks for your comment.
> >
> > I can use it partitionsFor(topic) for the goal, thus there are two
> reasons
> > why I don't choose it and propose to add new dedicated method:
> > 1) Consider how to use the method to solve the issue, We should call it
> in
> > application's startup process before any record sent. Thus, The code will
> > look strange to call partitionsFor(topic) . So I suggest to add one
> > common method such as getCluster so that you can get and print some
> useful
> > information when startup with the goal reached. It also can provide more
> > information self compare with partitionsFor.
> > 2) partitionsFor(topic) using the maxBlockTimeMs as the max blocking
> > time. For the metadata's fetching, I will take a lot of time so that we
> > must set maxBlockTimeMs to a big value (at least > time for metadata).
> > Thus consider that the send method is async. Most of application like to
> > reduce the maxBlockTimeMs. It is conflict time requirement. So we need
> > one new blocking time as parameter of the method. I don't want to change
> > the existing interface.
> > So based on above reasons. I suggest to add new method with time control
> > parameter.
> > WDTY?
> >
> > Thanks.
> >
> >
> > Haruki Okada  于2023年3月9日周四 10:19写道:
> >
> >> You can just use Producer#partitionsFor
> >>
> >> 2023年3月9日(木) 11:13 jian fu :
> >>
> >> > Hi All:
> >> >
> >> > If anyone can help to give some comments or suggestions for the
> >> proposal.
> >> > Thanks in advance!
> >> >
> >> > Regards
> >> > Jian
> >> >
> >> > jian fu  于2023年3月6日周一 17:00写道:
> >> >
> >> > > Hi Everyone:
> >> > > Nice to meet you.
> >> > >
> >> > > I created one KIP to request your review.
> >> > > KIP-913: add new method to provide possibility for accelerate first
> >> > > record's sending
> >> > > <
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-913%3A+add+new+method+to+provide+possibility+for+accelerate+first+record%27s+sending
> >> > >
> >> > > The example PR:
> >> > > *https://github.com/apache/kafka/pull/13320/files
> >> > > <https://github.com/apache/kafka/pull/13320/files>*
> >> > >
> >> > > Thanks.
> >> > >
> >> > > Regards
> >> > > Jian
> >> > >
> >> > >
> >> >
> >> > --
> >> > Regards  Fu.Jian
> >> > --
> >> > Cisco Communications, Inc.
> >> >
> >>
> >>
> >> --
> >> 
> >> Okada Haruki
> >> ocadar...@gmail.com
> >> 
> >>
> >
> >
> > --
> > Regards  Fu.Jian
> > --
> > Cisco Communications, Inc.
> >
> >
>
> --
> Regards  Fu.Jian
> --
> Cisco Communications, Inc.
>


-- 

Okada Haruki
ocadar...@gmail.com



Re: [DISCUSS] KIP-913: add new method to provide possibility for accelerate first record's sending

2023-03-08 Thread Haruki Okada
You can just use Producer#partitionsFor

2023年3月9日(木) 11:13 jian fu :

> Hi All:
>
> If anyone can help to give some comments or suggestions for the proposal.
> Thanks in advance!
>
> Regards
> Jian
>
> jian fu  于2023年3月6日周一 17:00写道:
>
> > Hi Everyone:
> > Nice to meet you.
> >
> > I created one KIP to request your review.
> > KIP-913: add new method to provide possibility for accelerate first
> > record's sending
> > <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-913%3A+add+new+method+to+provide+possibility+for+accelerate+first+record%27s+sending
> >
> > The example PR:
> > *https://github.com/apache/kafka/pull/13320/files
> > *
> >
> > Thanks.
> >
> > Regards
> > Jian
> >
> >
>
> --
> Regards  Fu.Jian
> --
> Cisco Communications, Inc.
>


-- 

Okada Haruki
ocadar...@gmail.com



[jira] [Created] (KAFKA-14445) Producer doesn't request metadata update on REQUEST_TIMED_OUT

2022-12-06 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-14445:


 Summary: Producer doesn't request metadata update on 
REQUEST_TIMED_OUT
 Key: KAFKA-14445
 URL: https://issues.apache.org/jira/browse/KAFKA-14445
 Project: Kafka
  Issue Type: Improvement
Reporter: Haruki Okada


Produce requests may fail with a timeout after `request.timeout.ms` in the two 
cases below:
 * The client didn't receive a produce response within `request.timeout.ms`
 * A produce response was received, but it ended up with `REQUEST_TIMED_OUT` in the 
broker

The former case usually happens when a broker machine fails or there's a network 
glitch, etc.

In this case, the connection will be disconnected and a metadata update will be 
requested to discover the new leader: 
[https://github.com/apache/kafka/blob/3.3.1/clients/src/main/java/org/apache/kafka/clients/producer/internals/Sender.java#L556]

 

The problem is the latter case (REQUEST_TIMED_OUT on the broker).

In this case, the produce request ends up with a TimeoutException, which doesn't 
inherit from InvalidMetadataException, so it doesn't trigger a metadata update.

 

The typical cause of REQUEST_TIMED_OUT is replication delay due to a follower-side 
problem, in which case a metadata update indeed doesn't make much sense.

 

However, we found that in some cases, stale metadata on REQUEST_TIMED_OUT could 
cause produce requests to be retried unnecessarily, which may end up with batch 
expiration due to the delivery timeout.

Below is the scenario we experienced:
 * Environment:
 ** Partition tp-0 has 3 replicas, 1, 2, 3. Leader is 1
 ** min.insync.replicas=2
 ** acks=all
 * Scenario:
 ** broker 1 "partially" failed
 *** It lost its ZooKeeper connection and was kicked out from the cluster
  There was a controller log like:
 * 
{code:java}
[2022-12-04 08:01:04,013] INFO [Controller id=XX] Newly added brokers: , 
deleted brokers: 1, bounced brokers: {code}

 *** However, somehow the broker was able to continue receiving produce requests
  We're still working on investigating how this is possible though.
  Indeed, broker 1 was somewhat "alive" and kept working according to 
server.log
 *** In other words, broker 1 became a "zombie"
 ** broker 2 was elected as the new leader
 *** broker 3 became a follower of broker 2
 *** However, since broker 1 was still out of the cluster, it didn't receive 
LeaderAndIsr, so broker 1 kept considering itself the leader of tp-0
 ** Meanwhile, the producer kept sending produce requests to broker 1 and the 
requests failed with REQUEST_TIMED_OUT because no brokers replicated from broker 1.
 *** REQUEST_TIMED_OUT doesn't trigger a metadata update, so the producer didn't have 
a chance to update its stale metadata

 

So I suggest requesting a metadata update even on a REQUEST_TIMED_OUT exception, 
to handle the case where the old leader has become a "zombie"
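
A hedged sketch of what that could look like on the client side (this is not the 
actual Sender code; MetadataUpdater is a hypothetical stand-in for the producer's 
internal metadata object):

{code:java}
import org.apache.kafka.common.errors.InvalidMetadataException;
import org.apache.kafka.common.protocol.Errors;

// Illustrative sketch of the suggestion: treat REQUEST_TIMED_OUT like the
// InvalidMetadataException family for the purpose of refreshing metadata, so a
// producer stuck on a "zombie" leader eventually discovers the new one.
final class ProduceErrorHandlingSketch {
    interface MetadataUpdater {          // hypothetical stand-in, not a Kafka class
        void requestUpdate();
    }

    private final MetadataUpdater metadata;

    ProduceErrorHandlingSketch(MetadataUpdater metadata) {
        this.metadata = metadata;
    }

    void handleProduceError(Errors error) {
        boolean staleMetadataSuspected =
                error.exception() instanceof InvalidMetadataException  // current trigger
                        || error == Errors.REQUEST_TIMED_OUT;          // proposed addition
        if (staleMetadataSuspected) {
            metadata.requestUpdate();    // refresh leadership info before the next retry
        }
    }
}
{code}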



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS]: Including TLA+ in the repo

2022-07-26 Thread Haruki Okada
Hi.

That also sounds good to me.

TLA+ spec helps us to understand the protocol from a high-level.
Also I agree with Tim's point that the spread of familiarity with
formal methods would be beneficial in the long term.

2022年7月26日(火) 23:58 Tom Bentley :

> Hi,
>
> I noticed that TLA+ has featured in the Test Plans of a couple of recent
> KIPs [1,2]. This is a good thing in my opinion. I'm aware that TLA+ has
> been used in the past to prove properties of various parts of the Kafka
> protocol [3,4].
>
> The point I wanted to raise is that I think it would be beneficial to the
> community if these models could be part of the main Kafka repo. That way
> there are fewer hurdles to their discoverability and it makes it easier for
> people to compare the implementation with the spec. Spreading familiarity
> with TLA+ within the community is also a potential side-benefit.
>
> I notice that the specs in [4] are MIT-licensed, but according to the
> Apache 3rd party license policy [5] it should be OK to include.
>
> Thoughts?
>
> Kind regards,
>
> Tom
>
> [1]:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol#KIP848:TheNextGenerationoftheConsumerRebalanceProtocol-TestPlan
> [2]:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-853%3A+KRaft+Voter+Changes#KIP853:KRaftVoterChanges-TestPlan
> [3]: https://github.com/hachikuji/kafka-specification
> [4]:
>
> https://github.com/Vanlightly/raft-tlaplus/tree/main/specifications/pull-raft
> [5]: https://www.apache.org/legal/resolved.html
>


-- 

Okada Haruki
ocadar...@gmail.com



KAFKA-13572 Negative preferred replica imbalance metric

2022-07-14 Thread Haruki Okada
Hi, Kafka.

We found that the race in topic-deletion procedure could cause the
preferred replica imbalance metric to be negative.
The phenomenon can easily happen when many topics are deleted at once, and
since we use the metric for monitoring, we have to restart the controller
to fix the metric every time it happens.

I submitted a patch to fix it: https://github.com/apache/kafka/pull/12405
It'd be appreciated if anyone could review the PR.


Thanks,

-- 

Okada Haruki
ocadar...@gmail.com



Re: [DISCUSS] Apache Kafka 3.2.0 release

2022-02-16 Thread Haruki Okada
Hi Bruno,

Thank you for driving the release !
Can we add KIP-764 to the plan? (
https://cwiki.apache.org/confluence/display/KAFKA/KIP-764%3A+Configurable+backlog+size+for+creating+Acceptor
)


Thanks,

2022年2月16日(水) 18:22 Bruno Cadonna :

> Hi Julien,
>
> Thank you for the feedback on the release plan.
>
> I added KIP-808 to the plan.
>
> Best,
> Bruno
>
> On 16.02.22 09:10, Julien Chanaud wrote:
> > Hello Bruno,
> >
> > Can we also add KIP-808 to the plan ?
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-808%3A+Add+support+for+different+unix+precisions+in+TimestampConverter+SMT
> >
> > Thank you,
> >
> > Julien
> >
> > Le mar. 15 févr. 2022 à 19:01, Mickael Maison 
> a
> > écrit :
> >
> >> Hi Bruno,
> >>
> >> Thanks for publishing the release plan!
> >> Can we add KIP-769 to the plan?
> >>
> >> Thanks,
> >> Mickael
> >>
> >> On Tue, Feb 15, 2022 at 12:37 PM Bruno Cadonna 
> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> I published a release plan for the Apache Kafka 3.2.0 release here:
> >>> https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.2.0
> >>>
> >>> KIP Freeze: 2 March 2022
> >>> Feature Freeze: 16 March 2022
> >>> Code Freeze:30 March 2022
> >>>
> >>> At least two weeks of stabilization will follow Code Freeze.
> >>>
> >>> Please let me know if should add or remove KIPs from the plan or if you
> >>> have any other objections.
> >>>
> >>> Best,
> >>> Bruno
> >>>
> >>>
> >>> On 04.02.22 16:03, Bruno Cadonna wrote:
>  Hi,
> 
>  I'd like to volunteer to be the release manager for our next
>  feature release, 3.2.0. If there are no objections, I'll send
>  out the release plan soon.
> 
>  Best,
>  Bruno
> >>
> >
>


-- 

Okada Haruki
ocadar...@gmail.com



Re: [DISCUSS] KIP-719: Add Log4J2 Appender

2021-12-23 Thread Haruki Okada
Thanks for the clarification.

About 2, I wasn't aware of those concerns.
Let me check them first.


Thanks,

2021年12月23日(木) 13:37 Dongjin Lee :

> Hi Haruki,
>
>
> Thanks for organizing the issue.
>
>
> If the community prefers logback, I will gladly change the dependency and
> update the PR. However, it has the following issues:
>
>
> 1. The log4j2 vulnerabilities seem mostly fixed, and KIP-653 + KIP-719 are
> not released yet. So, using log4j2 (whose recent update pace is so high)
> will not affect the users.
>
>
> 2. To switch to logback, the following features should be reworked:
>
>
>   a. Dynamic logger level configuration (core, connect)
>
>   b. Logging tests (streams)
>
>   c. Kafka Appender (tools)
>
>
> a and b are the most challenging ones since there is little documentation
> on how to do this, so it requires analyzing the implementation itself.
> (what I actually did with log4j2) About c, logback does not provide a Kafka
> Appender so we have to provide an equivalent.
>
>
> It is why I prefer to use log4j2. How do you think?
>
>
> Thanks,
>
> Dongjin
>
>
> On Thu, Dec 23, 2021 at 9:01 AM Haruki Okada  wrote:
>
> > Hi, Dongjin,
> >
> > Sorry for interrupting the discussion.
> > And thank you for your hard work about KIP-653, KIP-719.
> >
> > I understand that KIP-653 is already accepted so log4j2 is the choice of
> > the Kafka community though, I'm now feeling that logback is a better
> choice
> > here.
> >
> > Reasons:
> >
> > - even after "log4shell", several vulnerabilities found on log4j2 so new
> > versions are released and users have to update in high-pace
> > * actually, a CVE was also reported for logback (CVE-2021-42550) but
> it
> > requires edit-permission of the config file for an attacker so it's much
> > less threatening
> > - log4j1.x and logback are made by same developer (ceki), so
> substantially
> > the successor of log4j1 is logback rather than log4j2
> > - in Hadoop project, seems similar suggestion was made from a PMC
> > * https://issues.apache.org/jira/browse/HADOOP-12956
> >
> >
> > What do you think about adopting logback instead?
> >
> >
> > Thanks,
> >
> > 2021年12月21日(火) 18:02 Dongjin Lee :
> >
> > > Hi Mickael,
> > >
> > > > In the meantime, you may want to bump the VOTE thread too.
> > >
> > > Sure, I just reset the voting thread with a brief context.
> > >
> > > Thanks,
> > > Dongjin
> > >
> > > On Tue, Dec 21, 2021 at 2:13 AM Mickael Maison <
> mickael.mai...@gmail.com
> > >
> > > wrote:
> > >
> > > > Thanks Dongjin!
> > > >
> > > > I'll take a look soon.
> > > > In the meantime, you may want to bump the VOTE thread too.
> > > >
> > > > Best,
> > > > Mickael
> > > >
> > > >
> > > > On Sat, Dec 18, 2021 at 10:00 AM Dongjin Lee 
> > wrote:
> > > > >
> > > > > Hi Mickael,
> > > > >
> > > > > Finally, I did it! As you can see at the PR
> > > > > <https://github.com/apache/kafka/pull/10244>, KIP-719 now uses
> > > log4j2's
> > > > > Kafka appender, and log4j-appender is not used by the other modules
> > > > > anymore. You can see how it will work with KIP-653 at this preview
> > > > > <http://home.apache.org/~dongjin/post/apache-kafka-log4j2-support/
> >,
> > > > based
> > > > > on Apache Kafka 3.0.0. The proposal document
> > > > > <
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-719%3A+Deprecate+Log4J+Appender
> > > > >
> > > > > is also updated accordingly, with its title.
> > > > >
> > > > > There is a minor issue on log4j2
> > > > > <https://issues.apache.org/jira/browse/LOG4J2-3256>, but it seems
> > like
> > > > it
> > > > > will be resolved soon.
> > > > >
> > > > > Best,
> > > > > Dongjin
> > > > >
> > > > > On Wed, Dec 15, 2021 at 9:28 PM Dongjin Lee 
> > > wrote:
> > > > >
> > > > > > Hi Mickael,
> > > > > >
> > > > > > > Can we do step 3 without breaking any compatibility? If so then
> > > that
> > > > > > sounds like a good 

Re: [DISCUSS] KIP-719: Add Log4J2 Appender

2021-12-22 Thread Haruki Okada
Hi, Dongjin,

Sorry for interrupting the discussion.
And thank you for your hard work about KIP-653, KIP-719.

I understand that KIP-653 is already accepted so log4j2 is the choice of
the Kafka community though, I'm now feeling that logback is a better choice
here.

Reasons:

- even after "log4shell", several vulnerabilities were found in log4j2, so new
versions keep being released and users have to update at a high pace
* actually, a CVE was also reported for logback (CVE-2021-42550), but it
requires edit permission on the config file for an attacker, so it's much
less threatening
- log4j 1.x and logback are made by the same developer (Ceki), so the de facto
successor of log4j 1.x is logback rather than log4j2
- in the Hadoop project, it seems a similar suggestion was made by a PMC member
* https://issues.apache.org/jira/browse/HADOOP-12956


What do you think about adopting logback instead?


Thanks,

2021年12月21日(火) 18:02 Dongjin Lee :

> Hi Mickael,
>
> > In the meantime, you may want to bump the VOTE thread too.
>
> Sure, I just reset the voting thread with a brief context.
>
> Thanks,
> Dongjin
>
> On Tue, Dec 21, 2021 at 2:13 AM Mickael Maison 
> wrote:
>
> > Thanks Dongjin!
> >
> > I'll take a look soon.
> > In the meantime, you may want to bump the VOTE thread too.
> >
> > Best,
> > Mickael
> >
> >
> > On Sat, Dec 18, 2021 at 10:00 AM Dongjin Lee  wrote:
> > >
> > > Hi Mickael,
> > >
> > > Finally, I did it! As you can see at the PR
> > > , KIP-719 now uses
> log4j2's
> > > Kafka appender, and log4j-appender is not used by the other modules
> > > anymore. You can see how it will work with KIP-653 at this preview
> > > ,
> > based
> > > on Apache Kafka 3.0.0. The proposal document
> > > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-719%3A+Deprecate+Log4J+Appender
> > >
> > > is also updated accordingly, with its title.
> > >
> > > There is a minor issue on log4j2
> > > , but it seems like
> > it
> > > will be resolved soon.
> > >
> > > Best,
> > > Dongjin
> > >
> > > On Wed, Dec 15, 2021 at 9:28 PM Dongjin Lee 
> wrote:
> > >
> > > > Hi Mickael,
> > > >
> > > > > Can we do step 3 without breaking any compatibility? If so then
> that
> > > > sounds like a good idea.
> > > >
> > > > As far as I know, the answer is yes; I am now updating my PR, so I
> will
> > > > notify you as soon as I complete the work.
> > > >
> > > > Best,
> > > > Dongjin
> > > >
> > > > On Wed, Dec 15, 2021 at 2:00 AM Mickael Maison <
> > mickael.mai...@gmail.com>
> > > > wrote:
> > > >
> > > >> Hi Dongjin,
> > > >>
> > > >> Sorry for the late reply. Can we do step 3 without breaking any
> > > >> compatibility? If so then that sounds like a good idea.
> > > >>
> > > >> Thanks,
> > > >> Mickael
> > > >>
> > > >>
> > > >>
> > > >> On Tue, Nov 23, 2021 at 2:08 PM Dongjin Lee 
> > wrote:
> > > >> >
> > > >> > Hi Mickael,
> > > >> >
> > > >> > I also thought over the issue thoroughly and would like to
> propose a
> > > >> minor
> > > >> > change to your proposal:
> > > >> >
> > > >> > 1. Deprecate log4j-appender now
> > > >> > 2. Document how to migrate into logging-log4j2
> > > >> > 3. (Changed) Replace the log4j-appender (in turn log4j 1.x)
> > > >> dependencies in
> > > >> > tools, trogdor, and shell and upgrade to log4j2 in 3.x, removing
> > log4j
> > > >> 1.x
> > > >> > dependencies.
> > > >> > 4. (Changed) Remove log4j-appender in Kafka 4.0
> > > >> >
> > > >> > What we need to do for the log4j2 upgrade is just removing the
> log4j
> > > >> > dependencies only, for they can cause a classpath error. And
> > actually,
> > > >> we
> > > >> > can do it without discontinuing publishing the log4j-appender
> > artifact.
> > > >> So,
> > > >> > I suggest separating the upgrade to log4j2 and removing the
> > > >> log4j-appender
> > > >> > module.
> > > >> >
> > > >> > How do you think? If you agree, I will update the KIP and the PR
> > > >> > accordingly ASAP.
> > > >> >
> > > >> > Thanks,
> > > >> > Dongjin
> > > >> >
> > > >> > On Mon, Nov 15, 2021 at 8:06 PM Mickael Maison <
> > > >> mickael.mai...@gmail.com>
> > > >> > wrote:
> > > >> >
> > > >> > > Hi Dongjin,
> > > >> > >
> > > >> > > Thanks for the clarifications.
> > > >> > >
> > > >> > > I wonder if a simpler course of action could be:
> > > >> > > - Deprecate log4j-appender now
> > > >> > > - Document how to use logging-log4j2
> > > >> > > - Remove log4j-appender and all the log4j dependencies in Kafka
> > 4.0
> > > >> > >
> > > >> > > This delays KIP-653 till Kafka 4.0 but (so far) Kafka is not
> > directly
> > > >> > > affected by the log4j CVEs. At least this gives us a clear and
> > simple
> > > >> > > roadmap to follow.
> > > >> > >
> > > >> > > What do you think?
> > > >> > >
> > > >> > > On Tue, Nov 9, 2021 at 12:12 PM Dongjin Lee  >
> > > >> wrote:
> > > >> > > >
> > > >> > > > Hi Mickael,
> > > >> > > >
> > > >> > > > I

[jira] [Created] (KAFKA-13403) KafkaServer crashes when deleting topics due to the race in log deletion

2021-10-25 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-13403:


 Summary: KafkaServer crashes when deleting topics due to the race 
in log deletion
 Key: KAFKA-13403
 URL: https://issues.apache.org/jira/browse/KAFKA-13403
 Project: Kafka
  Issue Type: Bug
  Components: core
Affects Versions: 2.4.1
Reporter: Haruki Okada


h2. Environment
 * OS: CentOS Linux release 7.6
 * Kafka version: 2.4.1

 ** But as far as I checked the code, I think the same phenomenon could happen even 
on trunk
 * Kafka log directory: RAID1+0 (i.e. not using JBOD so only single log.dirs is 
set)
 * Java version: AdoptOpenJDK 1.8.0_282

h2. Phenomenon

When we were in the middle of deleting several topics with `kafka-topics.sh 
--delete --topic blah-blah`, one broker in our cluster crashed due to the following 
exception:

 
{code:java}
[2021-10-21 18:19:19,122] ERROR Shutdown broker because all log dirs in 
/data/kafka have failed (kafka.log.LogManager)
{code}
 

 

We also found NoSuchFileException was thrown right before the crash when 
LogManager tried to delete logs for some partitions.

 
{code:java}
[2021-10-21 18:19:18,849] ERROR Error while deleting log for foo-bar-topic-5 in 
dir /data/kafka (kafka.server.LogDirFailureChannel)
java.nio.file.NoSuchFileException: 
/data/kafka/foo-bar-topic-5.df3626d2d9eb41a2aeb0b8d55d7942bd-delete/03877066.timeindex.deleted
at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at 
sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
at 
sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
at 
sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
at java.nio.file.Files.readAttributes(Files.java:1737)
at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)
at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)
at java.nio.file.FileTreeWalker.next(FileTreeWalker.java:372)
at java.nio.file.Files.walkFileTree(Files.java:2706)
at java.nio.file.Files.walkFileTree(Files.java:2742)
at org.apache.kafka.common.utils.Utils.delete(Utils.java:732)
at kafka.log.Log.$anonfun$delete$2(Log.scala:2036)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at kafka.log.Log.maybeHandleIOException(Log.scala:2343)
at kafka.log.Log.delete(Log.scala:2030)
at kafka.log.LogManager.deleteLogs(LogManager.scala:826)
at kafka.log.LogManager.$anonfun$deleteLogs$6(LogManager.scala:840)
at 
kafka.utils.KafkaScheduler.$anonfun$schedule$2(KafkaScheduler.scala:116)
at kafka.utils.CoreUtils$$anon$1.run(CoreUtils.scala:65)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}
So, the log dir was marked as offline, which ended up in a KafkaServer crash 
because the broker has only a single log dir.
h2. Cause

We also found below logs right before the NoSuchFileException.

 
{code:java}
[2021-10-21 18:18:17,829] INFO Log for partition foo-bar-5 is renamed to 
/data/kafka/foo-bar-5.df3626d2d9eb41a2aeb0b8d55d7942bd-delete and is scheduled 
for deletion (kafka.log.LogManager)
[2021-10-21 18:18:17,900] INFO [Log partition=foo-bar-5, dir=/data/kafka] Found 
deletable segments with base offsets [3877066] due to retention time 
17280ms breach (kafka.log.Log)[2021-10-21 18:18:17,901] INFO [Log 
partition=foo-bar-5, dir=/data/kafka] Scheduling segments for deletion 
List(LogSegment(baseOffset=3877066, size=90316366, 
lastModifiedTime=1634634956000, largestTime=1634634955854)) (kafka.log.Log)
{code}
After checking through Kafka code, we concluded that there was a race between 
"kafka-log-retention" and "kafka-delete-logs" scheduler threads.

 

Detailed timeline was like below:

- Precondition: there were two log segments (3877066, 4271262) for the partition 
foo-bar-5

 
||time||thread||event||files under partition dir||
|2021-10-21 18:18:17,901|kafka-log-retention|Scheduled deletion for segment 
3877066 due to retention|3877066, 4271262|
|2021-10-21 18:19:17,830|kafka-delete-logs

Re: [VOTE] KIP-764 Configurable backlog size for creating Acceptor

2021-10-20 Thread Haruki Okada
Hi all,

Thanks for participating in the vote.
With 3 binding +1 votes (Rajini Sivaram, Mickael Maison, David Jacot) and 2
non-binding +1 votes (Luke Chen, Lucas Bradstreet), the vote passes.

I will update the KIP status then raise a PR later.

Thanks,

2021年10月20日(水) 20:58 David Jacot :

> +1 (binding). Thanks for the KIP!
>
> Best,
> David
>
> On Wed, Oct 20, 2021 at 12:09 PM Haruki Okada  wrote:
>
> > Currently, KIP-764 got
> > +1 binding
> > +2 non-binding
> >
> > votes. Could anyone help checking the KIP and join the vote?
> >
> > Thanks,
> >
> > 2021年10月18日(月) 18:21 Haruki Okada :
> >
> > > Hi Lucas.
> > >
> > > Thanks for the vote.
> > > As far as I understand, it should be enough to adjust
> > `net.core.somaxconn`
> > > and `net.ipv4.tcp_max_syn_backlog` accordingly for allocating
> sufficient
> > > backlog size.
> > >
> > >
> > > 2021年10月18日(月) 17:37 Lucas Bradstreet :
> > >
> > >> The other related kernel config that I can recall is
> > >> net.ipv4.tcp_max_syn_backlog.
> > >>
> > >> On Mon, Oct 18, 2021 at 4:32 PM Lucas Bradstreet 
> > >> wrote:
> > >>
> > >> > Thank you for the KIP. +1 (non-binding)
> > >> >
> > >> > For the implementation can we ensure that any kernel parameters that
> > may
> > >> > need to also be adjusted are documented in the config documentation
> > >> > (e.g. net.core.somaxconn)?
> > >> >
> > >> >
> > >> > On Mon, Oct 18, 2021 at 4:23 PM Haruki Okada 
> > >> wrote:
> > >> >
> > >> >> Hi Luke.
> > >> >>
> > >> >> Thank you for the vote.
> > >> >> Updated KIP to link to ServerSocket#bind javadoc.
> > >> >>
> > >> >>
> > >> >> 2021年10月18日(月) 17:00 Luke Chen :
> > >> >>
> > >> >> > Hi Okada,
> > >> >> > Thanks for the KIP.
> > >> >> > +1 (non-binding)
> > >> >> >
> > >> >> > One thing to add is that you should add ServerSocket#bind java
> doc
> > >> link
> > >> >> > into the KIP.
> > >> >> > I don't think everyone is familiar with the definition of the
> > method
> > >> >> > parameters.
> > >> >> >
> > >> >> > Thank you.
> > >> >> > Luke
> > >> >> >
> > >> >> > On Mon, Oct 18, 2021 at 3:43 PM Haruki Okada <
> ocadar...@gmail.com>
> > >> >> wrote:
> > >> >> >
> > >> >> > > Hi Kafka.
> > >> >> > >
> > >> >> > > Let me bump this VOTE thread for the KIP.
> > >> >> > > We applied proposed changes in the KIP to our large Kafka
> cluster
> > >> by
> > >> >> > > building patched Kafka internally and confirmed it's working
> > well.
> > >> >> > >
> > >> >> > > Please feel free to give your feedback if there's any points to
> > be
> > >> >> > > clarified in the KIP.
> > >> >> > >
> > >> >> > > Thanks,
> > >> >> > >
> > >> >> > > 2021年8月9日(月) 11:25 Haruki Okada :
> > >> >> > >
> > >> >> > > > Thanks for your comment LI-san.
> > >> >> > > >
> > >> >> > > > Could anyone else review and vote for the KIP?
> > >> >> > > >
> > >> >> > > > I think the situation described in the KIP's motivation can
> > >> happen
> > >> >> in
> > >> >> > any
> > >> >> > > > large-scale Kafka deployment, so may be helpful for many
> users
> > >> while
> > >> >> > the
> > >> >> > > > proposed changes are small enough.
> > >> >> > > >
> > >> >> > > >
> > >> >> > > > Thanks,
> > >> >> > > >
> > >> >> > > > 2021年8月3日(火) 15:49 Xiangyuan LI :
> > >> >> > > >
> &g

Re: [VOTE] KIP-764 Configurable backlog size for creating Acceptor

2021-10-20 Thread Haruki Okada
Currently, KIP-764 got
+1 binding
+2 non-binding

votes. Could anyone help checking the KIP and join the vote?

Thanks,

2021年10月18日(月) 18:21 Haruki Okada :

> Hi Lucas.
>
> Thanks for the vote.
> As far as I understand, it should be enough to adjust `net.core.somaxconn`
> and `net.ipv4.tcp_max_syn_backlog` accordingly for allocating sufficient
> backlog size.
>
>
> 2021年10月18日(月) 17:37 Lucas Bradstreet :
>
>> The other related kernel config that I can recall is
>> net.ipv4.tcp_max_syn_backlog.
>>
>> On Mon, Oct 18, 2021 at 4:32 PM Lucas Bradstreet 
>> wrote:
>>
>> > Thank you for the KIP. +1 (non-binding)
>> >
>> > For the implementation can we ensure that any kernel parameters that may
>> > need to also be adjusted are documented in the config documentation
>> > (e.g. net.core.somaxconn)?
>> >
>> >
>> > On Mon, Oct 18, 2021 at 4:23 PM Haruki Okada 
>> wrote:
>> >
>> >> Hi Luke.
>> >>
>> >> Thank you for the vote.
>> >> Updated KIP to link to ServerSocket#bind javadoc.
>> >>
>> >>
>> >> 2021年10月18日(月) 17:00 Luke Chen :
>> >>
>> >> > Hi Okada,
>> >> > Thanks for the KIP.
>> >> > +1 (non-binding)
>> >> >
>> >> > One thing to add is that you should add ServerSocket#bind java doc
>> link
>> >> > into the KIP.
>> >> > I don't think everyone is familiar with the definition of the method
>> >> > parameters.
>> >> >
>> >> > Thank you.
>> >> > Luke
>> >> >
>> >> > On Mon, Oct 18, 2021 at 3:43 PM Haruki Okada 
>> >> wrote:
>> >> >
>> >> > > Hi Kafka.
>> >> > >
>> >> > > Let me bump this VOTE thread for the KIP.
>> >> > > We applied proposed changes in the KIP to our large Kafka cluster
>> by
>> >> > > building patched Kafka internally and confirmed it's working well.
>> >> > >
>> >> > > Please feel free to give your feedback if there's any points to be
>> >> > > clarified in the KIP.
>> >> > >
>> >> > > Thanks,
>> >> > >
>> >> > > 2021年8月9日(月) 11:25 Haruki Okada :
>> >> > >
>> >> > > > Thanks for your comment LI-san.
>> >> > > >
>> >> > > > Could anyone else review and vote for the KIP?
>> >> > > >
>> >> > > > I think the situation described in the KIP's motivation can
>> happen
>> >> in
>> >> > any
>> >> > > > large-scale Kafka deployment, so may be helpful for many users
>> while
>> >> > the
>> >> > > > proposed changes are small enough.
>> >> > > >
>> >> > > >
>> >> > > > Thanks,
>> >> > > >
>> >> > > > 2021年8月3日(火) 15:49 Xiangyuan LI :
>> >> > > >
>> >> > > >> Hi Haruki Okada:
>> >> > > >>   i read your comment, thx for your detail explain!
>> >> > > >>   add backlog parameter is a useful suggestion, hope it could
>> >> added to
>> >> > > >> kafka.
>> >> > > >>
>> >> > > >> Haruki Okada  于2021年8月2日周一 上午7:43写道:
>> >> > > >>
>> >> > > >> > Hi, Kafka.
>> >> > > >> >
>> >> > > >> > I would like to start a vote on KIP that makes SocketServer
>> >> > acceptor's
>> >> > > >> > backlog size configurable.
>> >> > > >> >
>> >> > > >> > KIP:
>> >> > > >> >
>> >> > > >> >
>> >> > > >>
>> >> > >
>> >> >
>> >>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-764%3A+Configurable+backlog+size+for+creating+Acceptor
>> >> > > >> >
>> >> > > >> > Discussion thread:
>> >> > > >> >
>> >> > > >> >
>> >> > > >>
>> >> > >
>> >> >
>> >>
>> https://lists.apache.org/thread.html/rd77469b7de0190d601dd37bd6894e1352a674d08038bcfe7ff68a1e0%40%3Cdev.kafka.apache.org%3E
>> >> > > >> >
>> >> > > >> > Thanks,
>> >> > > >> >
>> >> > > >> > --
>> >> > > >> > 
>> >> > > >> > Okada Haruki
>> >> > > >> > ocadar...@gmail.com
>> >> > > >> > 
>> >> > > >> >
>> >> > > >>
>> >> > > >
>> >> > > >
>> >> > > > --
>> >> > > > 
>> >> > > > Okada Haruki
>> >> > > > ocadar...@gmail.com
>> >> > > > 
>> >> > > >
>> >> > >
>> >> > >
>> >> > > --
>> >> > > 
>> >> > > Okada Haruki
>> >> > > ocadar...@gmail.com
>> >> > > 
>> >> > >
>> >> >
>> >>
>> >>
>> >> --
>> >> 
>> >> Okada Haruki
>> >> ocadar...@gmail.com
>> >> 
>> >>
>> >
>>
>
>
> --
> 
> Okada Haruki
> ocadar...@gmail.com
> 
>


-- 

Okada Haruki
ocadar...@gmail.com



Re: [VOTE] KIP-764 Configurable backlog size for creating Acceptor

2021-10-18 Thread Haruki Okada
Hi Lucas.

Thanks for the vote.
As far as I understand, it should be enough to adjust `net.core.somaxconn`
and `net.ipv4.tcp_max_syn_backlog` accordingly to allocate a sufficient
backlog size.
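
For context, the backlog being made configurable is the second argument of
ServerSocket#bind; a minimal sketch (the port and backlog values are just
examples, not the KIP's defaults):

{code:java}
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Minimal sketch of what the KIP makes configurable: the backlog argument passed
// to ServerSocket#bind by the Acceptor. Values are examples, not the KIP's defaults.
// The effective queue length is also capped by the kernel via net.core.somaxconn,
// and the SYN queue is sized by net.ipv4.tcp_max_syn_backlog.
public class AcceptorBacklogSketch {
    public static void main(String[] args) throws Exception {
        int backlogSize = 500;   // example value for the proposed config
        try (ServerSocket serverSocket = new ServerSocket()) {
            serverSocket.bind(new InetSocketAddress("0.0.0.0", 9092), backlogSize);
            System.out.println("Listening with a requested accept backlog of " + backlogSize);
        }
    }
}
{code}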


2021年10月18日(月) 17:37 Lucas Bradstreet :

> The other related kernel config that I can recall is
> net.ipv4.tcp_max_syn_backlog.
>
> On Mon, Oct 18, 2021 at 4:32 PM Lucas Bradstreet 
> wrote:
>
> > Thank you for the KIP. +1 (non-binding)
> >
> > For the implementation can we ensure that any kernel parameters that may
> > need to also be adjusted are documented in the config documentation
> > (e.g. net.core.somaxconn)?
> >
> >
> > On Mon, Oct 18, 2021 at 4:23 PM Haruki Okada 
> wrote:
> >
> >> Hi Luke.
> >>
> >> Thank you for the vote.
> >> Updated KIP to link to ServerSocket#bind javadoc.
> >>
> >>
> >> 2021年10月18日(月) 17:00 Luke Chen :
> >>
> >> > Hi Okada,
> >> > Thanks for the KIP.
> >> > +1 (non-binding)
> >> >
> >> > One thing to add is that you should add ServerSocket#bind java doc
> link
> >> > into the KIP.
> >> > I don't think everyone is familiar with the definition of the method
> >> > parameters.
> >> >
> >> > Thank you.
> >> > Luke
> >> >
> >> > On Mon, Oct 18, 2021 at 3:43 PM Haruki Okada 
> >> wrote:
> >> >
> >> > > Hi Kafka.
> >> > >
> >> > > Let me bump this VOTE thread for the KIP.
> >> > > We applied proposed changes in the KIP to our large Kafka cluster by
> >> > > building patched Kafka internally and confirmed it's working well.
> >> > >
> >> > > Please feel free to give your feedback if there's any points to be
> >> > > clarified in the KIP.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > 2021年8月9日(月) 11:25 Haruki Okada :
> >> > >
> >> > > > Thanks for your comment LI-san.
> >> > > >
> >> > > > Could anyone else review and vote for the KIP?
> >> > > >
> >> > > > I think the situation described in the KIP's motivation can happen
> >> in
> >> > any
> >> > > > large-scale Kafka deployment, so may be helpful for many users
> while
> >> > the
> >> > > > proposed changes are small enough.
> >> > > >
> >> > > >
> >> > > > Thanks,
> >> > > >
> >> > > > 2021年8月3日(火) 15:49 Xiangyuan LI :
> >> > > >
> >> > > >> Hi Haruki Okada:
> >> > > >>   i read your comment, thx for your detail explain!
> >> > > >>   add backlog parameter is a useful suggestion, hope it could
> >> added to
> >> > > >> kafka.
> >> > > >>
> >> > > >> Haruki Okada  于2021年8月2日周一 上午7:43写道:
> >> > > >>
> >> > > >> > Hi, Kafka.
> >> > > >> >
> >> > > >> > I would like to start a vote on KIP that makes SocketServer
> >> > acceptor's
> >> > > >> > backlog size configurable.
> >> > > >> >
> >> > > >> > KIP:
> >> > > >> >
> >> > > >> >
> >> > > >>
> >> > >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-764%3A+Configurable+backlog+size+for+creating+Acceptor
> >> > > >> >
> >> > > >> > Discussion thread:
> >> > > >> >
> >> > > >> >
> >> > > >>
> >> > >
> >> >
> >>
> https://lists.apache.org/thread.html/rd77469b7de0190d601dd37bd6894e1352a674d08038bcfe7ff68a1e0%40%3Cdev.kafka.apache.org%3E
> >> > > >> >
> >> > > >> > Thanks,
> >> > > >> >
> >> > > >> > --
> >> > > >> > 
> >> > > >> > Okada Haruki
> >> > > >> > ocadar...@gmail.com
> >> > > >> > 
> >> > > >> >
> >> > > >>
> >> > > >
> >> > > >
> >> > > > --
> >> > > > 
> >> > > > Okada Haruki
> >> > > > ocadar...@gmail.com
> >> > > > 
> >> > > >
> >> > >
> >> > >
> >> > > --
> >> > > 
> >> > > Okada Haruki
> >> > > ocadar...@gmail.com
> >> > > 
> >> > >
> >> >
> >>
> >>
> >> --
> >> 
> >> Okada Haruki
> >> ocadar...@gmail.com
> >> 
> >>
> >
>


-- 

Okada Haruki
ocadar...@gmail.com



Re: [VOTE] KIP-764 Configurable backlog size for creating Acceptor

2021-10-18 Thread Haruki Okada
Hi Luke.

Thank you for the vote.
Updated KIP to link to ServerSocket#bind javadoc.
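
For reference, a minimal, illustrative Java sketch (not Kafka code; the
address, port and backlog value are made up) of what the backlog argument of
ServerSocket#bind controls:

    import java.net.InetSocketAddress;
    import java.net.ServerSocket;

    public class BacklogExample {
        public static void main(String[] args) throws Exception {
            ServerSocket serverSocket = new ServerSocket();
            // The second argument of bind() is the requested maximum number of
            // pending (not yet accepted) connections. Calling bind(endpoint)
            // without it falls back to a small default (50 in practice), which is
            // the fixed value KIP-764 makes configurable for the broker's Acceptor.
            serverSocket.bind(new InetSocketAddress("0.0.0.0", 9092), 1024);
            // ... accept() loop would go here ...
            serverSocket.close();
        }
    }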


2021年10月18日(月) 17:00 Luke Chen :

> Hi Okada,
> Thanks for the KIP.
> +1 (non-binding)
>
> One thing to add is that you should add ServerSocket#bind java doc link
> into the KIP.
> I don't think everyone is familiar with the definition of the method
> parameters.
>
> Thank you.
> Luke
>
> On Mon, Oct 18, 2021 at 3:43 PM Haruki Okada  wrote:
>
> > Hi Kafka.
> >
> > Let me bump this VOTE thread for the KIP.
> > We applied proposed changes in the KIP to our large Kafka cluster by
> > building patched Kafka internally and confirmed it's working well.
> >
> > Please feel free to give your feedback if there's any points to be
> > clarified in the KIP.
> >
> > Thanks,
> >
> > 2021年8月9日(月) 11:25 Haruki Okada :
> >
> > > Thanks for your comment LI-san.
> > >
> > > Could anyone else review and vote for the KIP?
> > >
> > > I think the situation described in the KIP's motivation can happen in
> any
> > > large-scale Kafka deployment, so may be helpful for many users while
> the
> > > proposed changes are small enough.
> > >
> > >
> > > Thanks,
> > >
> > > 2021年8月3日(火) 15:49 Xiangyuan LI :
> > >
> > >> Hi Haruki Okada:
> > >>   i read your comment, thx for your detail explain!
> > >>   add backlog parameter is a useful suggestion, hope it could added to
> > >> kafka.
> > >>
> > >> Haruki Okada  于2021年8月2日周一 上午7:43写道:
> > >>
> > >> > Hi, Kafka.
> > >> >
> > >> > I would like to start a vote on KIP that makes SocketServer
> acceptor's
> > >> > backlog size configurable.
> > >> >
> > >> > KIP:
> > >> >
> > >> >
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-764%3A+Configurable+backlog+size+for+creating+Acceptor
> > >> >
> > >> > Discussion thread:
> > >> >
> > >> >
> > >>
> >
> https://lists.apache.org/thread.html/rd77469b7de0190d601dd37bd6894e1352a674d08038bcfe7ff68a1e0%40%3Cdev.kafka.apache.org%3E
> > >> >
> > >> > Thanks,
> > >> >
> > >> > --
> > >> > 
> > >> > Okada Haruki
> > >> > ocadar...@gmail.com
> > >> > 
> > >> >
> > >>
> > >
> > >
> > > --
> > > 
> > > Okada Haruki
> > > ocadar...@gmail.com
> > > 
> > >
> >
> >
> > --
> > 
> > Okada Haruki
> > ocadar...@gmail.com
> > 
> >
>


-- 

Okada Haruki
ocadar...@gmail.com



Re: [VOTE] KIP-764 Configurable backlog size for creating Acceptor

2021-10-18 Thread Haruki Okada
Hi Kafka.

Let me bump this VOTE thread for the KIP.
We applied the proposed changes in the KIP to our large Kafka cluster by
building a patched Kafka internally and confirmed it is working well.

Please feel free to give your feedback if there are any points to be
clarified in the KIP.

Thanks,

2021年8月9日(月) 11:25 Haruki Okada :

> Thanks for your comment LI-san.
>
> Could anyone else review and vote for the KIP?
>
> I think the situation described in the KIP's motivation can happen in any
> large-scale Kafka deployment, so may be helpful for many users while the
> proposed changes are small enough.
>
>
> Thanks,
>
> 2021年8月3日(火) 15:49 Xiangyuan LI :
>
>> Hi Haruki Okada:
>>   i read your comment, thx for your detail explain!
>>   add backlog parameter is a useful suggestion, hope it could added to
>> kafka.
>>
>> Haruki Okada  于2021年8月2日周一 上午7:43写道:
>>
>> > Hi, Kafka.
>> >
>> > I would like to start a vote on KIP that makes SocketServer acceptor's
>> > backlog size configurable.
>> >
>> > KIP:
>> >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-764%3A+Configurable+backlog+size+for+creating+Acceptor
>> >
>> > Discussion thread:
>> >
>> >
>> https://lists.apache.org/thread.html/rd77469b7de0190d601dd37bd6894e1352a674d08038bcfe7ff68a1e0%40%3Cdev.kafka.apache.org%3E
>> >
>> > Thanks,
>> >
>> > --
>> > 
>> > Okada Haruki
>> > ocadar...@gmail.com
>> > 
>> >
>>
>
>
> --
> 
> Okada Haruki
> ocadar...@gmail.com
> 
>


-- 

Okada Haruki
ocadar...@gmail.com



Re: [DISCUSS] KIP-764 Configurable backlog size for creating Acceptor

2021-08-09 Thread Haruki Okada
Hi, Mao.

> we should name the property socket.listen.backlog.size for better clarity

Sounds good to me. I updated the KIP to name the property
`socket.listen.backlog.size`.
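
Once this lands, usage would presumably be a single broker config entry, e.g.
(value picked arbitrarily for illustration):

    # server.properties
    socket.listen.backlog.size=1024

The effective backlog is still capped by the kernel's net.core.somaxconn on
the broker host.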

2021年8月9日(月) 23:30 David Mao :

> Hi Haruki,
>
> I think it makes sense to have this as a configurable property. I think we
> should name the property socket.listen.backlog.size for better clarity on
> what the property configures. Besides that, the proposal looks good to me.
>
> David
>
> On Wed, Jul 28, 2021 at 8:09 AM Haruki Okada  wrote:
>
> > Hi, Kafka.
> >
> > Does anyone have any thoughts or suggestions about the KIP?
> > If there seems to be no, I would like to start a VOTE later.
> >
> >
> > Thanks,
> >
> >
> > 2021年7月22日(木) 16:17 Haruki Okada :
> >
> > > Hi, Kafka.
> > >
> > > I proposed KIP-764, which tries to add new KafkaConfig to adjust
> > > Acceptor's backlog size.
> > > As described in the KIP and the ticket KAFKA-9648, currently backlog
> size
> > > is fixed value (50) and it may not be enough to handle incoming
> > connections
> > > from massive clients.
> > >
> > > So we would like to make it configurable.
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-764%3A+Configurable+backlog+size+for+creating+Acceptor
> > >
> > > --
> > > 
> > > Okada Haruki
> > > ocadar...@gmail.com
> > > 
> > >
> >
> >
> > --
> > 
> > Okada Haruki
> > ocadar...@gmail.com
> > 
> >
>


-- 

Okada Haruki
ocadar...@gmail.com



Re: [VOTE] KIP-764 Configurable backlog size for creating Acceptor

2021-08-08 Thread Haruki Okada
Thanks for your comment LI-san.

Could anyone else review and vote for the KIP?

I think the situation described in the KIP's motivation can happen in any
large-scale Kafka deployment, so the KIP may be helpful for many users, while
the proposed changes are small.


Thanks,

2021年8月3日(火) 15:49 Xiangyuan LI :

> Hi Haruki Okada:
>   I read your comment, thanks for your detailed explanation!
>   Adding a backlog parameter is a useful suggestion; I hope it can be added to
> Kafka.
>
> Haruki Okada  于2021年8月2日周一 上午7:43写道:
>
> > Hi, Kafka.
> >
> > I would like to start a vote on KIP that makes SocketServer acceptor's
> > backlog size configurable.
> >
> > KIP:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-764%3A+Configurable+backlog+size+for+creating+Acceptor
> >
> > Discussion thread:
> >
> >
> https://lists.apache.org/thread.html/rd77469b7de0190d601dd37bd6894e1352a674d08038bcfe7ff68a1e0%40%3Cdev.kafka.apache.org%3E
> >
> > Thanks,
> >
> > --
> > 
> > Okada Haruki
> > ocadar...@gmail.com
> > 
> >
>


-- 

Okada Haruki
ocadar...@gmail.com



[VOTE] KIP-764 Configurable backlog size for creating Acceptor

2021-08-01 Thread Haruki Okada
Hi, Kafka.

I would like to start a vote on KIP that makes SocketServer acceptor's
backlog size configurable.

KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-764%3A+Configurable+backlog+size+for+creating+Acceptor

Discussion thread:
https://lists.apache.org/thread.html/rd77469b7de0190d601dd37bd6894e1352a674d08038bcfe7ff68a1e0%40%3Cdev.kafka.apache.org%3E

Thanks,

-- 

Okada Haruki
ocadar...@gmail.com



Re: [DISCUSS] KIP-764 Configurable backlog size for creating Acceptor

2021-07-28 Thread Haruki Okada
Hi, Kafka.

Does anyone have any thoughts or suggestions about the KIP?
If not, I would like to start a VOTE later.


Thanks,


2021年7月22日(木) 16:17 Haruki Okada :

> Hi, Kafka.
>
> I proposed KIP-764, which tries to add new KafkaConfig to adjust
> Acceptor's backlog size.
> As described in the KIP and the ticket KAFKA-9648, currently backlog size
> is fixed value (50) and it may not be enough to handle incoming connections
> from massive clients.
>
> So we would like to make it configurable.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-764%3A+Configurable+backlog+size+for+creating+Acceptor
>
> --
> 
> Okada Haruki
> ocadar...@gmail.com
> 
>


-- 

Okada Haruki
ocadar...@gmail.com



[DISCUSS] KIP-764 Configurable backlog size for creating Acceptor

2021-07-22 Thread Haruki Okada
Hi, Kafka.

I proposed KIP-764, which adds a new KafkaConfig to adjust the Acceptor's
backlog size.
As described in the KIP and the ticket KAFKA-9648, the backlog size is
currently a fixed value (50), which may not be enough to handle incoming
connections from a massive number of clients.

So we would like to make it configurable.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-764%3A+Configurable+backlog+size+for+creating+Acceptor

-- 

Okada Haruki
ocadar...@gmail.com



Re: Request permission for KIP creation

2021-07-21 Thread Haruki Okada
Hi Bill.

Thank you for your quick response!

2021年7月22日(木) 7:40 Bill Bejeck :

> Hi Haruki,
>
> You're all set with the wiki permissions now.
>
> Regards,
> Bill
>
> On Wed, Jul 21, 2021 at 1:43 PM Haruki Okada  wrote:
>
> > Hi, Kafka.
> >
> > I would like to request permission for KIP creation under
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > to propose a config-addition related to
> > https://issues.apache.org/jira/browse/KAFKA-9648 .
> >
> > JIRA id: ocadaruma
> > email: ocadar...@gmail.com
> >
> >
> > Thanks,
> >
> > --
> > 
> > Okada Haruki
> > ocadar...@gmail.com
> > 
> >
>


-- 

Okada Haruki
ocadar...@gmail.com



Request permission for KIP creation

2021-07-21 Thread Haruki Okada
Hi, Kafka.

I would like to request permission for KIP creation under
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
to propose a config-addition related to
https://issues.apache.org/jira/browse/KAFKA-9648 .

JIRA id: ocadaruma
email: ocadar...@gmail.com


Thanks,

-- 

Okada Haruki
ocadar...@gmail.com



Re: Splitting partition may cause message loss for consumers

2021-02-18 Thread Haruki Okada
Thanks for your quick response.
Yeah, agree with that. (also replied on the issue)

2021年2月19日(金) 16:25 Luke Chen :

> Hi Okada san,
> Yes, I agree the "latest" setting in this situation is not good, and we
> should document it.
> But I don't think we should change the default auto.offset.reset setting to
> "earliest".
> The auto.offset.reset setting has existed since before Kafka 1.0, which means
> a lot of users are already using it and are used to it now.
>
> Thanks.
> Luke
>
> On Fri, Feb 19, 2021 at 2:24 PM Haruki Okada  wrote:
>
> > Hi, Kafka.
> >
> > Recently I noticed that splitting partition may cause message delivery
> loss
> > for consumers with auto.offset.reset=latest.
> >
> > I described details in https://issues.apache.org/jira/browse/KAFKA-12261
> .
> >
> > Since delivery loss is undesired in most cases, I think this should be
> > described in ConsumerConfig.AUTO_OFFSET_RESET_DOC (or somewhere) at
> least.
> >
> > What do you think?
> >
> >
> > Thanks,
> > --
> > 
> > Okada Haruki
> > ocadar...@gmail.com
> > 
> >
>


-- 

Okada Haruki
ocadar...@gmail.com



Splitting partition may cause message loss for consumers

2021-02-18 Thread Haruki Okada
Hi, Kafka.

Recently I noticed that splitting a topic's partitions may cause message
delivery loss for consumers with auto.offset.reset=latest.

I described details in https://issues.apache.org/jira/browse/KAFKA-12261 .

Since delivery loss is undesired in most cases, I think this should be
described in ConsumerConfig.AUTO_OFFSET_RESET_DOC (or somewhere) at least.

What do you think?


Thanks,
-- 

Okada Haruki
ocadar...@gmail.com



[jira] [Created] (KAFKA-12261) auto.offset.reset should be earliest by default

2021-01-31 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-12261:


 Summary: auto.offset.reset should be earliest by default
 Key: KAFKA-12261
 URL: https://issues.apache.org/jira/browse/KAFKA-12261
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 2.7.0
Reporter: Haruki Okada


As of now, auto.offset.reset of ConsumerConfig is "latest" by default.

This could be a pitfall that causes message delivery loss when we split a
topic's partitions, like below:

Say we have a topic-X which has only 1 partition.
 # Split topic-X into 2 partitions by kafka-topics.sh --alter --topic topic-X
--partitions 2 (topic-X-1 is added).
 # The producer learns that new partitions were added by refreshing metadata
and starts producing to topic-X-1.
 # A bit later, the consumer learns that new partitions were added, triggers a
consumer rebalance, and then starts consuming topic-X-1.
 ** Upon starting consumption, it resets its offset to the log-end-offset.

If the producer sent several records before step 3, they may never be
delivered to the consumer.

This behavior isn't preferable in most cases, so auto.offset.reset should be
set to "earliest" by default to avoid this pitfall.
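
For illustration only (broker address, group id and deserializers are
placeholders), a consumer can already opt out of the pitfall by setting the
reset policy explicitly:

    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class EarliestResetExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "topic-x-consumer");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                StringDeserializer.class.getName());
            // With "earliest", a newly-assigned partition with no committed offset
            // (e.g. topic-X-1 right after the split) is read from the beginning
            // rather than from the log-end-offset, so records produced before the
            // rebalance are not skipped.
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                // consumer.subscribe(...) and poll loop would go here
            }
        }
    }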



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Improving crash-restart catchup speed

2020-12-15 Thread Haruki Okada
Hi.
One possible solution I can imagine is to use a replication throttle:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-73+Replication+Quotas

You can set a lower replication quota for the (other brokers -> failed broker)
replication traffic during catch-up, to prevent exhausting the storage
throughput.
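
A rough sketch with the admin client (broker id, rate and bootstrap address
are placeholders; note that per KIP-73 the throttle only applies to replicas
listed in the leader/follower.replication.throttled.replicas topic configs):

    import java.util.Collection;
    import java.util.List;
    import java.util.Map;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.Admin;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.AlterConfigOp;
    import org.apache.kafka.clients.admin.ConfigEntry;
    import org.apache.kafka.common.config.ConfigResource;

    public class ThrottleCatchUp {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (Admin admin = Admin.create(props)) {
                // Throttle outbound replication on a leader broker ("1") so the
                // recovering broker's catch-up reads don't exhaust its storage
                // throughput.
                ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1");
                AlterConfigOp op = new AlterConfigOp(
                    new ConfigEntry("leader.replication.throttled.rate", "10485760"), // 10 MB/s
                    AlterConfigOp.OpType.SET);
                Map<ConfigResource, Collection<AlterConfigOp>> updates =
                    Map.of(broker, List.of(op));
                admin.incrementalAlterConfigs(updates).all().get();
            }
        }
    }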

2020年12月15日(火) 20:56 Gokul Ramanan Subramanian :

> Hi.
>
>
> When a broker crashes and restarts after a long time (say 30 minutes),
> the broker has to do some work to sync all its replicas with the
> leaders. If the broker is a preferred replica for some partition, the
> broker may become the leader as part of a preferred replica leader
> election, while it is still catching up on some other partitions.
>
>
> This scenario can lead to a high incoming throughput on the broker
> during the catch up phase and cause back pressure with certain storage
> volumes (which have a fixed max throughput). This backpressure can
> slow down recovery time, and manifest in the form of client
> application errors in producing / consuming data on / from the
> recovering broker.
>
>
> I am looking for solutions to mitigate this problem. There are two
> solutions that I am aware of.
>
>
> 1. Scale out the cluster to have more brokers, so that the replication
> traffic is smaller per broker during recovery.
>
>
> 2. Keep preferred replica leader elections disabled and manually run
> one instance of preferred replica leader election after the broker has
> come back up and fully caught up.
>
>
> Are there other solutions?
>


-- 

Okada Haruki
ocadar...@gmail.com



[jira] [Created] (KAFKA-10690) Produce-response delay caused by lagging replica fetch which blocks in-sync one

2020-11-05 Thread Haruki Okada (Jira)
Haruki Okada created KAFKA-10690:


 Summary: Produce-response delay caused by lagging replica fetch 
which blocks in-sync one
 Key: KAFKA-10690
 URL: https://issues.apache.org/jira/browse/KAFKA-10690
 Project: Kafka
  Issue Type: Improvement
  Components: core
Affects Versions: 2.4.1
Reporter: Haruki Okada
 Attachments: image-2020-11-06-11-15-21-781.png, 
image-2020-11-06-11-15-38-390.png, image-2020-11-06-11-17-09-910.png

h2. Our environment
 * Kafka version: 2.4.1

h2. Phenomenon
 * The 99th percentile produce response time (remote scope) degraded to 500 ms,
which is 20 times worse than usual
 ** Meanwhile, the cluster was running a replica reassignment to service in a new
machine to recover the replicas held by a failed (hardware issue) broker machine

!image-2020-11-06-11-15-21-781.png|width=292,height=166!
h2. Analysis

Let's say
 * broker-X: the broker where we observed the produce latency degradation
 * broker-Y: the broker being serviced in

broker-Y was catching up replicas of these partitions:
 * partition-A: has a relatively small log size
 * partition-B: has a large log size

(Actually, broker-Y was catching up many other partitions; I note only two
partitions here to keep the explanation simple.)

broker-X was the leader for both partition-A and partition-B.

We found that both partition-A and partition-B were assigned to the same
ReplicaFetcherThread of broker-Y, and the produce latency started to degrade
right after broker-Y finished catching up partition-A.

!image-2020-11-06-11-17-09-910.png|width=476,height=174!

Besides, we observed disk reads on broker-X during the service-in. (This is
natural since old segments are likely not in the page cache.)

!image-2020-11-06-11-15-38-390.png|width=292,height=193!

So we suspected that:
 * The in-sync replica fetch (partition-A) was held up by the lagging replica
fetch (partition-B), which should be slow because it causes actual disk reads.
 ** Since the ReplicaFetcherThread sends fetch requests in a blocking manner,
the next fetch request can't be sent until the current one completes.
 ** => This delays the in-sync replica fetches for partitions assigned to the
same replica fetcher thread.
 ** => This causes the remote-scope produce latency degradation.
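
To make the suspected mechanism concrete, here is a toy, self-contained sketch
(not Kafka code; the latencies are invented) of why one cold partition sharing
a blocking fetcher delays the in-sync one:

    import java.util.List;

    public class BlockingFetcherSketch {
        // One fetch round covers all partitions assigned to the fetcher thread;
        // the response isn't completed until the slowest (disk-bound) partition
        // has been read.
        static void fetchRound(List<String> partitions) throws InterruptedException {
            long slowestMs = partitions.stream()
                .mapToLong(p -> p.equals("partition-B") ? 500L : 5L) // B hits the disk
                .max().orElse(0L);
            Thread.sleep(slowestMs);
        }

        public static void main(String[] args) throws InterruptedException {
            List<String> assigned = List.of("partition-A", "partition-B");
            for (int round = 0; round < 3; round++) {
                long start = System.currentTimeMillis();
                fetchRound(assigned); // blocking: partition-A must wait for the slow round
                System.out.printf("fetch round %d took %d ms%n",
                    round, System.currentTimeMillis() - start);
            }
        }
    }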

h2. Possible fix

We think this issue can be addressed by designating a subset of the
ReplicaFetcherThreads (or creating another thread pool) for catching up lagging
replicas, but we are not sure whether this is the appropriate way.

Please give your opinions about this issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Can't find any sendfile system call trace from Kafka process?

2020-09-02 Thread Haruki Okada
There are two cases where zero-copy fetch via sendfile doesn't work.

- SSL encryption is enabled
  * The Kafka process needs to encrypt the data before sending it to the client
  -
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/network/SslTransportLayer.java#L946
  * Unlike the plaintext transport layer, which writes directly to the socket:
  -
https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/network/PlaintextTransportLayer.java#L214
- Message down conversion happens
  * refs:
https://kafka.apache.org/documentation/#upgrade_10_performance_impact

Does your environment match either of the above cases?
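
For reference, as far as I know the plaintext path ultimately goes through
java.nio.channels.FileChannel#transferTo, which is what maps to sendfile(2) on
Linux. A minimal sketch (file name, host and port are made up):

    import java.io.RandomAccessFile;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;

    public class TransferToSketch {
        public static void main(String[] args) throws Exception {
            try (RandomAccessFile segment =
                     new RandomAccessFile("00000000000000000000.log", "r");
                 SocketChannel socket =
                     SocketChannel.open(new InetSocketAddress("localhost", 9999))) {
                FileChannel channel = segment.getChannel();
                long position = 0;
                long remaining = channel.size();
                while (remaining > 0) {
                    // The kernel copies file bytes straight to the socket (sendfile
                    // on Linux), so the payload never passes through user space.
                    // With SSL or message down-conversion this path can't be used.
                    long sent = channel.transferTo(position, remaining, socket);
                    position += sent;
                    remaining -= sent;
                }
            }
        }
    }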

2020年9月1日(火) 9:05 Ming Liu :

> Hi Kafka dev community,
>  As we know, one major reason that Kafka is fast is because it is using
> sendfile() for zero copy, as what it described at
> https://kafka.apache.org/documentation/#producerconfigs,
>
>
>
> *This combination of pagecache and sendfile means that on a Kafka cluster
> where the consumers are mostly caught up you will see no read activity on
> the disks whatsoever as they will be serving data entirely from cache.*
>
> However, when I ran tracing on all my kafka brokers, I didn't get a
> single sendfile system call, why is this? Does it eventually translate to
> plain read/write syscalls?
>
> sudo ./syscount -p 126806 -d 30
> Tracing syscalls, printing top 10... Ctrl+C to quit.
> [17:44:10]
> SYSCALL  COUNT
> epoll_wait108482
> write  107165
> epoll_ctl 95058
> futex   86716
> read   86388
> pread   26910
> fstat   9213
> getrusage  120
> close27
> open 21
>


-- 

Okada Haruki
ocadar...@gmail.com