Re: [PR] KAFKA-14595 Move ReassignPartitionsCommand to java [kafka]
nizhikov commented on PR #13247: URL: https://github.com/apache/kafka/pull/13247#issuecomment-1772180903 CI looks OK -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (KAFKA-15658) Zookeeper 3.6.3 jar | CVE-2023-44981
masood created KAFKA-15658: -- Summary: Zookeeper 3.6.3 jar | CVE-2023-44981 Key: KAFKA-15658 URL: https://issues.apache.org/jira/browse/KAFKA-15658 Project: Kafka Issue Type: Bug Reporter: masood The [CVE-2023-44981|https://www.mend.io/vulnerability-database/CVE-2023-44981] vulnerability has been reported in the zookeeper.jar. It's worth noting that the latest version of Kafka depends on version 3.8.2 of ZooKeeper, which is also impacted by this vulnerability: [https://mvnrepository.com/artifact/org.apache.zookeeper/zookeeper/3.8.2|https://mvnrepository.com/artifact/org.apache.zookeeper/zookeeper/3.8.2] Could you please verify its impact on Kafka? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (KAFKA-15591) Trogdor produce workload reports errors in KRaft mode
[ https://issues.apache.org/jira/browse/KAFKA-15591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1578#comment-1578 ] Xi Yang edited comment on KAFKA-15591 at 10/20/23 6:01 AM: --- I print out the topic description after creating the topic. It looks like the partitions are correctly elected before Trogdor starts producing messages. However, the producer still reports the NOT_LEADER_OR_FOLLOWER error. Topic desc:(name=foo1, internal=false, partitions=(partition=0, leader=localhost:9092 (id: 1 rack: null), replicas=localhost:9092 (id: 1 rack: null), isr=localhost:9092 (id: 1 rack: null)),(partition=1, leader=localhost:9092 (id: 1 rack: null), replicas=localhost:9092 (id: 1 rack: null), isr=localhost:9092 (id: 1 rack: null)),(partition=2, leader=localhost:9092 (id: 1 rack: null), replicas=localhost:9092 (id: 1 rack: null), isr=localhost:9092 (id: 1 rack: null)),(partition=3, leader=localhost:9092 (id: 1 rack: null), replicas=localhost:9092 (id: 1 rack: null), isr=localhost:9092 (id: 1 rack: null)),(partition=4, leader=localhost:9092 (id: 1 rack: null), replicas=localhost:9092 (id: 1 rack: null), isr=localhost:9092 (id: 1 rack: null)),(partition=5, leader=localhost:9092 (id: 1 rack: null), replicas=localhost:9092 (id: 1 rack: null), isr=localhost:9092 (id: 1 rack: null)),(partition=6, leader=localhost:9092 (id: 1 rack: null), replicas=localhost:9092 (id: 1 rack: null), isr=localhost:9092 (id: 1 rack: null)),(partition=7, leader=localhost:9092 (id: 1 rack: null), replicas=localhost:9092 (id: 1 rack: null), isr=localhost:9092 (id: 1 rack: null)),(partition=8, leader=localhost:9092 (id: 1 rack: null), replicas=localhost:9092 (id: 1 rack: null), isr=localhost:9092 (id: 1 rack: null)),(partition=9, leader=localhost:9092 (id: 1 rack: null), replicas=localhost:9092 (id: 1 rack: null), isr=localhost:9092 (id: 1 rack: null)), authorizedOperations=null) Create topics:[foo1-9, foo1-8, foo1-7, foo1-6, foo1-5, foo1-4, foo1-3, foo1-2, foo1-1, 
foo1-0] [2023-10-20 05:43:42,843] WARN [Producer clientId=producer-1] Got error produce response with correlation id 4 on topic-partition foo1-7, retrying (2147483646 attempts left). Error: NOT_LEADER_OR_FOLLOWER (org.apache.kafka.clients.producer.internals.Sender) [2023-10-20 05:43:42,843] WARN [Producer clientId=producer-1] Received invalid metadata error in produce request on partition foo1-7 due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.. Going to request metadata update now (org.apache.kafka.clients.producer.internals.Sender) [2023-10-20 05:43:42,843] WARN [Producer clientId=producer-1] Got error produce response with correlation id 4 on topic-partition foo1-6, retrying (2147483646 attempts left). Error: NOT_LEADER_OR_FOLLOWER (org.apache.kafka.clients.producer.internals.Sender) [2023-10-20 05:43:42,843] WARN [Producer clientId=producer-1] Received invalid metadata error in produce request on partition foo1-6 due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.. Going to request metadata update now (org.apache.kafka.clients.producer.internals.Sender) [2023-10-20 05:43:42,843] WARN [Producer clientId=producer-1] Got error produce response with correlation id 4 on topic-partition foo1-5, retrying (2147483646 attempts left). 
Error: NOT_LEADER_OR_FOLLOWER (org.apache.kafka.clients.producer.internals.Sender) [2023-10-20 05:43:42,843] WARN [Producer clientId=producer-1] Received invalid metadata error in produce request on partition foo1-5 due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broker is not the current leader. For requests intended for any replica, this error indicates that the broker is not a replica of the topic partition.. Going to request metadata update now (org.apache.kafka.clients.producer.internals.Sender) [2023-10-20 05:43:42,843] WARN [Producer clientId=producer-1] Got error produce response with correlation id 4 on topic-partition foo1-4, retrying (2147483646 attempts left). Error: NOT_LEADER_OR_FOLLOWER (org.apache.kafka.clients.producer.internals.Sender) [2023-10-20 05:43:42,843] WARN [Producer clientId=producer-1] Received invalid metadata error in produce request on partition foo1-4 due to org.apache.kafka.common.errors.NotLeaderOrFollowerException: For requests intended only for the leader, this error indicates that the broke
Re: [PR] KAFKA-15632: Drop the invalid remote log metadata events [kafka]
kamalcph commented on code in PR #14576: URL: https://github.com/apache/kafka/pull/14576#discussion_r1364317572 ## storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/RemoteLogMetadataCache.java: ## @@ -302,22 +307,26 @@ public void addCopyInProgressSegment(RemoteLogSegmentMetadata remoteLogSegmentMe RemoteLogSegmentId remoteLogSegmentId = remoteLogSegmentMetadata.remoteLogSegmentId(); RemoteLogSegmentMetadata existingMetadata = idToSegmentMetadata.get(remoteLogSegmentId); -checkStateTransition(existingMetadata != null ? existingMetadata.state() : null, -remoteLogSegmentMetadata.state()); - +boolean isValid = checkStateTransition(existingMetadata != null ? existingMetadata.state() : null, +remoteLogSegmentMetadata.state(), remoteLogSegmentMetadata.remoteLogSegmentId()); +if (!isValid) { +return; +} for (Integer epoch : remoteLogSegmentMetadata.segmentLeaderEpochs().keySet()) { leaderEpochEntries.computeIfAbsent(epoch, leaderEpoch -> new RemoteLogLeaderEpochState()) .handleSegmentWithCopySegmentStartedState(remoteLogSegmentId); } - idToSegmentMetadata.put(remoteLogSegmentId, remoteLogSegmentMetadata); } -private void checkStateTransition(RemoteLogSegmentState existingState, RemoteLogSegmentState targetState) { -if (!RemoteLogSegmentState.isValidTransition(existingState, targetState)) { -throw new IllegalStateException( -"Current state: " + existingState + " can not be transitioned to target state: " + targetState); +private boolean checkStateTransition(RemoteLogSegmentState existingState, + RemoteLogSegmentState targetState, + RemoteLogSegmentId segmentId) { +boolean isValid = RemoteLogSegmentState.isValidTransition(existingState, targetState); +if (!isValid) { +log.error("Current state: {} can not be transitioned to target state: {}, segmentId: {}. Dropping the event", Review Comment: Logging the error instead of throwing the exception as it will stop the internal consumer which consumes from the remote log metadata topic. 
To clarify, producer `enable.idempotence` is set to true by default since v3.2. In our internal cluster, producer idempotence was not enabled, and we have seen out-of-order messages in the internal topic. Once this issue happens, the internal consumer stops processing messages and then fails to upload the pending segments to remote storage. The issue is not recoverable even after broker restarts.
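The review comment above describes replacing a thrown IllegalStateException with a logged error, so that an invalid metadata event is simply dropped and the consumer of the remote log metadata topic keeps running. A minimal, self-contained sketch of that pattern (the state names are modeled on `RemoteLogSegmentState`, but the transition table and class here are illustrative, not the Kafka source):

```java
// Sketch of the "log and drop" pattern: report invalid state transitions
// instead of throwing, so the metadata-topic consumer thread stays alive.
public class TransitionCheck {
    public enum State { COPY_SEGMENT_STARTED, COPY_SEGMENT_FINISHED, DELETE_SEGMENT_STARTED, DELETE_SEGMENT_FINISHED }

    // Whether `target` may follow `existing`; a null `existing` means a brand-new segment.
    // (Illustrative table, not the exact RemoteLogSegmentState rules.)
    public static boolean isValidTransition(State existing, State target) {
        switch (target) {
            case COPY_SEGMENT_STARTED:    return existing == null;
            case COPY_SEGMENT_FINISHED:   return existing == State.COPY_SEGMENT_STARTED;
            case DELETE_SEGMENT_STARTED:  return existing == State.COPY_SEGMENT_STARTED
                                               || existing == State.COPY_SEGMENT_FINISHED;
            case DELETE_SEGMENT_FINISHED: return existing == State.DELETE_SEGMENT_STARTED;
            default:                      return false;
        }
    }

    // Returning false (and logging) instead of throwing lets the caller skip
    // the invalid event, matching the rationale in the review comment.
    public static boolean checkStateTransition(State existing, State target, String segmentId) {
        if (!isValidTransition(existing, target)) {
            System.err.printf("Current state: %s can not be transitioned to target state: %s, segmentId: %s. Dropping the event%n",
                    existing, target, segmentId);
            return false;
        }
        return true;
    }
}
```

Callers check the boolean and return early, instead of letting an exception unwind the internal consumer thread.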
[jira] [Commented] (KAFKA-15657) Unexpected errors when producing transactionally in 3.6
[ https://issues.apache.org/jira/browse/KAFKA-15657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1567#comment-1567 ] Ismael Juma commented on KAFKA-15657: - I was wondering the same. We should fix KAFKA-15653 and see if it's the source of the issues you have been seeing. I am not aware of any other change that would result in that sort of problem. > Unexpected errors when producing transactionally in 3.6 > --- > > Key: KAFKA-15657 > URL: https://issues.apache.org/jira/browse/KAFKA-15657 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 3.6.0 >Reporter: Travis Bischel >Priority: Major > > In loop-testing the franz-go client, I am frequently receiving INVALID_RECORD > (which I created a separate issue for), and INVALID_TXN_STATE and > UNKNOWN_SERVER_ERROR. > INVALID_TXN_STATE is being returned even though the partitions have been > added to the transaction (AddPartitionsToTxn). Nothing about the code has > changed between 3.5 and 3.6, and I have loop-integration-tested this code > against 3.5 thousands of times. 3.6 is newly - and always - returning > INVALID_TXN_STATE. If I change the code to retry on INVALID_TXN_STATE, I > eventually quickly (always) receive UNKNOWN_SERVER_ERROR. In looking at the > broker logs, the broker indicates that sequence numbers are out of order - > but (a) I am repeating requests that were in order (so something on the > broker got a little haywire maybe? or maybe this is due to me ignoring > invalid_txn_state?), _and_ I am not receiving OUT_OF_ORDER_SEQUENCE_NUMBER, I > am receiving UNKNOWN_SERVER_ERROR. > I think the main problem is the client unexpectedly receiving > INVALID_TXN_STATE, but a second problem here is that OOOSN is being mapped to > USE on return for some reason.
[jira] [Commented] (KAFKA-15657) Unexpected errors when producing transactionally in 3.6
[ https://issues.apache.org/jira/browse/KAFKA-15657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1565#comment-1565 ] Travis Bischel commented on KAFKA-15657: I'm beginning to suspect that KAFKA-15653 may eventually lead to this; I never experience this bug without first experiencing the NPEs while appending. I'll wait until KAFKA-15653 is addressed and then loop-test to see whether this still occurs.
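For context on the protocol ordering being debated in this issue: a transactional Produce is only legal for partitions previously registered via AddPartitionsToTxn; otherwise the broker answers INVALID_TXN_STATE. A toy model of that broker-side check (purely illustrative — not broker code, with error codes simplified to strings and partitions to names):

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the transactional ordering constraint:
// beginTransaction -> addPartitionsToTxn(tp) -> produce(tp) -> (commit/abort).
// Producing to a partition that was never added yields INVALID_TXN_STATE.
public class TxnModel {
    private final Set<String> addedPartitions = new HashSet<>();
    private boolean inTransaction = false;

    public void beginTransaction() {
        inTransaction = true;
        addedPartitions.clear();
    }

    public void addPartitionsToTxn(String topicPartition) {
        addedPartitions.add(topicPartition);
    }

    public String produce(String topicPartition) {
        if (!inTransaction) return "INVALID_TXN_STATE";
        // The check at issue: the partition must be part of the transaction.
        if (!addedPartitions.contains(topicPartition)) return "INVALID_TXN_STATE";
        return "NONE";
    }
}
```

The reporter's point is that the client already enforces this ordering (produce is gated on a successful AddPartitionsToTxn response), so a 3.6 broker returning INVALID_TXN_STATE anyway points at a broker-side regression rather than a client protocol violation.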
[jira] [Updated] (KAFKA-15629) proposal to introduce IQv2 Query Types: TimestampedKeyQuery and TimestampedRangeQuery
[ https://issues.apache.org/jira/browse/KAFKA-15629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanyu Zheng updated KAFKA-15629: Fix Version/s: 3.7.0 > proposal to introduce IQv2 Query Types: TimestampedKeyQuery and > TimestampedRangeQuery > - > > Key: KAFKA-15629 > URL: https://issues.apache.org/jira/browse/KAFKA-15629 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Hanyu Zheng >Assignee: Hanyu Zheng >Priority: Major > Labels: kip > Fix For: 3.7.0 > > > KIP-992: > https://cwiki.apache.org/confluence/display/KAFKA/KIP-992%3A+Proposal+to+introduce+IQv2+Query+Types%3A+TimestampedKeyQuery+and+TimestampedRangeQuery > In the current IQv2 code, there are noticeable differences when interfacing > with plain-kv-store and ts-kv-store. Notably, the return type V acts as a > simple value for plain-kv-store but evolves into ValueAndTimestamp for > ts-kv-store, which presents type safety issues in the API. > Even if IQv2 hasn't gained widespread adoption, an immediate fix might bring > compatibility concerns. > This brings us to the essence of our proposal: the introduction of distinct > query types. One that returns a plain value, another for values accompanied > by timestamps. > While querying a ts-kv-store for a plain value and then extracting it is > feasible, it doesn't make sense to query a plain-kv-store for a > ValueAndTimestamp. > Our vision is for plain-kv-store to always return V, while ts-kv-store should > return ValueAndTimestamp.
[jira] [Updated] (KAFKA-15527) Add reverseRange and reverseAll query over kv-store in IQv2
[ https://issues.apache.org/jira/browse/KAFKA-15527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hanyu Zheng updated KAFKA-15527: Fix Version/s: 3.7.0 > Add reverseRange and reverseAll query over kv-store in IQv2 > --- > > Key: KAFKA-15527 > URL: https://issues.apache.org/jira/browse/KAFKA-15527 > Project: Kafka > Issue Type: Improvement > Components: streams >Reporter: Hanyu Zheng >Assignee: Hanyu Zheng >Priority: Major > Labels: kip > Fix For: 3.7.0 > > > Add reverseRange and reverseAll query over kv-store in IQv2 > Update an implementation of the Query interface, introduced in [KIP-796: > Interactive Query > v2|https://cwiki.apache.org/confluence/display/KAFKA/KIP-796%3A+Interactive+Query+v2] > , to support reverseRange and reverseAll. > Use bounded query to achieve reverseRange and use unbounded query to achieve > reverseAll.
Re: [PR] KAFKA-14808: fix leaderless partition issue when controller removes u… [kafka]
github-actions[bot] commented on PR #13451: URL: https://github.com/apache/kafka/pull/13451#issuecomment-1772023180 This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or the appropriate release branch). If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.
[jira] [Commented] (KAFKA-7699) Improve wall-clock time punctuations
[ https://issues.apache.org/jira/browse/KAFKA-7699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1550#comment-1550 ] Matthias J. Sax commented on KAFKA-7699: Happy to support you. The KIP wiki page describes how it works: [https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals] – if you have any questions about it, happy to answer them. > Improve wall-clock time punctuations > > > Key: KAFKA-7699 > URL: https://issues.apache.org/jira/browse/KAFKA-7699 > Project: Kafka > Issue Type: New Feature > Components: streams >Reporter: Matthias J. Sax >Priority: Major > Labels: needs-kip > > Currently, wall-clock time punctuations allow scheduling periodic callbacks > based on wall-clock time progress. The punctuation timer starts when the > punctuation is scheduled; thus, the firing times are non-deterministic, which > is what is desired for many use cases (I want a call-back in 5 minutes from > "now"). > It would be a nice improvement to allow users to "anchor" wall-clock > punctuations, too, similar to a cron job: a punctuation would then be > triggered at "fixed" times, like the beginning of the next hour, independent > of when the punctuation was registered.
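Until an "anchored" punctuation API exists, one way to approximate the cron-like behavior described in this issue is to compute the wall-clock delay from "now" to the next hour boundary and use it when scheduling the first punctuation. The helper below is a hypothetical sketch (`AnchoredPunctuation` and `delayUntilNextHour` are made-up names, not part of Kafka Streams):

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

// Computes how long to wait so that a wall-clock callback fires at the top of
// the next hour, regardless of when it was registered.
public class AnchoredPunctuation {
    public static Duration delayUntilNextHour(Instant now) {
        ZonedDateTime zdt = now.atZone(ZoneOffset.UTC);
        // Truncate to the current hour, then step forward one hour.
        ZonedDateTime nextHour = zdt.truncatedTo(ChronoUnit.HOURS).plusHours(1);
        return Duration.between(zdt, nextHour);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2023-10-20T05:43:42Z");
        System.out.println(delayUntilNextHour(t)); // PT16M18S
    }
}
```

In a Streams `Processor` one might feed this delay into `context.schedule(..., PunctuationType.WALL_CLOCK_TIME, ...)`; note, though, that `schedule` interprets its `Duration` as the repeating interval, so a truly anchored punctuator would also need to re-check the boundary inside its callback.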
[jira] [Commented] (KAFKA-15653) NPE in ChunkedByteStream
[ https://issues.apache.org/jira/browse/KAFKA-15653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1549#comment-1549 ] Ismael Juma commented on KAFKA-15653: - cc [~divijvaidya] > NPE in ChunkedByteStream > > > Key: KAFKA-15653 > URL: https://issues.apache.org/jira/browse/KAFKA-15653 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 3.6.0 > Environment: Docker container on a Linux laptop, using the latest > release. >Reporter: Travis Bischel >Priority: Major > > When looping franz-go integration tests, I received an UNKNOWN_SERVER_ERROR > from producing. The broker logs for the failing request: > > {noformat} > [2023-10-19 22:29:58,160] ERROR [ReplicaManager broker=2] Error processing > append operation on partition > 2fa8995d8002fbfe68a96d783f26aa2c5efc15368bf44ed8f2ab7e24b41b9879-24 > (kafka.server.ReplicaManager) > java.lang.NullPointerException > at > org.apache.kafka.common.utils.ChunkedBytesStream.<init>(ChunkedBytesStream.java:89) > at > org.apache.kafka.common.record.CompressionType$3.wrapForInput(CompressionType.java:105) > at > org.apache.kafka.common.record.DefaultRecordBatch.recordInputStream(DefaultRecordBatch.java:273) > at > org.apache.kafka.common.record.DefaultRecordBatch.compressedIterator(DefaultRecordBatch.java:277) > at > org.apache.kafka.common.record.DefaultRecordBatch.skipKeyValueIterator(DefaultRecordBatch.java:352) > at > org.apache.kafka.storage.internals.log.LogValidator.validateMessagesAndAssignOffsetsCompressed(LogValidator.java:358) > at > org.apache.kafka.storage.internals.log.LogValidator.validateMessagesAndAssignOffsets(LogValidator.java:165) > at kafka.log.UnifiedLog.append(UnifiedLog.scala:805) > at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:719) > at > kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1313) > at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1301) > at > 
kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:1210) > at > scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28) > at > scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27) > at scala.collection.mutable.HashMap.map(HashMap.scala:35) > at > kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:1198) > at kafka.server.ReplicaManager.appendEntries$1(ReplicaManager.scala:754) > at > kafka.server.ReplicaManager.$anonfun$appendRecords$18(ReplicaManager.scala:874) > at > kafka.server.ReplicaManager.$anonfun$appendRecords$18$adapted(ReplicaManager.scala:874) > at > kafka.server.KafkaRequestHandler$.$anonfun$wrap$3(KafkaRequestHandler.scala:73) > at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:130) > at java.base/java.lang.Thread.run(Unknown Source) > {noformat}
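The NPE above comes from a constructor dereferencing a buffer it was handed (the fuller trace later in this thread shows line 89 invoking `hasArray()` on a null `intermediateBufRef`). A fail-fast guard of the kind sketched below would surface a descriptive error at the boundary instead of a bare NPE deep inside the constructor; this illustrates the defensive pattern only and is not the actual ChunkedBytesStream fix:

```java
import java.nio.ByteBuffer;
import java.util.Objects;

// Illustrative guard: validate the supplied intermediate buffer up front so a
// null (e.g. from a misbehaving buffer supplier) fails with a clear message.
public class GuardedStream {
    private final ByteBuffer intermediateBuf;

    public GuardedStream(ByteBuffer supplied) {
        // requireNonNull turns a latent NPE into a descriptive error at construction time.
        this.intermediateBuf = Objects.requireNonNull(supplied, "buffer supplier returned null");
    }

    public boolean backedByArray() {
        return intermediateBuf.hasArray();
    }
}
```

With a guard like this, the broker-side error log would name the real culprit (a null buffer from the supplier) rather than an opaque `java.lang.NullPointerException`.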
[jira] [Comment Edited] (KAFKA-15653) NPE in ChunkedByteStream
[ https://issues.apache.org/jira/browse/KAFKA-15653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1542#comment-1542 ] Travis Bischel edited comment on KAFKA-15653 at 10/20/23 2:55 AM: -- {noformat} [2023-10-20 02:31:00,204] ERROR [ReplicaManager broker=1] Error processing append operation on partition 2c69b88eab8670ef1fd0e55b81b9e000995386afd8756ea342494d36911e6f01-29 (kafka.server.ReplicaManager) java.lang.NullPointerException: Cannot invoke "java.nio.ByteBuffer.hasArray()" because "this.intermediateBufRef" is null at org.apache.kafka.common.utils.ChunkedBytesStream.(ChunkedBytesStream.java:89) at org.apache.kafka.common.record.CompressionType$3.wrapForInput(CompressionType.java:105) at org.apache.kafka.common.record.DefaultRecordBatch.recordInputStream(DefaultRecordBatch.java:273) at org.apache.kafka.common.record.DefaultRecordBatch.compressedIterator(DefaultRecordBatch.java:277) at org.apache.kafka.common.record.DefaultRecordBatch.skipKeyValueIterator(DefaultRecordBatch.java:352) at org.apache.kafka.storage.internals.log.LogValidator.validateMessagesAndAssignOffsetsCompressed(LogValidator.java:358) at org.apache.kafka.storage.internals.log.LogValidator.validateMessagesAndAssignOffsets(LogValidator.java:165) at kafka.log.UnifiedLog.$anonfun$append$2(UnifiedLog.scala:805) at kafka.log.UnifiedLog.append(UnifiedLog.scala:1845) at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:719) at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1313) at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1301) at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:1210) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:400) at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:728) at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:728) at 
scala.collection.TraversableLike.map(TraversableLike.scala:286) at scala.collection.TraversableLike.map$(TraversableLike.scala:279) at scala.collection.AbstractTraversable.map(Traversable.scala:108) at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:1198) at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:754) at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:686) at kafka.server.KafkaApis.handle(KafkaApis.scala:180) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:149) at java.base/java.lang.Thread.run(Thread.java:833) {noformat}
Re: [PR] KAFKA-15607:Possible NPE is thrown in MirrorCheckpointTask [kafka]
hudeqi commented on code in PR #14587: URL: https://github.com/apache/kafka/pull/14587#discussion_r1366378637 ## connect/mirror/src/test/java/org/apache/kafka/connect/mirror/MirrorCheckpointTaskTest.java: ## @@ -169,6 +169,33 @@ public void testSyncOffset() { "Consumer 2 " + topic2 + " failed"); } +@Test +public void testSyncOffsetForTargetGroupWithNullOffsetAndMetadata() { +Map> idleConsumerGroupsOffset = new HashMap<>(); +Map> checkpointsPerConsumerGroup = new HashMap<>(); + +String consumer = "consumer"; +String topic = "topic"; +Map ct = new HashMap<>(); +TopicPartition tp = new TopicPartition(topic, 0); +// Simulate other clients such as sarama to reset the group offset of the target cluster to -1. At this time, +// the obtained `OffsetAndMetadata` of the target cluster is null. Review Comment: committed.
Re: [PR] KAFKA-15607:Possible NPE is thrown in MirrorCheckpointTask [kafka]
hudeqi commented on PR #14587: URL: https://github.com/apache/kafka/pull/14587#issuecomment-1771985878 > Thanks @hudeqi, I think this is reasonable. Do you know why Sarama sets offsets to -1? If it's for normal operations and not indicative of something wrong, we may not even need to log a warning message in that case and could change the check [here](https://github.com/apache/kafka/blob/af747fbfed7e81617c3b3ad0e4dc8c857aa9502b/connect/mirror/src/main/java/org/apache/kafka/connect/mirror/MirrorCheckpointTask.java#L325) from `!targetConsumerOffset.containsKey(topicPartition)` to `targetConsumerOffset.get(topicPartition) == null`. Thanks for your review, @C0urante. In fact, resetting the offset directly to -1 is an abnormal operation, whether from Sarama or any other client. We discovered the problem this way: using the Sarama client, we wanted to reset the group's offset to the latest, so we passed Sarama's `OffsetNewest` constant as the argument to the reset-offset method. The offset ended up reset to -1, because the value of `OffsetNewest` is -1. In Sarama, resetting to the latest actually requires a different procedure, but unlike the Java client, Sarama does not intercept this kind of misuse (which would be friendlier). That is how this issue surfaced in scenarios like MM2, so I think it is better to add a warn log here.
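The check change suggested above — treating `targetConsumerOffset.get(topicPartition) == null` as "no usable offset", whether the key is absent or mapped to a null `OffsetAndMetadata` — can be sketched in isolation. Everything below is illustrative: a plain map stands in for the admin-client result, and `shouldEmitCheckpoint` is a made-up name, not the MirrorCheckpointTask source:

```java
import java.util.Map;

// Null-guard sketch: a group whose offset was reset to -1 can come back from
// the target cluster as a key that is *present but mapped to null*, which the
// original containsKey() check would mistake for a real committed offset.
public class CheckpointGuard {
    public static boolean shouldEmitCheckpoint(Map<String, Object> targetOffsets, String topicPartition) {
        Object offset = targetOffsets.get(topicPartition);
        if (offset == null && targetOffsets.containsKey(topicPartition)) {
            // Warn, as suggested in the thread: a null value usually means some
            // client reset the group offset to -1 (the Sarama misuse case).
            System.err.println("Null committed offset for " + topicPartition
                    + "; was the group offset reset to -1?");
        }
        // Both "no entry" and "entry with null value" mean the checkpoint should
        // still be emitted, which avoids the NPE on the null OffsetAndMetadata.
        return offset == null;
    }
}
```

The `get(...) == null` form subsumes the old `!containsKey(...)` check while also covering the null-value case that triggered the NPE.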
[jira] [Comment Edited] (KAFKA-15657) Unexpected errors when producing transactionally in 3.6
[ https://issues.apache.org/jira/browse/KAFKA-15657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1537#comment-1537 ] Travis Bischel edited comment on KAFKA-15657 at 10/20/23 2:35 AM: -- re: first comment – the client doesn't advance to producing unless AddPartitionsToTxn succeeds. If the request partially succeeds, failed partitions are stripped and only successfully added partitions are produced. The logic is definitely hard to follow if you're not familiar with the code, but here's issuing/stripping: [here|https://github.com/twmb/franz-go/blob/ae169a1f35c2ee6b130c4e520632b33e6c491e0b/pkg/kgo/sink.go#L442-L498] and here's where the request is issued (in the same function as producing – before the produce request is issued): [here|https://github.com/twmb/franz-go/blob/ae169a1f35c2ee6b130c4e520632b33e6c491e0b/pkg/kgo/sink.go#L316-L357] Also wrt race condition – these tests also pass against the redpanda binary, which has always had KIP-890 semantics / has never allowed transactional produce requests unless the partition has been added to the transaction (in fact this is part of how I caught some early redpanda bugs with _that_ implementation). re: second comment, I'll capture some debug logs so you can see both the client logs and the container. The tests currently are using v3. I'm currently running this in a loop: ``` docker compose down; sleep 1; docker compose up -d; sleep 5; while go test -run Txn/cooperative > logs; do echo whoo; docker compose down; sleep 1; docker compose up -d; sleep 5; done ``` Once this fails, I'll upload the logs. This is currently ignoring INVALID_RECORD, which I more regularly run into. I may remove gating this to just the cooperative test and instead run it against all balancers at once (it seems heavier load runs into the problem more frequently). 
Also this does remind me though, somebody had a feature request that deliberately abused the ability to produce before AddPartitionsToTxn was done; I need to remove support of this for 3.6+. This _is_ exercised in franz-go's CI right now and will fail CI for 3.6+ (see the doc comment on [EndBeginTxnUnsafe|https://pkg.go.dev/github.com/twmb/franz-go/pkg/kgo#EndBeginTxnHow]). Edit: KAFKA-15653 may be complicating the investigation here, too.
> Unexpected errors when producing transactionally in 3.6 > --- > > Key: KAFKA-15657 > URL: https://issues.apache.org/jira/browse/KAFKA-15657 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 3.6.0 >Reporter: Travis Bischel >Priority: Major > > In loop-testing the franz-go client, I am frequently receiving INVALID_RECORD > (which I created a se
[jira] [Commented] (KAFKA-15653) NPE in ChunkedByteStream
[ https://issues.apache.org/jira/browse/KAFKA-15653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1542#comment-1542 ] Travis Bischel commented on KAFKA-15653: Not just : {noformat} [2023-10-20 02:31:00,204] ERROR [ReplicaManager broker=1] Error processing append operation on partition 2c69b88eab8670ef1fd0e55b81b9e000995386afd8756ea342494d36911e6f01-29 (kafka.server.ReplicaManager) java.lang.NullPointerException: Cannot invoke "java.nio.ByteBuffer.hasArray()" because "this.intermediateBufRef" is null at org.apache.kafka.common.utils.ChunkedBytesStream.<init>(ChunkedBytesStream.java:89) at org.apache.kafka.common.record.CompressionType$3.wrapForInput(CompressionType.java:105) at org.apache.kafka.common.record.DefaultRecordBatch.recordInputStream(DefaultRecordBatch.java:273) at org.apache.kafka.common.record.DefaultRecordBatch.compressedIterator(DefaultRecordBatch.java:277) at org.apache.kafka.common.record.DefaultRecordBatch.skipKeyValueIterator(DefaultRecordBatch.java:352) at org.apache.kafka.storage.internals.log.LogValidator.validateMessagesAndAssignOffsetsCompressed(LogValidator.java:358) at org.apache.kafka.storage.internals.log.LogValidator.validateMessagesAndAssignOffsets(LogValidator.java:165) at kafka.log.UnifiedLog.$anonfun$append$2(UnifiedLog.scala:805) at kafka.log.UnifiedLog.append(UnifiedLog.scala:1845) at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:719) at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1313) at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1301) at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:1210) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) at scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:400) at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:728) at scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:728) at 
scala.collection.TraversableLike.map(TraversableLike.scala:286) at scala.collection.TraversableLike.map$(TraversableLike.scala:279) at scala.collection.AbstractTraversable.map(Traversable.scala:108) at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:1198) at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:754) at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:686) at kafka.server.KafkaApis.handle(KafkaApis.scala:180) at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:149) at java.base/java.lang.Thread.run(Thread.java:833) {noformat} > NPE in ChunkedByteStream > > > Key: KAFKA-15653 > URL: https://issues.apache.org/jira/browse/KAFKA-15653 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 3.6.0 > Environment: Docker container on a Linux laptop, using the latest > release. >Reporter: Travis Bischel >Priority: Major > > When looping franz-go integration tests, I received an UNKNOWN_SERVER_ERROR > from producing. 
The broker logs for the failing request: > > {noformat} > [2023-10-19 22:29:58,160] ERROR [ReplicaManager broker=2] Error processing > append operation on partition > 2fa8995d8002fbfe68a96d783f26aa2c5efc15368bf44ed8f2ab7e24b41b9879-24 > (kafka.server.ReplicaManager) > java.lang.NullPointerException > at > org.apache.kafka.common.utils.ChunkedBytesStream.(ChunkedBytesStream.java:89) > at > org.apache.kafka.common.record.CompressionType$3.wrapForInput(CompressionType.java:105) > at > org.apache.kafka.common.record.DefaultRecordBatch.recordInputStream(DefaultRecordBatch.java:273) > at > org.apache.kafka.common.record.DefaultRecordBatch.compressedIterator(DefaultRecordBatch.java:277) > at > org.apache.kafka.common.record.DefaultRecordBatch.skipKeyValueIterator(DefaultRecordBatch.java:352) > at > org.apache.kafka.storage.internals.log.LogValidator.validateMessagesAndAssignOffsetsCompressed(LogValidator.java:358) > at > org.apache.kafka.storage.internals.log.LogValidator.validateMessagesAndAssignOffsets(LogValidator.java:165) > at kafka.log.UnifiedLog.append(UnifiedLog.scala:805) > at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:719) > at > kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1313) > at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1301) > at > kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:1210) > at > scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28) > at > sc
[jira] [Updated] (KAFKA-15653) NPE in ChunkedByteStream
[ https://issues.apache.org/jira/browse/KAFKA-15653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Travis Bischel updated KAFKA-15653: --- Summary: NPE in ChunkedByteStream (was: NPE in ChunkedByteStream.) > NPE in ChunkedByteStream > > > Key: KAFKA-15653 > URL: https://issues.apache.org/jira/browse/KAFKA-15653 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 3.6.0 > Environment: Docker container on a Linux laptop, using the latest > release. >Reporter: Travis Bischel >Priority: Major > > When looping franz-go integration tests, I received an UNKNOWN_SERVER_ERROR > from producing. The broker logs for the failing request: > > {noformat} > [2023-10-19 22:29:58,160] ERROR [ReplicaManager broker=2] Error processing > append operation on partition > 2fa8995d8002fbfe68a96d783f26aa2c5efc15368bf44ed8f2ab7e24b41b9879-24 > (kafka.server.ReplicaManager) > java.lang.NullPointerException > at > org.apache.kafka.common.utils.ChunkedBytesStream.(ChunkedBytesStream.java:89) > at > org.apache.kafka.common.record.CompressionType$3.wrapForInput(CompressionType.java:105) > at > org.apache.kafka.common.record.DefaultRecordBatch.recordInputStream(DefaultRecordBatch.java:273) > at > org.apache.kafka.common.record.DefaultRecordBatch.compressedIterator(DefaultRecordBatch.java:277) > at > org.apache.kafka.common.record.DefaultRecordBatch.skipKeyValueIterator(DefaultRecordBatch.java:352) > at > org.apache.kafka.storage.internals.log.LogValidator.validateMessagesAndAssignOffsetsCompressed(LogValidator.java:358) > at > org.apache.kafka.storage.internals.log.LogValidator.validateMessagesAndAssignOffsets(LogValidator.java:165) > at kafka.log.UnifiedLog.append(UnifiedLog.scala:805) > at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:719) > at > kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1313) > at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1301) > at > 
kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:1210) > at > scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28) > at > scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27) > at scala.collection.mutable.HashMap.map(HashMap.scala:35) > at > kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:1198) > at kafka.server.ReplicaManager.appendEntries$1(ReplicaManager.scala:754) > at > kafka.server.ReplicaManager.$anonfun$appendRecords$18(ReplicaManager.scala:874) > at > kafka.server.ReplicaManager.$anonfun$appendRecords$18$adapted(ReplicaManager.scala:874) > at > kafka.server.KafkaRequestHandler$.$anonfun$wrap$3(KafkaRequestHandler.scala:73) > at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:130) > at java.base/java.lang.Thread.run(Unknown Source) > {noformat} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] KAFKA-15346: add support for single-Key_single-timestamp IQs with versioned state stores (KIP-960) [kafka]
aliehsaeedii opened a new pull request, #14596: URL: https://github.com/apache/kafka/pull/14596 This PR implements KIP-960. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (KAFKA-15657) Unexpected errors when producing transactionally in 3.6
[ https://issues.apache.org/jira/browse/KAFKA-15657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1537#comment-1537 ] Travis Bischel commented on KAFKA-15657: re: first comment – the client doesn't advance to producing unless AddPartitionsToTxn succeeds. If the request partially succeeds, failed partitions are stripped and only successfully added partitions are produced. The logic is definitely hard to follow if you're not familiar with the code, but here's issuing/stripping: [here|https://github.com/twmb/franz-go/blob/ae169a1f35c2ee6b130c4e520632b33e6c491e0b/pkg/kgo/sink.go#L442-L498] and here's where the request is issued (in the same function as producing – before the produce request is issued): [here|https://github.com/twmb/franz-go/blob/ae169a1f35c2ee6b130c4e520632b33e6c491e0b/pkg/kgo/sink.go#L316-L357] Also wrt race condition – these tests also pass against the redpanda binary, which has always had KIP-890 semantics / has never allowed transactional produce requests unless the partition has been added to the transaction (in fact this is part of how I caught some early redpanda bugs with _that_ implementation). re: second comment, I'll capture some debug logs so you can see both the client logs and the container. The tests currently are using v3. I'm currently running this in a loop: ``` docker compose down; sleep 1; docker compose up -d; sleep 5; while go test -run Txn/cooperative > logs; do echo whoo; docker compose down; sleep 1; docker compose up -d; sleep 5; done ``` Once this fails, I'll upload the logs. This is currently ignoring INVALID_RECORD, which I more regularly run into. I may remove gating this to just the cooperative test and instead run it against all balancers at once (it seems heavier load runs into the problem more frequently). 
Also this does remind me though, somebody had a feature request that deliberately abused the ability to produce before AddPartitionsToTxn was done, I need to remove support of this for 3.6+. This _is_ exercised in franz-go's CI right now and will fail CI for 3.6+ (see the doc comment on [EndBeginTxnUnsafe|https://pkg.go.dev/github.com/twmb/franz-go/pkg/kgo#EndBeginTxnHow]). > Unexpected errors when producing transactionally in 3.6 > --- > > Key: KAFKA-15657 > URL: https://issues.apache.org/jira/browse/KAFKA-15657 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 3.6.0 >Reporter: Travis Bischel >Priority: Major > > In loop-testing the franz-go client, I am frequently receiving INVALID_RECORD > (which I created a separate issue for), and INVALID_TXN_STATE and > UNKNOWN_SERVER_ERROR. > INVALID_TXN_STATE is being returned even though the partitions have been > added to the transaction (AddPartitionsToTxn). Nothing about the code has > changed between 3.5 and 3.6, and I have loop-integration-tested this code > against 3.5 thousands of times. 3.6 is newly - and always - returning > INVALID_TXN_STATE. If I change the code to retry on INVALID_TXN_STATE, I > eventually quickly (always) receive UNKNOWN_SERVER_ERROR. In looking at the > broker logs, the broker indicates that sequence numbers are out of order - > but (a) I am repeating requests that were in order (so something on the > broker got a little haywire maybe? or maybe this is due to me ignoring > invalid_txn_state?), _and_ I am not receiving OUT_OF_ORDER_SEQUENCE_NUMBER, I > am receiving UNKNOWN_SERVER_ERROR. > I think the main problem is the client unexpectedly receiving > INVALID_TXN_STATE, but a second problem here is that OOOSN is being mapped to > USE on return for some reason. -- This message was sent by Atlassian Jira (v8.20.10#820010)
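As a toy model of the invariant under discussion in this thread (hypothetical names — this is not Kafka's actual broker code): under KIP-890 semantics, a transactional produce to a partition that was never added via AddPartitionsToTxn is rejected with INVALID_TXN_STATE.

```java
import java.util.HashSet;
import java.util.Set;

public class TxnInvariant {
    enum Result { OK, INVALID_TXN_STATE }

    private final Set<String> addedPartitions = new HashSet<>();

    // Models a successful AddPartitionsToTxn for one topic-partition.
    void addPartitionToTxn(String topicPartition) {
        addedPartitions.add(topicPartition);
    }

    // Models the broker-side check: produce is only valid for partitions
    // that were previously added to the transaction.
    Result produce(String topicPartition) {
        return addedPartitions.contains(topicPartition)
                ? Result.OK
                : Result.INVALID_TXN_STATE;
    }

    public static void main(String[] args) {
        TxnInvariant txn = new TxnInvariant();
        System.out.println(txn.produce("foo-0"));  // INVALID_TXN_STATE: not yet added
        txn.addPartitionToTxn("foo-0");
        System.out.println(txn.produce("foo-0"));  // OK
    }
}
```

The open question in the thread is why the client can observe AddPartitionsToTxn succeeding and the broker can still report the partition as not added.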
[PR] MINOR: Enable kraft test in kafka.api and kafka.network [kafka]
dengziming opened a new pull request, #14595: URL: https://github.com/apache/kafka/pull/14595 *More detailed description of your change, if necessary. The PR title and PR message become the squashed commit message, so use a separate comment to ping reviewers.* *Summary of testing strategy (including rationale) for the feature or bug fix. Unit and/or integration tests are expected for any behaviour change and system tests should be considered for larger changes.* ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-15481: Fix concurrency bug in RemoteIndexCache [kafka]
iit2009060 commented on PR #14483: URL: https://github.com/apache/kafka/pull/14483#issuecomment-1771965660 > @iit2009060 , do you have any comments to this PR? @showuon No, I am good. Thanks @jeel2420 for addressing the review comments. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR:Remove unused method parameter in ConsumerGroupCommand [kafka]
showuon merged PR #14585: URL: https://github.com/apache/kafka/pull/14585 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR:Remove unused method parameter in ConsumerGroupCommand [kafka]
showuon commented on PR #14585: URL: https://github.com/apache/kafka/pull/14585#issuecomment-1771961258 Failed tests are unrelated: ``` Build / JDK 17 and Scala 2.13 / kafka.api.DelegationTokenEndToEndAuthorizationWithOwnerTest.testProduceConsumeWithPrefixedAcls(String).quorum=kraft Build / JDK 17 and Scala 2.13 / org.apache.kafka.streams.integration.NamedTopologyIntegrationTest.shouldAddAndRemoveNamedTopologiesBeforeStartingAndRouteQueriesToCorrectTopology() Build / JDK 8 and Scala 2.12 / org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationExactlyOnceTest.testOffsetTranslationBehindReplicationFlow() Build / JDK 8 and Scala 2.12 / kafka.server.DescribeClusterRequestTest.testDescribeClusterRequestIncludingClusterAuthorizedOperations(String).quorum=kraft Build / JDK 8 and Scala 2.12 / kafka.server.DescribeClusterRequestTest.testDescribeClusterRequestIncludingClusterAuthorizedOperations(String).quorum=kraft Build / JDK 8 and Scala 2.12 / org.apache.kafka.streams.integration.ConsistencyVectorIntegrationTest.shouldHaveSamePositionBoundActiveAndStandBy Build / JDK 8 and Scala 2.12 / org.apache.kafka.streams.integration.IQv2StoreIntegrationTest.verifyStore[cache=false, log=false, supplier=ROCKS_WINDOW, kind=DSL] Build / JDK 11 and Scala 2.13 / kafka.api.ConsumerBounceTest.testConsumptionWithBrokerFailures() Build / JDK 11 and Scala 2.13 / org.apache.kafka.controller.QuorumControllerTest.testTimeouts() Build / JDK 11 and Scala 2.13 / org.apache.kafka.controller.QuorumControllerTest.testEarlyControllerResults() ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-15481: Fix concurrency bug in RemoteIndexCache [kafka]
showuon commented on PR #14483: URL: https://github.com/apache/kafka/pull/14483#issuecomment-1771958677 @iit2009060 , do you have any comments to this PR? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-15566: Fix flaky tests in FetchRequestTest.scala in KRaft mode [kafka]
showuon merged PR #14573: URL: https://github.com/apache/kafka/pull/14573 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-15566: Fix flaky tests in FetchRequestTest.scala in KRaft mode [kafka]
showuon commented on PR #14573: URL: https://github.com/apache/kafka/pull/14573#issuecomment-1771954831 Ran 3 times of CI build and no fetchRequestTest failures. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (KAFKA-15657) Unexpected errors when producing transactionally in 3.6
[ https://issues.apache.org/jira/browse/KAFKA-15657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1526#comment-1526 ] Justine Olshan commented on KAFKA-15657: [~twmb] Can you confirm if the AddPartitionsToTxn calls are succeeding? And what version they are using? I am concerned the partitions might not be added correctly. > Unexpected errors when producing transactionally in 3.6 > --- > > Key: KAFKA-15657 > URL: https://issues.apache.org/jira/browse/KAFKA-15657 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 3.6.0 >Reporter: Travis Bischel >Priority: Major > > In loop-testing the franz-go client, I am frequently receiving INVALID_RECORD > (which I created a separate issue for), and INVALID_TXN_STATE and > UNKNOWN_SERVER_ERROR. > INVALID_TXN_STATE is being returned even though the partitions have been > added to the transaction (AddPartitionsToTxn). Nothing about the code has > changed between 3.5 and 3.6, and I have loop-integration-tested this code > against 3.5 thousands of times. 3.6 is newly - and always - returning > INVALID_TXN_STATE. If I change the code to retry on INVALID_TXN_STATE, I > eventually quickly (always) receive UNKNOWN_SERVER_ERROR. In looking at the > broker logs, the broker indicates that sequence numbers are out of order - > but (a) I am repeating requests that were in order (so something on the > broker got a little haywire maybe? or maybe this is due to me ignoring > invalid_txn_state?), _and_ I am not receiving OUT_OF_ORDER_SEQUENCE_NUMBER, I > am receiving UNKNOWN_SERVER_ERROR. > I think the main problem is the client unexpectedly receiving > INVALID_TXN_STATE, but a second problem here is that OOOSN is being mapped to > USE on return for some reason. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15657) Unexpected errors when producing transactionally in 3.6
[ https://issues.apache.org/jira/browse/KAFKA-15657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1524#comment-1524 ] Justine Olshan commented on KAFKA-15657: Hey Travis. INVALID_TXN_STATE likely indicates there was a race condition or a bug in the client. In this case, the transaction should abort. This is part of the work of KIP-890. I wonder if there is a bug in the client that caused hanging (or late messages getting through) before and it is just being caught now. If you want to disable transaction verification, you can by setting transaction.partition.verification.enable to false in your server config files. > Unexpected errors when producing transactionally in 3.6 > --- > > Key: KAFKA-15657 > URL: https://issues.apache.org/jira/browse/KAFKA-15657 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 3.6.0 >Reporter: Travis Bischel >Priority: Major > > In loop-testing the franz-go client, I am frequently receiving INVALID_RECORD > (which I created a separate issue for), and INVALID_TXN_STATE and > UNKNOWN_SERVER_ERROR. > INVALID_TXN_STATE is being returned even though the partitions have been > added to the transaction (AddPartitionsToTxn). Nothing about the code has > changed between 3.5 and 3.6, and I have loop-integration-tested this code > against 3.5 thousands of times. 3.6 is newly - and always - returning > INVALID_TXN_STATE. If I change the code to retry on INVALID_TXN_STATE, I > eventually quickly (always) receive UNKNOWN_SERVER_ERROR. In looking at the > broker logs, the broker indicates that sequence numbers are out of order - > but (a) I am repeating requests that were in order (so something on the > broker got a little haywire maybe? or maybe this is due to me ignoring > invalid_txn_state?), _and_ I am not receiving OUT_OF_ORDER_SEQUENCE_NUMBER, I > am receiving UNKNOWN_SERVER_ERROR. 
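For reference, the override mentioned above is a broker-side setting; a sketch of the config change (note this disables the new KIP-890 verification check rather than addressing whatever triggers it):

```properties
# server.properties
# Disable transaction partition verification (enabled by default in 3.6).
transaction.partition.verification.enable=false
```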
> I think the main problem is the client unexpectedly receiving > INVALID_TXN_STATE, but a second problem here is that OOOSN is being mapped to > USE on return for some reason. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15657) Unexpected errors when producing transactionally in 3.6
Travis Bischel created KAFKA-15657: -- Summary: Unexpected errors when producing transactionally in 3.6 Key: KAFKA-15657 URL: https://issues.apache.org/jira/browse/KAFKA-15657 Project: Kafka Issue Type: Bug Components: producer Affects Versions: 3.6.0 Reporter: Travis Bischel In loop-testing the franz-go client, I am frequently receiving INVALID_RECORD (which I created a separate issue for), and INVALID_TXN_STATE and UNKNOWN_SERVER_ERROR. INVALID_TXN_STATE is being returned even though the partitions have been added to the transaction (AddPartitionsToTxn). Nothing about the code has changed between 3.5 and 3.6, and I have loop-integration-tested this code against 3.5 thousands of times. 3.6 is newly - and always - returning INVALID_TXN_STATE. If I change the code to retry on INVALID_TXN_STATE, I eventually quickly (always) receive UNKNOWN_SERVER_ERROR. In looking at the broker logs, the broker indicates that sequence numbers are out of order - but (a) I am repeating requests that were in order (so something on the broker got a little haywire maybe? or maybe this is due to me ignoring invalid_txn_state?), _and_ I am not receiving OUT_OF_ORDER_SEQUENCE_NUMBER, I am receiving UNKNOWN_SERVER_ERROR. I think the main problem is the client unexpectedly receiving INVALID_TXN_STATE, but a second problem here is that OOOSN is being mapped to USE on return for some reason. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-15605: Fix topic deletion handling during ZK migration [kafka]
mumrah commented on code in PR #14545: URL: https://github.com/apache/kafka/pull/14545#discussion_r1366294162 ## metadata/src/main/java/org/apache/kafka/metadata/migration/KRaftMigrationZkWriter.java: ## @@ -146,6 +147,15 @@ void handleTopicsSnapshot(TopicsImage topicsImage, KRaftMigrationOperationConsum Map> changedPartitions = new HashMap<>(); Map> newPartitions = new HashMap<>(); +Set pendingTopicDeletions = migrationClient.topicClient().readPendingTopicDeletions(); +if (!pendingTopicDeletions.isEmpty()) { +operationConsumer.accept( +DELETE_PENDING_TOPIC_DELETION, +"Delete pending topic deletions", +migrationState -> migrationClient.topicClient().clearPendingTopicDeletions(pendingTopicDeletions, migrationState) Review Comment: Yea, this is in `handleTopicsSnapshot` which is sync'ing the TopicImage to ZK. Really it doesn't need to happen each time when we handle a snapshot, but I figured putting it here was better than having additional one-off logic at migration time -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (KAFKA-14264) Refactor coordinator code
[ https://issues.apache.org/jira/browse/KAFKA-14264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-14264: -- Parent: KAFKA-14246 Issue Type: Sub-task (was: Task) > Refactor coordinator code > - > > Key: KAFKA-14264 > URL: https://issues.apache.org/jira/browse/KAFKA-14264 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Philip Nee >Assignee: Philip Nee >Priority: Major > Labels: consumer-threading-refactor > Fix For: 3.4.0 > > > To refactor the consumer, we changed how the coordinator is called. However, > there will be a time period where the old and new implementation need to > coexist, so we will need to override some of the methods and create a new > implementation of the coordinator. In particular: > # ensureCoordinatorReady needs to be non-blocking or we could just use the > sendFindCoordinatorRequest. > # joinGroupIfNeeded needs to be broken up into more fine-grained stages for > the new implementation to work. > We also need to create the coordinator state machine. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14468) Refactor Commit Logic
[ https://issues.apache.org/jira/browse/KAFKA-14468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-14468: -- Parent: KAFKA-14246 Issue Type: Sub-task (was: Task) > Refactor Commit Logic > - > > Key: KAFKA-14468 > URL: https://issues.apache.org/jira/browse/KAFKA-14468 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Philip Nee >Assignee: Philip Nee >Priority: Major > Labels: consumer-threading-refactor > Fix For: 3.5.0 > > > Refactor commit logic using the new multi-threaded coordinator construct. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-15605: Fix topic deletion handling during ZK migration [kafka]
mumrah commented on code in PR #14545: URL: https://github.com/apache/kafka/pull/14545#discussion_r1366292982 ## core/src/main/scala/kafka/zk/migration/ZkTopicMigrationClient.scala: ## @@ -47,8 +47,14 @@ class ZkTopicMigrationClient(zkClient: KafkaZkClient) extends TopicMigrationClie if (!interests.contains(TopicVisitorInterest.TOPICS)) { throw new IllegalArgumentException("Must specify at least TOPICS in topic visitor interests.") } -val topics = zkClient.getAllTopicsInCluster() -val replicaAssignmentAndTopicIds = zkClient.getReplicaAssignmentAndTopicIdForTopics(topics) +val allTopics = zkClient.getAllTopicsInCluster() +val topicDeletions = readPendingTopicDeletions().asScala +val topicsToMigrated = allTopics -- topicDeletions +if (topicDeletions.nonEmpty) { + warn(s"Found ${topicDeletions.size} pending topic deletions: $topicDeletions. These will be not migrated " + Review Comment: Yea, i wondered about that. What about logging each deletion separately at TRACE ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (KAFKA-15639) Investigate ConsumerNetworkThreadTest's testResetPositionsProcessFailureIsIgnored
[ https://issues.apache.org/jira/browse/KAFKA-15639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15639: -- Parent: KAFKA-14246 Issue Type: Sub-task (was: Task) > Investigate ConsumerNetworkThreadTest's > testResetPositionsProcessFailureIsIgnored > - > > Key: KAFKA-15639 > URL: https://issues.apache.org/jira/browse/KAFKA-15639 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Philip Nee >Priority: Minor > Labels: consumer-threading-refactor > > The {{testResetPositionsProcessFailureIsIgnored}} test looks like this: > > {code:java} > @Test > public void testResetPositionsProcessFailureIsIgnored() { > doThrow(new > NullPointerException()).when(offsetsRequestManager).resetPositionsIfNeeded(); > ResetPositionsApplicationEvent event = new > ResetPositionsApplicationEvent(); > applicationEventsQueue.add(event); > assertDoesNotThrow(() -> consumerNetworkThread.runOnce()); > > verify(applicationEventProcessor).process(any(ResetPositionsApplicationEvent.class)); > } > {code} > > [~junrao] asks: > > {quote}Not sure if this is a useful test since > {{offsetsRequestManager.resetPositionsIfNeeded()}} seems to never directly > throw an exception? > {quote} > > I commented out the {{doThrow}} line and it did not impact the test. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14247) Implement EventHandler interface and DefaultEventHandler for Consumer
[ https://issues.apache.org/jira/browse/KAFKA-14247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-14247: -- Parent: KAFKA-14246 Issue Type: Sub-task (was: Task) > Implement EventHandler interface and DefaultEventHandler for Consumer > - > > Key: KAFKA-14247 > URL: https://issues.apache.org/jira/browse/KAFKA-14247 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Philip Nee >Assignee: Philip Nee >Priority: Major > Labels: consumer-threading-refactor > > The background thread runs inside the DefaultEventHandler to consume > events from the ApplicationEventQueue and produce events to the > BackgroundEventQueue. > The background thread runnable consists of a loop that tries to poll events > from the ApplicationEventQueue, processes them if there are any, and polls the > networkClient. > In this implementation, the DefaultEventHandler spawns a thread that runs the > BackgroundThreadRunnable. The runnable, as of the current PR, does the > following things: > # Initialize the networkClient > # Poll an ApplicationEvent from the queue if there is any > # Process the event > # Poll the networkClient > PR: https://github.com/apache/kafka/pull/12672 -- This message was sent by Atlassian Jira (v8.20.10#820010)
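The background-thread loop described in KAFKA-14247 can be sketched roughly as follows. This is a minimal illustration only: the class and method names (BackgroundLoopSketch, ApplicationEvent, processEvent, pollNetworkClient) are placeholders, not the actual Kafka client classes.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Minimal sketch of the background-thread loop: drain ApplicationEvents
// from a queue, process each one, and poll the network client afterwards.
public class BackgroundLoopSketch {
    record ApplicationEvent(String type) {}

    static final AtomicInteger processed = new AtomicInteger();

    static void processEvent(ApplicationEvent e) { processed.incrementAndGet(); }
    static void pollNetworkClient() { /* network I/O would be driven here */ }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<ApplicationEvent> applicationEvents = new LinkedBlockingQueue<>();
        applicationEvents.add(new ApplicationEvent("COMMIT"));
        applicationEvents.add(new ApplicationEvent("FETCH"));

        Thread background = new Thread(() -> {
            ApplicationEvent e;
            // Poll an ApplicationEvent from the queue if there is any,
            // process it, then poll the network client.
            while ((e = applicationEvents.poll()) != null) {
                processEvent(e);
                pollNetworkClient();
            }
        });
        background.start();
        background.join(5000);
        System.out.println("processed=" + processed.get());
    }
}
```

In the real client the loop runs continuously rather than exiting when the queue is empty; the sketch only shows the queue-drain/process/poll ordering.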
[jira] [Updated] (KAFKA-15270) Integration tests for AsyncConsumer simple consume case
[ https://issues.apache.org/jira/browse/KAFKA-15270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15270: -- Parent: KAFKA-14246 Issue Type: Sub-task (was: Task) > Integration tests for AsyncConsumer simple consume case > --- > > Key: KAFKA-15270 > URL: https://issues.apache.org/jira/browse/KAFKA-15270 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Lianet Magrans >Assignee: Lianet Magrans >Priority: Blocker > Labels: consumer-threading-refactor, kip-848-preview > > This task involves writing integration tests for covering the simple consume > functionality of the AsyncConsumer. This should include validation of the > assign, fetch and positions logic. > Not covering any committed offset functionality as part of this task. > Integration tests should have a similar form as the existing > PlaintextConsumerTest, but scoped to the simple consume flow. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15306) Integrate committed offsets logic when updating fetching positions
[ https://issues.apache.org/jira/browse/KAFKA-15306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15306: -- Parent: KAFKA-14246 Issue Type: Sub-task (was: Task) > Integrate committed offsets logic when updating fetching positions > -- > > Key: KAFKA-15306 > URL: https://issues.apache.org/jira/browse/KAFKA-15306 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Lianet Magrans >Assignee: Lianet Magrans >Priority: Major > Labels: consumer-threading-refactor > Fix For: 3.7.0 > > > Integrate refreshCommittedOffsets logic, currently performed by the > coordinator, into the update fetch positions performed on every iteration of > the async consumer poll loop. This should rely on the CommitRequestManager to > perform the request based on the refactored model, but it should reuse the > logic for processing the committed offsets and updating the positions. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15115) Implement resetPositions functionality in OffsetsRequestManager
[ https://issues.apache.org/jira/browse/KAFKA-15115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15115: -- Parent: KAFKA-14246 Issue Type: Sub-task (was: Task) > Implement resetPositions functionality in OffsetsRequestManager > --- > > Key: KAFKA-15115 > URL: https://issues.apache.org/jira/browse/KAFKA-15115 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Lianet Magrans >Assignee: Lianet Magrans >Priority: Major > Labels: consumer-threading-refactor > Fix For: 3.7.0 > > > Introduce support for resetting positions in the new OffsetsRequestManager. > This task will include a new event for the resetPositions calls performed > from the new consumer, and the logic for handling such events in the > OffsetsRequestManager. > The reset positions implementation will keep the same behaviour as the one in > the old consumer, but adapted to the new threading model. So it is based on a > RESET_POSITIONS event that is submitted to the background thread and then > processed by the ApplicationEventProcessor. The processing itself is done by > the OffsetsRequestManager, given that this will require a LIST_OFFSETS request > for the partitions awaiting reset. -- This message was sent by Atlassian Jira (v8.20.10#820010)
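The RESET_POSITIONS flow described above can be sketched as a small event hand-off: the application thread enqueues an event, and the background side hands it to the offsets request manager, which would build the LIST_OFFSETS request. All names here (ResetPositionsSketch, OffsetsRequestManager stand-in, resetPositionsIfNeeded) are illustrative, not the real client API.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;

// Toy sketch of the event hand-off between the application thread and the
// background thread's event processor.
public class ResetPositionsSketch {
    enum EventType { RESET_POSITIONS }
    record ApplicationEvent(EventType type, CompletableFuture<String> result) {}

    static class OffsetsManagerStub {
        // In the real manager this would build a LIST_OFFSETS request for
        // the partitions awaiting reset.
        String resetPositionsIfNeeded() { return "LIST_OFFSETS"; }
    }

    public static void main(String[] args) {
        BlockingQueue<ApplicationEvent> queue = new LinkedBlockingQueue<>();
        OffsetsManagerStub manager = new OffsetsManagerStub();

        // Application-thread side: submit the event and keep the future.
        CompletableFuture<String> result = new CompletableFuture<>();
        queue.add(new ApplicationEvent(EventType.RESET_POSITIONS, result));

        // Background-thread side (the event processor's role).
        ApplicationEvent event = queue.poll();
        if (event != null && event.type() == EventType.RESET_POSITIONS) {
            event.result().complete(manager.resetPositionsIfNeeded());
        }
        System.out.println("request=" + result.join());
    }
}
```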
[jira] [Updated] (KAFKA-14950) Implement assign() and assignment()
[ https://issues.apache.org/jira/browse/KAFKA-14950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-14950: -- Parent: KAFKA-14246 Issue Type: Sub-task (was: Task) > Implement assign() and assignment() > --- > > Key: KAFKA-14950 > URL: https://issues.apache.org/jira/browse/KAFKA-14950 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Philip Nee >Assignee: Philip Nee >Priority: Major > Labels: consumer-threading-refactor > Fix For: 3.6.0 > > > Implement assign() and assignment() -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15163) Implement validatePositions functionality for new KafkaConsumer
[ https://issues.apache.org/jira/browse/KAFKA-15163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15163: -- Parent: KAFKA-14246 Issue Type: Sub-task (was: Task) > Implement validatePositions functionality for new KafkaConsumer > --- > > Key: KAFKA-15163 > URL: https://issues.apache.org/jira/browse/KAFKA-15163 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Lianet Magrans >Assignee: Lianet Magrans >Priority: Major > Labels: consumer-threading-refactor > Fix For: 3.7.0 > > > Introduce support for validating positions in the new OffsetsRequestManager. > This task will include a new event for the validatePositions calls performed > from the new consumer, and the logic for handling such events in the > OffsetsRequestManager. > The validate positions implementation will keep the same behaviour as the one > in the old consumer, but adapted to the new threading model. So it is based > on a VALIDATE_POSITIONS event that is submitted to the background thread, > and then processed by the ApplicationEventProcessor. The processing itself is > done by the OffsetsRequestManager given that this will require an > OFFSET_FOR_LEADER_EPOCH request. This task will introduce support for such > requests in the OffsetsRequestManager, responsible for offset-related requests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15164) Extract reusable logic from OffsetsForLeaderEpochClient
[ https://issues.apache.org/jira/browse/KAFKA-15164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15164: -- Parent: KAFKA-14246 Issue Type: Sub-task (was: Task) > Extract reusable logic from OffsetsForLeaderEpochClient > --- > > Key: KAFKA-15164 > URL: https://issues.apache.org/jira/browse/KAFKA-15164 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Lianet Magrans >Assignee: Lianet Magrans >Priority: Major > Labels: consumer-threading-refactor > > The OffsetsForLeaderEpochClient class is used for making asynchronous > requests to the OffsetsForLeaderEpoch API. It encapsulates the logic for: > * preparing the requests > * sending them over the network using the network client > * handling the response > The new KafkaConsumer implementation, based on a new threading model, > requires the same logic for preparing the requests and handling the > responses, with different behaviour for how the request is actually sent. > This task includes refactoring OffsetsForLeaderEpochClient by extracting out > the logic for preparing the requests and handling the responses. No changes > in the existing logic, just making the functionality available to be reused. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15174) Ensure the correct thread is executing the callbacks
[ https://issues.apache.org/jira/browse/KAFKA-15174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15174: -- Parent: KAFKA-14246 Issue Type: Sub-task (was: Task) > Ensure the correct thread is executing the callbacks > > > Key: KAFKA-15174 > URL: https://issues.apache.org/jira/browse/KAFKA-15174 > Project: Kafka > Issue Type: Sub-task > Components: consumer >Reporter: Philip Nee >Assignee: Philip Nee >Priority: Major > Labels: consumer-threading-refactor > > We need to add assertion tests to ensure the correct thread is executing the > offset commit callbacks and rebalance callback -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14196) Duplicated consumption during rebalance, causing OffsetValidationTest to act flaky
[ https://issues.apache.org/jira/browse/KAFKA-14196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-14196: -- Labels: new-consumer-threading-should-fix (was: consumer-threading-refactor new-consumer-threading-should-fix) > Duplicated consumption during rebalance, causing OffsetValidationTest to act > flaky > -- > > Key: KAFKA-14196 > URL: https://issues.apache.org/jira/browse/KAFKA-14196 > Project: Kafka > Issue Type: Bug > Components: clients, consumer >Affects Versions: 3.2.1 >Reporter: Philip Nee >Assignee: Philip Nee >Priority: Blocker > Labels: new-consumer-threading-should-fix > Fix For: 3.3.0, 3.2.3 > > > Several flaky tests under OffsetValidationTest are indicating potential > consumer duplication issue, when autocommit is enabled. I believe this is > affecting *3.2* and onward. Below shows the failure message: > > {code:java} > Total consumed records 3366 did not match consumed position 3331 {code} > > After investigating the log, I discovered that the data consumed between the > start of a rebalance event and the async commit was lost for those failing > tests. In the example below, the rebalance event kicks in at around > 1662054846995 (first record), and the async commit of the offset 3739 is > completed at around 1662054847015 (right before partitions_revoked). 
> > {code:java} > {"timestamp":1662054846995,"name":"records_consumed","count":3,"partitions":[{"topic":"test_topic","partition":0,"count":3,"minOffset":3739,"maxOffset":3741}]} > {"timestamp":1662054846998,"name":"records_consumed","count":2,"partitions":[{"topic":"test_topic","partition":0,"count":2,"minOffset":3742,"maxOffset":3743}]} > {"timestamp":1662054847008,"name":"records_consumed","count":2,"partitions":[{"topic":"test_topic","partition":0,"count":2,"minOffset":3744,"maxOffset":3745}]} > {"timestamp":1662054847016,"name":"partitions_revoked","partitions":[{"topic":"test_topic","partition":0}]} > {"timestamp":1662054847031,"name":"partitions_assigned","partitions":[{"topic":"test_topic","partition":0}]} > {"timestamp":1662054847038,"name":"records_consumed","count":23,"partitions":[{"topic":"test_topic","partition":0,"count":23,"minOffset":3739,"maxOffset":3761}]} > {code} > A few things to note here: > # Manually calling commitSync in the onPartitionsRevoked callback seems to > alleviate the issue > # Setting includeMetadataInTimeout to false also seems to alleviate the > issue. > The above attempts seem to suggest that the contract between poll() and > asyncCommit() is broken. AFAIK, we implicitly use poll() to ack the > previously fetched data, and the consumer would (try to) commit these offsets > in the current poll() loop. However, it seems like as the poll continues to > loop, the "acked" data isn't being committed. > > I believe this could be introduced in KAFKA-14024, which originated from > KAFKA-13310. > More specifically, (see the comments below), the ConsumerCoordinator will > always return before the async commit, due to the previous incomplete commit. > However, this is a bit contradictory here because: > # I think we want to commit asynchronously while the poll continues, and if > we do that, we are back to KAFKA-14024, that the consumer will get rebalance > timeout and get kicked out of the group. 
> # But we also need to commit all the "acked" offsets before revoking the > partition, and this has to be blocked. > *Steps to Reproduce the Issue:* > # Check out AK 3.2 > # Run this several times: (Recommend to only run runs with autocommit > enabled in consumer_test.py to save time) > {code:java} > _DUCKTAPE_OPTIONS="--debug" > TC_PATHS="tests/kafkatest/tests/client/consumer_test.py::OffsetValidationTest.test_consumer_failure" > bash tests/docker/run_tests.sh {code} > > *Steps to Diagnose the Issue:* > # Open the test results in *results/* > # Go to the consumer log. It might look like this > > {code:java} > results/2022-09-03--005/OffsetValidationTest/test_consumer_failure/clean_shutdown=True.enable_autocommit=True.metadata_quorum=ZK/2/VerifiableConsumer-0-xx/dockerYY > {code} > 3. Find the docker instance that has partition getting revoked and rejoined. > Observed the offset before and after. > *Propose Fixes:* > TBD > > https://github.com/apache/kafka/pull/12603 -- This message was sent by Atlassian Jira (v8.20.10#820010)
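The duplication window described in KAFKA-14196, and why workaround #1 (a synchronous commit in the revoke callback) closes it, can be shown with a toy simulation. None of this is the real Kafka client API; offsets and names are illustrative, mirroring the log excerpt above.

```java
// Toy model: records consumed after the last completed async commit but
// before the partition is revoked are re-delivered to the new owner,
// unless the revoke callback commits the consumed position synchronously.
public class RevokeCommitSketch {

    static long duplicatesAfterRebalance(boolean commitSyncOnRevoke) {
        long position = 3739;       // consumed position at last async commit
        long committed = position;  // last async commit completed here

        position += 7;              // consumed between rebalance start and revoke

        if (commitSyncOnRevoke) {
            committed = position;   // onPartitionsRevoked: commitSync()
        }
        // The new owner resumes from the committed offset, so everything
        // between `committed` and `position` is consumed twice.
        return position - committed;
    }

    public static void main(String[] args) {
        System.out.println("withoutSyncCommit=" + duplicatesAfterRebalance(false));
        System.out.println("withSyncCommit=" + duplicatesAfterRebalance(true));
    }
}
```

The first case models the flaky-test failure (consumed count exceeding the committed position); the second models the workaround.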
[jira] [Updated] (KAFKA-14252) Create background thread skeleton for new Consumer threading model
[ https://issues.apache.org/jira/browse/KAFKA-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-14252: -- Parent: KAFKA-14246 Issue Type: Sub-task (was: Task) > Create background thread skeleton for new Consumer threading model > -- > > Key: KAFKA-14252 > URL: https://issues.apache.org/jira/browse/KAFKA-14252 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Philip Nee >Assignee: Philip Nee >Priority: Major > Labels: consumer-threading-refactor > > The event handler internally instantiates a background thread to consume > ApplicationEvents and produce BackgroundEvents. In this ticket, we will > create a skeleton of the background thread. We will incrementally add > implementation in the future. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15656) Frequent INVALID_RECORD on Kafka 3.6
[ https://issues.apache.org/jira/browse/KAFKA-15656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1515#comment-1515 ] Travis Bischel commented on KAFKA-15656: Note that if I change my code to retry on INVALID_RECORD – and repeat the same exact serialization – the produce request will succeed when repeated. > Frequent INVALID_RECORD on Kafka 3.6 > > > Key: KAFKA-15656 > URL: https://issues.apache.org/jira/browse/KAFKA-15656 > Project: Kafka > Issue Type: Bug > Components: producer >Affects Versions: 3.6.0 >Reporter: Travis Bischel >Priority: Major > Attachments: invalid_record.log > > > Using this docker-compose.yml: > {noformat} > version: "3.7" > services: > kafka: > image: bitnami/kafka:latest > network_mode: host > environment: > KAFKA_ENABLE_KRAFT: yes > KAFKA_CFG_PROCESS_ROLES: controller,broker > KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER > KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093 > KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: > CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT > KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@127.0.0.1:9093 > # Set this to "PLAINTEXT://127.0.0.1:9092" if you want to run this > container on localhost via Docker > KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092 > KAFKA_CFG_NODE_ID: 1 > ALLOW_PLAINTEXT_LISTENER: yes > KAFKA_KRAFT_CLUSTER_ID: XkpGZQ27R3eTl3OdTm2LYA # 16 byte base64-encoded > UUID{noformat} > And running franz-go integration tests with KGO_TEST_RF=1, I consistently > receive INVALID_RECORD errors. 
> > Looking at the container logs, I see these problematic log lines: > {noformat} > 2023-10-19 23:33:47,942] ERROR [ReplicaManager broker=1] Error processing > append operation on partition > 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-0 > (kafka.server.ReplicaManager) > org.apache.kafka.common.InvalidRecordException: Invalid negative header key > size -25 > [2023-10-19 23:33:47,942] ERROR [ReplicaManager broker=1] Error processing > append operation on partition > 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-6 > (kafka.server.ReplicaManager) > org.apache.kafka.common.InvalidRecordException: Reached end of input stream > before skipping all bytes. Remaining bytes:94 > [2023-10-19 23:33:47,942] ERROR [ReplicaManager broker=1] Error processing > append operation on partition > 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-1 > (kafka.server.ReplicaManager) > org.apache.kafka.common.InvalidRecordException: Found invalid number of > record headers -26 > [2023-10-19 23:33:47,948] ERROR [ReplicaManager broker=1] Error processing > append operation on partition > 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-6 > (kafka.server.ReplicaManager) > org.apache.kafka.common.InvalidRecordException: Found invalid number of > record headers -27 > [2023-10-19 23:33:47,950] ERROR [ReplicaManager broker=1] Error processing > append operation on partition > 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-22 > (kafka.server.ReplicaManager) > org.apache.kafka.common.InvalidRecordException: Invalid negative header key > size -25 > [2023-10-19 23:33:47,947] ERROR [ReplicaManager broker=1] Error processing > append operation on partition > c63b6e30987317fad18815effb8d432b6df677d2ab56cf6da517bb93fa49b74b-25 > (kafka.server.ReplicaManager) > org.apache.kafka.common.InvalidRecordException: Found invalid number of > record headers -50 > [2023-10-19 23:33:47,959] ERROR [ReplicaManager broker=1] Error 
processing > append operation on partition > c63b6e30987317fad18815effb8d432b6df677d2ab56cf6da517bb93fa49b74b-25 > (kafka.server.ReplicaManager) > {noformat} > > I modified franz-go with a diff to print the request that was written to the > wire once this error occurs. Attached is a v9 produce request. I deserialized > it locally and am not seeing the corrupt data that Kafka is printing. It's > possible there is a bug in the client, but again, these tests have never > received this error pre-Kafka 3.6. It _looks like_ there is either corruption > when processing the incoming data, or there is some problematic race > condition in the broker - I'm not sure which. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15631) Do not send new heartbeat request while another one in-flight
[ https://issues.apache.org/jira/browse/KAFKA-15631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15631: -- Labels: kip-848 kip-848-client-support kip-848-e2e kip-848-preview (was: kip-848-client-support kip-848-e2e kip-848-preview) > Do not send new heartbeat request while another one in-flight > - > > Key: KAFKA-15631 > URL: https://issues.apache.org/jira/browse/KAFKA-15631 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Lianet Magrans >Assignee: Philip Nee >Priority: Major > Labels: kip-848, kip-848-client-support, kip-848-e2e, > kip-848-preview > > The client consumer should not send a new heartbeat request while a previous > one is in-flight. If a HB is in-flight, we should wait for a response or > timeout before sending the next one. -- This message was sent by Atlassian Jira (v8.20.10#820010)
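The in-flight guard requested in KAFKA-15631 amounts to a simple state check on each poll iteration. A minimal sketch, with placeholder names (not the actual HeartbeatRequestManager implementation):

```java
// Sketch of the heartbeat in-flight guard: a send attempt is suppressed
// while a previous heartbeat has neither a response nor a timeout.
public class HeartbeatGuardSketch {
    private boolean inFlight = false;
    private int sent = 0;

    // Called on every background-thread poll iteration.
    void maybeSendHeartbeat() {
        if (inFlight) {
            return; // wait for a response or timeout before sending another
        }
        inFlight = true;
        sent++;
    }

    // Called when a response arrives or the request times out.
    void onHeartbeatResponseOrTimeout() {
        inFlight = false;
    }

    public static void main(String[] args) {
        HeartbeatGuardSketch m = new HeartbeatGuardSketch();
        m.maybeSendHeartbeat();
        m.maybeSendHeartbeat(); // suppressed: previous HB still in flight
        m.onHeartbeatResponseOrTimeout();
        m.maybeSendHeartbeat();
        System.out.println("sent=" + m.sent);
    }
}
```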
[jira] [Updated] (KAFKA-15282) Implement client support for KIP-848 client-side assignors
[ https://issues.apache.org/jira/browse/KAFKA-15282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15282: -- Labels: kip-848 kip-848-client-support (was: consumer-threading-refactor kip-848 kip-848-client-support) > Implement client support for KIP-848 client-side assignors > -- > > Key: KAFKA-15282 > URL: https://issues.apache.org/jira/browse/KAFKA-15282 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Priority: Major > Labels: kip-848, kip-848-client-support > > The client-side assignor provides the logic for the partition assignments > instead of on the server. Client-side assignment is the main approach used by > the “old protocol” for divvying up partitions. While the “new protocol” > favors server-side assignors, the client-side assignor will continue to be > used for backward compatibility, including KSQL, Connect, etc. > Note: I _*think*_ that the client-side assignor logic and the reconciliation > logic can remain separate from each other. We should strive to keep the two > pieces unencumbered, unless it’s unavoidable. > This task includes: > * Validate the client’s configuration for assignor selection > * Integrate with the new {{PartitionAssignor}} interface to invoke the logic > from the user-provided assignor implementation > * Implement the necessary logic around the request/response from the > {{ConsumerGroupPrepareAssignment}} RPC call using the information from the > {{PartitionAssignor}} above > * Implement the necessary logic around the request/response from the > {{ConsumerGroupInstallAssignment}} RPC call, again using the information > calculated by the {{PartitionAssignor}} > This task is part of the work to implement support for the new KIP-848 > consumer group protocol. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15279) Implement client support for KIP-848 assignment RPCs
[ https://issues.apache.org/jira/browse/KAFKA-15279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15279: -- Labels: kip-848 kip-848-client-support (was: consumer-threading-refactor kip-848 kip-848-client-support) > Implement client support for KIP-848 assignment RPCs > > > Key: KAFKA-15279 > URL: https://issues.apache.org/jira/browse/KAFKA-15279 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Priority: Major > Labels: kip-848, kip-848-client-support > > The protocol introduces three new RPCs that the client uses to communicate > with the broker: > # > [ConsumerGroupHeartbeat|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol#KIP848:TheNextGenerationoftheConsumerRebalanceProtocol-ConsumerGroupHeartbeatAPI] > # > [ConsumerGroupPrepareAssignment|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol#KIP848:TheNextGenerationoftheConsumerRebalanceProtocol-ConsumerGroupPrepareAssignmentAPI] > # > [ConsumerGroupInstallAssignment|https://cwiki.apache.org/confluence/display/KAFKA/KIP-848%3A+The+Next+Generation+of+the+Consumer+Rebalance+Protocol#KIP848:TheNextGenerationoftheConsumerRebalanceProtocol-ConsumerGroupInstallAssignmentAPI] > Support for ConsumerGroupHeartbeat is handled by KAFKA-15278. This task is to > implement the ConsumerGroupAssignmentRequestManager to handle the second and > third RPCs on the above list. > This task is part of the work to implement support for the new KIP-848 > consumer group protocol. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15278) Implement client support for KIP-848 ConsumerGroupHeartbeat protocol RPC
[ https://issues.apache.org/jira/browse/KAFKA-15278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15278: -- Labels: kip-848 kip-848-client-support kip-848-e2e kip-848-preview (was: consumer-threading-refactor kip-848 kip-848-client-support kip-848-e2e kip-848-preview) > Implement client support for KIP-848 ConsumerGroupHeartbeat protocol RPC > > > Key: KAFKA-15278 > URL: https://issues.apache.org/jira/browse/KAFKA-15278 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Assignee: Kirk True >Priority: Major > Labels: kip-848, kip-848-client-support, kip-848-e2e, > kip-848-preview > > The necessary Java code that represents the {{ConsumerGroupHeartbeatRequest}} > and {{ConsumerGroupHeartbeatResponse}} is already present in the codebase. > It is assumed that the scaffolding for the other two will come along in time. > * Implement {{ConsumerGroupRequestManager}} > * Ensure that {{DefaultBackgroundThread}} correctly calculates I/O timeouts > so that the heartbeat occurs within the {{group.consumer.session.timeout.ms}} > interval regardless of other {{RequestManager}} instance activity > * Ensure errors are handled correctly > * Ensure MembershipStateManager is updated on both success and failure > cases, and the state machine is transitioned to the correct state. > This task is part of the work to implement support for the new KIP-848 > consumer group protocol. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15651) Investigate auto commit guarantees during Consumer.assign()
[ https://issues.apache.org/jira/browse/KAFKA-15651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15651: -- Labels: consumer-threading-refactor kip-848-preview (was: consumer-threading-refactor kip-848-client-support kip-848-preview) > Investigate auto commit guarantees during Consumer.assign() > --- > > Key: KAFKA-15651 > URL: https://issues.apache.org/jira/browse/KAFKA-15651 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor, kip-848-preview > > In the {{assign()}} method implementation, both {{KafkaConsumer}} and > {{PrototypeAsyncConsumer}} commit offsets asynchronously. Is this > intentional? [~junrao] asks in a [recent PR > review|https://github.com/apache/kafka/pull/14406/files/193af8230d0c61853d764cbbe29bca2fc6361af9#r1349023459]: > {quote}Do we guarantee that the new owner of the unsubscribed partitions > could pick up the latest committed offset? > {quote} > Let's confirm whether the asynchronous approach is acceptable and correct. If > it is, great, let's enhance the documentation to briefly explain why. If it > is not, let's correct the behavior if it's within the API semantic > expectations. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15548) Ensure all resources created by the Consumer are close()-ed properly
[ https://issues.apache.org/jira/browse/KAFKA-15548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15548: -- Labels: consumer-threading-refactor kip-848 kip-848-e2e kip-848-preview (was: consumer-threading-refactor kip-848 kip-848-client-support kip-848-e2e kip-848-preview) > Ensure all resources created by the Consumer are close()-ed properly > > > Key: KAFKA-15548 > URL: https://issues.apache.org/jira/browse/KAFKA-15548 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Philip Nee >Assignee: Philip Nee >Priority: Major > Labels: consumer-threading-refactor, kip-848, kip-848-e2e, > kip-848-preview > > Upon closing of the {{Consumer}} we need to: > # Complete pending commits > # Revoke assignment (Note that the revocation involves stop fetching, > committing offsets if auto-commit enabled and invoking the > onPartitionsRevoked callback) > # Send the last GroupConsumerHeartbeatRequest with epoch = -1 to leave the > group (or -2 if static member) > # Close any fetch sessions on the brokers > # Poll the NetworkClient to complete pending I/O > There is a mechanism introduced in PR > [14406|https://github.com/apache/kafka/pull/14406] that allows for performing > network I/O on shutdown. The new method > {{DefaultBackgroundThread.runAtClose()}} will be executed when > {{Consumer.close()}} is invoked. -- This message was sent by Atlassian Jira (v8.20.10#820010)
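The five shutdown steps enumerated in KAFKA-15548 imply a strict ordering for what a close hook might do. The sketch below only illustrates that ordering; the method names are placeholders, not the real DefaultBackgroundThread.runAtClose() implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative ordering of the Consumer shutdown steps: complete commits,
// revoke the assignment, leave the group, close fetch sessions, then poll
// the network client to flush pending I/O.
public class CloseSequenceSketch {
    static final List<String> log = new ArrayList<>();

    static void completePendingCommits()      { log.add("commits"); }
    static void revokeAssignment()            { log.add("revoke"); } // stop fetching, auto-commit, callback
    static void sendLeaveHeartbeat(int epoch) { log.add("leave(epoch=" + epoch + ")"); }
    static void closeFetchSessions()          { log.add("closeFetch"); }
    static void pollPendingIO()               { log.add("poll"); }

    // Per the ticket: epoch -1 leaves the group, -2 for a static member.
    static void runAtClose(boolean staticMember) {
        completePendingCommits();
        revokeAssignment();
        sendLeaveHeartbeat(staticMember ? -2 : -1);
        closeFetchSessions();
        pollPendingIO();
    }

    public static void main(String[] args) {
        runAtClose(false);
        System.out.println(String.join(",", log));
    }
}
```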
[jira] [Updated] (KAFKA-15637) Investigate FetcherTest's/FetchRequestManager's testFetchCompletedBeforeHandlerAdded
[ https://issues.apache.org/jira/browse/KAFKA-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15637: -- Labels: consumer-threading-refactor (was: consumer-threading-refactor kip-848-client-support) > Investigate FetcherTest's/FetchRequestManager's > testFetchCompletedBeforeHandlerAdded > > > Key: KAFKA-15637 > URL: https://issues.apache.org/jira/browse/KAFKA-15637 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor > > Thanks for the reply. I still don't quite understand the test. Why do we > duplicate the following code both inside and outside of {{{}setWakeupHook{}}}? > > {code:java} > networkClientDelegate.disconnectAsync(readReplica); > networkClientDelegate.poll(time.timer(0)); > {code} > > MockClient is only woken up through > {{{}networkClientDelegate.disconnectAsync{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15638) Investigate ConsumerNetworkThreadTest's testPollResultTimer
[ https://issues.apache.org/jira/browse/KAFKA-15638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15638: -- Labels: consumer-threading-refactor (was: consumer-threading-refactor kip-848-client-support) > Investigate ConsumerNetworkThreadTest's testPollResultTimer > --- > > Key: KAFKA-15638 > URL: https://issues.apache.org/jira/browse/KAFKA-15638 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Philip Nee >Priority: Minor > Labels: consumer-threading-refactor > > Regarding this comment in {{{}testPollResultTimer{}}}... > {code:java} > // purposely setting a non-MAX time to ensure it is returning Long.MAX_VALUE > upon success| > {code} > [~junrao] asked: > {quote}Which call is returning Long.MAX_VALUE? > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15652) Add unit/integration tests to verify OffsetOutOfRangeException is thrown for OffsetFetcherUtils.getOffsetResetTimestamp()
[ https://issues.apache.org/jira/browse/KAFKA-15652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15652: -- Labels: consumer-threading-refactor (was: consumer-threading-refactor kip-848-client-support) > Add unit/integration tests to verify OffsetOutOfRangeException is thrown for > OffsetFetcherUtils.getOffsetResetTimestamp() > - > > Key: KAFKA-15652 > URL: https://issues.apache.org/jira/browse/KAFKA-15652 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor > > In the {{updateFetchPositions()}} method implementation, both > {{KafkaConsumer}} and {{PrototypeAsyncConsumer}} reset positions > asynchronously. [~junrao] stated the following in a [recent PR > review|https://github.com/apache/kafka/pull/14406#discussion_r1349173413]: > {quote}There is a subtle difference between transitioning to reset from > initializing and transitioning to reset from {{OffsetOutOfRangeException}} > during fetch. In the latter, the application thread will call > {{{}FetchCollector.handleInitializeErrors(){}}}. If there is no default > offset reset policy, an {{OffsetOutOfRangeException}} will be thrown to the > application thread during {{{}poll{}}}, which is what we want. > However, for the former, if there is no default offset reset policy, we > simply ignore that partition through > {{{}OffsetFetcherUtils.getOffsetResetTimestamp{}}}. It seems in that case, > the partition will be forever in the reset state and the application thread > won't get the {{{}OffsetOutOfRangeException{}}}. > {quote} > I intentionally changed the code so that no exceptions were thrown in > {{OffsetFetcherUtils.getOffsetResetTimestamp()}} and would simply return an > empty map. When I ran the unit tests and integration tests, there were no > failures, strongly suggesting that there is no coverage of this particular > edge case. 
-- This message was sent by Atlassian Jira (v8.20.10#820010)
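The edge case [~junrao] describes can be modeled outside Kafka with a minimal, self-contained sketch: a partition with no default reset policy should surface an error from the timestamp lookup rather than being silently dropped from the result map. All names here ({{ResetTimestampSketch}}, {{offsetResetTimestamps}}, the stand-in {{OffsetOutOfRangeError}}) are illustrative, not Kafka's actual internals:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class ResetTimestampSketch {
    // Stand-in for Kafka's OffsetOutOfRangeException (which lives in the clients jar).
    static class OffsetOutOfRangeError extends RuntimeException {
        OffsetOutOfRangeError(String msg) { super(msg); }
    }

    // Model of the desired behavior: a partition whose reset policy is absent
    // raises an error instead of being skipped, so the application thread sees it.
    static Map<String, Long> offsetResetTimestamps(Map<String, Optional<String>> policies) {
        Map<String, Long> timestamps = new LinkedHashMap<>();
        for (Map.Entry<String, Optional<String>> e : policies.entrySet()) {
            Optional<String> policy = e.getValue();
            if (policy.isEmpty()) {
                throw new OffsetOutOfRangeError("no reset policy for " + e.getKey());
            }
            // earliest -> -2, latest -> -1, mirroring the ListOffsets sentinel timestamps
            timestamps.put(e.getKey(), "earliest".equals(policy.get()) ? -2L : -1L);
        }
        return timestamps;
    }

    public static void main(String[] args) {
        Map<String, Optional<String>> policies = new LinkedHashMap<>();
        policies.put("foo-0", Optional.of("earliest"));
        policies.put("foo-1", Optional.empty()); // no default offset reset policy
        boolean threw = false;
        try {
            offsetResetTimestamps(policies);
        } catch (OffsetOutOfRangeError expected) {
            threw = true;
        }
        System.out.println("threw=" + threw);
    }
}
```

A unit test for the real code would follow the same shape: configure no reset policy, force the out-of-range transition, and assert the exception reaches {{poll()}}.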
[jira] [Updated] (KAFKA-15557) Investigate FetcherTest's/FetchRequestManager's duplicate metadata update in assignFromUserNoId
[ https://issues.apache.org/jira/browse/KAFKA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15557: -- Labels: consumer-threading-refactor (was: consumer-threading-refactor kip-848-client-support) > Investigate FetcherTest's/FetchRequestManager's duplicate metadata update in > assignFromUserNoId > --- > > Key: KAFKA-15557 > URL: https://issues.apache.org/jira/browse/KAFKA-15557 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor > > The unit tests {{FetcherTest}} and {{FetchRequestManagerTest}} have methods > named {{assignFromUser()}} and {{assignFromUserNoId()}} that appear to > perform duplicate metadata updates: > {code:java} > private void assignFromUser(Set partitions) { > subscriptions.assignFromUser(partitions); > client.updateMetadata(initialUpdateResponse); > // A dummy metadata update to ensure valid leader epoch. > metadata.updateWithCurrentRequestVersion( > RequestTestUtils.metadataUpdateWithIds( > "dummy", > 1, > Collections.emptyMap(), > singletonMap(topicName, 4), > tp -> validLeaderEpoch, topicIds > ), > false, > 0L > ); > } > {code} > {{client.updateMetadata()}} eventually calls > {{metadata.updateWithCurrentRequestVersion()}}. Determine why the test is > updating the cluster metadata twice with different values. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15558) Determine if Timer should be used elsewhere in PrototypeAsyncConsumer.updateFetchPositions()
[ https://issues.apache.org/jira/browse/KAFKA-15558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15558: -- Labels: consumer-threading-refactor kip-848-preview (was: consumer-threading-refactor kip-848-client-support kip-848-preview) > Determine if Timer should be used elsewhere in > PrototypeAsyncConsumer.updateFetchPositions() > > > Key: KAFKA-15558 > URL: https://issues.apache.org/jira/browse/KAFKA-15558 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Priority: Major > Labels: consumer-threading-refactor, kip-848-preview > > This is a followup ticket based on a question from [~junrao] when reviewing > the [fetch request manager pull > request|https://github.com/apache/kafka/pull/14406]: > {quote}It still seems weird that we only use the timer for > {{{}refreshCommittedOffsetsIfNeeded{}}}, but not for other cases where we > don't have valid fetch positions. For example, if all partitions are in > {{AWAIT_VALIDATION}} state, it seems that {{PrototypeAsyncConsumer.poll()}} > will just go in a busy loop, which is not efficient. > {quote} > The goal here is to determine if we should also be propagating the Timer to > the validate positions and reset positions operations. > Note: we should also investigate if the existing {{KafkaConsumer}} > implementation should be fixed, too. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15636) Investigate FetcherTest's/FetchRequestManager's testFetchResponseMetrics
[ https://issues.apache.org/jira/browse/KAFKA-15636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15636: -- Labels: consumer-threading-refactor (was: consumer-threading-refactor kip-848-client-support) > Investigate FetcherTest's/FetchRequestManager's testFetchResponseMetrics > > > Key: KAFKA-15636 > URL: https://issues.apache.org/jira/browse/KAFKA-15636 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor > > {{expectedBytes}} is calculated as total, instead of avg. Is this correct? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15615) Improve handling of fetching during metadata updates
[ https://issues.apache.org/jira/browse/KAFKA-15615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15615: -- Labels: consumer-threading-refactor kip-848-preview (was: consumer-threading-refactor kip-848-client-support kip-848-preview) > Improve handling of fetching during metadata updates > > > Key: KAFKA-15615 > URL: https://issues.apache.org/jira/browse/KAFKA-15615 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor, kip-848-preview > > [During a review of the new > fetcher|https://github.com/apache/kafka/pull/14406#discussion_r193941], > [~junrao] found what appears to be an opportunity for optimization. > When a fetch response receives an error about partition leadership, fencing, > etc. a metadata refresh is triggered. However, it takes time for that refresh > to occur, and in the interim, it appears that the consumer will blindly > attempt to fetch data for the partition again, in kind of a "definition of > insanity" type of way. Ideally, the consumer would have a way to temporarily > ignore those partitions, in a way somewhat like the "pausing" approach so > that they are skipped until the metadata refresh response is fully processed. > This affects both the existing KafkaConsumer and the new > PrototypeAsyncConsumer. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15534) Propagate client response time when timeout to the request handler
[ https://issues.apache.org/jira/browse/KAFKA-15534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15534: -- Labels: consumer-threading-refactor kip-848 kip-848-e2e kip-848-preview (was: consumer-threading-refactor kip-848 kip-848-client-support kip-848-e2e kip-848-preview) > Propagate client response time when timeout to the request handler > -- > > Key: KAFKA-15534 > URL: https://issues.apache.org/jira/browse/KAFKA-15534 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Philip Nee >Assignee: Philip Nee >Priority: Minor > Labels: consumer-threading-refactor, kip-848, kip-848-e2e, > kip-848-preview > > Currently, we don't have a good way to propagate the response time to the > handler when a timeout is thrown. > {code:java} > unsent.handler.onFailure(new TimeoutException( > "Failed to send request after " + unsent.timer.timeoutMs() + " ms.")); > {code} > The current request manager invokes a system call to retrieve the response > time, which is not ideal because it is already available in the network client. > This is an example from the coordinator request manager: > {code:java} > unsentRequest.future().whenComplete((clientResponse, throwable) -> { > long responseTimeMs = time.milliseconds(); > if (clientResponse != null) { > FindCoordinatorResponse response = (FindCoordinatorResponse) > clientResponse.responseBody(); > onResponse(responseTimeMs, response); > } else { > onFailedResponse(responseTimeMs, throwable); > } > }); {code} > But in the networkClientDelegate, we should utilize the currentTimeMs in > trySend() to avoid calling time.milliseconds(): > {code:java} > private void trySend(final long currentTimeMs) { > ... > unsent.handler.onFailure(new TimeoutException( > "Failed to send request after " + unsent.timer.timeoutMs() + " ms.")); > continue; > } > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
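The pattern the ticket asks for — threading the poll loop's {{currentTimeMs}} through to the failure handler instead of re-reading the clock in each request manager — can be modeled with a minimal, self-contained sketch. The class and method names ({{TimestampPropagation}}, {{expireUnsent}}) are illustrative, not Kafka's actual internals:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

public class TimestampPropagation {
    // Collects the response times the handlers observed.
    static final List<Long> observed = new ArrayList<>();

    // The timestamp is captured once by the caller and handed to every handler,
    // so no handler needs its own System.currentTimeMillis() call.
    static void expireUnsent(long currentTimeMs, List<LongConsumer> handlers) {
        for (LongConsumer handler : handlers) {
            handler.accept(currentTimeMs);
        }
    }

    public static void main(String[] args) {
        List<LongConsumer> handlers = new ArrayList<>();
        handlers.add(observed::add);
        handlers.add(observed::add);
        long currentTimeMs = 1_000L; // captured once at the top of the poll loop
        expireUnsent(currentTimeMs, handlers);
        System.out.println(observed); // every handler saw the same timestamp
    }
}
```

Besides avoiding the extra system call, this keeps all handlers in one poll iteration agreeing on a single response time, which makes timeout accounting deterministic in tests.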
[jira] [Updated] (KAFKA-15617) Investigate FetcherTest's/FetchRequestManager's testFetchingPendingPartitions and testInflightFetchOnPendingPartitions overlap
[ https://issues.apache.org/jira/browse/KAFKA-15617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15617: -- Labels: consumer-threading-refactor (was: consumer-threading-refactor kip-848-client-support) > Investigate FetcherTest's/FetchRequestManager's testFetchingPendingPartitions > and testInflightFetchOnPendingPartitions overlap > -- > > Key: KAFKA-15617 > URL: https://issues.apache.org/jira/browse/KAFKA-15617 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor > > In FetcherTest, the two tests testFetchingPendingPartitions and > testInflightFetchOnPendingPartitions have significant overlap. Perhaps the > former subsumes the latter? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15556) Remove NetworkClientDelegate methods isUnavailable, maybeThrowAuthFailure, and tryConnect
[ https://issues.apache.org/jira/browse/KAFKA-15556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15556: -- Labels: consumer-threading-refactor kip-848-preview (was: consumer-threading-refactor kip-848-client-support kip-848-preview) > Remove NetworkClientDelegate methods isUnavailable, maybeThrowAuthFailure, > and tryConnect > - > > Key: KAFKA-15556 > URL: https://issues.apache.org/jira/browse/KAFKA-15556 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Priority: Major > Labels: consumer-threading-refactor, kip-848-preview > > The "new consumer" (i.e. {{{}PrototypeAsyncConsumer{}}}) was designed to > handle networking details in a more centralized way. However, in order to > reuse code between the existing {{KafkaConsumer}} and the new > {{{}PrototypeAsyncConsumer{}}}, that design goal was "relaxed" when the > {{NetworkClientDelegate}} capitulated and -stole- copied three methods from > {{ConsumerNetworkClient}} related to detecting node status: > # {{isUnavailable}} > # {{maybeThrowAuthFailure}} > # {{tryConnect}} > Unfortunately, these have found their way into the {{FetchRequestManager}} > and {{OffsetsRequestManager}} implementations. We should review if we can > clean up—or even remove—this leaky abstraction. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15606) Investigate FetcherTest's/FetchRequestManager's testCompletedFetchRemoval
[ https://issues.apache.org/jira/browse/KAFKA-15606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15606: -- Labels: consumer-threading-refactor (was: consumer-threading-refactor kip-848-client-support) > Investigate FetcherTest's/FetchRequestManager's testCompletedFetchRemoval > - > > Key: KAFKA-15606 > URL: https://issues.apache.org/jira/browse/KAFKA-15606 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor > > As part of the review for [FetchRequestManager pull > request|https://github.com/apache/kafka/pull/14406], [~junrao] had some > questions related to the correctness and clarity of the > {{FetcherTest.testCompletedFetchRemoval()}} test: > Questions: > * https://github.com/apache/kafka/pull/14406#discussion_r1347908197 > * https://github.com/apache/kafka/pull/14406#discussion_r1347910980 > * https://github.com/apache/kafka/pull/14406#discussion_r1347913781 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15634) Investigate FetcherTest's/FetchRequestManager's testQuotaMetrics
[ https://issues.apache.org/jira/browse/KAFKA-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15634: -- Labels: consumer-threading-refactor (was: consumer-threading-refactor kip-848-client-support) > Investigate FetcherTest's/FetchRequestManager's testQuotaMetrics > > > Key: KAFKA-15634 > URL: https://issues.apache.org/jira/browse/KAFKA-15634 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor > > What is the point of the code in the initial {{while}} loop since the receive > is delayed and thus there is no {{throttleDelayMs}} received in the client? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15635) Investigate FetcherTest's/FetchRequestManager's testFetcherLeadMetric
[ https://issues.apache.org/jira/browse/KAFKA-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15635: -- Labels: consumer-threading-refactor (was: consumer-threading-refactor kip-848-client-support) > Investigate FetcherTest's/FetchRequestManager's testFetcherLeadMetric > - > > Key: KAFKA-15635 > URL: https://issues.apache.org/jira/browse/KAFKA-15635 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor > > Why is {{recordsFetchLeadMin}} different from {{partitionLead}} given there > is only 1 assigned partition? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15550) OffsetsForTimes validation for negative timestamps in new consumer
[ https://issues.apache.org/jira/browse/KAFKA-15550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15550: -- Labels: consumer-threading-refactor kip-848 kip-848-preview (was: consumer-threading-refactor kip-848 kip-848-client-support kip-848-preview) > OffsetsForTimes validation for negative timestamps in new consumer > -- > > Key: KAFKA-15550 > URL: https://issues.apache.org/jira/browse/KAFKA-15550 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Lianet Magrans >Assignee: Lianet Magrans >Priority: Major > Labels: consumer-threading-refactor, kip-848, kip-848-preview > > OffsetsForTimes api call should fail with _IllegalArgumentException_ if > negative timestamps are provided as arguments. This will effectively exclude > earliest and latest offsets as target times, keeping the current behaviour of > the KafkaConsumer. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15639) Investigate ConsumerNetworkThreadTest's testResetPositionsProcessFailureIsIgnored
[ https://issues.apache.org/jira/browse/KAFKA-15639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15639: -- Component/s: unit tests > Investigate ConsumerNetworkThreadTest's > testResetPositionsProcessFailureIsIgnored > - > > Key: KAFKA-15639 > URL: https://issues.apache.org/jira/browse/KAFKA-15639 > Project: Kafka > Issue Type: Task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Philip Nee >Priority: Minor > Labels: consumer-threading-refactor > > The {{testResetPositionsProcessFailureIsIgnored}} test looks like this: > > {code:java} > @Test > public void testResetPositionsProcessFailureIsIgnored() { > doThrow(new > NullPointerException()).when(offsetsRequestManager).resetPositionsIfNeeded(); > ResetPositionsApplicationEvent event = new > ResetPositionsApplicationEvent(); > applicationEventsQueue.add(event); > assertDoesNotThrow(() -> consumerNetworkThread.runOnce()); > > verify(applicationEventProcessor).process(any(ResetPositionsApplicationEvent.class)); > } > {code} > > [~junrao] asks: > > {quote}Not sure if this is a useful test since > {{offsetsRequestManager.resetPositionsIfNeeded()}} seems to never directly > throw an exception? > {quote} > > I commented out the {{doThrow}} line and it did not impact the test. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15639) Investigate ConsumerNetworkThreadTest's testResetPositionsProcessFailureIsIgnored
[ https://issues.apache.org/jira/browse/KAFKA-15639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15639: -- Summary: Investigate ConsumerNetworkThreadTest's testResetPositionsProcessFailureIsIgnored (was: Investigate ConsumerNetworkThread's testResetPositionsProcessFailureIsIgnored) > Investigate ConsumerNetworkThreadTest's > testResetPositionsProcessFailureIsIgnored > - > > Key: KAFKA-15639 > URL: https://issues.apache.org/jira/browse/KAFKA-15639 > Project: Kafka > Issue Type: Task > Components: clients, consumer >Reporter: Kirk True >Assignee: Philip Nee >Priority: Minor > Labels: consumer-threading-refactor > > The {{testResetPositionsProcessFailureIsIgnored}} test looks like this: > > {code:java} > @Test > public void testResetPositionsProcessFailureIsIgnored() { > doThrow(new > NullPointerException()).when(offsetsRequestManager).resetPositionsIfNeeded(); > ResetPositionsApplicationEvent event = new > ResetPositionsApplicationEvent(); > applicationEventsQueue.add(event); > assertDoesNotThrow(() -> consumerNetworkThread.runOnce()); > > verify(applicationEventProcessor).process(any(ResetPositionsApplicationEvent.class)); > } > {code} > > [~junrao] asks: > > {quote}Not sure if this is a useful test since > {{offsetsRequestManager.resetPositionsIfNeeded()}} seems to never directly > throw an exception? > {quote} > > I commented out the {{doThrow}} line and it did not impact the test. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-14724) Port tests in FetcherTest to FetchRequestManagerTest
[ https://issues.apache.org/jira/browse/KAFKA-14724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-14724: -- Component/s: unit tests > Port tests in FetcherTest to FetchRequestManagerTest > > > Key: KAFKA-14724 > URL: https://issues.apache.org/jira/browse/KAFKA-14724 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor, kip-848-e2e, kip-848-preview > > The {{Fetcher}} class is used internally by the {{KafkaConsumer}} to fetch > records from the brokers. There is ongoing work to create a new consumer > implementation with a significantly refactored threading model. The threading > refactor work requires a similarly refactored {{{}Fetcher{}}}. > This task involves copying the relevant tests from {{FetcherTest}} and > modifying them to fit a new unit test named {{{}FetchRequestManagerTest{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15656) Frequent INVALID_RECORD on Kafka 3.6
Travis Bischel created KAFKA-15656: -- Summary: Frequent INVALID_RECORD on Kafka 3.6 Key: KAFKA-15656 URL: https://issues.apache.org/jira/browse/KAFKA-15656 Project: Kafka Issue Type: Bug Components: producer Affects Versions: 3.6.0 Reporter: Travis Bischel Attachments: invalid_record.log Using this docker-compose.yml: {noformat} version: "3.7" services: kafka: image: bitnami/kafka:latest network_mode: host environment: KAFKA_ENABLE_KRAFT: yes KAFKA_CFG_PROCESS_ROLES: controller,broker KAFKA_CFG_CONTROLLER_LISTENER_NAMES: CONTROLLER KAFKA_CFG_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093 KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT KAFKA_CFG_CONTROLLER_QUORUM_VOTERS: 1@127.0.0.1:9093 # Set this to "PLAINTEXT://127.0.0.1:9092" if you want to run this container on localhost via Docker KAFKA_CFG_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092 KAFKA_CFG_NODE_ID: 1 ALLOW_PLAINTEXT_LISTENER: yes KAFKA_KRAFT_CLUSTER_ID: XkpGZQ27R3eTl3OdTm2LYA # 16 byte base64-encoded UUID{noformat} And running franz-go integration tests with KGO_TEST_RF=1, I consistently receive INVALID_RECORD errors. Looking at the container logs, I see these problematic log lines: {noformat} 2023-10-19 23:33:47,942] ERROR [ReplicaManager broker=1] Error processing append operation on partition 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-0 (kafka.server.ReplicaManager) org.apache.kafka.common.InvalidRecordException: Invalid negative header key size -25 [2023-10-19 23:33:47,942] ERROR [ReplicaManager broker=1] Error processing append operation on partition 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-6 (kafka.server.ReplicaManager) org.apache.kafka.common.InvalidRecordException: Reached end of input stream before skipping all bytes. 
Remaining bytes:94 [2023-10-19 23:33:47,942] ERROR [ReplicaManager broker=1] Error processing append operation on partition 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-1 (kafka.server.ReplicaManager) org.apache.kafka.common.InvalidRecordException: Found invalid number of record headers -26 [2023-10-19 23:33:47,948] ERROR [ReplicaManager broker=1] Error processing append operation on partition 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-6 (kafka.server.ReplicaManager) org.apache.kafka.common.InvalidRecordException: Found invalid number of record headers -27 [2023-10-19 23:33:47,950] ERROR [ReplicaManager broker=1] Error processing append operation on partition 0cf2f3faaafd3f906ea848b684b04833ca162bcd19ecae2cab36767a54f248c7-22 (kafka.server.ReplicaManager) org.apache.kafka.common.InvalidRecordException: Invalid negative header key size -25 [2023-10-19 23:33:47,947] ERROR [ReplicaManager broker=1] Error processing append operation on partition c63b6e30987317fad18815effb8d432b6df677d2ab56cf6da517bb93fa49b74b-25 (kafka.server.ReplicaManager) org.apache.kafka.common.InvalidRecordException: Found invalid number of record headers -50 [2023-10-19 23:33:47,959] ERROR [ReplicaManager broker=1] Error processing append operation on partition c63b6e30987317fad18815effb8d432b6df677d2ab56cf6da517bb93fa49b74b-25 (kafka.server.ReplicaManager) {noformat} I modified franz-go with a diff to print the request that was written to the wire once this error occurs. Attached is a v9 produce request. I deserialized it locally and am not seeing the corrupt data that Kafka is printing. It's possible there is a bug in the client, but again, these tests have never received this error pre-Kafka 3.6. It _looks like_ there is either corruption when processing the incoming data, or there is some problematic race condition in the broker - I'm not sure which. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15617) Investigate FetcherTest's/FetchRequestManager's testFetchingPendingPartitions and testInflightFetchOnPendingPartitions overlap
[ https://issues.apache.org/jira/browse/KAFKA-15617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15617: -- Component/s: unit tests Labels: consumer-threading-refactor kip-848-client-support (was: consumer-threading-refactor kip-848-client-support kip-848-preview) > Investigate FetcherTest's/FetchRequestManager's testFetchingPendingPartitions > and testInflightFetchOnPendingPartitions overlap > -- > > Key: KAFKA-15617 > URL: https://issues.apache.org/jira/browse/KAFKA-15617 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor, kip-848-client-support > > In FetcherTest, the two tests testFetchingPendingPartitions and > testInflightFetchOnPendingPartitions have significant overlap. Perhaps the > former subsumes the latter? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15636) Investigate FetcherTest's/FetchRequestManager's testFetchResponseMetrics
[ https://issues.apache.org/jira/browse/KAFKA-15636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15636: -- Component/s: unit tests Labels: consumer-threading-refactor kip-848-client-support (was: consumer-threading-refactor kip-848-client-support kip-848-preview) > Investigate FetcherTest's/FetchRequestManager's testFetchResponseMetrics > > > Key: KAFKA-15636 > URL: https://issues.apache.org/jira/browse/KAFKA-15636 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor, kip-848-client-support > > {{expectedBytes}} is calculated as total, instead of avg. Is this correct? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15638) Investigate ConsumerNetworkThreadTest's testPollResultTimer
[ https://issues.apache.org/jira/browse/KAFKA-15638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15638: -- Component/s: unit tests Labels: consumer-threading-refactor kip-848-client-support (was: consumer-threading-refactor kip-848-client-support kip-848-preview) > Investigate ConsumerNetworkThreadTest's testPollResultTimer > --- > > Key: KAFKA-15638 > URL: https://issues.apache.org/jira/browse/KAFKA-15638 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Philip Nee >Priority: Minor > Labels: consumer-threading-refactor, kip-848-client-support > > Regarding this comment in {{{}testPollResultTimer{}}}... > {code:java} > // purposely setting a non-MAX time to ensure it is returning Long.MAX_VALUE > upon success| > {code} > [~junrao] asked: > {quote}Which call is returning Long.MAX_VALUE? > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15637) Investigate FetcherTest's/FetchRequestManager's testFetchCompletedBeforeHandlerAdded
[ https://issues.apache.org/jira/browse/KAFKA-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15637: -- Component/s: unit tests Labels: consumer-threading-refactor kip-848-client-support (was: consumer-threading-refactor kip-848-client-support kip-848-preview) > Investigate FetcherTest's/FetchRequestManager's > testFetchCompletedBeforeHandlerAdded > > > Key: KAFKA-15637 > URL: https://issues.apache.org/jira/browse/KAFKA-15637 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor, kip-848-client-support > > Thanks for the reply. I still don't quite understand the test. Why do we > duplicate the following code both inside and outside of {{{}setWakeupHook{}}}? > > {code:java} > networkClientDelegate.disconnectAsync(readReplica); > networkClientDelegate.poll(time.timer(0)); > {code} > > MockClient is only woken up through > {{{}networkClientDelegate.disconnectAsync{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15634) Investigate FetcherTest's/FetchRequestManager's testQuotaMetrics
[ https://issues.apache.org/jira/browse/KAFKA-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15634: -- Component/s: unit tests Labels: consumer-threading-refactor kip-848-client-support (was: consumer-threading-refactor kip-848-client-support kip-848-preview) > Investigate FetcherTest's/FetchRequestManager's testQuotaMetrics > > > Key: KAFKA-15634 > URL: https://issues.apache.org/jira/browse/KAFKA-15634 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor, kip-848-client-support > > What is the point of the code in the initial {{while}} loop since the receive > is delayed and thus there is no {{throttleDelayMs}} received in the client? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15557) Investigate FetcherTest's/FetchRequestManager's duplicate metadata update in assignFromUserNoId
[ https://issues.apache.org/jira/browse/KAFKA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15557: -- Component/s: unit tests Labels: consumer-threading-refactor kip-848-client-support (was: consumer-threading-refactor kip-848-client-support kip-848-preview) > Investigate FetcherTest's/FetchRequestManager's duplicate metadata update in > assignFromUserNoId > --- > > Key: KAFKA-15557 > URL: https://issues.apache.org/jira/browse/KAFKA-15557 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor, kip-848-client-support > > The unit tests {{FetcherTest}} and {{FetchRequestManagerTest}} have methods > named {{assignFromUser()}} and {{assignFromUserNoId()}} that appear to > perform duplicate metadata updates: > {code:java} > private void assignFromUser(Set partitions) { > subscriptions.assignFromUser(partitions); > client.updateMetadata(initialUpdateResponse); > // A dummy metadata update to ensure valid leader epoch. > metadata.updateWithCurrentRequestVersion( > RequestTestUtils.metadataUpdateWithIds( > "dummy", > 1, > Collections.emptyMap(), > singletonMap(topicName, 4), > tp -> validLeaderEpoch, topicIds > ), > false, > 0L > ); > } > {code} > {{client.updateMetadata()}} eventually calls > {{metadata.updateWithCurrentRequestVersion()}}. Determine why the test is > updating the cluster metadata twice with different values. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15606) Investigate FetcherTest's/FetchRequestManager's testCompletedFetchRemoval
[ https://issues.apache.org/jira/browse/KAFKA-15606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15606: -- Labels: consumer-threading-refactor kip-848-client-support (was: consumer-threading-refactor kip-848-client-support kip-848-preview) > Investigate FetcherTest's/FetchRequestManager's testCompletedFetchRemoval > - > > Key: KAFKA-15606 > URL: https://issues.apache.org/jira/browse/KAFKA-15606 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor, kip-848-client-support > > As part of the review for [FetchRequestManager pull > request|https://github.com/apache/kafka/pull/14406], [~junrao] had some > questions related to the correctness and clarity of the > {{FetcherTest.testCompletedFetchRemoval()}} test: > Questions: > * https://github.com/apache/kafka/pull/14406#discussion_r1347908197 > * https://github.com/apache/kafka/pull/14406#discussion_r1347910980 > * https://github.com/apache/kafka/pull/14406#discussion_r1347913781 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15652) Add unit/integration tests to verify OffsetOutOfRangeException is thrown for OffsetFetcherUtils.getOffsetResetTimestamp()
[ https://issues.apache.org/jira/browse/KAFKA-15652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15652: -- Component/s: unit tests Labels: consumer-threading-refactor kip-848-client-support (was: consumer-threading-refactor kip-848-client-support kip-848-preview) > Add unit/integration tests to verify OffsetOutOfRangeException is thrown for > OffsetFetcherUtils.getOffsetResetTimestamp() > - > > Key: KAFKA-15652 > URL: https://issues.apache.org/jira/browse/KAFKA-15652 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor, kip-848-client-support > > In the {{updateFetchPositions()}} method implementation, both > {{KafkaConsumer}} and {{PrototypeAsyncConsumer}} reset positions > asynchronously. [~junrao] stated the following in a [recent PR > review|https://github.com/apache/kafka/pull/14406#discussion_r1349173413]: > {quote}There is a subtle difference between transitioning to reset from > initializing and transitioning to reset from {{OffsetOutOfRangeException}} > during fetch. In the latter, the application thread will call > {{{}FetchCollector.handleInitializeErrors(){}}}. If there is no default > offset reset policy, an {{OffsetOutOfRangeException}} will be thrown to the > application thread during {{{}poll{}}}, which is what we want. > However, for the former, if there is no default offset reset policy, we > simply ignore that partition through > {{{}OffsetFetcherUtils.getOffsetResetTimestamp{}}}. It seems in that case, > the partition will be forever in the reset state and the application thread > won't get the {{{}OffsetOutOfRangeException{}}}. > {quote} > I intentionally changed the code so that no exceptions were thrown in > {{OffsetFetcherUtils.getOffsetResetTimestamp()}} and would simply return an > empty map. 
When I ran the unit tests and integration tests, there were no > failures, strongly suggesting that there is no coverage of this particular > edge case. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15635) Investigate FetcherTest's/FetchRequestManager's testFetcherLeadMetric
[ https://issues.apache.org/jira/browse/KAFKA-15635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15635: -- Component/s: unit tests Labels: consumer-threading-refactor kip-848-client-support (was: consumer-threading-refactor kip-848-client-support kip-848-preview) > Investigate FetcherTest's/FetchRequestManager's testFetcherLeadMetric > - > > Key: KAFKA-15635 > URL: https://issues.apache.org/jira/browse/KAFKA-15635 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor, kip-848-client-support > > Why is {{recordsFetchLeadMin}} different from {{partitionLead}} given there > is only 1 assigned partition? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15638) Investigate ConsumerNetworkThreadTest's testPollResultTimer
[ https://issues.apache.org/jira/browse/KAFKA-15638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15638: -- Summary: Investigate ConsumerNetworkThreadTest's testPollResultTimer (was: Investigate ConsumerNetworkThread's testPollResultTimer) > Investigate ConsumerNetworkThreadTest's testPollResultTimer > --- > > Key: KAFKA-15638 > URL: https://issues.apache.org/jira/browse/KAFKA-15638 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Assignee: Philip Nee >Priority: Minor > Labels: consumer-threading-refactor, kip-848-client-support, > kip-848-preview > > Regarding this comment in {{{}testPollResultTimer{}}}... > {code:java} > // purposely setting a non-MAX time to ensure it is returning Long.MAX_VALUE > upon success > {code} > [~junrao] asked: > {quote}Which call is returning Long.MAX_VALUE? > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15606) Investigate FetcherTest's/FetchRequestManager's testCompletedFetchRemoval
[ https://issues.apache.org/jira/browse/KAFKA-15606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15606: -- Component/s: unit tests > Investigate FetcherTest's/FetchRequestManager's testCompletedFetchRemoval > - > > Key: KAFKA-15606 > URL: https://issues.apache.org/jira/browse/KAFKA-15606 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer, unit tests >Reporter: Kirk True >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor, kip-848-client-support, > kip-848-preview > > As part of the review for [FetchRequestManager pull > request|https://github.com/apache/kafka/pull/14406], [~junrao] had some > questions related to the correctness and clarity of the > {{FetcherTest.testCompletedFetchRemoval()}} test: > Questions: > * https://github.com/apache/kafka/pull/14406#discussion_r1347908197 > * https://github.com/apache/kafka/pull/14406#discussion_r1347910980 > * https://github.com/apache/kafka/pull/14406#discussion_r1347913781 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15277) Design & implement support for internal Consumer delegates
[ https://issues.apache.org/jira/browse/KAFKA-15277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15277: -- Labels: consumer-threading-refactor kip-848 kip-848-e2e kip-848-preview (was: consumer-threading-refactor kip-848 kip-848-client-support kip-848-e2e kip-848-preview) > Design & implement support for internal Consumer delegates > -- > > Key: KAFKA-15277 > URL: https://issues.apache.org/jira/browse/KAFKA-15277 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Assignee: Philip Nee >Priority: Blocker > Labels: consumer-threading-refactor, kip-848, kip-848-e2e, > kip-848-preview > > As mentioned above, there are presently two different, coexisting > implementations of the {{Consumer}} interface: {{KafkaConsumer}} ("old") and > {{PrototypeAsyncConsumer}} ("new"). Eventually, these will be reorganized > using the delegation pattern. The top-level {{KafkaConsumer}} that implements > the old protocol will be renamed as {{LegacyKafkaConsumerDelegate}} and > {{PrototypeAsyncConsumer}} will be renamed as > {{AsyncKafkaConsumerDelegate}}. It is assumed that neither > {{AsyncKafkaConsumerDelegate}} nor > {{LegacyKafkaConsumerDelegate}} will be top-level implementations > of {{Consumer}}, but will likely implement an internal interface that is > better suited to the needs of the top-level {{KafkaConsumer}}. 
> Provide the Java client support for the consumer delegates, including: > * Create {{ConsumerDelegate}} interface > * Clone {{{}KafkaConsumer{}}}, rename as {{LegacyKafkaConsumerDelegate}} and > refactor to implement {{ConsumerDelegate}} > * Rename {{PrototypeAsyncConsumer}} to {{AsyncKafkaConsumerDelegate}} and > refactor to implement the {{ConsumerDelegate}} interface > * Refactor the (original) {{KafkaConsumer}} to remove the core > implementation, instead delegating to the {{{}ConsumerDelegate{}}}, which > will be hard-coded to use {{LegacyKafkaConsumerDelegate}} > * Once available (in KAFKA-15284), use the > {{ConsumerGroupProtocolVersionResolver}} to determine which delegate to use > This task is part of the work to implement support for the new KIP-848 > consumer group protocol. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15556) Remove NetworkClientDelegate methods isUnavailable, maybeThrowAuthFailure, and tryConnect
[ https://issues.apache.org/jira/browse/KAFKA-15556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15556: -- Component/s: clients consumer > Remove NetworkClientDelegate methods isUnavailable, maybeThrowAuthFailure, > and tryConnect > - > > Key: KAFKA-15556 > URL: https://issues.apache.org/jira/browse/KAFKA-15556 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Priority: Major > Labels: consumer-threading-refactor, kip-848-client-support, > kip-848-preview > > The "new consumer" (i.e. {{{}PrototypeAsyncConsumer{}}}) was designed to > handle networking details in a more centralized way. However, in order to > reuse code between the existing {{KafkaConsumer}} and the new > {{{}PrototypeAsyncConsumer{}}}, that design goal was "relaxed" when the > {{NetworkClientDelegate}} capitulated and -stole- copied three methods from > {{ConsumerNetworkClient}} related to detecting node status: > # {{isUnavailable}} > # {{maybeThrowAuthFailure}} > # {{tryConnect}} > Unfortunately, these have found their way into the {{FetchRequestManager}} > and {{OffsetsRequestManager}} implementations. We should review if we can > clean up—or even remove—this leaky abstraction. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15558) Determine if Timer should be used elsewhere in PrototypeAsyncConsumer.updateFetchPositions()
[ https://issues.apache.org/jira/browse/KAFKA-15558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15558: -- Component/s: clients consumer > Determine if Timer should be used elsewhere in > PrototypeAsyncConsumer.updateFetchPositions() > > > Key: KAFKA-15558 > URL: https://issues.apache.org/jira/browse/KAFKA-15558 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Priority: Major > Labels: consumer-threading-refactor, kip-848-client-support, > kip-848-preview > > This is a followup ticket based on a question from [~junrao] when reviewing > the [fetch request manager pull > request|https://github.com/apache/kafka/pull/14406]: > {quote}It still seems weird that we only use the timer for > {{{}refreshCommittedOffsetsIfNeeded{}}}, but not for other cases where we > don't have valid fetch positions. For example, if all partitions are in > {{AWAIT_VALIDATION}} state, it seems that {{PrototypeAsyncConsumer.poll()}} > will just go in a busy loop, which is not efficient. > {quote} > The goal here is to determine if we should also be propagating the Timer to > the validate positions and reset positions operations. > Note: we should also investigate if the existing {{KafkaConsumer}} > implementation should be fixed, too. -- This message was sent by Atlassian Jira (v8.20.10#820010)
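The busy-loop concern quoted in KAFKA-15558 can be illustrated with a minimal, self-contained deadline sketch. The names below (`SimpleTimer`, `notExpired`, `remainingMs`) are hypothetical stand-ins for illustration only, not Kafka's internal `Timer` API: the point is that propagating one shared deadline lets the validate-positions and reset-positions paths give up when time runs out instead of spinning.

```java
// Hypothetical sketch (not Kafka's internal Timer API) of the
// deadline-propagation idea: operations that wait for valid fetch
// positions consult one shared deadline instead of looping forever.
public class SimpleTimer {
    private final long deadlineMs;

    public SimpleTimer(long nowMs, long timeoutMs) {
        this.deadlineMs = nowMs + timeoutMs;
    }

    /** True while the deadline has not yet passed. */
    public boolean notExpired(long nowMs) {
        return nowMs < deadlineMs;
    }

    /** Time left to hand to a blocking call; never negative. */
    public long remainingMs(long nowMs) {
        return Math.max(0, deadlineMs - nowMs);
    }
}
```

Under this sketch, both the validate and reset operations would receive the same timer instance, so a blocking poll bounded by `remainingMs()` returns once the overall budget is exhausted.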
[jira] [Updated] (KAFKA-15615) Improve handling of fetching during metadata updates
[ https://issues.apache.org/jira/browse/KAFKA-15615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15615: -- Component/s: clients consumer > Improve handling of fetching during metadata updates > > > Key: KAFKA-15615 > URL: https://issues.apache.org/jira/browse/KAFKA-15615 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor, kip-848-client-support, > kip-848-preview > > [During a review of the new > fetcher|https://github.com/apache/kafka/pull/14406#discussion_r193941], > [~junrao] found what appears to be an opportunity for optimization. > When a fetch response receives an error about partition leadership, fencing, > etc. a metadata refresh is triggered. However, it takes time for that refresh > to occur, and in the interim, it appears that the consumer will blindly > attempt to fetch data for the partition again, in kind of a "definition of > insanity" type of way. Ideally, the consumer would have a way to temporarily > ignore those partitions, in a way somewhat like the "pausing" approach so > that they are skipped until the metadata refresh response is fully processed. > This affects both the existing KafkaConsumer and the new > PrototypeAsyncConsumer. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15551) Evaluate conditions for short circuiting consumer API calls
[ https://issues.apache.org/jira/browse/KAFKA-15551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15551: -- Component/s: clients > Evaluate conditions for short circuiting consumer API calls > --- > > Key: KAFKA-15551 > URL: https://issues.apache.org/jira/browse/KAFKA-15551 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Philip Nee >Assignee: Philip Nee >Priority: Major > Labels: consumer-threading-refactor > > Conditions like: > * Committing empty offsets > * Fetching offsets for empty partitions > * Getting empty topic partition positions > should be short-circuited, possibly at the API level. > As a bonus, we should double-check whether the existing {{KafkaConsumer}} > implementation suffers from this. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15553) Review committed offset refresh logic
[ https://issues.apache.org/jira/browse/KAFKA-15553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15553: -- Component/s: clients > Review committed offset refresh logic > - > > Key: KAFKA-15553 > URL: https://issues.apache.org/jira/browse/KAFKA-15553 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Philip Nee >Assignee: Philip Nee >Priority: Major > Labels: consumer-threading-refactor > > From the existing comment: If there are any partitions which do not have a > valid position and are not awaiting reset, then we need to fetch committed > offsets. > > In the async consumer: I wonder if it would make sense to refresh the > position on the event loop continuously. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15250) DefaultBackgroundThread is running tight loop
[ https://issues.apache.org/jira/browse/KAFKA-15250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15250: -- Component/s: clients > DefaultBackgroundThread is running tight loop > - > > Key: KAFKA-15250 > URL: https://issues.apache.org/jira/browse/KAFKA-15250 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Philip Nee >Assignee: Philip Nee >Priority: Major > Labels: consumer-threading-refactor > > The DefaultBackgroundThread is running tight loops and wasting CPU cycles. I > think we need to reexamine the timeout passed to networkClientDelegate.poll. -- This message was sent by Atlassian Jira (v8.20.10#820010)
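One common fix for this kind of tight loop, sketched below under assumed names (`timeToNextUpdate` is illustrative, not the actual DefaultBackgroundThread code), is to compute the poll timeout as the minimum of the delays reported by the request managers, so the background thread blocks inside the network poll rather than spinning with a zero timeout.

```java
// Hypothetical sketch: derive a blocking-poll timeout from the delays the
// request managers report, clamped to [0, maxTimeoutMs]. A zero result
// means there is work to do now; a positive result lets the thread sleep.
public class PollTimeoutCalculator {
    public static long timeToNextUpdate(long maxTimeoutMs, long... managerDelaysMs) {
        long timeout = maxTimeoutMs;
        for (long delay : managerDelaysMs) {
            // Negative delays mean "overdue", i.e. poll immediately.
            timeout = Math.min(timeout, Math.max(0, delay));
        }
        return timeout;
    }
}
```

The computed value would then be passed to the network poll (e.g. `networkClientDelegate.poll(timeout, nowMs)` in spirit), so CPU is burned only when a manager actually has pending work.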
[jira] [Updated] (KAFKA-15173) Consumer event queues should be bounded
[ https://issues.apache.org/jira/browse/KAFKA-15173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15173: -- Component/s: clients > Consumer event queues should be bounded > --- > > Key: KAFKA-15173 > URL: https://issues.apache.org/jira/browse/KAFKA-15173 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Philip Nee >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor > > The async consumer uses ApplicationEventQueue and BackgroundEventQueue to > facilitate message passing between the application thread and the background > thread. The current implementation is boundless, which can potentially cause > OOM and other performance-related issues. > I think the queues need a finite bound, and we need to decide how to handle > the situation when the bound is reached. In particular, I would like to > answer these questions: > > # What should the upper limit be for both queues: Can this be a > configurable, memory-based bound? Or just an arbitrary number of events as > the bound. > # What should happen when the application event queue is filled up? It > seems like we should introduce a new exception type and notify the user that > the consumer is full. > # What should happen when the background event queue is filled up? This > seems less likely to happen, but I imagine it could happen when the user > stops polling the consumer, causing the queue to be filled. > # Is it necessary to introduce a public configuration for the queue? I think > initially we would select an arbitrary constant number and see the community > feedback to make a forward plan accordingly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
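Question 2 in KAFKA-15173 (what happens when the application event queue fills up) can be prototyped with a capacity-bounded queue that rejects enqueues once full instead of growing without bound. This is an illustrative sketch, not the actual ApplicationEventQueue implementation; the class and method names are assumptions.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a bounded event queue. When the (arbitrary,
// possibly configurable) capacity is reached, tryEnqueue returns false;
// the caller could instead throw a dedicated "queue full" exception.
public class BoundedEventQueue {
    private final BlockingQueue<String> queue;

    public BoundedEventQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Non-blocking enqueue; false signals the bound has been reached. */
    public boolean tryEnqueue(String event) {
        return queue.offer(event);
    }

    /** Non-blocking dequeue; null when empty. */
    public String poll() {
        return queue.poll();
    }
}
```

Whether rejection should surface as a return value, an exception to the user, or back-pressure on `poll()` is exactly the design question the ticket raises.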
[jira] [Updated] (KAFKA-15320) Document event queueing patterns
[ https://issues.apache.org/jira/browse/KAFKA-15320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15320: -- Component/s: clients consumer > Document event queueing patterns > > > Key: KAFKA-15320 > URL: https://issues.apache.org/jira/browse/KAFKA-15320 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor > > We need to first document the event enqueuing patterns in the > PrototypeAsyncConsumer. As part of this task, determine if it’s > necessary/beneficial to _conditionally_ add events and/or coalesce any > duplicate events in the queue. > _Don’t forget to include diagrams for clarity!_ > This should be documented on the AK wiki. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15606) Investigate FetcherTest's/FetchRequestManager's testCompletedFetchRemoval
[ https://issues.apache.org/jira/browse/KAFKA-15606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15606: -- Labels: consumer-threading-refactor kip-848-client-support kip-848-preview (was: consumer-threading-refactor) > Investigate FetcherTest's/FetchRequestManager's testCompletedFetchRemoval > - > > Key: KAFKA-15606 > URL: https://issues.apache.org/jira/browse/KAFKA-15606 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor, kip-848-client-support, > kip-848-preview > > As part of the review for [FetchRequestManager pull > request|https://github.com/apache/kafka/pull/14406], [~junrao] had some > questions related to the correctness and clarity of the > {{FetcherTest.testCompletedFetchRemoval()}} test: > Questions: > * https://github.com/apache/kafka/pull/14406#discussion_r1347908197 > * https://github.com/apache/kafka/pull/14406#discussion_r1347910980 > * https://github.com/apache/kafka/pull/14406#discussion_r1347913781 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15634) Investigate FetcherTest's/FetchRequestManager's testQuotaMetrics
[ https://issues.apache.org/jira/browse/KAFKA-15634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15634: -- Labels: consumer-threading-refactor kip-848-client-support kip-848-preview (was: consumer-threading-refactor) > Investigate FetcherTest's/FetchRequestManager's testQuotaMetrics > > > Key: KAFKA-15634 > URL: https://issues.apache.org/jira/browse/KAFKA-15634 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor, kip-848-client-support, > kip-848-preview > > What is the point of the code in the initial {{while}} loop since the receive > is delayed and thus there is no {{throttleDelayMs}} received in the client? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15557) Investigate FetcherTest's/FetchRequestManager's duplicate metadata update in assignFromUserNoId
[ https://issues.apache.org/jira/browse/KAFKA-15557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15557: -- Labels: consumer-threading-refactor kip-848-client-support kip-848-preview (was: consumer-threading-refactor) > Investigate FetcherTest's/FetchRequestManager's duplicate metadata update in > assignFromUserNoId > --- > > Key: KAFKA-15557 > URL: https://issues.apache.org/jira/browse/KAFKA-15557 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor, kip-848-client-support, > kip-848-preview > > The unit tests {{FetcherTest}} and {{FetchRequestManagerTest}} have methods > named {{assignFromUser()}} and {{assignFromUserNoId()}} that appear to > perform duplicate metadata updates: > {code:java} > private void assignFromUser(Set<TopicPartition> partitions) { > subscriptions.assignFromUser(partitions); > client.updateMetadata(initialUpdateResponse); > // A dummy metadata update to ensure valid leader epoch. > metadata.updateWithCurrentRequestVersion( > RequestTestUtils.metadataUpdateWithIds( > "dummy", > 1, > Collections.emptyMap(), > singletonMap(topicName, 4), > tp -> validLeaderEpoch, topicIds > ), > false, > 0L > ); > } > {code} > {{client.updateMetadata()}} eventually calls > {{metadata.updateWithCurrentRequestVersion()}}. Determine why the test is > updating the cluster metadata twice with different values. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15651) Investigate auto commit guarantees during Consumer.assign()
[ https://issues.apache.org/jira/browse/KAFKA-15651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15651: -- Labels: consumer-threading-refactor kip-848-client-support kip-848-preview (was: consumer-threading-refactor) > Investigate auto commit guarantees during Consumer.assign() > --- > > Key: KAFKA-15651 > URL: https://issues.apache.org/jira/browse/KAFKA-15651 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor, kip-848-client-support, > kip-848-preview > > In the {{assign()}} method implementation, both {{KafkaConsumer}} and > {{PrototypeAsyncConsumer}} commit offsets asynchronously. Is this > intentional? [~junrao] asks in a [recent PR > review|https://github.com/apache/kafka/pull/14406/files/193af8230d0c61853d764cbbe29bca2fc6361af9#r1349023459]: > {quote}Do we guarantee that the new owner of the unsubscribed partitions > could pick up the latest committed offset? > {quote} > Let's confirm whether the asynchronous approach is acceptable and correct. If > it is, great, let's enhance the documentation to briefly explain why. If it > is not, let's correct the behavior if it's within the API semantic > expectations. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15638) Investigate ConsumerNetworkThread's testPollResultTimer
[ https://issues.apache.org/jira/browse/KAFKA-15638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15638: -- Labels: consumer-threading-refactor kip-848-client-support kip-848-preview (was: consumer-threading-refactor) > Investigate ConsumerNetworkThread's testPollResultTimer > --- > > Key: KAFKA-15638 > URL: https://issues.apache.org/jira/browse/KAFKA-15638 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Assignee: Philip Nee >Priority: Minor > Labels: consumer-threading-refactor, kip-848-client-support, > kip-848-preview > > Regarding this comment in {{{}testPollResultTimer{}}}... > {code:java} > // purposely setting a non-MAX time to ensure it is returning Long.MAX_VALUE > upon success > {code} > [~junrao] asked: > {quote}Which call is returning Long.MAX_VALUE? > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15617) Investigate FetcherTest's/FetchRequestManager's testFetchingPendingPartitions and testInflightFetchOnPendingPartitions overlap
[ https://issues.apache.org/jira/browse/KAFKA-15617?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15617: -- Labels: consumer-threading-refactor kip-848-client-support kip-848-preview (was: consumer-threading-refactor) > Investigate FetcherTest's/FetchRequestManager's testFetchingPendingPartitions > and testInflightFetchOnPendingPartitions overlap > -- > > Key: KAFKA-15617 > URL: https://issues.apache.org/jira/browse/KAFKA-15617 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor, kip-848-client-support, > kip-848-preview > > In FetcherTest, the two tests testFetchingPendingPartitions and > testInflightFetchOnPendingPartitions have significant overlap. Perhaps the > former subsumes the latter? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15615) Improve handling of fetching during metadata updates
[ https://issues.apache.org/jira/browse/KAFKA-15615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15615: -- Labels: consumer-threading-refactor kip-848-client-support kip-848-preview (was: consumer-threading-refactor) > Improve handling of fetching during metadata updates > > > Key: KAFKA-15615 > URL: https://issues.apache.org/jira/browse/KAFKA-15615 > Project: Kafka > Issue Type: Sub-task >Reporter: Kirk True >Assignee: Kirk True >Priority: Major > Labels: consumer-threading-refactor, kip-848-client-support, > kip-848-preview > > [During a review of the new > fetcher|https://github.com/apache/kafka/pull/14406#discussion_r193941], > [~junrao] found what appears to be an opportunity for optimization. > When a fetch response receives an error about partition leadership, fencing, > etc. a metadata refresh is triggered. However, it takes time for that refresh > to occur, and in the interim, it appears that the consumer will blindly > attempt to fetch data for the partition again, in kind of a "definition of > insanity" type of way. Ideally, the consumer would have a way to temporarily > ignore those partitions, in a way somewhat like the "pausing" approach so > that they are skipped until the metadata refresh response is fully processed. > This affects both the existing KafkaConsumer and the new > PrototypeAsyncConsumer. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15636) Investigate FetcherTest's/FetchRequestManager's testFetchResponseMetrics
[ https://issues.apache.org/jira/browse/KAFKA-15636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15636: -- Labels: consumer-threading-refactor kip-848-client-support kip-848-preview (was: consumer-threading-refactor) > Investigate FetcherTest's/FetchRequestManager's testFetchResponseMetrics > > > Key: KAFKA-15636 > URL: https://issues.apache.org/jira/browse/KAFKA-15636 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor, kip-848-client-support, > kip-848-preview > > {{expectedBytes}} is calculated as total, instead of avg. Is this correct? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15558) Determine if Timer should be used elsewhere in PrototypeAsyncConsumer.updateFetchPositions()
[ https://issues.apache.org/jira/browse/KAFKA-15558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15558: -- Labels: consumer-threading-refactor kip-848-client-support kip-848-preview (was: consumer-threading-refactor) > Determine if Timer should be used elsewhere in > PrototypeAsyncConsumer.updateFetchPositions() > > > Key: KAFKA-15558 > URL: https://issues.apache.org/jira/browse/KAFKA-15558 > Project: Kafka > Issue Type: Sub-task >Reporter: Kirk True >Priority: Major > Labels: consumer-threading-refactor, kip-848-client-support, > kip-848-preview > > This is a followup ticket based on a question from [~junrao] when reviewing > the [fetch request manager pull > request|https://github.com/apache/kafka/pull/14406]: > {quote}It still seems weird that we only use the timer for > {{{}refreshCommittedOffsetsIfNeeded{}}}, but not for other cases where we > don't have valid fetch positions. For example, if all partitions are in > {{AWAIT_VALIDATION}} state, it seems that {{PrototypeAsyncConsumer.poll()}} > will just go in a busy loop, which is not efficient. > {quote} > The goal here is to determine if we should also be propagating the Timer to > the validate positions and reset positions operations. > Note: we should also investigate if the existing {{KafkaConsumer}} > implementation should be fixed, too. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15637) Investigate FetcherTest's/FetchRequestManager's testFetchCompletedBeforeHandlerAdded
[ https://issues.apache.org/jira/browse/KAFKA-15637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kirk True updated KAFKA-15637: -- Labels: consumer-threading-refactor kip-848-client-support kip-848-preview (was: consumer-threading-refactor) > Investigate FetcherTest's/FetchRequestManager's > testFetchCompletedBeforeHandlerAdded > > > Key: KAFKA-15637 > URL: https://issues.apache.org/jira/browse/KAFKA-15637 > Project: Kafka > Issue Type: Sub-task > Components: clients, consumer >Reporter: Kirk True >Assignee: Kirk True >Priority: Minor > Labels: consumer-threading-refactor, kip-848-client-support, > kip-848-preview > > Thanks for the reply. I still don't quite understand the test. Why do we > duplicate the following code both inside and outside of {{{}setWakeupHook{}}}? > > {code:java} > networkClientDelegate.disconnectAsync(readReplica); > networkClientDelegate.poll(time.timer(0)); > {code} > > MockClient is only woken up through > {{{}networkClientDelegate.disconnectAsync{}}}. -- This message was sent by Atlassian Jira (v8.20.10#820010)