Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #396

2021-08-05 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 489437 lines...]
[2021-08-06T03:14:43.755Z] 
[2021-08-06T03:14:43.755Z] ApiVersionsRequestTest > 
testApiVersionsRequestWithUnsupportedVersion() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequestWithUnsupportedVersion()[1]
 PASSED
[2021-08-06T03:14:43.755Z] 
[2021-08-06T03:14:43.755Z] ApiVersionsRequestTest > 
testApiVersionsRequestValidationV0() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequestValidationV0()[1] 
STARTED
[2021-08-06T03:14:45.848Z] 
[2021-08-06T03:14:45.848Z] ApiVersionsRequestTest > 
testApiVersionsRequestValidationV0() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequestValidationV0()[1] 
PASSED
[2021-08-06T03:14:45.848Z] 
[2021-08-06T03:14:45.848Z] ApiVersionsRequestTest > 
testApiVersionsRequestValidationV3() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequestValidationV3()[1] 
STARTED
[2021-08-06T03:14:48.495Z] 
[2021-08-06T03:14:48.495Z] ApiVersionsRequestTest > 
testApiVersionsRequestValidationV3() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequestValidationV3()[1] 
PASSED
[2021-08-06T03:14:48.495Z] 
[2021-08-06T03:14:48.495Z] ApiVersionsRequestTest > 
testApiVersionsRequestThroughControlPlaneListener() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequestThroughControlPlaneListener()[1]
 STARTED
[2021-08-06T03:14:50.357Z] 
[2021-08-06T03:14:50.357Z] ApiVersionsRequestTest > 
testApiVersionsRequestThroughControlPlaneListener() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequestThroughControlPlaneListener()[1]
 PASSED
[2021-08-06T03:14:50.357Z] 
[2021-08-06T03:14:50.357Z] ApiVersionsRequestTest > testApiVersionsRequest() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequest()[1] STARTED
[2021-08-06T03:14:53.190Z] 
[2021-08-06T03:14:53.190Z] ApiVersionsRequestTest > testApiVersionsRequest() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequest()[1] PASSED
[2021-08-06T03:14:53.190Z] 
[2021-08-06T03:14:53.190Z] ApiVersionsRequestTest > 
testApiVersionsRequestValidationV0ThroughControlPlaneListener() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequestValidationV0ThroughControlPlaneListener()[1]
 STARTED
[2021-08-06T03:14:55.732Z] 
[2021-08-06T03:14:55.732Z] ApiVersionsRequestTest > 
testApiVersionsRequestValidationV0ThroughControlPlaneListener() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequestValidationV0ThroughControlPlaneListener()[1]
 PASSED
[2021-08-06T03:14:55.732Z] 
[2021-08-06T03:14:55.732Z] LogDirFailureTest > testIOExceptionDuringLogRoll() 
STARTED
[2021-08-06T03:15:05.093Z] 
[2021-08-06T03:15:05.093Z] LogDirFailureTest > testIOExceptionDuringLogRoll() 
PASSED
[2021-08-06T03:15:05.093Z] 
[2021-08-06T03:15:05.093Z] LogDirFailureTest > 
testIOExceptionDuringCheckpoint() STARTED
[2021-08-06T03:15:10.509Z] 
[2021-08-06T03:15:10.509Z] LogDirFailureTest > 
testIOExceptionDuringCheckpoint() PASSED
[2021-08-06T03:15:10.509Z] 
[2021-08-06T03:15:10.509Z] LogDirFailureTest > 
testProduceErrorFromFailureOnCheckpoint() STARTED
[2021-08-06T03:15:15.036Z] 
[2021-08-06T03:15:15.036Z] LogDirFailureTest > 
testProduceErrorFromFailureOnCheckpoint() PASSED
[2021-08-06T03:15:15.036Z] 
[2021-08-06T03:15:15.036Z] LogDirFailureTest > 
brokerWithOldInterBrokerProtocolShouldHaltOnLogDirFailure() STARTED
[2021-08-06T03:15:23.416Z] 
[2021-08-06T03:15:23.416Z] LogDirFailureTest > 
brokerWithOldInterBrokerProtocolShouldHaltOnLogDirFailure() PASSED
[2021-08-06T03:15:23.416Z] 
[2021-08-06T03:15:23.416Z] LogDirFailureTest > 
testReplicaFetcherThreadAfterLogDirFailureOnFollower() STARTED
[2021-08-06T03:15:29.747Z] 
[2021-08-06T03:15:29.747Z] LogDirFailureTest > 
testReplicaFetcherThreadAfterLogDirFailureOnFollower() PASSED
[2021-08-06T03:15:29.747Z] 
[2021-08-06T03:15:29.747Z] LogDirFailureTest > 
testProduceErrorFromFailureOnLogRoll() STARTED
[2021-08-06T03:15:34.013Z] 
[2021-08-06T03:15:34.013Z] LogDirFailureTest > 
testProduceErrorFromFailureOnLogRoll() PASSED
[2021-08-06T03:15:34.013Z] 
[2021-08-06T03:15:34.013Z] LogOffsetTest > 
testFetchOffsetsBeforeWithChangingSegmentSize() STARTED
[2021-08-06T03:15:37.422Z] 
[2021-08-06T03:15:37.422Z] LogOffsetTest > 
testFetchOffsetsBeforeWithChangingSegmentSize() PASSED
[2021-08-06T03:15:37.422Z] 
[2021-08-06T03:15:37.422Z] LogOffsetTest > testGetOffsetsBeforeEarliestTime() 
STARTED
[2021-08-06T03:15:41.740Z] 
[2021-08-06T03:15:41.740Z] LogOffsetTest > testGetOffsetsBeforeEarliestTime() 
PASSED
[2021-08-06T03:15:41.740Z] 
[2021-08-06T03:15:41.740Z] LogOffsetTest > 
testFetchOffsetByTimestampForMaxTimestampAfterTruncate() STARTED
[2021-08-06T03:15:45.691Z] 
[2021-08-06T03:15:45.691Z] LogOffsetTest > 
testFetchOffsetByTimestampForMaxTimestampAfterTruncate() PASSED
[2021-08-06T03:15:45.691Z] 
[2021-08-06T03:15:45.691Z] LogOffsetTest > 
testFetchOffsetByTimestampForMaxTimestampWithUnorderedTimestamps() STARTED

[jira] [Resolved] (KAFKA-13132) Upgrading to topic IDs in LISR requests has gaps introduced in 3.0

2021-08-05 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13132.
-
Resolution: Fixed

> Upgrading to topic IDs in LISR requests has gaps introduced in 3.0
> --
>
> Key: KAFKA-13132
> URL: https://issues.apache.org/jira/browse/KAFKA-13132
> Project: Kafka
>  Issue Type: Bug
>Reporter: Justine Olshan
>Assignee: Justine Olshan
>Priority: Blocker
> Fix For: 3.0.0
>
>
> With the change in 3.0 to how topic IDs are assigned to logs, a bug was 
> inadvertently introduced. Now, topic IDs will only be assigned on the load of 
> the log to a partition in LISR requests. This means we will only assign topic 
> IDs for newly created topics/partitions, on broker startup, or potentially 
> when a partition is reassigned.
>  
> In the case of upgrading from an IBP before 2.8, we may have a scenario where 
> we upgrade the controller to IBP 3.0 (or even 2.8) last. (Ie, the controller 
> is IBP < 2.8 and all other brokers are on the newest IBP) Upon the last 
> broker upgrading, we will elect a new controller but its LISR request will 
> not result in topic IDs being assigned to logs of existing topics. They will 
> only be assigned in the cases mentioned above.
> *Keep in mind, in this scenario, topic IDs will still be assigned in the 
> controller/ZK to all new and pre-existing topics and will show up in 
> metadata.*  This means we are not ensured the same guarantees we had in 2.8. 
> *It is just the LISR/partition.metadata part of the code that is affected.* 
>  
> The problem is two-fold
>  1. We ignore LISR requests when the partition leader epoch has not increased 
> (previously we assigned the ID before this check)
>  2. We only assign the topic ID when we are associating the log with the 
> partition in replicamanager for the first time. Though in the scenario 
> described above, we have logs associated with partitions that need to be 
> upgraded.
>  
> We should check if the LISR request is resulting in a topic ID addition 
> and add logic to handle logs already associated with partitions in the replica manager.
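
For illustration only, the kind of check described above might look roughly
like the following sketch (the class and method names here are hypothetical
simplifications, not the actual Kafka code):
{code:java}
import java.util.Optional;
import java.util.UUID;

public class TopicIdAssignmentSketch {

    // Hypothetical, simplified stand-in for a log that may not have a topic ID yet.
    static class PartitionLog {
        Optional<UUID> topicId = Optional.empty();
    }

    // Sketch of the described fix: if the request carries a topic ID and the existing
    // log has none, assign it, even when the leader epoch has not increased.
    static boolean maybeAssignTopicId(PartitionLog log, UUID requestTopicId) {
        if (requestTopicId != null && !log.topicId.isPresent()) {
            log.topicId = Optional.of(requestTopicId); // would also flush partition.metadata
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        PartitionLog existingLog = new PartitionLog();
        System.out.println(maybeAssignTopicId(existingLog, UUID.randomUUID())); // true
        System.out.println(maybeAssignTopicId(existingLog, UUID.randomUUID())); // false, already set
    }
}
{code}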



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #395

2021-08-05 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 489669 lines...]
[2021-08-06T00:58:17.465Z] ApiVersionsRequestTest > 
testApiVersionsRequestValidationV0() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequestValidationV0()[1] 
STARTED
[2021-08-06T00:58:19.718Z] 
[2021-08-06T00:58:19.718Z] ApiVersionsRequestTest > 
testApiVersionsRequestValidationV0() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequestValidationV0()[1] 
PASSED
[2021-08-06T00:58:19.718Z] 
[2021-08-06T00:58:19.718Z] ApiVersionsRequestTest > 
testApiVersionsRequestValidationV3() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequestValidationV3()[1] 
STARTED
[2021-08-06T00:58:24.154Z] 
[2021-08-06T00:58:24.154Z] ApiVersionsRequestTest > 
testApiVersionsRequestValidationV3() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequestValidationV3()[1] 
PASSED
[2021-08-06T00:58:24.154Z] 
[2021-08-06T00:58:24.154Z] ApiVersionsRequestTest > 
testApiVersionsRequestThroughControlPlaneListener() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequestThroughControlPlaneListener()[1]
 STARTED
[2021-08-06T00:58:25.923Z] 
[2021-08-06T00:58:25.923Z] ApiVersionsRequestTest > 
testApiVersionsRequestThroughControlPlaneListener() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequestThroughControlPlaneListener()[1]
 PASSED
[2021-08-06T00:58:25.923Z] 
[2021-08-06T00:58:25.923Z] ApiVersionsRequestTest > testApiVersionsRequest() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequest()[1] STARTED
[2021-08-06T00:58:28.841Z] 
[2021-08-06T00:58:28.841Z] ApiVersionsRequestTest > testApiVersionsRequest() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequest()[1] PASSED
[2021-08-06T00:58:28.841Z] 
[2021-08-06T00:58:28.841Z] ApiVersionsRequestTest > 
testApiVersionsRequestValidationV0ThroughControlPlaneListener() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequestValidationV0ThroughControlPlaneListener()[1]
 STARTED
[2021-08-06T00:58:30.780Z] 
[2021-08-06T00:58:30.780Z] ApiVersionsRequestTest > 
testApiVersionsRequestValidationV0ThroughControlPlaneListener() > 
kafka.server.ApiVersionsRequestTest.testApiVersionsRequestValidationV0ThroughControlPlaneListener()[1]
 PASSED
[2021-08-06T00:58:30.780Z] 
[2021-08-06T00:58:30.780Z] LogDirFailureTest > testIOExceptionDuringLogRoll() 
STARTED
[2021-08-06T00:58:38.790Z] 
[2021-08-06T00:58:38.790Z] LogDirFailureTest > testIOExceptionDuringLogRoll() 
PASSED
[2021-08-06T00:58:38.790Z] 
[2021-08-06T00:58:38.790Z] LogDirFailureTest > 
testIOExceptionDuringCheckpoint() STARTED
[2021-08-06T00:58:44.843Z] 
[2021-08-06T00:58:44.843Z] LogDirFailureTest > 
testIOExceptionDuringCheckpoint() PASSED
[2021-08-06T00:58:44.843Z] 
[2021-08-06T00:58:44.843Z] LogDirFailureTest > 
testProduceErrorFromFailureOnCheckpoint() STARTED
[2021-08-06T00:58:49.724Z] 
[2021-08-06T00:58:49.724Z] LogDirFailureTest > 
testProduceErrorFromFailureOnCheckpoint() PASSED
[2021-08-06T00:58:49.724Z] 
[2021-08-06T00:58:49.724Z] LogDirFailureTest > 
brokerWithOldInterBrokerProtocolShouldHaltOnLogDirFailure() STARTED
[2021-08-06T00:58:58.607Z] 
[2021-08-06T00:58:58.607Z] LogDirFailureTest > 
brokerWithOldInterBrokerProtocolShouldHaltOnLogDirFailure() PASSED
[2021-08-06T00:58:58.607Z] 
[2021-08-06T00:58:58.607Z] LogDirFailureTest > 
testReplicaFetcherThreadAfterLogDirFailureOnFollower() STARTED
[2021-08-06T00:59:03.100Z] 
[2021-08-06T00:59:03.100Z] LogDirFailureTest > 
testReplicaFetcherThreadAfterLogDirFailureOnFollower() PASSED
[2021-08-06T00:59:03.100Z] 
[2021-08-06T00:59:03.100Z] LogDirFailureTest > 
testProduceErrorFromFailureOnLogRoll() STARTED
[2021-08-06T00:59:10.561Z] 
[2021-08-06T00:59:10.561Z] LogDirFailureTest > 
testProduceErrorFromFailureOnLogRoll() PASSED
[2021-08-06T00:59:10.561Z] 
[2021-08-06T00:59:10.561Z] LogOffsetTest > 
testFetchOffsetsBeforeWithChangingSegmentSize() STARTED
[2021-08-06T00:59:14.002Z] 
[2021-08-06T00:59:14.002Z] LogOffsetTest > 
testFetchOffsetsBeforeWithChangingSegmentSize() PASSED
[2021-08-06T00:59:14.002Z] 
[2021-08-06T00:59:14.002Z] LogOffsetTest > testGetOffsetsBeforeEarliestTime() 
STARTED
[2021-08-06T00:59:17.252Z] 
[2021-08-06T00:59:17.252Z] LogOffsetTest > testGetOffsetsBeforeEarliestTime() 
PASSED
[2021-08-06T00:59:17.252Z] 
[2021-08-06T00:59:17.252Z] LogOffsetTest > 
testFetchOffsetByTimestampForMaxTimestampAfterTruncate() STARTED
[2021-08-06T00:59:21.545Z] 
[2021-08-06T00:59:21.545Z] LogOffsetTest > 
testFetchOffsetByTimestampForMaxTimestampAfterTruncate() PASSED
[2021-08-06T00:59:21.545Z] 
[2021-08-06T00:59:21.545Z] LogOffsetTest > 
testFetchOffsetByTimestampForMaxTimestampWithUnorderedTimestamps() STARTED
[2021-08-06T00:59:26.206Z] 
[2021-08-06T00:59:26.206Z] LogOffsetTest > 
testFetchOffsetByTimestampForMaxTimestampWithUnorderedTimestamps() PASSED
[2021-08-06T00:59:26.206Z] 
[2021-08-06T00:59:26.206Z] LogOffsetTest > testGetOffsetsForUnknownTopic() 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.0 #86

2021-08-05 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-13174) Log Compaction Blocked Forever by Unstable Offset/Unclosed Transaction

2021-08-05 Thread Michael Jaschob (Jira)
Michael Jaschob created KAFKA-13174:
---

 Summary: Log Compaction Blocked Forever by Unstable 
Offset/Unclosed Transaction
 Key: KAFKA-13174
 URL: https://issues.apache.org/jira/browse/KAFKA-13174
 Project: Kafka
  Issue Type: Bug
  Components: core, log cleaner
Affects Versions: 2.5.1
Reporter: Michael Jaschob


Our production cluster experienced a single __consumer_offsets partition that 
was growing without ever being compacted. A closer inspection of the cleaner 
logs showed that the log was forever uncleanable at an offset from July 28, 
which had been written ~7 days previously:
{code:java}
[2021-08-02 19:08:39,650] DEBUG Finding range of cleanable offsets for 
log=__consumer_offsets-9. Last clean offset=Some(75795702964) now=1627956519650 
=> firstDirtyOffset=75795702964 firstUncleanableOffset=75868740168 
activeSegment.baseOffset=76691318694 (kafka.log.LogCleanerManager$)
{code}
Using the log dumper tool, we were able to examine the records/batches around 
this offset and determined that the proximate cause was an "open" transaction 
that was never committed or aborted. We saw this:
 - a consumer group offset commit for group {{foo-group}} for topic-partition 
{{foo-topic-46}} from pid 1686325 (log offset 75868731509)
 - a transactional COMMIT marker from pid 1686325 (log offset 75868731579)
 - another consumer group offset commit for group {{foo-group}} for 
topic-partition {{foo-topic-46}} for pid 1686325 (log offset 75868740168, our 
first uncleanable offset)

Here's the raw log dumper output:
{code:java}
baseOffset: 75868731509 lastOffset: 75868731509 count: 1 baseSequence: 0 
lastSequence: 0 producerId: 1686325 producerEpoch: 249 partitionLeaderEpoch: 
320 isTransactional: true isControl: false position: 98725764 CreateTime: 
1627495733656 size: 232 magic
: 2 compresscodec: NONE crc: 485368943 isvalid: true
| offset: 75868731509 CreateTime: 1627495733656 keysize: 126 valuesize: 36 
sequence: 0 headerKeys: [] key: 
offset_commit::group=foo-group,partition=foo-topic-46 payload: 
offset=59016695,metadata=AQAAAXruS8Fg
...

baseOffset: 75868731579 lastOffset: 75868731579 count: 1 baseSequence: -1 
lastSequence: -1 producerId: 1686325 producerEpoch: 249 partitionLeaderEpoch: 
320 isTransactional: true isControl: true position: 98732634 CreateTime: 
1627495733700 size: 78 magic
: 2 compresscodec: NONE crc: 785483064 isvalid: true
| offset: 75868731579 CreateTime: 1627495733700 keysize: 4 valuesize: 6 
sequence: -1 headerKeys: [] endTxnMarker: COMMIT coordinatorEpoch: 143
...

baseOffset: 75868740168 lastOffset: 75868740168 count: 1 baseSequence: 0 
lastSequence: 0 producerId: 1686325 producerEpoch: 249 partitionLeaderEpoch: 
320 isTransactional: true isControl: false position: 99599843 CreateTime: 
1627495737629 size: 232 magic: 2 compresscodec: NONE crc: 1222829008 isvalid: 
true
| offset: 75868740168 CreateTime: 1627495737629 keysize: 126 valuesize: 36 
sequence: 0 headerKeys: [] key: 
offset_commit::group=foo-group,partition=foo-topic-46 payload: 
offset=59016695,metadata=AQAAAXruS8Fg
...
{code}
There was no further activity from that pid 1686325. In fact, the KStream 
application in question stalled on that partition because of this unstable 
offset/open transaction: {{The following partitions still have unstable offsets 
which are not cleared on the broker side: [foo-topic-46], this could be either 
transactional offsets waiting for completion, or normal offsets waiting for 
replication after appending to local log}}

We then captured the producer snapshot file from the broker data directory and 
wrote a quick tool to dump it as text. From its output, we found that the 
transactional producer in question (pid 1686325) was still considered alive 
with its hanging transaction at 75868740168:
{code:java}
ArraySeq(ProducerStateEntry(producerId=1686325, producerEpoch=249, 
currentTxnFirstOffset=Some(75868740168), coordinatorEpoch=143, 
lastTimestamp=1627495737629, batchMetadata=Queue(BatchMetadata(firstSeq=0, 
lastSeq=0, firstOffset=75868740168, lastOffset=75868740168, 
timestamp=1627495737629)))
{code}
This was very perplexing. As far as we can tell, the code in both Apache Kafka 
2.5.1 and in trunk essentially treats an open transaction like we had as 
uncleanable, which in practice blocks the log from ever being compacted again, 
for all eternity. Once a pid has an open transaction - a defined 
{{currentTxnFirstOffset}} - {{ProducerStateManager}} will [never expire the 
producer|https://github.com/apache/kafka/blob/2.5.1/core/src/main/scala/kafka/log/ProducerStateManager.scala#L576-L577],
 even after {{transactional.id.expiration.ms}} has passed. This, obviously, has 
severe repercussions on a topic like __consumer_offsets (long coordinator load 
times, always-growing disk usage).
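
As a rough illustration of the behavior described above, here is a minimal
sketch of that expiration check (simplified stand-in types, not the actual
ProducerStateManager code):
{code:java}
import java.util.Optional;

public class ProducerExpirationSketch {

    // Simplified stand-in for a producer state entry, for illustration only.
    static class ProducerEntry {
        final Optional<Long> currentTxnFirstOffset; // present => open (possibly hanging) transaction
        final long lastTimestampMs;

        ProducerEntry(Optional<Long> currentTxnFirstOffset, long lastTimestampMs) {
            this.currentTxnFirstOffset = currentTxnFirstOffset;
            this.lastTimestampMs = lastTimestampMs;
        }
    }

    // Mirrors the behavior described above: a producer with an open transaction is
    // never considered expired, no matter how long ago its last write was.
    static boolean isExpired(ProducerEntry entry, long nowMs, long expirationMs) {
        return !entry.currentTxnFirstOffset.isPresent()
                && nowMs - entry.lastTimestampMs >= expirationMs;
    }

    public static void main(String[] args) {
        long expirationMs = 7L * 24 * 60 * 60 * 1000; // default transactional.id.expiration.ms
        long now = System.currentTimeMillis();
        long oldTimestamp = now - 2 * expirationMs;   // well past the expiration window
        ProducerEntry hanging = new ProducerEntry(Optional.of(75868740168L), oldTimestamp);
        ProducerEntry idle = new ProducerEntry(Optional.empty(), oldTimestamp);
        System.out.println(isExpired(hanging, now, expirationMs)); // false: never expires
        System.out.println(isExpired(idle, now, expirationMs));    // true
    }
}
{code}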

While we're not sure what led to this hanging open transaction (note: we were 
running 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #394

2021-08-05 Thread Apache Jenkins Server
See 




Jenkins build became unstable: Kafka » Kafka Branch Builder » 3.0 #85

2021-08-05 Thread Apache Jenkins Server
See 




Handling retriable exceptions during Connect source task start

2021-08-05 Thread Sergei Morozov
Hi,

I'm trying to address an issue in Debezium (DBZ-3823) where a source connector
task cannot recover from a retriable exception.

The root cause is that the task interacts with the source database during
SourceTask#start, but Kafka Connect doesn't handle retriable exceptions
thrown at this stage as retriable. KIP-298, which originally introduced
handling of retriable exceptions, doesn't describe handling of exceptions
thrown during task start, so it's unclear to me whether those aren't
allowed by design or were simply out of the scope of the KIP.

My current working solution relies on the internal Debezium implementation
of the task restart, which introduces certain risks (the details are in the
PR description).

The question is: are retriable exceptions during start disallowed by design,
meaning the task must not throw retriable exceptions during start, or is this
simply not yet supported by the Connect framework, in which case I just need
to implement proper error handling in the connector?
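
For illustration, a connector-side retry inside start() could look roughly
like the sketch below (a minimal sketch assuming the connect-api dependency;
the retry count, backoff values, and helper method are illustrative
assumptions, not Debezium's actual implementation):
{code:java}
import java.util.Map;

import org.apache.kafka.connect.errors.RetriableException;
import org.apache.kafka.connect.source.SourceTask;

public abstract class RetryingSourceTask extends SourceTask {

    // The part of start() that talks to the database and may throw RetriableException.
    protected abstract void doStart(Map<String, String> props);

    @Override
    public void start(Map<String, String> props) {
        long backoffMs = 1000L;          // illustrative values
        final int maxAttempts = 5;
        for (int attempt = 1; ; attempt++) {
            try {
                doStart(props);
                return;
            } catch (RetriableException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up and let the framework fail the task
                }
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
                backoffMs = Math.min(backoffMs * 2, 30_000L);
            }
        }
    }
}
{code}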

Thanks!

-- 
Sergei Morozov


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #393

2021-08-05 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-13173) KRaft controller does not handle simultaneous broker expirations correctly

2021-08-05 Thread Jason Gustafson (Jira)
Jason Gustafson created KAFKA-13173:
---

 Summary: KRaft controller does not handle simultaneous broker 
expirations correctly
 Key: KAFKA-13173
 URL: https://issues.apache.org/jira/browse/KAFKA-13173
 Project: Kafka
  Issue Type: Bug
Reporter: Jason Gustafson


In `ReplicationControlManager.fenceStaleBrokers`, we find all of the current 
stale replicas and attempt to remove them from the ISR. However, when multiple 
expirations occur at once, we do not properly accumulate the ISR changes. For 
example, I ran a test where the ISR of a partition was initialized to [1, 2, 
3]. Then I triggered a timeout of replicas 2 and 3 at the same time. The 
records that were generated by `fenceStaleBrokers` were the following:

{code}
ApiMessageAndVersion(PartitionChangeRecord(partitionId=0, 
topicId=_seg8hBuSymBHUQ1sMKr2g, isr=[1, 3], leader=1, replicas=null, 
removingReplicas=null, addingReplicas=null) at version 0), 
ApiMessageAndVersion(FenceBrokerRecord(id=2, epoch=102) at version 0), 
ApiMessageAndVersion(PartitionChangeRecord(partitionId=0, 
topicId=_seg8hBuSymBHUQ1sMKr2g, isr=[1, 2], leader=1, replicas=null, 
removingReplicas=null, addingReplicas=null) at version 0), 
ApiMessageAndVersion(FenceBrokerRecord(id=3, epoch=103) at version 0)]
{code}

First the ISR is shrunk to [1, 3] as broker 2 is fenced. We also see the record 
to fence broker 2. Then the ISR is modified to [1, 2] as the fencing of broker 
3 is handled. So we did not account for the fact that we had already fenced 
broker 2 in the request. 

A simple solution for now is to change the logic to handle fencing only one 
broker at a time. 
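
For illustration, the difference between the two behaviors can be sketched in
plain Java (this is not the actual controller code; it only models the ISR
bookkeeping described above):
{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IsrAccumulationSketch {

    // Buggy variant: every fenced broker is removed from the *original* ISR,
    // so later records silently undo earlier ones (as in the records above).
    static List<List<Integer>> fenceWithoutAccumulation(List<Integer> isr, List<Integer> stale) {
        List<List<Integer>> records = new ArrayList<>();
        for (int broker : stale) {
            List<Integer> newIsr = new ArrayList<>(isr);
            newIsr.remove(Integer.valueOf(broker));
            records.add(newIsr);
        }
        return records;
    }

    // Accumulating variant: each ISR change is applied on top of the previous one.
    static List<List<Integer>> fenceWithAccumulation(List<Integer> isr, List<Integer> stale) {
        List<List<Integer>> records = new ArrayList<>();
        List<Integer> current = new ArrayList<>(isr);
        for (int broker : stale) {
            current = new ArrayList<>(current);
            current.remove(Integer.valueOf(broker));
            records.add(current);
        }
        return records;
    }

    public static void main(String[] args) {
        List<Integer> isr = Arrays.asList(1, 2, 3);
        List<Integer> stale = Arrays.asList(2, 3);
        System.out.println(fenceWithoutAccumulation(isr, stale)); // [[1, 3], [1, 2]]  <- the bug
        System.out.println(fenceWithAccumulation(isr, stale));    // [[1, 3], [1]]
    }
}
{code}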



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Fwd: Handling retriable exceptions during Connect source task start

2021-08-05 Thread Sergei Morozov
Hi,

I'm trying to address an issue in Debezium (DBZ-3823) where a source connector
task cannot recover from a retriable exception.

The root cause is that the task interacts with the source database during
SourceTask#start, but Kafka Connect doesn't handle retriable exceptions
thrown at this stage as retriable. KIP-298, which originally introduced
handling of retriable exceptions, doesn't describe handling of exceptions
thrown during task start, so it's unclear to me whether those aren't
allowed by design or were simply out of the scope of the KIP.

My current working solution relies on the internal Debezium implementation
of the task restart, which introduces certain risks (the details are in the
PR description).

The question is: are retriable exceptions during start disallowed by design,
meaning the task must not throw retriable exceptions during start, or is this
simply not yet supported by the Connect framework, in which case I just need
to implement proper error handling in the connector?

Thanks!

-- 
Sergei Morozov


Jenkins build is back to stable : Kafka » Kafka Branch Builder » 3.0 #84

2021-08-05 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-13172) Document in Streams 3.0 that, due to the RocksDB footer version, in-flight downgrade is not supported

2021-08-05 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-13172:
-

 Summary: Document in Streams 3.0 that, due to the RocksDB footer 
version, in-flight downgrade is not supported
 Key: KAFKA-13172
 URL: https://issues.apache.org/jira/browse/KAFKA-13172
 Project: Kafka
  Issue Type: Improvement
Reporter: Guozhang Wang
Assignee: Guozhang Wang






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #392

2021-08-05 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.0 #83

2021-08-05 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-13168) KRaft observers should not have a replica id

2021-08-05 Thread David Arthur (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Arthur resolved KAFKA-13168.
--
Resolution: Fixed

> KRaft observers should not have a replica id
> 
>
> Key: KAFKA-13168
> URL: https://issues.apache.org/jira/browse/KAFKA-13168
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Ryan Dielhenn
>Priority: Blocker
>  Labels: kip-500
> Fix For: 3.0.0, 3.1.0
>
>
> To avoid a misconfiguration of a broker affecting the quorum of the cluster 
> metadata partition, when a Kafka node is configured as broker-only the replica 
> id for the KRaft client should be set to {{Optional::empty()}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Apache Kafka 3.0.0 release plan with new updated dates

2021-08-05 Thread Konstantine Karantasis
Jose, thanks for the heads up on the 3 new blocker candidates.

I read the tickets and they have clear descriptions and implementation
details.
However, at this stage to be able to make a call and approve new blockers
I'd appreciate it if we could get some insight regarding the risk and the
necessity of a fix. A rough ETA would also be helpful.

Having said that, based on the descriptions and the existence of a few
other blockers, I'm tentatively approving KAFKA-13161, KAFKA-13165, and
KAFKA-13168 and we might have to make a new assessment if these are the
only blockers in the next few days or if we notice a regression during
testing.

Konstantine



On Thu, Aug 5, 2021 at 10:04 AM Konstantine Karantasis <
kkaranta...@apache.org> wrote:

>
> Thanks for reporting this new issue Ryan,
>
> It's important and this issue seems to have clearly regressed dynamic
> default configs in the 3.0 branch.
> So, it's approved.
>
> Konstantine
>
>
> On Wed, Aug 4, 2021 at 4:34 PM José Armando García Sancio
>  wrote:
>
>> Hey all,
>>
>> For the KIP-500 work for 3.0 we would like to propose the following
>> Jiras as blockers:
>>
>> 1. https://issues.apache.org/jira/browse/KAFKA-13168
>> 2. https://issues.apache.org/jira/browse/KAFKA-13165
>> 3. https://issues.apache.org/jira/browse/KAFKA-13161
>>
>> The description for each Jira should have more details.
>>
>> Thanks,
>> -Jose
>>
>> On Tue, Aug 3, 2021 at 12:14 PM Ryan Dielhenn
>>  wrote:
>> >
>> > Hi Konstantine,
>> >
>> > I would like to report another bug in KRaft.
>> >
>> > The ConfigHandler that processes dynamic broker config deltas in KRaft
>> > expects that the default resource name for dynamic broker configs is the
>> > old default entity name used in ZK: "<default>". Since dynamic default
>> > broker configs are persisted as empty string in the quorum instead of
>> > "<default>", the brokers are not updating their default configuration
>> > when they see empty string as a resource name in the config delta, and
>> > are throwing a NumberFormatException when they try to parse the resource
>> > name to process it as a per-broker configuration.
>> >
>> > I filed a JIRA: https://issues.apache.org/jira/browse/KAFKA-13160
>> >
>> > I also have a PR to fix this:
>> https://github.com/apache/kafka/pull/11168
>> >
>> > I think that this should be a blocker for 3.0 because dynamic default
>> > broker configs will not be usable in KRaft otherwise.
>> >
>> > Best,
>> > Ryan Dielhenn
>> >
>> > On Sat, Jul 31, 2021 at 10:42 AM Konstantine Karantasis <
>> > kkaranta...@apache.org> wrote:
>> >
>> > > Thanks Ryan,
>> > >
>> > > Approved. Seems also like a low risk fix.
>> > > With that opportunity, let's make sure there are no other configs that
>> > > would need a similar validation.
>> > >
>> > > Konstantine
>> > >
>> > > On Fri, Jul 30, 2021 at 8:33 AM Ryan Dielhenn
>> > >  wrote:
>> > >
>> > > > Hey Konstantine,
>> > > >
>> > > > Thanks for the question. If these configs are not validated the
>> user's
>> > > > experience will be affected and upgrades from 3.0 will be harder.
>> > > >
>> > > > Best,
>> > > > Ryan Dielhenn
>> > > >
>> > > > On Thu, Jul 29, 2021 at 3:59 PM Konstantine Karantasis <
>> > > > kkaranta...@apache.org> wrote:
>> > > >
>> > > > > Thanks for reporting this issue Ryan.
>> > > > >
>> > > > > I believe what you mention corresponds to the ticket you created
>> here:
>> > > > > https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-13151
>> > > > >
>> > > > > What happens if the configurations are present but the broker
>> doesn't
>> > > > fail
>> > > > > at startup when configured to run in KRaft mode?
>> > > > > Asking to see if we have any workarounds available to us.
>> > > > >
>> > > > > Thanks,
>> > > > > Konstantine
>> > > > >
>> > > > > On Thu, Jul 29, 2021 at 2:51 PM Ryan Dielhenn
>> > > > >  wrote:
>> > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > Disregard log.clean.policy being included in this blocker.
>> > > > > >
>> > > > > > Best,
>> > > > > > Ryan Dielhenn
>> > > > > >
>> > > > > > On Thu, Jul 29, 2021 at 2:38 PM Ryan Dielhenn <
>> > > rdielh...@confluent.io>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hey Konstantine,
>> > > > > > >
>> > > > > > > I'd like to report another bug in KRaft.
>> > > > > > >
>> > > > > > > log.cleanup.policy, alter.config.policy.class.name, and
>> > > > > > > create.topic.policy.class.name are all unsupported by KRaft
>> but
>> > > > KRaft
>> > > > > > > servers allow them to be configured. I believe this should be
>> > > > > considered
>> > > > > > a
>> > > > > > > blocker and that KRaft servers should fail startup if any of
>> these
>> > > > are
>> > > > > > > configured. I do not have a PR yet but will soon.
>> > > > > > >
>> > > > > > > On another note, I have a PR for the dynamic broker
>> configuration
>> > > fix
>> > > > > > > here: https://github.com/apache/kafka/pull/11141
>> > > > > > >
>> > > > > > > Best,
>> > > > > > > Ryan Dielhenn
>> > > > > > >
>> > > > > > > On Wed, May 26, 2021 at 2:48 

Re: [DISCUSS] Apache Kafka 3.0.0 release plan with new updated dates

2021-08-05 Thread Konstantine Karantasis
Thanks for reporting this new issue Ryan,

It's important and this issue seems to have clearly regressed dynamic
default configs in the 3.0 branch.
So, it's approved.

Konstantine


On Wed, Aug 4, 2021 at 4:34 PM José Armando García Sancio
 wrote:

> Hey all,
>
> For the KIP-500 work for 3.0 we would like to propose the following
> Jiras as blockers:
>
> 1. https://issues.apache.org/jira/browse/KAFKA-13168
> 2. https://issues.apache.org/jira/browse/KAFKA-13165
> 3. https://issues.apache.org/jira/browse/KAFKA-13161
>
> The description for each Jira should have more details.
>
> Thanks,
> -Jose
>
> On Tue, Aug 3, 2021 at 12:14 PM Ryan Dielhenn
>  wrote:
> >
> > Hi Konstantine,
> >
> > I would like to report another bug in KRaft.
> >
> > The ConfigHandler that processes dynamic broker config deltas in KRaft
> > expects that the default resource name for dynamic broker configs is the
> > old default entity name used in ZK: "<default>". Since dynamic default
> > broker configs are persisted as empty string in the quorum instead of
> > "<default>", the brokers are not updating their default configuration
> > when they see empty string as a resource name in the config delta and are
> > throwing a NumberFormatException when they try to parse the resource name
> > to process it as a per-broker configuration.
> >
> > I filed a JIRA: https://issues.apache.org/jira/browse/KAFKA-13160
> >
> > I also have a PR to fix this: https://github.com/apache/kafka/pull/11168
> >
> > I think that this should be a blocker for 3.0 because dynamic default
> > broker configs will not be usable in KRaft otherwise.
> >
> > Best,
> > Ryan Dielhenn
> >
> > On Sat, Jul 31, 2021 at 10:42 AM Konstantine Karantasis <
> > kkaranta...@apache.org> wrote:
> >
> > > Thanks Ryan,
> > >
> > > Approved. Seems also like a low risk fix.
> > > With that opportunity, let's make sure there are no other configs that
> > > would need a similar validation.
> > >
> > > Konstantine
> > >
> > > On Fri, Jul 30, 2021 at 8:33 AM Ryan Dielhenn
> > >  wrote:
> > >
> > > > Hey Konstantine,
> > > >
> > > > Thanks for the question. If these configs are not validated the
> user's
> > > > experience will be affected and upgrades from 3.0 will be harder.
> > > >
> > > > Best,
> > > > Ryan Dielhenn
> > > >
> > > > On Thu, Jul 29, 2021 at 3:59 PM Konstantine Karantasis <
> > > > kkaranta...@apache.org> wrote:
> > > >
> > > > > Thanks for reporting this issue Ryan.
> > > > >
> > > > > I believe what you mention corresponds to the ticket you created
> here:
> > > > > https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-13151
> > > > >
> > > > > What happens if the configurations are present but the broker
> doesn't
> > > > fail
> > > > > at startup when configured to run in KRaft mode?
> > > > > Asking to see if we have any workarounds available to us.
> > > > >
> > > > > Thanks,
> > > > > Konstantine
> > > > >
> > > > > On Thu, Jul 29, 2021 at 2:51 PM Ryan Dielhenn
> > > > >  wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Disregard log.clean.policy being included in this blocker.
> > > > > >
> > > > > > Best,
> > > > > > Ryan Dielhenn
> > > > > >
> > > > > > On Thu, Jul 29, 2021 at 2:38 PM Ryan Dielhenn <
> > > rdielh...@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Konstantine,
> > > > > > >
> > > > > > > I'd like to report another bug in KRaft.
> > > > > > >
> > > > > > > log.cleanup.policy, alter.config.policy.class.name, and
> > > > > > > create.topic.policy.class.name are all unsupported by KRaft
> but
> > > > KRaft
> > > > > > > servers allow them to be configured. I believe this should be
> > > > > considered
> > > > > > a
> > > > > > > blocker and that KRaft servers should fail startup if any of
> these
> > > > are
> > > > > > > configured. I do not have a PR yet but will soon.
> > > > > > >
> > > > > > > On another note, I have a PR for the dynamic broker
> configuration
> > > fix
> > > > > > > here: https://github.com/apache/kafka/pull/11141
> > > > > > >
> > > > > > > Best,
> > > > > > > Ryan Dielhenn
> > > > > > >
> > > > > > > On Wed, May 26, 2021 at 2:48 PM Konstantine Karantasis
> > > > > > >  wrote:
> > > > > > >
> > > > > > >> Hi all,
> > > > > > >>
> > > > > > >> Please find below the updated release plan for the Apache
> Kafka
> > > > 3.0.0
> > > > > > >> release.
> > > > > > >>
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=177046466
> > > > > > >>
> > > > > > >> New suggested dates for the release are as follows:
> > > > > > >>
> > > > > > >> KIP Freeze is 09 June 2021 (same date as in the initial plan)
> > > > > > >> Feature Freeze is 30 June 2021 (new date, extended by two
> weeks)
> > > > > > >> Code Freeze is 14 July 2021 (new date, extended by two weeks)
> > > > > > >>
> > > > > > >> At least two weeks of stabilization will follow Code Freeze.
> > > > > > >>
> > > > > > >> The release plan is up to date and currently includes all the
> > > > approved
> > > 

[jira] [Resolved] (KAFKA-13167) KRaft broker should heartbeat immediately during controlled shutdown

2021-08-05 Thread Jason Gustafson (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gustafson resolved KAFKA-13167.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

> KRaft broker should heartbeat immediately during controlled shutdown
> 
>
> Key: KAFKA-13167
> URL: https://issues.apache.org/jira/browse/KAFKA-13167
> Project: Kafka
>  Issue Type: Bug
>Reporter: Jason Gustafson
>Assignee: Jason Gustafson
>Priority: Major
> Fix For: 3.0.0
>
>
> Controlled shutdown in KRaft is signaled through a heartbeat request with the 
> `shouldShutDown` flag set to true. When we begin controlled shutdown, we 
> should immediately schedule the next heartbeat instead of waiting for the 
> next periodic heartbeat so that we can shutdown more quickly. Otherwise 
> controlled shutdown can be delayed by several seconds.
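
For illustration, a minimal sketch of the scheduling change described above
(hypothetical names and intervals, not the actual broker heartbeat manager):
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ImmediateHeartbeatSketch {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private static final long HEARTBEAT_INTERVAL_MS = 2000L; // illustrative periodic interval

    private void sendBrokerHeartbeat() {
        // Stub for sending a broker heartbeat (would set shouldShutDown=true during shutdown).
        System.out.println("heartbeat at " + System.currentTimeMillis());
    }

    void start() {
        // Periodic heartbeats: worst case, the next one is HEARTBEAT_INTERVAL_MS away.
        scheduler.scheduleAtFixedRate(this::sendBrokerHeartbeat,
                HEARTBEAT_INTERVAL_MS, HEARTBEAT_INTERVAL_MS, TimeUnit.MILLISECONDS);
    }

    void beginControlledShutdown() {
        // The fix described above: schedule an extra heartbeat right away instead of
        // waiting for the next periodic one, so the shutdown handshake starts immediately.
        scheduler.schedule(this::sendBrokerHeartbeat, 0, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        ImmediateHeartbeatSketch broker = new ImmediateHeartbeatSketch();
        broker.start();
        Thread.sleep(500);
        broker.beginControlledShutdown();
        Thread.sleep(1000);
        broker.scheduler.shutdownNow();
    }
}
{code}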



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Request Permissions to Contribute

2021-08-05 Thread Guozhang Wang
Hello Jordan,

I've added you to the permission list, you should be able to create wiki
pages now.

Guozhang

On Wed, Aug 4, 2021 at 2:27 PM Jordan Bull  wrote:

> Hi,
>
> I'd like to request permissions to contribute to Kafka to propose a KIP.
>
> Wiki ID: jbull
> Jira ID: jbull
>
> Thank you,
> Jordan
>
>

-- 
-- Guozhang


[jira] [Created] (KAFKA-13171) KIP-500 Setup and named docker volumes

2021-08-05 Thread Claudio Carcaci (Jira)
Claudio Carcaci created KAFKA-13171:
---

 Summary: KIP-500 Setup and named docker volumes
 Key: KAFKA-13171
 URL: https://issues.apache.org/jira/browse/KAFKA-13171
 Project: Kafka
  Issue Type: Improvement
Affects Versions: 2.8.0
Reporter: Claudio Carcaci


Following the KIP-500 instructions to enable KRaft mode 
([https://github.com/apache/kafka/blob/trunk/config/kraft/README.md]) the 
command:
{code:java}
$ ./bin/kafka-storage.sh format -t <uuid> -c ./config/kraft/server.properties
{code}
will create a "meta.properties" file in the logs folder (i.e. 
/tmp/kraft-combined-logs).

If I build a Docker image with KRaft mode enabled and I mount a named volume on 
/tmp/kraft-combined-logs, the content of the folder will be overwritten 
(emptied) by the Docker named volume content, so I will lose the 
meta.properties file.

A possible solution would be to create the meta.properties file in a separate 
folder and store all the created logs elsewhere, so that by mounting a named 
volume the logs folder can start empty.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #391

2021-08-05 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-13170) Flaky Test InternalTopicManagerTest.shouldRetryDeleteTopicWhenTopicUnknown

2021-08-05 Thread A. Sophie Blee-Goldman (Jira)
A. Sophie Blee-Goldman created KAFKA-13170:
--

 Summary: Flaky Test 
InternalTopicManagerTest.shouldRetryDeleteTopicWhenTopicUnknown
 Key: KAFKA-13170
 URL: https://issues.apache.org/jira/browse/KAFKA-13170
 Project: Kafka
  Issue Type: Bug
  Components: streams, unit tests
Reporter: A. Sophie Blee-Goldman


[https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-11176/2/testReport/org.apache.kafka.streams.processor.internals/InternalTopicManagerTest/Build___JDK_8_and_Scala_2_12___shouldRetryDeleteTopicWhenTopicUnknown_2/]
{code:java}
Stacktrace
java.lang.AssertionError: unexpected exception type thrown; 
expected: but 
was:
  at org.junit.Assert.assertThrows(Assert.java:1020)
  at org.junit.Assert.assertThrows(Assert.java:981)
  at 
org.apache.kafka.streams.processor.internals.InternalTopicManagerTest.shouldRetryDeleteTopicWhenRetriableException(InternalTopicManagerTest.java:526)
  at 
org.apache.kafka.streams.processor.internals.InternalTopicManagerTest.shouldRetryDeleteTopicWhenTopicUnknown(InternalTopicManagerTest.java:497)
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13169) Flaky Test QueryableStateIntegrationTest.shouldBeAbleToQueryStateWithNonZeroSizedCache

2021-08-05 Thread A. Sophie Blee-Goldman (Jira)
A. Sophie Blee-Goldman created KAFKA-13169:
--

 Summary: Flaky Test 
QueryableStateIntegrationTest.shouldBeAbleToQueryStateWithNonZeroSizedCache
 Key: KAFKA-13169
 URL: https://issues.apache.org/jira/browse/KAFKA-13169
 Project: Kafka
  Issue Type: Bug
  Components: streams, unit tests
Reporter: A. Sophie Blee-Goldman


[https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-11176/2/testReport/org.apache.kafka.streams.processor.internals/InternalTopicManagerTest/Build___JDK_8_and_Scala_2_12___shouldRetryDeleteTopicWhenTopicUnknown/]
{code:java}
Stacktrace
java.lang.AssertionError: unexpected exception type thrown; 
expected: but 
was:
  at org.junit.Assert.assertThrows(Assert.java:1020)
  at org.junit.Assert.assertThrows(Assert.java:981)
  at 
org.apache.kafka.streams.processor.internals.InternalTopicManagerTest.shouldRetryDeleteTopicWhenRetriableException(InternalTopicManagerTest.java:526)
  at 
org.apache.kafka.streams.processor.internals.InternalTopicManagerTest.shouldRetryDeleteTopicWhenTopicUnknown(InternalTopicManagerTest.java:497)
{code}
 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[DISCUSS] KIP-766: fetch/findSessions queries with open endpoints for SessionStore/WindowStore

2021-08-05 Thread Luke Chen
Hi everyone,

I'd like to start the discussion for *KIP-766: fetch/findSessions queries
with open endpoints for WindowStore/SessionStore*.

This is a follow-up KIP for KIP-763: Range queries with open endpoints.
In KIP-763, we focused on *ReadOnlyKeyValueStore*; in this KIP, we'll focus
on *ReadOnlySessionStore* and *ReadOnlyWindowStore*, to have open-endpoint
queries for SessionStore/WindowStore.

The KIP can be found here:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=186876596
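
For illustration, hypothetical usage under this proposal might look like the
sketch below (the null key bound is the proposed semantics, not the current
API; the store and method shown are the existing ReadOnlyWindowStore
interface):
{code:java}
import java.time.Duration;
import java.time.Instant;

import org.apache.kafka.streams.kstream.Windowed;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.ReadOnlyWindowStore;

public class OpenEndpointQuerySketch {
    // Hypothetical usage under KIP-766: passing null for one key bound would mean
    // "unbounded on that side", mirroring what KIP-763 did for ReadOnlyKeyValueStore.
    static long countFromKeyOnwards(ReadOnlyWindowStore<String, Long> store, String fromKey) {
        Instant to = Instant.now();
        Instant from = to.minus(Duration.ofHours(1));
        long count = 0;
        try (KeyValueIterator<Windowed<String>, Long> iter = store.fetch(fromKey, null, from, to)) {
            while (iter.hasNext()) {
                iter.next();
                count++;
            }
        }
        return count;
    }
}
{code}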


Thank you.
Luke