[jira] [Created] (KAFKA-14196) Flaky OffsetValidationTest seems to indicate potential duplication issue during rebalance

2022-09-01 Thread Philip Nee (Jira)
Philip Nee created KAFKA-14196:
--

 Summary: Flaky OffsetValidationTest seems to indicate potential 
duplication issue during rebalance
 Key: KAFKA-14196
 URL: https://issues.apache.org/jira/browse/KAFKA-14196
 Project: Kafka
  Issue Type: Bug
  Components: clients, consumer
Affects Versions: 3.2.1
Reporter: Philip Nee


Several flaky tests under OffsetValidationTest are indicating a potential 
consumer duplication issue when autocommit is enabled.  The failure message is 
shown below:

 
{code:java}
Total consumed records 3366 did not match consumed position 3331 {code}
 

After investigating the log, I discovered that the data consumed between the 
start of a rebalance event and the async commit was lost for the failing 
tests.  In the example below, the rebalance event kicks in at around 
1662054846995 (first record), and the async commit of offset 3739 completes at 
around 1662054847015 (right before partitions_revoked).

 
{code:java}
{"timestamp":1662054846995,"name":"records_consumed","count":3,"partitions":[{"topic":"test_topic","partition":0,"count":3,"minOffset":3739,"maxOffset":3741}]}
{"timestamp":1662054846998,"name":"records_consumed","count":2,"partitions":[{"topic":"test_topic","partition":0,"count":2,"minOffset":3742,"maxOffset":3743}]}
{"timestamp":1662054847008,"name":"records_consumed","count":2,"partitions":[{"topic":"test_topic","partition":0,"count":2,"minOffset":3744,"maxOffset":3745}]}
{"timestamp":1662054847016,"name":"partitions_revoked","partitions":[{"topic":"test_topic","partition":0}]}
{"timestamp":1662054847031,"name":"partitions_assigned","partitions":[{"topic":"test_topic","partition":0}]}
{"timestamp":1662054847038,"name":"records_consumed","count":23,"partitions":[{"topic":"test_topic","partition":0,"count":23,"minOffset":3739,"maxOffset":3761}]}
 {code}
A few things to note here:
 # This is highly flaky; roughly 1 in 4 runs fails the test.
 # Manually calling commitSync in the onPartitionsRevoked callback seems to 
alleviate the issue (see the sketch below).
 # Setting includeMetadataInTimeout to false also seems to alleviate the issue.
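
For reference, here is a minimal sketch of note 2 (not the actual system-test 
code; the broker address, group id, and topic name are placeholders), 
committing synchronously in the onPartitionsRevoked callback:
{code:java}
// Minimal sketch only: commit synchronously before partitions are revoked so
// that progress made since the last async autocommit is not re-delivered.
import java.time.Duration;
import java.util.Collection;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class CommitOnRevokeExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", "offset-validation-example");
        props.put("enable.auto.commit", "true"); // autocommit, as in the failing tests
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("test_topic"), new ConsumerRebalanceListener() {
            @Override
            public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
                // Flush the offsets of everything consumed so far before giving up
                // the partitions, instead of relying on the pending async commit.
                consumer.commitSync();
            }

            @Override
            public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
                // Nothing extra needed for this sketch.
            }
        });

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            for (ConsumerRecord<String, String> record : records) {
                // process the record
            }
        }
    }
}
{code}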

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.3 #57

2022-09-01 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 574330 lines...]
[2022-09-02T01:50:22.371Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testInner[caching enabled = false] PASSED
[2022-09-02T01:50:22.371Z] 
[2022-09-02T01:50:22.371Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testOuter[caching enabled = false] STARTED
[2022-09-02T01:50:27.504Z] 
[2022-09-02T01:50:27.504Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testInnerOuter[caching enabled = false] PASSED
[2022-09-02T01:50:27.504Z] 
[2022-09-02T01:50:27.504Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testInnerLeft[caching enabled = false] STARTED
[2022-09-02T01:50:30.856Z] 
[2022-09-02T01:50:30.857Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testOuter[caching enabled = false] PASSED
[2022-09-02T01:50:30.857Z] 
[2022-09-02T01:50:30.857Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testLeft[caching enabled = false] STARTED
[2022-09-02T01:50:36.167Z] 
[2022-09-02T01:50:36.167Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testInnerLeft[caching enabled = false] PASSED
[2022-09-02T01:50:36.167Z] 
[2022-09-02T01:50:36.167Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testOuterInner[caching enabled = false] STARTED
[2022-09-02T01:50:39.351Z] 
[2022-09-02T01:50:39.351Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testLeft[caching enabled = false] PASSED
[2022-09-02T01:50:39.351Z] 
[2022-09-02T01:50:39.351Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testInnerInner[caching enabled = false] STARTED
[2022-09-02T01:50:44.637Z] 
[2022-09-02T01:50:44.638Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testOuterInner[caching enabled = false] PASSED
[2022-09-02T01:50:44.638Z] 
[2022-09-02T01:50:44.638Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testOuterOuter[caching enabled = false] STARTED
[2022-09-02T01:50:46.107Z] 
[2022-09-02T01:50:46.107Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testInnerInner[caching enabled = false] PASSED
[2022-09-02T01:50:46.107Z] 
[2022-09-02T01:50:46.107Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testInnerOuter[caching enabled = false] STARTED
[2022-09-02T01:50:52.666Z] 
[2022-09-02T01:50:52.666Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testOuterOuter[caching enabled = false] PASSED
[2022-09-02T01:50:52.666Z] 
[2022-09-02T01:50:52.666Z] 
org.apache.kafka.streams.integration.TaskMetadataIntegrationTest > 
shouldReportCorrectEndOffsetInformation STARTED
[2022-09-02T01:50:53.683Z] 
[2022-09-02T01:50:53.683Z] 
org.apache.kafka.streams.integration.TaskMetadataIntegrationTest > 
shouldReportCorrectEndOffsetInformation PASSED
[2022-09-02T01:50:53.683Z] 
[2022-09-02T01:50:53.683Z] 
org.apache.kafka.streams.integration.TaskMetadataIntegrationTest > 
shouldReportCorrectCommittedOffsetInformation STARTED
[2022-09-02T01:50:54.429Z] 
[2022-09-02T01:50:54.429Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testInnerOuter[caching enabled = false] PASSED
[2022-09-02T01:50:54.429Z] 
[2022-09-02T01:50:54.429Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testInnerLeft[caching enabled = false] STARTED
[2022-09-02T01:50:54.936Z] 
[2022-09-02T01:50:54.936Z] 
org.apache.kafka.streams.integration.TaskMetadataIntegrationTest > 
shouldReportCorrectCommittedOffsetInformation PASSED
[2022-09-02T01:50:55.953Z] 
[2022-09-02T01:50:55.953Z] 
org.apache.kafka.streams.processor.internals.HandlingSourceTopicDeletionIntegrationTest
 > shouldThrowErrorAfterSourceTopicDeleted STARTED
[2022-09-02T01:51:01.087Z] 
[2022-09-02T01:51:01.087Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testInnerLeft[caching enabled = false] PASSED
[2022-09-02T01:51:01.087Z] 
[2022-09-02T01:51:01.087Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testOuterInner[caching enabled = false] STARTED
[2022-09-02T01:51:04.273Z] 
[2022-09-02T01:51:04.273Z] 
org.apache.kafka.streams.processor.internals.HandlingSourceTopicDeletionIntegrationTest
 > shouldThrowErrorAfterSourceTopicDeleted PASSED
[2022-09-02T01:51:05.370Z] 
[2022-09-02T01:51:05.370Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorLargeNumConsumers STARTED
[2022-09-02T01:51:08.276Z] 
[2022-09-02T01:51:08.276Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testOuterInner[caching enabled = false] PASSED
[2022-09-02T01:51:08.276Z] 
[2022-09-02T01:51:08.276Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testOuterOuter[ca

[VOTE] KIP-678: New Kafka Connect SMT for plainText => struct with Regex

2022-09-01 Thread gyejun choi
Hi all,
I'd like to start a vote for KIP-678: New Kafka Connect SMT for plainText
=> struct with Regex

KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-678%3A+New+Kafka+Connect+SMT+for+plainText+%3D%3E+Struct%28or+Map%29+with+Regex

PR: https://github.com/apache/kafka/pull/12219

Discussion thread:
https://lists.apache.org/thread/xb57l7j953k8dfgqvktb09y31vzpm1xx
https://lists.apache.org/thread/7t1k0ko8l973v4oj3l983j7qpwolhyzf

Thanks,
whsoul


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #1194

2022-09-01 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 507984 lines...]
[2022-09-02T00:25:49.635Z] 
[2022-09-02T00:25:49.635Z] IQv2IntegrationTest > shouldFetchFromPartition() 
STARTED
[2022-09-02T00:25:51.203Z] 
[2022-09-02T00:25:51.203Z] IQv2IntegrationTest > shouldRejectNonRunningActive() 
PASSED
[2022-09-02T00:25:51.724Z] 
[2022-09-02T00:25:51.724Z] IQv2IntegrationTest > shouldFetchFromPartition() 
PASSED
[2022-09-02T00:25:51.724Z] 
[2022-09-02T00:25:51.724Z] IQv2IntegrationTest > 
shouldFetchExplicitlyFromAllPartitions() STARTED
[2022-09-02T00:25:53.264Z] 
[2022-09-02T00:25:53.264Z] InternalTopicIntegrationTest > 
shouldCompactTopicsForKeyValueStoreChangelogs() STARTED
[2022-09-02T00:25:54.651Z] 
[2022-09-02T00:25:54.651Z] IQv2IntegrationTest > 
shouldFetchExplicitlyFromAllPartitions() PASSED
[2022-09-02T00:25:54.651Z] 
[2022-09-02T00:25:54.651Z] IQv2IntegrationTest > shouldFailUnknownStore() 
STARTED
[2022-09-02T00:25:54.651Z] 
[2022-09-02T00:25:54.651Z] IQv2IntegrationTest > shouldFailUnknownStore() PASSED
[2022-09-02T00:25:54.651Z] 
[2022-09-02T00:25:54.651Z] IQv2IntegrationTest > shouldRejectNonRunningActive() 
STARTED
[2022-09-02T00:25:55.174Z] 
[2022-09-02T00:25:55.174Z] InternalTopicIntegrationTest > 
shouldCompactTopicsForKeyValueStoreChangelogs() PASSED
[2022-09-02T00:25:55.174Z] 
[2022-09-02T00:25:55.174Z] InternalTopicIntegrationTest > 
shouldGetToRunningWithWindowedTableInFKJ() STARTED
[2022-09-02T00:25:56.739Z] 
[2022-09-02T00:25:56.740Z] IQv2IntegrationTest > shouldRejectNonRunningActive() 
PASSED
[2022-09-02T00:25:58.308Z] 
[2022-09-02T00:25:58.308Z] InternalTopicIntegrationTest > 
shouldGetToRunningWithWindowedTableInFKJ() PASSED
[2022-09-02T00:25:58.308Z] 
[2022-09-02T00:25:58.308Z] InternalTopicIntegrationTest > 
shouldCompactAndDeleteTopicsForWindowStoreChangelogs() STARTED
[2022-09-02T00:25:58.828Z] 
[2022-09-02T00:25:58.829Z] InternalTopicIntegrationTest > 
shouldCompactTopicsForKeyValueStoreChangelogs() STARTED
[2022-09-02T00:25:59.351Z] 
[2022-09-02T00:25:59.351Z] InternalTopicIntegrationTest > 
shouldCompactAndDeleteTopicsForWindowStoreChangelogs() PASSED
[2022-09-02T00:26:00.394Z] 
[2022-09-02T00:26:00.394Z] KStreamAggregationIntegrationTest > 
shouldAggregateSlidingWindows(TestInfo) STARTED
[2022-09-02T00:26:00.915Z] 
[2022-09-02T00:26:00.915Z] InternalTopicIntegrationTest > 
shouldCompactTopicsForKeyValueStoreChangelogs() PASSED
[2022-09-02T00:26:00.915Z] 
[2022-09-02T00:26:00.915Z] InternalTopicIntegrationTest > 
shouldGetToRunningWithWindowedTableInFKJ() STARTED
[2022-09-02T00:26:04.059Z] 
[2022-09-02T00:26:04.059Z] InternalTopicIntegrationTest > 
shouldGetToRunningWithWindowedTableInFKJ() PASSED
[2022-09-02T00:26:04.059Z] 
[2022-09-02T00:26:04.059Z] InternalTopicIntegrationTest > 
shouldCompactAndDeleteTopicsForWindowStoreChangelogs() STARTED
[2022-09-02T00:26:04.059Z] 
[2022-09-02T00:26:04.059Z] InternalTopicIntegrationTest > 
shouldCompactAndDeleteTopicsForWindowStoreChangelogs() PASSED
[2022-09-02T00:26:06.143Z] 
[2022-09-02T00:26:06.143Z] KStreamAggregationIntegrationTest > 
shouldAggregateSlidingWindows(TestInfo) PASSED
[2022-09-02T00:26:06.143Z] 
[2022-09-02T00:26:06.143Z] KStreamAggregationIntegrationTest > 
shouldReduceSessionWindows() STARTED
[2022-09-02T00:26:06.317Z] 
[2022-09-02T00:26:06.317Z] KStreamAggregationIntegrationTest > 
shouldAggregateSlidingWindows(TestInfo) STARTED
[2022-09-02T00:26:07.262Z] 
[2022-09-02T00:26:07.262Z] KStreamAggregationIntegrationTest > 
shouldReduceSessionWindows() PASSED
[2022-09-02T00:26:07.262Z] 
[2022-09-02T00:26:07.262Z] KStreamAggregationIntegrationTest > 
shouldReduceSlidingWindows(TestInfo) STARTED
[2022-09-02T00:26:11.868Z] 
[2022-09-02T00:26:11.868Z] KStreamAggregationIntegrationTest > 
shouldAggregateSlidingWindows(TestInfo) PASSED
[2022-09-02T00:26:11.868Z] 
[2022-09-02T00:26:11.868Z] KStreamAggregationIntegrationTest > 
shouldReduceSessionWindows() STARTED
[2022-09-02T00:26:14.056Z] 
[2022-09-02T00:26:14.056Z] KStreamAggregationIntegrationTest > 
shouldReduceSessionWindows() PASSED
[2022-09-02T00:26:14.056Z] 
[2022-09-02T00:26:14.056Z] KStreamAggregationIntegrationTest > 
shouldReduceSlidingWindows(TestInfo) STARTED
[2022-09-02T00:26:15.150Z] 
[2022-09-02T00:26:15.150Z] KStreamAggregationIntegrationTest > 
shouldReduceSlidingWindows(TestInfo) PASSED
[2022-09-02T00:26:15.150Z] 
[2022-09-02T00:26:15.150Z] KStreamAggregationIntegrationTest > 
shouldReduce(TestInfo) STARTED
[2022-09-02T00:26:18.714Z] 
[2022-09-02T00:26:18.714Z] KStreamAggregationIntegrationTest > 
shouldReduceSlidingWindows(TestInfo) PASSED
[2022-09-02T00:26:18.714Z] 
[2022-09-02T00:26:18.714Z] KStreamAggregationIntegrationTest > 
shouldReduce(TestInfo) STARTED
[2022-09-02T00:26:20.457Z] 
[2022-09-02T00:26:20.457Z] KStreamAggregationIntegrationTest > 
shouldReduce(TestInfo) PASSED
[2022-09-02T00:26:20.457Z] 
[2022-09-02T00:26:20.457Z] KStreamAggregationI

Problem with Kafka KRaft 3.1.X

2022-09-01 Thread Paul Brebner
Hi all,

I've been attempting to benchmark the Kafka KRaft version for an ApacheCon talk
and have identified 2 problems:

1 - it's still impossible to create a large number of partitions/topics - I
can create more than with the comparable ZooKeeper version, but still not
"millions" - this is with RF=1 only (as anything higher needs huge clusters to
cope with the replication CPU overhead), and with no load on the clusters
yet (i.e. purely a topic/partition creation experiment).

2 - eventually the topic/partition creation command causes the Kafka
process to fail - looks like a memory error -

java.lang.OutOfMemoryError: Metaspace
OpenJDK 64-Bit Server VM warning: INFO:
os::commit_memory(0x7f4f554f9000, 65536, 1) failed; error='Not enough
space' (errno=12)

or similar error

This seems to happen consistently at around 30,000+ partitions - this is on a
test EC2 instance with 32GB RAM, 500,000 file descriptors (increased from the
default) and 64GB disk (plenty spare). I'm not an OS expert, but the Kafka
process and the OS both seem to have plenty of RAM when this error occurs.
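
For illustration, a minimal sketch of this kind of bulk topic/partition
creation experiment using the Java Admin client rather than the CLI (topic
names, batch size, and broker address below are placeholders):
{code:java}
// Illustrative only: create many single-partition, RF=1 topics in batches with
// the Java Admin client (the actual experiment used the CLI tools).
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class ManyTopicsExperiment {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address

        try (Admin admin = Admin.create(props)) {
            int totalTopics = 30_000; // roughly where the failures were observed
            int batchSize = 500;
            for (int start = 0; start < totalTopics; start += batchSize) {
                List<NewTopic> batch = new ArrayList<>();
                for (int i = start; i < Math.min(start + batchSize, totalTopics); i++) {
                    batch.add(new NewTopic("bench-topic-" + i, 1, (short) 1)); // RF=1
                }
                admin.createTopics(batch).all().get(); // wait for each batch to complete
            }
        }
    }
}
{code}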

So there are really 3 questions: What's going wrong exactly? How can more
partitions be achieved? And should the topic create command (I'm just using the
CLI at present to create topics) really be capable of killing the Kafka
instance, or should it fail with an error while the Kafka instance continues
working?

Regards, Paul Brebner


Re: [VOTE] KIP-844: Transactional State Stores

2022-09-01 Thread John Roesler
Thanks for the KIP, Alex!

+1 (binding) from me. 

-John

On Thu, Sep 1, 2022, at 09:51, Guozhang Wang wrote:
> +1, thanks Alex!
>
> On Thu, Sep 1, 2022 at 6:33 AM Bruno Cadonna  wrote:
>
>> Thanks for the KIP!
>>
>> +1 (binding)
>>
>> Best,
>> Bruno
>>
>> On 01.09.22 15:26, Colt McNealy wrote:
>> > +1
>> >
>> > Hi Alex,
>> >
>> > Thank you for your work on the KIP. I'm not a committer so my vote is
>> > non-binding but I strongly support this improvement.
>> >
>> > Thank you,
>> > Colt McNealy
>> > *Founder, LittleHorse.io*
>> >
>> >
>> > On Thu, Sep 1, 2022 at 8:20 AM Alexander Sorokoumov
>> >  wrote:
>> >
>> >> Hi All,
>> >>
>> >> I would like to start a voting thread on KIP-844, which introduces
>> >> transactional state stores to avoid wiping local state on crash failure
>> >> under EOS.
>> >>
>> >> KIP:
>> >>
>> >>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-844%3A+Transactional+State+Stores
>> >> Discussion thread:
>> >> https://lists.apache.org/thread/4vc18t0o2wsk0n235dd4pd1hlr1p6gm2
>> >> Jira: https://issues.apache.org/jira/browse/KAFKA-12549
>> >>
>> >> Best,
>> >> Alex
>> >>
>> >
>>
>
>
> -- 
> -- Guozhang


Re: [DISCUSS] KIP-848: The Next Generation of the Consumer Rebalance Protocol

2022-09-01 Thread Jun Rao
Hi, David,

Thanks for the KIP. Overall, the main benefits of the KIP seem to be fewer
RPCs during rebalance and more efficient support of wildcard. A few
comments below.

30. ConsumerGroupHeartbeatRequest
30.1 ServerAssignor is a singleton. Do we plan to support rolling changing
of the partition assignor in the consumers?
30.2 For each field, could you explain whether it's required in every
request or the scenarios when it needs to be filled? For example, it's not
clear to me when TopicPartitions needs to be filled.

31. In the current consumer protocol, the rack affinity between the client
and the broker is only considered during fetching, but not during assigning
partitions to consumers. Sometimes, once the assignment is made, there is
no opportunity for read affinity because no replicas of assigned partitions
are close to the member. I am wondering if we should use this opportunity
to address this by including rack in GroupMember.

32. On the metric side, it's often useful to know how busy a group
coordinator is. By moving to the event loop model, it seems that we could add
a metric that tracks the fraction of the time the event loop is doing the
actual work.

33. Could we add a section on coordinator failover handling? For example,
does it need to trigger the check if any group with the wildcard
subscription now has a new matching topic?

34. ConsumerGroupMetadataValue, ConsumerGroupPartitionMetadataValue,
ConsumerGroupMemberMetadataValue: Could we document what the epoch field
reflects? For example, does the epoch in ConsumerGroupMetadataValue reflect
the latest group epoch? What about the one in
ConsumerGroupPartitionMetadataValue and ConsumerGroupMemberMetadataValue?

35. "the group coordinator will ensure that the following invariants are
met: ... All members exists." It's possible for a member not to get any
assigned partitions, right?

36. "He can rejoins the group with a member epoch equals to 0": When would
a consumer rejoin and what member id would be used?

37. "Instead, power users will have the ability to trigger a reassignment
by either providing a non-zero reason or by updating the assignor
metadata." Hmm, this seems to be conflicting with the deprecation of
Consumer#enforeRebalance.

38. The reassignment examples are nice. But the section seems to have
multiple typos.
38.1 When the group transitions to epoch 2, B immediately gets into
"epoch=1, partitions=[foo-2]", which seems incorrect.
38.2 When the group transitions to epoch 3, C seems to get into epoch=3,
partitions=[foo-1] too early.
38.3 After A transitions to epoch 3, C still has A - epoch=2,
partitions=[foo-0].

39. Rolling upgrade of consumers: Do we support the upgrade from any old
version to new one?

Thanks,

Jun

On Mon, Aug 29, 2022 at 9:20 AM David Jacot 
wrote:

> Hi all,
>
> The KIP states that we will re-implement the coordinator in Java. I
> discussed this offline with a few folks and folks are concerned that
> we could introduce many regressions in the old protocol if we do so.
> Therefore, I am going to remove this statement from the KIP. It is an
> implementation detail after all so it does not have to be decided at
> this stage. We will likely start by trying to refactor the current
> implementation as a first step.
>
> Cheers,
> David
>
> On Mon, Aug 29, 2022 at 3:52 PM David Jacot  wrote:
> >
> > Hi Luke,
> >
> > > 1.1. I think the state machine are: "Empty, assigning, reconciling,
> stable,
> > > dead" mentioned in Consumer Group States section, right?
> >
> > This sentence does not refer to those group states but rather to a
> > state machine replication (SMR). This refers to the entire state of
> > group coordinator which is replicated via the log layer. I will
> > clarify this in the KIP.
> >
> > > 1.2. What do you mean "each state machine is modelled as an event
> loop"?
> >
> > The idea is to follow a model similar to the new quorum controller. We
> > will have N threads to process events. Each __consumer_offsets
> > partition is assigned to a unique thread and all the events (e.g.
> > requests, callbacks, etc.) are processed by this thread. This simplify
> > concurrency and will enable us to do simulation testing for the group
> > coordinator.
> >
> > > 1.3. Why do we need a state machine per *__consumer_offsets*
> partitions?
> > > Not a state machine "per consumer group" owned by a group coordinator?
> For
> > > example, if one group coordinator owns 2 consumer groups, and both
> exist in
> > > *__consumer_offsets-0*, will we have 1 state machine for it, or 2?
> >
> > See 1.1. The confusion comes from there, I think.
> >
> > > 1.4. I know the "*group.coordinator.threads" *is the number of threads
> used
> > > to run the state machines. But I'm wondering if the purpose of the
> threads
> > > is only to keep the state of each consumer group (or
> *__consumer_offsets*
> > > partitions?), and no heavy computation, why should we need
> multi-threads
> > > here?
> >
> > See 1.2. The idea is to have an ability t

[VOTE] Apache Kafka 3.3.0 RC1

2022-09-01 Thread José Armando García Sancio
Hello Kafka users, developers and client-developers,

This is the first candidate for the release of Apache Kafka 3.3.0.
There are some issues that we still have to resolve before we can make
a final release. Those issues are documented here:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20fixVersion%20%3D%203.3.0%20AND%20status%20not%20in%20(resolved%2C%20closed)%20ORDER%20BY%20priority%20DESC%2C%20status%20DESC%2C%20updated%20DESC%20%20%20%20%20%20

Even though there are known issues, I think there is value in testing
this release candidate.

Release notes for the 3.3.0 release:
https://home.apache.org/~jsancio/kafka-3.3.0-rc1/RELEASE_NOTES.html

Please download and test.

Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS

* Release artifacts (source and binary):
https://home.apache.org/~jsancio/kafka-3.3.0-rc1/

* Maven artifacts:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* Javadoc:
https://home.apache.org/~jsancio/kafka-3.3.0-rc1/javadoc/

* The 3.3.0 tag:
https://github.com/apache/kafka/releases/tag/3.3.0-rc1

* Documentation:
The documentation specifically for 3.3 didn't get updated. I am working on that.

* Protocol:
The protocol documentation for 3.3 didn't get updated. I am working on that.

* Successful Jenkins builds for the 3.3 branch:
Unit/integration tests:
https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.3/50/

The system tests are flaky. I am working on getting a green build.

Thanks,
-- 
-José


[jira] [Created] (KAFKA-14195) Fix KRaft AlterConfig policy usage for Legacy/Full case

2022-09-01 Thread Ron Dagostino (Jira)
Ron Dagostino created KAFKA-14195:
-

 Summary: Fix KRaft AlterConfig policy usage for Legacy/Full case
 Key: KAFKA-14195
 URL: https://issues.apache.org/jira/browse/KAFKA-14195
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.3
Reporter: Ron Dagostino
Assignee: Ron Dagostino


The fix for https://issues.apache.org/jira/browse/KAFKA-14039 adjusted the 
invocation of the alter configs policy check in KRaft to match the behavior in 
ZooKeeper, which is to only provide the configs that were explicitly sent in 
the request. While the code was correct for the incremental alter configs case, 
the code actually included the implicit deletions for the 
legacy/non-incremental alter configs case, and those implicit deletions are not 
included in the ZooKeeper-based invocation. The implicit deletions should not 
be passed in the legacy case.
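
For context, a minimal client-side sketch of the two paths (the broker id and
config values below are illustrative): the legacy alterConfigs call sends a
full replacement config, so keys omitted from the request become implicit
deletions, whereas incrementalAlterConfigs only touches the listed entries.
{code:java}
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class AlterConfigPaths {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address

        try (Admin admin = Admin.create(props)) {
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1");

            // Legacy/non-incremental path (deprecated API): the Config is a full
            // replacement, so any dynamic config key not listed here is implicitly
            // deleted on the broker.
            Config fullConfig = new Config(
                    Collections.singletonList(new ConfigEntry("log.cleaner.threads", "2")));
            admin.alterConfigs(Collections.singletonMap(broker, fullConfig)).all().get();

            // Incremental path: only the listed entries are changed, so there are
            // no implicit deletions.
            AlterConfigOp op = new AlterConfigOp(
                    new ConfigEntry("log.cleaner.threads", "2"), AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(
                    Collections.singletonMap(broker, Collections.singletonList(op)))
                    .all().get();
        }
    }
}
{code}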



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

2022-09-01 Thread Jorge Esteban Quilcate Otoya
Hi Sagar and Yash,

Thanks for your feedback!

> 1) I am assuming the new metrics would be task level metric.

1.1 Yes, it will be a task level metric, implemented on the
Worker[Source/Sink]Task.

> Could you specify the way it's done for other sink/source connector?

1.2. Not sure what you mean by this. Could you elaborate a bit more?

> 2. I am slightly confused about the e2e latency metric...

2.1. Yes, I see. I was trying to bring a similar concept as in Streams with
KIP-613, though the e2e concept may not be translatable.
We could keep it as `sink-record-latency` to avoid conflating concepts. A
similar metric naming was proposed in KIP-489 but at the consumer level —
though it seems dormant for a couple of years.

> However, the put-batch time measures the
> time to put a batch of records to external sink. So, I would assume the 2
> can't be added as is to compute the e2e latency. Maybe I am missing
> something here. Could you plz clarify this.

2.2. Yes, agree. Not necessarily added, but with the 3 latencies (poll,
convert, putBatch) it will be clearer where the bottleneck may be, and they
represent the internal processing.

> however, as per the KIP it looks like it will be
> the latency between when the record was written to Kafka and when the
> record is returned by a sink task's consumer's poll?

3.1. Agree. 2.1. could help to clarify this.
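
For illustration, a minimal sketch of the per-record measurement itself (metric
plumbing omitted; topic, group and broker address are placeholders, and this is
my shorthand rather than the KIP's final implementation): wall-clock time when
the record is seen at poll minus the record timestamp.
{code:java}
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class SinkRecordLatencySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", "latency-sketch");
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test_topic"));
            long maxLatencyMs = Long.MIN_VALUE;
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // "sink-record-latency": time between the record's timestamp
                    // (its write time, by default) and when the task sees it at poll.
                    long latencyMs = System.currentTimeMillis() - record.timestamp();
                    maxLatencyMs = Math.max(maxLatencyMs, latencyMs);
                    System.out.printf("record latency=%dms max=%dms%n", latencyMs, maxLatencyMs);
                }
            }
        }
    }
}
{code}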

> One more thing - I was wondering
> if there's a particular reason for having a min metric for e2e latency but
> not for convert time?

3.2. I was following KIP-613 for e2e, where Min seems useful to compare with
Max and get an idea of the window of results; current latencies in Connect do
not include Min, which is why I haven't added it for convert latency.
Do you think it makes sense to extend the latency metrics with Min?

KIP is updated to clarify some of these changes.

Many thanks,
Jorge.

On Thu, 1 Sept 2022 at 18:11, Yash Mayya  wrote:

> Hi Jorge,
>
> Thanks for the KIP! I have the same confusion with the e2e-latency metrics
> as Sagar above. "e2e" would seem to indicate the latency between when the
> record was written to Kafka and when the record was written to the sink
> system by the connector - however, as per the KIP it looks like it will be
> the latency between when the record was written to Kafka and when the
> record is returned by a sink task's consumer's poll? I think that metric
> will be a little confusing to interpret. One more thing - I was wondering
> if there's a particular reason for having a min metric for e2e latency but
> not for convert time?
>
> Thanks,
> Yash
>
> On Thu, Sep 1, 2022 at 8:59 PM Sagar  wrote:
>
> > Hi Jorge,
> >
> > Thanks for the KIP. It looks like a very good addition. I skimmed through
> > once and had a couple of questions =>
> >
> > 1) I am assuming the new metrics would be task level metric. Could you
> > specify the way it's done for other sink/source connector?
> > 2) I am slightly confused about the e2e latency metric. Let's consider
> the
> > sink connector metric. If I look at the way it's supposed to be
> calculated,
> > i.e the difference between the record timestamp and the wall clock time,
> it
> > looks like a per record metric. However, the put-batch time measures the
> > time to put a batch of records to external sink. So, I would assume the 2
> > can't be added as is to compute the e2e latency. Maybe I am missing
> > something here. Could you plz clarify this.
> >
> > Thanks!
> > Sagar.
> >
> > On Tue, Aug 30, 2022 at 8:43 PM Jorge Esteban Quilcate Otoya <
> > quilcate.jo...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > > I'd like to start a discussion thread on KIP-864: Add End-To-End
> Latency
> > > Metrics to Connectors.
> > > This KIP aims to improve the metrics available on Source and Sink
> > > Connectors to measure end-to-end latency, including source and sink
> > record
> > > conversion time, and sink record e2e latency (similar to KIP-613 for
> > > Streams).
> > >
> > > The KIP is here:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-864%3A+Add+End-To-End+Latency+Metrics+to+Connectors
> > >
> > > Please take a look and let me know what you think.
> > >
> > > Cheers,
> > > Jorge.
> > >
> >
>


Re: Re: [DISCUSS] KIP-821: Connect Transforms support for nested structures

2022-09-01 Thread Jorge Esteban Quilcate Otoya
Hi Chris,

Thanks for your feedback!

1. Yes, it will be context-dependent. I have added rules and scenarios to
the nested notation to cover the happy path and edge cases. In short,
backticks will not be considered part of the field name when they are wrapping
a field name: the first backtick at the beginning of the path or after a dot,
and the closing backtick before a dot or at the end of the path.
If backticks happen to be in those positions, use a backslash to escape them.
2. You're right, that's a typo. Fixing it.
3. I don't think so, I have added a scenario to clarify this.

The KIP is updated. Hopefully the rules and scenarios help to close any open
gap. Let me know if you see any case that is not considered so I can address it.

Cheers,
Jorge.

On Wed, 31 Aug 2022 at 20:02, Chris Egerton  wrote:

> Hi Robert and Jorge,
>
> I think the backtick/backslash proposal works, but I'm a little unclear on
> some of the details:
>
> 1. Are backticks only given special treatment when they immediately follow
> a non-escaped dot? E.g., "foo.b`ar.ba`z" would refer to "foo" -> "b`ar" ->
> "ba`z" instead of "foo" -> "bar.baz"? Based on the example where the name
> "a.b`.c" refers to "a" -> "b`" -> "c", it seems like this is the case, but
> I'm worried this might cause confusion since the role of the backtick and
> the need to escape it becomes context-dependent.
>
> 2. In the example table near the beginning of the KIP, the name "a.`b\`.c`"
> refers to "a" -> "b`c". What happened to the dot in the second part of the
> name? Should it refer to "a" -> "b`.c" instead?
>
> 3. Is it ever necessary to escape backslashes themselves? If so, when?
>
> Overall, I wish we could come up with a prettier/simpler approach, but the
> benefits provided by the dual backtick/dot syntax are too great to deny:
> there are no correctness issues like the ones posed with double-dot
> escaping that would lead to ambiguity, the most common cases are still very
> simple to work with, and there's no risk of interfering with JSON escape
> mechanisms (in most cases) or single-quote shell quoting (which may be
> relevant when connector configurations are defined on the command line).
> Thanks for the suggestion, Robert!
>
> Cheers,
>
> Chris
>


RE: Re: Permission to assign Apache Kafka Jiras to myself

2022-09-01 Thread Yash Mayya
Hi David,

Thanks for granting me the requested permission! Could you please also
grant me wiki (Confluence) permissions on the same username and email?

Thanks,
Yash

On 2022/08/11 14:34:28 David Jacot wrote:
> Hi Yash,
>
> You are all set.
>
> Best,
> David
>
> On Thu, Aug 11, 2022 at 3:47 PM Yash Mayya  wrote:
> >
> > Hey folks,
> >
> > I can't currently assign Apache Kafka Jiras to myself and I just
discovered
> > that someone needs to add me to the contributors list in order for me
to be
> > able to do that. Could someone please do that for me? Here are my Jira
> > account details:
> >
> > Username: yash.mayya
> > Email: yash.ma...@gmail.com
> >
> > Thanks,
> > Yash
>


Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

2022-09-01 Thread Yash Mayya
Hi Jorge,

Thanks for the KIP! I have the same confusion with the e2e-latency metrics
as Sagar above. "e2e" would seem to indicate the latency between when the
record was written to Kafka and when the record was written to the sink
system by the connector - however, as per the KIP it looks like it will be
the latency between when the record was written to Kafka and when the
record is returned by a sink task's consumer's poll? I think that metric
will be a little confusing to interpret. One more thing - I was wondering
if there's a particular reason for having a min metric for e2e latency but
not for convert time?

Thanks,
Yash

On Thu, Sep 1, 2022 at 8:59 PM Sagar  wrote:

> Hi Jorge,
>
> Thanks for the KIP. It looks like a very good addition. I skimmed through
> once and had a couple of questions =>
>
> 1) I am assuming the new metrics would be task level metric. Could you
> specify the way it's done for other sink/source connector?
> 2) I am slightly confused about the e2e latency metric. Let's consider the
> sink connector metric. If I look at the way it's supposed to be calculated,
> i.e the difference between the record timestamp and the wall clock time, it
> looks like a per record metric. However, the put-batch time measures the
> time to put a batch of records to external sink. So, I would assume the 2
> can't be added as is to compute the e2e latency. Maybe I am missing
> something here. Could you plz clarify this.
>
> Thanks!
> Sagar.
>
> On Tue, Aug 30, 2022 at 8:43 PM Jorge Esteban Quilcate Otoya <
> quilcate.jo...@gmail.com> wrote:
>
> > Hi all,
> >
> > I'd like to start a discussion thread on KIP-864: Add End-To-End Latency
> > Metrics to Connectors.
> > This KIP aims to improve the metrics available on Source and Sink
> > Connectors to measure end-to-end latency, including source and sink
> record
> > conversion time, and sink record e2e latency (similar to KIP-613 for
> > Streams).
> >
> > The KIP is here:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-864%3A+Add+End-To-End+Latency+Metrics+to+Connectors
> >
> > Please take a look and let me know what you think.
> >
> > Cheers,
> > Jorge.
> >
>


[jira] [Created] (KAFKA-14194) NPE in Cluster.nodeIfOnline

2022-09-01 Thread Andrew Dean (Jira)
Andrew Dean created KAFKA-14194:
---

 Summary: NPE in Cluster.nodeIfOnline
 Key: KAFKA-14194
 URL: https://issues.apache.org/jira/browse/KAFKA-14194
 Project: Kafka
  Issue Type: Bug
Reporter: Andrew Dean


When utilizing rack-aware Kafka consumers and the Kafka broker cluster is 
restarted, an NPE can occur during transient metadata updates.
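
For context, rack-aware fetching (KIP-392) is enabled on the consumer via the
client.rack config. A minimal sketch (broker address, topic, and rack id are
placeholders):
{code:java}
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class RackAwareConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "rack-aware-example");
        // Should match broker.rack of the brokers closest to this consumer.
        props.put(ConsumerConfig.CLIENT_RACK_CONFIG, "us-east-1a");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("some-topic"));
            // Fetches are served from the closest replica when possible; the NPE
            // reported here was hit while broker metadata was changing.
            consumer.poll(Duration.ofSeconds(1));
        }
    }
}
{code}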



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #1193

2022-09-01 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 422658 lines...]
[2022-09-01T15:35:53.146Z] KStreamAggregationIntegrationTest > 
shouldReduce(TestInfo) STARTED
[2022-09-01T15:35:57.137Z] 
[2022-09-01T15:35:57.137Z] KStreamAggregationIntegrationTest > 
shouldReduce(TestInfo) PASSED
[2022-09-01T15:35:57.137Z] 
[2022-09-01T15:35:57.137Z] KStreamAggregationIntegrationTest > 
shouldAggregate(TestInfo) STARTED
[2022-09-01T15:36:00.006Z] 
[2022-09-01T15:36:00.006Z] KStreamAggregationIntegrationTest > 
shouldAggregate(TestInfo) PASSED
[2022-09-01T15:36:00.006Z] 
[2022-09-01T15:36:00.006Z] KStreamAggregationIntegrationTest > 
shouldCount(TestInfo) STARTED
[2022-09-01T15:36:04.731Z] 
[2022-09-01T15:36:04.731Z] KStreamAggregationIntegrationTest > 
shouldCount(TestInfo) PASSED
[2022-09-01T15:36:04.731Z] 
[2022-09-01T15:36:04.731Z] KStreamAggregationIntegrationTest > 
shouldGroupByKey(TestInfo) STARTED
[2022-09-01T15:36:08.558Z] 
[2022-09-01T15:36:08.558Z] KStreamAggregationIntegrationTest > 
shouldGroupByKey(TestInfo) PASSED
[2022-09-01T15:36:08.558Z] 
[2022-09-01T15:36:08.558Z] KStreamAggregationIntegrationTest > 
shouldCountWithInternalStore(TestInfo) STARTED
[2022-09-01T15:36:12.433Z] 
[2022-09-01T15:36:12.433Z] KStreamAggregationIntegrationTest > 
shouldCountWithInternalStore(TestInfo) PASSED
[2022-09-01T15:36:12.433Z] 
[2022-09-01T15:36:12.433Z] KStreamAggregationIntegrationTest > 
shouldCountUnlimitedWindows() STARTED
[2022-09-01T15:36:14.373Z] 
[2022-09-01T15:36:14.373Z] KStreamAggregationIntegrationTest > 
shouldCountUnlimitedWindows() PASSED
[2022-09-01T15:36:14.373Z] 
[2022-09-01T15:36:14.373Z] KStreamAggregationIntegrationTest > 
shouldReduceWindowed(TestInfo) STARTED
[2022-09-01T15:36:17.926Z] 
[2022-09-01T15:36:17.926Z] KStreamAggregationIntegrationTest > 
shouldReduceWindowed(TestInfo) PASSED
[2022-09-01T15:36:17.926Z] 
[2022-09-01T15:36:17.926Z] KStreamAggregationIntegrationTest > 
shouldCountSessionWindows() STARTED
[2022-09-01T15:36:18.855Z] 
[2022-09-01T15:36:18.855Z] KStreamAggregationIntegrationTest > 
shouldCountSessionWindows() PASSED
[2022-09-01T15:36:18.855Z] 
[2022-09-01T15:36:18.855Z] KStreamAggregationIntegrationTest > 
shouldAggregateWindowed(TestInfo) STARTED
[2022-09-01T15:36:23.523Z] 
[2022-09-01T15:36:23.523Z] KStreamAggregationIntegrationTest > 
shouldAggregateWindowed(TestInfo) PASSED
[2022-09-01T15:36:25.598Z] 
[2022-09-01T15:36:25.599Z] FineGrainedAutoResetIntegrationTest > 
shouldOnlyReadRecordsWhereEarliestSpecifiedWithNoCommittedOffsetsWithDefaultGlobalAutoOffsetResetEarliest()
 PASSED
[2022-09-01T15:36:25.599Z] 
[2022-09-01T15:36:25.599Z] FineGrainedAutoResetIntegrationTest > 
shouldThrowStreamsExceptionNoResetSpecified() STARTED
[2022-09-01T15:36:25.599Z] 
[2022-09-01T15:36:25.599Z] FineGrainedAutoResetIntegrationTest > 
shouldThrowStreamsExceptionNoResetSpecified() PASSED
[2022-09-01T15:36:25.599Z] 
[2022-09-01T15:36:25.599Z] GlobalKTableIntegrationTest > 
shouldGetToRunningWithOnlyGlobalTopology() STARTED
[2022-09-01T15:36:25.599Z] 
[2022-09-01T15:36:25.599Z] GlobalKTableIntegrationTest > 
shouldGetToRunningWithOnlyGlobalTopology() PASSED
[2022-09-01T15:36:25.599Z] 
[2022-09-01T15:36:25.599Z] GlobalKTableIntegrationTest > 
shouldKStreamGlobalKTableLeftJoin() STARTED
[2022-09-01T15:36:28.084Z] 
[2022-09-01T15:36:28.084Z] GlobalKTableIntegrationTest > 
shouldKStreamGlobalKTableLeftJoin() PASSED
[2022-09-01T15:36:28.084Z] 
[2022-09-01T15:36:28.084Z] GlobalKTableIntegrationTest > 
shouldRestoreGlobalInMemoryKTableOnRestart() STARTED
[2022-09-01T15:36:29.181Z] 
[2022-09-01T15:36:29.181Z] GlobalKTableIntegrationTest > 
shouldRestoreGlobalInMemoryKTableOnRestart() PASSED
[2022-09-01T15:36:29.181Z] 
[2022-09-01T15:36:29.181Z] GlobalKTableIntegrationTest > 
shouldKStreamGlobalKTableJoin() STARTED
[2022-09-01T15:36:30.651Z] streams-3: SMOKE-TEST-CLIENT-CLOSED
[2022-09-01T15:36:30.651Z] streams-5: SMOKE-TEST-CLIENT-CLOSED
[2022-09-01T15:36:30.651Z] streams-1: SMOKE-TEST-CLIENT-CLOSED
[2022-09-01T15:36:31.159Z] 
[2022-09-01T15:36:31.159Z] GlobalKTableIntegrationTest > 
shouldKStreamGlobalKTableJoin() PASSED
[2022-09-01T15:36:31.668Z] streams-4: SMOKE-TEST-CLIENT-CLOSED
[2022-09-01T15:36:31.668Z] streams-0: SMOKE-TEST-CLIENT-CLOSED
[2022-09-01T15:36:31.668Z] streams-6: SMOKE-TEST-CLIENT-CLOSED
[2022-09-01T15:36:31.668Z] streams-2: SMOKE-TEST-CLIENT-CLOSED
[2022-09-01T15:36:33.208Z] 
[2022-09-01T15:36:33.208Z] GlobalThreadShutDownOrderTest > 
shouldFinishGlobalStoreOperationOnShutDown() STARTED
[2022-09-01T15:36:38.347Z] 
[2022-09-01T15:36:38.347Z] FAILURE: Build failed with an exception.
[2022-09-01T15:36:38.347Z] 
[2022-09-01T15:36:38.347Z] * What went wrong:
[2022-09-01T15:36:38.347Z] Execution failed for task ':core:unitTest'.
[2022-09-01T15:36:38.347Z] > Process 'Gradle Test Executor 131' finished with 
non-zero exit value 1
[2022-09-01T15:36:38.347Z]   This problem might be ca

Re: [DISCUSS] KIP-864: Add End-To-End Latency Metrics to Connectors

2022-09-01 Thread Sagar
Hi Jorge,

Thanks for the KIP. It looks like a very good addition. I skimmed through
once and had a couple of questions =>

1) I am assuming the new metrics would be task level metric. Could you
specify the way it's done for other sink/source connector?
2) I am slightly confused about the e2e latency metric. Let's consider the
sink connector metric. If I look at the way it's supposed to be calculated,
i.e the difference between the record timestamp and the wall clock time, it
looks like a per record metric. However, the put-batch time measures the
time to put a batch of records to external sink. So, I would assume the 2
can't be added as is to compute the e2e latency. Maybe I am missing
something here. Could you plz clarify this.

Thanks!
Sagar.

On Tue, Aug 30, 2022 at 8:43 PM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Hi all,
>
> I'd like to start a discussion thread on KIP-864: Add End-To-End Latency
> Metrics to Connectors.
> This KIP aims to improve the metrics available on Source and Sink
> Connectors to measure end-to-end latency, including source and sink record
> conversion time, and sink record e2e latency (similar to KIP-613 for
> Streams).
>
> The KIP is here:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-864%3A+Add+End-To-End+Latency+Metrics+to+Connectors
>
> Please take a look and let me know what you think.
>
> Cheers,
> Jorge.
>


RE: Re: Re: KIP 678: New Kafka Connect SMT for plainText => Struct(or Map) with Regex

2022-09-01 Thread Chris Egerton
Hi whsoul,

No further comments from me, and sorry for the delay. This looks good! Feel
free to open a vote thread and I can kick things off with a +1.

Cheers,

Chris


Re: [VOTE] KIP-844: Transactional State Stores

2022-09-01 Thread Guozhang Wang
+1, thanks Alex!

On Thu, Sep 1, 2022 at 6:33 AM Bruno Cadonna  wrote:

> Thanks for the KIP!
>
> +1 (binding)
>
> Best,
> Bruno
>
> On 01.09.22 15:26, Colt McNealy wrote:
> > +1
> >
> > Hi Alex,
> >
> > Thank you for your work on the KIP. I'm not a committer so my vote is
> > non-binding but I strongly support this improvement.
> >
> > Thank you,
> > Colt McNealy
> > *Founder, LittleHorse.io*
> >
> >
> > On Thu, Sep 1, 2022 at 8:20 AM Alexander Sorokoumov
> >  wrote:
> >
> >> Hi All,
> >>
> >> I would like to start a voting thread on KIP-844, which introduces
> >> transactional state stores to avoid wiping local state on crash failure
> >> under EOS.
> >>
> >> KIP:
> >>
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-844%3A+Transactional+State+Stores
> >> Discussion thread:
> >> https://lists.apache.org/thread/4vc18t0o2wsk0n235dd4pd1hlr1p6gm2
> >> Jira: https://issues.apache.org/jira/browse/KAFKA-12549
> >>
> >> Best,
> >> Alex
> >>
> >
>


-- 
-- Guozhang


[GitHub] [kafka-site] sadatrafsan commented on pull request #431: Brain Station 23 adopted Kafka

2022-09-01 Thread GitBox


sadatrafsan commented on PR #431:
URL: https://github.com/apache/kafka-site/pull/431#issuecomment-1234341677

   solved


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka-site] sadatrafsan closed pull request #431: Brain Station 23 adopted Kafka

2022-09-01 Thread GitBox


sadatrafsan closed pull request #431: Brain Station 23 adopted Kafka
URL: https://github.com/apache/kafka-site/pull/431


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [VOTE] KIP-844: Transactional State Stores

2022-09-01 Thread Bruno Cadonna

Thanks for the KIP!

+1 (binding)

Best,
Bruno

On 01.09.22 15:26, Colt McNealy wrote:

+1

Hi Alex,

Thank you for your work on the KIP. I'm not a committer so my vote is
non-binding but I strongly support this improvement.

Thank you,
Colt McNealy
*Founder, LittleHorse.io*


On Thu, Sep 1, 2022 at 8:20 AM Alexander Sorokoumov
 wrote:


Hi All,

I would like to start a voting thread on KIP-844, which introduces
transactional state stores to avoid wiping local state on crash failure
under EOS.

KIP:

https://cwiki.apache.org/confluence/display/KAFKA/KIP-844%3A+Transactional+State+Stores
Discussion thread:
https://lists.apache.org/thread/4vc18t0o2wsk0n235dd4pd1hlr1p6gm2
Jira: https://issues.apache.org/jira/browse/KAFKA-12549

Best,
Alex





Re: [VOTE] KIP-844: Transactional State Stores

2022-09-01 Thread Colt McNealy
+1

Hi Alex,

Thank you for your work on the KIP. I'm not a committer so my vote is
non-binding but I strongly support this improvement.

Thank you,
Colt McNealy
*Founder, LittleHorse.io*


On Thu, Sep 1, 2022 at 8:20 AM Alexander Sorokoumov
 wrote:

> Hi All,
>
> I would like to start a voting thread on KIP-844, which introduces
> transactional state stores to avoid wiping local state on crash failure
> under EOS.
>
> KIP:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-844%3A+Transactional+State+Stores
> Discussion thread:
> https://lists.apache.org/thread/4vc18t0o2wsk0n235dd4pd1hlr1p6gm2
> Jira: https://issues.apache.org/jira/browse/KAFKA-12549
>
> Best,
> Alex
>


[VOTE] KIP-844: Transactional State Stores

2022-09-01 Thread Alexander Sorokoumov
Hi All,

I would like to start a voting thread on KIP-844, which introduces
transactional state stores to avoid wiping local state on crash failure
under EOS.

KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-844%3A+Transactional+State+Stores
Discussion thread:
https://lists.apache.org/thread/4vc18t0o2wsk0n235dd4pd1hlr1p6gm2
Jira: https://issues.apache.org/jira/browse/KAFKA-12549

Best,
Alex


RE: ARM/PowerPC builds

2022-09-01 Thread Amit Ramswaroop Baheti
Hi Colin,

My concern was to resolve the build failures. I think we should be fine running 
the builds nightly on PowerPC.

I am wondering how to enable that. Do we need to create another Jenkinsfile for 
nightly build?


-Amit Baheti

From: Colin McCabe 
Sent: Tuesday, August 30, 2022 1:44 AM
To: dev@kafka.apache.org 
Subject: [EXTERNAL] Re: ARM/PowerPC builds

Hi Amit,

I don't see why we need to run a PowerPC build on every test run. It seems 
excessive, given that most Java code should work on PowerPC without any 
changes. Why not just run it nightly?

best,
Colin


On Mon, Aug 22, 2022, at 08:05, Amit Ramswaroop Baheti wrote:
> I am planning to raise a PR for re-enabling PowerPC on the pipeline.
>
> I will be monitoring PowerPC for failures. I hope this resolves the
> concern about build failures on PowerPC. Let me know otherwise.
>
> Thanks,
> Amit Baheti
>
> -Original Message-
> From: Amit Ramswaroop Baheti 
> Sent: 10 August 2022 20:09
> To: dev@kafka.apache.org
> Subject: [EXTERNAL] RE: ARM/PowerPC builds
>
> I looked at the PR failures on PowerPC jenkins node & the issue was due
> to few stuck gradle daemons. I could find that using "./gradlew
> --status".
> After I cleaned them up using "./gradlew --stop", I was able to build &
> test existing PR's in the jenkins workspace on that node.
>
> I also pulled another PR #12488 manually & could execute build and
> tests successfully on that node. Further the build and test is
> executing within minutes. So no problem on the performance front.
>
> I am wondering if Apache infra team has any automation to clear stuck
> gradle daemons. If not, perhaps we may look at one using cron job and
> make such infra issues selfheal.
>
> I think it should be fine now to re-enable the PowerPC node in the pipeline.
>
> Thanks,
> Amit Baheti
>
> -Original Message-
> From: Amit Ramswaroop Baheti 
> Sent: 05 August 2022 17:52
> To: dev@kafka.apache.org
> Subject: [EXTERNAL] RE: ARM/PowerPC builds
>
> Hi,
>
> I am looking into the failures on PowerPC.
>
> I will share more details once I have something concrete & hopefully we
> would be able to enable it again soon.
>
> -Amit Baheti
>
> -Original Message-
> From: Colin McCabe 
> Sent: 04 August 2022 22:39
> To: dev@kafka.apache.org
> Subject: [EXTERNAL] Re: ARM/PowerPC builds
>
> Hi Matthew,
>
> Can you open a JIRA for the test failures you have seen on M1?
>
> By the way, I have an M1 myself.
>
> best,
> Colin
>
> On Thu, Aug 4, 2022, at 04:12, Matthew Benedict de Detrich wrote:
>> Quite happy to have this change gone through since the ARM builds were
>> constantly failing however I iterate what Divij Vaidya is saying. I
>> just recently got a new MacBook M1 laptop that has ARM architecture
>> and even locally the tests fail (these are the same tests that also
>> failed in Jenkins).
>>
>> Should get to the root of the issue especially as more people will get
>> newer Apple laptops over time.
>>
>> --
>> Matthew de Detrich
>> Aiven Deutschland GmbH
>> Immanuelkirchstraße 26, 10405 Berlin
>> Amtsgericht Charlottenburg, HRB 209739 B
>>
>> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
>> m: +491603708037
>> w: aiven.io e: matthew.dedetr...@aiven.io On 4. Aug 2022, 12:36 +0200,
>> Divij Vaidya , wrote:
>>> Thank you. This would greatly improve the PR experience since now,
>>> there is higher probability for it to be green.
>>>
>>> Side question though, do we know why ARM tests are timing out? Should
>>> we start a JIRA with Apache Infra to root cause?
>>>
>>> —
>>> Divij Vaidya
>>>
>>>
>>>
>>> On Thu, Aug 4, 2022 at 12:42 AM Colin McCabe  wrote:
>>>
>>> > Just a quick note. Today we committed
>>> > https://github.com/apache/kafka/pull/12380,
>>> > "MINOR: Remove ARM/PowerPC builds from Jenkinsfile #12380". This PR
>>> > removes the ARM and PowerPC builds from the Jenkinsfile.
>>> >
>>> > The rationale is that these builds seem to be failing all the time,
>>> > and this is very disruptive. I personally didn't see any successes
>>> > in the last week or two. So I think we need to rethink this integration a 
>>> > bit.
>>> >
>>> > I'd suggest that we run these builds as nightly builds rather than
>>> > on each commit. It's going to be rare that we make a change that
>>> > succeeds on x86 but breaks on PowerPC or ARM. This would let us
>>> > have very long timeouts on our ARM and PowerPC builds (they could
>>> > take all night if necessary), hence avoiding this issue.
>>> >
>>> > best,
>>> > Colin
>>> >
>>> --
>>> Divij Vaidya


Re: [DISCUSS] KIP-844: Transactional State Stores

2022-09-01 Thread Alexander Sorokoumov
Hey Guozhang,

Sounds good. I annotated all added StateStore methods (commit, recover,
transactional) with @Evolving.
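
For illustration, a rough sketch of those additions (the signatures below are
my shorthand here, not the KIP's exact wording):
{code:java}
// Shorthand sketch of the added methods; not the KIP's exact signatures.
import org.apache.kafka.common.annotation.InterfaceStability;

public interface TransactionalStateStoreSketch {

    /** Whether this store buffers uncommitted writes and can roll them back. */
    @InterfaceStability.Evolving
    boolean transactional();

    /** Atomically flush buffered writes; called by the task after a successful commit. */
    @InterfaceStability.Evolving
    void commit(Long changelogOffset);

    /**
     * Bring the store back to the last committed state after a restart, discarding
     * uncommitted writes. Returns true if the store is now in a clean,
     * committed-only state so the checkpoint can be trusted.
     */
    @InterfaceStability.Evolving
    boolean recover(Long changelogOffset);
}
{code}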

Best,
Alex



On Wed, Aug 31, 2022 at 7:32 PM Guozhang Wang  wrote:

> Hello Alex,
>
> Thanks for the detailed replies, I think that makes sense, and in the long
> run we would need some public indicators from StateStore to determine if
> checkpoints can really be used to indicate clean snapshots.
>
> As for the @Evolving label, I think we can still keep it but for a
> different reason, since as we add more state management functionalities in
> the near future we may need to revisit the public APIs again and hence
> keeping it as @Evolving would allow us to modify if necessary, in an easier
> path than deprecate -> delete after several minor releases.
>
> Besides that, I have no further comments about the KIP.
>
>
> Guozhang
>
> On Fri, Aug 26, 2022 at 1:51 AM Alexander Sorokoumov
>  wrote:
>
> > Hey Guozhang,
> >
> >
> > I think that we will have to keep StateStore#transactional() because
> > post-commit checkpointing of non-txn state stores will break the
> guarantees
> > we want in ProcessorStateManager#initializeStoreOffsetsFromCheckpoint for
> > correct recovery. Let's consider checkpoint-recovery behavior under EOS
> > that we want to support:
> >
> > 1. Non-txn state stores should checkpoint on graceful shutdown and
> restore
> > from that checkpoint.
> >
> > 2. Non-txn state stores should delete local data during recovery after a
> > crash failure.
> >
> > 3. Txn state stores should checkpoint on commit and on graceful shutdown.
> > These stores should roll back uncommitted changes instead of deleting all
> > local data.
> >
> >
> > #1 and #2 are already supported; this proposal adds #3. Essentially, we
> > have two parties at play here - the post-commit checkpointing in
> > StreamTask#postCommit and recovery in ProcessorStateManager#
> > initializeStoreOffsetsFromCheckpoint. Together, these methods must allow
> > all three workflows and prevent invalid behavior, e.g., non-txn stores
> > should not checkpoint post-commit to avoid keeping uncommitted data on
> > recovery.
> >
> >
> > In the current state of the prototype, we checkpoint only txn state
> stores
> > post-commit under EOS using StateStore#transactional(). If we remove
> > StateStore#transactional() and always checkpoint post-commit,
> > ProcessorStateManager#initializeStoreOffsetsFromCheckpoint will have to
> > determine whether to delete local data. Non-txn implementation of
> > StateStore#recover can't detect if it has uncommitted writes. Since its
> > default implementation must always return either true or false, signaling
> > whether it is restored into a valid committed-only state. If
> > StateStore#recover always returns true, we preserve uncommitted writes
> and
> > violate correctness. Otherwise, ProcessorStateManager#
> > initializeStoreOffsetsFromCheckpoint would always delete local data even
> > after
> > a graceful shutdown.
> >
> >
> > With StateStore#transactional we avoid checkpointing non-txn state stores
> > and prevent that problem during recovery.
> >
> >
> > Best,
> >
> > Alex
> >
> > On Fri, Aug 19, 2022 at 1:05 AM Guozhang Wang 
> wrote:
> >
> > > Hello Alex,
> > >
> > > Thanks for the replies!
> > >
> > > > As long as we allow custom user implementations of that interface, we
> > > should
> > > probably either keep that flag to distinguish between transactional and
> > > non-transactional implementations or change the contract behind the
> > > interface. What do you think?
> > >
> > > Regarding this question, I thought that in the long run, we may always
> > > write checkpoints regardless of txn v.s. non-txn stores, in which case
> we
> > > would not need that `StateStore#transactional()`. But for now in order
> > for
> > > backward compatibility edge cases we still need to distinguish on
> whether
> > > or not to write checkpoints. Maybe I was mis-reading its purposes? If
> > yes,
> > > please let me know.
> > >
> > >
> > > On Mon, Aug 15, 2022 at 7:56 AM Alexander Sorokoumov
> > >  wrote:
> > >
> > > > Hey Guozhang,
> > > >
> > > > Thank you for elaborating! I like your idea to introduce a
> > StreamsConfig
> > > > specifically for the default store APIs. You mentioned Materialized,
> > but
> > > I
> > > > think changes in StreamJoined follow the same logic.
> > > >
> > > > I updated the KIP and the prototype according to your suggestions:
> > > > * Add a new StoreType and a StreamsConfig for transactional RocksDB.
> > > > * Decide whether Materialized/StreamJoined are transactional based on
> > the
> > > > configured StoreType.
> > > > * Move RocksDBTransactionalMechanism to
> > > > org.apache.kafka.streams.state.internals to remove it from the
> proposal
> > > > scope.
> > > > * Add a flag in new Stores methods to configure a state store as
> > > > transactional. Transactional state stores use the default
> transactional
> > > > mechanism.
> > > > * The changes above allowed to remove all cha

Re: Pricing plan

2022-09-01 Thread Robin Moffatt
You can contact Confluent here: https://confluent.io/contact


-- 

Robin Moffatt | Principal Developer Advocate | ro...@confluent.io | @rmoff


On Thu, 1 Sept 2022 at 09:44, Uzair Ahmed Mughal 
wrote:

> can you please then provide the pricing plan of Confluent.
> Regards Uzair.
>
> On Thu, Sep 1, 2022 at 1:40 PM Robin Moffatt 
> wrote:
>
> > Apache Kafka is licensed under Apache 2.0 and free to use.
> >
> > There are a variety of companies that will sell you a self-hosted
> platform
> > built on Kafka, or a Cloud-hosted version of Kafka.
> > These include Confluent (disclaimer: I work for them), Red Hat, AWS,
> Aiven,
> > Instaclustr, Cloudera, and more.
> >
> >
> > --
> >
> > Robin Moffatt | Principal Developer Advocate | ro...@confluent.io |
> @rmoff
> >
> >
> > On Thu, 1 Sept 2022 at 08:47, Uzair Ahmed Mughal 
> > wrote:
> >
> > > Hello, we are looking for a pricing plan for Kafka in detail, can you
> > > please help us out?
> > > Thanks and regards
> > > Uzair Mughal
> > > IVACY
> > >
> >
>


Re: [DISCUSS] KIP-858: Handle JBOD broker disk failure in KRaft

2022-09-01 Thread deng ziming
Hi Igor,

I think this KIP can solve the current problems, but I have some concerns 
relating to the migration section.

Since we have bumped the broker RPC version and the metadata record version, 
there will be some problems between brokers/controllers of different versions. 
In ZK mode we use the IBP as a flag to help solve this; in KRaft mode we use a 
feature flag (metadata.version) to decide whether to use the new RPC/metadata 
or not.

Assuming that we are upgrading from 3.3 to 3.4: at first the finalized 
metadata.version is 3.3, so brokers will use version 1 of 
`BrokerRegistrationRequest`, which contains no `OfflineLogDirectories`. Once 
the finalized metadata.version becomes 3.4, brokers will no longer send 
`BrokerRegistrationRequest` unless we restart them, so controllers can't become 
aware of each broker's `OfflineLogDirectories`. We should therefore reconsider 
Jason's suggestion to use `BrokerHeartbeatRequest` to communicate 
`OfflineLogDirectories`.

Of course this problem can be solved through a double roll (restarting brokers 
twice when upgrading), but we should try to avoid that since we now have the 
feature flag.

One solution is to include `OfflineLogDirectories` in both the 
`BrokerRegistrationRequest` and `BrokerHeartbeatRequest` requests: when 
upgrading from 3.3 to 3.4, `BrokerRegistrationRequest.OfflineLogDirectories` 
would be empty, whereas when upgrading from 3.4 to 3.5 it would not be. Maybe 
we can then also remove the `LogDirectoriesOfflineRequest` you proposed in this 
KIP.

--
Best,
Ziming


> On Aug 18, 2022, at 11:24 PM, Igor Soarez  wrote:
> 
> Hi Ziming,
> 
> I'm sorry it took me a while to reply.
> 
> Thank you for having a look at this KIP and providing feedback.
> 
>> 1. We have a version field in meta.properties, currently it’s 1, and we can
>> set it to 2 in this KIP, and we can give an example of server.properties and
>> it’s corresponding meta.properties generated by the storage command tool.
> 
>> 2. When using storage format tool we should specify cluster-id, it will be an
>> arduous work if we should manually specify directory-ids for all log dirs,
>> I think you can make it more clear about this change that the directory-ids
>> are generated automatically.
> 
> Thank you for these suggestions. I've updated the KIP to:
> 
> * include an example
> * clarify that the version field would be changed to 2
> * clarify that the log directory UUIDs are automatically generated
> 
>> 3. When controller place a replica, it will select a broker as well as a log
>> directory, the latter is currently accomplished in the broker side, so this
>> will be a big change?
> 
> I think there can be benefits, as Jason described previously, if we change how
> log directories are assigned as follow-up work.
> 
> From a codebase surface area perspective, it is definitely a big change
> because there are many models, types and interfaces that assume replicas are
> identified solely by a broker id.
> That will have to change to include the directory UUID, many lines of code 
> will
> be affected.
> 
> But in terms of behavior it shouldn't be a big change at all. Brokers 
> currently
> select the log directory with the least logs in it. This isn't a very nice
> policy, as logs can have wildly different sizes, and log directories can have
> different capacities. But it is a policy that the controller can keep.
> 
> If we decide to extend the selection policy and keep it in the broker,
> the broker will continue to be able to override the controller's selection
> of log directory, using the `AssignReplicasToDirectories` RPC.
> 
>> 4. When handling log directory failures we will update Leader and ISR using
>> the existing replica state machine, what is this state machine referring to,
>> do you mean the AlterPartition RPC?
> 
> No, I mean that it will use the same rules and mechanism
> (`PartitionChangeBuilder`) that is used when a broker is fenced, shutdown or
> unregistered.
> 
> I think maybe the term "replica state machine" isn't the very appropriate.
> I've updated the KIP to rephrase this section.
> 
> Thanks,
> 
> --
> Igor



Re: Pricing plan

2022-09-01 Thread Uzair Ahmed Mughal
so we can have an estimate.

On Thu, Sep 1, 2022 at 1:43 PM Uzair Ahmed Mughal 
wrote:

> can you please then provide the pricing plan of Confluent.
> Regards Uzair.
>
> On Thu, Sep 1, 2022 at 1:40 PM Robin Moffatt 
> wrote:
>
>> Apache Kafka is licensed under Apache 2.0 and free to use.
>>
>> There are a variety of companies that will sell you a self-hosted platform
>> built on Kafka, or a Cloud-hosted version of Kafka.
>> These include Confluent (disclaimer: I work for them), Red Hat, AWS,
>> Aiven,
>> Instaclustr, Cloudera, and more.
>>
>>
>> --
>>
>> Robin Moffatt | Principal Developer Advocate | ro...@confluent.io |
>> @rmoff
>>
>>
>> On Thu, 1 Sept 2022 at 08:47, Uzair Ahmed Mughal 
>> wrote:
>>
>> > Hello, we are looking for a pricing plan for Kafka in detail, can you
>> > please help us out?
>> > Thanks and regards
>> > Uzair Mughal
>> > IVACY
>> >
>>
>


Re: Pricing plan

2022-09-01 Thread Uzair Ahmed Mughal
can you please then provide the pricing plan of Confluent.
Regards Uzair.

On Thu, Sep 1, 2022 at 1:40 PM Robin Moffatt 
wrote:

> Apache Kafka is licensed under Apache 2.0 and free to use.
>
> There are a variety of companies that will sell you a self-hosted platform
> built on Kafka, or a Cloud-hosted version of Kafka.
> These include Confluent (disclaimer: I work for them), Red Hat, AWS, Aiven,
> Instaclustr, Cloudera, and more.
>
>
> --
>
> Robin Moffatt | Principal Developer Advocate | ro...@confluent.io | @rmoff
>
>
> On Thu, 1 Sept 2022 at 08:47, Uzair Ahmed Mughal 
> wrote:
>
> > Hello, we are looking for a pricing plan for Kafka in detail, can you
> > please help us out?
> > Thanks and regards
> > Uzair Mughal
> > IVACY
> >
>


Re: Pricing plan

2022-09-01 Thread Robin Moffatt
Apache Kafka is licensed under Apache 2.0 and free to use.

There are a variety of companies that will sell you a self-hosted platform
built on Kafka, or a Cloud-hosted version of Kafka.
These include Confluent (disclaimer: I work for them), Red Hat, AWS, Aiven,
Instaclustr, Cloudera, and more.


-- 

Robin Moffatt | Principal Developer Advocate | ro...@confluent.io | @rmoff


On Thu, 1 Sept 2022 at 08:47, Uzair Ahmed Mughal 
wrote:

> Hello, we are looking for a pricing plan for Kafka in detail, can you
> please help us out?
> Thanks and regards
> Uzair Mughal
> IVACY
>


Pricing plan

2022-09-01 Thread Uzair Ahmed Mughal
Hello, we are looking for a pricing plan for Kafka in detail, can you
please help us out?
Thanks and regards
Uzair Mughal
IVACY