[jira] [Commented] (KAFKA-15190) Allow configuring a streams process ID

2023-07-18 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17744288#comment-17744288
 ] 

Matthias J. Sax commented on KAFKA-15190:
-

One more thing: the `process.id` is actually only used as part of the 
`client.id` if no `client.id` config is set. – Hence, setting the `client.id` 
should avoid the issue of rebalancing (and task shuffling)?
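For illustration, a minimal sketch of pinning the `client.id` (the application id, bootstrap servers, and the HOSTNAME-based suffix are placeholder assumptions, e.g. a StatefulSet pod hostname):

{code:java}
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");            // placeholder
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
// Pin client.id to a stable per-instance value so it survives container
// restarts; Streams then derives its internal client ids from it instead
// of embedding the randomly generated process id.
props.put(StreamsConfig.CLIENT_ID_CONFIG, "my-app-" + System.getenv("HOSTNAME"));
{code}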

> Allow configuring a streams process ID
> --
>
> Key: KAFKA-15190
> URL: https://issues.apache.org/jira/browse/KAFKA-15190
> Project: Kafka
>  Issue Type: Wish
>  Components: streams
>Reporter: Joe Wreschnig
>Priority: Major
>  Labels: needs-kip
>
> We run our Kafka Streams applications in containers with no persistent 
> storage, and therefore the mitigation from KAFKA-10716 of persisting the 
> process ID in the state directory does not help us avoid shuffling lots of 
> tasks during restarts.
> However, we do have a persistent container ID (from a Kubernetes 
> StatefulSet). Would it be possible to expose a configuration option to let us 
> set the streams process ID ourselves?
> We are already using this ID as our group.instance.id - would it make sense 
> to have the process ID be automatically derived from this (plus 
> application/client IDs) if it's set? The two IDs seem to have overlapping 
> goals of identifying "this consumer" across restarts.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-13295) Long restoration times for new tasks can lead to transaction timeouts

2023-07-12 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-13295.
-
Resolution: Fixed

With the new restore-thread, this issue should be resolved implicitly.

> Long restoration times for new tasks can lead to transaction timeouts
> -
>
> Key: KAFKA-13295
> URL: https://issues.apache.org/jira/browse/KAFKA-13295
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: A. Sophie Blee-Goldman
>Assignee: Sagar Rao
>Priority: Critical
>  Labels: eos, new-streams-runtime-should-fix
>
> In some EOS applications with relatively long restoration times we've noticed 
> a series of ProducerFencedExceptions occurring during/immediately after 
> restoration. The broker logs were able to confirm these were due to 
> transactions timing out.
> In Streams, it turns out we automatically begin a new txn when calling 
> {{send}} (if there isn’t already one in flight). A {{send}} occurs often 
> outside a commit during active processing (eg writing to the changelog), 
> leaving the txn open until the next commit. And if a StreamThread has been 
> actively processing when a rebalance results in a new stateful task without 
> revoking any existing tasks, the thread won’t actually commit this open txn 
> before it goes back into the restoration phase while it builds up state for 
> the new task. So the in-flight transaction is left open during restoration, 
> during which the StreamThread only consumes from the changelog without 
> committing, leaving it vulnerable to timing out when restoration times exceed 
> the configured transaction.timeout.ms for the producer client.
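(Not part of the original report: a minimal standalone sketch of the underlying producer behavior; broker address, topic, and ids are made-up placeholders, and the sleep stands in for a restoration phase longer than the timeout.)

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "demo-tx");
props.put(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, 10_000); // 10s, short for the demo
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);

try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
    producer.initTransactions();
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("output", "k", "v")); // txn is now open
    Thread.sleep(15_000); // no commit within transaction.timeout.ms ...
    producer.commitTransaction(); // ... so the broker aborts the txn and this throws
}
{code}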



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




Re: [VOTE] KIP-941 Range queries to accept null lower and upper bounds

2023-07-10 Thread Matthias J. Sax

+1 (binding)

On 7/10/23 12:13 PM, Bill Bejeck wrote:

Hi Lucia,

Thanks for the KIP! It will be a welcomed improvement.

+1(binding)

-Bill

On Mon, Jul 10, 2023 at 2:40 PM Lucia Cerchie wrote:


Hello everyone,

I'd like to call a vote on KIP-941
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-941%3A+Range+queries+to+accept+null+lower+and+upper+bounds>.

It has been under discussion since June 26, and has received edits to the
KIP and approval by discussion participants.

Best,
Lucia




Re: [DISCUSS] KIP-932: Queues for Kafka

2023-07-10 Thread Matthias J. Sax

There is another detail about EOS that is important I guess.

Messages written into topic-partitions are only marked as 
"transactional", but when we commit (or abort), we only write an 
additional "tx marker" into the partition (the original message is not 
touched). If we deliver "pending" messages, the client would need 
additional logic to buffer pending messages, plus logic to evaluate 
tx-markers to determine if/when a pending record could be processed if 
committed or discarded if aborted. The current client has nothing like 
this built-in, because we don't need to (as explained in the original 
message, why we don't read beyond the LSO).


Or we would need to have a different way to let the client know when a 
pending message is not pending any longer, and if it was committed or 
aborted. For example, we could change the client so it would always drop 
pending messages, and it would be the broker's responsibility to 
re-deliver them after they got committed. So the client won't need to 
buffer (good), however given how the broker works, this seems to be very 
undesirable to do it this way.


Maybe there are other options? In the end, it's always going to be much 
more complex, so it's not clear if it would be worth the effort, or if we 
should just do what we do now, ie, not read beyond the LSO, and keep it simple?
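To make the baseline concrete, a minimal sketch of today's consumer-side
behavior (plain Java client; broker address, group, and topic are made up):
with read_committed, fetches are capped at the LSO, and aborted records are
dropped client side based on the aborted-transaction metadata in the fetch
response.

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class);

try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
    consumer.subscribe(List.of("topic"));
    // poll() never returns records beyond the last stable offset (LSO); records
    // from aborted transactions are filtered out on the client.
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
}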



-Matthias

On 7/10/23 2:43 AM, Dániel Urbán wrote:

Yes, I think it's clear now, thank you.
I agree that allowing reading behind the LSO would require more work on the
broker side (we would need 1 more state for the messages, and transition
when the LSO moves forward), but I don't see the extra complexity on the
consumer side. Based on the KIP so far, brokers will be able to return
specific batches/messages to queue consumers - consumers will need to be
able to skip messages in case another consumer of the same group has
already acquired/acked those. If we have this logic present in the protocol
and the clients, consumers could skip pending messages using the same
mechanism, and only the broker would need to know *why* exactly a specific
record/batch is skipped.

I don't think that this feature would be too important, but compared to the
complexity of the KIP, 1 more state doesn't seem too complicated to me.

Thanks,
Daniel

Matthias J. Sax wrote (on Mon, Jul 10, 2023, at 7:22):


Daniel, sure.

To allow the client to filter aborted messages, the broker currently
attaches metadata that tell the client which records were aborted. But
the first message after the LSO is a messages in pending state, ie, it
was neither committed nor aborted yet, so it's not possible to filter or
deliver it. Thus, the broker cannot provide this metadata (not sure if
the client could filter without this metadata?)

The main reason why this happens broker side is to avoid that the client
needs to buffer pending messages "indefinitely" until the TX might
eventually commit or abort, and thus put a lot of memory pressure on the
client. For the "classic" case, the situation is more severe as we
guarantee ordered delivery, and thus, the client would need to buffer
everything after the LSO. -- While it's relaxed for queuing as we might
not guarantee order (ie, instead of buffering everything, only pending
messages must be buffered), it would still imply a huge additional
burden on tracking metadata (for both the broker and the consumer), and
the wire protocol, and I am already worried about the metadata we might
need to track for queuing in general.

Does this make sense?


-Matthias



On 7/7/23 01:35, Dániel Urbán wrote:

Hi Matthias,
Can you please elaborate on this: "First, you need to understand that
aborted records are filtered client side, and thus for "read-committed" we
can never read beyond the LSO, and the same seems to apply for queuing."
I don't understand the connection here - what does skipping aborted records
have to do with the LSO? As you said, aborted message filtering is done on
the client side (in consumers, yes, but not sure if it has to be the same
for queues), but being blocked on the LSO is the responsibility of the
broker, isn't it? My thought was that the broker could act differently when
working with queues and read_committed isolation.
Thanks,
Daniel

On Thu, Jul 6, 2023 at 7:26 PM Matthias J. Sax  wrote:


Thanks for the KIP.

It seems we are in very early stage, and some very important sections in
the KIP are still marked as TODO. In particular, I am curious about the
protocol changes, how the "queuing state" will be represented and made
durable, and all the error edge case / fail-over / fencing
(broker/clients) that we need to put in place.


A few other comments/question from my side:

(1) Fetch from follower: this was already touched on, but the point is
really that the consumer does not decide about it, but the broker does.
When a consumer sends its first fetch request it will always go to the
leader, and the broker would reply to the consumer "go and fetch from
this other broker".

Re: Testing FixedKeyProcessor implementation using unit tests

2023-07-10 Thread Matthias J. Sax

Not sure right now, but could be a bug.

Can you maybe share the full stack trace and the test program?

-Matthias

On 7/10/23 3:47 AM, EXT.Zlatibor.Veljkovic wrote:

Hi, I am using kafka-streams-test-utils and have a problem with testing 
FixedKeyProcessor [KIP-820 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-820%3A+Extend+KStream+process+with+new+Processor+API#KIP820:ExtendKStreamprocesswithnewProcessorAPI-InfrastructureforFixedKeyRecords].

Using mock processor context to get the forwarded message doesn't work.

class org.apache.kafka.streams.processor.api.MockProcessorContext cannot be 
cast to class org.apache.kafka.streams.processor.api.FixedKeyProcessorContext

Anything I can do to get forwarded records?

Thanks,
Zed
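One possible workaround until this is clarified: drive the processor through
a small test topology with TopologyTestDriver instead of MockProcessorContext,
so the forwarded records can be read from an output topic. A sketch (topic
names and MyFixedKeyProcessor are placeholders):

StreamsBuilder builder = new StreamsBuilder();
builder.<String, String>stream("in")
    .processValues(() -> new MyFixedKeyProcessor()) // the processor under test
    .to("out");

try (TopologyTestDriver driver = new TopologyTestDriver(builder.build())) {
    TestInputTopic<String, String> in =
        driver.createInputTopic("in", new StringSerializer(), new StringSerializer());
    TestOutputTopic<String, String> out =
        driver.createOutputTopic("out", new StringDeserializer(), new StringDeserializer());
    in.pipeInput("key", "value");
    System.out.println(out.readKeyValuesToList()); // records forwarded by the processor
}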



Re: [ANNOUNCE] New committer: Greg Harris

2023-07-10 Thread Matthias J. Sax

Congrats!

On 7/10/23 8:45 AM, Chris Egerton wrote:

Hi all,

The PMC for Apache Kafka has invited Greg Harris to become a committer, and
we are happy to announce that he has accepted!

Greg has been contributing to Kafka since 2019. He has made over 50 commits
mostly around Kafka Connect and Mirror Maker 2. His most notable
contributions include KIP-898: "Modernize Connect plugin discovery" and a
deep overhaul of the offset syncing logic in MM2 that addressed several
technically-difficult, long-standing, high-impact issues.

He has also been an active participant in discussions and reviews on the
mailing lists and on GitHub.

Thanks for all of your contributions, Greg. Congratulations!






Re: [DISCUSS] KIP-932: Queues for Kafka

2023-07-09 Thread Matthias J. Sax

Daniel, sure.

To allow the client to filter aborted messages, the broker currently 
attaches metadata that tell the client which records were aborted. But 
the first message after the LSO is a messages in pending state, ie, it 
was neither committed nor aborted yet, so it's not possible to filter or 
deliver it. Thus, the broker cannot provide this metadata (not sure if 
the client could filter without this metadata?)


The main reason why this happens broker side is to avoid that the client 
needs to buffer pending messages "indefinitely" until the TX might 
eventually commit or abort, and thus put a lot of memory pressure on the 
client. For the "classic" case, the situation is more severe as we 
guarantee ordered delivery, and thus, the client would need to buffer 
everything after the LSO. -- While it's relaxed for queuing as we might 
not guarantee order (ie, instead of buffering everything, only pending 
messages must be buffered), it would still imply a huge additional 
burden on tracking metadata (for both the broker and the consumer), and 
the wire protocol, and I am already worried about the metadata we might 
need to track for queuing in general.


Does this make sense?


-Matthias



On 7/7/23 01:35, Dániel Urbán wrote:

Hi Matthias,
Can you please elaborate on this: "First, you need to understand that
aborted records are filtered client side, and thus for "read-committed" we
can never read beyond the LSO, and the same seems to apply for queuing."
I don't understand the connection here - what does skipping aborted records
have to do with the LSO? As you said, aborted message filtering is done on
the client side (in consumers, yes, but not sure if it has to be the same
for queues), but being blocked on the LSO is the responsibility of the
broker, isn't it? My thought was that the broker could act differently when
working with queues and read_committed isolation.
Thanks,
Daniel

On Thu, Jul 6, 2023 at 7:26 PM Matthias J. Sax  wrote:


Thanks for the KIP.

It seems we are in very early stage, and some very important sections in
the KIP are still marked as TODO. In particular, I am curious about the
protocol changes, how the "queuing state" will be represented and made
durable, and all the error edge case / fail-over / fencing
(broker/clients) that we need to put in place.


A few other comments/question from my side:

(1) Fetch from follower: this was already touched on, but the point is
really that the consumer does not decide about it, but the broker does.
When a consumer sends its first fetch request it will always go to the
leader, and the broker would reply to the consumer "go and fetch from
this other broker". -- I think it's ok to exclude fetch from follower in
the first version of the KIP, but it would need a broker change such
that the broker knows it's a "queue fetch" request. -- It would also be
worth to explore how fetch from follow could work in the future and
ensure that our initial design allows for it and is future proof.


(2) Why do we not allow pattern subscription and what happens if
different consumers subscribe to different topics? It's not fully
explained in the KIP.


(3) auto.offset.reset and SPSO/SPSE -- I don't understand why we would
not allow auto.offset.reset? In the discussion, you mentioned that
"first consumer would win, if two consumers have a different config" --
while this is correct, it's the same for a consumer group right now.
Maybe we should not try to solve a "non-problem"? -- In general, my
impression is that we are going to do Kafkaesque Queuing, which is fine,
but it might be to our advantage to carry over as many established
concepts as we can? And if not, have a very good reason not to.

In the end, I find it very clumsy to only have an admin API to change
the starting point of a consumer.

(3B) What happens if lag grows and data is purged broker side?

(3C) What happens if the broker released records (based on "timeout /
exceeding deliver count), and the "ack/reject" comes afterwards?

(3D) How to find out what records got archived but were not acked (ie,
lost) for re-processing/debugging purpose? The question was already
asked and the answer was "not supported", but I think it would be
must-have before the feature is usable in production? We can of course
also only do it in a future release and not the first "MVP"
implementation, but the KIP should address it. In the end, the overall
group monitoring story is missing.


(4) I am also wondering about the overall design with regard to "per
record" vs "per batch" granularity. In the end, queuing usually aims for
"per records" semantics, but "per record" implies to keep track of a lot
of metadata. Kafka is designed on a "per batch" granularity, and it's
unclear to me how both will go together?

(4A) Do we keep "ack/reject

Re: [DISCUSS] KIP-932: Queues for Kafka

2023-07-06 Thread Matthias J. Sax

Thanks for the KIP.

It seems we are in very early stage, and some very important sections in 
the KIP are still marked as TODO. In particular, I am curious about the 
protocol changes, how the "queuing state" will be represented and made 
durable, and all the error edge case / fail-over / fencing 
(broker/clients) that we need to put in place.



A few other comments/question from my side:

(1) Fetch from follower: this was already touched on, but the point is 
really that the consumer does not decide about it, but the broker does. 
When a consumer sends its first fetch request it will always go to the 
leader, and the broker would reply to the consumer "go and fetch from 
this other broker". -- I think it's ok to exclude fetch from follower in 
the first version of the KIP, but it would need a broker change such 
that the broker knows it's a "queue fetch" request. -- It would also be 
worth to explore how fetch from follow could work in the future and 
ensure that our initial design allows for it and is future proof.



(2) Why do we not allow pattern subscription and what happens if 
different consumers subscribe to different topics? It's not fully 
explained in the KIP.



(3) auto.offset.reset and SPSO/SPSE -- I don't understand why we would 
not allow auto.offset.reset? In the discussion, you mentioned that 
"first consumer would win, if two consumers have a different config" -- 
while this is correct, it's the same for a consumer group right now. 
Maybe we should not try to solve a "non-problem"? -- In general, my 
impression is that we are going to do Kafkaesque Queuing, which is fine, 
but it might be to our advantage to carry over as many established 
concepts as we can? And if not, have a very good reason not to.


In the end, I find it very clumsy to only have an admin API to change 
the starting point of a consumer.


(3B) What happens if lag grows and data is purged broker side?

(3C) What happens if the broker released records (based on "timeout / 
exceeding deliver count), and the "ack/reject" comes afterwards?


(3D) How to find out what records got archived but were not acked (ie, 
lost) for re-processing/debugging purpose? The question was already 
asked and the answer was "not supported", but I think it would be 
must-have before the feature is usable in production? We can of course 
also only do it in a future release and not the first "MVP" 
implementation, but the KIP should address it. In the end, the overall 
group monitoring story is missing.



(4) I am also wondering about the overall design with regard to "per 
record" vs "per batch" granularity. In the end, queuing usually aims for 
"per records" semantics, but "per record" implies to keep track of a lot 
of metadata. Kafka is designed on a "per batch" granularity, and it's 
unclear to me how both will go together?


(4A) Do we keep "ack/reject/..." state per-record, or per batch? It 
seems per record, but it would require to hold a lot of meta-data. Also, 
how does it work for the current protocol, if a batch is partially acked 
and we need to re-deliver? Would we add metadata and let the client 
filter acked messages (similar to how "read-committed" mode works)?


(4B) What does "the share-partition leader prefers to return complete
 record batches." exactly mean? "Prefers" is a fuzzy word. What happens 
if we cannot return a complete record batch?


(4C) What happens if different consumers of the same group configure 
different batch sizes for fetching records? How do we track the 
corresponding meta-data?


(4D)


In the situation where some records in a batch have been released or rejected 
separately, subsequent fetches of those records are more likely to have gaps.


What does this mean?

(4E)


For efficiency, the consumer preferentially returns complete record sets with 
no gaps


Can you elaborate on the details?


API contract:

(5A)

acks must be issued in the order in which the records appear


Why is this the case? Sounds like an arbitrary restriction to me? Can 
you share your reasoning?



(5B) How to "reject" (or just "release") all records of a batch at once? 
It seem the API only allows to "ack" all record of a batch at once.


(5C) Currently, `ConsumerRecords` object may contain records from 
different partitions? Would this still be the case?



(6) Group management / re-balancing:

(6A) The KIP should explain better how heart-beating works (was already 
partially discussed). How does `max.poll.interval.ms` interact? Would it 
trigger a "release" of records if violated?


(6B) You mentioned that a consumer that does not heartbeat would just be 
removed from the group without a rebalance: Given the current design to 
assign all partitions to every consumer in the group, that would be ok. 
But as you mentioned on the KIP, we might want to be more clever with 
regard to assigning partitions in the future, and I think we would 
actually need to trigger a rebalance to avoid a later protocol change: 
otherwise, 

Re: [DISCUSS] KIP-759: Unneeded repartition canceling

2023-06-29 Thread Matthias J. Sax

Shay,

thanks for picking up this KIP. It's a pity that the discussion stalled 
for such a long time.


As expressed previously, I am happy with the name `markAsPartitioned()` 
and also believe it's ok to just document the impact and leave it to the 
user to do the right thing.


If we really get a lot of users that ask about it, because they did not 
do the right thing, we could still add something (eg, a reverse-mapper 
function) in a follow-up KIP. But we don't know if it's necessary; thus, 
making a small incremental step sounds like a good approach to me.


Let's see if others agree or not.


-Matthias

On 6/28/23 5:29 PM, Shay Lin wrote:

Hi all,

Great discussion thread. May I take this KIP up? If it’s alright my plan is
to update the KIP with the operator `markAsPartitioned()`.

As you have discussed and pointed out, there are implications to downstream
joins or aggregation operations. Still, the operator is intended for
advanced users so my two cents is it would be a valuable addition
nonetheless. We could add this as a caution/consideration as part of the
java doc.

Let me know, thanks.
Shay



Re: [DISCUSS] KIP-941: Range queries to accept null lower and upper bounds

2023-06-29 Thread Matthias J. Sax

Thanks for the KIP. LGTM.

I believe you can start a vote.
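For readers following along, a sketch of what the change permits (IQv2; the
store name and types are illustrative):

// after this KIP, a null bound means "open ended" instead of being rejected
RangeQuery<String, Long> upTo = RangeQuery.withRange(null, "m"); // up to "m"
RangeQuery<String, Long> from = RangeQuery.withRange("m", null); // from "m" on
StateQueryRequest<KeyValueIterator<String, Long>> request =
    StateQueryRequest.inStore("counts-store").withQuery(upTo);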

-Matthias

On 6/26/23 11:25 AM, Lucia Cerchie wrote:

Thanks for asking for clarification, Sophie; that gives me guidance on
improving the KIP! Here's the updated version, including the JIRA link:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-941%3A+Range+queries+to+accept+null+lower+and+upper+bounds


On Thu, Jun 22, 2023 at 12:57 PM Sophie Blee-Goldman 
wrote:


Hey Lucia, thanks for the KIP! Just some minor notes:

I'm in favor of the proposal overall, at least I think so -- for someone
not intimately familiar with the new IQ API and *RangeQuery* class, the KIP
was a bit difficult to follow along and I had to read in between the lines
to figure out what the old behavior was and what the new and improved logic
would do.

It would be good to state clearly in the beginning what happens when null
is passed in right now, and what will happen after this KIP is implemented.
For example in the "Public Interfaces" section, I couldn't tell if the
middle sentence was describing what was changing, or what it was changing
*to.*

One last little thing is can you link to the jira ticket at the top? And
please create one if it doesn't already exist -- it helps people figure out
when a KIP has been implemented and in which versions, as well as navigate
from the KIP to the actual code that was merged. Things can change during
implementation and the KIP document is how most people read up on new
features, but almost all of us are probably guilty of forgetting to update
the KIP document. So it's important to be able to find the code when in
doubt.

Otherwise nice KIP!

On Thu, Jun 22, 2023 at 8:19 AM Lucia Cerchie

wrote:


Thanks Kirk and John for the valuable feedback!

John, I'll update the KIP to reflect that nuance you mention -- yes it is
just about making the withRange method more permissive. Thanks for the
testing file as well, I'll be sure to write my test cases there.

On Wed, Jun 21, 2023 at 10:50 AM Kirk True  wrote:


Hi John/Lucia,

Thanks for the feedback!

Of course I only noticed the private-ness of the RangeQuery constructor
moments after sending my email ¯\_(ツ)_/¯

Just to be clear, I’m happy with the proposed change as it conforms to
Postel’s Law ;) Apologies that it was worded tersely.

Thanks,
Kirk


On Jun 21, 2023, at 10:20 AM, John Roesler wrote:


Hi all,

Thanks for the KIP, Lucia! This is a nice change.

To Kirk's question (1), the example is a bit misleading. The typical
case that would ease user pain is specifically using "null" to indicate an
open-ended range, especially since null is not a valid key.


I could additionally see an empty string as being nice, but the actual
API is generic, not String, so there's no meaningful concept of
empty/blank/whitespace that we could check for, just null or not.


Regarding (2), there's no public factory that takes Optional parameters.
I think you're looking at the private constructor. An alternative Lucia
could consider is to instead propose adding a new factory like
`withRange(Optional<K> lower, Optional<K> upper)`.


FWIW, I'd be in favor of this KIP as proposed.

A couple of smaller notes:

3. In the compatibility notes, I wasn't sure what "web request" was
referring to. I think you just mean that all existing valid API calls will
continue to work the same, and we're only making the withRange method more
permissive with its arguments.


4. For the Test Plan, I wrote some tests that validate these queries
against every kind and configuration of store possible. Please add your new
test cases to that one to make absolutely sure it'll work for every store.
Obviously, you may also want to add some specific unit tests in addition.


See
https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/integration/IQv2StoreIntegrationTest.java


Thanks again!
-John

On 6/21/23 12:00, Kirk True wrote:

Hi Lucia,
One question:
1. Since the proposed implementation change for the withRange() method uses
Optional.ofNullable() (which only catches nulls and not blank/whitespace
strings), wouldn’t users still need to have code like that in the example?
2. Why don't users create RangeQuery objects that use Optional directly?
What’s the benefit of introducing what appears to be a very thin utility
facade?

Thanks,
Kirk

On Jun 21, 2023, at 9:51 AM, Kirk True  wrote:

Hi Lucia,

Thanks for the KIP!

The KIP wasn’t in the email and I didn’t see it on the main KIP
directory. Here it is:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-941%3A+Range+queries+to+accept+null+lower+and+upper+bounds


Can the KIP be added to the main KIP page
(https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals)?
That will help with discoverability and encourage discussion.


Thanks,
Kirk


On Jun 15, 2023, at 2:13 PM, Lucia Cerchie wrote:

Hi everyone,

I'd like to discuss KIP-941, which will change the behavior of range


Re: [DISCUSS] KIP-941 Support async runtimes in consumer

2023-06-29 Thread Matthias J. Sax

Seems the KIP number is 947, not 941?

Can you maybe start a new thread to avoid confusion?

Thanks.

On 6/28/23 1:11 AM, Erik van Oosten wrote:

Hello developers of the Java based consumer,

I submitted https://github.com/apache/kafka/pull/13914 to fix a long 
standing problem that the Kafka consumer on the JVM is not usable from 
asynchronous runtimes such as Kotlin co-routines and ZIO. However, since 
it extends the public API I was requested to create a KIP.


So here it is:
KIP-941 Support async runtimes in consumer 
https://cwiki.apache.org/confluence/x/chw0Dw


Any questions, comments, ideas and other additions are welcome!

The KIP should be complete except for the testing section. As far as I 
am aware there are no tests for the current behavior. Any help in this 
area would be appreciated.


Kind regards,
     Erik.




[jira] [Commented] (KAFKA-15116) Kafka Streams processing blocked during rebalance

2023-06-29 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738782#comment-17738782
 ] 

Matthias J. Sax commented on KAFKA-15116:
-

> Message A uses an internal store to store information about the entity.  The 
> store knows that there is a pending event that is yet to be committed so it 
> blocks until it is committed. 

Are you saying that this happens in a background thread that you start 
yourself? If yes, it's an unsupported pattern, and we cannot give any guarantee 
about the behavior of the system. If there is no background thread, that 
blocking would imply that the `StreamThread` blocks (also something you should 
not do, as it would imply that the thread drops out of the consumer group after 
`max.poll.interval.ms` passed; and thus, how could message B get processed?). Or 
is this internal store that you mentioned shared across `StreamThreads`? (This 
would also be an anti-pattern, and we cannot give any guarantee how the system 
behaves if you do this.)

> The store knows that there is a pending event that is yet to be committed so 
> it blocks until it is committed.

I am also wondering what you exactly mean by "committed" (it's a highly 
overloaded term, so it would be good to clarify). In Kafka itself, there could 
be two meanings: for at-least-once-processing "committing" means to commit the 
input topic offsets and mark the input records as processed. For 
exactly-once-processing "committing" means to commit the Kafka TX, ie, 
committing the result record into the output topic plus committing the input 
topic offset to mark the input records as processed. Not sure which one you 
mean, or if you actually refer to some mechanism to commit into your internal 
store?

I guess, I still don't understand your overall end-to-end workflow of your 
program.

> Kafka Streams processing blocked during rebalance
> -
>
> Key: KAFKA-15116
> URL: https://issues.apache.org/jira/browse/KAFKA-15116
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.5.0
>Reporter: David Gammon
>Priority: Major
>
> We have a Kafka Streams application that simply takes a messages, processes 
> it and then produces an event out the other side. The complexity is that 
> there is a requirement that all events with the same partition key must be 
> committed before the next message  is processed.
> This works most of the time flawlessly but we have started to see problems 
> during deployments where the first message blocks the second message during a 
> rebalance because the first message isn’t committed before the second message 
> is processed. This ultimately results in transactions timing out and more 
> rebalancing.
> We’ve tried lots of configuration to get the behaviour we require with no 
> luck. We’ve now put in a temporary fix so that Kafka Streams works with our 
> framework but it feels like this might be a missing feature or potentially a 
> bug.
> +Example+
> Given:
>  * We have two messages (InA and InB).
>  * Both messages have the same partition key.
>  * A rebalance is in progress so streams is no longer able to commit.
> When:
>  # Message InA -> processor -> OutA (not committed)
>  # Message InB -> processor -> blocked because #1 has not been committed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Kafka Streaming: RocksDbSessionBytesStoreSupplier seems lost data in Kubernetes

2023-06-29 Thread Matthias J. Sax
The class `RocksDbSessionBytesStoreSupplier` is in an `internal` package 
and thus, you should not use it directly. Instead, you should use the 
public factory class `org.apache.kafka.streams.state.Stores`.


However, your usage seems correct in general.

Not sure why you pass in the supplier directly though? In the end, if 
you want to set a name for the store, you can use 
`Materialized.as("...")`, and you can set retention time via 
`Materialized#withRetention(...)` (which would be the proper usage of the 
API).
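For example (a sketch; the store name and retention are placeholders, and the 
value type List<String> plus stringSerde/listSerde are assumed from your code 
below):

var materialized = Materialized
    .<String, List<String>, SessionStore<Bytes, byte[]>>as("pft-session-store")
    .withRetention(Duration.ofDays(7))
    .withKeySerde(stringSerde)
    .withValueSerde(listSerde);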


Besides this, the store should be backed by a changelog topic and thus 
you should never lose any data, independent of your deployment.


Of course, I would recommend using a StatefulSet and re-attaching storage 
to the pod to avoid re-creating the store from the changelog.


HTH,

-Matthias


On 6/28/23 8:49 AM, An, Hongguo (CORP) wrote:

Hi:
I am using RocksDbSessionBytesStoreSupplier in my kafka streaming application 
for an aggregation like this:


var materialized = Materialized.<String, List<String>, SessionStore<Bytes, byte[]>>as(
        new RocksDbSessionBytesStoreSupplier(
                env.getProperty("messages.cdc.pft.topic", "NASHCM.PAYROLL.PFT.FILENUMBER"),
                Duration.parse(env.getProperty("pft.duration", "P7D")).toMillis()))
        .withKeySerde(stringSerde)
        .withValueSerde(listSerde);

stream.windowedBy(SessionWindows
                .with(Duration.parse(env.getProperty("pft.gap", "PT0.1S")))
                .grace(Duration.parse(env.getProperty("pft.duration", "PT0.05S"))))
        .aggregate(ArrayList::new,
                (k, v, list) -> { list.add(v); return list; },
                (k, list1, list2) -> { list1.addAll(list2); return list1; },
                materialized)
        .toStream().foreach((key, value) -> {
            // sometimes value is null, but this should never happen – and we do
            // see some messages not processed
        });



The application runs on Kubernetes, should we not use 
RocksDbSessionBytesStoreSupplier?



Thanks

Andrew





[jira] [Commented] (KAFKA-13973) block-cache-capacity metrics worth twice as much as normal

2023-06-28 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17738231#comment-17738231
 ] 

Matthias J. Sax commented on KAFKA-13973:
-

The GitHub issue for RocksDB was "declined". I did file a follow-up ticket for 
Speedb (https://github.com/speedb-io/speedb/issues/583) – maybe we get help 
there.

> block-cache-capacity metrics worth twice as much as normal
> --
>
> Key: KAFKA-13973
> URL: https://issues.apache.org/jira/browse/KAFKA-13973
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.2.0
>Reporter: Sylvain Le Gouellec
>Priority: Minor
> Attachments: Screenshot 2022-06-09 at 08.55.36.png, Screenshot 
> 2022-06-09 at 09.33.50.png
>
>
> I have created a very simple kafka-streams application with 1 state store. 
> I'm very surprised that the block-cache-capacity metrics show a {{100MB}} 
> block cache capacity while the Kafka Streams default is {{50MB}}.
>  
> My topology :
> StreamsBuilder sb = new StreamsBuilder();
> sb.stream("input")
> .groupByKey()
> .count()
> .toStream()
> .to("output");
>  
> I checked out the {{kafka-streams}} code and I saw a strange thing. When the 
> {{RocksDBTimestampedStore}} store is created, we try to create two column 
> families for backward compatibility with a potential old key/value store.
> In the method {{setDbAccessor(col1, col2)}}, if the first column is not 
> valid, you close it 
> ([L102|https://github.com/apache/kafka/blob/4542acdc14d5ec3daa1f36d8dc24abc244ee24ff/streams/src/main/java/org/apache/kafka/streams/state/internals/RocksDBTimestampedStore.java#L102]).
> But regarding the rocksdb instance, it seems that the column family is 
> not deleted completely, and the metrics exposed by [RocksDB continue to 
> aggregate 
> (L373)|https://github.com/apache/kafka/blob/4542acdc14d5ec3daa1f36d8dc24abc244ee24ff/streams/src/main/java/org/apache/kafka/streams/state/internals/metrics/RocksDBMetricsRecorder.java#L373]
> {{block-cache-capacity}} for both column families (default and 
> keyValueWithTimestamp).
> Maybe you have to explicitly drop the column family in 
> {{setDbAccessor(col1, col2)}} if the first column is not valid (like 
> {{db.dropColumnFamily(noTimestampColumnFamily);}}).
>  
> I tried to drop the {{noTimestampColumnFamily}} in {{setDbAccessor}} if the 
> first column is not valid, like:
>  
> {code:java}
> private void setDbAccessor(final ColumnFamilyHandle noTimestampColumnFamily,
>final ColumnFamilyHandle 
> withTimestampColumnFamily) throws RocksDBException {
> final RocksIterator noTimestampsIter = 
> db.newIterator(noTimestampColumnFamily);
> noTimestampsIter.seekToFirst();
> if (noTimestampsIter.isValid()) {
> log.info("Opening store {} in upgrade mode", name);
> dbAccessor = new DualColumnFamilyAccessor(noTimestampColumnFamily, 
> withTimestampColumnFamily);
> } else {
> log.info("Opening store {} in regular mode", name);
> dbAccessor = new 
> SingleColumnFamilyAccessor(withTimestampColumnFamily);
> noTimestampColumnFamily.close();
> db.dropColumnFamily(noTimestampColumnFamily); // try fix it
> }
> noTimestampsIter.close();
> }{code}
>  
>  
>  
> But it seems that you can't drop the default column family in RocksDB 
> (see screenshot).
> *So how can we get the real block-cache-capacity metric value in Kafka 
> Streams monitoring?*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15116) Kafka Streams processing blocked during rebalance

2023-06-23 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17736637#comment-17736637
 ] 

Matthias J. Sax commented on KAFKA-15116:
-

> The complexity is that there is a requirement that all events with the same 
> partition key must be committed before the next message  is processed.

Not sure if I understand this requirement. Can you elaborate? The input is an 
infinite stream. So if you get, let's say 3 messages all with key A, how do you 
know that there is no 4th message with key A? – Also, in Kafka Streams you 
cannot really control when a commit happens to begin with.

> where the first message blocks the second message during a rebalance because 
> the first message isn’t committed before the second message is processed

Also not sure what this means? If a rebalance is triggered, all pending 
messages will be flushed out and the offsets will be committed.

> This ultimately results in transactions timing out and more rebalancing.

Kafka Streams manages transaction under the hood for you. So you don't know 
when a TX starts or ends. How can you reason about it?

> We’ve now put in a temporary fix 

Can you give more details?

> A rebalance is in progress so streams is no longer able to commit.

If a rebalance is triggered, KS should first commit before the rebalance goes 
into "in progress" state – and thus, it should not be necessary to commit (it 
was already done).
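(For completeness: the only commit hook Streams exposes is `ProcessorContext#commit()`, and even that is just a request – Streams decides when the commit actually happens. A sketch with a made-up processor:)

{code:java}
public class MyProcessor implements Processor<String, String, String, String> {
    private ProcessorContext<String, String> context;

    @Override
    public void init(final ProcessorContext<String, String> context) {
        this.context = context;
    }

    @Override
    public void process(final Record<String, String> record) {
        context.forward(record);
        // Only *requests* a commit; Streams decides when it actually happens,
        // so application logic must not block waiting for it.
        context.commit();
    }
}
{code}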

> Kafka Streams processing blocked during rebalance
> -
>
> Key: KAFKA-15116
> URL: https://issues.apache.org/jira/browse/KAFKA-15116
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.5.0
>Reporter: David Gammon
>Priority: Major
>
> We have a Kafka Streams application that simply takes a messages, processes 
> it and then produces an event out the other side. The complexity is that 
> there is a requirement that all events with the same partition key must be 
> committed before the next message  is processed.
> This works most of the time flawlessly but we have started to see problems 
> during deployments where the first message blocks the second message during a 
> rebalance because the first message isn’t committed before the second message 
> is processed. This ultimately results in transactions timing out and more 
> rebalancing.
> We’ve tried lots of configuration to get the behaviour we require with no 
> luck. We’ve now put in a temporary fix so that Kafka Streams works with our 
> framework but it feels like this might be a missing feature or potentially a 
> bug.
> +Example+
> Given:
>  * We have two messages (InA and InB).
>  * Both messages have the same partition key.
>  * A rebalance is in progress so streams is no longer able to commit.
> When:
>  # Message InA -> processor -> OutA (not committed)
>  # Message InB -> processor -> blocked because #1 has not been committed



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15108) task.timeout.ms does not work when TimeoutException is thrown by streams producer

2023-06-20 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17735476#comment-17735476
 ] 

Matthias J. Sax commented on KAFKA-15108:
-

There are a few cases for which we cannot handle a `TimeoutException` more 
gracefully, and the docs gloss over this fact. – The scenario you describe is 
one of these cases.

I agree that we should maybe try to include it – the challenge (and why it was 
not included in the original work) is that it will need different handling 
compared to how we handle `TimeoutException` for the regular case...
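(For reference, the KIP-572 knob under discussion – a sketch of how it is configured for the cases it does cover:)

{code:java}
// Bounds how long Streams retries a task after a handled TimeoutException;
// as described above, the streams-producer path currently bypasses it.
props.put(StreamsConfig.TASK_TIMEOUT_MS_CONFIG, 300_000); // default: 5 minutes
{code}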

> task.timeout.ms does not work when TimeoutException is thrown by streams 
> producer
> -
>
> Key: KAFKA-15108
> URL: https://issues.apache.org/jira/browse/KAFKA-15108
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.5.0
>Reporter: Tomonari Yamashita
>Priority: Major
>
> [Problem]
>  - task.timeout.ms does not work when TimeoutException is thrown by streams 
> producer
>  -- Kafka Streams upgrade guide says, "Kafka Streams is now handling 
> TimeoutException thrown by the consumer, producer, and admin client."(1) and 
> "To bound how long Kafka Streams retries a task, you can set task.timeout.ms 
> (default is 5 minutes)."(1).
>  -- However, it doesn't look like task.timeout.ms is working for the streams 
> producer, then it seems to keep retrying forever.
> [Environment]
>  - Kafka Streams 3.5.0
> [Reproduce procedure]
>  # Create "input-topic" topic
>  # Put several messages on "input-topic"
>  # DONT create "output-topic" topic, to fire TimeoutException
>  # Create the following simple Kafka streams program; this program just 
> transfers messages from "input-topic" to "output-topic".
>  -- 
> {code:java}
> Properties props = new Properties();
> props.put(StreamsConfig.APPLICATION_ID_CONFIG, "java-kafka-streams");
> props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
> props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG,"org.apache.kafka.common.serialization.Serdes$StringSerde");
> props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG,"org.apache.kafka.common.serialization.Serdes$StringSerde");
> props.put(StreamsConfig.DEFAULT_PRODUCTION_EXCEPTION_HANDLER_CLASS_CONFIG,"com.example.CustomProductionExceptionHandler");
>  // not needed
> StreamsBuilder builder = new StreamsBuilder();
> builder.stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
> .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));
> KafkaStreams streams = new KafkaStreams(builder.build(), props);
> {code}
>  # Wait for task.timeout.ms (default is 5 minutes).
>  ## If the debug log is enabled, a large number of 
> UNKNOWN_TOPIC_OR_PARTITIONs will be logged because "output-topic" does not 
> exist.
>  ## And every one minute, TimeoutException will be generated (2)
>  # ==> However, it doesn't look like task.timeout.ms is working for the 
> streams producer, then it seems to keep retrying forever.
>  ## My excepted behavior is that task.timeout.ms is working, and the client 
> will be shutdown because the default behavior is 
> StreamThreadExceptionResponse.SHUTDOWN_CLIENT when an exception is thrown.
> [As far as my investigation]
>  - TimeoutException thrown by the streams producer is replaced with 
> TaskCorruptedException in RecordCollectorImpl.recordSendError(...) (3)
>  - And after that it does not appear to be executing code that contains logic 
> related to task.timeout.ms.
> (1) Kafka Streams upgrade guide
> - [https://kafka.apache.org/35/documentation/streams/upgrade-guide]
> - 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-572%3A+Improve+timeouts+and+retries+in+Kafka+Streams]
> {code:java}
> Kafka Streams is now handling TimeoutException thrown by the consumer, 
> producer, and admin client. If a timeout occurs on a task, Kafka Streams 
> moves to the next task and retries to make progress on the failed task in the 
> next iteration. To bound how long Kafka Streams retries a task, you can set 
> task.timeout.ms (default is 5 minutes). If a task does not make progress 
> within the specified task timeout, which is tracked on a per-task basis, 
> Kafka Streams throws a TimeoutException (cf. KIP-572).
> {code}
> (2) TimeoutException occurs
> {code:java}
> 2023-06-19 19:51:26 WARN  NetworkClient:1145 - [Producer 
> clientId=java-kafka-strea

Re: [ANNOUNCE] New committer: Divij Vaidya

2023-06-13 Thread Matthias J. Sax

Congrats!

On 6/13/23 10:24 AM, Satish Duggana wrote:

Congratulations Divij!!

On Tue, 13 Jun 2023 at 22:41, Manyanda Chitimbo
 wrote:


Congratulations Divij.

On Tue 13 Jun 2023 at 17:50, Bruno Cadonna  wrote:


Hi all,

The PMC of Apache Kafka is pleased to announce a new Kafka committer
Divij Vaidya.

Divij's major contributions are:

GDPR compliance enforcement of kafka-site -
https://issues.apache.org/jira/browse/KAFKA-13868

Performance improvements:

Improve performance of VarInt encoding and decoding -
https://github.com/apache/kafka/pull/13312

Reduce data copy & buffer allocation during decompression -
https://github.com/apache/kafka/pull/13135

He also was heavily involved in the migration to Mockito.

Furthermore, Divij is very active on the mailing lists as well as in
maintaining and reviewing pull requests.

Congratulations, Divij!

Thanks,

Bruno (on behalf of the Apache Kafka PMC)


--

Manyanda Chitimbo.


Re: Consuming an entire partition with control messages

2023-06-13 Thread Matthias J. Sax

Sounds like a bug in aiokafka library to me.

If the last message in a topic partition is a tx-marker, the consumer 
should step over it, and report the correct position after the marker.


The official KafkaConsumer (ie, the Java one), does the exact same thing.


-Matthias

On 5/30/23 8:41 AM, Vincent Maurin wrote:

Hello !

I am working on an exactly-once stream processor in Python, using the
aiokafka client library. My program stores a state in memory that is
recovered from a changelog topic, like in kafka streams.

On each processing loop, I am consuming messages, producing messages
to an output topics and to my changelog topic, within a transaction.

When I need to restart a runner, to restore the state in memory, I
have a routine consuming the changelog topic from the beginning to the
"end" with a read_commited isolation level. Here I am struggling to
define when to stop my recovery :
* my current (maybe) working solution is to loop over "poll" until
poll is not returning any messages anymore
* I tried to do something based on the end offsets, checking
the consumer position, but with control messages at the end of the
partition, I am running into an issue where the position is one below the
end offsets, and doesn't go further

I had a quick look to
https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/StoreChangelogReader.java
but it is a bit hard to figure out what is going on here

Best regards,
Vincent
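For comparison, a sketch of such a recovery loop with the Java consumer
(read_committed is assumed, and partition setup plus the state-update callback
are placeholders). Because the Java consumer's position() steps over a trailing
transaction marker, the position reaches the end offset and the loop terminates:

Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);
consumer.assign(partitions);
consumer.seekToBeginning(partitions);

boolean done = false;
while (!done) {
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
    records.forEach(r -> applyToState(r)); // hypothetical state-update callback
    // position() advances past control records, so this check converges even
    // when the last entry in the partition is a tx marker
    done = partitions.stream()
                     .allMatch(tp -> consumer.position(tp) >= endOffsets.get(tp));
}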


Re: [DISCUSS] KIP-923: Add A Grace Period to Stream Table Join

2023-05-25 Thread Matthias J. Sax
The context of versioned stores refers to how far back in time
out-of-order writes may occur, which probably isn't directly relevant for
introducing a stream-side buffer, though it's also possible I've overlooked
something. (As a bonus, switching from "table grace period" in the KIP to
"table history retention" also helps to clarify/distinguish that it's a
different parameter from the "join grace period," which I could see being
confusing to readers. :) )


Cheers,
Victoria

On Thu, May 18, 2023 at 1:43 PM Walker Carlson
 wrote:


Hey all,

Thanks for the comments, they gave me a lot to think about. I'll try to
address them all inorder. I have made some updates to the kip related to
them, but I mention where below.

Lucas

Good idea about the example. I added a simple one.

1) I have thought about including options for the underlying buffer
configuration. One of which might be adding an in memory option. My biggest
concern is about the semantic guarantees. This isn't like suppress or with
windows where producing incomplete results is repetitively harmless. Here
we would be possibly producing incorrect results. I also would like to keep
the interface changes as simple as I can. Making more than this change to
Joined I feel could make this more complicated than it needs to be. If we
really want to I could see adding a grace() option with a BufferConifg in
there or something, but I would rather not.

2) The buffer will be independent of if the table is versioned or not. If
table is not materialized it will materialize it as versioned. It might
make sense to do a follow up kip where we force the retention period  of
the versioned to be greater than whatever the max of the stream buffer is.

Victoria

1) Yes, records will exit in timestamp order not in offset order.
2) Late records will be dropped (Late as out of the grace period). From my
understanding that is the point of a grace period, no? Doesn't the same
thing happen with versioned stores?
3) The segment store already has an observed stream time, we advance based
on that. That should only advance based on records that enter the store. So
yes, only stream side records. We could maybe do an improvement later to
advance stream time from table side as well, but that might be debatable as
we might get more late records. Anyways I would rather have that as a
separate discussion.

in memory option? We can do that; for the buffer I plan to use the
TimeOrderedKeyValueBuffer interface which already has an in-memory
implementation, so it would be simple.

I said more in my answer to Lucas's question. The concern I have with
buffer configs or in-memory is complicating the interface. Also semantic
guarantees, but in-memory shouldn't affect that.

Matthias

1) fixed out of order vs late terminology in the kip.

2) I was referring to having a stream. So after this kip we can have a
buffered stream or a normal one. For the table we can use a versioned table
or a normal table.

3) Good call out. I clarified this as "If the table side uses a materialized
version store, it can store multiple versions of each record within its
defined grace period." and modified the rest of the paragraph a bit.

4) I get preserving offset ordering, but if the stream is buffered
to join on timestamp instead of offset, doesn't it already seem like we care
more about time in this case?

If we end up adding more options it might make sense to do this. Maybe
offset order processing can be a follow up?

I'll add a section for this in Rejected Alternatives. I think it makes
sense to do something like this but maybe in a follow up.

5) I hadn't thought about this. I suppose if they changed this in an
upgrade the next record would either evict a lot of records (if the grace
period decreased) or there would be a pause until the new grace period
reached. Increasing is a bit more problematic, especially if the table
grace period and retention time stays the same. If the data is reprocessed
after a change like that then there would be different results, but I feel
like that would be expected after such a change.

What do you think should happen?

Hopefully this answers your questions!

Walker

On Mon, May 8, 2023 at 11:32 AM Matthias J. Sax  wrote:


Thanks for the KIP! Also some question/comments from my side:

10) Notation: you use the term "late data" but I think you mean
out-of-order. We reserve the term "late" to records that arrive after
grace period passed, and thus, "late == out-of-order data that is

dropped".



20) "There is only one option from the stream side and only recently is
there a second option on the table side."

What are those options? Victoria already asked about the table side, but
I am also not sure what option you mean for the stream side?


30) "If the table side uses a materialized version store the value is
the latest by stream time rather than by offset within it

[jira] [Commented] (KAFKA-7497) Kafka Streams should support self-join on streams

2023-05-23 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-7497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17725357#comment-17725357
 ] 

Matthias J. Sax commented on KAFKA-7497:


Seems to be fixed. Cf https://issues.apache.org/jira/browse/KAFKA-14209 

> Kafka Streams should support self-join on streams
> -
>
> Key: KAFKA-7497
> URL: https://issues.apache.org/jira/browse/KAFKA-7497
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Robin Moffatt
>Priority: Major
>  Labels: needs-kip
>
> There are valid reasons to want to join a stream to itself, but Kafka Streams 
> does not currently support this ({{Invalid topology: Topic foo has already 
> been registered by another source.}}).  To perform the join requires creating 
> a second stream as a clone of the first, and then doing a join between the 
> two. This is a clunky workaround and results in unnecessary duplication of 
> data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-7497) Kafka Streams should support self-join on streams

2023-05-23 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-7497.

Resolution: Fixed

> Kafka Streams should support self-join on streams
> -
>
> Key: KAFKA-7497
> URL: https://issues.apache.org/jira/browse/KAFKA-7497
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Robin Moffatt
>Priority: Major
>  Labels: needs-kip
>
> There are valid reasons to want to join a stream to itself, but Kafka Streams 
> does not currently support this ({{Invalid topology: Topic foo has already 
> been registered by another source.}}).  To perform the join requires creating 
> a second stream as a clone of the first, and then doing a join between the 
> two. This is a clunky workaround and results in unnecessary duplication of 
> data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14173) TopologyTestDriver does not use mock wall clock time when sending test records

2023-05-23 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-14173.
-
Resolution: Not A Problem

> TopologyTestDriver does not use mock wall clock time when sending test records
> --
>
> Key: KAFKA-14173
> URL: https://issues.apache.org/jira/browse/KAFKA-14173
> Project: Kafka
>  Issue Type: Bug
>  Components: streams-test-utils
>Affects Versions: 2.3.1
>Reporter: Guido Josquin
>Priority: Minor
>
> I am trying to test a stream-stream join with `TopologyTestDriver`. My goal 
> is to confirm that my topology performs the following left join correctly.
> {code:java}
> bills
>   .leftJoin(payments)(
> {
>   case (billValue, null) => billValue
>   case (billValue, paymentValue) => (billValue.toInt - 
> paymentValue.toInt).toString
> },
> JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMillis(100))
>   )
>   .to("debt")
> {code}
>  
> In other words, if we see a `bill` and a `payment` within 100ms, the payment 
> should be subtracted from the bill. If we do not see a payment, the debt is 
> simply the bill.
> Here is the test code.
> {code:java}
> val simpleLeftJoinTopology = new SimpleLeftJoinTopology
> val driver = new TopologyTestDriver(simpleLeftJoinTopology.topology)
> val serde = Serdes.stringSerde
> val bills = driver.createInputTopic("bills", serde.serializer, 
> serde.serializer)
> val payments = driver.createInputTopic("payments", serde.serializer, 
> serde.serializer)
> val debt = driver.createOutputTopic("debt", serde.deserializer, 
> serde.deserializer)
> bills.pipeInput("fred", "100")
> bills.pipeInput("george", "20")
> payments.pipeInput("fred", "95")
> // When in doubt, sleep twice
> driver.advanceWallClockTime(Duration.ofMillis(500))
> Thread.sleep(500)
> // Send a new record to cause the previous window to be closed
> payments.pipeInput("percy", "0")
> val keyValues = debt.readKeyValuesToList()
> keyValues should contain theSameElementsAs Seq(
>   // This record is present
>   new KeyValue[String, String]("fred", "5"),
>   // This record is missing
>   new KeyValue[String, String]("george", "20")
> )
> {code}
> Full code available at [https://github.com/Oduig/kstreams-left-join-example]
> It seems that advancing the wall clock time, sleeping, or sending an extra 
> record, never triggers the join condition when data only arrives on the left 
> side. It is possible to circumvent this by passing an explicit event time 
> with each test record. (See 
> https://stackoverflow.com/questions/73443812/using-kafka-streams-topologytestdriver-how-to-test-left-join-between-two-strea/73540161#73540161)
>  
> However, the behavior deviates from a real Kafka broker. With a real broker, 
> if we do not send an event, it uses the wall clock time of the broker 
> instead. The behavior under test should be the same: 
> `driver.advanceWallClockTime` should provide the default time to be used for 
> `TestTopic.pipeInput`, when no other time is specified.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14173) TopologyTestDriver does not use mock wall clock time when sending test records

2023-05-23 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17725356#comment-17725356
 ] 

Matthias J. Sax commented on KAFKA-14173:
-

Just discovering this ticket.

I guess you would need to use `TestInputTopic#advanceTime` for this case?

Closing the ticket as "not an issue", as the API is there. Feel free to follow 
up.
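
For reference, a fragment sketching the workaround against the ticket's own
setup (Java here rather than the ticket's Scala; `driver` and the topic come
from the ticket's snippet -- `advanceTime` moves the topic's record timestamps,
which is what stream-stream join windows actually key off):

{code:java}
// sketch: record time, not wall-clock time, closes stream-stream join windows
TestInputTopic<String, String> payments = driver.createInputTopic(
    "payments", new StringSerializer(), new StringSerializer());

payments.pipeInput("fred", "95");              // stamped with the topic's current time
payments.advanceTime(Duration.ofMillis(500));  // push record time past the 100ms window
payments.pipeInput("percy", "0");              // this record now closes the earlier window
{code}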

> TopologyTestDriver does not use mock wall clock time when sending test records
> --
>
> Key: KAFKA-14173
> URL: https://issues.apache.org/jira/browse/KAFKA-14173
> Project: Kafka
>  Issue Type: Bug
>  Components: streams-test-utils
>Affects Versions: 2.3.1
>Reporter: Guido Josquin
>Priority: Minor
>
> I am trying to test a stream-stream join with `TopologyTestDriver`. My goal 
> is to confirm that my topology performs the following left join correctly.
> {code:java}
> bills
>   .leftJoin(payments)(
> {
>   case (billValue, null) => billValue
>   case (billValue, paymentValue) => (billValue.toInt - 
> paymentValue.toInt).toString
> },
> JoinWindows.ofTimeDifferenceWithNoGrace(Duration.ofMillis(100))
>   )
>   .to("debt")
> {code}
>  
> In other words, if we see a `bill` and a `payment` within 100ms, the payment 
> should be subtracted from the bill. If we do not see a payment, the debt is 
> simply the bill.
> Here is the test code.
> {code:java}
> val simpleLeftJoinTopology = new SimpleLeftJoinTopology
> val driver = new TopologyTestDriver(simpleLeftJoinTopology.topology)
> val serde = Serdes.stringSerde
> val bills = driver.createInputTopic("bills", serde.serializer, 
> serde.serializer)
> val payments = driver.createInputTopic("payments", serde.serializer, 
> serde.serializer)
> val debt = driver.createOutputTopic("debt", serde.deserializer, 
> serde.deserializer)
> bills.pipeInput("fred", "100")
> bills.pipeInput("george", "20")
> payments.pipeInput("fred", "95")
> // When in doubt, sleep twice
> driver.advanceWallClockTime(Duration.ofMillis(500))
> Thread.sleep(500)
> // Send a new record to cause the previous window to be closed
> payments.pipeInput("percy", "0")
> val keyValues = debt.readKeyValuesToList()
> keyValues should contain theSameElementsAs Seq(
>   // This record is present
>   new KeyValue[String, String]("fred", "5"),
>   // This record is missing
>   new KeyValue[String, String]("george", "20")
> )
> {code}
> Full code available at [https://github.com/Oduig/kstreams-left-join-example]
> It seems that advancing the wall clock time, sleeping, or sending an extra 
> record, never triggers the join condition when data only arrives on the left 
> side. It is possible to circumvent this by passing an explicit event time 
> with each test record. (See 
> https://stackoverflow.com/questions/73443812/using-kafka-streams-topologytestdriver-how-to-test-left-join-between-two-strea/73540161#73540161)
>  
> However, the behavior deviates from a real Kafka broker. With a real broker, 
> if we do not send an event, it uses the wall clock time of the broker 
> instead. The behavior under test should be the same: 
> `driver.advanceWallClockTime` should provide the default time to be used for 
> `TestTopic.pipeInput`, when no other time is specified.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-10575) StateRestoreListener#onRestoreEnd should always be triggered

2023-05-23 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-10575:

Labels: kip  (was: )

> StateRestoreListener#onRestoreEnd should always be triggered
> 
>
> Key: KAFKA-10575
> URL: https://issues.apache.org/jira/browse/KAFKA-10575
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
>  Labels: kip
> Fix For: 3.5.0
>
>
> Part of KIP-869: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-869%3A+Improve+Streams+State+Restoration+Visibility]
> Today we only trigger `StateRestoreListener#onRestoreEnd` when we complete 
> the restoration of an active task and transit it to the running state. 
> However the restoration can also be stopped when the restoring task gets 
> closed (because it gets migrated to another client, for example). We should 
> also trigger the callback indicating its progress when the restoration 
> stopped in any scenarios.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-10575) StateRestoreListener#onRestoreEnd should always be triggered

2023-05-23 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-10575:

Description: 
Part of KIP-869: 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-869%3A+Improve+Streams+State+Restoration+Visibility]

Today we only trigger `StateRestoreListener#onRestoreEnd` when we complete the 
restoration of an active task and transit it to the running state. However the 
restoration can also be stopped when the restoring task gets closed (because it 
gets migrated to another client, for example). We should also trigger the 
callback indicating its progress when the restoration stopped in any scenarios.

  was:Today we only trigger `StateRestoreListener#onRestoreEnd` when we 
complete the restoration of an active task and transit it to the running state. 
However the restoration can also be stopped when the restoring task gets closed 
(because it gets migrated to another client, for example). We should also 
trigger the callback indicating its progress when the restoration stopped in 
any scenarios.
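
(For context, a bare-bones listener sketch; the methods below are the existing
public API, and this ticket/KIP is about guaranteeing the end-of-restore
callback always fires:)

{code:java}
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.streams.processor.StateRestoreListener;

public class LoggingRestoreListener implements StateRestoreListener {
    @Override
    public void onRestoreStart(TopicPartition tp, String store, long startOffset, long endOffset) {
        System.out.printf("restore start %s %s: %d..%d%n", store, tp, startOffset, endOffset);
    }

    @Override
    public void onBatchRestored(TopicPartition tp, String store, long batchEndOffset, long numRestored) {
        System.out.printf("restored %d records for %s %s%n", numRestored, store, tp);
    }

    @Override
    public void onRestoreEnd(TopicPartition tp, String store, long totalRestored) {
        // per this ticket, this should fire even if restoration is stopped early
        System.out.printf("restore end %s %s: %d records%n", store, tp, totalRestored);
    }
}
// registered via: kafkaStreams.setGlobalStateRestoreListener(new LoggingRestoreListener());
{code}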


> StateRestoreListener#onRestoreEnd should always be triggered
> 
>
> Key: KAFKA-10575
> URL: https://issues.apache.org/jira/browse/KAFKA-10575
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
> Fix For: 3.5.0
>
>
> Part of KIP-869: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-869%3A+Improve+Streams+State+Restoration+Visibility]
> Today we only trigger `StateRestoreListener#onRestoreEnd` when we complete 
> the restoration of an active task and transit it to the running state. 
> However the restoration can also be stopped when the restoring task gets 
> closed (because it gets migrated to another client, for example). We should 
> also trigger the callback indicating its progress when the restoration 
> stopped in any scenarios.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-10575) StateRestoreListener#onRestoreEnd should always be triggered

2023-05-23 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-10575.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> StateRestoreListener#onRestoreEnd should always be triggered
> 
>
> Key: KAFKA-10575
> URL: https://issues.apache.org/jira/browse/KAFKA-10575
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
> Fix For: 3.5.0
>
>
> Today we only trigger `StateRestoreListener#onRestoreEnd` when we complete 
> the restoration of an active task and transit it to the running state. 
> However the restoration can also be stopped when the restoring task gets 
> closed (because it gets migrated to another client, for example). We should 
> also trigger the callback indicating its progress when the restoration 
> stopped in any scenarios.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-10575) StateRestoreListener#onRestoreEnd should always be triggered

2023-05-23 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-10575:
---

Assignee: Guozhang Wang  (was: highluck)

> StateRestoreListener#onRestoreEnd should always be triggered
> 
>
> Key: KAFKA-10575
> URL: https://issues.apache.org/jira/browse/KAFKA-10575
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Assignee: Guozhang Wang
>Priority: Major
>
> Today we only trigger `StateRestoreListener#onRestoreEnd` when we complete 
> the restoration of an active task and transit it to the running state. 
> However the restoration can also be stopped when the restoring task gets 
> closed (because it gets migrated to another client, for example). We should 
> also trigger the callback indicating its progress when the restoration 
> stopped in any scenarios.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [VOTE] 3.4.1 RC0

2023-05-22 Thread Matthias J. Sax

Thanks a lot!

-Matthias

On 5/21/23 7:27 PM, Luke Chen wrote:

Hi Matthias,

Yes, I agree we should get this hotfix into 3.4.1.
I've backported into the 3.4 branch.
I'll create a new RC for 3.4.1.

Thanks.
Luke

On Mon, May 22, 2023 at 5:13 AM Matthias J. Sax  wrote:


Hi Luke,

RC0 for 3.4.1 includes a fix for
https://issues.apache.org/jira/browse/KAFKA-14862. We recently
discovered that the fix itself introduces a regression. We have already
a PR to fix-forward the regression:
https://github.com/apache/kafka/pull/13734

I think we should get the open PR merged, and backport it not only to 3.5,
but also to 3.4.1, and get a new RC for 3.4.1.

Thoughts?


-Matthias


On 5/19/23 6:12 AM, Josep Prat wrote:

Hi Luke,
This gets a +1 from my end. I believe non-binding because if I understand
it correctly, binding votes for releases are only issued by PMCs (


https://cwiki.apache.org/confluence/display/KAFKA/Release+Process#ReleaseProcess-Afterthevotepasses

).

I did the following validations:
- Verified signatures and checksums for all the generated artifacts
- Built from source with Java 11 and Scala 2.13.10
- Run unit tests
- Run integration tests
- Run the quickstart with Zookeeper and KRaft

Best,

On Wed, May 17, 2023 at 2:11 PM Josep Prat  wrote:


Hi Luke,

I ran the tests from the source package you created and I didn't get any
of the test failures you had on your CI build. I got other flaky tests
though, that after being run in isolation ran successfully. I'll try to run
signature validation, and some further testing later today or later this
week.

Best,

On Wed, May 17, 2023 at 12:43 PM Federico Valeri 
wrote:


Hi Luke, thanks for running the release.

Looks like the Maven artifacts are not in staging:



https://repository.apache.org/content/groups/staging/org/apache/kafka/kafka-clients/3.4.1/


Documentation still has 3.4.0, instead of 3.4.1 (not sure if this will
be aligned later):
https://kafka.apache.org/34/documentation.html#producerapi

Br
Fede


On Wed, May 17, 2023 at 5:24 AM Luke Chen  wrote:


Hello Kafka users, developers and client-developers,

This is the first candidate for release of Apache Kafka 3.4.1.

This is a bugfix release with several fixes since the release of 3.4.0. A
few of the major issues include:
- core
KAFKA-14644 <https://issues.apache.org/jira/browse/KAFKA-14644> Process
should stop after failure in raft IO thread
KAFKA-14946 <https://issues.apache.org/jira/browse/KAFKA-14946> KRaft
controller node shutting down while renouncing leadership
KAFKA-14887 <https://issues.apache.org/jira/browse/KAFKA-14887> ZK session
timeout can cause broker to shutdown
- client
KAFKA-14639 <https://issues.apache.org/jira/browse/KAFKA-14639> Kafka
CooperativeStickyAssignor revokes/assigns partition in one rebalance cycle
- connect
KAFKA-12558 <https://issues.apache.org/jira/browse/KAFKA-12558> MM2 may not
sync partition offsets correctly
KAFKA-14666 <https://issues.apache.org/jira/browse/KAFKA-14666> MM2 should
translate consumer group offsets behind replication flow
- stream
KAFKA-14172 <https://issues.apache.org/jira/browse/KAFKA-14172> bug: State
stores lose state when tasks are reassigned under EOS

Release notes for the 3.4.1 release:
https://home.apache.org/~showuon/kafka-3.4.1-rc0/RELEASE_NOTES.html

*** Please download, test and vote by May 24, 2023
Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~showuon/kafka-3.4.1-rc0/

* Maven artifacts to be voted upon:


https://repository.apache.org/content/groups/staging/org/apache/kafka/


* Javadoc:
https://home.apache.org/~showuon/kafka-3.4.1-rc0/javadoc/

* Tag to be voted upon (off 3.4 branch) is the 3.4.1 tag:
https://github.com/apache/kafka/releases/tag/3.4.1-rc0

* Documentation:
https://kafka.apache.org/34/documentation.html

* Protocol:
https://kafka.apache.org/34/protocol.html

The most recent build has had test failures. These all appear to be due to
flakiness, but it would be nice if someone more familiar with the failed
tests could confirm this. I may update this thread with passing build links
if I can get one, or start a new release vote thread if test failures must
be addressed beyond re-running builds until they pass.

Unit/integration tests:
https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.4/133/

System tests:
Will update the results later

Thank you.
Luke





--

*Josep Prat*
Open Source Engineering Director, *Aiven*
josep.p...@aiven.io   |   +491715557497
aiven.io <https://www.aiven.io>   |
<https://www.facebook.com/aivencloud>
<https://www.linkedin.com/company/aiven/>   <

https://twitter.com/aiven_io>

*Aiven Deutschland GmbH*
Alexanderufer 3-7, 10117 Berlin
Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
Amtsgericht Charlottenburg, HRB 209739 B










Re: [VOTE] 3.4.1 RC0

2023-05-21 Thread Matthias J. Sax

Hi Luke,

RC0 for 3.4.1 includes a fix for 
https://issues.apache.org/jira/browse/KAFKA-14862. We recently 
discovered that the fix itself introduces a regression. We have already 
a PR to fix-forward the regression: 
https://github.com/apache/kafka/pull/13734


I think we should get the open PR merged, and backport it not only to 3.5, 
but also to 3.4.1, and get a new RC for 3.4.1.


Thoughts?


-Matthias


On 5/19/23 6:12 AM, Josep Prat wrote:

Hi Luke,
This gets a +1 from my end. I believe non-binding because if I understand
it correctly, binding votes for releases are only issued by PMCs (
https://cwiki.apache.org/confluence/display/KAFKA/Release+Process#ReleaseProcess-Afterthevotepasses
).

I did the following validations:
- Verified signatures and checksums for all the generated artifacts
- Built from source with Java 11 and Scala 2.13.10
- Run unit tests
- Run integration tests
- Run the quickstart with Zookeeper and KRaft

Best,

On Wed, May 17, 2023 at 2:11 PM Josep Prat  wrote:


Hi Luke,

I ran the tests from the source package you created and I didn't get any
of the test failures you had on your CI build. I got other flaky tests
though, that after being run in isolation ran successfully. I'll try to run
signature validation, and some further testing later today or later this
week.

Best,

On Wed, May 17, 2023 at 12:43 PM Federico Valeri 
wrote:


Hi Luke, thanks for running the release.

Looks like the Maven artifacts are not in staging:

https://repository.apache.org/content/groups/staging/org/apache/kafka/kafka-clients/3.4.1/

Documentation still has 3.4.0, instead of 3.4.1 (not sure if this will
be aligned later):
https://kafka.apache.org/34/documentation.html#producerapi

Br
Fede


On Wed, May 17, 2023 at 5:24 AM Luke Chen  wrote:


Hello Kafka users, developers and client-developers,

This is the first candidate for release of Apache Kafka 3.4.1.

This is a bugfix release with several fixes since the release of 3.4.0. A
few of the major issues include:
- core
KAFKA-14644  Process
should stop after failure in raft IO thread
KAFKA-14946  KRaft
controller node shutting down while renouncing leadership
KAFKA-14887  ZK session
timeout can cause broker to shutdown
- client
KAFKA-14639  Kafka
CooperativeStickyAssignor revokes/assigns partition in one rebalance cycle
- connect
KAFKA-12558  MM2 may not
sync partition offsets correctly
KAFKA-14666  MM2 should
translate consumer group offsets behind replication flow
- stream
KAFKA-14172  bug: State
stores lose state when tasks are reassigned under EOS

Release notes for the 3.4.1 release:
https://home.apache.org/~showuon/kafka-3.4.1-rc0/RELEASE_NOTES.html

*** Please download, test and vote by May 24, 2023
Kafka's KEYS file containing PGP keys we use to sign the release:
https://kafka.apache.org/KEYS

* Release artifacts to be voted upon (source and binary):
https://home.apache.org/~showuon/kafka-3.4.1-rc0/

* Maven artifacts to be voted upon:
https://repository.apache.org/content/groups/staging/org/apache/kafka/

* Javadoc:
https://home.apache.org/~showuon/kafka-3.4.1-rc0/javadoc/

* Tag to be voted upon (off 3.4 branch) is the 3.4.1 tag:
https://github.com/apache/kafka/releases/tag/3.4.1-rc0

* Documentation:
https://kafka.apache.org/34/documentation.html

* Protocol:
https://kafka.apache.org/34/protocol.html

The most recent build has had test failures. These all appear to be due to
flakiness, but it would be nice if someone more familiar with the failed
tests could confirm this. I may update this thread with passing build links
if I can get one, or start a new release vote thread if test failures must
be addressed beyond re-running builds until they pass.

Unit/integration tests:
https://ci-builds.apache.org/job/Kafka/job/kafka/job/3.4/133/

System tests:
Will update the results later

Thank you.
Luke





--

*Josep Prat*
Open Source Engineering Director, *Aiven*
josep.p...@aiven.io   |   +491715557497
aiven.io    |

   
*Aiven Deutschland GmbH*
Alexanderufer 3-7, 10117 Berlin
Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
Amtsgericht Charlottenburg, HRB 209739 B






Re: [DISCUSS] Apache Kafka 3.5.0 release

2023-05-19 Thread Matthias J. Sax
pache.org/jira/browse/KAFKA-14840 (nearly done)

I just wanted to check with you before cherry-picking these to 3.5

David

On Mon, Apr 24, 2023 at 1:18 PM Mickael Maison <mickael.mai...@gmail.com> wrote:

Hi Justine,

That makes sense. Feel free to revert that commit in 3.5.

Thanks,
Mickael

On Mon, Apr 24, 2023 at 7:16 PM Mickael Maison <mickael.mai...@gmail.com> wrote:

Hi Josep,

Thanks for letting me know!

On Mon, Apr 24, 2023 at 6:58 PM Justine Olshan  wrote:

Hey Mickael,

I've just opened a blocker to revert KAFKA-14561 in 3.5. There are a few
blocker bugs that I don't think I can fix before the code freeze, so I
think for the quality of the release, we should just revert the commit.

Thanks,
Justine

On Fri, Apr 21, 2023 at 1:23 PM Josep Prat  wrote:

Hi Mickael,

Greg Harris managed to fix a flaky test in
https://github.com/apache/kafka/pull/13575, I cherry-picked it to the 3.5
(and 3.4) branch. I updated the Jira to reflect that is now fixed on 3.5.0
as well as 3.6.0.
Let me know if I forgot anything.

Best,

On Fri, Apr 21, 2023 at 3:44 PM Mickael Maison <mickael.mai...@gmail.com> wrote:

Hi,

Just a quick reminder that code freeze is next week.
We still have 27 JIRAs targeting 3.5 [0] including quite a few bugs
and flaky test issues opened recently. If you have time, take one of
these items or help with the reviews.
I'll send another update once we've entered code freeze.

0: https://issues.apache.org/jira/browse/KAFKA-13421?jql=project%20%3D%20KAFKA%20AND%20fixVersion%20%3D%203.5.0%20AND%20status%20not%20in%20(resolved%2C%20closed)%20ORDER%20BY%20priority%20DESC%2C%20status%20DESC%2C%20updated%20DESC

Thanks,
Mickael

On Thu, Apr 20, 2023 at 9:14 PM Mickael Maison <mickael.mai...@gmail.com> wrote:

Hi Ron,

Yes feel free to merge that fix. Thanks for letting me know!

Mickael

On Thu, Apr 20, 2023 at 8:15 PM Ron Dagostino <rndg...@gmail.com> wrote:

Hi Mickael.  I would like to merge
https://github.com/apache/kafka/pull/13532 (KAFKA-14887: No shutdown
for ZK session expiration in feature processing) to the 3.5 branch.
It is a very small and focused fix that can cause unexpected broker
shutdowns when there is instability in the connectivity to ZooKeeper.
The risk is very low.

Ron

On Tue, Apr 18, 2023 at 9:57 AM Mickael Maison <mickael.mai...@gmail.com> wrote:

Hi David,

Thanks for the update. I've marked KAFKA-14869 as fixed in 3.5.0, I
guess you'll only resolve this ticket once you merge the backports to
earlier branches. The ticket will have to be resolved to run the
release but that should leave you enough time.

Thanks,
Mickael

On Tue, Apr 18, 2023 at 3:42 PM David Jacot  wrote:

Hi Mickael,

FYI - I just merged the two PRs for KIP-915 to trunk/3.5. We are all good.

Cheers,
David

On Mon, Apr 17, 2023 at 5:10 PM Mickael Maison <mickael.mai...@gmail.com> wrote:

Hi Chris,

I was looking at that just now! As you said, the PRs merged provide
some functionality so I think it's fine to deliver the KIP across 2
releases.
I left a comment in https://issues.apache.org/jira/browse/KAFKA-14876
to document what's in 3.5.

Thanks,
Mickael

On Mon, Apr 17, 2023 at 5:05 PM Chris Egerton  wrote:

Hi Mickael,

It looks like we missed the feature freeze cutoff for part but not all of
KIP-875 [1]. The features we've been able to merge so far are the new
STOPPED state for connectors [2] and the API for reading offsets [3]. The
features we have not been able to merge yet are the APIs for altering and
resetting offsets.
The already-merged features are useful on their own and I believe it
should be acceptable to release them separately from the not-yet-merged
ones, but wanted to double-check with you that it's alright to split this
KIP across two or more releases, starting with 3.5.0.

Cheers,

Chris

[1] - https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect
[2] - https://github.com/apache/kafka/pull/13424
[3] - https://github.com/apache/kafka/pull/13434

On Fri, Apr 14, 2023 at 10:13 AM Matthias J. Sax <mj...@apache.org> wrote:

Thanks a lot!

On 4/14/23 5:32 AM, Mickael Maison wrote:

Hi Matthias,

I merged the PR before cutting the 3.5 branch.

Thanks,
Mickael

On Fri, Apr 14, 2023 at 2:31 PM Mickael Maison <mickael.mai...@gmail.com> wrote:

Hi David,

I've created the 3.5 branch. Feel free to cherry pick these 2 commits
when they are ready.

Thanks,
Mickael

On Fri, Apr 14, 2023 at 11:23 AM Satish Duggana <satish.dugg...@gmail.com> wrote:

Thanks Luke for helping with the reviews and adding a few tests in
a couple of PRs.

Hi Mickae

Re: Query regarding implementation of KStreams with Hbase

2023-05-12 Thread Matthias J. Sax
Kafka Streams is designed to read and write from a broker cluster. It's 
not designed to write data to a different system like HBase.


If you want to get data from Kafka to HBase, you should use Kafka Connect.

Of course, it's possible (but not recommended) to implement your own 
`Processor` and do whatever you want with the data inside Kafka Streams.
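
If you do go the (not recommended) custom Processor route, a rough sketch
could look like the following -- the table name, column family, and error
handling are made up, and the HBase client calls should be checked against
the client version you use:

{code:java}
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;

// hypothetical terminal processor that writes every record into HBase
public class HBaseSinkProcessor implements Processor<String, String, Void, Void> {
    private Connection connection;
    private Table table;

    @Override
    public void init(final ProcessorContext<Void, Void> context) {
        try {
            connection = ConnectionFactory.createConnection(); // reads hbase-site.xml from the classpath
            table = connection.getTable(TableName.valueOf("my_table")); // made-up table name
        } catch (final Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void process(final Record<String, String> record) {
        try {
            final Put put = new Put(Bytes.toBytes(record.key()));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"), Bytes.toBytes(record.value()));
            table.put(put);
        } catch (final Exception e) {
            throw new RuntimeException(e); // no retries/batching here -- Connect handles that for you
        }
    }

    @Override
    public void close() {
        try {
            table.close();
            connection.close();
        } catch (final Exception ignored) {
        }
    }
}
{code}

You would wire it in with something like `stream.process(HBaseSinkProcessor::new)`,
but again, Kafka Connect gives you retries, batching, and offset tracking out
of the box, which this sketch does not.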


HTH.

-Matthias

On 5/11/23 10:38 AM, Rohit M wrote:

Hi team,

There is a lot on the internet where KStreams read data from a topic, perform some
transformation and write it back to a topic. But I wonder if writing to an HBase
table is possible with KStreams? What I mean by this is that we should
read data from a topic using KStreams, perform some operations like we do on
a dataframe, and then write it to an HBase table.
I didn't find any resources on the internet implementing KStreams with HBase. I
would be glad if I could get some help with a piece of code, in Scala
preferably, to read from a topic or even an HBase table using a KStreams
application, perform some transformation, and write it to an HBase table.


Regards
Rohit M



Re: Some questions on Kafka on order of messages with mutiple partitions

2023-05-12 Thread Matthias J. Sax

Does having  9 partitions with 9 replication factors make sense here?


A replication factor of 9 sounds very high. For production, a replication 
factor of 3 is recommended.


How many partitions you want/need is a different question, and cannot be 
answered in a general way.



"Yes" to all other questions.


-Matthias



On 5/12/23 9:50 AM, Mich Talebzadeh wrote:

Hi,

I have used Apache Kafka in conjunction with Spark as a messaging 
source. This rather dated diagram describes it


I have two physical hosts each 64 GB, running RHES 7.6, these are called 
rhes75 and rhes76 respectively. The Zookeeper version is 3.7.1 and kafka 
version is 3.4.0



I have a topic md -> MarketData that has been defined as below

kafka-topics.sh --create --bootstrap-server 
rhes75:9092,rhes75:9093,rhes75:9094,rhes76:9092,rhes76:9093,rhes76:9094,rhes76:9095,rhes76:9096, rhes76:9097 --replication-factor 9 --partitions 9 --topic md


kafka-topics.sh --describe --bootstrap-server 
rhes75:9092,rhes75:9093,rhes75:9094,rhes76:9092,rhes76:9093,rhes76:9094,rhes76:9095,rhes76:9096, rhes76:9097 --topic md



This is working fine

Topic: md       TopicId: UfQly87bQPCbVKoH-PQheg PartitionCount: 9   
ReplicationFactor: 9    Configs: segment.bytes=1073741824
         Topic: md       Partition: 0    Leader: 12      Replicas: 
12,10,8,2,9,11,1,7,3  Isr: 10,1,9,2,12,7,3,11,8
         Topic: md       Partition: 1    Leader: 9       Replicas: 
9,8,2,12,11,1,7,3,10  Isr: 10,1,9,2,12,7,3,11,8
         Topic: md       Partition: 2    Leader: 11      Replicas: 
11,2,12,9,1,7,3,10,8  Isr: 10,1,9,2,12,7,3,11,8
         Topic: md       Partition: 3    Leader: 1       Replicas: 
1,12,9,11,7,3,10,8,2  Isr: 10,1,9,2,12,7,3,11,8
         Topic: md       Partition: 4    Leader: 7       Replicas: 
7,9,11,1,3,10,8,2,12  Isr: 10,1,9,2,12,7,3,11,8
         Topic: md       Partition: 5    Leader: 3       Replicas: 
3,11,1,7,10,8,2,12,9  Isr: 10,1,9,2,12,7,3,11,8
         Topic: md       Partition: 6    Leader: 10      Replicas: 
10,1,7,3,8,2,12,9,11  Isr: 10,1,9,2,12,7,3,11,8
         Topic: md       Partition: 7    Leader: 8       Replicas: 
8,7,3,10,2,12,9,11,1  Isr: 10,1,9,2,12,7,3,11,8
         Topic: md       Partition: 8    Leader: 2       Replicas: 
2,3,10,8,12,9,11,1,7  Isr: 10,1,9,2,12,7,3,11,8


However, I have a number of questions

 1. Does having 9 partitions with 9 replication factors make sense here?
 2. As I understand it, the parallelism is equal to the number of
partitions for a topic.
 3. Kafka only provides a total order over messages *within a
partition*, not between different partitions in a topic, and in
this case I have one topic.
 4. Data within a partition will be stored in the order in which it is
written; therefore, data read from a partition will be read in order
for that partition?
 5. Finally, if I want to get messages in order across all 9
partitions, then I need to group messages with a key, so that
messages with the same key go to the same partition, and within that
partition the messages are ordered.

Thanks


*Disclaimer:* Use it at your own risk. Any and all responsibility for any 
loss, damage or destruction of data or any other property which may 
arise from relying on this email's technical content is explicitly 
disclaimed. The author will in no case be liable for any monetary 
damages arising from such loss, damage or destruction.




[jira] [Commented] (KAFKA-13349) Allow Iterator.remove on KeyValueIterator

2023-05-11 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-13349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722005#comment-17722005
 ] 

Matthias J. Sax commented on KAFKA-13349:
-

Yes, we want to add `remove()` to interface 
[https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/KeyValueIterator.java]
 and thus all implementations will need to support it.
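
To make that concrete, a hypothetical wrapper sketch (not the actual patch)
showing how remove() could be implemented on top of a store's delete():

{code:java}
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.KeyValueStore;

// hypothetical: delegate remove() to store.delete() on the last-returned key
class RemovableIterator<K, V> implements KeyValueIterator<K, V> {
    private final KeyValueIterator<K, V> inner;
    private final KeyValueStore<K, V> store;
    private K lastKey;

    RemovableIterator(KeyValueIterator<K, V> inner, KeyValueStore<K, V> store) {
        this.inner = inner;
        this.store = store;
    }

    @Override public boolean hasNext() { return inner.hasNext(); }

    @Override public KeyValue<K, V> next() {
        KeyValue<K, V> kv = inner.next();
        lastKey = kv.key;
        return kv;
    }

    @Override public void remove() {
        // RocksDB iterators read from a snapshot, so a concurrent delete is safe
        store.delete(lastKey);
    }

    @Override public K peekNextKey() { return inner.peekNextKey(); }
    @Override public void close() { inner.close(); }
}
{code}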

> Allow Iterator.remove on KeyValueIterator
> -
>
> Key: KAFKA-13349
> URL: https://issues.apache.org/jira/browse/KAFKA-13349
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Guozhang Wang
>Priority: Major
>  Labels: needs-kip, newbie++
>
> Today Stream's state store's range iterator does not support `remove`. We 
> could consider adding such support for all the built-in state stores:
> * RocksDB's native iterator does not support removal, but we can always do a 
> delete(key) concurrently while the iterator is open on the snapshot.
> * In-Memory: straightforward implementation.
> The benefit of that is then for range-and-delete truncation operation we do 
> not necessarily have to be cautious about concurrent modification exceptions. 
> This could also help GC with in-memory stores.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14911) Add system tests for rolling upgrade path of KIP-904

2023-05-11 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722004#comment-17722004
 ] 

Matthias J. Sax commented on KAFKA-14911:
-

No worries. And thanks for helping on reviewing! Equally important.

> Add system tests for rolling upgrade path of KIP-904
> 
>
> Key: KAFKA-14911
> URL: https://issues.apache.org/jira/browse/KAFKA-14911
> Project: Kafka
>  Issue Type: Test
>Reporter: Farooq Qaiser
>Assignee: Victoria Xia
>Priority: Major
> Fix For: 3.5.0
>
>
> As per [~mjsax] comment 
> [here|https://github.com/apache/kafka/pull/10747#pullrequestreview-1376539752],
>  we should add a system test to test the rolling upgrade path for 
> [KIP-904|https://cwiki.apache.org/confluence/x/P5VbDg] which introduces a new 
> serialization format for groupBy internal repartition topics and was 
> implemented as part of https://issues.apache.org/jira/browse/KAFKA-12446 
> There is `StreamsUpgradeTest.java` and `streams_upgrade_test.py` (cf 
> `test_rolling_upgrade_with_2_bounces`) as a starting point.
> Might be best to do a similar thing as for FK-joins, and add a new test 
> variation. 
> The tricky thing about the test would be, to ensure that the repartition 
> topic is not empty when we do the bounce, so the test should be setup 
> accordingly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14981) Set `group.instance.id` in streams consumer so that rebalance will not happen if a instance is restarted

2023-05-11 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17722003#comment-17722003
 ] 

Matthias J. Sax commented on KAFKA-14981:
-

I was not aware that there were (or maybe still are) issues. Are there any 
tickets for it?

> Set `group.instance.id` in streams consumer so that rebalance will not happen 
> if a instance is restarted
> 
>
> Key: KAFKA-14981
> URL: https://issues.apache.org/jira/browse/KAFKA-14981
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Hao Li
>Priority: Minor
>
> `group.instance.id` enables static membership so that if a consumer is 
> restarted within `session.timeout.ms`, a rebalance will not be triggered and 
> the original assignment can be returned directly from the broker. We can set this 
> id in Kafka Streams using `threadId` so that no rebalance is triggered within 
> `session.timeout.ms`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: apply for permission to contribute to Apache Kafka

2023-05-10 Thread Matthias J. Sax
I just checked permissions and you should be all set. Did you try to log 
out and log in again?


-Matthias

On 5/9/23 10:04 PM, Doe John wrote:

Thanks,

After obtaining permission, I want to assign this JIRA ticket 
 to myself, but there's no 「Assign」 button for me.

Is there any problem here?

Best Regards,
John Doe



Luke Chen <show...@gmail.com> wrote on Wednesday, May 10, 2023 at 01:04:


Done.

Thanks.
Luke

On Sat, May 6, 2023 at 9:38 PM Doe John <zh2725284...@gmail.com> wrote:

 > my Jira ID: johndoe
 >
 > on email zh2725284...@gmail.com 
 >
 > Thanks!
 >



Re: Question ❓

2023-05-10 Thread Matthias J. Sax

Partitions are not for different users.

If you want to isolate users, you would do it at the topic level. You 
could use ACLs to grant access to different topics: 
https://kafka.apache.org/documentation/#security_authz
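
For example, a small Admin-client sketch that grants one user read access to
"their" topic (the principal, topic, and broker address are placeholders):

{code:java}
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.acl.AccessControlEntry;
import org.apache.kafka.common.acl.AclBinding;
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.acl.AclPermissionType;
import org.apache.kafka.common.resource.PatternType;
import org.apache.kafka.common.resource.ResourcePattern;
import org.apache.kafka.common.resource.ResourceType;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");

try (Admin admin = Admin.create(props)) {
    // allow principal User:alice to read only from "alice-topic"
    AclBinding allowRead = new AclBinding(
        new ResourcePattern(ResourceType.TOPIC, "alice-topic", PatternType.LITERAL),
        new AccessControlEntry("User:alice", "*", AclOperation.READ, AclPermissionType.ALLOW));
    admin.createAcls(List.of(allowRead)).all().get();
}
{code}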



-Matthias

On 5/9/23 11:11 AM, влад тасканов wrote:


Hi. I recently started studying Kafka and have a question. Is it possible 
for each user to have a separate queue? As I understand it, there is a broker 
with different topics, and each topic has a number of partitions = the number 
of users. If yes, can you link to an example or explanation? Google didn't 
help me.


[jira] [Commented] (KAFKA-14981) Set `group.instance.id` in streams consumer so that rebalance will not happen if a instance is restarted

2023-05-09 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17721114#comment-17721114
 ] 

Matthias J. Sax commented on KAFKA-14981:
-

Very interesting idea – given that we persist the thread-id (aka process-id) in 
the state directory on local disk, it could help. And even if we don't persist 
it (because there is no local storage), it seems no harm would be done if the 
id changes every single time.

Wondering if we would need a KIP for this. My gut feeling is no, but I'm not sure.
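
For reference, what setting it manually looks like today (sketch; if I remember
correctly Streams suffixes the configured value per thread, but treat that
detail as unverified):

{code:java}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// static membership: a restart within session.timeout.ms triggers no rebalance
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.GROUP_INSTANCE_ID_CONFIG), "pod-0");
props.put(StreamsConfig.consumerPrefix(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG), 30000);
{code}

The ticket's idea would essentially derive the "pod-0" part automatically from
the persisted process/thread id.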

> Set `group.instance.id` in streams consumer so that rebalance will not happen 
> if a instance is restarted
> 
>
> Key: KAFKA-14981
> URL: https://issues.apache.org/jira/browse/KAFKA-14981
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Hao Li
>Priority: Minor
>
> `group.instance.id` enables static membership so that if a consumer is 
> restarted within `session.timeout.ms`, a rebalance will not be triggered and 
> the original assignment can be returned directly from the broker. We can set this 
> id in Kafka Streams using `threadId` so that no rebalance is triggered within 
> `session.timeout.ms`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-923: Add A Grace Period to Stream Table Join

2023-05-08 Thread Matthias J. Sax

Thanks for the KIP! Also some question/comments from my side:

10) Notation: you use the term "late data" but I think you mean 
out-of-order. We reserve the term "late" to records that arrive after 
grace period passed, and thus, "late == out-of-order data that is dropped".



20) "There is only one option from the stream side and only recently is 
there a second option on the table side."


What are those options? Victoria already asked about the table side, but 
I am also not sure what option you mean for the stream side?



30) "If the table side uses a materialized version store the value is 
the latest by stream time rather than by offset within its defined grace 
period."


The phrase "the value is the latest by stream time" is confusing -- in 
the end, a versioned store keeps multiple versions, not just one.



40) I am also wondering about ordering. In general, KS tries to preserve 
offset-order during processing (with some exception, when offset order 
preservation is not clearly defined). Given that the stream-side buffer 
is really just a "linear buffer", we could easily preserve offset-order. 
But I also see a benefit of re-ordering and emitting out-of-order data 
right away when read (instead of blocking them behind in-order records 
that are not ready yet). -- It might even be a possibility, to let users 
pick a emit strategy eg "EmitStrategy.preserveOffsets" (name just a 
placeholder).


The KIP should explain this in more detail and also discuss different 
options and mention them in "Rejected alternatives" in case we don't 
want to include them.



50) What happens when users change the grace period? Especially, when 
they turn it on/off (but also increasing/decreasing is an interesting 
point)? I think we should try to support this if possible; the 
"Compatibility" section needs to cover switching on/off in more detail.



-Matthias




On 5/2/23 2:06 PM, Victoria Xia wrote:

Cool KIP, Walker! Thanks for sharing this proposal.

A few clarifications:

1. Is the order that records exit the buffer in necessarily the same as the
order that records enter the buffer in, or no? Based on the description in
the KIP, it sounds like the answer is no, i.e., records will exit the
buffer in increasing timestamp order, which means that they may be reordered
(even for the same key) compared to the input order.

2. What happens if the join grace period is nonzero, and a stream-side
record arrives with a timestamp that is older than the current stream time
minus the grace period? Will this record trigger a join result, or will it
be dropped? Based on the description for what happens when the join grace
period is set to zero, it sounds like the late record will be dropped, even
if the join grace period is nonzero. Is that true?

3. What could cause stream time to advance, for purposes of removing
records from the join buffer? For example, will new records arriving on the
table side of the join cause stream time to advance? From the KIP it sounds
like only stream-side records will advance stream time -- does that mean
that the join processor itself will have to track this stream time?

Also +1 to Lucas's question about what options will be available for
configuring the join buffer. Will users have the option to choose whether
they want the buffer to be in-memory vs persistent?

- Victoria

On Fri, Apr 28, 2023 at 11:54 AM Lucas Brutschy
 wrote:


Hi Walker,

thanks for the KIP! We definitely need this. I have two questions:

  - Have you considered allowing the customization of the underlying
buffer implementation? As far as I can see, `StreamJoined` lets you customize
the underlying store via a `WindowStoreSupplier`. Would it make sense
for `Joined` to have this as well? I can imagine one may want to limit
the number of records in the buffer, for example. If we hit the
maximum, the only option would be to drop semantic guarantees, but
users may still want to do this.
  - With "second option on the table side" you are referring to
versioned tables, right? Will the buffer on the stream side behave any
differently depending on whether the table side is versioned or not?

Finally, I think a simple example in the motivation section could help
non-experts understand the KIP.

Best,
Lucas

On Tue, Apr 25, 2023 at 9:13 PM Walker Carlson
 wrote:


Hello everybody,

I have a stream proposal to improve the stream table join by adding a
grace period and buffer to the stream side of the join to allow processing
in timestamp order, matching the recent improvements of the versioned tables.

Please take a look here 
and share your thoughts.

best,
Walker
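
As a rough illustration of the proposal, here is a minimal sketch of how the 
grace period might surface in the DSL. The `withGracePeriod` option on 
`Joined` is an assumption based on the discussion above, not a finalized API.

{code:java}
import java.time.Duration;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Joined;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class StreamTableGraceJoinSketch {
    public static void main(final String[] args) {
        final StreamsBuilder builder = new StreamsBuilder();
        final KStream<String, String> orders = builder.stream(
            "orders", Consumed.with(Serdes.String(), Serdes.String()));
        final KTable<String, String> customers = builder.table(
            "customers", Consumed.with(Serdes.String(), Serdes.String()));

        // Buffer stream-side records for up to 5 minutes so they can be
        // joined in timestamp order against the (versioned) table.
        final KStream<String, String> enriched = orders.join(
            customers,
            (order, customer) -> order + "/" + customer,
            Joined.with(Serdes.String(), Serdes.String(), Serdes.String())
                  .withGracePeriod(Duration.ofMinutes(5))); // assumed API

        enriched.to("enriched-orders", Produced.with(Serdes.String(), Serdes.String()));
    }
}
{code}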






[jira] [Commented] (KAFKA-14957) Default value for state.dir is confusing

2023-05-03 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17719079#comment-17719079
 ] 

Matthias J. Sax commented on KAFKA-14957:
-

Ah. Thanks.

That's gonna be nasty to fix... This part of the docs is generated from the 
code, so what ends up in the docs depends on the platform that does the 
build. It would require a larger change to generate it differently...

> Default value for state.dir is confusing
> 
>
> Key: KAFKA-14957
> URL: https://issues.apache.org/jira/browse/KAFKA-14957
> Project: Kafka
>  Issue Type: Bug
>  Components: docs, streams
>Reporter: Mickael Maison
>Priority: Minor
>  Labels: beginner, newbie
>
> The default value for state.dir is documented as 
> /var/folders/0t/68svdzmx1sld0mxjl8dgmmzmgq/T//kafka-streams
> This is misleading, the value will be different in each environment as it is 
> computed using System.getProperty("java.io.tmpdir"). We should update the 
> description to mention how the path is computed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
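
For illustration, a minimal sketch of how the documented default arises; this 
mirrors the `java.io.tmpdir` lookup described in the ticket, not the exact 
Kafka source.

{code:java}
import java.io.File;

public class StateDirDefaultSketch {
    public static void main(final String[] args) {
        // On macOS, java.io.tmpdir resolves to something like
        // /var/folders/.../T/, which is how the confusing documented
        // default arises on the machine that builds the docs.
        final String stateDir =
            System.getProperty("java.io.tmpdir") + File.separator + "kafka-streams";
        System.out.println(stateDir);
    }
}
{code}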


[jira] [Updated] (KAFKA-14957) Default value for state.dir is confusing

2023-05-02 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14957:

Component/s: docs

> Default value for state.dir is confusing
> 
>
> Key: KAFKA-14957
> URL: https://issues.apache.org/jira/browse/KAFKA-14957
> Project: Kafka
>  Issue Type: Bug
>  Components: docs, streams
>Reporter: Mickael Maison
>Priority: Minor
>
> The default value for state.dir is documented as 
> /var/folders/0t/68svdzmx1sld0mxjl8dgmmzmgq/T//kafka-streams
> This is misleading, the value will be different in each environment as it is 
> computed using System.getProperty("java.io.tmpdir"). We should update the 
> description to mention how the path is computed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14957) Default value for state.dir is confusing

2023-05-02 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14957:

Labels: beginner newbie  (was: )

> Default value for state.dir is confusing
> 
>
> Key: KAFKA-14957
> URL: https://issues.apache.org/jira/browse/KAFKA-14957
> Project: Kafka
>  Issue Type: Bug
>  Components: docs, streams
>Reporter: Mickael Maison
>Priority: Minor
>  Labels: beginner, newbie
>
> The default value for state.dir is documented as 
> /var/folders/0t/68svdzmx1sld0mxjl8dgmmzmgq/T//kafka-streams
> This is misleading, the value will be different in each environment as it is 
> computed using System.getProperty("java.io.tmpdir"). We should update the 
> description to mention how the path is computed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14957) Default value for state.dir is confusing

2023-05-02 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14957:

Priority: Minor  (was: Major)

> Default value for state.dir is confusing
> 
>
> Key: KAFKA-14957
> URL: https://issues.apache.org/jira/browse/KAFKA-14957
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Mickael Maison
>Priority: Minor
>
> The default value for state.dir is documented as 
> /var/folders/0t/68svdzmx1sld0mxjl8dgmmzmgq/T//kafka-streams
> This is misleading, the value will be different in each environment as it is 
> computed using System.getProperty("java.io.tmpdir"). We should update the 
> description to mention how the path is computed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14957) Default value for state.dir is confusing

2023-05-02 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17718759#comment-17718759
 ] 

Matthias J. Sax commented on KAFKA-14957:
-

[~mimaison] – Thanks.

What part of the docs are you referring to?

> Default value for state.dir is confusing
> 
>
> Key: KAFKA-14957
> URL: https://issues.apache.org/jira/browse/KAFKA-14957
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Mickael Maison
>Priority: Major
>
> The default value for state.dir is documented as 
> /var/folders/0t/68svdzmx1sld0mxjl8dgmmzmgq/T//kafka-streams
> This is misleading, the value will be different in each environment as it is 
> computed using System.getProperty("java.io.tmpdir"). We should update the 
> description to mention how the path is computed.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Adding non-committers as Github collaborators

2023-04-28 Thread Matthias J. Sax

In general I am +1

The only question I have is about


You may only have 20 active collaborators at any given time per repository.


Not sure if this is a concern or not? I would assume not, but wanted to 
bring it to everyone's attention.


There is actually also a way to allow people to re-trigger Jenkins jobs: 
https://github.com/apache/kafka/pull/13578


Retriggering tests is a little bit more sensitive, as our resources are 
limited, and we should avoid overwhelming Jenkins even more.



-Matthias


On 4/27/23 11:45 AM, David Arthur wrote:

Hey folks,

I stumbled across this wiki page from the infra team that describes the
various features supported in the ".asf.yaml" file:
https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features

One section that looked particularly interesting was
https://cwiki.apache.org/confluence/display/INFRA/Git+-+.asf.yaml+features#Git.asf.yamlfeatures-AssigningexternalcollaboratorswiththetriageroleonGitHub

github:
  collaborators:
    - userA
    - userB

This would allow us to define non-committers as collaborators on the Github
project. Concretely, this means they would receive the "triage" Github role
(defined here
https://docs.github.com/en/organizations/managing-user-access-to-your-organizations-repositories/repository-roles-for-an-organization#permissions-for-each-role).
Practically, this means we could let non-committers do things like assign
labels and reviewers on Pull Requests.

I wanted to see what the committer group thought about this feature. I
think it could be useful.

Cheers,
David



[jira] [Updated] (KAFKA-14949) Add Streams upgrade tests from AK 3.4

2023-04-28 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14949:

Component/s: system tests

> Add Streams upgrade tests from AK 3.4
> -
>
> Key: KAFKA-14949
> URL: https://issues.apache.org/jira/browse/KAFKA-14949
> Project: Kafka
>  Issue Type: Task
>  Components: streams, system tests
>Reporter: Victoria Xia
>Priority: Critical
>
> Streams upgrade tests currently only test upgrading from 3.3 and earlier 
> versions 
> ([link|https://github.com/apache/kafka/blob/056657d84d84e116ffc9460872945b4d2b479ff3/tests/kafkatest/tests/streams/streams_application_upgrade_test.py#L30]).
>  We should add 3.4 as an "upgrade_from" version into these tests, in light of 
> the upcoming 3.5 release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Re-visit end of life policy

2023-04-25 Thread Matthias J. Sax

Adding the user-mailing list. Seems relevant to everybody.

On 4/20/23 2:45 AM, Divij Vaidya wrote:

Thank you Matthias for your comments.

I agree with you that the decision should be driven based on strong
community ask as it introduces a significant overhead on the maintainers. I
was hoping that more folks (users of Kafka) would contribute to this thread
with their opinion but perhaps, I need to find alternative ways to get data
about Kafka version usage in the wild. Given the effort of migrating major
versions (2.x to 3.x), I am actually surprised that we don't hear more
often from the users about the community's 12 month EOL policy.

I will get back on this thread once I have more data to support the
proposal.

--
Divij Vaidya



On Thu, Apr 20, 2023 at 3:52 AM Matthias J. Sax  wrote:


While I understand the desire, I tend to agree with Ismael.

In general, it's a significant amount of work not just to do the actual
releases, but also to cherry-pick bug fixes to older branches. Code
diverges very quickly, and a clean cherry-pick is usually only possible
for one or two branches. And it's not just simple conflicts that are
easy to resolve; it often even implies doing a full new fix if the
corresponding code was refactored, which is more often the case than one
might think.

If there is no very strong ask from the community, I would rather let
committers spend their time reviewing PRs instead and help contributors
get the work merged.

Just my 2ct.

-Matthias


On 4/13/23 2:52 PM, Ismael Juma wrote:

Clarification below.

I did not understand your point about maintenance expense to ensure

compatibility. I am confused because, IMO, irrespective of our bug fix
support duration for minor versions, we should ensure that all prior

minor

versions are compatible. Hence, increasing the support duration to 24
months will not add more expense than today to ensure compatibility.



No, I am not saying that. I am saying that there is no reason not to
upgrade from one minor release to another since we provide full
compatibility between minor releases. The expensive part is that we

release

3 times a year, so you have to support 6 releases at any given point in
time. More importantly, you have to validate all these releases, handle

any

additional bugs and so on. When it comes to the CVE stuff, you also have

to

deal with cases where a project you depend on forces an upgrade to a
release with compatibility impact and so on. Having seen this first hand,
it's a significant amount of work.

Ismael








[jira] [Assigned] (KAFKA-14911) Add system tests for rolling upgrade path of KIP-904

2023-04-25 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-14911:
---

Assignee: Victoria Xia

> Add system tests for rolling upgrade path of KIP-904
> 
>
> Key: KAFKA-14911
> URL: https://issues.apache.org/jira/browse/KAFKA-14911
> Project: Kafka
>  Issue Type: Test
>Reporter: Farooq Qaiser
>Assignee: Victoria Xia
>Priority: Major
> Fix For: 3.5.0
>
>
> As per [~mjsax] comment 
> [here|https://github.com/apache/kafka/pull/10747#pullrequestreview-1376539752],
>  we should add a system test to test the rolling upgrade path for 
> [KIP-904|https://cwiki.apache.org/confluence/x/P5VbDg] which introduces a new 
> serialization format for groupBy internal repartition topics and was 
> implemented as part of https://issues.apache.org/jira/browse/KAFKA-12446 
> There is `StreamsUpgradeTest.java` and `streams_upgrade_test.py` (cf 
> `test_rolling_upgrade_with_2_bounces`) as a starting point.
> Might be best to do a similar thing as for FK-joins, and add a new test 
> variation. 
> The tricky thing about the test would be to ensure that the repartition 
> topic is not empty when we do the bounce, so the test should be set up 
> accordingly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-14839) Exclude protected variable from JavaDocs

2023-04-25 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-14839:
---

Assignee: Atul Sharma

> Exclude protected variable from JavaDocs
> 
>
> Key: KAFKA-14839
> URL: https://issues.apache.org/jira/browse/KAFKA-14839
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation, streams
>            Reporter: Matthias J. Sax
>Assignee: Atul Sharma
>Priority: Major
>
> Cf 
> [https://kafka.apache.org/31/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#enableSpuriousResultFix]
> The variable `enableSpuriousResultFix` is protected, and it's not public API, 
> and thus should not show up in the JavaDocs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14839) Exclude protected variable from JavaDocs

2023-04-25 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17716493#comment-17716493
 ] 

Matthias J. Sax commented on KAFKA-14839:
-

Sure!

> Exclude protected variable from JavaDocs
> 
>
> Key: KAFKA-14839
> URL: https://issues.apache.org/jira/browse/KAFKA-14839
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation, streams
>            Reporter: Matthias J. Sax
>Priority: Major
>
> Cf 
> [https://kafka.apache.org/31/javadoc/org/apache/kafka/streams/kstream/JoinWindows.html#enableSpuriousResultFix]
> The variable `enableSpuriousResultFix` is protected, and it's not public API, 
> and thus should not show up in the JavaDocs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14936) Add Grace Period To Stream Table Join

2023-04-25 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14936:

Labels: kip streams  (was: streams)

> Add Grace Period To Stream Table Join
> -
>
> Key: KAFKA-14936
> URL: https://issues.apache.org/jira/browse/KAFKA-14936
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Walker Carlson
>Assignee: Walker Carlson
>Priority: Major
>  Labels: kip, streams
>
> Include the grace period for stream table joins as described in KIP-923.
> Also add a RocksDB time-based queueing implementation of 
> `TimeOrderedKeyValueBuffer`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14172) bug: State stores lose state when tasks are reassigned under EOS wit…

2023-04-25 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14172:

Fix Version/s: 3.4.1

> bug: State stores lose state when tasks are reassigned under EOS wit…
> -
>
> Key: KAFKA-14172
> URL: https://issues.apache.org/jira/browse/KAFKA-14172
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.1.1
>Reporter: Martin Hørslev
>Assignee: Guozhang Wang
>Priority: Critical
> Fix For: 3.5.0, 3.4.1
>
>
> h1. State stores lose state when tasks are reassigned under EOS with standby 
> replicas and default acceptable lag.
> I have observed that state stores used in a transform step under a Exactly 
> Once semantics ends up losing state after a rebalancing event that includes 
> reassignment of tasks to previous standby task within the acceptable standby 
> lag.
>  
> The problem is reproduceable and an integration test have been created to 
> showcase the [issue|https://github.com/apache/kafka/pull/12540]. 
> A detailed description of the observed issue is provided 
> [here|https://github.com/apache/kafka/pull/12540/files?short_path=3ca480e#diff-3ca480ef093a1faa18912e1ebc679be492b341147b96d7a85bda59911228ef45]
> Similar issues have been observed and reported to StackOverflow for example 
> [here|https://stackoverflow.com/questions/69038181/kafka-streams-aggregation-data-loss-between-instance-restarts-and-rebalances].
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14922) kafka-streams-application-reset deletes topics not belonging to specified application-id

2023-04-25 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14922:

Labels: beginner needs-kip newbie  (was: )

> kafka-streams-application-reset deletes topics not belonging to specified 
> application-id
> 
>
> Key: KAFKA-14922
> URL: https://issues.apache.org/jira/browse/KAFKA-14922
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, tools
>Affects Versions: 3.4.0
>Reporter: Jørgen
>Priority: Major
>  Labels: beginner, needs-kip, newbie
>
> Slack-thread: 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1681908267206849]
> When running the command _kafka-streams-application-reset --bootstrap-servers 
> $BOOTSTRAP --application-id foo_ all internal topics that _start with_ foo 
> are deleted. This happens even if there's no application-id named foo.
> Example:
> {code:java}
> Application IDs:
> foo-v1
> foo-v2
> Internal topics:
> foo-v1-repartition-topic-repartition
> foo-v2-repartition-topic-repartition 
> Application reset:
> kafka-streams-application-reset --bootstrap-servers $BOOTSTRAP 
> --application-id foo
> > No input or intermediate topics specified. Skipping seek.
> Deleting inferred internal topics [foo-v2-repartition-topic-repartition, 
> foo-v1-repartition-topic-repartition]
> Done.{code}
> Expected behaviour is that the command fails, as there is no application-id 
> with the name foo, instead of deleting all foo* topics. 
> This is critical on typos or if application-ids start with the same name as 
> others (for example if we had foo-v21 and wanted to reset foo-v2)
> The bug should be located here: 
> [https://github.com/apache/kafka/blob/c14f56b48461f01743146d58987bc8661ba0d459/tools/src/main/java/org/apache/kafka/tools/StreamsResetter.java#L693]
> It should check that the topic matches the application-id exactly instead of 
> checking that it starts with the application-id.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14862) Outer stream-stream join does not output all results with multiple input partitions

2023-04-24 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14862:

Affects Version/s: 3.1.0

> Outer stream-stream join does not output all results with multiple input 
> partitions
> ---
>
> Key: KAFKA-14862
> URL: https://issues.apache.org/jira/browse/KAFKA-14862
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.1.0
>Reporter: Bruno Cadonna
>    Assignee: Matthias J. Sax
>Priority: Major
> Fix For: 3.5.0, 3.4.1
>
>
> If I execute the following Streams app once with two input topics each with 1 
> partition and then with input topics each with two partitions, I get 
> different results.
>   
> {code:java}
> final KStream leftSide = builder.stream(leftSideTopic);
> final KStream rightSide = builder.stream(rightSideTopic);
> final KStream leftAndRight = leftSide.outerJoin(
> rightSide,
> (leftValue, rightValue) ->
> (rightValue == null) ? leftValue + "/NOTPRESENT": leftValue + "/" + 
> rightValue,
> JoinWindows.ofTimeDifferenceAndGrace(
> Duration.ofSeconds(20), 
> Duration.ofSeconds(10)),
> StreamJoined.with(
> Serdes.String(), /* key */
> Serdes.String(), /* left value */
> Serdes.String()  /* right value */
> ));
> leftAndRight.print(Printed.toSysOut());
> {code}
> To reproduce, produce twice the following batch of records with an interval 
> greater than window + grace period (i.e. > 30 seconds) in between the two 
> batches:
> {code}
> (0, 0)
> (1, 1)
> (2, 2)
> (3, 3)
> (4, 4)
> (5, 5)
> (6, 6)
> (7, 7)
> (8, 8)
> (9, 9)
> {code}
> With input topics with 1 partition I get:
> {code}
> [KSTREAM-PROCESSVALUES-08]: 0, 0/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 1, 1/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 2, 2/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 3, 3/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 4, 4/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 5, 5/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 6, 6/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 7, 7/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 8, 8/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 9, 9/NOTPRESENT
> {code}
> With input topics with 2 partitions I get:
> {code}
> [KSTREAM-PROCESSVALUES-08]: 1, 1/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 3, 3/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 4, 4/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 7, 7/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 8, 8/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 9, 9/NOTPRESENT
> {code}
> I would expect to get the same set of records, maybe in a different order due 
> to the partitioning.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [ANNOUNCE] New PMC chair: Mickael Maison

2023-04-21 Thread Matthias J. Sax

Congrats Mickael!

And thanks a lot for taking on this additional task! Glad to have you!


-Matthias

On 4/21/23 9:40 AM, Viktor Somogyi-Vass wrote:

Jun, thank you for all your hard work! Also, congrats Mickael, it is very
well deserved :)

Best,
Viktor

On Fri, Apr 21, 2023, 18:15 Adam Bellemare  wrote:


Thank you for all your hard work Jun - that's a decade-long legacy!
And congratulations to you Mickael!

On Fri, Apr 21, 2023 at 11:20 AM Josep Prat 
wrote:


Thanks Jun for your work as Chair all these years!
Congratulations Mickael!

Best,

———
Josep Prat

Aiven Deutschland GmbH

Alexanderufer 3-7, 10117 Berlin

Amtsgericht Charlottenburg, HRB 209739 B

Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen

m: +491715557497

w: aiven.io

e: josep.p...@aiven.io

On Fri, Apr 21, 2023, 17:10 Jun Rao  wrote:


Hi, everyone,

After more than 10 years, I am stepping down as the PMC chair of Apache
Kafka. We now have a new chair Mickael Maison, who has been a PMC

member

since 2020. I plan to continue to contribute to Apache Kafka myself.

Congratulations, Mickael!

Jun










[jira] [Commented] (KAFKA-14722) Make BooleanSerde public

2023-04-21 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17715157#comment-17715157
 ] 

Matthias J. Sax commented on KAFKA-14722:
-

I did a PR: [https://github.com/apache/kafka/pull/13577] – Just merged it.

> Make BooleanSerde public
> 
>
> Key: KAFKA-14722
> URL: https://issues.apache.org/jira/browse/KAFKA-14722
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>            Reporter: Matthias J. Sax
>Assignee: Spacrocket
>Priority: Minor
>  Labels: beginner, kip, newbie
> Fix For: 3.5.0
>
>
> KIP-907: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-907%3A+Add+Boolean+Serde+to+public+interface]
>  
> We introduce a "BooleanSerde" via 
> [https://github.com/apache/kafka/pull/13249] as internal class. We could make 
> it public.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
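
A minimal usage sketch, assuming the `Serdes.Boolean()` factory that KIP-907 
(linked above) makes public; topic names are placeholders.

{code:java}
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class BooleanSerdeSketch {
    public static void main(final String[] args) {
        final StreamsBuilder builder = new StreamsBuilder();
        // Read a topic of feature flags keyed by feature name.
        final KStream<String, Boolean> flags = builder.stream(
            "feature-flags", Consumed.with(Serdes.String(), Serdes.Boolean()));
        flags.filter((name, enabled) -> Boolean.TRUE.equals(enabled))
             .to("enabled-flags", Produced.with(Serdes.String(), Serdes.Boolean()));
    }
}
{code}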


[jira] [Commented] (KAFKA-14922) kafka-streams-application-reset deletes topics not belonging to specified application-id

2023-04-20 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714770#comment-17714770
 ] 

Matthias J. Sax commented on KAFKA-14922:
-

{quote}We could add a warning with the list of internal topics found to delete 
and ask for a confirmation to make it harder to inadvertently delete internal 
topics of other application IDs.
{quote}

There is already a `--dry-run` option, but we could of course also try adding a 
`--execute` one and flip it around... Would need a KIP of course.
 
{quote}An _improvement_ would be to only return topics that exactly contains 
the applicationId provided.{quote}
I don't believe it's possible to implement this.
 
{quote}Would not cover the case where other applicationIds starts with 
applicationId provided (foo-v1 would delete foo-v1-2 topics, etc)
{quote}
 
Both seem to be the same issue? Note: we already do 
`topicName.startsWith(options.valueOf(applicationIdOption) + "-")`, ie, we add 
the expected `-` – if your app.id uses a dash like `myApp-v1` there is nothing 
we can do about it. It provides protection for `appV1` vs `appV2`, and if you 
pass in `app` it won't match either of them, but if `-` is used inside the 
app.id, it seems there is nothing we can do about it.

> kafka-streams-application-reset deletes topics not belonging to specified 
> application-id
> 
>
> Key: KAFKA-14922
> URL: https://issues.apache.org/jira/browse/KAFKA-14922
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, tools
>Affects Versions: 3.4.0
>Reporter: Jørgen
>Priority: Major
>
> Slack-thread: 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1681908267206849]
> When running the command _kafka-streams-application-reset --bootstrap-servers 
> $BOOTSTRAP --application-id foo_ all internal topics that _start with_ foo 
> are deleted. This happens even if there's no application-id named foo.
> Example:
> {code:java}
> Application IDs:
> foo-v1
> foo-v2
> Internal topics:
> foo-v1-repartition-topic-repartition
> foo-v2-repartition-topic-repartition 
> Application reset:
> kafka-streams-application-reset --bootstrap-servers $BOOTSTRAP 
> --application-id foo
> > No input or intermediate topics specified. Skipping seek.
> Deleting inferred internal topics [foo-v2-repartition-topic-repartition, 
> foo-v1-repartition-topic-repartition]
> Done.{code}
> Expected behaviour is that the command fails, as there is no application-id 
> with the name foo, instead of deleting all foo* topics. 
> This is critical on typos or if application-ids start with the same name as 
> others (for example if we had foo-v21 and wanted to reset foo-v2)
> The bug should be located here: 
> [https://github.com/apache/kafka/blob/c14f56b48461f01743146d58987bc8661ba0d459/tools/src/main/java/org/apache/kafka/tools/StreamsResetter.java#L693]
> It should check that the topic matches the application-id exactly instead of 
> checking that it starts with the application-id.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
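
A stripped-down sketch of the prefix check discussed above; the method name 
here is illustrative, the real logic lives in `StreamsResetter`.

{code:java}
public class PrefixMatchSketch {
    // Mirrors topicName.startsWith(applicationId + "-") from the comment above.
    static boolean matchesApplication(final String topicName, final String applicationId) {
        return topicName.startsWith(applicationId + "-");
    }

    public static void main(final String[] args) {
        // Passing "foo" matches the internal topics of both foo-v1 and foo-v2,
        // which is exactly the over-matching described in the ticket:
        System.out.println(matchesApplication("foo-v1-repartition-topic-repartition", "foo")); // true
        System.out.println(matchesApplication("foo-v2-repartition-topic-repartition", "foo")); // true
        // The appended "-" does protect app ids that contain no dash:
        System.out.println(matchesApplication("appV1-repartition", "app")); // false
    }
}
{code}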


[jira] [Updated] (KAFKA-14922) kafka-streams-application-reset deletes topics not belonging to specified application-id

2023-04-19 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14922:

Component/s: streams
 tools

> kafka-streams-application-reset deletes topics not belonging to specified 
> application-id
> 
>
> Key: KAFKA-14922
> URL: https://issues.apache.org/jira/browse/KAFKA-14922
> Project: Kafka
>  Issue Type: Bug
>  Components: streams, tools
>Affects Versions: 3.4.0
>Reporter: Jørgen
>Priority: Major
>
> Slack-thread: 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1681908267206849]
> When running the command _kafka-streams-application-reset --bootstrap-servers 
> $BOOTSTRAP --application-id foo_ all internal topics that _start with_ foo 
> are deleted. This happens even if there's no application-id named foo.
> Example:
> {code:java}
> Application IDs:
> foo-v1
> foo-v2
> Internal topics:
> foo-v1-repartition-topic-repartition
> foo-v2-repartition-topic-repartition 
> Application reset:
> kafka-streams-application-reset --bootstrap-servers $BOOTSTRAP 
> --application-id foo
> > No input or intermediate topics specified. Skipping seek.
> Deleting inferred internal topics [foo-v2-repartition-topic-repartition, 
> foo-v1-repartition-topic-repartition]
> Done.{code}
> Expected behaviour is that the command fails, as there is no application-id 
> with the name foo, instead of deleting all foo* topics. 
> This is critical on typos or if application-ids start with the same name as 
> others (for example if we had foo-v21 and wanted to reset foo-v2)
> The bug should be located here: 
> [https://github.com/apache/kafka/blob/c14f56b48461f01743146d58987bc8661ba0d459/tools/src/main/java/org/apache/kafka/tools/StreamsResetter.java#L693]
> It should check that the topic matches the application-id exactly instead of 
> checking that it starts with the application-id.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Re-visit end of life policy

2023-04-19 Thread Matthias J. Sax

While I understand the desire, I tend to agree with Ismael.

In general, it's a significant amount of work not just to do the actual 
releases, but also to cherry-pick bug fixes to older branches. Code 
diverges very quickly, and a clean cherry-pick is usually only possible 
for one or two branches. And it's not just simple conflicts that are 
easy to resolve; it often even implies doing a full new fix if the 
corresponding code was refactored, which is more often the case than one 
might think.


If there is no very strong ask from the community, I would rather let 
committers spend their time reviewing PRs instead and help contributors 
get the work merged.


Just my 2ct.

-Matthias


On 4/13/23 2:52 PM, Ismael Juma wrote:

Clarification below.

I did not understand your point about maintenance expense to ensure

compatibility. I am confused because, IMO, irrespective of our bug fix
support duration for minor versions, we should ensure that all prior minor
versions are compatible. Hence, increasing the support duration to 24
months will not add more expense than today to ensure compatibility.



No, I am not saying that. I am saying that there is no reason not to
upgrade from one minor release to another since we provide full
compatibility between minor releases. The expensive part is that we release
3 times a year, so you have to support 6 releases at any given point in
time. More importantly, you have to validate all these releases, handle any
additional bugs and so on. When it comes to the CVE stuff, you also have to
deal with cases where a project you depend on forces an upgrade to a
release with compatibility impact and so on. Having seen this first hand,
it's a significant amount of work.

Ismael



[jira] [Commented] (KAFKA-14922) kafka-streams-application-reset deletes topics not belonging to specified application-id

2023-04-19 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17714354#comment-17714354
 ] 

Matthias J. Sax commented on KAFKA-14922:
-

Thanks for creating this ticket. It's a known issue, but it's unclear how it 
could be fixed.

The problem is that topic names have the pattern 
`--` – I am not sure how we could look for 
an _exact_ match (we don't know the full topic name)? If there is a way, please 
let us know. But I think we need to close this as "won't fix" unfortunately. 

> kafka-streams-application-reset deletes topics not belonging to specified 
> application-id
> 
>
> Key: KAFKA-14922
> URL: https://issues.apache.org/jira/browse/KAFKA-14922
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.4.0
>Reporter: Jørgen
>Priority: Major
>
> Slack-thread: 
> [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1681908267206849]
> When running the command _kafka-streams-application-reset --bootstrap-servers 
> $BOOTSTRAP --application-id foo_ all internal topics that _start with_ foo 
> are deleted. This happens even if there's no application-id named foo.
> Example:
> {code:java}
> Application IDs:
> foo-v1
> foo-v2
> Internal topics:
> foo-v1-repartition-topic-repartition
> foo-v2-repartition-topic-repartition 
> Application reset:
> kafka-streams-application-reset --bootstrap-servers $BOOTSTRAP 
> --application-id foo
> > No input or intermediate topics specified. Skipping seek.
> Deleting inferred internal topics [foo-v2-repartition-topic-repartition, 
> foo-v1-repartition-topic-repartition]
> Done.{code}
> Expected behaviour is that the command fails, as there is no application-id 
> with the name foo, instead of deleting all foo* topics. 
> This is critical on typos or if application-ids start with the same name as 
> others (for example if we had foo-v21 and wanted to reset foo-v2)
> The bug should be located here: 
> [https://github.com/apache/kafka/blob/c14f56b48461f01743146d58987bc8661ba0d459/tools/src/main/java/org/apache/kafka/tools/StreamsResetter.java#L693]
> It should check that the topic matches the application-id exactly instead of 
> checking that it starts with the application-id.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-4327) Move Reset Tool from core to streams

2023-04-19 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-4327.

Fix Version/s: (was: 4.0.0)
   Resolution: Fixed

This was resolved via https://issues.apache.org/jira/browse/KAFKA-14586.

> Move Reset Tool from core to streams
> 
>
> Key: KAFKA-4327
> URL: https://issues.apache.org/jira/browse/KAFKA-4327
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>            Reporter: Matthias J. Sax
>Priority: Blocker
>  Labels: kip
>
> This is a follow up on https://issues.apache.org/jira/browse/KAFKA-4008
> Currently, Kafka Streams Application Reset Tool is part of {{core}} module 
> due to ZK dependency. After KIP-4 got merged, this dependency can be dropped 
> and the Reset Tool can be moved to {{streams}} module.
> This should also update {{InternalTopicManager#filterExistingTopics}} that 
> refers to ResetTool in an exception message:
>  {{"Use 'kafka.tools.StreamsResetter' tool"}}
>  -> {{"Use '" + kafka.tools.StreamsResetter.getClass().getName() + "' tool"}}
> Doing this JIRA also requires updating the docs with regard to broker 
> backward compatibility – not all brokers support "topic delete request" and 
> thus, the reset tool will not be backward compatible with all broker versions.
> KIP-756: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-756%3A+Move+StreamsResetter+tool+outside+of+core]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)




[jira] [Assigned] (KAFKA-4327) Move Reset Tool from core to streams

2023-04-19 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-4327:
--

Assignee: (was: Jorge Esteban Quilcate Otoya)

> Move Reset Tool from core to streams
> 
>
> Key: KAFKA-4327
> URL: https://issues.apache.org/jira/browse/KAFKA-4327
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>            Reporter: Matthias J. Sax
>Priority: Blocker
>  Labels: kip
> Fix For: 4.0.0
>
>
> This is a follow up on https://issues.apache.org/jira/browse/KAFKA-4008
> Currently, Kafka Streams Application Reset Tool is part of {{core}} module 
> due to ZK dependency. After KIP-4 got merged, this dependency can be dropped 
> and the Reset Tool can be moved to {{streams}} module.
> This should also update {{InternalTopicManager#filterExistingTopics}} that 
> refers to ResetTool in an exception message:
>  {{"Use 'kafka.tools.StreamsResetter' tool"}}
>  -> {{"Use '" + kafka.tools.StreamsResetter.getClass().getName() + "' tool"}}
> Doing this JIRA also requires updating the docs with regard to broker 
> backward compatibility – not all brokers support "topic delete request" and 
> thus, the reset tool will not be backward compatible with all broker versions.
> KIP-756: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-756%3A+Move+StreamsResetter+tool+outside+of+core]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14586) Move StreamsResetter to tools

2023-04-18 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17713836#comment-17713836
 ] 

Matthias J. Sax commented on KAFKA-14586:
-

Thanks for providing context, and no worries about not knowing about the other 
KIP (there are too many things going on, and I also just realized the overlap).

Yes, `StreamsResetter` might be used programmatically, so we should add a 
redirection. Who will do this? Guess we should get it in before code freeze to 
not delay the release.

I am not worried about moving the test because it's not user facing.

Overall, it seems we can close out the other KIP and ticket as "subsumed" by 
this ticket/KIP. I can do the cleanup for it.

Just let me know if there is anything I can help with, or if the matter is 
resolved after we got the missing redirection merged.

 

> Move StreamsResetter to tools
> -
>
> Key: KAFKA-14586
> URL: https://issues.apache.org/jira/browse/KAFKA-14586
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Mickael Maison
>Assignee: Sagar Rao
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-14862) Outer stream-stream join does not output all results with multiple input partitions

2023-04-17 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-14862:
---

Assignee: Matthias J. Sax

> Outer stream-stream join does not output all results with multiple input 
> partitions
> ---
>
> Key: KAFKA-14862
> URL: https://issues.apache.org/jira/browse/KAFKA-14862
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Bruno Cadonna
>    Assignee: Matthias J. Sax
>Priority: Major
>
> If I execute the following Streams app once with two input topics each with 1 
> partition and then with input topics each with two partitions, I get 
> different results.
>   
> {code:java}
> final KStream leftSide = builder.stream(leftSideTopic);
> final KStream rightSide = builder.stream(rightSideTopic);
> final KStream leftAndRight = leftSide.outerJoin(
> rightSide,
> (leftValue, rightValue) ->
> (rightValue == null) ? leftValue + "/NOTPRESENT": leftValue + "/" + 
> rightValue,
> JoinWindows.ofTimeDifferenceAndGrace(
> Duration.ofSeconds(20), 
> Duration.ofSeconds(10)),
> StreamJoined.with(
> Serdes.String(), /* key */
> Serdes.String(), /* left value */
> Serdes.String()  /* right value */
> ));
> leftAndRight.print(Printed.toSysOut());
> {code}
> To reproduce, produce twice the following batch of records with an interval 
> greater than window + grace period (i.e. > 30 seconds) in between the two 
> batches:
> {code}
> (0, 0)
> (1, 1)
> (2, 2)
> (3, 3)
> (4, 4)
> (5, 5)
> (6, 6)
> (7, 7)
> (8, 8)
> (9, 9)
> {code}
> With input topics with 1 partition I get:
> {code}
> [KSTREAM-PROCESSVALUES-08]: 0, 0/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 1, 1/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 2, 2/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 3, 3/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 4, 4/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 5, 5/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 6, 6/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 7, 7/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 8, 8/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 9, 9/NOTPRESENT
> {code}
> With input topics with 2 partitions I get:
> {code}
> [KSTREAM-PROCESSVALUES-08]: 1, 1/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 3, 3/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 4, 4/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 7, 7/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 8, 8/NOTPRESENT
> [KSTREAM-PROCESSVALUES-08]: 9, 9/NOTPRESENT
> {code}
> I would expect to get the same set of records, maybe in a different order due 
> to the partitioning.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14586) Move StreamsResetter to tools

2023-04-17 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14586?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17713233#comment-17713233
 ] 

Matthias J. Sax commented on KAFKA-14586:
-

[~mimaison] [~sagarrao] – I am just realizing that we did this as part of 3.5 – 
We actually had https://issues.apache.org/jira/browse/KAFKA-4327 that we did 
not do in the past, because we thought we should only do it in a major release, 
as it seems to be a breaking change (we also had 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-756%3A+Move+StreamsResetter+tool+outside+of+core]
 for it).

Seems you solved the issue of introducing a breaking change with some 
"redirection" according to KIP-906. So can we close K4327 and KIP-756?

> Move StreamsResetter to tools
> -
>
> Key: KAFKA-14586
> URL: https://issues.apache.org/jira/browse/KAFKA-14586
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Mickael Maison
>Assignee: Sagar Rao
>Priority: Major
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14911) Add system tests for rolling upgrade path of KIP-904

2023-04-17 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17713230#comment-17713230
 ] 

Matthias J. Sax commented on KAFKA-14911:
-

[~fqpublic] – are you planning to pick up this ticket?

> Add system tests for rolling upgrade path of KIP-904
> 
>
> Key: KAFKA-14911
> URL: https://issues.apache.org/jira/browse/KAFKA-14911
> Project: Kafka
>  Issue Type: Test
>Reporter: Farooq Qaiser
>Priority: Major
> Fix For: 3.5.0
>
>
> As per [~mjsax] comment 
> [here|https://github.com/apache/kafka/pull/10747#pullrequestreview-1376539752],
>  we should add a system test to test the rolling upgrade path for 
> [KIP-904|https://cwiki.apache.org/confluence/x/P5VbDg] which introduces a new 
> serialization format for groupBy internal repartition topics and was 
> implemented as part of https://issues.apache.org/jira/browse/KAFKA-12446 
> There is `StreamsUpgradeTest.java` and `streams_upgrade_test.py` (cf 
> `test_rolling_upgrade_with_2_bounces`) as a starting point.
> Might be best to do a similar thing as for FK-joins, and add a new test 
> variation. 
> The tricky thing about the test would be to ensure that the repartition 
> topic is not empty when we do the bounce, so the test should be set up 
> accordingly.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Apache Kafka 3.5.0 release

2023-04-14 Thread Matthias J. Sax

Thanks a lot!

On 4/14/23 5:32 AM, Mickael Maison wrote:

Hi Matthias,

I merged the PR before cutting the 3.5 branch.

Thanks,
Mickael

On Fri, Apr 14, 2023 at 2:31 PM Mickael Maison  wrote:


Hi David,

I've created the 3.5 branch. Feel free to cherry pick these 2 commits
when they are ready.

Thanks,
Mickael

On Fri, Apr 14, 2023 at 11:23 AM Satish Duggana
 wrote:


Thanks Luke for helping with the reviews and adding a few tests in a
couple of PRs.

Hi Mickael,
I raised 3 PRs recently for tiered storage; one is merged. The other 2
PRs are also in the critical path of non-tiered-storage changes,
especially in the consumer fetch and retention cleanup paths. These need
to be thoroughly reviewed to avoid any regressions in that area. We
should merge them to trunk as soon as possible to make it easier
to work on follow-up PRs. IMO, we should avoid merging these PRs into 3.5
just before the release without letting them bake for a longer duration. We can
take a call on this later after the reviews are done.

Many of the individual functionalities related to tiered storage, like the
default topic-based RLMM implementation, the enhanced follower fetch
protocol implementation for tiered storage, and copying remote log
segments, are merged.
There are 2 PRs, covering consumer fetch for remote record reads and remote
retention cleanup and topic deletion functionality, that are under review.

I do not think it can be considered as an early-access review even
with the 2 PRs in review. Luke and I synced up and agreed on the same.
Most of the recent functionality is added with a few unit tests. We
plan to have follow-up PRs on the immediate pending items and also
raise PRs in the next few weeks on unit tests, an integration test
framework, and several integration tests tying many of these
functionalities together.

Thanks,
Satish.


On Fri, 14 Apr 2023 at 12:52, Matthias J. Sax  wrote:


Hey Mickael,

we have one open PR for KIP-914 left. Would be great if you could merge
it before cutting the 3.5 branch. If you don't want to merge it and
prefer that I cherry-pick it to 3.5 branch later, also works for me.

I did close the ticket already as resolved. It's just a minor change to the
KIP.

https://github.com/apache/kafka/pull/13565


Thanks a lot!
-Matthias


On 4/13/23 4:32 AM, David Jacot wrote:

Hi Mickael,

Thanks for the heads up. As raised by Jeff earlier in this thread, we would
like to get the two small patches [1][2] for KIP-915 in 3.5. The PRs are in
review and I should be able to merge them in the next few days. I will
cherry-pick them to the release branch if you create it in the meantime.

[1] https://github.com/apache/kafka/pull/13511
[2] https://github.com/apache/kafka/pull/13526

Best,
David

On Thu, Apr 13, 2023 at 12:55 PM Mickael Maison 
wrote:


Hi Luke,

Thanks for the heads up. This would be great to get Tiered Storage in
Early Access. Let me know if you can't get everything done this week.

Mickael

On Thu, Apr 13, 2023 at 12:54 PM Mickael Maison
 wrote:


Hi,

We've now reached feature freeze for 3.5.0. From now on, only bug
fixes and changes related to stabilizing the release should be merged.

I plan to create the release branch tomorrow (Friday 14). After this
point, you'll have to cherry pick changes to the release branch when
merging a PR. I'll send another message once the branch has been
created.

I've updated the release plan and started moving KIPs that are not
complete to the postponed section. For now I've kept a few KIPs that
are still in progress. If they are not fully merged when I create a
branch, I'll mark them as postponed too.

The next milestone is code freeze on April 26.

Thanks,
Mickael

On Wed, Apr 12, 2023 at 12:24 PM Luke Chen  wrote:


Hi Mickael,

I'd like to ask for some more days for KIP-405 tiered storage PRs to
be included in v3.5.
Currently, we have 1 PR under review (
https://github.com/apache/kafka/pull/13535), and 1 PR that will soon be
opened for review.
After these 2 PRs are merged, we can have an "Early Access" for the tiered
storage feature that allows users to use it in non-production environments.
Does that work for you?

Thank you.
Luke

On Thu, Apr 6, 2023 at 2:49 AM Jeff Kim 


wrote:


Hi Mickael,

Thank you.

Best,
Jeff

On Wed, Apr 5, 2023 at 1:28 PM Mickael Maison <

mickael.mai...@gmail.com>

wrote:


Hi Jeff,

Ok, I've added KIP-915 to the release plan.

Thanks,
Mickael

On Wed, Apr 5, 2023 at 6:48 PM Jeff Kim



wrote:


Hi Mickael,

I would like to bring up that KIP-915 proposes to patch 3.5
although it missed the KIP freeze date. If the patch is done
before the feature freeze date, 4/13, would this be acceptable? If so,
should this be added to the 3.5.0 Release Plan wiki?

Best,
Jeff

On Mon, Mar 27, 2023 at 1:02 PM Greg Harris



wrote:


Mickael,

Just wanted to let you know that I will not be including KIP-898 in the
3.5.0 release.
I think the change needed is not reviewable before the feature freeze
deadline, and would take resources away from other more necessary
c

Re: [DISCUSS] Apache Kafka 3.5.0 release

2023-04-14 Thread Matthias J. Sax

Hey Mickael,

we have one open PR for KIP-914 left. Would be great if you could merge 
it before cutting the 3.5 branch. If you don't want to merge it and 
prefer that I cherry-pick it to 3.5 branch later, also works for me.


I did close the ticket already as resolved. It's just a minor change to the 
KIP.


https://github.com/apache/kafka/pull/13565


Thanks a lot!
  -Matthias


On 4/13/23 4:32 AM, David Jacot wrote:

Hi Mickael,

Thanks for the heads up. As raised by Jeff earlier in this thread, we would
like to get the two small patches [1][2] for KIP-915 in 3.5. The PRs are in
review and I should be able to merge them in the next few days. I will
cherry-pick them to the release branch if you create it in the meantime.

[1] https://github.com/apache/kafka/pull/13511
[2] https://github.com/apache/kafka/pull/13526

Best,
David

On Thu, Apr 13, 2023 at 12:55 PM Mickael Maison 
wrote:


Hi Luke,

Thanks for the heads up. This would be great to get Tiered Storage in
Early Access. Let me know if you can't get everything done this week.

Mickael

On Thu, Apr 13, 2023 at 12:54 PM Mickael Maison
 wrote:


Hi,

We've now reached feature freeze for 3.5.0. From now on, only bug
fixes and changes related to stabilizing the release should be merged.

I plan to create the release branch tomorrow (Friday 14). After this
point, you'll have to cherry pick changes to the release branch when
merging a PR. I'll send another message once the branch has been
created.

I've updated the release plan and started moving KIPs that are not
complete to the postponed section. For now I've kept a few KIPs that
are still in progress. If they are not fully merged when I create a
branch, I'll mark them as postponed too.

The next milestone is code freeze on April 26.

Thanks,
Mickael

On Wed, Apr 12, 2023 at 12:24 PM Luke Chen  wrote:


Hi Mickael,

I'd like to ask for some more days for KIP-405 tiered storage PRs to
be included in v3.5.
Currently, we have 1 PR under review (
https://github.com/apache/kafka/pull/13535), and 1 PR that will soon be
opened for review.
After these 2 PRs are merged, we can have an "Early Access" for the tiered
storage feature that allows users to use it in non-production environments.
Does that work for you?

Thank you.
Luke

On Thu, Apr 6, 2023 at 2:49 AM Jeff Kim 


wrote:


Hi Mickael,

Thank you.

Best,
Jeff

On Wed, Apr 5, 2023 at 1:28 PM Mickael Maison <

mickael.mai...@gmail.com>

wrote:


Hi Jeff,

Ok, I've added KIP-915 to the release plan.

Thanks,
Mickael

On Wed, Apr 5, 2023 at 6:48 PM Jeff Kim



wrote:


Hi Mickael,

I would like to bring up that KIP-915 proposes to patch 3.5
although it missed the KIP freeze date. If the patch is done
before the feature freeze date, 4/13, would this be acceptable?
If so, should this be added to the 3.5.0 Release Plan wiki?

Best,
Jeff

On Mon, Mar 27, 2023 at 1:02 PM Greg Harris



wrote:


Mickael,

Just wanted to let you know that I will not be including KIP-898 in
the 3.5.0 release.
I think the change needed is not reviewable before the feature freeze
deadline, and would take resources away from other more necessary
changes.


Thanks!
Greg

On Thu, Mar 23, 2023 at 9:01 AM Chia-Ping Tsai <

chia7...@gmail.com>

wrote:



If you have a KIP that is accepted, make sure it is listed in
https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.5.0
and that its status is accurate.


Thanks for the reminder. Have added KIP-641 to the list.

Thanks,
Chia-Ping


Mickael Maison  於 2023年3月23日

下午11:51

寫道:


Hi all,

KIP Freeze was yesterday. The next milestone is feature freeze on
April 12.

If you have a KIP that is accepted, make sure it is listed in
https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.5.0
and that its status is accurate.

Thanks,
Mickael

On Fri, Mar 17, 2023 at 6:22 PM Christo Lolov <

christolo...@gmail.com>

wrote:


Hello!

What would you suggest as the best way to get more eyes on KIP-902, as
I would like it to be included in 3.5.0?


Best,
Christo


On 16 Mar 2023, at 10:33, Mickael Maison <

mickael.mai...@gmail.com>

wrote:


Hi,

This is a reminder that KIP freeze is less than a week away (22 Mar).
For a KIP to be considered for this release, it must be voted and
accepted by that date.

Feature freeze will be 3 weeks after this, so if you want KIPs or
other significant changes in the release, please get them ready soon.


Thanks,
Mickael


On Tue, Feb 14, 2023 at 10:44 PM Ismael Juma <

ism...@juma.me.uk



wrote:


Thanks!

Ismael

On Tue, Feb 14, 2023 at 1:07 PM Mickael Maison <

mickael.mai...@gmail.com>

wrote:


Hi Ismael,

Good call. I shifted all dates by 2 weeks and moved them to
Wednesdays.


Thanks,
Mickael

On Tue, Feb 14, 2023 at 6:01 PM Ismael Juma <

ism...@juma.me.uk



wrote:


Thanks Mickael. A couple of notes:

1. We typically choose a Wednesday for the various freeze dates -
there are often 1-2 day slips and it's better if that doesn't
require people working 

[jira] [Resolved] (KAFKA-7499) Extend ProductionExceptionHandler to cover serialization exceptions

2023-04-14 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-7499.

Fix Version/s: 3.5.0
   Resolution: Fixed

> Extend ProductionExceptionHandler to cover serialization exceptions
> ---
>
> Key: KAFKA-7499
> URL: https://issues.apache.org/jira/browse/KAFKA-7499
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>            Reporter: Matthias J. Sax
>Assignee: Philip Nee
>Priority: Major
>  Labels: beginner, kip, newbie, newbie++
> Fix For: 3.5.0
>
>
> In 
> [KIP-210|https://cwiki.apache.org/confluence/display/KAFKA/KIP-210+-+Provide+for+custom+error+handling++when+Kafka+Streams+fails+to+produce],
>  an exception handler for the write path was introduced. This exception 
> handler covers exceptions that are raised in the producer callback.
> However, serialization happens before the data is handed to the producer 
> within Kafka Streams itself, and the producer uses `byte[]/byte[]` key-value-pair 
> types.
> Thus, we might want to extend the ProductionExceptionHandler to cover 
> serialization exceptions, too, to skip over corrupted output messages. An 
> example could be a "String" message that contains invalid JSON and should be 
> serialized as JSON.
> KIP-399 (not voted yet; feel free to pick it up): 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-399%3A+Extend+ProductionExceptionHandler+to+cover+serialization+exceptions]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
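
To make the proposal concrete, here is a minimal sketch of a handler that
skips records failing serialization. The `handleSerializationException` hook
and the response enum follow the KIP text above; the exact signature in the
released API may differ, and the class name is made up:

{code:java|title=Sketch: handler that skips records failing serialization (per KIP-399)}
import java.util.Map;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.streams.errors.ProductionExceptionHandler;

public class SkipSerializationErrorsHandler implements ProductionExceptionHandler {

    @Override
    public ProductionExceptionHandlerResponse handle(final ProducerRecord<byte[], byte[]> record,
                                                     final Exception exception) {
        // keep the default behavior for errors raised in the producer callback
        return ProductionExceptionHandlerResponse.FAIL;
    }

    // new hook proposed by KIP-399 for errors raised during serialization
    @Override
    public ProductionExceptionHandlerResponse handleSerializationException(final ProducerRecord record,
                                                                           final Exception exception) {
        // skip over the corrupted output message instead of failing
        return ProductionExceptionHandlerResponse.CONTINUE;
    }

    @Override
    public void configure(final Map<String, ?> configs) {}
}
{code}

Such a handler would be registered via the existing
`default.production.exception.handler` Streams config.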


[jira] [Resolved] (KAFKA-14834) Improved processor semantics for versioned stores

2023-04-14 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-14834.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Improved processor semantics for versioned stores
> -
>
> Key: KAFKA-14834
> URL: https://issues.apache.org/jira/browse/KAFKA-14834
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Reporter: Victoria Xia
>Assignee: Victoria Xia
>Priority: Major
>  Labels: kip, streams
> Fix For: 3.5.0
>
>
> With the introduction of versioned state stores in 
> [KIP-889|https://cwiki.apache.org/confluence/display/KAFKA/KIP-889%3A+Versioned+State+Stores],
>  we should leverage them to provide improved join semantics. 
> As described in 
> [KIP-914|https://cwiki.apache.org/confluence/display/KAFKA/KIP-914%3A+DSL+Processor+Semantics+for+Versioned+Stores],
>  we will make the following four improvements:
>  * stream-table joins will perform a timestamped lookup (using the 
> stream-side record timestamp) if the table is versioned
>  * table-table joins, including foreign key joins, will not produce new join 
> results on out-of-order records (by key) from versioned tables
>  * table filters will disable the existing optimization to not send duplicate 
> tombstones when applied to a versioned table
>  * table aggregations will ignore out-of-order records when aggregating a 
> versioned table



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
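
To make the first improvement concrete, here is a minimal sketch of opting a
table into versioned semantics from the DSL. Topic and store names are made
up; the APIs are the ones described in KIP-889/KIP-914:

{code:java|title=Sketch: stream-table join with a versioned table (APIs per KIP-889/KIP-914)}
import java.time.Duration;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.Stores;

public class VersionedJoinExample {
    public static Topology build() {
        final StreamsBuilder builder = new StreamsBuilder();

        // table backed by a versioned store, retaining 10 minutes of history per key
        final KTable<String, String> rates = builder.table(
            "rates",  // hypothetical topic
            Materialized.<String, String>as(
                    Stores.persistentVersionedKeyValueStore("rates-store", Duration.ofMinutes(10)))
                .withKeySerde(Serdes.String())
                .withValueSerde(Serdes.String()));

        final KStream<String, String> orders =
            builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()));

        // because the table is versioned, the join looks up the table version as of
        // the stream-side record timestamp, rather than simply the latest value
        final KStream<String, String> joined =
            orders.join(rates, (order, rate) -> order + "@" + rate);
        joined.to("enriched-orders"); // hypothetical output topic

        return builder.build();
    }
}
{code}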


[jira] [Updated] (KAFKA-14209) Optimize stream stream self join to use single state store

2023-04-12 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14209:

Description: 
KIP-862: 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-862%3A+Self-join+optimization+for+stream-stream+joins]
 

For stream-stream joins that join the same source, we can omit one state store 
since they contain the same data.

  was:For stream-stream joins that join the same source, we can omit one state 
store since they contain the same data.


> Optimize stream stream self join to use single state store
> --
>
> Key: KAFKA-14209
> URL: https://issues.apache.org/jira/browse/KAFKA-14209
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Vicky Papavasileiou
>Assignee: Vicky Papavasileiou
>Priority: Major
>  Labels: kip
> Fix For: 3.4.0
>
>
> KIP-862: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-862%3A+Self-join+optimization+for+stream-stream+joins]
>  
> For stream-stream joins that join the same source, we can omit one state 
> store since they contain the same data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
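
For reference, a minimal sketch of opting into this optimization. The config
value follows KIP-862; the application id and bootstrap servers are
placeholders, and no actual self-join topology is shown:

{code:java|title=Sketch: enabling the self-join optimization (config per KIP-862)}
import java.util.Properties;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class SelfJoinConfigExample {
    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "self-join-app");     // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        // enable only this rewrite; "all" would enable every optimization
        props.put(StreamsConfig.TOPOLOGY_OPTIMIZATION_CONFIG, StreamsConfig.SINGLE_STORE_SELF_JOIN);

        final StreamsBuilder builder = new StreamsBuilder();
        // ... build a stream-stream join where both sides read the same source ...

        // the props must be passed to build() so the optimizer can rewrite the topology
        final KafkaStreams streams = new KafkaStreams(builder.build(props), props);
        streams.start();
    }
}
{code}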


[jira] [Updated] (KAFKA-14209) Optimize stream stream self join to use single state store

2023-04-12 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14209:

Labels: kip  (was: )

> Optimize stream stream self join to use single state store
> --
>
> Key: KAFKA-14209
> URL: https://issues.apache.org/jira/browse/KAFKA-14209
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Vicky Papavasileiou
>Assignee: Vicky Papavasileiou
>Priority: Major
>  Labels: kip
> Fix For: 3.4.0
>
>
> For stream-stream joins that join the same source, we can omit one state 
> store since they contain the same data.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: Fwd: [VOTE] KIP-914 Join Processor Semantics for Versioned Stores

2023-04-11 Thread Matthias J. Sax
If we send old and new value as two messages, this should work I guess? 
Victoria could confirm. -- But not if we send old/new as a single message 
in case the new-key does not change?


-Matthias

On 4/11/23 5:25 AM, Lucas Brutschy wrote:

Hi,

No concerns at all, just a clarifying question from my side: for
detecting out-of-order records, I need both new and old timestamp, I
suppose I get it for the new record via timestamp extractor, can I not
get it the same way from the old record that is passed down to the
aggregation after KIP-904?

Thanks,
Lucas

On Tue, Apr 11, 2023 at 5:35 AM Matthias J. Sax  wrote:


Thanks.

One question: for the repartition topic format change, do we want to
re-use flag=2, or should we introduce flag=3, and determine when
compiling the DSL into the Topology if we want/need to include the
timestamp, and if not, use format version=2 to avoid unnecessary overhead?


-Matthias

On 4/10/23 5:47 PM, Victoria Xia wrote:

Hi everyone,

While wrapping up the implementation for KIP-914, I have discovered that
two more DSL processors require semantic updates in the presence of
versioned tables:

 - The table filter processor has an optimization to drop nulls if the
 previous filtered value is also null. When the upstream table is versioned,
 this optimization should be disabled in order to preserve proper version
 history in the presence of out-of-order data.
 - When performing an aggregation over a versioned table, only the latest
 value by timestamp (per key) should be included in the final aggregate
 value. This is not happening today in the presence of out-of-order data,
 due to the way that TableSourceNodes call `get(key)` in order to determine
 the "old value" which is to be removed from the aggregate as part of
 applying an update. To fix this, aggregations should ignore out-of-order
 records when aggregating versioned tables.
- In order to implement this change, table aggregate processors need
a way to determine whether a record is out-of-order or not. This
cannot be done by querying the source table value getter as that store
belongs to a different subtopology (because a repartition occurs before
aggregation). As such, an additional timestamp must be included in the
repartition topic. The 3.5 release already includes an update to the
repartition topic format (with upgrade implications properly handled)
via KIP-904
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-904%3A+Kafka+Streams+-+Guarantee+subtractor+is+called+before+adder+if+key+has+not+changed>,
so making an additional change to the repartition topic format to add a
timestamp comes at no additional cost to users.


I have updated the KIP
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-914%3A+DSL+Processor+Semantics+for+Versioned+Stores>
itself with more detail about each of these changes. Please let me know if
there are any concerns. In the absence of dissent, I'd like to include
these changes along with the rest of KIP-914 in the 3.5 release.

Apologies for not noticing these additional semantics implications earlier,
Victoria

-- Forwarded message -
From: Victoria Xia 
Date: Wed, Mar 22, 2023 at 10:08 AM
Subject: Re: [VOTE] KIP-914 Join Processor Semantics for Versioned Stores
To: 


Thanks for voting, everyone! We have three binding yes votes with no
objections during four full days of voting. I will close the vote and mark
the KIP as accepted, right in time for the 3.5 release.

Thanks,
Victoria

On Wed, Mar 22, 2023 at 7:11 AM Bruno Cadonna  wrote:


+1 (binding)

Thanks Victoria!

Best,
Bruno

On 20.03.23 17:13, Matthias J. Sax wrote:

+1 (binding)

On 3/20/23 9:05 AM, Guozhang Wang wrote:

+1, thank you Victoria!

On Sat, Mar 18, 2023 at 8:27 AM Victoria Xia
 wrote:


Hi all,

I'd like to start a vote on KIP-914 for updating the Kafka Streams join
processors to use proper timestamp-based semantics in applications with
versioned stores:


https://cwiki.apache.org/confluence/display/KAFKA/KIP-914%3A+Join+Processor+Semantics+for+Versioned+Stores


To avoid compatibility concerns, I'd like to include the changes from
this
KIP together with KIP-889
<

https://cwiki.apache.org/confluence/display/KAFKA/KIP-889%3A+Versioned+State+Stores



(for introducing versioned stores) in the upcoming 3.5 release. I will
close the vote on the 3.5 KIP deadline, March 22, if there are no
objections before then.

Thanks,
Victoria






Re: Fwd: [VOTE] KIP-914 Join Processor Semantics for Versioned Stores

2023-04-10 Thread Matthias J. Sax

Thanks.

One question: for the repartition topic format change, do we want to 
re-use flag=2, or should we introduce flag=3, and determine when 
compiling the DSL into the Topology if we want/need to include the 
timestamp, and if not, use format version=2 to avoid unnecessary overhead?



-Matthias

On 4/10/23 5:47 PM, Victoria Xia wrote:

Hi everyone,

While wrapping up the implementation for KIP-914, I have discovered that
two more DSL processors require semantic updates in the presence of
versioned tables:

- The table filter processor has an optimization to drop nulls if the
previous filtered value is also null. When the upstream table is versioned,
this optimization should be disabled in order to preserve proper version
history in the presence of out-of-order data.
- When performing an aggregation over a versioned table, only the latest
value by timestamp (per key) should be included in the final aggregate
value. This is not happening today in the presence of out-of-order data,
due to the way that TableSourceNodes call `get(key)` in order to determine
the "old value" which is to be removed from the aggregate as part of
applying an update. To fix this, aggregations should ignore out-of-order
records when aggregating versioned tables.
   - In order to implement this change, table aggregate processors need
   a way to determine whether a record is out-of-order or not. This
   cannot be done by querying the source table value getter as that store
   belongs to a different subtopology (because a repartition occurs before
   aggregation). As such, an additional timestamp must be included in the
   repartition topic. The 3.5 release already includes an update to the
   repartition topic format (with upgrade implications properly handled)
   via KIP-904
   <https://cwiki.apache.org/confluence/display/KAFKA/KIP-904%3A+Kafka+Streams+-+Guarantee+subtractor+is+called+before+adder+if+key+has+not+changed>,
   so making an additional change to the repartition topic format to add a
   timestamp comes at no additional cost to users.


I have updated the KIP
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-914%3A+DSL+Processor+Semantics+for+Versioned+Stores>
itself with more detail about each of these changes. Please let me know if
there are any concerns. In the absence of dissent, I'd like to include
these changes along with the rest of KIP-914 in the 3.5 release.

Apologies for not noticing these additional semantics implications earlier,
Victoria

-- Forwarded message -
From: Victoria Xia 
Date: Wed, Mar 22, 2023 at 10:08 AM
Subject: Re: [VOTE] KIP-914 Join Processor Semantics for Versioned Stores
To: 


Thanks for voting, everyone! We have three binding yes votes with no
objections during four full days of voting. I will close the vote and mark
the KIP as accepted, right in time for the 3.5 release.

Thanks,
Victoria

On Wed, Mar 22, 2023 at 7:11 AM Bruno Cadonna  wrote:


+1 (binding)

Thanks Victoria!

Best,
Bruno

On 20.03.23 17:13, Matthias J. Sax wrote:

+1 (binding)

On 3/20/23 9:05 AM, Guozhang Wang wrote:

+1, thank you Victoria!

On Sat, Mar 18, 2023 at 8:27 AM Victoria Xia
 wrote:


Hi all,

I'd like to start a vote on KIP-914 for updating the Kafka Streams join
processors to use proper timestamp-based semantics in applications with
versioned stores:


https://cwiki.apache.org/confluence/display/KAFKA/KIP-914%3A+Join+Processor+Semantics+for+Versioned+Stores


To avoid compatibility concerns, I'd like to include the changes from
this
KIP together with KIP-889
<

https://cwiki.apache.org/confluence/display/KAFKA/KIP-889%3A+Versioned+State+Stores



(for introducing versioned stores) in the upcoming 3.5 release. I will
close the vote on the 3.5 KIP deadline, March 22, if there are no
objections before then.

Thanks,
Victoria






[jira] [Assigned] (KAFKA-14054) Unexpected client shutdown as TimeoutException is thrown as IllegalStateException

2023-04-10 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-14054:
---

Assignee: Matthias J. Sax

> Unexpected client shutdown as TimeoutException is thrown as 
> IllegalStateException
> -
>
> Key: KAFKA-14054
> URL: https://issues.apache.org/jira/browse/KAFKA-14054
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.1.0, 3.2.0, 3.1.1
>Reporter: Donald
>    Assignee: Matthias J. Sax
>Priority: Major
>
>  Re: 
> https://forum.confluent.io/t/bug-timeoutexception-is-thrown-as-illegalstateexception-causing-client-shutdown/5460/2
> 1) TimeoutException is thrown as IllegalStateException in 
> {_}org.apache.kafka.streams.processor.internals.StreamTask#commitNeeded{_}, 
> which causes the client to shut down in 
> {_}org.apache.kafka.streams.KafkaStreams#getActionForThrowable{_}.
> 2) Should Timeout be a recoverable error which is expected to be handled by 
> the user?
> 3) This issue is exposed by change KAFKA-12887, which was introduced in 
> kafka-streams version 3.1.0.
> *code referenced*
> {code:java|title=org.apache.kafka.streams.processor.internals.StreamTask#commitNeeded}
> public boolean commitNeeded() {
>     if (commitNeeded) {
>         return true;
>     } else {
>         for (final Map.Entry<TopicPartition, Long> entry : consumedOffsets.entrySet()) {
>             final TopicPartition partition = entry.getKey();
>             try {
>                 final long offset = mainConsumer.position(partition);
>                 if (offset > entry.getValue() + 1) {
>                     commitNeeded = true;
>                     entry.setValue(offset - 1);
>                 }
>             } catch (final TimeoutException error) {
>                 // the `consumer.position()` call should never block, because we know that we did process data
>                 // for the requested partition and thus the consumer should have a valid local position
>                 // that it can return immediately
>                 // hence, a `TimeoutException` indicates a bug and thus we rethrow it as fatal `IllegalStateException`
>                 throw new IllegalStateException(error);
>             } catch (final KafkaException fatal) {
>                 throw new StreamsException(fatal);
>             }
>         }
>         return commitNeeded;
>     }
> }
> {code}
> {code:java|title=org.apache.kafka.streams.KafkaStreams#getActionForThrowable}
> private StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse getActionForThrowable(final Throwable throwable,
>                                                                                             final StreamsUncaughtExceptionHandler streamsUncaughtExceptionHandler) {
>     final StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse action;
>     if (wrappedExceptionIsIn(throwable, EXCEPTIONS_NOT_TO_BE_HANDLED_BY_USERS)) {
>         action = SHUTDOWN_CLIENT;
>     } else {
>         action = streamsUncaughtExceptionHandler.handle(throwable);
>     }
>     return action;
> }
> 
> private void handleStreamsUncaughtException(final Throwable throwable,
>                                             final StreamsUncaughtExceptionHandler streamsUncaughtExceptionHandler,
>                                             final boolean skipThreadReplacement) {
>     final StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse action = getActionForThrowable(throwable, streamsUncaughtExceptionHandler);
>     if (oldHandler) {
>         log.warn("Stream's new uncaught exception handler is set as well as the deprecated old handler." +
>             "The old handler will be ignored as long as a new handler is set.");
>     }
>     switch (action) {
>         case REPLACE_THREAD:
>             if (!skipThreadReplacement) {
>                 log.error("Replacing thread in the streams uncaught exception handler", throwable);
>                 replaceStreamThread(throwable);
>             } else {
>                 log.debug("Skipping thread replacement for recoverable error");
>             }
>             break;
>         case SHUTDOWN_CLIENT:
>             log.error("En

[jira] [Reopened] (KAFKA-14318) KIP-878: Autoscaling for Statically Partitioned Streams

2023-04-07 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reopened KAFKA-14318:
-

> KIP-878: Autoscaling for Statically Partitioned Streams
> ---
>
> Key: KAFKA-14318
> URL: https://issues.apache.org/jira/browse/KAFKA-14318
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: A. Sophie Blee-Goldman
>Priority: Major
>  Labels: kip
>
> [KIP-878: Autoscaling for Statically Partitioned 
> Streams|https://cwiki.apache.org/confluence/display/KAFKA/KIP-878%3A+Autoscaling+for+Statically+Partitioned+Streams]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14318) KIP-878: Autoscaling for Statically Partitioned Streams

2023-04-07 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-14318.
-
Resolution: Fixed

> KIP-878: Autoscaling for Statically Partitioned Streams
> ---
>
> Key: KAFKA-14318
> URL: https://issues.apache.org/jira/browse/KAFKA-14318
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: A. Sophie Blee-Goldman
>Priority: Major
>  Labels: kip
>
> [KIP-878: Autoscaling for Statically Partitioned 
> Streams|https://cwiki.apache.org/confluence/display/KAFKA/KIP-878%3A+Autoscaling+for+Statically+Partitioned+Streams]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-14318) KIP-878: Autoscaling for Statically Partitioned Streams

2023-04-07 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax reassigned KAFKA-14318:
---

Assignee: (was: A. Sophie Blee-Goldman)

> KIP-878: Autoscaling for Statically Partitioned Streams
> ---
>
> Key: KAFKA-14318
> URL: https://issues.apache.org/jira/browse/KAFKA-14318
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: A. Sophie Blee-Goldman
>Priority: Major
>  Labels: kip
>
> [KIP-878: Autoscaling for Statically Partitioned 
> Streams|https://cwiki.apache.org/confluence/display/KAFKA/KIP-878%3A+Autoscaling+for+Statically+Partitioned+Streams]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14318) KIP-878: Autoscaling for Statically Partitioned Streams

2023-04-07 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14318:

Fix Version/s: (was: 3.5.0)

> KIP-878: Autoscaling for Statically Partitioned Streams
> ---
>
> Key: KAFKA-14318
> URL: https://issues.apache.org/jira/browse/KAFKA-14318
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: A. Sophie Blee-Goldman
>Assignee: A. Sophie Blee-Goldman
>Priority: Major
>  Labels: kip
>
> [KIP-878: Autoscaling for Statically Partitioned 
> Streams|https://cwiki.apache.org/confluence/display/KAFKA/KIP-878%3A+Autoscaling+for+Statically+Partitioned+Streams]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14491) Introduce Versioned Key-Value Stores to Kafka Streams

2023-04-05 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14491?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-14491.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Introduce Versioned Key-Value Stores to Kafka Streams
> -
>
> Key: KAFKA-14491
> URL: https://issues.apache.org/jira/browse/KAFKA-14491
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Victoria Xia
>Assignee: Victoria Xia
>Priority: Major
>  Labels: kip, streams
> Fix For: 3.5.0
>
>
> The key-value state stores used by Kafka Streams today maintain only the 
> latest value associated with each key. In order to support applications which 
> require access to older record versions, Kafka Streams should have versioned 
> state stores. Versioned state stores are similar to key-value stores except 
> they can store multiple record versions for a single key. An example use case 
> for versioned key-value stores is in providing proper temporal join semantics 
> for stream-table joins with regard to out-of-order data.
> See KIP for more: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-889%3A+Versioned+State+Stores



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
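
A small sketch of what the new interface offers; method names follow KIP-889,
and the surrounding scaffolding (class and method names) is made up:

{code:java|title=Sketch: point-in-time reads from a versioned store (API per KIP-889)}
import org.apache.kafka.streams.state.VersionedKeyValueStore;
import org.apache.kafka.streams.state.VersionedRecord;

public class VersionedLookupExample {
    // the store would be obtained inside a processor, e.g. via context.getStateStore(...)
    static String rateAsOf(final VersionedKeyValueStore<String, String> store,
                           final String key,
                           final long asOfTimestamp) {
        // returns the record version that was valid as of the given timestamp,
        // or null if no such version is (still) retained
        final VersionedRecord<String> record = store.get(key, asOfTimestamp);
        return record == null ? null : record.value();
    }
}
{code}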


[jira] [Resolved] (KAFKA-14864) Memory leak in KStreamWindowAggregate with ON_WINDOW_CLOSE emit strategy

2023-04-03 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-14864.
-
Fix Version/s: 3.4.1
   3.3.3
   Resolution: Fixed

> Memory leak in KStreamWindowAggregate with ON_WINDOW_CLOSE emit strategy
> 
>
> Key: KAFKA-14864
> URL: https://issues.apache.org/jira/browse/KAFKA-14864
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Victoria Xia
>Assignee: Victoria Xia
>Priority: Major
> Fix For: 3.5.0, 3.4.1, 3.3.3
>
>
> The Streams DSL processor implementation for the ON_WINDOW_CLOSE emit 
> strategy during KStream windowed aggregations opens a key-value iterator but 
> does not call `close()` on it 
> ([link|https://github.com/apache/kafka/blob/5afedd9ac37c4d740f47867cfd31eaed15dc542f/streams/src/main/java/org/apache/kafka/streams/kstream/internals/AbstractKStreamTimeWindowAggregateProcessor.java#L203]),
>  despite the Javadocs for the iterator making clear that users must do so in 
> order to release resources 
> ([link|https://github.com/apache/kafka/blob/5afedd9ac37c4d740f47867cfd31eaed15dc542f/streams/src/main/java/org/apache/kafka/streams/state/KeyValueIterator.java#L27]).
>   
> I discovered this bug while running load testing benchmarks and noticed that 
> some runs were sporadically hitting OOMs, so it is definitely possible to hit 
> this in practice.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
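
The fix boils down to the standard pattern: `KeyValueIterator` is `Closeable`,
so opening it in try-with-resources releases the underlying (e.g. RocksDB)
resources even if processing throws. A generic sketch, not the exact
processor code:

{code:java|title=Sketch: closing store iterators with try-with-resources}
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.state.KeyValueIterator;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class IteratorCloseExample {
    static long countEntries(final ReadOnlyKeyValueStore<String, Long> store) {
        long count = 0;
        // close() is called automatically, even if the loop body throws
        try (final KeyValueIterator<String, Long> iterator = store.all()) {
            while (iterator.hasNext()) {
                final KeyValue<String, Long> entry = iterator.next();
                count++;
            }
        }
        return count;
    }
}
{code}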


[jira] [Commented] (KAFKA-14722) Make BooleanSerde public

2023-04-03 Thread Matthias J. Sax (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17708211#comment-17708211
 ] 

Matthias J. Sax commented on KAFKA-14722:
-

The docs PR was not merged yet – thus the work is not yet completed. We keep 
Jiras open as reminders about this. Docs are as important as the feature itself.

> Make BooleanSerde public
> 
>
> Key: KAFKA-14722
> URL: https://issues.apache.org/jira/browse/KAFKA-14722
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>            Reporter: Matthias J. Sax
>Assignee: Spacrocket
>Priority: Minor
>  Labels: beginner, kip, newbie
>
> KIP-907: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-907%3A+Add+Boolean+Serde+to+public+interface]
>  
> We introduced a "BooleanSerde" via 
> [https://github.com/apache/kafka/pull/13249] as an internal class. We could 
> make it public.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14722) Make BooleanSerde public

2023-04-03 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14722:

Fix Version/s: 3.5.0

> Make BooleanSerde public
> 
>
> Key: KAFKA-14722
> URL: https://issues.apache.org/jira/browse/KAFKA-14722
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>            Reporter: Matthias J. Sax
>Assignee: Spacrocket
>Priority: Minor
>  Labels: beginner, kip, newbie
> Fix For: 3.5.0
>
>
> KIP-907: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-907%3A+Add+Boolean+Serde+to+public+interface]
>  
> We introduced a "BooleanSerde" via 
> [https://github.com/apache/kafka/pull/13249] as an internal class. We could 
> make it public.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
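
Once public, usage would be a one-liner; the `Serdes.Boolean()` factory-method
name follows KIP-907 and is an assumption until the PR lands:

{code:java|title=Sketch: using the Boolean serde (API per KIP-907)}
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;

public class BooleanSerdeExample {
    public static void main(final String[] args) {
        final Serde<Boolean> serde = Serdes.Boolean();

        final byte[] bytes = serde.serializer().serialize("some-topic", true);       // a single byte
        final Boolean value = serde.deserializer().deserialize("some-topic", bytes); // true
        System.out.println(value);
    }
}
{code}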


[jira] [Updated] (KAFKA-14864) Memory leak in KStreamWindowAggregate with ON_WINDOW_CLOSE emit strategy

2023-04-03 Thread Matthias J. Sax (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax updated KAFKA-14864:

Affects Version/s: 3.3.2
   3.4.0

> Memory leak in KStreamWindowAggregate with ON_WINDOW_CLOSE emit strategy
> 
>
> Key: KAFKA-14864
> URL: https://issues.apache.org/jira/browse/KAFKA-14864
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.4.0, 3.3.2
>Reporter: Victoria Xia
>Assignee: Victoria Xia
>Priority: Major
> Fix For: 3.5.0
>
>
> The Streams DSL processor implementation for the ON_WINDOW_CLOSE emit 
> strategy during KStream windowed aggregations opens a key-value iterator but 
> does not call `close()` on it 
> ([link|https://github.com/apache/kafka/blob/5afedd9ac37c4d740f47867cfd31eaed15dc542f/streams/src/main/java/org/apache/kafka/streams/kstream/internals/AbstractKStreamTimeWindowAggregateProcessor.java#L203]),
>  despite the Javadocs for the iterator making clear that users must do so in 
> order to release resources 
> ([link|https://github.com/apache/kafka/blob/5afedd9ac37c4d740f47867cfd31eaed15dc542f/streams/src/main/java/org/apache/kafka/streams/state/KeyValueIterator.java#L27]).
>   
> I discovered this bug while running load testing benchmarks and noticed that 
> some runs were sporadically hitting OOMs, so it is definitely possible to hit 
> this in practice.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

