[VOTE] KIP-825: introduce a new API to control when aggregated results are produced

2022-03-23 Thread Hao Li
Hi all,

I'd like to start a vote on Kafka Streams KIP-825:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-825%3A+introduce+a+new+API+to+control+when+aggregated+results+are+produced

[jira] [Resolved] (KAFKA-13672) Race condition in DynamicBrokerConfig

2022-03-23 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-13672.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

> Race condition in DynamicBrokerConfig
> -
>
> Key: KAFKA-13672
> URL: https://issues.apache.org/jira/browse/KAFKA-13672
> Project: Kafka
>  Issue Type: Bug
>Reporter: Bruno Cadonna
>Assignee: Liam Clarke-Hutchinson
>Priority: Blocker
> Fix For: 3.2.0, 3.3.0
>
>
> Stacktrace:
> {code:java}
> org.opentest4j.AssertionFailedError: expected: <2> but was: <1>
>   at 
> app//org.junit.jupiter.api.AssertionUtils.fail(AssertionUtils.java:55)
>   at 
> app//org.junit.jupiter.api.AssertionUtils.failNotEqual(AssertionUtils.java:62)
>   at 
> app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182)
>   at 
> app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177)
>   at 
> app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1141)
>   at 
> app//kafka.server.DynamicBrokerReconfigurationTest.$anonfun$waitForConfigOnServer$1(DynamicBrokerReconfigurationTest.scala:1500)
>   at 
> app//kafka.server.DynamicBrokerReconfigurationTest.waitForConfigOnServer(DynamicBrokerReconfigurationTest.scala:1500)
>   at 
> app//kafka.server.DynamicBrokerReconfigurationTest.$anonfun$waitForConfig$1(DynamicBrokerReconfigurationTest.scala:1495)
>   at 
> app//kafka.server.DynamicBrokerReconfigurationTest.$anonfun$waitForConfig$1$adapted(DynamicBrokerReconfigurationTest.scala:1495)
>   at app//scala.collection.IterableOnceOps.foreach(IterableOnce.scala:563)
>   at 
> app//scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:561)
>   at app//scala.collection.AbstractIterable.foreach(Iterable.scala:926)
>   at 
> app//kafka.server.DynamicBrokerReconfigurationTest.waitForConfig(DynamicBrokerReconfigurationTest.scala:1495)
>   at 
> app//kafka.server.DynamicBrokerReconfigurationTest.reconfigureServers(DynamicBrokerReconfigurationTest.scala:1440)
>   at 
> app//kafka.server.DynamicBrokerReconfigurationTest.resizeThreadPool$1(DynamicBrokerReconfigurationTest.scala:775)
>   at 
> app//kafka.server.DynamicBrokerReconfigurationTest.$anonfun$testThreadPoolResize$3(DynamicBrokerReconfigurationTest.scala:768)
>   at app//scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190)
>   at 
> app//kafka.server.DynamicBrokerReconfigurationTest.testThreadPoolResize(DynamicBrokerReconfigurationTest.scala:784)
> {code}
> Job: 
> https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-11751/5/testReport/



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] KIP-825: introduce a new API to control when aggregated results are produced

2022-03-23 Thread Hao Li
Hi all,

I just updated the KIP with option 1 as design and put option 2 and 3 in
rejected alternatives. Since Matthias is strongly against `trigger`,
I adopted the proposed `EmitStrategy` and dropped the "with" in the
function name. So it's like this:

stream
  .groupBy(..)
  .windowedBy(..)
  .emitStrategy(EmitStrategy.onWindowClose())
  .aggregate(..)
  .mapValues(..)

I used `onWindowClose` since `EmitStrategy` is meant to be an interface.
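
For reference, a minimal sketch of how such an interface could look (names
and shapes are illustrative only; the KIP has the actual definition):

public interface EmitStrategy {

  enum StrategyType { ON_WINDOW_UPDATE, ON_WINDOW_CLOSE }

  StrategyType type();

  // Emit on every update to a window (today's default behavior).
  static EmitStrategy onWindowUpdate() {
    return () -> StrategyType.ON_WINDOW_UPDATE;
  }

  // Emit only the final result when a window closes.
  static EmitStrategy onWindowClose() {
    return () -> StrategyType.ON_WINDOW_CLOSE;
  }
}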

Hao

On Wed, Mar 23, 2022 at 6:35 PM Matthias J. Sax  wrote:

> Wow. Quite a thread... #namingIsHard :D
>
> I won't repeat all arguments which are all very good ones. I can just
> state my personal favorite option:
>
> stream
>   .groupBy(..)
>   .windowedBy(..)
>   .withEmitStrategy(EmitStrategy.ON_WINDOW_CLOSE)
>   .aggregate(..)
>   .mapValues(..)
>
> It seems to be the best compromise / trade-off across the board.
> Personally, I would strongly advocate against using `trigger()`!
>
>
> -Matthias
>
>
> On 3/23/22 4:38 PM, Guozhang Wang wrote:
> > Hao is right. I think that's the hindsight we have for `suppress`: since
> > it can be applied anywhere on a K(windowed)Table, it allows awkward
> > programming flexibility, and I felt it's better to make its application
> > scope more constrained.
> >
> > And I also agree with John that, unless any of us feel strongly about any
> > options, Hao could make the final call about the namings.
> >
> >
> > Guozhang
> >
> > On Wed, Mar 23, 2022 at 1:49 PM Hao Li  wrote:
> >
> >> For
> >>
> >> stream
> >>.groupBy(..)
> >>.windowedBy(..)
> >>.aggregate(..)
> >>.withEmitStrategy(EmitStrategy.ON_WINDOW_CLOSE)
> >>.mapValues(..)
> >>
> >> I think after `aggregate` it's already a table, and by then the emit
> >> strategy is too late to control how the windowed stream is outputted to
> >> the table. This is the concern Guozhang raised about having this in the
> >> existing `suppress` operator as well.
> >>
> >> Thanks,
> >> Hao
> >>
> >> On Wed, Mar 23, 2022 at 1:05 PM Bruno Cadonna 
> wrote:
> >>
> >>> Hi,
> >>>
> >>> Thank you for your answers to my questions!
> >>>
> >>> I see the argument about the conciseness of configuring a stream with
> >>> methods instead of config objects. I just miss the descriptive aspect a
> >>> bit.
> >>>
> >>> What about
> >>>
> >>> stream
> >>>.groupBy(..)
> >>>.windowedBy(..)
> >>>.withEmitStrategy(EmitStrategy.ON_WINDOW_CLOSE)
> >>>.aggregate(..)
> >>>.mapValues(..)
> >>>
> >>> I also have another question: why should the emitting of results be
> >>> controlled by the window-level API? If I want to emit results for each
> >>> input record, the emit strategy is quite independent of the window. So
> >>> I somehow share Matthias' and Guozhang's concern that the emit strategy
> >>> seems misplaced there.
> >>>
> >>> What are the arguments against?
> >>>
> >>> stream
> >>>.groupBy(..)
> >>>.windowedBy(..)
> >>>.aggregate(..)
> >>>.withEmitStrategy(EmitStrategy.ON_WINDOW_CLOSE)
> >>>.mapValues(..)
> >>>
> >>>
> >>> A final administrative request: Hao, could you please add the rejected
> >>> alternatives to the KIP so that future us will know why we rejected
> them?
> >>>
> >>> Best,
> >>> Bruno
> >>>
> >>> On 23.03.22 19:38, John Roesler wrote:
>  Hi all,
> 
>  I can see both sides of this.
> 
>  On one hand, when we say
>  "stream.groupBy().windowedBy().count()", it seems like we're
>  telling KS to take the raw stream, group it based on key,
>  then window it based on time, and then compute an
>  aggregation on the windows. In that model, "trigger()" would
>  have to mean something like "trigger it", which doesn't
>  really make sense, since we aren't "triggering" the
>  aggregation (then again, to an outside observer, it would
>  appear that way... food for thought).
> 
>  Another way to look at it is that all we're really doing is
>  configuring a windowed aggregation on the stream, and we're
>  doing it with a progressive builder interface. In other
>  words, the above is just a progressive builder for
>  configuring an operation like
>  "stream.aggregate(groupingConfig, windowingConfig,
>  countFn)". Under the latter interpretation of the DSL, it
>  makes perfect sense to add more optional progressive builder
>  methods like trigger() to the WindowedKStream interfaces.
> 
>  Since part of the motivation for choosing the word "trigger"
>  here is to stay close to what Flink defines, I'll also point
>  out that Flink's syntax is also
>  "stream.keyBy().window().trigger().aggregate()". Not that
>  their API is the holy grail or anything, but it's at least
>  an indication that this API isn't a horrible mistake.
> 
>  All other things being equal, I also prefer to leave tie-
>  breakers in the hands of the contributor. So, if we've all
> 

[jira] [Resolved] (KAFKA-13714) Flaky test IQv2StoreIntegrationTest

2022-03-23 Thread John Roesler (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

John Roesler resolved KAFKA-13714.
--
Fix Version/s: 3.2.0
 Assignee: John Roesler
   Resolution: Fixed

> Flaky test IQv2StoreIntegrationTest
> ---
>
> Key: KAFKA-13714
> URL: https://issues.apache.org/jira/browse/KAFKA-13714
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 3.2.0
>Reporter: John Roesler
>Assignee: John Roesler
>Priority: Blocker
> Fix For: 3.2.0
>
>
> I have observed multiple consistency violations in the 
> IQv2StoreIntegrationTest. Since this is the first release of IQv2, and it's 
> apparently a major flaw in the feature, we should not release with this bug 
> outstanding. Depending on the time-table, we may want to block the release or 
> pull the feature until the next release.
>  
> The first observation I have is from 23 Feb 2022. So far all observations 
> point to the range query in particular, and all observations have been for 
> RocksDB stores, including RocksDBStore, TimestampedRocksDBStore, and the 
> windowed store built on RocksDB segments.
> For reference, range queries were implemented on 16 Feb 2022: 
> [https://github.com/apache/kafka/commit/b38f6ba5cc989702180f5d5f8e55ba20444ea884]
> The window-specific range query test has also failed once that I have seen. 
> That feature was implemented on 2 Jan 2022: 
> [https://github.com/apache/kafka/commit/b8f1cf14c396ab04b8968a8fa04d8cf67dd3254c]
>  
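> For reference, a sketch of the kind of IQv2 range query these tests exercise
> (the store name and key/value types are illustrative, and kafkaStreams is
> assumed to be a running KafkaStreams instance):
> {code:java}
> import org.apache.kafka.streams.query.RangeQuery;
> import org.apache.kafka.streams.query.StateQueryRequest;
> import org.apache.kafka.streams.query.StateQueryResult;
> import org.apache.kafka.streams.state.KeyValueIterator;
> 
> // Query keys 1..3 from a key-value store named "my-store".
> StateQueryRequest<KeyValueIterator<Integer, Integer>> request =
>     StateQueryRequest.inStore("my-store")
>         .withQuery(RangeQuery.withRange(1, 3));
> StateQueryResult<KeyValueIterator<Integer, Integer>> result =
>     kafkaStreams.query(request);
> {code}
> 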
> Here are some stack traces I have seen:
> {code:java}
> verifyStore[cache=true, log=true, supplier=ROCKS_KV, kind=PAPI]
> java.lang.AssertionError: 
> Expected: is <[1, 2, 3]>
>  but: was <[1, 2]>
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
>   at 
> org.apache.kafka.streams.integration.IQv2StoreIntegrationTest.shouldHandleRangeQuery(IQv2StoreIntegrationTest.java:1125)
>   at 
> org.apache.kafka.streams.integration.IQv2StoreIntegrationTest.shouldHandleRangeQueries(IQv2StoreIntegrationTest.java:803)
>   at 
> org.apache.kafka.streams.integration.IQv2StoreIntegrationTest.verifyStore(IQv2StoreIntegrationTest.java:776)
>  {code}
> {code:java}
> verifyStore[cache=true, log=true, supplier=TIME_ROCKS_KV, kind=PAPI]
> java.lang.AssertionError: 
> Expected: is <[1, 2, 3]>
>  but: was <[1, 3]>
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:6)
>   at 
> org.apache.kafka.streams.integration.IQv2StoreIntegrationTest.shouldHandleRangeQuery(IQv2StoreIntegrationTest.java:1131)
>   at 
> org.apache.kafka.streams.integration.IQv2StoreIntegrationTest.shouldHandleRangeQueries(IQv2StoreIntegrationTest.java:809)
>   at 
> org.apache.kafka.streams.integration.IQv2StoreIntegrationTest.verifyStore(IQv2StoreIntegrationTest.java:778)
>{code}
> {code:java}
> verifyStore[cache=true, log=true, supplier=ROCKS_KV, kind=PAPI]
> java.lang.AssertionError: 
> Result:StateQueryResult{partitionResults={0=SucceededQueryResult{result=org.apache.kafka.streams.state.internals.MeteredKeyValueStore$MeteredKeyValueTimestampedIterator@35025a0a,
>  executionInfo=[], position=Position{position={input-topic={0=1, 
> 1=SucceededQueryResult{result=org.apache.kafka.streams.state.internals.MeteredKeyValueStore$MeteredKeyValueTimestampedIterator@38732364,
>  executionInfo=[], position=Position{position={input-topic={1=1}, 
> globalResult=null}
> Expected: is <[1, 2, 3]>
>  but: was <[1, 2]>
>   at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>   at 
> org.apache.kafka.streams.integration.IQv2StoreIntegrationTest.shouldHandleRangeQuery(IQv2StoreIntegrationTest.java:1129)
>   at 
> org.apache.kafka.streams.integration.IQv2StoreIntegrationTest.shouldHandleRangeQueries(IQv2StoreIntegrationTest.java:807)
>   at 
> org.apache.kafka.streams.integration.IQv2StoreIntegrationTest.verifyStore(IQv2StoreIntegrationTest.java:780)
>  {code}
> {code:java}
> verifyStore[cache=true, log=false, supplier=ROCKS_WINDOW, kind=DSL] 
>     java.lang.AssertionError: 
> Result:StateQueryResult{partitionResults={0=SucceededQueryResult{result=org.apache.kafka.streams.state.internals.MeteredWindowedKeyValueIterator@2a32fb6,
>  executionInfo=[], position=Position{position={input-topic={0=1, 
> 1=SucceededQueryResult{result=org.apache.kafka.streams.state.internals.MeteredWindowedKeyValueIterator@6107165,
>  executionInfo=[], position=Position{position={input-topic={1=1}, 
> globalResult=null}
>     Expected: is <[0, 1, 2, 3]> 
>          but: was <[0, 2, 3]>
>         at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20)
>        

Re: [VOTE] KIP-794: Strictly Uniform Sticky Partitioner

2022-03-23 Thread Luke Chen
Hi Artem,

Thanks for the KIP and the patience during discussion!
+1 binding from me.

Luke

On Thu, Mar 24, 2022 at 3:43 AM Ismael Juma  wrote:

> Thanks for the KIP and for taking the time to address all the feedback. +1
> (binding)
>
> Ismael
>
> On Mon, Mar 21, 2022 at 4:52 PM Artem Livshits
>  wrote:
>
> > Hi all,
> >
> > I'd like to start a vote on
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner
> > .
> >
> > -Artem
> >
>


Re: [DISCUSS] KIP-825: introduce a new API to control when aggregated results are produced

2022-03-23 Thread Matthias J. Sax

Wow. Quite a thread... #namingIsHard :D

I won't repeat all arguments which are all very good ones. I can just 
state my personal favorite option:


stream
 .groupBy(..)
 .windowedBy(..)
 .withEmitStrategy(EmitStrategy.ON_WINDOW_CLOSE)
 .aggregate(..)
 .mapValues(..)

It seems to be the best compromise / trade-off across the board. 
Personally, I would strongly advocate against using `trigger()`!



-Matthias


On 3/23/22 4:38 PM, Guozhang Wang wrote:

Hao is right. I think that's the hindsight we have for `suppress`: since
it can be applied anywhere on a K(windowed)Table, it allows awkward
programming flexibility, and I felt it's better to make its application
scope more constrained.

And I also agree with John that, unless any of us feel strongly about any
options, Hao could make the final call about the namings.


Guozhang

On Wed, Mar 23, 2022 at 1:49 PM Hao Li  wrote:


For

stream
   .groupBy(..)
   .windowedBy(..)
   .aggregate(..)
   .withEmitStrategy(EmitStrategy.ON_WINDOW_CLOSE)
   .mapValues(..)

I think after `aggregate` it's already a table, and by then the emit strategy
is too late to control how the windowed stream is outputted to the table.
This is the concern Guozhang raised about having this in the existing
`suppress` operator as well.

Thanks,
Hao

On Wed, Mar 23, 2022 at 1:05 PM Bruno Cadonna  wrote:


Hi,

Thank you for your answers to my questions!

I see the argument about the conciseness of configuring a stream with
methods instead of config objects. I just miss the descriptive aspect a
bit.

What about

stream
   .groupBy(..)
   .windowedBy(..)
   .withEmitStrategy(EmitStrategy.ON_WINDOW_CLOSE)
   .aggregate(..)
   .mapValues(..)

I also have another question: why should the emitting of results be
controlled by the window-level API? If I want to emit results for each
input record, the emit strategy is quite independent of the window. So
I somehow share Matthias' and Guozhang's concern that the emit strategy
seems misplaced there.

What are the arguments against?

stream
   .groupBy(..)
   .windowedBy(..)
   .aggregate(..)
   .withEmitStrategy(EmitStrategy.ON_WINDOW_CLOSE)
   .mapValues(..)


A final administrative request: Hao, could you please add the rejected
alternatives to the KIP so that future us will know why we rejected them?

Best,
Bruno

On 23.03.22 19:38, John Roesler wrote:

Hi all,

I can see both sides of this.

On one hand, when we say
"stream.groupBy().windowedBy().count()", it seems like we're
telling KS to take the raw stream, group it based on key,
then window it based on time, and then compute an
aggregation on the windows. In that model, "trigger()" would
have to mean something like "trigger it", which doesn't
really make sense, since we aren't "triggering" the
aggregation (then again, to an outside observer, it would
appear that way... food for thought).

Another way to look at it is that all we're really doing is
configuring a windowed aggregation on the stream, and we're
doing it with a progressive builder interface. In other
words, the above is just a progressive builder for
configuring an operation like
"stream.aggregate(groupingConfig, windowingConfig,
countFn)". Under the latter interpretation of the DSL, it
makes perfect sense to add more optional progressive builder
methods like trigger() to the WindowedKStream interfaces.

Since part of the motivation for choosing the word "trigger"
here is to stay close to what Flink defines, I'll also point
out that Flink's syntax is also
"stream.keyBy().window().trigger().aggregate()". Not that
their API is the holy grail or anything, but it's at least
an indication that this API isn't a horrible mistake.

All other things being equal, I also prefer to leave tie-
breakers in the hands of the contributor. So, if we've all
said our piece and Hao still prefers option 1, then (as long
as we don't think it's a horrible mistake), I think we
should just let him go for it.

Speaking of which, after reviewing the responses regarding
deprecating `Suppressed#onWindowClose`, I still think we
should just go ahead and deprecate it. Although it's not
expressed exactly the same way, it still does exactly the
same thing, or so close that it seems confusing to keep
both. But again, if Hao really prefers to keep both, I won't
insist on it :)

Thanks all,
-John

On Wed, 2022-03-23 at 09:59 -0700, Hao Li wrote:

Thanks Bruno!

Argument for option 1 is:
1. Concise and descriptive. It avoids overloading existing functions and
it's very clear what it's doing. Imagine if there's an autocomplete feature
in IntelliJ or another IDE for our DSL in the future, it's not favorable to
show 6 `windowedBy` functions.
2. Option 1 operates on `windowedStream` to configure how it should be
outputted. Option 2 operates on `KGroupedStream` to produce
`windowedStream` as well as configure how `windowedStream` should be
outputted. I feel it's better to have a `windowedStream` and then

[jira] [Created] (KAFKA-13766) Use `max.poll.interval.ms` as the timeout during complete-rebalance phase

2022-03-23 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-13766:
-

 Summary: Use `max.poll.interval.ms` as the timeout during 
complete-rebalance phase
 Key: KAFKA-13766
 URL: https://issues.apache.org/jira/browse/KAFKA-13766
 Project: Kafka
  Issue Type: Bug
  Components: core
Reporter: Guozhang Wang


The lifetime of a consumer can be categorized into three phases:

1) During normal processing, the broker expects a heartbeat request from the 
consumer periodically, and that is timed by `session.timeout.ms`.

2) During the prepare_rebalance phase, the broker expects a join-group request 
to be received within the rebalance timeout, which is piggy-backed as 
`max.poll.interval.ms`.

3) During the complete_rebalance phase, the broker expects a sync-group request 
to be received, again within `session.timeout.ms`.

So during the different phases of the consumer's life, different timeouts are 
used to bound the timer.

Nowadays, with the cooperative rebalance protocol, {{consumer.poll}} can still 
return records to be processed in the middle of a rebalance. In that case, for 
phase 3) we should also use `max.poll.interval.ms` to bound the timer, since it 
is in practice larger than `session.timeout.ms`.
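
For reference, a minimal sketch of setting the two configs involved on a 
consumer (values are illustrative, not recommendations):
{code:java}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
// Bounds phase 1) and, today, phase 3).
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "45000");
// Bounds phase 2); this ticket proposes using it for phase 3) as well.
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");
{code}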



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13765) Describe-consumer admin should not return unstable membership information

2022-03-23 Thread Guozhang Wang (Jira)
Guozhang Wang created KAFKA-13765:
-

 Summary: Describe-consumer admin should not return unstable 
membership information
 Key: KAFKA-13765
 URL: https://issues.apache.org/jira/browse/KAFKA-13765
 Project: Kafka
  Issue Type: Bug
  Components: admin
Reporter: Guozhang Wang


When a consumer group is in the “prepare-rebalance” phase, it's unclear whether 
all of its currently registered members will still re-join in the new 
generation. In this case, if we simply return the current members map in the 
describe-consumer response, it may be misleading, as users would be getting 
spurious results that may contain dropping or even zombie consumers.

So I think during the prepare-rebalance phase, we should either return only 
those members whose join-group requests have already been received, OR simply 
return the response with no members and indicate via the prepare-rebalance 
state that the membership info is unstable and hence won't be returned.
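
For context, a sketch of how the membership info is fetched today via the admin 
client (group name is illustrative; props is an admin client config, and an 
enclosing method that throws Exception is assumed):
{code:java}
import java.util.List;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.ConsumerGroupDescription;

try (Admin admin = Admin.create(props)) {
    ConsumerGroupDescription group = admin
        .describeConsumerGroups(List.of("my-group"))
        .describedGroups()
        .get("my-group")
        .get();
    // During prepare-rebalance this member list may include consumers that
    // will not re-join; this ticket proposes omitting or flagging them.
    System.out.println(group.state() + ": " + group.members());
}
{code}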



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.0 #191

2022-03-23 Thread Apache Jenkins Server
See 




Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.1 #95

2022-03-23 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 432983 lines...]
[2022-03-24T00:39:39.079Z] > Task :metadata:testClasses UP-TO-DATE
[2022-03-24T00:39:39.079Z] > Task 
:clients:generateMetadataFileForMavenJavaPublication
[2022-03-24T00:39:39.079Z] > Task 
:clients:generatePomFileForMavenJavaPublication
[2022-03-24T00:39:39.079Z] 
[2022-03-24T00:39:39.079Z] > Task :streams:processMessages
[2022-03-24T00:39:39.079Z] Execution optimizations have been disabled for task 
':streams:processMessages' to ensure correctness due to the following reasons:
[2022-03-24T00:39:39.079Z]   - Gradle detected a problem with the following 
location: 
'/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.1/streams/src/generated/java/org/apache/kafka/streams/internals/generated'.
 Reason: Task ':streams:srcJar' uses this output of task 
':streams:processMessages' without declaring an explicit or implicit 
dependency. This can lead to incorrect results being produced, depending on 
what order the tasks are executed. Please refer to 
https://docs.gradle.org/7.2/userguide/validation_problems.html#implicit_dependency
 for more details about this problem.
[2022-03-24T00:39:39.079Z] MessageGenerator: processed 1 Kafka message JSON 
files(s).
[2022-03-24T00:39:39.079Z] 
[2022-03-24T00:39:39.079Z] > Task :streams:compileJava UP-TO-DATE
[2022-03-24T00:39:39.079Z] > Task :streams:classes UP-TO-DATE
[2022-03-24T00:39:39.079Z] > Task :streams:test-utils:compileJava UP-TO-DATE
[2022-03-24T00:39:39.079Z] > Task :streams:copyDependantLibs
[2022-03-24T00:39:39.079Z] > Task :streams:jar UP-TO-DATE
[2022-03-24T00:39:40.241Z] > Task 
:streams:generateMetadataFileForMavenJavaPublication
[2022-03-24T00:39:42.356Z] > Task :connect:api:javadoc
[2022-03-24T00:39:42.356Z] > Task :connect:api:copyDependantLibs UP-TO-DATE
[2022-03-24T00:39:42.356Z] > Task :connect:api:jar UP-TO-DATE
[2022-03-24T00:39:42.356Z] > Task 
:connect:api:generateMetadataFileForMavenJavaPublication
[2022-03-24T00:39:42.356Z] > Task :connect:json:copyDependantLibs UP-TO-DATE
[2022-03-24T00:39:42.356Z] > Task :connect:json:jar UP-TO-DATE
[2022-03-24T00:39:42.356Z] > Task 
:connect:json:generateMetadataFileForMavenJavaPublication
[2022-03-24T00:39:42.356Z] > Task :connect:api:javadocJar
[2022-03-24T00:39:42.356Z] > Task :connect:api:compileTestJava UP-TO-DATE
[2022-03-24T00:39:42.356Z] > Task :connect:api:testClasses UP-TO-DATE
[2022-03-24T00:39:42.356Z] > Task :connect:api:testJar
[2022-03-24T00:39:42.356Z] > Task 
:connect:json:publishMavenJavaPublicationToMavenLocal
[2022-03-24T00:39:42.356Z] > Task :connect:json:publishToMavenLocal
[2022-03-24T00:39:43.302Z] > Task :connect:api:testSrcJar
[2022-03-24T00:39:43.302Z] > Task 
:connect:api:publishMavenJavaPublicationToMavenLocal
[2022-03-24T00:39:43.302Z] > Task :connect:api:publishToMavenLocal
[2022-03-24T00:39:46.237Z] > Task :streams:javadoc
[2022-03-24T00:39:46.237Z] > Task :streams:javadocJar
[2022-03-24T00:39:47.182Z] 
[2022-03-24T00:39:47.182Z] > Task :clients:javadoc
[2022-03-24T00:39:47.182Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.1/clients/src/main/java/org/apache/kafka/common/security/oauthbearer/secured/OAuthBearerLoginCallbackHandler.java:147:
 warning - Tag @link: reference not found: 
[2022-03-24T00:39:47.182Z] 1 warning
[2022-03-24T00:39:48.129Z] 
[2022-03-24T00:39:48.129Z] > Task :clients:javadocJar
[2022-03-24T00:39:49.171Z] 
[2022-03-24T00:39:49.171Z] > Task :clients:srcJar
[2022-03-24T00:39:49.171Z] Execution optimizations have been disabled for task 
':clients:srcJar' to ensure correctness due to the following reasons:
[2022-03-24T00:39:49.171Z]   - Gradle detected a problem with the following 
location: 
'/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.1/clients/src/generated/java'.
 Reason: Task ':clients:srcJar' uses this output of task 
':clients:processMessages' without declaring an explicit or implicit 
dependency. This can lead to incorrect results being produced, depending on 
what order the tasks are executed. Please refer to 
https://docs.gradle.org/7.2/userguide/validation_problems.html#implicit_dependency
 for more details about this problem.
[2022-03-24T00:39:49.171Z] 
[2022-03-24T00:39:49.171Z] > Task :clients:testJar
[2022-03-24T00:39:50.116Z] > Task :clients:testSrcJar
[2022-03-24T00:39:50.116Z] > Task 
:clients:publishMavenJavaPublicationToMavenLocal
[2022-03-24T00:39:50.116Z] > Task :clients:publishToMavenLocal
[2022-03-24T00:40:08.648Z] > Task :core:compileScala
[2022-03-24T00:41:15.866Z] > Task :core:classes
[2022-03-24T00:41:15.866Z] > Task :core:compileTestJava NO-SOURCE
[2022-03-24T00:41:38.102Z] > Task :core:compileTestScala
[2022-03-24T00:42:19.585Z] > Task :core:testClasses
[2022-03-24T00:42:33.513Z] > Task :streams:compileTestJava
[2022-03-24T00:42:33.513Z] > Task :streams:testClasses
[2022-03-24T00:42:33.513Z] > Task :streams:testJar
[2022-03-24T00:42:33.513Z] > 

Re: [DISCUSS] KIP-825: introduce a new API to control when aggregated results are produced

2022-03-23 Thread Guozhang Wang
Hao is right. I think that's the hindsight we have for `suppress`: since
it can be applied anywhere on a K(windowed)Table, it allows awkward
programming flexibility, and I felt it's better to make its application
scope more constrained.

And I also agree with John that, unless any of us feel strongly about any
options, Hao could make the final call about the namings.


Guozhang

On Wed, Mar 23, 2022 at 1:49 PM Hao Li  wrote:

> For
>
> stream
>   .groupBy(..)
>   .windowedBy(..)
>   .aggregate(..)
>   .withEmitStrategy(EmitStrategy.ON_WINDOW_CLOSE)
>   .mapValues(..)
>
> I think after `aggregate` it's already a table, and by then the emit strategy
> is too late to control how the windowed stream is outputted to the table.
> This is the concern Guozhang raised about having this in the existing
> `suppress` operator as well.
>
> Thanks,
> Hao
>
> On Wed, Mar 23, 2022 at 1:05 PM Bruno Cadonna  wrote:
>
> > Hi,
> >
> > Thank you for your answers to my questions!
> >
> > I see the argument about the conciseness of configuring a stream with
> > methods instead of config objects. I just miss the descriptive aspect a
> > bit.
> >
> > What about
> >
> > stream
> >   .groupBy(..)
> >   .windowedBy(..)
> >   .withEmitStrategy(EmitStrategy.ON_WINDOW_CLOSE)
> >   .aggregate(..)
> >   .mapValues(..)
> >
> > I also have another question: why should the emitting of results be
> > controlled by the window-level API? If I want to emit results for each
> > input record, the emit strategy is quite independent of the window. So
> > I somehow share Matthias' and Guozhang's concern that the emit strategy
> > seems misplaced there.
> >
> > What are the arguments against?
> >
> > stream
> >   .groupBy(..)
> >   .windowedBy(..)
> >   .aggregate(..)
> >   .withEmitStrategy(EmitStrategy.ON_WINDOW_CLOSE)
> >   .mapValues(..)
> >
> >
> > A final administrative request: Hao, could you please add the rejected
> > alternatives to the KIP so that future us will know why we rejected them?
> >
> > Best,
> > Bruno
> >
> > On 23.03.22 19:38, John Roesler wrote:
> > > Hi all,
> > >
> > > I can see both sides of this.
> > >
> > > On one hand, when we say
> > > "stream.groupBy().windowedBy().count()", it seems like we're
> > > telling KS to take the raw stream, group it based on key,
> > > then window it based on time, and then compute an
> > > aggregation on the windows. In that model, "trigger()" would
> > > have to mean something like "trigger it", which doesn't
> > > really make sense, since we aren't "triggering" the
> > > aggregation (then again, to an outside observer, it would
> > > appear that way... food for thought).
> > >
> > > Another way to look at it is that all we're really doing is
> > > configuring a windowed aggregation on the stream, and we're
> > > doing it with a progressive builder interface. In other
> > > words, the above is just a progressive builder for
> > > configuring an operation like
> > > "stream.aggregate(groupingConfig, windowingConfig,
> > > countFn)". Under the latter interpretation of the DSL, it
> > > makes perfect sense to add more optional progressive builder
> > > methods like trigger() to the WindowedKStream interfaces.
> > >
> > > Since part of the motivation for choosing the word "trigger"
> > > here is to stay close to what Flink defines, I'll also point
> > > out that Flink's syntax is also
> > > "stream.keyBy().window().trigger().aggregate()". Not that
> > > their API is the holy grail or anything, but it's at least
> > > an indication that this API isn't a horrible mistake.
> > >
> > > All other things being equal, I also prefer to leave tie-
> > > breakers in the hands of the contributor. So, if we've all
> > > said our piece and Hao still prefers option 1, then (as long
> > > as we don't think it's a horrible mistake), I think we
> > > should just let him go for it.
> > >
> > > Speaking of which, after reviewing the responses regarding
> > > deprecating `Suppressed#onWindowClose`, I still think we
> > > should just go ahead and deprecate it. Although it's not
> > > expressed exactly the same way, it still does exactly the
> > > same thing, or so close that it seems confusing to keep
> > > both. But again, if Hao really prefers to keep both, I won't
> > > insist on it :)
> > >
> > > Thanks all,
> > > -John
> > >
> > > On Wed, 2022-03-23 at 09:59 -0700, Hao Li wrote:
> > >> Thanks Bruno!
> > >>
> > >> Argument for option 1 is:
> > >> 1. Concise and descriptive. It avoids overloading existing functions and
> > >> it's very clear what it's doing. Imagine if there's an autocomplete
> > >> feature in IntelliJ or another IDE for our DSL in the future, it's not
> > >> favorable to show 6 `windowedBy` functions.
> > >> 2. Option 1 operates on `windowedStream` to configure how it should be
> > >> outputted. Option 2 operates on `KGroupedStream` to produce
> > >> `windowedStream` as well as configure how `windowedStream` should be
> > >> outputted. I 

[jira] [Resolved] (KAFKA-13759) Disable producer idempotence by default in producers instantiated by Connect

2022-03-23 Thread Konstantine Karantasis (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantine Karantasis resolved KAFKA-13759.

Resolution: Fixed

> Disable producer idempotence by default in producers instantiated by Connect
> 
>
> Key: KAFKA-13759
> URL: https://issues.apache.org/jira/browse/KAFKA-13759
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Konstantine Karantasis
>Assignee: Konstantine Karantasis
>Priority: Major
> Fix For: 3.2.0, 3.1.1, 3.0.2
>
>
> https://issues.apache.org/jira/browse/KAFKA-7077 was merged recently, 
> referring to KIP-318. Before that, in AK 3.0, idempotence was enabled by 
> default across Kafka producers. 
> However, some compatibility implications were missed in both cases. 
> If idempotence is enabled by default, Connect won't be able to communicate via 
> its producers with Kafka brokers older than version 0.11. Perhaps more 
> importantly, for brokers older than version 2.8 the {{IDEMPOTENT_WRITE}} ACL 
> is required to be granted to the principal of the Connect worker. 
> Given the above caveats, this ticket proposes to explicitly disable producer 
> idempotence in Connect by default. This feature, as it happens today, can be 
> enabled by setting worker and/or connector properties. However, enabling it 
> by default should be considered in a major version upgrade and after KIP-318 
> is updated to mention the compatibility requirements and gets officially 
> approved. 
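> 
> For reference, a sketch of how it can be enabled explicitly today (worker-level 
> pass-through config, or a per-connector override via KIP-458 client config 
> overrides; snippet is illustrative, not from the KIP):
> {code}
> # Worker config: applies to all producers the worker creates.
> producer.enable.idempotence=true
> # Connector config: requires connector.client.config.override.policy to allow it.
> producer.override.enable.idempotence=true
> {code}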



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13764) Potential improvements for Connect incremental rebalancing logic

2022-03-23 Thread Chris Egerton (Jira)
Chris Egerton created KAFKA-13764:
-

 Summary: Potential improvements for Connect incremental 
rebalancing logic
 Key: KAFKA-13764
 URL: https://issues.apache.org/jira/browse/KAFKA-13764
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Chris Egerton
Assignee: Chris Egerton


There are a few small changes that we might make to the incremental rebalancing 
logic for Kafka Connect to improve distribution of connectors and tasks across 
a cluster and address potential bugs:
 # During assignment, assign new connectors and tasks across the cluster before 
calculating revocations that may be necessary in order to balance the cluster. 
This way, we can potentially skip a round of revocation by using newly-created 
connectors and tasks to balance out the cluster.
 # Perform connector and task revocation in more cases, such as when one or 
more connectors are reconfigured to use fewer tasks, which can possibly lead to 
an imbalanced cluster.
 # Fix [this 
line|https://github.com/apache/kafka/blob/06ca4850c5b2b12e972f48e03fe4f9c1032f9a3e/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/IncrementalCooperativeAssignor.java#L248]
 to use the same aggregation logic that's used 
[here|https://github.com/apache/kafka/blob/06ca4850c5b2b12e972f48e03fe4f9c1032f9a3e/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/IncrementalCooperativeAssignor.java#L273-L281]
 in order to avoid overwriting map values when they should be combined instead 
(a toy sketch of the overwrite-vs-combine distinction follows below).
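
For illustration, a minimal toy sketch of that overwrite-vs-combine distinction 
(simplified types; not the actual Connect code):
{code:java}
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

Map<String, Set<String>> tasksByWorker = new HashMap<>();
// Overwriting (the buggy pattern): the second put replaces the first,
// so "task-a" is silently dropped.
tasksByWorker.put("worker1", new HashSet<>(Set.of("task-a")));
tasksByWorker.put("worker1", new HashSet<>(Set.of("task-b")));
// Combining (the intended pattern): merge the value sets instead.
tasksByWorker.merge("worker1", new HashSet<>(Set.of("task-c")),
        (existing, incoming) -> { existing.addAll(incoming); return existing; });
// tasksByWorker now maps "worker1" to {"task-b", "task-c"}.
{code}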



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13763) Improve unit testing coverage for IncrementalCooperativeAssignor

2022-03-23 Thread Chris Egerton (Jira)
Chris Egerton created KAFKA-13763:
-

 Summary: Improve unit testing coverage for 
IncrementalCooperativeAssignor
 Key: KAFKA-13763
 URL: https://issues.apache.org/jira/browse/KAFKA-13763
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Chris Egerton
Assignee: Chris Egerton


The 
[tests|https://github.com/apache/kafka/blob/dcd09de1ed84b43f269eb32fc2baf589a791d468/connect/runtime/src/test/java/org/apache/kafka/connect/runtime/distributed/IncrementalCooperativeAssignorTest.java]
 for the {{IncrementalCooperativeAssignor}} class provide a moderate level of 
coverage and cover some non-trivial cases, but there are some areas for 
improvement that will allow us to iterate on the assignment logic for Kafka 
Connect faster and with greater confidence.

These improvements include:
 * Adding reusable utility methods to assert that a cluster's assignment is 
*balanced* (the difference in the number of connectors and tasks assigned to 
any two workers is at most one) and *complete* (all connectors and tasks are 
assigned to a worker); a sketch of the balanced assertion follows after this list
 * Removing the existing 
[assertAssignment|https://github.com/apache/kafka/blob/dcd09de1ed84b43f269eb32fc2baf589a791d468/connect/runtime/src/test/java/org/apache/kafka/connect/runtime/distributed/IncrementalCooperativeAssignorTest.java#L1373-L1405]
 methods and replacing them with a more fine-grained alternative that allows 
for more granular assertions about the number of tasks/connectors 
assigned/revoked from each worker during a round of rebalance, instead of the 
total for the entire cluster
 * Adding a reusable utility method to assert the current distribution of 
connectors and tasks across the cluster
 * Decomposing large portions of repeated code for simulating a round of 
rebalancing into a reusable utility method
 * Renaming variable names to improve accuracy/readability (the 
{{expectedMemberConfigs}} field, for example, is pretty poorly named)
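
For illustration, a minimal sketch of the "balanced" assertion described above 
(hypothetical helper, not the actual test code):
{code:java}
import java.util.Collections;
import java.util.Map;
import static org.junit.jupiter.api.Assertions.assertTrue;

// Balanced: the number of units (connectors plus tasks) assigned to any
// two workers differs by at most one.
static void assertBalanced(Map<String, Integer> unitsPerWorker) {
    int max = Collections.max(unitsPerWorker.values());
    int min = Collections.min(unitsPerWorker.values());
    assertTrue(max - min <= 1, "Unbalanced assignment: " + unitsPerWorker);
}
{code}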



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] KIP-825: introduce a new API to control when aggregated results are produced

2022-03-23 Thread Hao Li
For

stream
  .groupBy(..)
  .windowedBy(..)
  .aggregate(..)
  .withEmitStrategy(EmitStrategy.ON_WINDOW_CLOSE)
  .mapValues(..)

I think after `aggregate` it's already a table, and by then the emit strategy
is too late to control how the windowed stream is outputted to the table.
This is the concern Guozhang raised about having this in the existing
`suppress` operator as well.
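
For comparison, the existing pattern looks roughly like this (window size,
grace period, and buffer config are illustrative):

stream
  .groupByKey()
  .windowedBy(TimeWindows.ofSizeAndGrace(Duration.ofMinutes(5), Duration.ofMinutes(1)))
  .count()
  .suppress(Suppressed.untilWindowCloses(Suppressed.BufferConfig.unbounded()))
  .toStream();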

Thanks,
Hao

On Wed, Mar 23, 2022 at 1:05 PM Bruno Cadonna  wrote:

> Hi,
>
> Thank you for your answers to my questions!
>
> I see the argument about the conciseness of configuring a stream with
> methods instead of config objects. I just miss the descriptive aspect a
> bit.
>
> What about
>
> stream
>   .groupBy(..)
>   .windowedBy(..)
>   .withEmitStrategy(EmitStrategy.ON_WINDOW_CLOSE)
>   .aggregate(..)
>   .mapValues(..)
>
> I also have another question: why should the emitting of results be
> controlled by the window-level API? If I want to emit results for each
> input record, the emit strategy is quite independent of the window. So
> I somehow share Matthias' and Guozhang's concern that the emit strategy
> seems misplaced there.
>
> What are the arguments against?
>
> stream
>   .groupBy(..)
>   .windowedBy(..)
>   .aggregate(..)
>   .withEmitStrategy(EmitStrategy.ON_WINDOW_CLOSE)
>   .mapValues(..)
>
>
> A final administrative request: Hao, could you please add the rejected
> alternatives to the KIP so that future us will know why we rejected them?
>
> Best,
> Bruno
>
> On 23.03.22 19:38, John Roesler wrote:
> > Hi all,
> >
> > I can see both sides of this.
> >
> > On one hand, when we say
> > "stream.groupBy().windowedBy().count()", it seems like we're
> > telling KS to take the raw stream, group it based on key,
> > then window it based on time, and then compute an
> > aggregation on the windows. In that model, "trigger()" would
> > have to mean something like "trigger it", which doesn't
> > really make sense, since we aren't "triggering" the
> > aggregation (then again, to an outside observer, it would
> > appear that way... food for thought).
> >
> > Another way to look at it is that all we're really doing is
> > configuring a windowed aggregation on the stream, and we're
> > doing it with a progressive builder interface. In other
> > words, the above is just a progressive builder for
> > configuring an operation like
> > "stream.aggregate(groupingConfig, windowingConfig,
> > countFn)". Under the latter interpretation of the DSL, it
> > makes perfect sense to add more optional progressive builder
> > methods like trigger() to the WindowedKStream interfaces.
> >
> > Since part of the motivation for choosing the word "trigger"
> > here is to stay close to what Flink defines, I'll also point
> > out that Flink's syntax is also
> > "stream.keyBy().window().trigger().aggregate()". Not that
> > their API is the holy grail or anything, but it's at least
> > an indication that this API isn't a horrible mistake.
> >
> > All other things being equal, I also prefer to leave tie-
> > breakers in the hands of the contributor. So, if we've all
> > said our piece and Hao still prefers option 1, then (as long
> > as we don't think it's a horrible mistake), I think we
> > should just let him go for it.
> >
> > Speaking of which, after reviewing the responses regarding
> > deprecating `Suppressed#onWindowClose`, I still think we
> > should just go ahead and deprecate it. Although it's not
> > expressed exactly the same way, it still does exactly the
> > same thing, or so close that it seems confusing to keep
> > both. But again, if Hao really prefers to keep both, I won't
> > insist on it :)
> >
> > Thanks all,
> > -John
> >
> > On Wed, 2022-03-23 at 09:59 -0700, Hao Li wrote:
> >> Thanks Bruno!
> >>
> >> Argument for option 1 is:
> >> 1. Concise and descriptive. It avoids overloading existing functions and
> >> it's very clear what it's doing. Imagine if there's an autocomplete
> >> feature in IntelliJ or another IDE for our DSL in the future, it's not
> >> favorable to show 6 `windowedBy` functions.
> >> 2. Option 1 operates on `windowedStream` to configure how it should be
> >> outputted. Option 2 operates on `KGroupedStream` to produce
> >> `windowedStream` as well as configure how `windowedStream` should be
> >> outputted. I feel it's better to have a `windowedStream` and then
> >> configure how it can be outputted. Somehow I feel option 2 breaks the
> >> builder pattern.
> >> 3. `WindowedByParameters` doesn't seem very descriptive. If we put all
> >> kinds of different parameters into it to avoid future overloading, it's
> >> too bloated and not very user friendly.
> >>
> >> I agree option 1's `trigger` function is configuring the stream, which
> >> feels different from the existing `count` or `aggregate` etc. Configuring
> >> might also be a kind of action on the stream :) I'm not sure if it breaks
> >> the DSL principles, and if it does,
> >> can we relax the principle given the benefits 

Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #796

2022-03-23 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 599099 lines...]
[Pipeline] // timeout
[Pipeline] }
[Pipeline] // stage
[Pipeline] }
[2022-03-23T20:34:50.551Z] 
[2022-03-23T20:34:50.551Z] 
org.apache.kafka.streams.integration.StreamsUpgradeTestIntegrationTest > 
testVersionProbingUpgrade PASSED
[2022-03-23T20:34:50.551Z] 
[2022-03-23T20:34:50.551Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldInheritSerdes STARTED
[2022-03-23T20:34:50.551Z] 
[2022-03-23T20:34:50.551Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldInheritSerdes PASSED
[2022-03-23T20:34:50.551Z] 
[2022-03-23T20:34:50.551Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldShutdownWhenRecordConstraintIsViolated STARTED
[2022-03-23T20:34:54.129Z] 
[2022-03-23T20:34:54.129Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldShutdownWhenRecordConstraintIsViolated PASSED
[2022-03-23T20:34:54.129Z] 
[2022-03-23T20:34:54.129Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldUseDefaultSerdes STARTED
[2022-03-23T20:34:56.757Z] 
[2022-03-23T20:34:56.757Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldUseDefaultSerdes PASSED
[2022-03-23T20:34:56.757Z] 
[2022-03-23T20:34:56.757Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldAllowDisablingChangelog STARTED
[2022-03-23T20:35:00.214Z] 
[2022-03-23T20:35:00.214Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldAllowDisablingChangelog PASSED
[2022-03-23T20:35:00.214Z] 
[2022-03-23T20:35:00.214Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldAllowOverridingChangelogConfig STARTED
[2022-03-23T20:35:03.880Z] 
[2022-03-23T20:35:03.880Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldAllowOverridingChangelogConfig PASSED
[2022-03-23T20:35:03.880Z] 
[2022-03-23T20:35:03.880Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldShutdownWhenBytesConstraintIsViolated STARTED
[2022-03-23T20:35:08.488Z] 
[2022-03-23T20:35:08.488Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldShutdownWhenBytesConstraintIsViolated PASSED
[2022-03-23T20:35:08.488Z] 
[2022-03-23T20:35:08.488Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldCreateChangelogByDefault STARTED
[2022-03-23T20:35:12.063Z] 
[2022-03-23T20:35:12.063Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldCreateChangelogByDefault PASSED
[2022-03-23T20:35:12.999Z] 
[2022-03-23T20:35:12.999Z] 
org.apache.kafka.streams.integration.TaskAssignorIntegrationTest > 
shouldProperlyConfigureTheAssignor STARTED
[2022-03-23T20:35:12.999Z] 
[2022-03-23T20:35:12.999Z] 
org.apache.kafka.streams.integration.TaskAssignorIntegrationTest > 
shouldProperlyConfigureTheAssignor PASSED
[2022-03-23T20:35:14.782Z] 
[2022-03-23T20:35:14.783Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorLargeNumConsumers STARTED
[2022-03-23T20:35:25.170Z] 
[2022-03-23T20:35:25.171Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorLargeNumConsumers PASSED
[2022-03-23T20:35:25.171Z] 
[2022-03-23T20:35:25.171Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorLargePartitionCount STARTED
[2022-03-23T20:35:25.171Z] 
[2022-03-23T20:35:25.171Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorLargePartitionCount PASSED
[2022-03-23T20:35:25.171Z] 
[2022-03-23T20:35:25.171Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorManyThreadsPerClient STARTED
[2022-03-23T20:35:25.171Z] 
[2022-03-23T20:35:25.171Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorManyThreadsPerClient PASSED
[2022-03-23T20:35:25.171Z] 
[2022-03-23T20:35:25.171Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testStickyTaskAssignorManyStandbys STARTED
[2022-03-23T20:35:32.343Z] 
[2022-03-23T20:35:32.343Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testStickyTaskAssignorManyStandbys PASSED
[2022-03-23T20:35:32.343Z] 
[2022-03-23T20:35:32.343Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testStickyTaskAssignorManyThreadsPerClient STARTED
[2022-03-23T20:35:32.514Z] 
[2022-03-23T20:35:32.514Z] 
org.apache.kafka.streams.integration.StreamsUpgradeTestIntegrationTest > 
testVersionProbingUpgrade PASSED
[2022-03-23T20:35:32.514Z] 
[2022-03-23T20:35:32.514Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 

Re: [DISCUSS] KIP-825: introduce a new API to control when aggregated results are produced

2022-03-23 Thread Bruno Cadonna

Hi,

Thank you for your answers to my questions!

I see the argument about the conciseness of configuring a stream with 
methods instead of config objects. I just miss the descriptive aspect a bit.


What about

stream
 .groupBy(..)
 .windowedBy(..)
 .withEmitStrategy(EmitStrategy.ON_WINDOW_CLOSE)
 .aggregate(..)
 .mapValues(..)

I also have another question: why should the emitting of results be 
controlled by the window-level API? If I want to emit results for each 
input record, the emit strategy is quite independent of the window. So 
I somehow share Matthias' and Guozhang's concern that the emit strategy 
seems misplaced there.


What are the arguments against?

stream
 .groupBy(..)
 .windowedBy(..)
 .aggregate(..)
 .withEmitStrategy(EmitStrategy.ON_WINDOW_CLOSE)
 .mapValues(..)


A final administrative request: Hao, could you please add the rejected 
alternatives to the KIP so that future us will know why we rejected them?


Best,
Bruno

On 23.03.22 19:38, John Roesler wrote:

Hi all,

I can see both sides of this.

On one hand, when we say
"stream.groupBy().windowedBy().count()", it seems like we're
telling KS to take the raw stream, group it based on key,
then window it based on time, and then compute an
aggregation on the windows. In that model, "trigger()" would
have to mean something like "trigger it", which doesn't
really make sense, since we aren't "triggering" the
aggregation (then again, to an outside observer, it would
appear that way... food for thought).

Another way to look at it is that all we're really doing is
configuring a windowed aggregation on the stream, and we're
doing it with a progressive builder interface. In other
words, the above is just a progressive builder for
configuring an operation like
"stream.aggregate(groupingConfig, windowingConfig,
countFn)". Under the latter interpretation of the DSL, it
makes perfect sense to add more optional progressive builder
methods like trigger() to the WindowedKStream interfaces.

Since part of the motivation for choosing the word "trigger"
here is to stay close to what Flink defines, I'll also point
out that Flink's syntax is also
"stream.keyBy().window().trigger().aggregate()". Not that
their API is the holy grail or anything, but it's at least
an indication that this API isn't a horrible mistake.

All other things being equal, I also prefer to leave tie-
breakers in the hands of the contributor. So, if we've all
said our piece and Hao still prefers option 1, then (as long
as we don't think it's a horrible mistake), I think we
should just let him go for it.

Speaking of which, after reviewing the responses regarding
deprecating `Suppressed#onWindowClose`, I still think we
should just go ahead and deprecate it. Although it's not
expressed exactly the same way, it still does exactly the
same thing, or so close that it seems confusing to keep
both. But again, if Hao really prefers to keep both, I won't
insist on it :)

Thanks all,
-John

On Wed, 2022-03-23 at 09:59 -0700, Hao Li wrote:

Thanks Bruno!

Argument for option 1 is:
1. Concise and descriptive. It avoids overloading existing functions and
it's very clear what it's doing. Imagine if there's an autocomplete feature
in IntelliJ or another IDE for our DSL in the future, it's not favorable to
show 6 `windowedBy` functions.
2. Option 1 operates on `windowedStream` to configure how it should be
outputted. Option 2 operates on `KGroupedStream` to produce
`windowedStream` as well as configure how `windowedStream` should be
outputted. I feel it's better to have a `windowedStream` and then
configure how it can be outputted. Somehow I feel option 2 breaks the
builder pattern.
3. `WindowedByParameters` doesn't seem very descriptive. If we put all
kinds of different parameters into it to avoid future overloading, it's too
bloated and not very user friendly.

I agree option 1's `trigger` function is configuring the stream, which feels
different from the existing `count` or `aggregate` etc. Configuring might
also be a kind of action on the stream :) I'm not sure if it breaks the DSL
principles, and if it does,
can we relax the principle given the benefits compared to option 2)? Maybe
John can chime in as the DSL grammar author.

Thanks,
Hao

On Wed, Mar 23, 2022 at 2:59 AM Bruno Cadonna  wrote:


Hi Hao,

I agree with Guozhang: Great summary! Thank you!

Regarding "aligned with other config class names", there is this DSL
grammar John once specified
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+DSL+Grammar
and we have already used it in the code. I found the grammar quite useful.

I am undecided if option 1 is really worth it. What are actually the
arguments in favor of it? Is it only that we do not need to overload
other methods? This does not seem worth breaking DSL principles. An
alternative proposal would be to go with option 2 and conform with the
grammar above:

<W extends Window> TimeWindowedKStream<K, V> windowedBy(final Windows<W>
windows, WindowedByParameters parameters);

Re: [DISCUSS] KIP-651 - Support PEM format for SSL certificates and private key

2022-03-23 Thread Ismael Juma
Hi Rajini,

On Mon, Mar 21, 2022 at 10:02 AM Rajini Sivaram 
wrote:

> For the background on the current implementation: We use Java's keystore
> loading for JKS/PKCS12 keystore files and these files require passwords. We
>

In Java 18:

"Passwordless keystores (a keystore with no password required to unlock it)
are useful when the keystore is stored in a secure location and is only
intended to store non-sensitive information, such as public X.509
certificates. With a passwordless PKCS12 keystore, certificates are not
encrypted and there is no Mac applied as an integrity check is not
necessary.

Prior to this change, creating a passwordless PKCS12 keystore was
difficult, and required setting various security properties. Now, a
passwordless PKCS12 keystore can be created by simply specifying a null
password to the KeyStore::store(outStream, password) API. The keystore can
then be loaded with a null (or any) password with the KeyStore::load() API.

Issue: JDK-8231107"

https://seanjmullan.org/blog/2022/03/23/jdk18
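
For illustration, a sketch of that behavior (requires JDK 18+; caCert is a
placeholder for some X.509 Certificate):

import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;

KeyStore ks = KeyStore.getInstance("PKCS12");
ks.load(null, null);                         // new, empty keystore
ks.setCertificateEntry("ca", caCert);        // non-sensitive public cert
try (OutputStream out = Files.newOutputStream(Path.of("truststore.p12"))) {
    ks.store(out, null);                     // null password => passwordless
}
// The keystore can then be loaded with a null (or any) password:
KeyStore loaded = KeyStore.getInstance("PKCS12");
try (InputStream in = Files.newInputStream(Path.of("truststore.p12"))) {
    loaded.load(in, null);
}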

Ismael


Re: [VOTE] KIP-794: Strictly Uniform Sticky Partitioner

2022-03-23 Thread Ismael Juma
Thanks for the KIP and for taking the time to address all the feedback. +1
(binding)

Ismael

On Mon, Mar 21, 2022 at 4:52 PM Artem Livshits
 wrote:

> Hi all,
>
> I'd like to start a vote on
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner
> .
>
> -Artem
>


Re: [DISCUSS] KIP-825: introduce a new API to control when aggregated results are produced

2022-03-23 Thread John Roesler
Hi all,

I can see both sides of this.

On one hand, when we say
"stream.groupBy().windowedBy().count()", it seems like we're
telling KS to take the raw stream, group it based on key,
then window it based on time, and then compute an
aggregation on the windows. In that model, "trigger()" would
have to mean something like "trigger it", which doesn't
really make sense, since we aren't "triggering" the
aggregation (then again, to an outside observer, it would
appear that way... food for thought).

Another way to look at it is that all we're really doing is
configuring a windowed aggregation on the stream, and we're
doing it with a progressive builder interface. In other
words, the above is just a progressive builder for
configuring an operation like
"stream.aggregate(groupingConfig, windowingConfig,
countFn)". Under the latter interpretation of the DSL, it
makes perfect sense to add more optional progressive builder
methods like trigger() to the WindowedKStream interfaces.

Since part of the motivation for choosing the word "trigger"
here is to stay close to what Flink defines, I'll also point
out that Flink's syntax is also
"stream.keyBy().window().trigger().aggregate()". Not that
their API is the holy grail or anything, but it's at least
an indication that this API isn't a horrible mistake.

All other things being equal, I also prefer to leave tie-
breakers in the hands of the contributor. So, if we've all
said our piece and Hao still prefers option 1, then (as long
as we don't think it's a horrible mistake), I think we
should just let him go for it.

Speaking of which, after reviewing the responses regarding
deprecating `Suppressed#onWindowClose`, I still think we
should just go ahead and deprecate it. Although it's not
expressed exactly the same way, it still does exactly the
same thing, or so close that it seems confusing to keep
both. But again, if Hao really prefers to keep both, I won't
insist on it :)

Thanks all,
-John

On Wed, 2022-03-23 at 09:59 -0700, Hao Li wrote:
> Thanks Bruno!
> 
> Argument for option 1 is:
> 1. Concise and descriptive. It avoids overloading existing functions and
> it's very clear what it's doing. Imagine if there's an autocomplete feature
> in IntelliJ or another IDE for our DSL in the future, it's not favorable to
> show 6 `windowedBy` functions.
> 2. Option 1 operates on `windowedStream` to configure how it should be
> outputted. Option 2 operates on `KGroupedStream` to produce
> `windowedStream` as well as configure how `windowedStream` should be
> outputted. I feel it's better to have a `windowedStream` and then
> configure how it can be outputted. Somehow I feel option 2 breaks the
> builder pattern.
> 3. `WindowedByParameters` doesn't seem very descriptive. If we put all
> kinds of different parameters into it to avoid future overloading, it's too
> bloated and not very user friendly.
> 
> I agree option 1's `trigger` function is configuring the stream, which feels
> different from the existing `count` or `aggregate` etc. Configuring might
> also be a kind of action on the stream :) I'm not sure if it breaks the DSL
> principles, and if it does,
> can we relax the principle given the benefits compared to option 2)? Maybe
> John can chime in as the DSL grammar author.
> 
> Thanks,
> Hao
> 
> On Wed, Mar 23, 2022 at 2:59 AM Bruno Cadonna  wrote:
> 
> > Hi Hao,
> > 
> > I agree with Guozhang: Great summary! Thank you!
> > 
> > Regarding "aligned with other config class names", there is this DSL
> > grammar John once specified
> > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+DSL+Grammar
> > and we have already used it in the code. I found the grammar quite useful.
> > 
> > I am undecided if option 1 is really worth it. What are actually the
> > arguments in favor of it? Is it only that we do not need to overload
> > other methods? This does not seem worth breaking DSL principles. An
> > alternative proposal would be to go with option 2 and conform with the
> > grammar above:
> > 
> > <W extends Window> TimeWindowedKStream<K, V> windowedBy(final Windows<W>
> > windows, WindowedByParameters parameters);
> > 
> > TimeWindowedKStream<K, V> windowedBy(final SlidingWindows windows,
> > WindowedByParameters parameters);
> > 
> > SessionWindowedKStream<K, V> windowedBy(final SessionWindows windows,
> > WindowedByParameters parameters);
> > 
> > This is similar to option 2 in the KIP, but it ensures that we put all
> > future needed configs in the parameters object and we do not need to
> > overload the methods anymore.
> > 
> > Then if we also get KAFKA-10298 done, we could even collapse all
> > `windowedBy()` methods into one.
> > 
> > Best,
> > Bruno
> > 
> > On 22.03.22 22:31, Guozhang Wang wrote:
> > > Thanks for the great summary Hao. I'm still leaning towards option 2)
> > > here, and I'm in favor of `trigger` as function name, and `Triggered` as
> > > config class name (mainly to be aligned with other config class names).
> > > Also want to see other's preferences between the options, as well as the
> > 

Re: [DISCUSS] KIP-825: introduce a new API to control when aggregated results are produced

2022-03-23 Thread Hao Li
Thanks Bruno!

Argument for option 1 is:
1. Concise and descriptive. It avoids overloading existing functions and
it's very clear what it's doing. Imagine if there's an autocomplete feature
in Intellij or other IDE for our DSL in the future, it's not favorable to
show 6 `windowedBy` functions.
2. Option 1 operates on `windowedStream` to configure how it should be
outputted. Option 2 operates on `KGroupedStream` to produce the
`windowedStream` as well as configure how the `windowedStream` should be
outputted (see the sketch after this list). I feel it's better to have a
`windowedStream` first and then configure how it is outputted. Somehow I
feel option 2 breaks the builder pattern.
3. `WindowedByParameters` doesn't seem very descriptive. If we put all
kinds of different parameters into it to avoid future overloading, it's too
bloated and not very user friendly.
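
A rough sketch of the difference between the two options, using candidate
names from this thread (neither API is final; the usual imports are
assumed):

    // Option 1: build the windowed stream first, then configure how
    // it emits.
    stream
        .groupByKey()                      // -> KGroupedStream
        .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
                                           // -> TimeWindowedKStream
        .trigger(Emitted.onWindowClose())  // configure emission
        .count();

    // Option 2: pass the emit configuration while creating the windowed
    // stream, via an overload of windowedBy().
    stream
        .groupByKey()
        .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)),
                    Emitted.onWindowClose())
        .count();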

I agree option 1's `trigger` function is configuring the stream, which feels
different from existing `count` or `aggregate` etc. Configuring might also
be a kind of action on the stream :) I'm not sure if it breaks the DSL principle
and if it does,
can we relax the principle given the benefits compared to option 2)? Maybe
John can chime in as the DSL grammar author.

Thanks,
Hao

On Wed, Mar 23, 2022 at 2:59 AM Bruno Cadonna  wrote:

> Hi Hao,
>
> I agree with Guozhang: Great summary! Thank you!
>
> Regarding "aligned with other config class names", there is this DSL
> grammar John once specified
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+DSL+Grammar
> and we have already used it in the code. I found the grammar quite useful.
>
> I am undecided if option 1 is really worth it. What are actually the
> arguments in favor of it? Is it only that we do not need to overload
> other methods? This does not seem worth breaking DSL principles. An
> alternative proposal would be to go with option 2 and conform with the
> grammar above:
>
> <W extends Window> TimeWindowedKStream<K, V> windowedBy(final Windows<W>
> windows, WindowedByParameters parameters);
>
> TimeWindowedKStream<K, V> windowedBy(final SlidingWindows windows,
> WindowedByParameters parameters);
>
> SessionWindowedKStream<K, V> windowedBy(final SessionWindows windows,
> WindowedByParameters parameters);
>
> This is similar to option 2 in the KIP, but it ensures that we put all
> future needed configs in the parameters object and we do not need to
> overload the methods anymore.
>
> Then if we also get KAFKA-10298 done, we could even collapse all
> `windowedBy()` methods into one.
>
> Best,
> Bruno
>
> On 22.03.22 22:31, Guozhang Wang wrote:
> > Thanks for the great summary Hao. I'm still leaning towards option 2)
> > here, and I'm in favor of `trigger` as function name, and `Triggered` as
> > config class name (mainly to be aligned with other config class names).
> > Also want to see other's preferences between the options, as well as the
> > namings.
> >
> >
> > Guozhang
> >
> >
> >
> > On Tue, Mar 22, 2022 at 12:23 PM Hao Li 
> wrote:
> >
> >> `windowedStream.onWindowClose()` was the original option 1
> >> (`windowedStream.emitFinal()`) but was rejected
> >> because we could add more emit types and this will result in adding more
> >> functions. I still prefer the
> >> "windowedStream.someFunc(Controlled.onWindowClose)"
> >> model since it's flexible and clear that it's configuring the emit
> >> policy.
> >> Let me summarize all the naming options we have and compare:
> >>
> >> *API function name:*
> >>
> >> *1. `windowedStream.trigger()`*
> >>  Pros:
> >> i. Simple
> >> ii. Similar to Flink's trigger function (is this a con
> >> actually?)
> >>  Cons:
> >> i. `trigger()` can be confused with Flink trigger (raised by
> >> John)
> >> ii. `trigger()` feels like an operation instead of a configure
> >> function (raised by Bruno)?
> >>
> >> *2. `windowedStream.emitTrigger()`*
> >>   Pros:
> >> i. Avoid confusion from Flink's trigger API
> >> ii. `emitTrigger` feels like configuring the trigger because
> >> "trigger" here is a noun instead of verbose in `trigger()`
> >>   Cons:
> >>   i: Verbose?
> >>  ii: Not consistent with `Suppressed.untilWindowCloses`?
> >>
> >>
> >> *Config class/object name:*
> >>
> >> 1. *`Emitted.onWindowClose()`* and *`Emitted.onEachUpdate()`*
> >>   Cons:
> >>   i. Doesn't go along with `trigger` (raised by Bruno)
> >>
> >> 2. *`Triggered.onWindowClose()`* and *`Triggered.onEachUpdate()`*
> >>
> >> 3. *`EmitTrigger.onWindowClose()`* and *`EmitTrigger.onEachUpdate()`*
> >>
> >> 4. *`(Emit|Trigger)(Config|Policy).onWindowClose()`* and
> >> *`(Emit|Trigger)(Config|Policy).onEachUpdate()`*
> >>   This is a combination of different names like: `EmitConfig`,
> >> `EmitPolicy`, `TriggerConfig` and `TriggerPolicy`...
> >>
> >>
> >> If we are settled with option 1), we can add new options to these names
> >> and
> >> comment on their Pros and Cons.
> >>
> >> Hao
> >>
> >>
> >>
> >>
> >>
> >> On Tue, Mar 22, 2022 at 10:48 AM Guozhang Wang 
> wrote:
> >>
> >>> I see what 

Re: [VOTE] KIP-653: Upgrade log4j to log4j2

2022-03-23 Thread Ismael Juma
Hi Dongjin,

We really appreciate the super valuable work you've been doing here. Do we
have evidence that customers don't use custom filters/layouts?

Ismael

On Wed, Mar 23, 2022 at 7:53 AM Dongjin Lee  wrote:

> Hi Mikael, Edoardo and Ismael,
>
> Sorry for being late. Frankly, I thought KIP-653 is not a breaking change
> since (as Edoardo stated) unless the user uses custom filters or layouts,
> log4j-1.2-api.jar 'bridge' jar can handle the cases. It is why the
> 'Compatibility, Deprecation, and Migration Plan' section of the document is
> so brief. (As far as I know, it is such a rare case, and I thought it would
> not be so problematic.)
>
> I have no firm position on the release plan of this feature. Regardless of
> whether the community decides to put off the adoption of log4j2 to 4.0, I
> will maintain the PR and the preview releases up-to-date as far as possible
> -  although it can be significantly late for my main job. The decision is
> totally up to the community - No matter how it is decided, I will follow.
>
> Best,
> Dongjin
>
> p.s. @Edoardo I'm HIM. (HAHA)
>
> On Wed, Mar 23, 2022 at 10:12 PM Ismael Juma  wrote:
>
> > Hi Mickael,
> >
> > Thanks for your feedback. I agree with the importance of fixing the CVEs
> > and also of not breaking compatibility in a critical layer. Regarding
> > Apache Kafka 4.0, you suggested it would include:
> >
> > - log4j2 migration
> > - idempotency enablement cleanups
> > - removal of Java 8 and Scala 2.12 support
> > - removal of MirrorMaker1
> >
> > It's too soon to remove Java 8/Scala 2.12 support, so I don't think that
> > would work. The other things hardly justify a major release so soon. Have
> > we considered adjusting the existing log4j 2 PR so that both library
> > versions are supported for a period of time? Since reload4j doesn't have
> > the CVEs, this would be acceptable and would avoid a premature 4.0
> release.
> > I expect 4.0 to be the release after 3.4 or 3.5 given where we are right
> > now.
> >
> > Ismael
> >
> > On Wed, Mar 23, 2022 at 4:43 AM Mickael Maison  >
> > wrote:
> >
> > > Hi Ismael,
> > >
> > > About 2)
> > > We can't keep shipping new releases with dependencies that have CVEs.
> > > This is negatively impacting the project and eroding the hard earned
> > > trust we have from our users. Kafka is known to be a robust, reliable
> > > and up to date project.
> > >
> > > With that in mind, and since clearly at this point we're not going to
> > > update to log4j2 in 3.2.0, I too would be in favor of tactically
> > > adopting reload4j in 3.2.0. This would allow 3.2.0 to release without
> > > any known CVEs and surely make the life of many users better!
> > >
> > > Now regarding log4j2. I still consider there's value in adopting
> > > log4j2 (Apache project, plugin ecosystem, reconfiguration support) and
> > > I'd like to see it happen as soon as possible. If unfortunately there
> > > are compatibility issues, I agree that we can't force breakage in a
> > > minor release. We've always put a lot of attention into preserving
> > > compatibility, we should not suddenly stop doing it. So it makes sense
> > > to wait for the next major release.
> > >
> > > Currently in many minds, 4.0 is kind of associated with the removal of
> > > ZooKeeper. At this stage, it's still unclear when this will be ready
> > > and even if I'm optimistic it's still at the very least 6 to 9 months
> > > away. The code changes to migrate to log4j2 are not trivial and
> > > there's certainly a high cost in maintaining them outside of trunk for
> > > many months. Dongjin has done stellar work so far in regularly
> > > updating his PRs since this KIP was started back in 2020, but we can't
> > > ask him to just keep doing it for another unknown amount of time.
> > >
> > > What about if the next release is 4.0? Even if it's light on features,
> > > it would enable us to do quite a few cleanups and migrate to log4j2.
> > > Then the removal of ZooKeeper can happen in a future major release
> > > when it's ready.
> > >
> > > 4.0 would include:
> > > - log4j2 migration
> > > - idempotency enablement cleanups
> > > - removal of Java 8 and Scala 2.12 support
> > > - removal of MirrorMaker1
> > >
> > > So I propose to adopt reload4j in Kafka 3.2 and make the next release
> > > 4.0. Let me know what you think.
> > >
> > > Thanks,
> > > Mickael
> > >
> > >
> > >
> > > On Mon, Mar 21, 2022 at 4:33 PM Ismael Juma  wrote:
> > > >
> > > > Hi Edoardo,
> > > >
> > > > Thanks for the information. That's definitely useful. A couple of
> > > questions
> > > > for you and the rest of the group:
> > > >
> > > > 1. Did you test the branch using log4j 1.x configs?
> > > > 2. Given the release of https://github.com/qos-ch/reload4j, does it
> > > really
> > > > make sense to force breakage on users in a minor release? Would it
> not
> > be
> > > > better to use reload4j in Kafka 3.2 and log4j 2 in Kafka 4.0?
> > > >
> > > > Thanks,
> > > > Ismael
> > > >
> > > > On Mon, Mar 21, 2022 at 

Re: [VOTE] KIP-653: Upgrade log4j to log4j2

2022-03-23 Thread Dongjin Lee
Hi Mikael, Edoardo and Ismael,

Sorry for being late. Frankly, I thought KIP-653 is not a breaking change
since (as Edoardo stated) unless the user uses custom filters or layouts,
the log4j-1.2-api.jar 'bridge' jar can handle the cases. That is why the
'Compatibility, Deprecation, and Migration Plan' section of the document is
so brief. (As far as I know, it is such a rare case, and I thought it would
not be so problematic.)
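
For context, custom components are exactly what the bridge cannot
translate: a log4j 1.x filter subclasses org.apache.log4j.spi.Filter and
overrides decide(), while a Log4j2 filter is an annotated plugin with a
factory method. A minimal sketch of what the Log4j2 rewrite looks like
(the class name and the drop-DEBUG behavior are hypothetical, purely for
illustration):

    import org.apache.logging.log4j.Level;
    import org.apache.logging.log4j.core.Filter;
    import org.apache.logging.log4j.core.LogEvent;
    import org.apache.logging.log4j.core.config.Node;
    import org.apache.logging.log4j.core.config.plugins.Plugin;
    import org.apache.logging.log4j.core.config.plugins.PluginFactory;
    import org.apache.logging.log4j.core.filter.AbstractFilter;

    // Drops DEBUG/TRACE events. Note the plugin annotation and the
    // factory method, neither of which exists in the log4j 1.x API.
    @Plugin(name = "DenyDebug", category = Node.CATEGORY,
            elementType = Filter.ELEMENT_TYPE)
    public final class DenyDebugFilter extends AbstractFilter {

        private DenyDebugFilter() {
            super(Result.NEUTRAL, Result.NEUTRAL);
        }

        @Override
        public Result filter(final LogEvent event) {
            return event.getLevel().isMoreSpecificThan(Level.INFO)
                    ? Result.NEUTRAL : Result.DENY;
        }

        @PluginFactory
        public static DenyDebugFilter createFilter() {
            return new DenyDebugFilter();
        }
    }

Such a component also has to be referenced from a log4j2-format
configuration file, so both the code and the config need migrating.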

I have no firm position on the release plan of this feature. Regardless of
whether the community decides to put off the adoption of log4j2 to 4.0, I
will maintain the PR and the preview releases up-to-date as far as possible
- although it may be significantly delayed by my main job. The decision is
totally up to the community - No matter how it is decided, I will follow.

Best,
Dongjin

p.s. @Edoardo I'm HIM. (HAHA)

On Wed, Mar 23, 2022 at 10:12 PM Ismael Juma  wrote:

> Hi Mickael,
>
> Thanks for your feedback. I agree with the importance of fixing the CVEs
> and also of not breaking compatibility in a critical layer. Regarding
> Apache Kafka 4.0, you suggested it would include:
>
> - log4j2 migration
> - idempotency enablement cleanups
> - removal of Java 8 and Scala 2.12 support
> - removal of MirrorMaker1
>
> It's too soon to remove Java 8/Scala 2.12 support, so I don't think that
> would work. The other things hardly justify a major release so soon. Have
> we considered adjusting the existing log4j 2 PR so that both library
> versions are supported for a period of time? Since reload4j doesn't have
> the CVEs, this would be acceptable and would avoid a premature 4.0 release.
> I expect 4.0 to be the release after 3.4 or 3.5 given where we are right
> now.
>
> Ismael
>
> On Wed, Mar 23, 2022 at 4:43 AM Mickael Maison 
> wrote:
>
> > Hi Ismael,
> >
> > About 2)
> > We can't keep shipping new releases with dependencies that have CVEs.
> > This is negatively impacting the project and eroding the hard earned
> > trust we have from our users. Kafka is known to be a robust, reliable
> > and up to date project.
> >
> > With that in mind, and since clearly at this point we're not going to
> > update to log4j2 in 3.2.0, I too would be in favor of tactically
> > adopting reload4j in 3.2.0. This would allow 3.2.0 to release without
> > any known CVEs and surely make the life of many users better!
> >
> > Now regarding log4j2. I still consider there's value in adopting
> > log4j2 (Apache project, plugin ecosystem, reconfiguration support) and
> > I'd like to see it happen as soon as possible. If unfortunately there
> > are compatibility issues, I agree that we can't force breakage in a
> > minor release. We've always put a lot of attention into preserving
> > compatibility, we should not suddenly stop doing it. So it makes sense
> > to wait for the next major release.
> >
> > Currently in many minds, 4.0 is kind of associated with the removal of
> > ZooKeeper. At this stage, it's still unclear when this will be ready
> > and even if I'm optimistic it's still at the very least 6 to 9 months
> > away. The code changes to migrate to log4j2 are not trivial and
> > there's certainly a high cost in maintaining them outside of trunk for
> > many months. Dongjin has done stellar work so far in regularly
> > updating his PRs since this KIP was started back in 2020, but we can't
> > ask him to just keep doing it for another unknown amount of time.
> >
> > What about if the next release is 4.0? Even if it's light on features,
> > it would enable us to do quite a few cleanups and migrate to log4j2.
> > Then the removal of ZooKeeper can happen in a future major release
> > when it's ready.
> >
> > 4.0 would include:
> > - log4j2 migration
> > - idempotency enablement cleanups
> > - removal of Java 8 and Scala 2.12 support
> > - removal of MirrorMaker1
> >
> > So I propose to adopt reload4j in Kafka 3.2 and make the next release
> > 4.0. Let me know what you think.
> >
> > Thanks,
> > Mickael
> >
> >
> >
> > On Mon, Mar 21, 2022 at 4:33 PM Ismael Juma  wrote:
> > >
> > > Hi Edoardo,
> > >
> > > Thanks for the information. That's definitely useful. A couple of
> > questions
> > > for you and the rest of the group:
> > >
> > > 1. Did you test the branch using log4j 1.x configs?
> > > 2. Given the release of https://github.com/qos-ch/reload4j, does it
> > really
> > > make sense to force breakage on users in a minor release? Would it not
> be
> > > better to use reload4j in Kafka 3.2 and log4j 2 in Kafka 4.0?
> > >
> > > Thanks,
> > > Ismael
> > >
> > > On Mon, Mar 21, 2022 at 8:16 AM Edoardo Comar 
> wrote:
> > >
> > > > Hi Ismael and Luke,
> > > > we've tested Dongjin code - porting her preview releases and PR to
> > > > different Kafka code levels (2.8.1+, 3.1.0+, trunk).
> > > > We're happy with it and would love it if her PR was merged in 3.2.0.
> > > >
> > > > To chime in on the issue of compatibility, as we have experienced it,
> > the
> > > > main limitation of the log4j-1.2-api.jar 'bridge' jar is in 

Re: [VOTE] KIP-653: Upgrade log4j to log4j2

2022-03-23 Thread Ismael Juma
Hi Mickael,

Thanks for your feedback. I agree with the importance of fixing the CVEs
and also of not breaking compatibility in a critical layer. Regarding
Apache Kafka 4.0, you suggested it would include:

- log4j2 migration
- idempotency enablement cleanups
- removal of Java 8 and Scala 2.12 support
- removal of MirrorMaker1

It's too soon to remove Java 8/Scala 2.12 support, so I don't think that
would work. The other things hardly justify a major release so soon. Have
we considered adjusting the existing log4j 2 PR so that both library
versions are supported for a period of time? Since reload4j doesn't have
the CVEs, this would be acceptable and would avoid a premature 4.0 release.
I expect 4.0 to be the release after 3.4 or 3.5 given where we are right
now.

Ismael

On Wed, Mar 23, 2022 at 4:43 AM Mickael Maison 
wrote:

> Hi Ismael,
>
> About 2)
> We can't keep shipping new releases with dependencies that have CVEs.
> This is negatively impacting the project and eroding the hard earned
> trust we have from our users. Kafka is known to be a robust, reliable
> and up to date project.
>
> With that in mind, and since clearly at this point we're not going to
> update to log4j2 in 3.2.0, I too would be in favor of tactically
> adopting reload4j in 3.2.0. This would allow 3.2.0 to release without
> any known CVEs and surely make the life of many users better!
>
> Now regarding log4j2. I still consider there's value in adopting
> log4j2 (Apache project, plugin ecosystem, reconfiguration support) and
> I'd like to see it happen as soon as possible. If unfortunately there
> are compatibility issues, I agree that we can't force breakage in a
> minor release. We've always put a lot of attention into preserving
> compatibility, we should not suddenly stop doing it. So it makes sense
> to wait for the next major release.
>
> Currently in many minds, 4.0 is kind of associated with the removal of
> ZooKeeper. At this stage, it's still unclear when this will be ready
> and even if I'm optimistic it's still at the very least 6 to 9 months
> away. The code changes to migrate to log4j2 are not trivial and
> there's certainly a high cost in maintaining them outside of trunk for
> many months. Dongjin has done stellar work so far in regularly
> updating his PRs since this KIP was started back in 2020, but we can't
> ask him to just keep doing it for another unknown amount of time.
>
> What about if the next release is 4.0? Even if it's light on features,
> it would enable us to do quite a few cleanups and migrate to log4j2.
> Then the removal of ZooKeeper can happen in a future major release
> when it's ready.
>
> 4.0 would include:
> - log4j2 migration
> - idempotency enablement cleanups
> - removal of Java 8 and Scala 2.12 support
> - removal of MirrorMaker1
>
> So I propose to adopt reload4j in Kafka 3.2 and make the next release
> 4.0. Let me know what you think.
>
> Thanks,
> Mickael
>
>
>
> On Mon, Mar 21, 2022 at 4:33 PM Ismael Juma  wrote:
> >
> > Hi Edoardo,
> >
> > Thanks for the information. That's definitely useful. A couple of
> questions
> > for you and the rest of the group:
> >
> > 1. Did you test the branch using log4j 1.x configs?
> > 2. Given the release of https://github.com/qos-ch/reload4j, does it
> really
> > make sense to force breakage on users in a minor release? Would it not be
> > better to use reload4j in Kafka 3.2 and log4j 2 in Kafka 4.0?
> >
> > Thanks,
> > Ismael
> >
> > On Mon, Mar 21, 2022 at 8:16 AM Edoardo Comar  wrote:
> >
> > > Hi Ismael and Luke,
> > > we've tested Dongjin code - porting her preview releases and PR to
> > > different Kafka code levels (2.8.1+, 3.1.0+, trunk).
> > > We're happy with it and would love it if her PR was merged in 3.2.0.
> > >
> > > To chime in on the issue of compatibility, as we have experienced it,
> the
> > > main limitation of the log4j-1.2-api.jar 'bridge' jar is in the
> support for
> > > custom Appenders, Filters and Layouts.
> > > If you're using such components, they may need to be rewritten to the
> > > Log4j2 spec and correspondingly use the configuration file in log4j2
> format
> > > (and referenced with the log4j2 system property).
> > > Details at
> > >
> https://logging.apache.org/log4j/2.x/manual/migration.html#ConfigurationCompatibility
> > > and
> > >
> https://logging.apache.org/log4j/2.x/manual/migration.html#Log4j1.2BridgeLimitations
> > >
> > > I think that the above information should find its way in the KIP's
> > > compatibility section.
> > >
> > > HTH
> > > Edo
> > > --
> > > Edoardo Comar
> > > Event Streams for IBM Cloud
> > >
> > >
> > > 
> > > From: Luke Chen 
> > > Sent: 18 March 2022 07:57
> > > To: dev 
> > > Subject: [EXTERNAL] Re: [VOTE] KIP-653: Upgrade log4j to log4j2
> > >
> > > Hi Dongjin,
> > >
> > > I know there are some discussions about the compatibility issue.
> > > Could you help answer this question?
> > >
> > > Thank you.

[jira] [Created] (KAFKA-13762) Kafka brokers are not coming up

2022-03-23 Thread Kamesh (Jira)
Kamesh created KAFKA-13762:
--

 Summary: Kafka brokers are not coming up 
 Key: KAFKA-13762
 URL: https://issues.apache.org/jira/browse/KAFKA-13762
 Project: Kafka
  Issue Type: Bug
Reporter: Kamesh


We are getting the error below:

 

Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at 
sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:386)
        at 
sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:401)
Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind0(Native Method)
        at sun.nio.ch.Net.bind(Net.java:433)
        at sun.nio.ch.Net.bind(Net.java:425)
        at 
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
        at sun.net.httpserver.ServerImpl.<init>(ServerImpl.java:100)
        at sun.net.httpserver.HttpServerImpl.<init>(HttpServerImpl.java:50)
        at 
sun.net.httpserver.DefaultHttpServerProvider.createHttpServer(DefaultHttpServerProvider.java:35)
        at com.sun.net.httpserver.HttpServer.create(HttpServer.java:130)
        at 
io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer.<init>(HTTPServer.java:179)
        at 
io.prometheus.jmx.shaded.io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:31)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


RE: [VOTE] KIP-653: Upgrade log4j to log4j2

2022-03-23 Thread Edoardo Comar
Mickael,
+1 from me - even if you didn't ask for a vote :-)

Edo


--
Edoardo Comar
Event Streams for IBM Cloud



From: Mickael Maison 
Sent: 23 March 2022 11:43
To: dev 
Subject: [EXTERNAL] Re: [VOTE] KIP-653: Upgrade log4j to log4j2

Hi Ismael,

About 2)
We can't keep shipping new releases with dependencies that have CVEs.
This is negatively impacting the project and eroding the hard earned
trust we have from our users. Kafka is known to be a robust, reliable
and up to date project.

With that in mind, and since clearly at this point we're not going to
update to log4j2 in 3.2.0, I too would be in favor of tactically
adopting reload4j in 3.2.0. This would allow 3.2.0 to release without
any known CVEs and surely make the life of many users better!

Now regarding log4j2. I still consider there's value in adopting
log4j2 (Apache project, plugin ecosystem, reconfiguration support) and
I'd like to see it happen as soon as possible. If unfortunately there
are compatibility issues, I agree that we can't force breakage in a
minor release. We've always put a lot of attention into preserving
compatibility, we should not suddenly stop doing it. So it makes sense
to wait for the next major release.

Currently in many minds, 4.0 is kind of associated with the removal of
ZooKeeper. At this stage, it's still unclear when this will be ready
and even if I'm optimistic it's still at the very least 6 to 9 months
away. The code changes to migrate to log4j2 are not trivial and
there's certainly a high cost in maintaining them outside of trunk for
many months. Dongjin has done stellar work so far in regularly
updating his PRs since this KIP was started back in 2020, but we can't
ask him to just keep doing it for another unknown amount of time.

What about if the next release is 4.0? Even if it's light on features,
it would enable us to do quite a few cleanups and migrate to log4j2.
Then the removal of ZooKeeper can happen in a future major release
when it's ready.

4.0 would include:
- log4j2 migration
- idempotency enablement cleanups
- removal of Java 8 and Scala 2.12 support
- removal of MirrorMaker1

So I propose to adopt reload4j in Kafka 3.2 and make the next release
4.0. Let me know what you think.

Thanks,
Mickael



On Mon, Mar 21, 2022 at 4:33 PM Ismael Juma  wrote:
>
> Hi Edoardo,
>
> Thanks for the information. That's definitely useful. A couple of questions
> for you and the rest of the group:
>
> 1. Did you test the branch using log4j 1.x configs?
> 2. Given the release of https://github.com/qos-ch/reload4j, does it really
> make sense to force breakage on users in a minor release? Would it not be
> better to use reload4j in Kafka 3.2 and log4j 2 in Kafka 4.0?
>
> Thanks,
> Ismael
>
> On Mon, Mar 21, 2022 at 8:16 AM Edoardo Comar  wrote:
>
> > Hi Ismael and Luke,
> > we've tested Dongjin code - porting her preview releases and PR to
> > different Kafka code levels (2.8.1+, 3.1.0+, trunk).
> > We're happy with it and would love it if her PR was merged in 3.2.0.
> >
> > To chime in on the issue of compatibility, as we have experienced it, the
> > main limitation of the log4j-1.2-api.jar 'bridge' jar is in the support for
> > custom Appenders, Filters and Layouts.
> > If you're using such components, they may need to be rewritten to the
> > Log4j2 spec and correspondingly use the configuration file in log4j2 format
> > (and referenced with the log4j2 system property).
> > Details at
> > https://logging.apache.org/log4j/2.x/manual/migration.html#ConfigurationCompatibility
> > and
> > https://logging.apache.org/log4j/2.x/manual/migration.html#Log4j1.2BridgeLimitations
> >
> > I think that the above information should find its way in the KIP's
> > compatibility section.
> >
> > HTH
> > Edo
> > --
> > Edoardo Comar
> > Event Streams for IBM Cloud
> >
> >
> > 
> > From: Luke Chen 
> > Sent: 18 March 2022 07:57
> > To: dev 
> > Subject: [EXTERNAL] Re: [VOTE] KIP-653: Upgrade log4j to log4j2
> >
> > Hi Dongjin,
> >
> > I know there are some discussions about the compatibility issue.
> > Could you help answer this question?
> >
> > Thank you.
> > Luke
> >
> > On Fri, Mar 18, 2022 at 3:32 AM Ismael Juma  wrote:
> >
> > > Hi all,
> > >
> > > The KIP compatibility section does not include enough detail. I am
> > puzzled
> > > how we voted +1 given that. I noticed that Colin indicated it would only
> > be
> > > acceptable in a major release unless the new version was fully compatible
> > > (which it is not). Can we clarify what we actually voted for here?
> > >
> > > Ismael
> > >
> > > On Wed, Oct 21, 2020 at 6:41 PM Dongjin Lee  wrote:
> > >
> > > > Hi All,
> > > >
> > > > As of present:
> > > >
> > > > - Binding: +3 (Gwen, John, Colin)
> > > > - Non-binding: +1 (David, Tom)
> > > >
> > > > This KIP is now accepted. Thanks for your votes!
> > > >
> > > > @Colin 

Re: [VOTE] KIP-653: Upgrade log4j to log4j2

2022-03-23 Thread Mickael Maison
Hi Ismael,

About 2)
We can't keep shipping new releases with dependencies that have CVEs.
This is negatively impacting the project and eroding the hard earned
trust we have from our users. Kafka is known to be a robust, reliable
and up to date project.

With that in mind, and since clearly at this point we're not going to
update to log4j2 in 3.2.0, I too would be in favor of tactically
adopting reload4j in 3.2.0. This would allow 3.2.0 to release without
any known CVEs and surely make the life of many users better!

Now regarding log4j2. I still consider there's value in adopting
log4j2 (Apache project, plugin ecosystem, reconfiguration support) and
I'd like to see it happen as soon as possible. If unfortunately there
are compatibility issues, I agree that we can't force breakage in a
minor release. We've always put a lot of attention into preserving
compatibility, we should not suddenly stop doing it. So it makes sense
to wait for the next major release.

Currently in many minds, 4.0 is kind of associated with the removal of
ZooKeeper. At this stage, it's still unclear when this will be ready
and even if I'm optimistic it's still at the very least 6 to 9 months
away. The code changes to migrate to log4j2 are not trivial and
there's certainly a high cost in maintaining them outside of trunk for
many months. Dongjin has done stellar work so far in regularly
updating his PRs since this KIP was started back in 2020, but we can't
ask him to just keep doing it for another unknown amount of time.

What about if the next release is 4.0? Even if it's light on features,
it would enable us to do quite a few cleanups and migrate to log4j2.
Then the removal of ZooKeeper can happen in a future major release
when it's ready.

4.0 would include:
- log4j2 migration
- idempotency enablement cleanups
- removal of Java 8 and Scala 2.12 support
- removal of MirrorMaker1

So I propose to adopt reload4j in Kafka 3.2 and make the next release
4.0. Let me know what you think.

Thanks,
Mickael



On Mon, Mar 21, 2022 at 4:33 PM Ismael Juma  wrote:
>
> Hi Edoardo,
>
> Thanks for the information. That's definitely useful. A couple of questions
> for you and the rest of the group:
>
> 1. Did you test the branch using log4j 1.x configs?
> 2. Given the release of https://github.com/qos-ch/reload4j, does it really
> make sense to force breakage on users in a minor release? Would it not be
> better to use reload4j in Kafka 3.2 and log4j 2 in Kafka 4.0?
>
> Thanks,
> Ismael
>
> On Mon, Mar 21, 2022 at 8:16 AM Edoardo Comar  wrote:
>
> > Hi Ismael and Luke,
> > we've tested Dongjin code - porting her preview releases and PR to
> > different Kafka code levels (2.8.1+, 3.1.0+, trunk).
> > We're happy with it and would love it if her PR was merged in 3.2.0.
> >
> > To chime in on the issue of compatibility, as we have experienced it, the
> > main limitation of the log4j-1.2-api.jar 'bridge' jar is in the support for
> > custom Appenders, Filters and Layouts.
> > If you're using such components, they may need to be rewritten to the
> > Log4j2 spec and correspondingly use the configuration file in log4j2 format
> > (and referenced with the log4j2 system property).
> > Details at
> > https://logging.apache.org/log4j/2.x/manual/migration.html#ConfigurationCompatibility
> > and
> > https://logging.apache.org/log4j/2.x/manual/migration.html#Log4j1.2BridgeLimitations
> >
> > I think that the above information should find its way in the KIP's
> > compatibility section.
> >
> > HTH
> > Edo
> > --
> > Edoardo Comar
> > Event Streams for IBM Cloud
> >
> >
> > 
> > From: Luke Chen 
> > Sent: 18 March 2022 07:57
> > To: dev 
> > Subject: [EXTERNAL] Re: [VOTE] KIP-653: Upgrade log4j to log4j2
> >
> > Hi Dongjin,
> >
> > I know there are some discussions about the compatibility issue.
> > Could you help answer this question?
> >
> > Thank you.
> > Luke
> >
> > On Fri, Mar 18, 2022 at 3:32 AM Ismael Juma  wrote:
> >
> > > Hi all,
> > >
> > > The KIP compatibility section does not include enough detail. I am
> > puzzled
> > > how we voted +1 given that. I noticed that Colin indicated it would only
> > be
> > > acceptable in a major release unless the new version was fully compatible
> > > (which it is not). Can we clarify what we actually voted for here?
> > >
> > > Ismael
> > >
> > > On Wed, Oct 21, 2020 at 6:41 PM Dongjin Lee  wrote:
> > >
> > > > Hi All,
> > > >
> > > > As of present:
> > > >
> > > > - Binding: +3 (Gwen, John, Colin)
> > > > - Non-binding: +1 (David, Tom)
> > > >
> > > > This KIP is now accepted. Thanks for your votes!
> > > >
> > > > @Colin Sure, I have some plan for providing a compatibility preview.
> > > Let's
> > > > continue in the discussion thread.
> > > >
> > > > All other voters not in KIP-676 Vote thread: KIP-676 (by Tom) is a
> > > > prerequisite of this KIP. Please have a look at that proposal and vote
> > > for
> > > > it.
> > > >
> > > > 

Re: [DISCUSS] KIP-825: introduce a new API to control when aggregated results are produced

2022-03-23 Thread Bruno Cadonna

Hi Hao,

I agree with Guozhang: Great summary! Thank you!

Regarding "aligned with other config class names", there is this DSL 
grammar John once specified

https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Streams+DSL+Grammar
and we have already used it in the code. I found the grammar quite useful.

I am undecided if option 1 is really worth it. What are actually the 
arguments in favor of it? Is it only that we do not need to overload 
other methods? This does not seem worth breaking DSL principles. An 
alternative proposal would be to go with option 2 and conform with the 
grammar above:


<W extends Window> TimeWindowedKStream<K, V> windowedBy(final Windows<W> windows,
WindowedByParameters parameters);

TimeWindowedKStream<K, V> windowedBy(final SlidingWindows windows,
WindowedByParameters parameters);

SessionWindowedKStream<K, V> windowedBy(final SessionWindows windows,
WindowedByParameters parameters);


This is similar to option 2 in the KIP, but it ensures that we put all 
future needed configs in the parameters object and we do not need to 
overload the methods anymore.
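
As a rough illustration of how that proposal would read at the call site
(WindowedByParameters and its with() factory are only sketched here, they
are not an existing API):

    stream
        .groupByKey()
        .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)),
                    WindowedByParameters.with(Emitted.onWindowClose()))
        .count();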


Then if we also get KAFKA-10298 done, we could even collapse all 
`windowedBy()` methods into one.


Best,
Bruno

On 22.03.22 22:31, Guozhang Wang wrote:

Thanks for the great summary Hao. I'm still leaning towards option 2)
here, and I'm in favor of `trigger` as function name, and `Triggered` as
config class name (mainly to be aligned with other config class names).
Also want to see other's preferences between the options, as well as the
namings.


Guozhang



On Tue, Mar 22, 2022 at 12:23 PM Hao Li  wrote:


`windowedStream.onWindowClose()` was the original option 1
(`windowedStream.emitFinal()`) but was rejected
because we could add more emit types and this will result in adding more
functions. I still prefer the
"windowedStream.someFunc(Controlled.onWindowClose)"
model since it's flexible and clear that it's configuring the emit policy.
Let me summarize all the naming options we have and compare:

*API function name:*

*1. `windowedStream.trigger()`*
 Pros:
i. Simple
ii. Similar to Flink's trigger function (is this a con actually?)
 Cons:
i. `trigger()` can be confused with Flink trigger (raised by John)
ii. `trigger()` feels like an operation instead of a configure
function (raised by Bruno)?

*2. `windowedStream.emitTrigger()`*
  Pros:
i. Avoid confusion from Flink's trigger API
ii. `emitTrigger` feels like configuring the trigger because
"trigger" here is a noun instead of verbose in `trigger()`
  Cons:
  i: Verbose?
 ii: Not consistent with `Suppressed.untilWindowCloses`?


*Config class/object name:*

1. *`Emitted.onWindowClose()`* and *`Emitted.onEachUpdate()`*
  Cons:
  i. Doesn't go along with `trigger` (raised by Bruno)

2. *`Triggered.onWindowClose()`* and *`Triggered.onEachUpdate()`*

3. *`EmitTrigger.onWindowClose()`* and *`EmitTrigger.onEachUpdate()`*

4. *`(Emit|Trigger)(Config|Policy).onWindowClose()`* and
*`(Emit|Trigger)(Config|Policy).onEachUpdate()`*
  This is a combination of different names like: `EmitConfig`,
`EmitPolicy`, `TriggerConfig` and `TriggerPolicy`...


If we are settled with option 1), we can add new options to these names and
comment on their Pros and Cons.

Hao





On Tue, Mar 22, 2022 at 10:48 AM Guozhang Wang  wrote:


I see what you mean now, and I think it's a fair point that composing
`trigger` and `emitted` seems awkward.

Re: data process operator v.s. control operator, I share your concern as
well, and here's my train of thought: Having only data process operators
was my primary motivation for how we added the suppress operator --- it
indeed "suppresses" data. But in hindsight its disadvantage is that, for
example, Suppressed.untilWindowCloses() is only related to an earlier
windowedBy operator which is possibly very far from it in the resulting
DSL code. It's not only a bit awkward for users to write such code, but
also in such cases the DSL builder needs to maintain and propagate this
information to the suppress operator further down. So we are now thinking
about "putting the control object as close as possible to where the
related processing really happens". And in that world my original
preference was somewhere in option 2), i.e. just put the control as a
param of the related "windowedBy" operator, but the trade-off is we keep
adding overloaded functions to these operators. So after some back and
forth thoughts I'm leaning towards relaxing our principles to only have
processing operators but no flow-control operators. That being said, if
you have any ideas that can give us both worlds' benefits, I'm all ears.
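
To illustrate the distance problem with today's suppress-based approach
(this sketch uses the existing Kafka Streams API, assuming the usual
imports; the window size is arbitrary):

    stream
        .groupByKey()
        .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
        .count()
        // ... arbitrarily many other operators may sit here ...
        .suppress(Suppressed.untilWindowCloses(
                Suppressed.BufferConfig.unbounded()));

The suppress() call is only meaningful in relation to the windowedBy()
far above it, which is exactly what the proposal tries to bring closer
together.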

Re: using a direct function like "windowedStream.onWindowClose()" v.s.
"windowedStream.someFunc(Controlled.onWindowClose)", again my motivation
for the latter is for extensibility without adding more functions in the
future. If people feel this is not worthwhile we can do the first option as
well. If we just 

Re: [VOTE] KIP-794: Strictly Uniform Sticky Partitioner

2022-03-23 Thread David Jacot
Hi Artem,

Thanks for the KIP. This is a really nice improvement!

+1 (binding) from me.

David

On Tue, Mar 22, 2022 at 9:35 PM Jun Rao  wrote:
>
> Hi, Artem,
>
> Thanks for the KIP. +1 from me.
>
> Jun
>
> On Mon, Mar 21, 2022 at 4:52 PM Artem Livshits
>  wrote:
>
> > Hi all,
> >
> > I'd like to start a vote on
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner
> > .
> >
> > -Artem
> >


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #795

2022-03-23 Thread Apache Jenkins Server
See