[jira] [Created] (KAFKA-13756) Connect validate endpoint should return proper response on name and connector class error

2022-03-21 Thread Daniel Urban (Jira)
Daniel Urban created KAFKA-13756:


 Summary: Connect validate endpoint should return proper response 
on name and connector class error
 Key: KAFKA-13756
 URL: https://issues.apache.org/jira/browse/KAFKA-13756
 Project: Kafka
  Issue Type: Improvement
Reporter: Daniel Urban


Currently, if there is an issue with the connector name or the connector class, 
the validate endpoint returns a 500 response.

Instead, it should return a well formatted response containing proper 
validation error messages.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[GitHub] [kafka-site] tombentley merged pull request #401: Add Tom's public key to KEYS

2022-03-21 Thread GitBox


tombentley merged pull request #401:
URL: https://github.com/apache/kafka-site/pull/401


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #787

2022-03-21 Thread Apache Jenkins Server
See 




Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #788

2022-03-21 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 598536 lines...]
[2022-03-21T13:24:00.505Z] 
[2022-03-21T13:24:00.505Z] 
org.apache.kafka.streams.integration.StreamTableJoinTopologyOptimizationIntegrationTest
 > shouldDoStreamTableJoinWithDifferentNumberOfPartitions[Optimization = none] 
PASSED
[2022-03-21T13:24:05.656Z] 
[2022-03-21T13:24:05.656Z] 
org.apache.kafka.streams.integration.StreamsUpgradeTestIntegrationTest > 
testVersionProbingUpgrade STARTED
[2022-03-21T13:24:10.119Z] 
[2022-03-21T13:24:10.119Z] 
org.apache.kafka.streams.integration.StreamsUpgradeTestIntegrationTest > 
testVersionProbingUpgrade PASSED
[2022-03-21T13:24:10.119Z] 
[2022-03-21T13:24:10.119Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldInheritSerdes STARTED
[2022-03-21T13:24:10.119Z] 
[2022-03-21T13:24:10.119Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldInheritSerdes PASSED
[2022-03-21T13:24:10.119Z] 
[2022-03-21T13:24:10.119Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldShutdownWhenRecordConstraintIsViolated STARTED
[2022-03-21T13:24:11.218Z] 
[2022-03-21T13:24:11.218Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldShutdownWhenRecordConstraintIsViolated PASSED
[2022-03-21T13:24:11.218Z] 
[2022-03-21T13:24:11.218Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldUseDefaultSerdes STARTED
[2022-03-21T13:24:15.218Z] 
[2022-03-21T13:24:15.218Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldUseDefaultSerdes PASSED
[2022-03-21T13:24:15.218Z] 
[2022-03-21T13:24:15.218Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldAllowDisablingChangelog STARTED
[2022-03-21T13:24:18.582Z] 
[2022-03-21T13:24:18.582Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldAllowDisablingChangelog PASSED
[2022-03-21T13:24:18.582Z] 
[2022-03-21T13:24:18.582Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldAllowOverridingChangelogConfig STARTED
[2022-03-21T13:24:21.628Z] 
[2022-03-21T13:24:21.628Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldAllowOverridingChangelogConfig PASSED
[2022-03-21T13:24:21.628Z] 
[2022-03-21T13:24:21.628Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldShutdownWhenBytesConstraintIsViolated STARTED
[2022-03-21T13:24:26.332Z] 
[2022-03-21T13:24:26.332Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldShutdownWhenBytesConstraintIsViolated PASSED
[2022-03-21T13:24:26.332Z] 
[2022-03-21T13:24:26.332Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldCreateChangelogByDefault STARTED
[2022-03-21T13:24:29.412Z] 
[2022-03-21T13:24:29.412Z] 
org.apache.kafka.streams.integration.SuppressionIntegrationTest > 
shouldCreateChangelogByDefault PASSED
[2022-03-21T13:24:30.605Z] 
[2022-03-21T13:24:30.605Z] 
org.apache.kafka.streams.integration.TaskAssignorIntegrationTest > 
shouldProperlyConfigureTheAssignor STARTED
[2022-03-21T13:24:30.606Z] 
[2022-03-21T13:24:30.606Z] 
org.apache.kafka.streams.integration.TaskAssignorIntegrationTest > 
shouldProperlyConfigureTheAssignor PASSED
[2022-03-21T13:24:32.933Z] 
[2022-03-21T13:24:32.933Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorLargeNumConsumers STARTED
[2022-03-21T13:24:51.714Z] 
[2022-03-21T13:24:51.714Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorLargeNumConsumers PASSED
[2022-03-21T13:24:51.714Z] 
[2022-03-21T13:24:51.714Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorLargePartitionCount STARTED
[2022-03-21T13:24:51.714Z] 
[2022-03-21T13:24:51.714Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorLargePartitionCount PASSED
[2022-03-21T13:24:51.714Z] 
[2022-03-21T13:24:51.714Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorManyThreadsPerClient STARTED
[2022-03-21T13:24:51.714Z] 
[2022-03-21T13:24:51.714Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testHighAvailabilityTaskAssignorManyThreadsPerClient PASSED
[2022-03-21T13:24:51.714Z] 
[2022-03-21T13:24:51.714Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testStickyTaskAssignorManyStandbys STARTED
[2022-03-21T13:25:04.013Z] 
[2022-03-21T13:25:04.013Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testStickyTaskAssignorManyStandbys PASSED
[2022-03-21T13:25:04.013Z] 
[2022-03-21T13:25:04.013Z] 
org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > 
testStickyTaskAssignorManyThr

RE: [DISCUSS] KIP-651 - Support PEM format for SSL certificates and

2022-03-21 Thread Dejan Maric
Regarding this KIP,  we've created an MR that removes the need for
specifying *keyPassword *when PEM certificates and private key are provided
as files: https://github.com/apache/kafka/pull/11916

We think that Kafka should not enforce the use of passwords on private
keys. It would be sufficient to *recommend* encrypted private keys.

The reason we suggested the PR is because this issue is currently blocking
us from using PEM-formatted certificates issued by cert-manager
 with Kafka - since
cert-manager does not support encrypted private keys.

We asked one of the main cert-manager maintainers about this, and got the
following comments:

>There is a benefit to encrypting this stuff if the cert is being persisted
to disk and if the decryption key isn't written to the same disk, which is
probably a pretty common setup in a lot of older non-cloud systems. The
issue in a lot of more modern or cloud environments - especially in k8s -
is that certs are ideally never going to be written to disk and wherever
they are stored, they'd likely be stored next to their decryption keys.

>
>So I guess everyone's right - in some environments encrypting the key is
probably worth it, and in some it's a false sense of security and wasted
CPU cycles. That to me means that being able to choose to not encrypt the
key is desirable.
>(This is assuming that the encryption is actually secure, which might not
be true for all methods of encrypting private keys)


We'd like to hear other opinions on this.

Dejan


On 2020/08/03 11:28:42 Rajini Sivaram wrote:
> Hi all,
>
> I have submitted KIP-651 to support PEM format for SSL key and trust
stores:
>
>-
>
https://cwiki.apache.org/confluence/display/KAFKA/KIP-651+-+Support+PEM+format+for+SSL+certificates+and+private+key

>
> This enables better integration with other Kafka features like dynamic
> config update and password protection.
>
> Feedback and suggestions are welcome.
>
> Thank you...
>
> Regards,
>
> Rajini
>


RE: [VOTE] KIP-653: Upgrade log4j to log4j2

2022-03-21 Thread Edoardo Comar
Hi Ismael and Luke,
we've tested Dongjin code - porting her preview releases and PR to different 
Kafka code levels (2.8.1+, 3.1.0+, trunk).
We're happy with it and would love it if her PR was merged in 3.2.0.

To chime in on the issue of compatibility, as we have experienced it, the main 
limitation of the log4j-1.2-api.jar 'bridge' jar is in the support for custom 
Appenders, Filters and Layouts.
If you're using such components, they may need to be rewritten to the Log4j2 
spec and correspondingly use the configuration file in log4j2 format (and 
referenced with the log4j2 system property).
Details at 
https://logging.apache.org/log4j/2.x/manual/migration.html#ConfigurationCompatibility
 and 
https://logging.apache.org/log4j/2.x/manual/migration.html#Log4j1.2BridgeLimitations

I think that the above information should find its way in the KIP's 
compatibility section.

HTH
Edo
--
Edoardo Comar
Event Streams for IBM Cloud



From: Luke Chen 
Sent: 18 March 2022 07:57
To: dev 
Subject: [EXTERNAL] Re: [VOTE] KIP-653: Upgrade log4j to log4j2

Hi Dongjin,

I know there are some discussions about the compatibility issue.
Could you help answer this question?

Thank you.
Luke

On Fri, Mar 18, 2022 at 3:32 AM Ismael Juma  wrote:

> Hi all,
>
> The KIP compatibility section does not include enough detail. I am puzzled
> how we voted +1 given that. I noticed that Colin indicated it would only be
> acceptable in a major release unless the new version was fully compatible
> (which it is not). Can we clarify what we actually voted for here?
>
> Ismael
>
> On Wed, Oct 21, 2020 at 6:41 PM Dongjin Lee  wrote:
>
> > Hi All,
> >
> > As of present:
> >
> > - Binding: +3 (Gwen, John, Colin)
> > - Non-binding: +1 (David, Tom)
> >
> > This KIP is now accepted. Thanks for your votes!
> >
> > @Colin Sure, I have some plan for providing a compatibility preview.
> Let's
> > continue in the discussion thread.
> >
> > All other voters not in KIP-676 Vote thread: KIP-676 (by Tom) is a
> > prerequisite of this KIP. Please have a look at that proposal and vote
> for
> > it.
> >
> > Best,
> > Dongjin
> >
> > On Wed, Oct 21, 2020 at 9:17 PM Colin McCabe  wrote:
> >
> > > +1 (binding).  I think we should consider doing this in 3.0 rather than
> > > 2.8, though, unless we are really confident that it is 100% compatible.
> > >
> > > I wasn't able to find much information on how compatible the new API
> > > bridge is, but the log4j website does have this:
> > >
> > > > Basic compatibility with Log4j 1.x is provided through the
> log4j12-api
> > > component,
> > > > however it does not implement some of the very implementation
> specific
> > > > classes and methods
> > >
> > > best,
> > > Colin
> > >
> > >
> > > On Fri, Oct 9, 2020, at 02:51, Tom Bentley wrote:
> > > > +1 non-binding.
> > > >
> > > > Thanks for your efforts on this Dongjin.
> > > >
> > > > Tom
> > > >
> > > > On Wed, Oct 7, 2020 at 6:45 AM Dongjin Lee 
> wrote:
> > > >
> > > > > As of present:
> > > > >
> > > > > - Binding: +2 (Gwen, John)
> > > > > - Non-binding: +1 (David)
> > > > >
> > > > > Now we need one more binding +1.
> > > > >
> > > > > Thanks,
> > > > > Dongjin
> > > > >
> > > > > On Wed, Oct 7, 2020 at 1:37 AM David Jacot 
> > > wrote:
> > > > >
> > > > > > Thanks for driving this, Dongjin!
> > > > > >
> > > > > > The KIP looks good to me. I’m +1 (non-binding).
> > > > > >
> > > > > > Best,
> > > > > > David
> > > > > >
> > > > > > Le mar. 6 oct. 2020 à 17:23, Dongjin Lee  a
> > > écrit :
> > > > > >
> > > > > > > As of present:
> > > > > > >
> > > > > > > - Binding: +2 (Gwen, John)
> > > > > > > - Non-binding: 0
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Dongjin
> > > > > > >
> > > > > > > On Sat, Oct 3, 2020 at 10:51 AM John Roesler <
> > vvcep...@apache.org>
> > > > > > wrote:
> > > > > > >
> > > > > > > > Thanks for the KIP, Dongjin!
> > > > > > > >
> > > > > > > > I’ve just reviewed the KIP document, and it looks good to me.
> > > > > > > >
> > > > > > > > I’m +1 (binding)
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > John
> > > > > > > >
> > > > > > > > On Fri, Oct 2, 2020, at 19:11, Gwen Shapira wrote:
> > > > > > > > > +1 (binding)
> > > > > > > > >
> > > > > > > > > A very welcome update :)
> > > > > > > > >
> > > > > > > > > On Tue, Sep 22, 2020 at 9:09 AM Dongjin Lee <
> > > dong...@apache.org>
> > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > Hi devs,
> > > > > > > > > >
> > > > > > > > > > Here I open the vote for KIP-653: Upgrade log4j to
> log4j2.
> > It
> > > > > > > replaces
> > > > > > > > the
> > > > > > > > > > obsolete log4j logging library into the current standard,
> > > log4j2,
> > > > > > > with
> > > > > > > > > > maintaining backward-compatibility.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Dongjin
> > > > > > > > > >
> > > > > > > > > > --
> > > > > > > > > > *Dongjin Lee*
> > > > > > > > > >
> > > > > > > 

Re: [VOTE] KIP-653: Upgrade log4j to log4j2

2022-03-21 Thread Ismael Juma
Hi Edoardo,

Thanks for the information. That's definitely useful. A couple of questions
for you and the rest of the group:

1. Did you test the branch using log4j 1.x configs?
2. Given the release of https://github.com/qos-ch/reload4j, does it really
make sense to force breakage on users in a minor release? Would it not be
better to use reload4j in Kafka 3.2 and log4j 2 in Kafka 4.0?

Thanks,
Ismael

On Mon, Mar 21, 2022 at 8:16 AM Edoardo Comar  wrote:

> Hi Ismael and Luke,
> we've tested Dongjin code - porting her preview releases and PR to
> different Kafka code levels (2.8.1+, 3.1.0+, trunk).
> We're happy with it and would love it if her PR was merged in 3.2.0.
>
> To chime in on the issue of compatibility, as we have experienced it, the
> main limitation of the log4j-1.2-api.jar 'bridge' jar is in the support for
> custom Appenders, Filters and Layouts.
> If you're using such components, they may need to be rewritten to the
> Log4j2 spec and correspondingly use the configuration file in log4j2 format
> (and referenced with the log4j2 system property).
> Details at
> https://logging.apache.org/log4j/2.x/manual/migration.html#ConfigurationCompatibility
> and
> https://logging.apache.org/log4j/2.x/manual/migration.html#Log4j1.2BridgeLimitations
>
> I think that the above information should find its way in the KIP's
> compatibility section.
>
> HTH
> Edo
> --
> Edoardo Comar
> Event Streams for IBM Cloud
>
>
> 
> From: Luke Chen 
> Sent: 18 March 2022 07:57
> To: dev 
> Subject: [EXTERNAL] Re: [VOTE] KIP-653: Upgrade log4j to log4j2
>
> Hi Dongjin,
>
> I know there are some discussions about the compatibility issue.
> Could you help answer this question?
>
> Thank you.
> Luke
>
> On Fri, Mar 18, 2022 at 3:32 AM Ismael Juma  wrote:
>
> > Hi all,
> >
> > The KIP compatibility section does not include enough detail. I am
> puzzled
> > how we voted +1 given that. I noticed that Colin indicated it would only
> be
> > acceptable in a major release unless the new version was fully compatible
> > (which it is not). Can we clarify what we actually voted for here?
> >
> > Ismael
> >
> > On Wed, Oct 21, 2020 at 6:41 PM Dongjin Lee  wrote:
> >
> > > Hi All,
> > >
> > > As of present:
> > >
> > > - Binding: +3 (Gwen, John, Colin)
> > > - Non-binding: +1 (David, Tom)
> > >
> > > This KIP is now accepted. Thanks for your votes!
> > >
> > > @Colin Sure, I have some plan for providing a compatibility preview.
> > Let's
> > > continue in the discussion thread.
> > >
> > > All other voters not in KIP-676 Vote thread: KIP-676 (by Tom) is a
> > > prerequisite of this KIP. Please have a look at that proposal and vote
> > for
> > > it.
> > >
> > > Best,
> > > Dongjin
> > >
> > > On Wed, Oct 21, 2020 at 9:17 PM Colin McCabe 
> wrote:
> > >
> > > > +1 (binding).  I think we should consider doing this in 3.0 rather
> than
> > > > 2.8, though, unless we are really confident that it is 100%
> compatible.
> > > >
> > > > I wasn't able to find much information on how compatible the new API
> > > > bridge is, but the log4j website does have this:
> > > >
> > > > > Basic compatibility with Log4j 1.x is provided through the
> > log4j12-api
> > > > component,
> > > > > however it does not implement some of the very implementation
> > specific
> > > > > classes and methods
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Fri, Oct 9, 2020, at 02:51, Tom Bentley wrote:
> > > > > +1 non-binding.
> > > > >
> > > > > Thanks for your efforts on this Dongjin.
> > > > >
> > > > > Tom
> > > > >
> > > > > On Wed, Oct 7, 2020 at 6:45 AM Dongjin Lee 
> > wrote:
> > > > >
> > > > > > As of present:
> > > > > >
> > > > > > - Binding: +2 (Gwen, John)
> > > > > > - Non-binding: +1 (David)
> > > > > >
> > > > > > Now we need one more binding +1.
> > > > > >
> > > > > > Thanks,
> > > > > > Dongjin
> > > > > >
> > > > > > On Wed, Oct 7, 2020 at 1:37 AM David Jacot <
> david.ja...@gmail.com>
> > > > wrote:
> > > > > >
> > > > > > > Thanks for driving this, Dongjin!
> > > > > > >
> > > > > > > The KIP looks good to me. I’m +1 (non-binding).
> > > > > > >
> > > > > > > Best,
> > > > > > > David
> > > > > > >
> > > > > > > Le mar. 6 oct. 2020 à 17:23, Dongjin Lee 
> a
> > > > écrit :
> > > > > > >
> > > > > > > > As of present:
> > > > > > > >
> > > > > > > > - Binding: +2 (Gwen, John)
> > > > > > > > - Non-binding: 0
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Dongjin
> > > > > > > >
> > > > > > > > On Sat, Oct 3, 2020 at 10:51 AM John Roesler <
> > > vvcep...@apache.org>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Thanks for the KIP, Dongjin!
> > > > > > > > >
> > > > > > > > > I’ve just reviewed the KIP document, and it looks good to
> me.
> > > > > > > > >
> > > > > > > > > I’m +1 (binding)
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > John
> > > > > > > > >
> > > > > > > > > On Fri, Oct 2, 2020, at 19:11, Gwen Shapira wrote

Article on kafka.apache.org

2022-03-21 Thread Leah Atkins
Hi!

Sorry for interrupting you.

Our expert can write an article and publish it on kafka.apache.org.
Is it possible?

Thanks.

Kind regards, Leah Atkins


Re: [DISCUSS] KIP-651 - Support PEM format for SSL certificates and

2022-03-21 Thread Rajini Sivaram
For the background on the current implementation: We use Java's keystore
loading for JKS/PKCS12 keystore files and these files require passwords. We
retained the same requirement for PEM files as well for consistency, even
though it is not a language restriction anymore. When PEM keys are provided
as actual config values rather than files, we do allow non-encrypted keys
with the expectation that config externalization will be used to protect
these, similar to SASL configs (e.g. private keys may be in vault or
encrypted using a secure config provider). We could take the view that file
protection should be up to the user as well and permit non-encrypted keys,
making it simpler to adopt PEM. But since this is a deviation from the
other SSL file configs, it will be good to know what others think.

Regards,

Rajini


On Mon, Mar 21, 2022 at 2:16 PM Dejan Maric  wrote:

> Regarding this KIP,  we've created an MR that removes the need for
> specifying *keyPassword *when PEM certificates and private key are provided
> as files: https://github.com/apache/kafka/pull/11916
>
> We think that Kafka should not enforce the use of passwords on private
> keys. It would be sufficient to *recommend* encrypted private keys.
>
> The reason we suggested the PR is because this issue is currently blocking
> us from using PEM-formatted certificates issued by cert-manager
>  with Kafka - since
> cert-manager does not support encrypted private keys.
>
> We asked one of the main cert-manager maintainers about this, and got the
> following comments:
>
> >There is a benefit to encrypting this stuff if the cert is being persisted
> to disk and if the decryption key isn't written to the same disk, which is
> probably a pretty common setup in a lot of older non-cloud systems. The
> issue in a lot of more modern or cloud environments - especially in k8s -
> is that certs are ideally never going to be written to disk and wherever
> they are stored, they'd likely be stored next to their decryption keys.
>
> >
> >So I guess everyone's right - in some environments encrypting the key is
> probably worth it, and in some it's a false sense of security and wasted
> CPU cycles. That to me means that being able to choose to not encrypt the
> key is desirable.
> >(This is assuming that the encryption is actually secure, which might not
> be true for all methods of encrypting private keys)
>
>
> We'd like to hear other opinions on this.
>
> Dejan
>
>
> On 2020/08/03 11:28:42 Rajini Sivaram wrote:
> > Hi all,
> >
> > I have submitted KIP-651 to support PEM format for SSL key and trust
> stores:
> >
> >-
> >
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-651+-+Support+PEM+format+for+SSL+certificates+and+private+key
>
> >
> > This enables better integration with other Kafka features like dynamic
> > config update and password protection.
> >
> > Feedback and suggestions are welcome.
> >
> > Thank you...
> >
> > Regards,
> >
> > Rajini
> >
>


[jira] [Resolved] (KAFKA-13728) PushHttpMetricsReporter no longer pushes metrics when network failure is recovered.

2022-03-21 Thread Guozhang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Guozhang Wang resolved KAFKA-13728.
---
Fix Version/s: 3.2.0
   Resolution: Fixed

> PushHttpMetricsReporter no longer pushes metrics when network failure is 
> recovered.
> ---
>
> Key: KAFKA-13728
> URL: https://issues.apache.org/jira/browse/KAFKA-13728
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 3.1.0
>Reporter: XiaoyiPeng
>Priority: Minor
> Fix For: 3.2.0
>
>
> The class *PushHttpMetricsReporter* no longer pushes metrics when network 
> failure is recovered.
> I debugged the code and found the problem here :
> [https://github.com/apache/kafka/blob/dc36dedd28ff384218b669de13993646483db966/tools/src/main/java/org/apache/kafka/tools/PushHttpMetricsReporter.java#L214-L221]
>  
> When we submit a task to the *ScheduledThreadPoolExecutor* that needs to be 
> executed periodically, if the task throws an exception and is not swallowed, 
> the task will no longer be scheduled to execute.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Jenkins build is unstable: Kafka » Kafka Branch Builder » 3.2 #1

2022-03-21 Thread Apache Jenkins Server
See 




Re: [DISCUSS] KIP-714: Client metrics and observability

2022-03-21 Thread Jun Rao
Hi, Kirk, Sarat,

A few more comments.

40. GetTelemetrySubscriptionsResponseV0 : RequestedMetrics Array[string]
uses "Array[0] empty string" to represent all metrics subscribed. We had a
similar issue with the topics field in MetadataRequest and used the
following convention.
In version 1 and higher, an empty array indicates "request metadata for no
topics," and a null array is used to indicate "request metadata for all
topics."
Should we use the same convention in GetTelemetrySubscriptionsResponseV0?

41. We include CompressionType in PushTelemetryRequestV0, but not in
ClientTelemetryPayload. How would the implementer know the compression type
for the telemetry payload?

42. For blocking the metrics for certain clients in the following example,
could you describe the corresponding config value used through the
kafka-config command?
kafka-client-metrics.sh --bootstrap-server $BROKERS \
   --add \
   --name 'Disabe_b69cc35a' \  # A descriptive name makes it easier to
clean up old subscriptions.
   --match client_instance_id=b69cc35a-7a54-4790-aa69-cc2bd4ee4538 \  #
Match this specific client instance
   --block

Thanks,

Jun


On Thu, Mar 10, 2022 at 11:57 AM Jun Rao  wrote:

> Hi, Kirk, Sarat,
>
> Thanks for the reply.
>
> 28. On the broker, we typically use Yammer metrics. Only for metrics that
> depend on Kafka metric features (e.g., quota), we use the Kafka metric.
> Yammer metrics have 4 types: gauge, meter, histogram and timer. meter
> calculates a rate, but also exposes an accumulated value.
>
> 29. The Histogram class in org.apache.kafka.common.metrics.stats was never
> used in the client metrics. The implementation of Histogram only provides a
> fixed number of values in the domain and may not capture the quantiles very
> accurately. So, we punted on using it.
>
> Thanks,
>
> Jun
>
>
>
> On Thu, Mar 10, 2022 at 10:59 AM Sarat Kakarla
>  wrote:
>
>> Jun,
>>
>>   >>  28. For the broker metrics, could you spell out the full metric name
>>   >>   including groups, tags, etc? We typically don't add the broker_id
>> label for
>>   >>   broker metrics. Also, brokers use Yammer metrics, which doesn't
>> have type
>>   >>   Sum.
>>
>> Sure,  I will update the KIP-714 with the above information, will remove
>> the broker-id label from the metrics.
>>
>> Regarding the type is CumulativeSum the right type to use in the place of
>> Sum?
>>
>> Thanks
>> Sarat
>>
>>
>> On 3/8/22, 5:48 PM, "Jun Rao"  wrote:
>>
>> Hi, Magnus, Sarat and Xavier,
>>
>> Thanks for the reply. A few more comments below.
>>
>> 20. It seems that we are piggybacking the plugin on the
>> existing MetricsReporter. So, this seems fine.
>>
>> 21. That could work. Are we requiring any additional jar dependency
>> on the
>> client? Or, are you suggesting that we check the runtime dependency
>> to pick
>> the compression codec?
>>
>> 28. For the broker metrics, could you spell out the full metric name
>> including groups, tags, etc? We typically don't add the broker_id
>> label for
>> broker metrics. Also, brokers use Yammer metrics, which doesn't have
>> type
>> Sum.
>>
>> 29. There are several client metrics listed as histogram. However,
>> the java
>> client currently doesn't support histogram type.
>>
>> 30. Could you show an example of the metric payload in
>> PushTelemetryRequest
>> to help understand how we organize metrics at different levels (per
>> instance, per topic, per partition, per broker, etc)?
>>
>> 31. Could you add a bit more detail on which client thread sends the
>> PushTelemetryRequest?
>>
>> Thanks,
>>
>> Jun
>>
>> On Mon, Mar 7, 2022 at 11:48 AM Magnus Edenhill 
>> wrote:
>>
>> > Hi Jun,
>> >
>> > thanks for your initiated questions, see my answers below.
>> > There's been a number of clarifications to the KIP.
>> >
>> >
>> >
>> > Den tors 27 jan. 2022 kl 20:08 skrev Jun Rao
>> :
>> >
>> > > Hi, Magnus,
>> > >
>> > > Thanks for updating the KIP. The overall approach makes sense to
>> me. A
>> > few
>> > > more detailed comments below.
>> > >
>> > > 20. ClientTelemetry: Should it be extending configurable and
>> closable?
>> > >
>> >
>> > I'll pass this question to Sarat and/or Xavier.
>> >
>> >
>> >
>> > > 21. Compression of the metrics on the client: what's the default?
>> > >
>> >
>> > How about we specify a prioritized list: zstd, lz4, snappy, gzip?
>> > But ultimately it is up to what the client supports.
>> >
>> >
>> > 23. A client instance is considered a metric resource and the
>> > > resource-level (thus client instance level) labels could include:
>> > > client_software_name=confluent-kafka-python
>> > > client_software_version=v2.1.3
>> > > client_instance_id=B64CD139-3975-440A-91D4
>> > > transactional_id=someTxnApp
>> > > Are those labels added in PushTelemetryRequest? 

Re: [VOTE] KIP-820: Extend KStream process with new Processor API

2022-03-21 Thread Jorge Esteban Quilcate Otoya
Hi Dev team,

The vote for KIP-820 has passed with:

3 +1 (binding) votes from John, Matthias, and Guozhang.

This KIP is now accepted. Thanks for your feedback and votes!

Jorge.

On Thu, 17 Mar 2022 at 17:30, Guozhang Wang  wrote:

> Reviewed the updated docs, and recasting my +1 vote again, thanks Jorge for
> pushing it through!
>
> Guozhang
>
> On Tue, Mar 15, 2022 at 4:28 PM Matthias J. Sax  wrote:
>
> > +1 (binding)
> >
> > Thanks for pushing this through. Was a difficult discussion!
> >
> >
> > -Matthias
> >
> > On 3/15/22 10:01 AM, John Roesler wrote:
> > > Thanks for the update, Jorge!
> > >
> > > I’m still +1 (binding)
> > >
> > > Thanks,
> > > John
> > >
> > > On Thu, Feb 17, 2022, at 12:57, Guozhang Wang wrote:
> > >> Thanks Jorge, overall looks good to me.
> > >>
> > >> Maybe we can clarify a bit in the wiki that the reason we have to not
> > >> include the additional `final String... stateStoreNames` params in the
> > new
> > >> `process` API is that we need to have overloaded functions which takes
> > >> `ProcessorSupplier<...> ` where the output types are not `Void`, but
> > due to
> > >> type eraser we cannot distinguish the new overloaded function
> signatures
> > >> with the old ones if they also include `final String...
> > stateStoreNames`.
> > >> And in javadocs explains that if users want to connect state stores to
> > this
> > >> processor, they could use the `connectState` API instead.
> > >>
> > >> Otherwise, I'm +1.
> > >>
> > >> Guozhang
> > >>
> > >> On Tue, Feb 15, 2022 at 11:54 AM John Roesler 
> > wrote:
> > >>
> > >>> Thanks, Jorge!
> > >>>
> > >>> I'm +1 (binding)
> > >>>
> > >>> -John
> > >>>
> > >>> On Tue, 2022-02-15 at 19:16 +, Jorge Esteban Quilcate
> > >>> Otoya wrote:
> >  Hi all,
> > 
> >  I'd like to start a vote on KIP-820 which proposes extending KStream
> > to
> > >>> use
> >  the new Processor API
> > 
> > >>>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-820%3A+Extend+KStream+process+with+new+Processor+API
> > 
> >  Thanks,
> >  Jorge
> > >>>
> > >>>
> > >>
> > >> --
> > >> -- Guozhang
> >
>
>
> --
> -- Guozhang
>


Re: [DISCUSS] KIP-634: Complementary support for headers in Kafka Streams DSL

2022-03-21 Thread Jorge Esteban Quilcate Otoya
Hi all,

With the acceptance of KIP-820 which will enable easier access to the
Record's metadata and headers and the potential design of a new version of
the DSL, I will set this KIP as inactive/dormant for the time being.

Thanks, everyone for the great discussions!
Jorge.

On Wed, 16 Feb 2022 at 00:08, Matthias J. Sax  wrote:

> Sorry for playing devil's advocate, but do we really think it's a good
> way to design the API? To me, if feels like a cumbersome workaround.
>
> Personally, I believe that we are hitting a point for the DSL that
> requires a redesign from scratch. When the DSL was designed 5 years ago,
> record timestamp was just newly added (and did not play a significant
> role yet), and there was no record headers. That's why we have a
> kv-based model with `KStream` and `KTable` types.
>
> Given the changes in Kafka (and Kafka Streams) that accumulated over the
> last 5 years, it seems that a better API would be to have a
> KStream>` and `KTable>` model.
>
> I also think that we should clearly separate (modifiable) data (key,
> value, timestamp, headers) from (immutable) meta-data (topic name,
> partition, offset). And to go one step further, I am not even sure it
> meta-data makes much sense for non-source records (e.g., `offset` is
> more or less useless after a flatMap(), aggregation() -- similar for
> topic and partition after a join()). It would make sense to me, to take
> considerations like this into account designing the API.
>
> I would propose to not move forward with a "hacky design" (sorry for the
> strong terminology...) but to starting a DSL 2.0 discussion to come up
> with a long term sensible re-design. Of course, it would be a much
> larger effort, and might not provide a short term fix.
>
> A DSL 2.0 would not only allow us to add support for headers, but also
> to fix many other issue in the DSL that are not easily fixed without
> breaking compatibility (one example is KIP-300:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-300%3A+Add+Windowed+KTable+API+in+StreamsBuilder
> )
>
> Thoughts?
>
>
> -Matthias
>
> On 2/15/22 11:55 AM, John Roesler wrote:
> > Thanks for the update, Jorge,
> >
> > I've just looked over the KIP again. Just one more small
> > concern:
> >
> > 5) We can't just change the type of Record#headers() to a
> > new fully qualified type. That would be a source-
> > incompatible breaking change for users.
> >
> > Out options are:
> > * Deprecate the existing method and create a new one with
> > the new type
> > * If the existing Headers is "not great but ok", then maybe
> > we leave it alone.
> >
> > Thanks,
> > -John
> >
> >
> > On Fri, 2022-02-11 at 20:40 +, Jorge Esteban Quilcate
> > Otoya wrote:
> >> John and team,
> >>
> >> The following changes have been applied to the KIP following your
> feedback:
> >>
> >> - Leverage `Record` instead of introducing a new type
> >> (`RecordValue`).
> >> - `RecordSerde` for stateful operations using `Record` as
> value.
> >> - Extend `Record` to:
> >>- Implement `RecordMetadata` to expose `topic`, `partition`, and
> `offset`
> >>- Use `Headers` abstraction introduce on this KIP instead of core one
> >>
> >> KIP:
> >>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-634%3A+Complementary+support+for+headers+and+record+metadata+in+Kafka+Streams+DSL
> >>
> >> Looking forward to your feedback.
> >>
> >> Have a great weekend!
> >>
> >> On Thu, 10 Feb 2022 at 13:15, Jorge Esteban Quilcate Otoya <
> >> quilcate.jo...@gmail.com> wrote:
> >>
>  What do you think about instead adding topic and
> >>> partition to Record?
> >>>
> >>> This is a very interesting idea. Forgot to consider this addition from
> >>> KIP-478.
> >>>
> >>> `Record` would also require `offset`. Maybe implementing
> `RecordMetadata`
> >>> and adding these fields as part of the constructor to keep them
> immutable
> >>> in comparison to the other fields?
> >>> It would also need to change `Record`'s headers type to the new one
> >>> proposed on this KIP.
> >>>
> >>> Let me explore this approach in more detail and update the KIP.
> >>>
>  I find the name "mapRecordValue" to be a bit confusing
> >>>because it seems like it would map the value of a record.
> >>>What do you think about "mapValueToRecord" instead?
> >>>
> >>> Agree. It will depend on how we solve 1). If we end up using `Record`
> then
> >>> `mapValueToRecord` will make even more sense.
> >>>
>  I agree with adding the serde explicitly. However, it
> >>> would be good to state whether and when we'll automatically
> >>> wrap a value serde. For example, if the value serde is known
> >>> (or if we're using a default serde from the config), will
> >>> Streams automatically wrap it downstream of the record-
> >>> mapping operator?
> >>>
> >>> Good point. The goal is as you describe it: only when
> `mapValueToRecord`
> >>> is called, the Serde will be implicitly applied.
> >>> Will make this explicit on the KIP.
> >>>
> >>>
> >>> On Wed, 9 Feb 2022 at 20:0

[jira] [Resolved] (KAFKA-6718) Rack Aware Stand-by Task Assignment for Kafka Streams

2022-03-21 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna resolved KAFKA-6718.
--
Resolution: Fixed

> Rack Aware Stand-by Task Assignment for Kafka Streams
> -
>
> Key: KAFKA-6718
> URL: https://issues.apache.org/jira/browse/KAFKA-6718
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Deepak Goyal
>Assignee: Levani Kokhreidze
>Priority: Major
>  Labels: kip
> Fix For: 3.2.0
>
>
> |Machines in data centre are sometimes grouped in racks. Racks provide 
> isolation as each rack may be in a different physical location and has its 
> own power source. When tasks are properly replicated across racks, it 
> provides fault tolerance in that if a rack goes down, the remaining racks can 
> continue to serve traffic.
>   
>  This feature is already implemented at Kafka 
> [KIP-36|https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment]
>  but we needed similar for task assignments at Kafka Streams Application 
> layer. 
>   
>  This features enables replica tasks to be assigned on different racks for 
> fault-tolerance.
>  NUM_STANDBY_REPLICAS = x
>  totalTasks = x+1 (replica + active)
>  # If there are no rackID provided: Cluster will behave rack-unaware
>  # If same rackId is given to all the nodes: Cluster will behave rack-unaware
>  # If (totalTasks <= number of racks), then Cluster will be rack aware i.e. 
> each replica task is each assigned to a different rack.
>  # Id (totalTasks > number of racks), then it will first assign tasks on 
> different racks, further tasks will be assigned to least loaded node, cluster 
> wide.|
> We have added another config in StreamsConfig called "RACK_ID_CONFIG" which 
> helps StickyPartitionAssignor to assign tasks in such a way that no two 
> replica tasks are on same rack if possible.
>  Post that it also helps to maintain stickyness with-in the rack.|



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-13587) Implement unclean leader election in KIP-704

2022-03-21 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13587.

Resolution: Fixed

> Implement unclean leader election in KIP-704
> 
>
> Key: KAFKA-13587
> URL: https://issues.apache.org/jira/browse/KAFKA-13587
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-13682) Implement auto preferred leader election in KRaft Controller

2022-03-21 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13682.

Resolution: Fixed

> Implement auto preferred leader election in KRaft Controller
> 
>
> Key: KAFKA-13682
> URL: https://issues.apache.org/jira/browse/KAFKA-13682
> Project: Kafka
>  Issue Type: Task
>  Components: kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>  Labels: kip-500
> Fix For: 3.2.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [DISCUSS] Apache Kafka 3.2.0 release

2022-03-21 Thread Bruno Cadonna

Hi Kafkateers,

Last week we reached feature freeze for the next major release of Apache 
Kafka.


I cut the release branch for 3.2 and bumped trunk to 3.3.0-SNAPSHOT. 
From this point, commits that should go into 3.2.0 should be merged 
into trunk and cherry-picked to 3.2.



I removed the KIPs for which I think the main PRs were not merged from 
the release plan for AK 3.2.0. Please, verify the plan and let me know 
if I removed a KIP that should not have been removed.


Until code freeze on 30 March, please help with fixing flaky tests, 
finding bugs, testing, and updating documentation.


You can find the list of open issues for 3.2.0 under the following link:

https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20fixVersion%20%3D%203.2.0%20AND%20status%20not%20in%20(resolved%2C%20closed)%20ORDER%20BY%20priority%20DESC%2C%20status%20DESC%2C%20updated%20DESC%20%20%20%20%20%20%20%20%20 




Best,
Bruno

On 15.03.22 15:11, Bruno Cadonna wrote:

Hi all,

A quick reminder that feature freeze for Apache 3.2.0 is tomorrow. 
Please make sure to get your features merged into trunk.


I will cut the release branch on Monday.

Best,
Bruno

On 07.03.22 15:03, Bruno Cadonna wrote:

Hi Kafkateers,

Last week we reached KIP freeze for the next major release 3.2.0 of 
Apache Kafka.


I have updated the release plan for AK 3.2.0 with all the KIPs that 
passed the vote last week.


Please, verify the plan and let me know if any KIP should be added
to or removed from the release plan.

For the KIPs which are still in progress, please work closely with your
reviewers to make sure that they land on time for the feature freeze.

The next milestone for the AK 3.2.0 release is feature freeze on March 
16th 2022.


Best,
Bruno

On 01.03.22 17:41, Bruno Cadonna wrote:

Hi all,

A quick reminder that KIP freeze for the Apache 3.2.0 is tomorrow. 
Please make sure to close your votes if you want to add a KIP to the 
release plan.


Best,
Bruno

On 15.02.22 12:37, Bruno Cadonna wrote:

Hi all,

I published a release plan for the Apache Kafka 3.2.0 release here:
https://cwiki.apache.org/confluence/display/KAFKA/Release+Plan+3.2.0

KIP Freeze: 2 March 2022
Feature Freeze: 16 March 2022
Code Freeze:    30 March 2022

At least two weeks of stabilization will follow Code Freeze.

Please let me know if should add or remove KIPs from the plan or if 
you have any other objections.


Best,
Bruno


On 04.02.22 16:03, Bruno Cadonna wrote:

Hi,

I'd like to volunteer to be the release manager for our next
feature release, 3.2.0. If there are no objections, I'll send
out the release plan soon.

Best,
Bruno


[VOTE] KIP-794: Strictly Uniform Sticky Partitioner

2022-03-21 Thread Artem Livshits
Hi all,

I'd like to start a vote on
https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner
.

-Artem


Re: [DISCUSS] KIP-825: introduce a new API to control when aggregated results are produced

2022-03-21 Thread Hao Li
Hi Guozhang,

Thanks for the feedback.

1. I agree to have an `Emitted` control class and two static constructors
named `onWindowClose` and `onEachUpdate`.

2. For the API function changes, I'm thinking of adding a new function
called `trigger` to `TimeWindowedKStream` and `SessionWindowedKStream`. It
takes `Emitted` config and returns the same stream. Example:

stream
  .groupBy(...)
  .windowedBy(...)
  .trigger(Emitted.onWindowClose). // N
  .count()

The benefits are:
  1. It's simple and avoids creating overloading of existing functions like
`windowedBy` or `count`, `reduce` or `aggregate`. In fact, to add it to
`aggregate` functions, we need to add it to all existing `count`,
`aggregate` overloading functions which is a lot.
  2. It operates directly on windowed kstream and tells how its output
should be configured, if later we need to add this other type of streams,
we can reuse same `trigger` API whereas other type of streams/tables may
not have `aggregate`, `windowedby` api to make it consistent.

Hao


On Sat, Mar 19, 2022 at 5:40 PM Guozhang Wang  wrote:

> Hello Hao,
>
> I'm preferring option 2 over the other options mainly because the added
> config object could potentially be used in other operators as well (not
> necessarily has to be a windowed operator and hence have to be piggy-backed
> on `windowedBy`, and that's also why I suggested not naming it
> `WindowConfig` but just `EmitConfig`).
>
> As for Matthias' question, I think the difference between the windowed
> aggregate operator and the stream-stream join operator is that, for the
> latter we think emit-final should be the only right emitting policy and
> hence we should not let users to configure it. If users configure it to
> e.g. emit eager they may get the old spurious emitting behavior which is
> violating the semantics.
>
> For option 2) itself, I have a few more thoughts:
>
> 1. Thinking about Matthias' suggestions, I'm also leaning a bit
> towards adding the new param in the overloaded `aggregate`, than the
> overloaded `windowBy` function. The reason is that the emitting logic could
> be either window based or non-window based, in the long run. Though for
> this KIP we could just add it in `XXXWindowedKStream.aggregate()`, we may
> want to extend to other non-windowed operators in the future.
> 2. To be consistent with other control class names, I feel maybe we can
> name it "Emitted", not "EmitConfig".
> 3. Following the first comment, I think we can have the static constructor
> names as "onWindowClose" and "onEachUpdate".
>
> The resulted code pattern would be like this:
>
>stream
>  .groupBy(..)
>  .windowBy(TimeWindow..)
>  .count(Emitted.onWindowClose)
>
> WDYT?
>
>
> On Wed, Mar 16, 2022 at 12:07 PM Matthias J. Sax  wrote:
>
> > >> `allowedLateness` may not be a good name. What I have in mind is to
> use
> > >> this to control how frequently we try to emit final results. Maybe
> it's
> > >> more flexible to be used as config in properties as we don't need to
> > >> recompile DSL to change it.
> >
> > I see; making it a config seems better. Frankly, I am not even sure if
> > we need a config at all or if we can just hard code it? For the
> > stream-stream join left/outer join fix, there is only an internal config
> > but no public config either.
> >
> > Option 1: Your proposal is?
> >
> >stream
> >  .groupByKey()
> >  .windowBy(TimeWindow.ofSizeNoGrace(...))
> >  .configure(EmitConfig.emitFinal()
> >  .count()
> >
> > Does not change my argument that it seems to be misplace from an API
> > flow POV.
> >
> > Option 1 seems to be the least desirable to me.
> >
> > For option 2 and 3, and not sure which one I like better. Might be good
> > if other could chime in, too. I think I slightly prefer option 2 over
> > option 3.
> >
> >
> > -Matthias
> >
> > On 3/15/22 5:33 PM, Hao Li wrote:
> > > Thanks for the feedback Matthias.
> > >
> > > `allowedLateness` may not be a good name. What I have in mind is to use
> > > this to control how frequently we try to emit final results. Maybe it's
> > > more flexible to be used as config in properties as we don't need to
> > > recompile DSL to change it.
> > >
> > > For option 1, I intend to use `emitFinal` to configure how
> > > `TimeWindowedKStream` should be outputted to `KTable` after
> aggregation.
> > > But `emitFinal` is not an action to the `TimeWindowedKStream`
> interface.
> > > Maybe adding `configure(EmitConfig config)` makes more sense?
> > >
> > > For option 2, config can be created using `WindowConfig.emitFinal()` or
> > > `EmitConfig.emitFinal`
> > >
> > > For option 3, it will be something like `TimeWindows(..., EmitConfig
> > > emitConfig)`.
> > >
> > > For putting `EmitConfig` in aggregation operator, I think it doesn't
> > > control how we do aggregation but how we output to `KTable`. That's
> why I
> > > feel option 1 makes more sense as it applies to `TimeWindowedKStream`.
> > But
> > > I'm also OK with option 2.
> > >
> > > Hao
>

Re: [DISCUSS] KIP-825: introduce a new API to control when aggregated results are produced

2022-03-21 Thread Guozhang Wang
Hi Hao,

For 2), I think it's a good idea in general to use a separate function on
the Time/SessionWindowedKStream itself, to achieve the same effect that,
for now, the emitting control is only for windowed aggregations as in this
KIP, than overloading existing functions. We can discuss further about the
actual function names, whether others like the name `trigger` or not. As
for myself I feel `trigger` is a good one but I'd like to see if others
have opinions as well.


Guozhang

On Mon, Mar 21, 2022 at 5:18 PM Hao Li  wrote:

> Hi Guozhang,
>
> Thanks for the feedback.
>
> 1. I agree to have an `Emitted` control class and two static constructors
> named `onWindowClose` and `onEachUpdate`.
>
> 2. For the API function changes, I'm thinking of adding a new function
> called `trigger` to `TimeWindowedKStream` and `SessionWindowedKStream`. It
> takes `Emitted` config and returns the same stream. Example:
>
> stream
>   .groupBy(...)
>   .windowedBy(...)
>   .trigger(Emitted.onWindowClose). // N
>   .count()
>
> The benefits are:
>   1. It's simple and avoids creating overloading of existing functions like
> `windowedBy` or `count`, `reduce` or `aggregate`. In fact, to add it to
> `aggregate` functions, we need to add it to all existing `count`,
> `aggregate` overloading functions which is a lot.
>   2. It operates directly on windowed kstream and tells how its output
> should be configured, if later we need to add this other type of streams,
> we can reuse same `trigger` API whereas other type of streams/tables may
> not have `aggregate`, `windowedby` api to make it consistent.
>
> Hao
>
>
> On Sat, Mar 19, 2022 at 5:40 PM Guozhang Wang  wrote:
>
> > Hello Hao,
> >
> > I'm preferring option 2 over the other options mainly because the added
> > config object could potentially be used in other operators as well (not
> > necessarily has to be a windowed operator and hence have to be
> piggy-backed
> > on `windowedBy`, and that's also why I suggested not naming it
> > `WindowConfig` but just `EmitConfig`).
> >
> > As for Matthias' question, I think the difference between the windowed
> > aggregate operator and the stream-stream join operator is that, for the
> > latter we think emit-final should be the only right emitting policy and
> > hence we should not let users to configure it. If users configure it to
> > e.g. emit eager they may get the old spurious emitting behavior which is
> > violating the semantics.
> >
> > For option 2) itself, I have a few more thoughts:
> >
> > 1. Thinking about Matthias' suggestions, I'm also leaning a bit
> > towards adding the new param in the overloaded `aggregate`, than the
> > overloaded `windowBy` function. The reason is that the emitting logic
> could
> > be either window based or non-window based, in the long run. Though for
> > this KIP we could just add it in `XXXWindowedKStream.aggregate()`, we may
> > want to extend to other non-windowed operators in the future.
> > 2. To be consistent with other control class names, I feel maybe we can
> > name it "Emitted", not "EmitConfig".
> > 3. Following the first comment, I think we can have the static
> constructor
> > names as "onWindowClose" and "onEachUpdate".
> >
> > The resulted code pattern would be like this:
> >
> >stream
> >  .groupBy(..)
> >  .windowBy(TimeWindow..)
> >  .count(Emitted.onWindowClose)
> >
> > WDYT?
> >
> >
> > On Wed, Mar 16, 2022 at 12:07 PM Matthias J. Sax 
> wrote:
> >
> > > >> `allowedLateness` may not be a good name. What I have in mind is to
> > use
> > > >> this to control how frequently we try to emit final results. Maybe
> > it's
> > > >> more flexible to be used as config in properties as we don't need to
> > > >> recompile DSL to change it.
> > >
> > > I see; making it a config seems better. Frankly, I am not even sure if
> > > we need a config at all or if we can just hard code it? For the
> > > stream-stream join left/outer join fix, there is only an internal
> config
> > > but no public config either.
> > >
> > > Option 1: Your proposal is?
> > >
> > >stream
> > >  .groupByKey()
> > >  .windowBy(TimeWindow.ofSizeNoGrace(...))
> > >  .configure(EmitConfig.emitFinal()
> > >  .count()
> > >
> > > Does not change my argument that it seems to be misplace from an API
> > > flow POV.
> > >
> > > Option 1 seems to be the least desirable to me.
> > >
> > > For option 2 and 3, and not sure which one I like better. Might be good
> > > if other could chime in, too. I think I slightly prefer option 2 over
> > > option 3.
> > >
> > >
> > > -Matthias
> > >
> > > On 3/15/22 5:33 PM, Hao Li wrote:
> > > > Thanks for the feedback Matthias.
> > > >
> > > > `allowedLateness` may not be a good name. What I have in mind is to
> use
> > > > this to control how frequently we try to emit final results. Maybe
> it's
> > > > more flexible to be used as config in properties as we don't need to
> > > > recompile DSL to change it.
> > > >
> > > > For option 1, I intend to use `emitF

Re: [DISCUSS] KIP-825: introduce a new API to control when aggregated results are produced

2022-03-21 Thread John Roesler
Hi all,

Thanks for the Kip, Hao!

For what it’s worth, I’m also in favor of your latest framing of the API,

I think the name is fine. I assume it’s inspired by Flink? It’s not identical 
to the concept of a trigger in Flink, which specifies when to evaluate the 
window, which might be confusing to some people who have deep experience with 
Flink. Then again, it seems close enough that it should be clear to casual 
Flink users. For people with no other stream processing experience, it might 
seem a bit esoteric compared to something self-documenting like ‘emit()’, but 
the docs should  make it clear. 

One small question: it seems like this proposal is identical to 
Suppressed.untilWindowClose, and the KIP states that this API is superior. In 
that case, should we deprecate Suppressed.untilWindowClose?

Thanks,
John

On Mon, Mar 21, 2022, at 19:30, Guozhang Wang wrote:
> Hi Hao,
>
> For 2), I think it's a good idea in general to use a separate function on
> the Time/SessionWindowedKStream itself, to achieve the same effect that,
> for now, the emitting control is only for windowed aggregations as in this
> KIP, than overloading existing functions. We can discuss further about the
> actual function names, whether others like the name `trigger` or not. As
> for myself I feel `trigger` is a good one but I'd like to see if others
> have opinions as well.
>
>
> Guozhang
>
> On Mon, Mar 21, 2022 at 5:18 PM Hao Li  wrote:
>
>> Hi Guozhang,
>>
>> Thanks for the feedback.
>>
>> 1. I agree to have an `Emitted` control class and two static constructors
>> named `onWindowClose` and `onEachUpdate`.
>>
>> 2. For the API function changes, I'm thinking of adding a new function
>> called `trigger` to `TimeWindowedKStream` and `SessionWindowedKStream`. It
>> takes `Emitted` config and returns the same stream. Example:
>>
>> stream
>>   .groupBy(...)
>>   .windowedBy(...)
>>   .trigger(Emitted.onWindowClose). // N
>>   .count()
>>
>> The benefits are:
>>   1. It's simple and avoids creating overloading of existing functions like
>> `windowedBy` or `count`, `reduce` or `aggregate`. In fact, to add it to
>> `aggregate` functions, we need to add it to all existing `count`,
>> `aggregate` overloading functions which is a lot.
>>   2. It operates directly on windowed kstream and tells how its output
>> should be configured, if later we need to add this other type of streams,
>> we can reuse same `trigger` API whereas other type of streams/tables may
>> not have `aggregate`, `windowedby` api to make it consistent.
>>
>> Hao
>>
>>
>> On Sat, Mar 19, 2022 at 5:40 PM Guozhang Wang  wrote:
>>
>> > Hello Hao,
>> >
>> > I'm preferring option 2 over the other options mainly because the added
>> > config object could potentially be used in other operators as well (not
>> > necessarily has to be a windowed operator and hence have to be
>> piggy-backed
>> > on `windowedBy`, and that's also why I suggested not naming it
>> > `WindowConfig` but just `EmitConfig`).
>> >
>> > As for Matthias' question, I think the difference between the windowed
>> > aggregate operator and the stream-stream join operator is that, for the
>> > latter we think emit-final should be the only right emitting policy and
>> > hence we should not let users to configure it. If users configure it to
>> > e.g. emit eager they may get the old spurious emitting behavior which is
>> > violating the semantics.
>> >
>> > For option 2) itself, I have a few more thoughts:
>> >
>> > 1. Thinking about Matthias' suggestions, I'm also leaning a bit
>> > towards adding the new param in the overloaded `aggregate`, than the
>> > overloaded `windowBy` function. The reason is that the emitting logic
>> could
>> > be either window based or non-window based, in the long run. Though for
>> > this KIP we could just add it in `XXXWindowedKStream.aggregate()`, we may
>> > want to extend to other non-windowed operators in the future.
>> > 2. To be consistent with other control class names, I feel maybe we can
>> > name it "Emitted", not "EmitConfig".
>> > 3. Following the first comment, I think we can have the static
>> constructor
>> > names as "onWindowClose" and "onEachUpdate".
>> >
>> > The resulted code pattern would be like this:
>> >
>> >stream
>> >  .groupBy(..)
>> >  .windowBy(TimeWindow..)
>> >  .count(Emitted.onWindowClose)
>> >
>> > WDYT?
>> >
>> >
>> > On Wed, Mar 16, 2022 at 12:07 PM Matthias J. Sax 
>> wrote:
>> >
>> > > >> `allowedLateness` may not be a good name. What I have in mind is to
>> > use
>> > > >> this to control how frequently we try to emit final results. Maybe
>> > it's
>> > > >> more flexible to be used as config in properties as we don't need to
>> > > >> recompile DSL to change it.
>> > >
>> > > I see; making it a config seems better. Frankly, I am not even sure if
>> > > we need a config at all or if we can just hard code it? For the
>> > > stream-stream join left/outer join fix, there is only an internal
>> config
>> > > but no public config ei

Re: [DISCUSS] KIP-825: introduce a new API to control when aggregated results are produced

2022-03-21 Thread Guozhang Wang
Just my 2c: Suppressed is in `suppress` whose application scope is much
larger and hence more flexible. I.e. it can be used anywhere for a `KTable`
(but internally we would check whether certain emit policies like
`untilWindowClose` is valid or not), whereas `trigger` as for now is only
applicable in XXWindowedKStream. So I think it would not be completely
replacing Suppressed.untilWindowClose.

In the future, personally I'd still want to keep one control object still
for all emit policies, and maybe if we have extended Emitted for other
emitting policies covered by Suppressed today, we can discuss if we could
have `KTable.suppress(Emitted..)` replacing `KTable.suppress(Suppressed..)`
as a whole, but for this KIP I think it's too early.


Guozhang


On Mon, Mar 21, 2022 at 6:18 PM John Roesler  wrote:

> Hi all,
>
> Thanks for the Kip, Hao!
>
> For what it’s worth, I’m also in favor of your latest framing of the API,
>
> I think the name is fine. I assume it’s inspired by Flink? It’s not
> identical to the concept of a trigger in Flink, which specifies when to
> evaluate the window, which might be confusing to some people who have deep
> experience with Flink. Then again, it seems close enough that it should be
> clear to casual Flink users. For people with no other stream processing
> experience, it might seem a bit esoteric compared to something
> self-documenting like ‘emit()’, but the docs should  make it clear.
>
> One small question: it seems like this proposal is identical to
> Suppressed.untilWindowClose, and the KIP states that this API is superior.
> In that case, should we deprecate Suppressed.untilWindowClose?
>
> Thanks,
> John
>
> On Mon, Mar 21, 2022, at 19:30, Guozhang Wang wrote:
> > Hi Hao,
> >
> > For 2), I think it's a good idea in general to use a separate function on
> > the Time/SessionWindowedKStream itself, to achieve the same effect that,
> > for now, the emitting control is only for windowed aggregations as in
> this
> > KIP, than overloading existing functions. We can discuss further about
> the
> > actual function names, whether others like the name `trigger` or not. As
> > for myself I feel `trigger` is a good one but I'd like to see if others
> > have opinions as well.
> >
> >
> > Guozhang
> >
> > On Mon, Mar 21, 2022 at 5:18 PM Hao Li  wrote:
> >
> >> Hi Guozhang,
> >>
> >> Thanks for the feedback.
> >>
> >> 1. I agree to have an `Emitted` control class and two static
> constructors
> >> named `onWindowClose` and `onEachUpdate`.
> >>
> >> 2. For the API function changes, I'm thinking of adding a new function
> >> called `trigger` to `TimeWindowedKStream` and `SessionWindowedKStream`.
> It
> >> takes `Emitted` config and returns the same stream. Example:
> >>
> >> stream
> >>   .groupBy(...)
> >>   .windowedBy(...)
> >>   .trigger(Emitted.onWindowClose). // N
> >>   .count()
> >>
> >> The benefits are:
> >>   1. It's simple and avoids creating overloading of existing functions
> like
> >> `windowedBy` or `count`, `reduce` or `aggregate`. In fact, to add it to
> >> `aggregate` functions, we need to add it to all existing `count`,
> >> `aggregate` overloading functions which is a lot.
> >>   2. It operates directly on windowed kstream and tells how its output
> >> should be configured, if later we need to add this other type of
> streams,
> >> we can reuse same `trigger` API whereas other type of streams/tables may
> >> not have `aggregate`, `windowedby` api to make it consistent.
> >>
> >> Hao
> >>
> >>
> >> On Sat, Mar 19, 2022 at 5:40 PM Guozhang Wang 
> wrote:
> >>
> >> > Hello Hao,
> >> >
> >> > I'm preferring option 2 over the other options mainly because the
> added
> >> > config object could potentially be used in other operators as well
> (not
> >> > necessarily has to be a windowed operator and hence have to be
> >> piggy-backed
> >> > on `windowedBy`, and that's also why I suggested not naming it
> >> > `WindowConfig` but just `EmitConfig`).
> >> >
> >> > As for Matthias' question, I think the difference between the windowed
> >> > aggregate operator and the stream-stream join operator is that, for
> the
> >> > latter we think emit-final should be the only right emitting policy
> and
> >> > hence we should not let users to configure it. If users configure it
> to
> >> > e.g. emit eager they may get the old spurious emitting behavior which
> is
> >> > violating the semantics.
> >> >
> >> > For option 2) itself, I have a few more thoughts:
> >> >
> >> > 1. Thinking about Matthias' suggestions, I'm also leaning a bit
> >> > towards adding the new param in the overloaded `aggregate`, than the
> >> > overloaded `windowBy` function. The reason is that the emitting logic
> >> could
> >> > be either window based or non-window based, in the long run. Though
> for
> >> > this KIP we could just add it in `XXXWindowedKStream.aggregate()`, we
> may
> >> > want to extend to other non-windowed operators in the future.
> >> > 2. To be consistent with other control class names, I fee

Re: [DISCUSS] KIP-825: introduce a new API to control when aggregated results are produced

2022-03-21 Thread John Roesler
Thanks, Guozhang!

To clarify, I was asking specifically about deprecating just the method 
‘untilWindowClose’. I might not be thinking clearly about it, though. What does 
untilWindowClose do that this KIP doesn’t cover?

Thanks,
John

On Mon, Mar 21, 2022, at 20:31, Guozhang Wang wrote:
> Just my 2c: Suppressed is in `suppress` whose application scope is much
> larger and hence more flexible. I.e. it can be used anywhere for a `KTable`
> (but internally we would check whether certain emit policies like
> `untilWindowClose` is valid or not), whereas `trigger` as for now is only
> applicable in XXWindowedKStream. So I think it would not be completely
> replacing Suppressed.untilWindowClose.
>
> In the future, personally I'd still want to keep one control object still
> for all emit policies, and maybe if we have extended Emitted for other
> emitting policies covered by Suppressed today, we can discuss if we could
> have `KTable.suppress(Emitted..)` replacing `KTable.suppress(Suppressed..)`
> as a whole, but for this KIP I think it's too early.
>
>
> Guozhang
>
>
> On Mon, Mar 21, 2022 at 6:18 PM John Roesler  wrote:
>
>> Hi all,
>>
>> Thanks for the Kip, Hao!
>>
>> For what it’s worth, I’m also in favor of your latest framing of the API,
>>
>> I think the name is fine. I assume it’s inspired by Flink? It’s not
>> identical to the concept of a trigger in Flink, which specifies when to
>> evaluate the window, which might be confusing to some people who have deep
>> experience with Flink. Then again, it seems close enough that it should be
>> clear to casual Flink users. For people with no other stream processing
>> experience, it might seem a bit esoteric compared to something
>> self-documenting like ‘emit()’, but the docs should  make it clear.
>>
>> One small question: it seems like this proposal is identical to
>> Suppressed.untilWindowClose, and the KIP states that this API is superior.
>> In that case, should we deprecate Suppressed.untilWindowClose?
>>
>> Thanks,
>> John
>>
>> On Mon, Mar 21, 2022, at 19:30, Guozhang Wang wrote:
>> > Hi Hao,
>> >
>> > For 2), I think it's a good idea in general to use a separate function on
>> > the Time/SessionWindowedKStream itself, to achieve the same effect that,
>> > for now, the emitting control is only for windowed aggregations as in
>> this
>> > KIP, than overloading existing functions. We can discuss further about
>> the
>> > actual function names, whether others like the name `trigger` or not. As
>> > for myself I feel `trigger` is a good one but I'd like to see if others
>> > have opinions as well.
>> >
>> >
>> > Guozhang
>> >
>> > On Mon, Mar 21, 2022 at 5:18 PM Hao Li  wrote:
>> >
>> >> Hi Guozhang,
>> >>
>> >> Thanks for the feedback.
>> >>
>> >> 1. I agree to have an `Emitted` control class and two static
>> constructors
>> >> named `onWindowClose` and `onEachUpdate`.
>> >>
>> >> 2. For the API function changes, I'm thinking of adding a new function
>> >> called `trigger` to `TimeWindowedKStream` and `SessionWindowedKStream`.
>> It
>> >> takes `Emitted` config and returns the same stream. Example:
>> >>
>> >> stream
>> >>   .groupBy(...)
>> >>   .windowedBy(...)
>> >>   .trigger(Emitted.onWindowClose). // N
>> >>   .count()
>> >>
>> >> The benefits are:
>> >>   1. It's simple and avoids creating overloading of existing functions
>> like
>> >> `windowedBy` or `count`, `reduce` or `aggregate`. In fact, to add it to
>> >> `aggregate` functions, we need to add it to all existing `count`,
>> >> `aggregate` overloading functions which is a lot.
>> >>   2. It operates directly on windowed kstream and tells how its output
>> >> should be configured, if later we need to add this other type of
>> streams,
>> >> we can reuse same `trigger` API whereas other type of streams/tables may
>> >> not have `aggregate`, `windowedby` api to make it consistent.
>> >>
>> >> Hao
>> >>
>> >>
>> >> On Sat, Mar 19, 2022 at 5:40 PM Guozhang Wang 
>> wrote:
>> >>
>> >> > Hello Hao,
>> >> >
>> >> > I'm preferring option 2 over the other options mainly because the
>> added
>> >> > config object could potentially be used in other operators as well
>> (not
>> >> > necessarily has to be a windowed operator and hence have to be
>> >> piggy-backed
>> >> > on `windowedBy`, and that's also why I suggested not naming it
>> >> > `WindowConfig` but just `EmitConfig`).
>> >> >
>> >> > As for Matthias' question, I think the difference between the windowed
>> >> > aggregate operator and the stream-stream join operator is that, for
>> the
>> >> > latter we think emit-final should be the only right emitting policy
>> and
>> >> > hence we should not let users to configure it. If users configure it
>> to
>> >> > e.g. emit eager they may get the old spurious emitting behavior which
>> is
>> >> > violating the semantics.
>> >> >
>> >> > For option 2) itself, I have a few more thoughts:
>> >> >
>> >> > 1. Thinking about Matthias' suggestions, I'm also leaning a bit
>> >> > towards adding the new param i

Re: [VOTE} KIP-796: Interactive Query v2

2022-03-21 Thread John Roesler
Hello, all,

During the PR reviews for this KIP, there were several late concerns raised 
about the IQv2 APIs. I filed tickets under KAFKA-13479 and promised to revisit 
them before the API was released.

Unfortunately, I have not had time to circle back on those concerns. Now that 
the 3.2 branch cut has happened, I can either remove the IQv2 API from 3.2 and 
plan to address those concerns before 3.3, or we can go ahead and release IQv2 
as proposed and implemented.

Note that the APIs are all marked @Evolving, so we can technically break 
compatibility if we do find a better way to do something later. 

What is your preference? Release it, or wait?

Thanks,
John

On Mon, Nov 22, 2021, at 21:18, John Roesler wrote:
> Thanks for voting and for the discussion, all!
>
> The vote on KIP-796 passes with:
> 3 binding +1 (Bruno, Bill, and myself)
> 2 non-binding +1 (Patrick and Vasiliki)
> no vetoes
>
> The vote is now closed. If anyone has objections later on,
> please raise them, though!
>
> We will proceed with a series of pull requests to implement
> the framework, and we will also propose one or more small
> KIPs to propose specific queries.
>
> Thanks again,
> -John
>
> On Mon, 2021-11-22 at 12:11 -0500, Bill Bejeck wrote:
>> Thanks for the well-detailed KIP, John.
>> 
>> It's a +1 (binding) from me.
>> 
>> I want to point out one thing which I think is an oversight. The "Example
>> Raw Query" scan includes a line using the `kafkaStreams.serdesForStore`
>> method, but it's listed in the "Rejected Alternatives" section.
>> 
>> Thanks,
>> Bill
>> 
>> On Mon, Nov 22, 2021 at 9:22 AM Bruno Cadonna  wrote:
>> 
>> > Thanks for the KIP, John!
>> > 
>> > +1 (binding)
>> > 
>> > Best,
>> > Bruno
>> > 
>> > On 19.11.21 18:04, Vasiliki Papavasileiou wrote:
>> > > I think this KIP will greatly improve how we handle IQ in streams so +1
>> > > (non-binding) from me.
>> > > 
>> > > Thank you John!
>> > > 
>> > > On Thu, Nov 18, 2021 at 9:45 PM Patrick Stuedi
>> > 
>> > > wrote:
>> > > 
>> > > > +1 (non-binding), thanks John!
>> > > > -Patrick
>> > > > 
>> > > > On Thu, Nov 18, 2021 at 12:27 AM John Roesler 
>> > wrote:
>> > > > 
>> > > > > Hello all,
>> > > > > 
>> > > > > I'd like to open the vote for KIP-796, which proposes
>> > > > > a revamp of the Interactive Query APIs in Kafka Streams.
>> > > > > 
>> > > > > The proposal is here:
>> > > > > https://cwiki.apache.org/confluence/x/34xnCw
>> > > > > 
>> > > > > Thanks to all who reviewed the proposal, and thanks in
>> > > > > advance for taking the time to vote!
>> > > > > 
>> > > > > Thank you,
>> > > > > -John
>> > > > > 
>> > > > 
>> > > 
>> >


Re: [DISCUSS] KIP-825: introduce a new API to control when aggregated results are produced

2022-03-21 Thread Guozhang Wang
I think the following case is only doable via `suppress`:

stream
  .groupBy(..)
  .windowBy(..)
  .aggregate(..) //result in a KTable>
  .mapValues(..)
  .suppress(Suppressed.untilWindowClose) // since we can trace back to
parent node, to find a window definition


Guozhang


On Mon, Mar 21, 2022 at 6:36 PM John Roesler  wrote:

> Thanks, Guozhang!
>
> To clarify, I was asking specifically about deprecating just the method
> ‘untilWindowClose’. I might not be thinking clearly about it, though. What
> does untilWindowClose do that this KIP doesn’t cover?
>
> Thanks,
> John
>
> On Mon, Mar 21, 2022, at 20:31, Guozhang Wang wrote:
> > Just my 2c: Suppressed is in `suppress` whose application scope is much
> > larger and hence more flexible. I.e. it can be used anywhere for a
> `KTable`
> > (but internally we would check whether certain emit policies like
> > `untilWindowClose` is valid or not), whereas `trigger` as for now is only
> > applicable in XXWindowedKStream. So I think it would not be completely
> > replacing Suppressed.untilWindowClose.
> >
> > In the future, personally I'd still want to keep one control object still
> > for all emit policies, and maybe if we have extended Emitted for other
> > emitting policies covered by Suppressed today, we can discuss if we could
> > have `KTable.suppress(Emitted..)` replacing
> `KTable.suppress(Suppressed..)`
> > as a whole, but for this KIP I think it's too early.
> >
> >
> > Guozhang
> >
> >
> > On Mon, Mar 21, 2022 at 6:18 PM John Roesler 
> wrote:
> >
> >> Hi all,
> >>
> >> Thanks for the Kip, Hao!
> >>
> >> For what it’s worth, I’m also in favor of your latest framing of the
> API,
> >>
> >> I think the name is fine. I assume it’s inspired by Flink? It’s not
> >> identical to the concept of a trigger in Flink, which specifies when to
> >> evaluate the window, which might be confusing to some people who have
> deep
> >> experience with Flink. Then again, it seems close enough that it should
> be
> >> clear to casual Flink users. For people with no other stream processing
> >> experience, it might seem a bit esoteric compared to something
> >> self-documenting like ‘emit()’, but the docs should  make it clear.
> >>
> >> One small question: it seems like this proposal is identical to
> >> Suppressed.untilWindowClose, and the KIP states that this API is
> superior.
> >> In that case, should we deprecate Suppressed.untilWindowClose?
> >>
> >> Thanks,
> >> John
> >>
> >> On Mon, Mar 21, 2022, at 19:30, Guozhang Wang wrote:
> >> > Hi Hao,
> >> >
> >> > For 2), I think it's a good idea in general to use a separate
> function on
> >> > the Time/SessionWindowedKStream itself, to achieve the same effect
> that,
> >> > for now, the emitting control is only for windowed aggregations as in
> >> this
> >> > KIP, than overloading existing functions. We can discuss further about
> >> the
> >> > actual function names, whether others like the name `trigger` or not.
> As
> >> > for myself I feel `trigger` is a good one but I'd like to see if
> others
> >> > have opinions as well.
> >> >
> >> >
> >> > Guozhang
> >> >
> >> > On Mon, Mar 21, 2022 at 5:18 PM Hao Li 
> wrote:
> >> >
> >> >> Hi Guozhang,
> >> >>
> >> >> Thanks for the feedback.
> >> >>
> >> >> 1. I agree to have an `Emitted` control class and two static
> >> constructors
> >> >> named `onWindowClose` and `onEachUpdate`.
> >> >>
> >> >> 2. For the API function changes, I'm thinking of adding a new
> function
> >> >> called `trigger` to `TimeWindowedKStream` and
> `SessionWindowedKStream`.
> >> It
> >> >> takes `Emitted` config and returns the same stream. Example:
> >> >>
> >> >> stream
> >> >>   .groupBy(...)
> >> >>   .windowedBy(...)
> >> >>   .trigger(Emitted.onWindowClose). // N
> >> >>   .count()
> >> >>
> >> >> The benefits are:
> >> >>   1. It's simple and avoids creating overloading of existing
> functions
> >> like
> >> >> `windowedBy` or `count`, `reduce` or `aggregate`. In fact, to add it
> to
> >> >> `aggregate` functions, we need to add it to all existing `count`,
> >> >> `aggregate` overloading functions which is a lot.
> >> >>   2. It operates directly on windowed kstream and tells how its
> output
> >> >> should be configured, if later we need to add this other type of
> >> streams,
> >> >> we can reuse same `trigger` API whereas other type of streams/tables
> may
> >> >> not have `aggregate`, `windowedby` api to make it consistent.
> >> >>
> >> >> Hao
> >> >>
> >> >>
> >> >> On Sat, Mar 19, 2022 at 5:40 PM Guozhang Wang 
> >> wrote:
> >> >>
> >> >> > Hello Hao,
> >> >> >
> >> >> > I'm preferring option 2 over the other options mainly because the
> >> added
> >> >> > config object could potentially be used in other operators as well
> >> (not
> >> >> > necessarily has to be a windowed operator and hence have to be
> >> >> piggy-backed
> >> >> > on `windowedBy`, and that's also why I suggested not naming it
> >> >> > `WindowConfig` but just `EmitConfig`).
> >> >> >
> >> >> > As for Matthias' questio

Re: [DISCUSS] KIP-825: introduce a new API to control when aggregated results are produced

2022-03-21 Thread Hao Li
Hi John,

Yes. For naming, `trigger` is similar to Flink's trigger, but it has a
different meaning in our case. `emit` sounds like an action to emit? How
about `emitTrigger`? I'm open to suggestions for the naming.

For deprecating `Suppressed.untilWindowClose`, I agree with Guozhang we can
deprecate `Suppressed` config as a whole later. Or we can deprecate
`Suppressed.untilWindowClose` in later KIP after implementation of emit
final is done.

BTW, isn't

stream
  .groupBy(..)
  .windowBy(..)
  .aggregate(..) //result in a KTable>
  .mapValues(..)
  .suppress(Suppressed.untilWindowClose) // since we can trace back to
parent node, to find a window definition

same as

stream
  .groupBy(..)
  .windowBy(..)
  .trigger(Emitted.onWindowClose)
  .aggregate(..) //result in a KTable>
  .mapValues(..)
?

Hao


On Mon, Mar 21, 2022 at 7:28 PM Guozhang Wang  wrote:

> I think the following case is only doable via `suppress`:
>
> stream
>   .groupBy(..)
>   .windowBy(..)
>   .aggregate(..) //result in a KTable>
>   .mapValues(..)
>   .suppress(Suppressed.untilWindowClose) // since we can trace back to
> parent node, to find a window definition
>
>
> Guozhang
>
>
> On Mon, Mar 21, 2022 at 6:36 PM John Roesler  wrote:
>
> > Thanks, Guozhang!
> >
> > To clarify, I was asking specifically about deprecating just the method
> > ‘untilWindowClose’. I might not be thinking clearly about it, though.
> What
> > does untilWindowClose do that this KIP doesn’t cover?
> >
> > Thanks,
> > John
> >
> > On Mon, Mar 21, 2022, at 20:31, Guozhang Wang wrote:
> > > Just my 2c: Suppressed is in `suppress` whose application scope is much
> > > larger and hence more flexible. I.e. it can be used anywhere for a
> > `KTable`
> > > (but internally we would check whether certain emit policies like
> > > `untilWindowClose` is valid or not), whereas `trigger` as for now is
> only
> > > applicable in XXWindowedKStream. So I think it would not be completely
> > > replacing Suppressed.untilWindowClose.
> > >
> > > In the future, personally I'd still want to keep one control object
> still
> > > for all emit policies, and maybe if we have extended Emitted for other
> > > emitting policies covered by Suppressed today, we can discuss if we
> could
> > > have `KTable.suppress(Emitted..)` replacing
> > `KTable.suppress(Suppressed..)`
> > > as a whole, but for this KIP I think it's too early.
> > >
> > >
> > > Guozhang
> > >
> > >
> > > On Mon, Mar 21, 2022 at 6:18 PM John Roesler 
> > wrote:
> > >
> > >> Hi all,
> > >>
> > >> Thanks for the Kip, Hao!
> > >>
> > >> For what it’s worth, I’m also in favor of your latest framing of the
> > API,
> > >>
> > >> I think the name is fine. I assume it’s inspired by Flink? It’s not
> > >> identical to the concept of a trigger in Flink, which specifies when
> to
> > >> evaluate the window, which might be confusing to some people who have
> > deep
> > >> experience with Flink. Then again, it seems close enough that it
> should
> > be
> > >> clear to casual Flink users. For people with no other stream
> processing
> > >> experience, it might seem a bit esoteric compared to something
> > >> self-documenting like ‘emit()’, but the docs should  make it clear.
> > >>
> > >> One small question: it seems like this proposal is identical to
> > >> Suppressed.untilWindowClose, and the KIP states that this API is
> > superior.
> > >> In that case, should we deprecate Suppressed.untilWindowClose?
> > >>
> > >> Thanks,
> > >> John
> > >>
> > >> On Mon, Mar 21, 2022, at 19:30, Guozhang Wang wrote:
> > >> > Hi Hao,
> > >> >
> > >> > For 2), I think it's a good idea in general to use a separate
> > function on
> > >> > the Time/SessionWindowedKStream itself, to achieve the same effect
> > that,
> > >> > for now, the emitting control is only for windowed aggregations as
> in
> > >> this
> > >> > KIP, than overloading existing functions. We can discuss further
> about
> > >> the
> > >> > actual function names, whether others like the name `trigger` or
> not.
> > As
> > >> > for myself I feel `trigger` is a good one but I'd like to see if
> > others
> > >> > have opinions as well.
> > >> >
> > >> >
> > >> > Guozhang
> > >> >
> > >> > On Mon, Mar 21, 2022 at 5:18 PM Hao Li 
> > wrote:
> > >> >
> > >> >> Hi Guozhang,
> > >> >>
> > >> >> Thanks for the feedback.
> > >> >>
> > >> >> 1. I agree to have an `Emitted` control class and two static
> > >> constructors
> > >> >> named `onWindowClose` and `onEachUpdate`.
> > >> >>
> > >> >> 2. For the API function changes, I'm thinking of adding a new
> > function
> > >> >> called `trigger` to `TimeWindowedKStream` and
> > `SessionWindowedKStream`.
> > >> It
> > >> >> takes `Emitted` config and returns the same stream. Example:
> > >> >>
> > >> >> stream
> > >> >>   .groupBy(...)
> > >> >>   .windowedBy(...)
> > >> >>   .trigger(Emitted.onWindowClose). // N
> > >> >>   .count()
> > >> >>
> > >> >> The benefits are:
> > >> >>   1. It's simple and avoids creating overloading of existing
> > functions