Re: [DISCUSS] Apache Kafka 3.2.1 release

2022-07-13 Thread David Jacot
+1. Thanks David.

On Wed, Jul 13, 2022 at 11:43 PM José Armando García Sancio
 wrote:

> +1. Thanks for volunteering David.
>
> --
> -José
>


Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.3 #6

2022-07-13 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 487659 lines...]
[2022-07-14T03:12:03.956Z] > Task :connect:api:testSrcJar
[2022-07-14T03:12:03.956Z] > Task 
:connect:api:publishMavenJavaPublicationToMavenLocal
[2022-07-14T03:12:03.956Z] > Task :connect:api:publishToMavenLocal
[2022-07-14T03:12:03.956Z] 
[2022-07-14T03:12:03.956Z] > Task :streams:javadoc
[2022-07-14T03:12:03.956Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/processor/StreamPartitioner.java:50:
 warning - Tag @link: reference not found: 
org.apache.kafka.clients.producer.internals.DefaultPartitioner
[2022-07-14T03:12:03.956Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:890:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-14T03:12:03.956Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:919:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-14T03:12:03.956Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:939:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-14T03:12:03.956Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:854:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-14T03:12:03.956Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:890:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-14T03:12:03.956Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:919:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-14T03:12:03.956Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:939:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-14T03:12:05.020Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/kstream/Produced.java:84:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-14T03:12:05.020Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/kstream/Produced.java:136:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-14T03:12:05.020Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/kstream/Produced.java:147:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-14T03:12:05.020Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/kstream/Repartitioned.java:101:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-14T03:12:05.020Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/kstream/Repartitioned.java:167:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-07-14T03:12:06.221Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/TopologyConfig.java:58:
 warning - Tag @link: missing '#': "org.apache.kafka.streams.StreamsBuilder()"
[2022-07-14T03:12:06.221Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/TopologyConfig.java:58:
 warning - Tag @link: can't find org.apache.kafka.streams.StreamsBuilder() in 
org.apache.kafka.streams.TopologyConfig
[2022-07-14T03:12:06.221Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/TopologyDescription.java:38:
 warning - Tag @link: reference not found: ProcessorContext#forward(Object, 
Object) forwards
[2022-07-14T03:12:06.221Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/query/Position.java:44:
 warning - Tag @link: can't find query(Query,
[2022-07-14T03:12:06.221Z]  PositionBound, boolean) in 
org.apache.kafka.streams.processor.StateStore
[2022-07-14T03:12:06.221Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:44:
 warning - Tag @link: can't find query(Query, PositionBound, boolean) in 
org.apache.kafka.streams.processor.StateStore
[2022-07-14T03:12:06.221Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:36:
 warning - Tag @link: can't find query(Query, PositionBound, boolean) in 
org.apache.kafka.streams.processor.StateStore
[2022-07-14T03:12:06.221Z] 
/home/jenkins/workspace/Kafka_kafka_3.3/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:57:
 warning - Tag @link: can't find query(Query, PositionBound, boolean) in 
org.apache.kafka.streams.processor.StateStore
[2022-07-14T03:12:06.221Z] 
/home/jenkins/

Re: [DISCUSS] Apache Kafka 3.2.1 release

2022-07-13 Thread José Armando García Sancio
+1. Thanks for volunteering David.

-- 
-José


[DISCUSS] Apache Kafka 3.2.1 release

2022-07-13 Thread David Arthur
Hey folks,

I'd like to volunteer to be the release manager for a hotfix release of the
3.2 line. This will be the first hotfix release of this line and will be
version 3.2.1. We have fixed several bugs since 3.2.0 including an OAuth
token refresh bug, a problem with transaction markers, and a protocol bug
in the group coordinator.

If no one has any objections, I will send out a release plan within the
next few days that includes a list of all of the fixes we are targeting for
3.2.1 along with a timeline.

Thanks!
David


Re: Re: [DISCUSS] KIP-842: Add richer group offset reset mechanisms

2022-07-13 Thread Guozhang Wang
- clicked "send" by mistake... here's the full email -

Hello Deqi,

Thanks for bringing this KIP, and sorry for getting back to you so late.

I do think that separating the reset policy for the two scenarios: 1) when
we get an out-of-range when polling records, likely due to log being
truncated, 2) when we start fetching and there's no committed offset, would
be preferable as we have seen many places where people would prefer to use
different strategies for these two scenarios. At the same time, I also
share others' concerns that the current proposal is trying to mingle too
many features together which makes it unnecessarily complicated and also
makes the compatibility story trickier.

My reading is that you want to achieve the following things within this
KIP:

a) separate the two scenarios for reset strategies, as I mentioned above.
This to me is the most compelling motivation.
b) introduce new reset policies, namely "nearest", in addition to earliest
and latest. This was proposed quite a while ago to add more flexibility
to the policies.
c) tailor the reset policy of a topic for a specific consumer group. I.e.
when a consumer group starts consuming, we want to let it start from
"latest", but once the consumer group starts, newly added partitions would
be using "earliest" instead to avoid data loss.

I think trying to compound all these things in this one KIP makes it a bit
too mingled, and complicated. Plus, we also need to make sure that we are
compatible with the old behaviors if users only set "earliest" or "latest",
and expect that to impact both scenarios.

I thought about them for a bit and here's my 2c: how about we simplify this
KIP in the following way. The first three rows are existing strategies that
we do not change for compatibility.

--

                   when out-of-range                 when no committed offsets found

*auto.offset.reset*
none               throw exception                   throw exception
earliest           reset to earliest                 reset to earliest
latest             reset to latest                   reset to latest

*auto.offset.reset.on.no.initial.offset*
none               fall back to *auto.offset.reset*  throw exception
earliest           fall back to *auto.offset.reset*  reset to earliest
latest             fall back to *auto.offset.reset*  reset to latest
latest_on_start    fall back to *auto.offset.reset*  reset to latest on group start [1]
earliest_on_start  fall back to *auto.offset.reset*  reset to earliest on group start [1]

*auto.offset.reset.on.invalid.offset*
none               throw exception                   fall back to *auto.offset.reset*
earliest           reset to earliest                 fall back to *auto.offset.reset*
latest             reset to latest                   fall back to *auto.offset.reset*
nearest            see [2]                           fall back to *auto.offset.reset*

[1] Applies when the consumer group is starting (implementation wise, we do
    not rely on timestamps, just check if this is the first time the consumer
    gets an assignment); otherwise fall back to *auto.offset.reset*.
[2] Reset to latest if the current offset is larger than log.end; to earliest
    if the current offset is smaller than log.start.

--

With this slightly modified proposal, we can still cover all three
motivations, e.g.:

a) "I want to use a different reset policy for out-of-range, and when no
committed offsets upon starting": auto.offset.reset = latest,
auto.offset.reset.on.invalid.offset = none.
b) "I want to use a flexible reset policy for out-of-range":
auto.offset.reset = latest, auto.offset.reset.on.invalid.offset = nearest.
c) "I want to not lose data upon new partitions after my consumer has
started": auto.offset.reset = earliest,
auto.offset.reset.on.no.initial.offset = latest_on_start.
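
As a rough sketch, the fallback rules in the table above could be expressed
as follows. The config values are the ones proposed in this email; the
function name, the exception class, the scenario strings, and the choice to
treat an unset config as "fall back to auto.offset.reset" are assumptions made
up for illustration. This is not the consumer's actual implementation.

```python
from typing import Optional


class NoOffsetForPartitionError(Exception):
    """Hypothetical stand-in for the 'throw exception' cells of the table."""


def resolve_reset(
    scenario: str,                               # "out_of_range" or "no_committed_offset"
    auto_offset_reset: str,                      # "none" | "earliest" | "latest"
    on_invalid_offset: Optional[str] = None,     # proposed config, adds "nearest"
    on_no_initial_offset: Optional[str] = None,  # proposed config, adds "*_on_start"
    group_is_starting: bool = False,
    current_offset: int = 0,
    log_start: int = 0,
    log_end: int = 0,
) -> str:
    """Return "earliest" or "latest", or raise, following the proposed table."""
    if scenario == "out_of_range":
        # Assumption: an unset config falls back to auto.offset.reset.
        policy = on_invalid_offset or auto_offset_reset
        if policy == "nearest":
            # Beyond the log end -> latest; otherwise (before log start) -> earliest.
            return "latest" if current_offset > log_end else "earliest"
    elif scenario == "no_committed_offset":
        policy = on_no_initial_offset or auto_offset_reset
        if policy in ("latest_on_start", "earliest_on_start"):
            if group_is_starting:
                return policy.split("_")[0]
            # Group already running: fall back to auto.offset.reset.
            policy = auto_offset_reset
    else:
        raise ValueError(f"unknown scenario: {scenario}")

    if policy == "none":
        raise NoOffsetForPartitionError(scenario)
    return policy
```

With example c) above (auto.offset.reset = earliest,
auto.offset.reset.on.no.initial.offset = latest_on_start), this sketch returns
"latest" while the group is starting and "earliest" for partitions added
later.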


Please let me know what you think.




On Wed, Jul 13, 2022 at 10:30 AM Guozhang Wang  wrote:

> Hello Deqi,
>
> Thanks for bringing this KIP, and sorry for getting back to you so late.
>
> I do think that separating the reset policy for the two scenarios: 1) when
> we get an out-of-range when polling records, likely due to log being
> truncated, 2) when we start fetching and there's no committed offset, would
> be preferable as we have seen many places where people would prefer to use
> different strategies for these two scenarios. At the same time, I also
> share other's concerns that the current proposal is trying to mingle too
> many features together which makes it unnece

[GitHub] [kafka-site] mumrah opened a new pull request, #423: Add a new signing key for David Arthur

2022-07-13 Thread GitBox


mumrah opened a new pull request, #423:
URL: https://github.com/apache/kafka-site/pull/423

   This is in preparation for the upcoming 3.2.1 release. This key can also be 
found on [keys.openpgp.org](https://keys.openpgp.org/)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: Re: [DISCUSS] KIP-842: Add richer group offset reset mechanisms

2022-07-13 Thread Guozhang Wang
Hello Deqi,

Thanks for bringing this KIP, and sorry for getting back to you so late.

I do think that separating the reset policy for the two scenarios: 1) when
we get an out-of-range when polling records, likely due to log being
truncated, 2) when we start fetching and there's no committed offset, would
be preferable as we have seen many places where people would prefer to use
different strategies for these two scenarios. At the same time, I also
share others' concerns that the current proposal is trying to mingle too
many features together which makes it unnecessarily complicated and also
makes the compatibility story trickier.

My reading is that you want to achieve the following things within this
KIP:

a) separate the two scenarios for reset strategies, as I mentioned above.
This to me is the most compelling motivation.
b) introduce new reset policies, namely "nearest", in addition to earliest
and latest. This was proposed quite a while ago to add more flexibility
to the policies.
c) tailor the reset policy of a topic for a specific consumer group. I.e.
when a consumer group starts consuming, we want to let it start from
"latest", but once the consumer group starts, newly added partitions would
be using "earliest" instead to avoid data loss.

I think trying to compound all these things in this one KIP makes it a bit
too mingled, and complicated. Plus, we also need to make sure that we are
compatible with the old behaviors if users only set "earliest" or "latest",
and expect that to impact both scenarios.

I thought about them for a bit and here's my 2c: how about we simplify this
KIP in the following way. The first three rows are existing strategies that
we do not change for compatibility.

--

                   when out-of-range                 when no committed offsets found

*auto.offset.reset*
none               throw exception                   throw exception
earliest           reset to earliest                 reset to earliest
latest             reset to latest                   reset to latest

*auto.offset.reset.on.start*
earliest_on_start  throw exception                   reset to earliest
latest_on_start    throw exception                   reset to latest

On Fri, Jul 8, 2022 at 7:01 AM hudeqi <16120...@bjtu.edu.cn> wrote:

> Regarding the option to integrate repair logic in "latest", I understand
> your concern about this approach: backward compatibility.
> But we should have a consensus: the problem of data loss due to expand
> partitions is indeed caused by kafka's own design mechanism. The user
> configuration "latest" may be due to the consideration of not wanting to
> consume from earliest when firstly deploy app, or too much lag, or
> consumption exceeds the maximum offset, and then consume directly from the
> latest. As for expanding partition, the user will definitely not want to
> consume from the latest, unless he clearly knows what this means.
> Therefore, it is necessary to solve this problem, at the same time, without
> causing other problems.
> Therefore, the approach of adding an "init.offset.reset" option has a
> problem: this configuration must be set to "earliest" to avoid this
> situation, but that also causes a new group to consume from earliest,
> which goes against the idea of consuming from the latest at the
> beginning (bringing other problems).
> The same is true for the method of setting auto.offset.reset to "earliest"
> and seekingToEnd on new deployments: in order to avoid this case,
> "auto.offset.reset" has no choice but to set "earliest", when the
> consumption is advanced, it will also reset to the earliest, causing
> duplication (bringing other problems).
> So I think it's best to fix it in a black box to fundamentally solve it.
> It does not require users to perceive this problem, nor does the user's
> understanding of "auto.offset.reset" need to be changed, and there will be
> no complexity caused by redundant parameter configuration (and users don't
> necessarily know how to combine these parameters to use it). As for the
> compatibility issue, I think it is enough to enrich the test cases after
> the repair, what do you think?
>
> "Matthias J. Sax" wrote:
> > I am not sure if we can/should change the behavior of existing
> > "latest/earliest" due to backward compatibility concerns. While I agree
> > that many users might not know the fine details how both behave, it
> > would still be a change that might break other people that do understand
> > the details and rely on it.
> >
> > I also agree that having both "latest" and "safe_latest" might be
> > difficult, as users might not know which one to choose?
> >
> > Maybe we should have two configs instead of one? `auto.offset.reset`, as
> > the name suggests, resets the offset automatically, and thus its
> > current behavior is actu

Re: [DISCUSS] Apache Kafka 3.3.0 Release

2022-07-13 Thread Sagar
Hey Jose,

Well actually I have 2 approved PRs from Kafka Connect:

https://github.com/apache/kafka/pull/12321
https://github.com/apache/kafka/pull/12309

Not sure how to get these merged though but I think these can go into 3.3
release.

Thanks!
Sagar.


On Wed, Jul 13, 2022 at 5:03 PM Divij Vaidya 
wrote:

> Hey Jose
>
> A few of my PRs have been pending review for quite some time; I was hoping to
> merge into 3.3. I have already marked them with "Fix version=3.3.0" so that
> you can track them using the JIRA filter you shared earlier
> <
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20fixVersion%20%3D%203.3.0%20AND%20status%20not%20in%20(resolved%2C%20closed)%20ORDER%20BY%20priority%20DESC%2C%20status%20DESC%2C%20updated%20DESC%20%20%20%20%20%20
> >
> in this thread. Would you have some time to review them?
>
> Notable amongst them would be:
> 1. Fix the rate window size calculation for edge cases -
> https://github.com/apache/kafka/pull/12184
> 2. Fix resource leaks - https://github.com/apache/kafka/pull/12228
>
> And the complete list would be at:
>
> https://github.com/search?q=is%3Aopen+is%3Apr+author%3Adivijvaidya+is%3Apr+repo%3Aapache%2Fkafka+created%3A2022-04-01..2022-07-30&type=Issues
>
>
> --
> Divij Vaidya
>
>
>
> On Mon, Jul 11, 2022 at 5:12 PM José Armando García Sancio
>  wrote:
>
> > Hi all,
> >
> > I created the branch for 3.3
> > (https://github.com/apache/kafka/tree/3.3). If you have bug fixes for
> > the 3.3.0 release please make sure to cherry pick them to that branch.
> >
> > Thanks
> >
>


Re: Discard KIP-185

2022-07-13 Thread Guozhang Wang
Justine,

Thanks for bringing this up, I checked the two KIPs and I think KIP-679
subsumes KIP-185 so I can discard the latter.

As for the configs "max.in.flight.requests.per.connection" and "retries":
I think it's okay to keep the default of 5 for the former for now, since on
the broker side we still keep a history of sequences. As for "retries", I
think that's already replaced by the newly introduced "max.block.ms".
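
For illustration, here is a small sketch of the constraints that the
producer documentation places on these settings when idempotence is enabled:
"max.in.flight.requests.per.connection" at most 5, "retries" greater than 0,
and "acks" set to all. The function name is hypothetical, and the defaults
used for missing keys reflect the documented 3.x defaults as far as I know.
It is a plain dictionary check, not a Kafka client API.

```python
from typing import Dict, List

# Documented default for "retries" (Integer.MAX_VALUE); assumed here.
DEFAULT_RETRIES = 2147483647


def idempotence_conflicts(config: Dict[str, object]) -> List[str]:
    """List settings that would conflict with enable.idempotence=true."""
    problems = []

    in_flight = int(config.get("max.in.flight.requests.per.connection", 5))
    if in_flight > 5:
        problems.append("max.in.flight.requests.per.connection must be <= 5")

    retries = int(config.get("retries", DEFAULT_RETRIES))
    if retries <= 0:
        problems.append("retries must be > 0")

    acks = str(config.get("acks", "all"))
    if acks not in ("all", "-1"):
        problems.append("acks must be 'all'")

    return problems
```

An empty result means the given settings are compatible with idempotence
under these assumptions.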


Guozhang

On Wed, Jul 13, 2022 at 8:56 AM Justine Olshan 
wrote:

> Hey Apache Kafka community
>
> I was doing some research on idempotent producers and found KIP-185
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-185%3A+Make+exactly+once+in+order+delivery+per+partition+the+default+producer+setting
> >
> .
>
> It seems like this KIP was replaced by KIP-679
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-679%3A+Producer+will+enable+the+strongest+delivery+guarantee+by+default
> >.
> There are minor differences between the KIPs, but I'd say KIP-679 replaces
> KIP-185.
>
> Do we think it is valid to discard KIP-185? Or at least move it from "under
> discussion" on the KIP page
> <
> https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> >
> ?
>
> Thanks,
> Justine
>


-- 
-- Guozhang


[GitHub] [kafka-site] divijvaidya commented on pull request #422: KAFKA-13868: Self host JS files with project website

2022-07-13 Thread GitBox


divijvaidya commented on PR #422:
URL: https://github.com/apache/kafka-site/pull/422#issuecomment-1183421173

   @mimaison please review.





[GitHub] [kafka-site] divijvaidya opened a new pull request, #422: KAFKA-13868: Self host JS files with project website

2022-07-13 Thread GitBox


divijvaidya opened a new pull request, #422:
URL: https://github.com/apache/kafka-site/pull/422

   As per the [Apache privacy 
policy](https://privacy.apache.org/faq/committers.html), all JS files are 
recommended to be hosted along with the website so that we don't have a 
dependency on CDNs such as cloudflare.
   
   This change brings two JS libraries into the code base. 
   - prism: used for syntax highlighting
   - handlebars: templating library 
   
   # Testing
   Verified that syntax highlighting works as expected. No errors in the 
console.
   
   
   





Discard KIP-185

2022-07-13 Thread Justine Olshan
Hey Apache Kafka community

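I was doing some research on idempotent producers and found KIP-185
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-185%3A+Make+exactly+once+in+order+delivery+per+partition+the+default+producer+setting>.

It seems like this KIP was replaced by KIP-679
<https://cwiki.apache.org/confluence/display/KAFKA/KIP-679%3A+Producer+will+enable+the+strongest+delivery+guarantee+by+default>.
There are minor differences between the KIPs, but I'd say KIP-679 replaces
KIP-185.

Do we think it is valid to discard KIP-185? Or at least move it from "under
discussion" on the KIP page
<https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals>?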

Thanks,
Justine


[DISCUSS] Website changes required for Apache projects

2022-07-13 Thread Divij Vaidya
Hello Apache Kafka community

The ASF has a new data privacy policy to comply with the GDPR (the European
Union's General Data Protection Regulation) and we - like all other ASF
projects - have been asked to update our project homepage accordingly.

Mickael Maison has kindly triaged the initial set of requirements and listed
down the required set of changes at
https://issues.apache.org/jira/browse/KAFKA-13868.

I would like to bring your attention to a few PRs that address the required
changes and also solicit your comments on how I plan to solve others.

1. Our website is missing privacy policy -> Addressed by adding an item in
the top nav bar https://github.com/apache/kafka-site/pull/421. *Action -
please review the PR.*
2. It's using Google Analytics -> I would propose that we should get rid of
Google Analytics in favor of Apache recommended Matomo for website
analytics. If you folks agree, I would request a Matomo site ID for Apache
Kafka to make the required changes. *Action - do you agree to this change?*
3. It's using Google Fonts -> I have moved the Google fonts to a self hosted
version which is acceptable by Apache in the PR
https://github.com/apache/kafka-site/pull/420. *Action - please review the
PR.*
4. It's using scripts hosted on Cloudflare CDN -> We use JS scripts such as
handlebars and prism. Both these libraries are MIT licensed and hence,
could be hosted locally along with the website. I will move them along to
be placed along with the website. *Action - do you agree to this change?*
5. Embedded videos don't have an image placeholder -> I don't have a
proposed solution for this. *Action - can someone with front end experience
help us with this one?*

Note that we need to make these changes by July 22nd and hence your
immediate attention would be greatly appreciated.

Cheers!

-- 
Divij Vaidya


[GitHub] [kafka-site] divijvaidya commented on pull request #421: KAFKA-13868: Add new item 'Apache Software' in top nav bar including privacy policy

2022-07-13 Thread GitBox


divijvaidya commented on PR #421:
URL: https://github.com/apache/kafka-site/pull/421#issuecomment-1183316551

   @mimaison please review.





[GitHub] [kafka-site] divijvaidya opened a new pull request, #421: KAFKA-13868: Add new item 'Apache Software' in top nav bar including privacy policy

2022-07-13 Thread GitBox


divijvaidya opened a new pull request, #421:
URL: https://github.com/apache/kafka-site/pull/421

   **Why**
   As per the [Apache branding 
policy](https://www.apache.org/foundation/marks/pmcs#navigation), every project 
website's main navigation system must feature certain text links back to key 
pages on the main www.apache.org website.
   
   **What**
   Added a new item 'Apache Software' to the top nav bar which includes the 
required links including Privacy Policy requirement outlined in 
https://issues.apache.org/jira/browse/KAFKA-13868 
   
   **Tested**
   Tested the change by running website locally. The new nav bar looks as 
follows:
   https://user-images.githubusercontent.com/71267/178761918-cf18f304-c13d-4a94-b567-1a6329bad4ee.png
   





Re: [DISCUSS] KIP-848: The Next Generation of the Consumer Rebalance Protocol

2022-07-13 Thread David Jacot
Thanks Guozhang. My answers are below:

> 1) the migration path, especially the last step when clients flip the flag
> to enable the new protocol, in which we would have a window where both new
> protocols / rpcs and old protocols / rpcs are used by members of the same
> group. How the coordinator could "mimic" the old behavior while using the
> new protocol is something we need to present about.

Noted. I just published a new version of the KIP which includes more
details about this. See the "Supporting Online Consumer Group Upgrade"
and the "Compatibility, Deprecation, and Migration Plan". I think that
I have to think through a few cases now but the overall idea and
mechanism should be understandable.

> 2) the usage of topic ids. So far as KIP-516 the topic ids are only used as
> part of RPCs and admin client, but they are not exposed via any public APIs
> to consumers yet. I think the question is, first should we let the consumer
> client to be maintaining the names -> ids mapping itself to fully leverage
> on all the augmented existing RPCs and the new RPCs with the topic ids; and
> secondly, should we ever consider exposing the topic ids in the consumer
> public APIs as well (both subscribe/assign, as well as in the rebalance
> listener for cases like topic deletion-and-recreation).

a) Assuming that we would include converting all the offsets related
RPCs to using topic ids in this KIP, the consumer would be able to
fully operate with topic ids. That being said, it still has to provide
the topics names in various APIs so having a mapping in the consumer
seems inevitable to me.
b) I don't have a strong opinion on this. Here I wonder if this goes
beyond the scope of this KIP. I would rather focus on the internals
here and we can consider this separately if we see value in doing it.
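
To make point a) concrete, here is a minimal sketch of the kind of
names -> ids mapping a client might keep, including the topic
deletion-and-recreation case mentioned in the question. The class and method
names are hypothetical and nothing here is taken from the KIP or from the
consumer code base.

```python
import uuid
from typing import Dict, Optional


class TopicIdMapping:
    """Hypothetical two-way mapping between topic names and topic ids."""

    def __init__(self) -> None:
        self._id_by_name: Dict[str, uuid.UUID] = {}
        self._name_by_id: Dict[uuid.UUID, str] = {}

    def update(self, name: str, topic_id: uuid.UUID) -> bool:
        """Record a mapping from fresh metadata.

        Returns True if the name was already known under a different id,
        which is how a deleted-and-recreated topic would show up.
        """
        old_id = self._id_by_name.get(name)
        recreated = old_id is not None and old_id != topic_id
        if recreated:
            # Drop the stale id so lookups by the old id no longer resolve.
            del self._name_by_id[old_id]
        self._id_by_name[name] = topic_id
        self._name_by_id[topic_id] = name
        return recreated

    def id_for(self, name: str) -> Optional[uuid.UUID]:
        return self._id_by_name.get(name)

    def name_for(self, topic_id: uuid.UUID) -> Optional[str]:
        return self._name_by_id.get(topic_id)
```

The public APIs would keep taking names, and a mapping like this would
translate to ids only at the RPC boundary.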

Coming back to Ismael's point about using topic ids in the
ConsumerGroupHeartbeatRequest, I think that there is one advantage in
favour of it. The consumer will have the opportunity to validate that
the topics exist before passing them into the group rebalance
protocol. Obviously, the coordinator will also notice it but it does
not really have a way to reject an invalid topic in the response.

> I'm agreeing with David on all other minor questions except for the
> `subscribe(Pattern)` question: personally I think it's not necessary to
> deprecate the subscribe API with Pattern, but instead we still use Pattern
> while just documenting that our subscription may be rejected by the server.
> Since the incompatible case is a very rare scenario I felt using an
> overloaded `String` based subscription may be more vulnerable to various
> invalid regexes.

That could work. I have to look at the differences between the two
engines to better understand the potential issues. My understanding is
that would work for all the basic regular expressions. The differences
between the two are mainly about the various character classes. I
wonder what other people think about this.

Best,
David

On Tue, Jul 12, 2022 at 11:28 PM Guozhang Wang  wrote:
>
> Thanks David! I think on the high level there are two meta points we need
> to concretize a bit more:
>
> 1) the migration path, especially the last step when clients flip the flag
> to enable the new protocol, in which we would have a window where both new
> protocols / rpcs and old protocols / rpcs are used by members of the same
> group. How the coordinator could "mimic" the old behavior while using the
> new protocol is something we need to present about.
> 2) the usage of topic ids. So far as KIP-516 the topic ids are only used as
> part of RPCs and admin client, but they are not exposed via any public APIs
> to consumers yet. I think the question is, first should we let the consumer
> client to be maintaining the names -> ids mapping itself to fully leverage
> on all the augmented existing RPCs and the new RPCs with the topic ids; and
> secondly, should we ever consider exposing the topic ids in the consumer
> public APIs as well (both subscribe/assign, as well as in the rebalance
> listener for cases like topic deletion-and-recreation).
>
> I'm agreeing with David on all other minor questions except for the
> `subscribe(Pattern)` question: personally I think it's not necessary to
> deprecate the subscribe API with Pattern, but instead we still use Pattern
> while just documenting that our subscription may be rejected by the server.
> Since the incompatible case is a very rare scenario I felt using an
> overloaded `String` based subscription may be more vulnerable to various
> invalid regexes.
>
>
> Guozhang
>
> On Tue, Jul 12, 2022 at 5:23 AM David Jacot 
> wrote:
>
> > Hi Ismael,
> >
> > Thanks for your feedback. Let me answer your questions inline.
> >
> > > 1. I think it's premature to talk about target versions for deprecation
> > and
> > > removal of the existing group protocol. Unlike KRaft, this affects a core
> > > client protocol and hence deprecation/removal will be heavily depende

Re: [DISCUSS] KIP-852 Optimize calculation of size for log in remote tier

2022-07-13 Thread Divij Vaidya
Thank you for your review Luke.

> Reg: is that would the new `RemoteLogSizeBytes` metric be a performance
overhead? Although we move the calculation to a separate API, we still
can't assume users will implement a light-weight method, right?

This metric would be logged using the information that is already being
calculated for handling remote retention logic, hence, no additional work
is required to calculate this metric. More specifically, whenever
RemoteLogManager calls getRemoteLogSize API, this metric would be captured.
This API call is made every time RemoteLogManager wants to handle expired
remote log segments (which should be periodic). Does that address your
concern?
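
A minimal sketch of the idea, assuming a retention pass that already has to
look up the remote log size: the metric simply reuses that value, so no extra
call is made. The class, method, and attribute names below are made up for
illustration and are not the RemoteLogManager API.

```python
from typing import Callable, Dict


class RetentionPass:
    """Hypothetical stand-in for the periodic remote-retention handling."""

    def __init__(self, get_remote_log_size: Callable[[str], int]) -> None:
        # Assumed to be backed by the plugin's size lookup.
        self._get_remote_log_size = get_remote_log_size
        # Last observed size per partition, i.e. the gauge value.
        self.remote_log_size_bytes: Dict[str, int] = {}

    def handle_expired_segments(self, partition: str, retention_bytes: int) -> int:
        """Return how many bytes exceed retention, recording the metric on the way."""
        size = self._get_remote_log_size(partition)   # the only size lookup
        self.remote_log_size_bytes[partition] = size  # metric reuses that value
        return max(0, size - retention_bytes)
```

The cost of the metric is then whatever the size lookup costs, which the
retention logic pays in any case.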

Divij Vaidya



On Tue, Jul 12, 2022 at 11:01 AM Luke Chen  wrote:

> Hi Divij,
>
> Thanks for the KIP!
>
> I think it makes sense to delegate the responsibility of calculation to the
> specific RemoteLogMetadataManager implementation.
> But one thing I'm not quite sure, is that would the new
> `RemoteLogSizeBytes` metric be a performance overhead?
> Although we move the calculation to a separate API, we still can't assume
> users will implement a light-weight method, right?
>
> Thank you.
> Luke
>
> On Fri, Jul 1, 2022 at 5:47 PM Divij Vaidya 
> wrote:
>
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-852%3A+Optimize+calculation+of+size+for+log+in+remote+tier
> >
> >
> > Hey folks
> >
> > Please take a look at this KIP which proposes an extension to KIP-405.
> This
> > is my first KIP with Apache Kafka community so any feedback would be
> highly
> > appreciated.
> >
> > Cheers!
> >
> > --
> > Divij Vaidya
> > Sr. Software Engineer
> > Amazon
> >
>


[GitHub] [kafka-site] divijvaidya commented on pull request #420: KAFKA-13868: Self host fonts with project website

2022-07-13 Thread GitBox


divijvaidya commented on PR #420:
URL: https://github.com/apache/kafka-site/pull/420#issuecomment-1183179393

   @ijuma @mimaison please review.





Re: [DISCUSS] Apache Kafka 3.3.0 Release

2022-07-13 Thread Divij Vaidya
Hey Jose

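A few of my PRs have been pending review for quite some time; I was hoping to
merge into 3.3. I have already marked them with "Fix version=3.3.0" so that
you can track them using the JIRA filter you shared earlier
<https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20fixVersion%20%3D%203.3.0%20AND%20status%20not%20in%20(resolved%2C%20closed)%20ORDER%20BY%20priority%20DESC%2C%20status%20DESC%2C%20updated%20DESC%20%20%20%20%20%20>
in this thread. Would you have some time to review them?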

Notable amongst them would be:
1. Fix the rate window size calculation for edge cases -
https://github.com/apache/kafka/pull/12184
2. Fix resource leaks - https://github.com/apache/kafka/pull/12228

And the complete list would be at:
https://github.com/search?q=is%3Aopen+is%3Apr+author%3Adivijvaidya+is%3Apr+repo%3Aapache%2Fkafka+created%3A2022-04-01..2022-07-30&type=Issues


--
Divij Vaidya



On Mon, Jul 11, 2022 at 5:12 PM José Armando García Sancio
 wrote:

> Hi all,
>
> I created the branch for 3.3
> (https://github.com/apache/kafka/tree/3.3). If you have bug fixes for
> the 3.3.0 release please make sure to cherry pick them to that branch.
>
> Thanks
>


[GitHub] [kafka-site] divijvaidya opened a new pull request, #420: KAFKA-13868: Self host fonts with project website

2022-07-13 Thread GitBox


divijvaidya opened a new pull request, #420:
URL: https://github.com/apache/kafka-site/pull/420

   As per the [Apache privacy 
policy](https://privacy.apache.org/faq/committers.html), Google Fonts are 
recommended to be hosted along with the website.
   
   This change adds the fonts locally in the code
   
   # Testing
   Tested the website locally to ensure sanity
   
   **Before the change**
   ![Screenshot 2022-07-13 at 13 21 
20](https://user-images.githubusercontent.com/71267/178722327-ef1c5d18-992f-40be-b1c1-492d8b643db0.png)
   
   **After the change (local)**
   ![Screenshot 2022-07-13 at 13 19 
31](https://user-images.githubusercontent.com/71267/178722059-6f30459d-7bb8-496b-b907-e8e45a4aae8b.png)






Re: [VOTE] KIP-847

2022-07-13 Thread Ismael Juma
Thanks for the updates, +1 (binding) from me.

Ismael

On Fri, Jul 8, 2022 at 3:45 AM Artem Livshits
 wrote:

> Hello,
>
> There was an additional discussion and the KIP got changed as a result of
> that.  I would like to restart the vote on the updated
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-847%3A+Add+ProducerIdCount+metrics
> .
>
> -Artem
>
> On Fri, Jun 24, 2022 at 7:49 PM Luke Chen  wrote:
>
> > Hi Artem,
> >
> > Thanks for the KIP.
> > +1 (binding) from me.
> >
> > In addition to the `ProducerIdCount` in motivation section, the KIP title
> > should also be updated.
> >
> > Luke
> >
> > On Fri, Jun 24, 2022 at 8:33 PM David Jacot
> > wrote:
> >
> > > Thanks for the KIP, Artem.
> > >
> > > I am +1 (binding).
> > >
> > > A small nit: ProducerIdCount should be used in the motivation.
> > >
> > > Best,
> > > David
> > >
> > > On Thu, Jun 23, 2022 at 10:26 PM Artem Livshits
> > >  wrote:
> > > >
> > > > Hello,
> > > >
> > > > I'd like to start a vote on KIP-847
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-847%3A+Add+ProducerCount+metrics
> > > >
> > > > -Artem
> > >
> >
>