[jira] [Created] (KAFKA-13543) Consumer may pass stale cluster metadata to the assignor following a subscription update

2021-12-13 Thread A. Sophie Blee-Goldman (Jira)
A. Sophie Blee-Goldman created KAFKA-13543:
--

 Summary: Consumer may pass stale cluster metadata to the assignor 
following a subscription update
 Key: KAFKA-13543
 URL: https://issues.apache.org/jira/browse/KAFKA-13543
 Project: Kafka
  Issue Type: Bug
  Components: consumer
Reporter: A. Sophie Blee-Goldman


A consumer only ever tracks metadata corresponding to its subscribed topics, 
which can cause a race condition during a rebalance immediately after a change 
to the consumer's subscription. In particular, when new topics are added to the 
subscription but a rebalance is kicked off before the consumer's metadata is 
updated with the new topics, the consumer will pass a stale copy of the cluster 
metadata in to the ConsumerPartitionAssignor#assign method, which may not 
include the newly subscribed topics regardless of whether they actually exist.

Most apps are likely unaffected by this, including any consumer client apps 
using OOTB assignors, since a new rebalance will be kicked off when the 
metadata is updated and any partitions from the new topics will be assigned at 
that time. But in Kafka Streams, we do a check during each rebalance to ensure 
that any user input topics are created ahead of time. This race condition can 
result in Streams incorrectly identifying user topics as missing and throwing a 
MissingSourceTopicException when a new topology subscribed to new topics is 
added to the application.

We can work around this for now, but it's unfortunate that we can't distinguish 
between truly missing source topics and a transient absence of those topics from 
the metadata. Some plain consumer client apps with custom assignors, written by 
more advanced users, might run into this as well.
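The distinction at the heart of this bug can be sketched with a toy model: a "missing source topic" check run against a metadata snapshot taken before the subscription change will flag a newly subscribed topic as missing even though it exists in the cluster. The topic names and classes below are invented for illustration; this is not the actual Streams check.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the race: the assignor sees a metadata snapshot taken before
// the subscription change, so newly subscribed topics look missing.
public class StaleMetadataRace {
    // Topics the assignor would report as missing, given a metadata snapshot.
    static Set<String> apparentlyMissing(Set<String> subscription, Set<String> metadataTopics) {
        Set<String> missing = new HashSet<>(subscription);
        missing.removeAll(metadataTopics);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> actualClusterTopics = Set.of("orders", "payments"); // both exist
        Set<String> staleSnapshot = Set.of("orders"); // taken before "payments" was subscribed
        Set<String> subscription = Set.of("orders", "payments");

        // A Streams-style missing-source-topic check against the stale snapshot
        // incorrectly flags "payments", even though it exists in the cluster.
        System.out.println(apparentlyMissing(subscription, staleSnapshot));       // [payments]
        System.out.println(apparentlyMissing(subscription, actualClusterTopics)); // []
    }
}
```

The check itself is not wrong; it is the input (the stale Cluster snapshot) that makes the two cases indistinguishable.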



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13542) Utilize the new Consumer#enforceRebalance(reason) API in Streams

2021-12-13 Thread A. Sophie Blee-Goldman (Jira)
A. Sophie Blee-Goldman created KAFKA-13542:
--

 Summary: Utilize the new Consumer#enforceRebalance(reason) API in 
Streams
 Key: KAFKA-13542
 URL: https://issues.apache.org/jira/browse/KAFKA-13542
 Project: Kafka
  Issue Type: Task
  Components: streams
Reporter: A. Sophie Blee-Goldman
 Fix For: 3.2.0


KIP-800 is adding a new "reason" parameter to the Consumer#enforceRebalance 
API, which will be passed in to a new field of the JoinGroup protocol. We 
invoke this API throughout Streams for various reasons, and knowing those 
reasons is very useful for debugging the cause of rebalancing. Passing the 
reason in to this new API would make it possible to figure out why a Streams 
client triggered a rebalance from the broker logs alone, which are often the 
only logs available when the client logs cannot be retrieved for whatever 
reason.
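A minimal sketch of how a reason string could be threaded through such an API. The single-argument signature mirrors the KIP-800 proposal, but the interface and the reason strings below are stand-ins, not Kafka's actual classes:

```java
// Sketch of threading a human-readable reason through to the JoinGroup
// request. RebalancingConsumer is a stand-in for the real
// org.apache.kafka.clients.consumer.Consumer; the enforceRebalance(String)
// signature follows the KIP-800 proposal.
public class RebalanceReasonSketch {
    interface RebalancingConsumer {
        void enforceRebalance(String reason); // assumed KIP-800 signature
    }

    static String lastReason; // stands in for the JoinGroup "reason" field

    public static void main(String[] args) {
        RebalancingConsumer consumer = reason -> lastReason = reason;
        // Hypothetical Streams-style call sites, each recording why the
        // rebalance was forced so it can surface in broker logs:
        consumer.enforceRebalance("host endpoint changed");
        consumer.enforceRebalance("probing rebalance for warm-up tasks");
        System.out.println(lastReason);
    }
}
```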





[VOTE] KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

2021-12-13 Thread Colin McCabe
Hi all,

I'd like to start the vote on KIP-801: Implement an Authorizer that stores 
metadata in __cluster_metadata

The KIP is here: https://cwiki.apache.org/confluence/x/h5KqCw

The original DISCUSS thread is here:

https://lists.apache.org/thread/3d5o7h17ztjztjhblx4fln0wbbs1rmdq

Please take a look and vote if you can.

best,
Colin


Re: [DISCUSS] KIP-801: Implement an Authorizer that stores metadata in __cluster_metadata

2021-12-13 Thread Colin McCabe
Hi José,

I think these are good questions. We have a few situations like this where 
there is something brokers have to know before they can contact the controller 
quorum -- or something controllers have to know before they can accept broker 
connections. Basically, the bootstrapping problem.

Offhand, I can think of a few scenarios like this:
1. If you need certain ACLs to be present, you need a way of setting those up 
on the controller before starting the controller quorum for the first time.
2. If you are using SCRAM for the broker user, you need some way of setting 
that up before starting up the cluster for the first time.
3. If you are using KIP-226 dynamic broker configurations to configure the SSL 
settings for the connection to the controller, you need a way of setting those 
up before starting the broker.

Let's tackle these in a separate KIP since they all share some common features, 
and I think it's somewhat orthogonal to the main point here (authorizer in 
KRaft).

If this KIP lands before the bootstrapping one, we can always set super.users 
to be the broker user as a workaround.
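As a concrete illustration, that workaround could look like the following server.properties fragment. super.users is an existing Kafka configuration; the authorizer class name and the principal are assumptions for illustration, not settings taken from the KIP text:

```properties
# Workaround sketch until ACL bootstrapping is addressed in a follow-up KIP:
# grant the principal the brokers authenticate as unconditional access.
# (Authorizer class name and principal are illustrative assumptions.)
authorizer.class.name=org.apache.kafka.metadata.authorizer.StandardAuthorizer
super.users=User:broker-principal
```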

best,
Colin


On Wed, Dec 8, 2021, at 14:15, José Armando García Sancio wrote:
> Hi Colin,
>
> Thanks for the KIP.
>
> 1. Can you talk about how the set of ACLs needed to authorize
> controllers and brokers will get bootstrapped? I am particularly
> interested in how we are going to configure a new cluster so that the
> controllers nodes can authorize each other's requests to establish
> quorum. After a quorum is established, I am interested in how the user
> would make sure that new brokers will get authorize against the
> controllers for requests like "register broker" and "fetch".
>
> thanks,
> -Jose


Re: [VOTE] KIP-806: Add session and window query over KV-store in IQv2

2021-12-13 Thread Guozhang Wang
Hi John,

Please see my follow-up comments inlined below.

On Mon, Dec 13, 2021 at 2:26 PM John Roesler  wrote:

> Hi Patrick, thanks for the KIP!
>
> I hope you, Guozhang, and Luke don't mind if I share some
> thoughts:
>
> 1. I think you just meant to remove that private constructor
> from the KIP, right?
>

Yes.


>
> 2. I think WindowRangeQuery#withWindowRange(windowLower,
> windowUpper) is the version that yields an iterator over
> both keys and windows, so it's not totally redundant.
>
> However, it does seem like WindowRangeQuery#withKey is
> equivalent to WindowKeyQuery#withKey . I overlooked it
> before, but we probably don't need both.
>
> Stepping back, it seems like we could either just de-
> duplicate that specific withKey query or we could even just
> merge both use cases into the WindowRangeQuery. I've
> personally never been convinced that the extra complexity of
> having the WindowStoreIterator is worth the savings of not
> duplicating the key on each record. Maybe it would be if we
> separately deserialized the same key over and over again,
> but that seems optimizable.
>
> On the other hand, that's more work, and the focus for IQv2
> right now is to just get some MVP of queries implemented to
> cover basic use cases, so it might be worthwhile to keep it
> simple (by just dropping `WindowRangeQuery#withKey`). I'm
> happy with whatever call Patrick wants to make here, since
> it's his work to do.
>

My original comment was actually referring to the fact that `WindowRangeQuery`
does not allow users to specify a range of [keyFrom: windowFrom, keyTo:
windowTo], but only [key: windowFrom, key: windowTo], or [-INF: windowFrom,
+INF: windowTo] if we do not specify the key and only call `withWindowRange`. I
was not sure if this is intentional by design, i.e. that the only
difference between the two classes is that the latter allows [-INF:
windowFrom, +INF: windowTo]. I was assuming it's not, but otherwise
these two classes are very much alike.

So I guess my question is: is it by design to not allow users to do
something like `query.withWindowRange(..).withKeyRange(..)`?


>
> 3. I got the impression that WindowRangeQuery was intended
> for use with both Window and Session stores, but that might
> be my assumption.
>
>
Today the session store's query API differs at least in its method names,
e.g. "findSessions", so I'm not sure if the proposal intends to replace
these functions with the WindowRangeQuery such that:

sessionStore.findSessions(K, Ins, Ins) -> query.withKey().withWindowRange()
sessionStore.findSessions(K, K, Ins, Ins) ->
query.withKeyRange().withWindowRange()   // again, assuming we would want
`withKeyRange` as in 2) above.
sessionStore.fetch(K) -> query.withKey()
sessionStore.fetch(K, K) -> query.withKeyRange()

Also, sessionStore.fetchSession(K, Ins, Ins) seems like it would not be
supported with the new API. I'm not insisting on complete coverage in
this KIP, and I'm with you that we should focus on getting some MVP out
sooner, but I'd like our KIP to clarify what's really covered compared
against the old API and what's not.
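The composition being asked about could look roughly like the sketch below. The class and method names are illustrative stand-ins for whatever the KIP settles on, not existing Kafka API:

```java
import java.time.Instant;
import java.util.Optional;

// Toy sketch of a range query that can bound the key range and the window
// range independently, i.e. query.withKeyRange(..).withWindowRange(..).
// All names here are hypothetical, not the KIP's final API.
public class KeyAndWindowRangeQuerySketch<K> {
    Optional<K> keyFrom = Optional.empty(), keyTo = Optional.empty();
    Optional<Instant> windowFrom = Optional.empty(), windowTo = Optional.empty();

    KeyAndWindowRangeQuerySketch<K> withKeyRange(K from, K to) {
        this.keyFrom = Optional.of(from);
        this.keyTo = Optional.of(to);
        return this;
    }

    KeyAndWindowRangeQuerySketch<K> withWindowRange(Instant from, Instant to) {
        this.windowFrom = Optional.of(from);
        this.windowTo = Optional.of(to);
        return this;
    }

    public static void main(String[] args) {
        // findSessions(keyFrom, keyTo, from, to) would then map to:
        var q = new KeyAndWindowRangeQuerySketch<String>()
                .withKeyRange("a", "f")
                .withWindowRange(Instant.ofEpochMilli(0), Instant.ofEpochMilli(60_000));
        System.out.println(q.keyFrom.get() + ".." + q.keyTo.get()); // a..f
    }
}
```

Leaving either pair of bounds empty would recover the [key: windowFrom, key: windowTo] and [-INF: windowFrom, +INF: windowTo] cases discussed above.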



> 4. That's good feedback. I think those names were inspired
> by the WindowStore methods that just say stuff like
> `timeFrom` and `timeTo`. The SessionStore does a better job
> of being unambiguous, since it says stuff like
> `earliestSessionEndTime` and `latestSessionStartTime`.
>
> And, actually, that brings up a point that I think we
> overlooked before. While the method signatures of the
> WindowStore and SessionStore methods look the same, the
> meanings of their arguments are not the same. In particular,
> the WindowStore ranges are always in reference to the window
> start time, while the SessionStore ranges may be in
> reference to the window start time or end time (or different
> combinations of both).
>
> I suspect that your instinct to cover both stores with the
> same Query is correct, and we just need to be specific about
> the kinds of ranges we're talking about. For example,
> instead of withWindowRange, maybe we would have:
>
> withWindowStartRange(
>   Instant earliestWindowStartTime,
>   Instant latestWindowStartTime
> );
>
> Then in the future, we could add stuff like:
>
> withWindowStartAndEndRange(
>   Instant earliestWindowStartTime,
>   Instant latestWindowEndTime
> );
>
> Etc.
>
> 5. I think Luke's question reveals how weirdly similar but
> not the same these two queries are. It's not your fault,
> this is inherited from the existing set of weirdly-similar-
> but-not-the-same store methods. The thing that distinguishes
> the WindowKeyQuery from the WindowRangeQuery is:
> * WindowKeyQuery: all results share the same K key (cf point
> 2: I don't think we need `WindowRangeQuery#withKey(k)`).
> Therefore, the iterator is just (windowStartTime, value)
> pairs.
> * WindowRangeQuery: the results may have different keys, so
> the iterator is (Windowed, value) pairs, where
> Windowed is basically a (windowStartTime, K key) pair.
>
> Therefor

Re: [DISCUSS] KIP-785 Automatic storage formatting

2021-12-13 Thread Colin McCabe
Hi Igor,

Disk failures have always required manual intervention in Kafka. For example, 
if you are using a RAID array and a disk goes bad, you will need to remove the 
disk, insert a new one, and start the RAID rebuild process. Kafka can't help 
with this process since we don't operate at this hardware level.

In the case where we are using JBOD, you would need to remove the bad disk, add 
a new good disk, reformat, and then restart the broker. The reformatting stage 
is the quickest stage and doesn't add a lot of overhead.

It is generally a very bad idea to re-add a bad disk to a node. The behavior of 
a failing disk is usually pathological. In addition to not persisting the data, 
you often get very slow operations, kernel errors, and unpredictable behavior. 
Certainly automatically re-adding a potentially bad disk doesn't do anyone any 
favors.

best,
Colin

On Thu, Dec 9, 2021, at 14:35, Igor Soarez wrote:
> Hi Colin,
>
> Thank you for your kind and thoughtful reply.
>
> Thank you also for clarifying why it is important to distinguish 
> between disk problems and the first boot of a log directory. I completely 
> agree that losing all metadata is a very serious issue and we should 
> strive to make that as unlikely to happen as possible.
>
> Currently, the storage format step is simply ensuring each log 
> directory exists and creating a meta.properties file with clusterId and 
> nodeId in each configured log directory. The nodeId is already a 
> configuration property and clusterId is being proposed in this KIP as a 
> new one. The bootstrapping information generated by the format step can 
> optionally be made redundant. So if I understand correctly, in the 
> scenario you describe of when disks "erroneously show up as blank", 
> when the KafkaRaftServer starts, we are relying on the existence of 
> this file to prevent disaster and halt the system until there is manual 
> intervention.
>
> Currently, all the log directories must be formatted - not just the 
> metadata directory - that is, all log directories must contain 
> `meta.properties`. This is validated in 
> BrokerMetadataCheckpoint.getBrokerMetadataAndOfflineDirs. The 
> validation that this file exists *in every log directory* is only done 
> when the “controller” role is in effect, and that includes “broker, 
> controller”. This means we currently require the external storage 
> format step to run when a non-metadata disk is replaced, which just 
> seems unnecessary.
>
> Many of the ways disks fail do not enable this scenario where data is 
> lost. The disk might be unmounted, become read-only, or otherwise 
> generate IO failures. In any of these cases, an automatic step to 
> format the log directory would also fail and prevent an amnesiac 
> metadata quorum. To risk data loss in the scenario you describe we need 
> the disk to be available and usable but also blank. I can think of 
> use-cases here where this isn't a concern, such as a) a platform where 
> disks are slow to be repaired and replaced or b) if the controller 
> group is large enough to make simultaneous disk failure in a quorum 
> highly unlikely. In such cases, a non default option to disable this 
> metadata.properties pre-existence guard can have a net positive value.
>
> I am aware of the similar initialization steps for other systems. I’m 
> however having some difficulty envisioning always requiring manual 
> intervention upon disk failure in general as a desirable solution in 
> Kafka. Not having an automated way to deal with unformatted log 
> directories means that an operator then needs to intervene and run this 
> command before the instance is operational again. Unless it's actually 
> protecting the user, Kafka shouldn't be any more difficult to use than 
> necessary.
>
> Please, let me know your thoughts on this.
>
> Best,
>
> --
> Igor
>
>
>> On 2 Dec 2021, at 22:52, Colin McCabe  wrote:
>> 
>> Hi Igor,
>> 
>> It is common for databases, filesystems, and other similar programs to 
>> require a formatting step before they are used. For example, postgres 
>> requires you to run initdb. Linux requires you to run mkfs before using a 
>> filesystem. Windows requires you to run "format c:/", or something 
>> equivalent. Ceph requires you to run the ceph-deploy tool or a similar tool. 
>> It's really not a high operational burden because it only has to be done 
>> once when the system is initialized.
>> 
>> With a clearly defined initialization step, you can clearly distinguish disk 
>> problems from simply the first startup of a cluster. This is actually quite 
>> important to the correctness of the system. For example, if I start up two 
>> out of three Raft nodes and their disks erroneously show up as blank, I 
>> could elect a leader with an empty log. In that case, I've silently lost all 
>> the metadata in the system.
>> 
>> In general, there is a bootstrapping problem where brokers may not be able 
>> to connect to the controller quorum without firs

Re: [VOTE] KIP-806: Add session and window query over KV-store in IQv2

2021-12-13 Thread John Roesler
Hi Patrick, thanks for the KIP!

I hope you, Guozhang, and Luke don't mind if I share some
thoughts:

1. I think you just meant to remove that private constructor
from the KIP, right?

2. I think WindowRangeQuery#withWindowRange(windowLower,
windowUpper) is the version that yields an iterator over
both keys and windows, so it's not totally redundant.

However, it does seem like WindowRangeQuery#withKey is
equivalent to WindowKeyQuery#withKey . I overlooked it
before, but we probably don't need both.

Stepping back, it seems like we could either just de-
duplicate that specific withKey query or we could even just
merge both use cases into the WindowRangeQuery. I've
personally never been convinced that the extra complexity of
having the WindowStoreIterator is worth the savings of not
duplicating the key on each record. Maybe it would be if we
separately deserialized the same key over and over again,
but that seems optimizable.

On the other hand, that's more work, and the focus for IQv2
right now is to just get some MVP of queries implemented to
cover basic use cases, so it might be worthwhile to keep it
simple (by just dropping `WindowRangeQuery#withKey`). I'm
happy with whatever call Patrick wants to make here, since
it's his work to do.

3. I got the impression that WindowRangeQuery was intended
for use with both Window and Session stores, but that might
be my assumption.

4. That's good feedback. I think those names were inspired
by the WindowStore methods that just say stuff like
`timeFrom` and `timeTo`. The SessionStore does a better job
of being unambiguous, since it says stuff like
`earliestSessionEndTime` and `latestSessionStartTime`.

And, actually, that brings up a point that I think we
overlooked before. While the method signatures of the
WindowStore and SessionStore methods look the same, the
meanings of their arguments are not the same. In particular,
the WindowStore ranges are always in reference to the window
start time, while the SessionStore ranges may be in
reference to the window start time or end time (or different
combinations of both).

I suspect that your instinct to cover both stores with the
same Query is correct, and we just need to be specific about
the kinds of ranges we're talking about. For example,
instead of withWindowRange, maybe we would have:

withWindowStartRange(
  Instant earliestWindowStartTime,
  Instant latestWindowStartTime
);

Then in the future, we could add stuff like:

withWindowStartAndEndRange(
  Instant earliestWindowStartTime,
  Instant latestWindowEndTime
);

Etc.
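The naming above can be made concrete with a small sketch. Only the method-name convention comes from this discussion; the class and the matching logic are hypothetical:

```java
import java.time.Instant;

// Sketch of the unambiguous start-time range naming proposed above, applied
// to a query object that checks whether a window's start time falls inside
// an inclusive range. The class itself is illustrative, not the KIP's API.
public class WindowStartRangeSketch {
    final Instant earliestWindowStartTime, latestWindowStartTime;

    private WindowStartRangeSketch(Instant earliest, Instant latest) {
        this.earliestWindowStartTime = earliest;
        this.latestWindowStartTime = latest;
    }

    static WindowStartRangeSketch withWindowStartRange(Instant earliest, Instant latest) {
        return new WindowStartRangeSketch(earliest, latest);
    }

    // A window matches if its start time falls inside the inclusive range.
    boolean matchesWindowStart(Instant windowStart) {
        return !windowStart.isBefore(earliestWindowStartTime)
            && !windowStart.isAfter(latestWindowStartTime);
    }

    public static void main(String[] args) {
        var q = withWindowStartRange(Instant.ofEpochMilli(1_000), Instant.ofEpochMilli(5_000));
        System.out.println(q.matchesWindowStart(Instant.ofEpochMilli(3_000))); // true
        System.out.println(q.matchesWindowStart(Instant.ofEpochMilli(6_000))); // false
    }
}
```

A future withWindowStartAndEndRange would simply check the window's end time against the second bound instead.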

5. I think Luke's question reveals how weirdly similar but
not the same these two queries are. It's not your fault,
this is inherited from the existing set of weirdly-similar-
but-not-the-same store methods. The thing that distinguishes
the WindowKeyQuery from the WindowRangeQuery is:
* WindowKeyQuery: all results share the same K key (cf point
2: I don't think we need `WindowRangeQuery#withKey(k)`).
Therefore, the iterator is just (windowStartTime, value)
pairs.
* WindowRangeQuery: the results may have different keys, so
the iterator is (Windowed, value) pairs, where
Windowed is basically a (windowStartTime, K key) pair.

Therefore, we could support a WindowRangeQuery with a fixed
key, but it would just be another way to express the same
thing as the WindowKeyQuery with the same parameters.

As in point 2, I do think it might be worth converging all
use cases into the WindowRangeQuery, but we can also just
shoot for parity now and defer elegance for the future.

6. I'll just add one question of my own here: If we do keep
both query classes, then I think we would drop `withKey` and
`getKey` from the WindowRangeQuery. But if we do wind up
keeping the WindowRangeQuery class, I think we should
reconsider the name of the getter, since there are other
range queries that support a range of keys. Is there a
getter name we can use that would still work if we come back
later and add lower and upper key bounds?

Thanks again for the KIP!
John

On Mon, 2021-12-13 at 16:35 +0800, Luke Chen wrote:
> Hi Patrick,
> 
> Thanks for the KIP!
> 
> I have some comments, in addition to Guozhang's comments:
> 4. The parameter names `windowLower` and `windowUpper` are kind of
> ambiguous to me. Could we come up with a better name for it, like
> `windowStartTime` and `windowEndTime`? Or maybe we don't need the "window"
> name at all, just `startTime` and `endTime`?
> 5. Why can't we support window range query with a key within a time range?
> You might need to explain in the KIP.
> 
> Thank you.
> Luke
> 
> 
> On Sat, Dec 11, 2021 at 7:54 AM Guozhang Wang  wrote:
> 
> > Hi Patrick,
> > 
> > I made a pass on the KIP and have a few comments below:
> > 
> > 1. The `WindowRangeQuery` has a private constructor while the
> > `WindowKeyQuery` has not, is that intentional?
> > 
> > 2. The `WindowRangeQuery` seems not allowing to range over both window and
> > key, but only window with a fixed key, in that case it seems pretty much
> > the same a

Re: [VOTE] KIP-769: Connect APIs to list all connector plugins and retrieve their configuration definitions

2021-12-13 Thread Ryanne Dolan
+1 (non-binding)

Ryanne

On Mon, Dec 13, 2021, 4:18 AM Mickael Maison 
wrote:

> Hi all,
>
> I'd like to start a vote on KIP-769 which proposes adding new
> endpoints to the Connect REST API to list all connectors plugins and
> retrieve their configurations.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-769%3A+Connect+APIs+to+list+all+connector+plugins+and+retrieve+their+configuration+definitions
>
> Please take a look and let me know if you have any feedback.
>
> Thanks,
> Mickael
>


Re: [DISCUSS] KIP-714: Client metrics and observability

2021-12-13 Thread Ryanne Dolan
I think we should be very careful about introducing new runtime
dependencies into the clients. Historically this has been rare and
essentially necessary (e.g. compression libs).

Ryanne
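One common pattern for keeping such a dependency optional at runtime is to probe for it reflectively and degrade gracefully when it is absent. A minimal sketch; the probed class name is an assumption about the OpenTelemetry API artifact, not a statement of how KIP-714 will actually ship:

```java
// Reflectively detect an optional dependency so the client can run without
// it on the classpath. The OpenTelemetry class name below is an assumption
// for illustration.
public class OptionalDependencyProbe {
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: just check presence, don't run static initializers
            Class.forName(className, false, OptionalDependencyProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Telemetry would only be wired up when the library is actually present.
        boolean otelPresent = isOnClasspath("io.opentelemetry.api.OpenTelemetry");
        System.out.println(otelPresent ? "enable metrics push" : "metrics push disabled");
    }
}
```

Shading, as Kirk suggests below, is the complementary option when the dependency must ship but should not collide with an application's own version.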

On Mon, Dec 13, 2021, 1:06 PM Kirk True  wrote:

> Hi Jun,
>
> On Thu, Dec 9, 2021, at 2:28 PM, Jun Rao wrote:
> > 13. Using OpenTelemetry. Does that require runtime dependency
> > on OpenTelemetry library? How good is the compatibility story
> > of OpenTelemetry? This is important since an application could have other
> > OpenTelemetry dependencies than the Kafka client.
>
> The current design is that the OpenTelemetry JARs would ship with the
> client. Perhaps we can design the client such that the JARs aren't even
> loaded if the user has opted out. The user could even exclude the JARs from
> their dependencies if they so wished.
>
> I can't speak to the compatibility of the libraries. Is it possible that
> we include a shaded version?
>
> Thanks,
> Kirk
>
> >
> > 14. The proposal listed idempotence=true. This is more of a configuration
> > than a metric. Are we including that as a metric? What other
> configurations
> > are we including? Should we separate the configurations from the metrics?
> >
> > Thanks,
> >
> > Jun
> >
> > On Mon, Nov 29, 2021 at 7:34 AM Magnus Edenhill 
> wrote:
> >
> > > Hey Bob,
> > >
> > > That's a good point.
> > >
> > > Request type labels were considered but since they're already tracked
> by
> > > broker-side metrics
> > > they were left out as to avoid metric duplication, however those
> metrics
> > > are not per connection,
> > > so they won't be that useful in practice for troubleshooting specific
> > > client instances.
> > >
> > > I'll add the request_type label to the relevant metrics.
> > >
> > > Thanks,
> > > Magnus
> > >
> > >
> > > Den tis 23 nov. 2021 kl 19:20 skrev Bob Barrett
> > > :
> > >
> > > > Hi Magnus,
> > > >
> > > > Thanks for the thorough KIP, this seems very useful.
> > > >
> > > > Would it make sense to include the request type as a label for the
> > > > `client.request.success`, `client.request.errors` and
> > > `client.request.rtt`
> > > > metrics? I think it would be very useful to see which specific
> requests
> > > are
> > > > succeeding and failing for a client. One specific case I can think of
> > > where
> > > > this could be useful is producer batch timeouts. If a Java
> application
> > > does
> > > > not enable producer client logs (unfortunately, in my experience this
> > > > happens more often than it should), the application logs will only
> > > contain
> > > > the expiration error message, but no information about what is
> causing
> > > the
> > > > timeout. The requests might all be succeeding but taking too long to
> > > > process batches, or metadata requests might be failing, or some or
> all
> > > > produce requests might be failing (if the bootstrap servers are
> reachable
> > > > from the client but one or more other brokers are not, for example).
> If
> > > the
> > > > cluster operator is able to identify the specific requests that are
> slow
> > > or
> > > > failing for a client, they will be better able to diagnose the issue
> > > > causing batch timeouts.
> > > >
> > > > One drawback I can think of is that this will increase the
> cardinality of
> > > > the request metrics. But any given client is only going to use a
> small
> > > > subset of the request types, and since we already have partition
> labels
> > > for
> > > > the topic-level metrics, I think request labels will still make up a
> > > > relatively small percentage of the set of metrics.
> > > >
> > > > Thanks,
> > > > Bob
> > > >
> > > > On Mon, Nov 22, 2021 at 2:08 AM Viktor Somogyi-Vass <
> > > > viktorsomo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Magnus,
> > > > >
> > > > > I think this is a very useful addition. We also have a similar (but
> > > much
> > > > > more simplistic) implementation of this. Maybe I missed it in the
> KIP
> > > but
> > > > > what about adding metrics about the subscription cache itself?
> That I
> > > > think
> > > > > would improve its usability and debuggability as we'd be able to
> see
> > > its
> > > > > performance, hit/miss rates, eviction counts and others.
> > > > >
> > > > > Best,
> > > > > Viktor
> > > > >
> > > > > On Thu, Nov 18, 2021 at 5:12 PM Magnus Edenhill <
> mag...@edenhill.se>
> > > > > wrote:
> > > > >
> > > > > > Hi Mickael,
> > > > > >
> > > > > > see inline.
> > > > > >
> > > > > > Den ons 10 nov. 2021 kl 15:21 skrev Mickael Maison <
> > > > > > mickael.mai...@gmail.com
> > > > > > >:
> > > > > >
> > > > > > > Hi Magnus,
> > > > > > >
> > > > > > > I see you've addressed some of the points I raised above but
> some
> > > (4,
> > > > > > > 5) have not been addressed yet.
> > > > > > >
> > > > > >
> > > > > > Re 4) How will the user/app know metrics are being sent.
> > > > > >
> > > > > > One possibility is to add a JMX metric (thus for user
> consumption)
> > > for
> > > > > the
> > > > > > number of metric pushes the

[jira] [Resolved] (KAFKA-13540) UniformStickyPartitioner leads to uneven Kafka partitions

2021-12-13 Thread Artem Livshits (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Artem Livshits resolved KAFKA-13540.

Resolution: Duplicate

See also 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner
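The mechanism behind the skew that KIP-794 addresses can be illustrated with a toy simulation: the sticky partitioner keeps appending to one partition until that partition's batch completes, so a slowly draining partition holds the producer "stuck" longer and receives more records. The drain intervals and counts below are invented for illustration:

```java
import java.util.Arrays;
import java.util.Random;

// Toy simulation of sticky-partitioner skew: if one partition drains slowly
// (e.g. a slow broker), the producer stays stuck on it longer, so it
// receives MORE records, not fewer.
public class StickySkewSimulation {
    public static void main(String[] args) {
        int[] ticksUntilSwitch = {2, 2, 10}; // partition 2 drains 5x slower
        long[] recordsPerPartition = new long[3];
        Random rng = new Random(42);

        for (int round = 0; round < 10_000; round++) {
            int p = rng.nextInt(3); // sticky choice for the new batch
            for (int t = 0; t < ticksUntilSwitch[p]; t++) {
                recordsPerPartition[p]++; // records keep landing on p until its batch drains
            }
        }
        System.out.println(Arrays.toString(recordsPerPartition));
        // The slow partition accumulates roughly 5x the records of a fast one.
    }
}
```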

> UniformStickyPartitioner leads to uneven Kafka partitions
> -
>
> Key: KAFKA-13540
> URL: https://issues.apache.org/jira/browse/KAFKA-13540
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, producer 
>Affects Versions: 2.4.1
>Reporter: nk2242696
>Priority: Major
> Attachments: MicrosoftTeams-image (1).png
>
>
> Kafka topic with 20 partitions, 24 hour TTL, replication factor of 3. 
> Using UniformStickyPartitioner, we expected each partition to be roughly 
> the same size, but the realised size for some of the partitions is almost double.
> !MicrosoftTeams-image (1).png!





Re: [DISCUSS] Apache Kafka 3.1.0 release

2021-12-13 Thread Jason Gustafson
Hi David,

I think we should get https://issues.apache.org/jira/browse/KAFKA-13488.
This is a follow-up to https://issues.apache.org/jira/browse/KAFKA-12257,
which was previously considered a 3.0 blocker. Without the additional
patch, the bug causing the consumer to get stuck can still occur in a
common scenario.

Thanks,
Jason

On Tue, Dec 7, 2021 at 11:19 PM David Jacot 
wrote:

> Hi Justine,
>
> I am +1 on getting this in the 3.1 release as it is a serious
> regression in clusters with a high number of partitions.
>
> Thanks,
> David
>
> On Tue, Dec 7, 2021 at 10:39 PM Justine Olshan
>  wrote:
> >
> > Hi all,
> > I've filed a bug for an extra map allocation that is used in the fetch
> > path. https://issues.apache.org/jira/browse/KAFKA-13512
> > I think it qualifies as a blocker since this path is used pretty
> frequently
> > and it looks to be a regression.
> >
> > I also have a PR open to fix the issue. With this change, the performance
> > looks much better. https://github.com/apache/kafka/pull/11576
> > Thanks,
> > Justine
> >
> > On Fri, Dec 3, 2021 at 5:29 AM David Jacot 
> > wrote:
> >
> > > Hi Rajini,
> > >
> > > Interesting bug. The patch seems to be low risk so I suppose that
> > > it is fine to keep it in 3.1.0.
> > >
> > > Thanks,
> > > David
> > >
> > > On Fri, Dec 3, 2021 at 2:26 PM David Jacot 
> wrote:
> > > >
> > > > Hi Colin,
> > > >
> > > > Thanks for the heads up. It makes sense to include it in order
> > > > to keep the KRaft inline with ZK behavior.
> > > >
> > > > Thanks,
> > > > David
> > > >
> > > > On Fri, Dec 3, 2021 at 9:44 AM Rajini Sivaram <
> rajinisiva...@gmail.com>
> > > wrote:
> > > > >
> > > > > Hi David,
> > > > >
> > > > > Sorry, I had completely forgotten about code freeze and merged
> > > > > https://issues.apache.org/jira/browse/KAFKA-13461 to 3.1 branch
> > > yesterday.
> > > > > Can you take a look and see if we want it in 3.1.0? It is not a
> > > regression
> > > > > in 3.1, but we see this issue in tests and when it happens, the
> > > controller
> > > > > no longer operates as a controller.
> > > > >
> > > > > Thank you,
> > > > >
> > > > > Rajini
> > > > >
> > > > > On Thu, Dec 2, 2021 at 10:56 PM Colin McCabe 
> > > wrote:
> > > > >
> > > > > > Hi David,
> > > > > >
> > > > > > We'd like to include "KAFKA-13490: Fix createTopics and
> > > > > > incrementalAlterConfigs for KRaft mode #11416" in the upcoming
> > > release.
> > > > > > This fixes some bugs in how createTopics and
> incrementalAlterConfigs
> > > are
> > > > > > handled by the controller. It is specific to KRaft, so will not
> > > affect ZK
> > > > > > mode.
> > > > > >
> > > > > > best,
> > > > > > Colin
> > > > > >
> > > > > > On Wed, Nov 24, 2021, at 01:20, David Jacot wrote:
> > > > > > > Hi Mickael,
> > > > > > >
> > > > > > > Thanks for reporting it. It makes sense to include it in the
> 3.1
> > > release
> > > > > > > as well as it is a regression.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > David
> > > > > > >
> > > > > > > On Tue, Nov 23, 2021 at 6:52 PM Mickael Maison <
> > > mickael.mai...@gmail.com>
> > > > > > wrote:
> > > > > > >>
> > > > > > >> Hi David,
> > > > > > >>
> > > > > > >> Can we also consider
> > > https://issues.apache.org/jira/browse/KAFKA-13397?
> > > > > > >> It's essentially a regression but in a very specific case. To
> hit
> > > it,
> > > > > > >> you must be running MirrorMaker in dedicated mode and have
> > > changed the
> > > > > > >> separator of the default replication policy.
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >> Mickael
> > > > > > >>
> > > > > > >> On Tue, Nov 23, 2021 at 4:58 PM David Jacot
> > > 
> > > > > > wrote:
> > > > > > >> >
> > > > > > >> > Hi Ron,
> > > > > > >> >
> > > > > > >> > Thank you for reaching out about this. While this is clearly
> > > not a
> > > > > > >> > regression, I agree with including it in 3.1 in order to
> have
> > > proper
> > > > > > >> > and correct configuration constraints for KRaft. You can
> > > proceed.
> > > > > > >> >
> > > > > > >> > Cheers,
> > > > > > >> > David
> > > > > > >> >
> > > > > > >> > On Tue, Nov 23, 2021 at 2:55 PM Ron Dagostino <
> > > rndg...@gmail.com>
> > > > > > wrote:
> > > > > > >> > >
> > > > > > >> > > Hi David.  I would like to nominate
> > > > > > >> > >
> > > https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-13456
> > > > > > >> > > "Tighten KRaft config checks/constraints" as a 3.1.0
> > > blocker.  The
> > > > > > >> > > existing configuration constraints/checks related to KRaft
> > > currently
> > > > > > >> > > do not eliminate certain illegal configuration
> combinations.
> > > The
> > > > > > >> > > illegal combinations do not cause harm at the moment, but
> we
> > > would
> > > > > > >> > > like to implement constraints in 3.1.0 to catch them while
> > > KRaft is
> > > > > > >> > > still in Preview.  We could add these additional checks
> later
> > > in
> > > > > > 3.2.x
> > > > > > >> > > instead, but we would like to add these as early as
> possible

Re: [DISCUSS] KIP-782: Expandable batch size in producer

2021-12-13 Thread Jun Rao
Hi, Lucas,

Thanks for the reply. It would be useful to summarize the benefits of a
separate batch.max.size. To me, it's not clear why a user would want two
different batch sizes. In your example, I can understand why a user would
want to form a batch with a 5ms linger. But why would a user prefer 16K
batches with 5ms linger, if say 256K is deemed best for throughput?

Thanks,

Jun

On Fri, Dec 10, 2021 at 4:35 PM Lucas Bradstreet 
wrote:

> Hi Jun,
>
> One difference compared to increasing the default batch size is that users
> may actually prefer smaller batches but it makes much less sense to
> accumulate many small batches if a batch is already sending.
>
> For example, imagine a user that prefers 16KB batches with a 5ms linger.
> Everything is functioning normally and 16KB batches are being sent. Then
> there's a 500ms blip for that broker. Do we want to continue to accumulate
> 16KB batches, each of which requires a round trip, or would we prefer to
> accumulate larger batches while sending is blocked?
>
> I'm not hugely against increasing the default batch.size in general, but
> batch.max.size does seem to have some nice properties.
>
> Thanks,
>
> Lucas
>
> On Fri, Dec 10, 2021 at 9:42 AM Jun Rao  wrote:
>
> > Hi, Artem, Luke,
> >
> > Thanks for the reply.
> >
> > 11. If we get rid of batch.max.size and increase the default batch.size,
> > it's true the behavior is slightly different than before. However, does
> > that difference matter to most of our users? In your example, if a user
> > sets linger.ms to 100ms and thinks 256KB is good for throughput, does it
> > matter to deliver any batch smaller than 256KB before 100ms? I also find
> it
> > a bit hard to explain to our users these 3 different settings related to
> > batch size.
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, Dec 9, 2021 at 5:47 AM Luke Chen  wrote:
> >
> > > Hi Jun,
> > >
> > > 11. In addition to Artem's comment, I think the reason to have
> additional
> > > "batch.max.size" is to have more flexibility to users.
> > > For example:
> > > With linger.ms=100ms, batch.size=16KB, now, we have 20KB of data
> coming
> > to
> > > a partition within 50ms. Now, the sender is ready to pick up the batch to
> > send.
> > > In the current design, we send 16KB of data to the broker, and keep the
> > remaining 4KB
> > > in the producer, to keep accumulating data.
> > > But after this KIP, user can send the whole 20KB of data together. That
> > is,
> > > user can decide if they want to accumulate more data before the sender
> is
> > > ready, and send them together, to have higher throughput. The
> > > "batch.size=16KB" in the proposal, is more like a soft limit, (and
> > > "batch.max.size" is like a hard limit), or it's like a switch to enable
> > the
> > > batch to become ready. Before sender is ready, we can still accumulate
> > more
> > > data, and wrap them together to send to broker.
> > >
> > > User can increase "batch.size" to 20KB to achieve the same goal in the
> > > current design, of course. But you can imagine, if the data within
> 100ms
> > is
> > > just 18KB, then the batch of data will wait for the full 100ms to pass
> > > before being sent out. This "batch.max.size" config allows more flexibility in user
> > config.
> > >
> > > Does that make sense?
> > >
> > > Thank you.
> > > Luke
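A minimal sketch of the accumulation behavior Luke describes, as an illustrative simulation (Python for brevity; the producer client itself is Java, and this is not actual producer code). Only the hard cap is modeled here; batch.size's soft-limit role as a readiness trigger is orthogonal to the packing itself.

```python
def accumulate_while_blocked(record_sizes, max_batch_bytes):
    """Model accumulation while the sender is blocked: a batch keeps
    growing until adding the next record would exceed max_batch_bytes."""
    batches, current = [], 0
    for size in record_sizes:
        if current + size > max_batch_bytes:
            batches.append(current)  # cap reached: close this batch
            current = 0
        current += size
    if current:
        batches.append(current)
    return batches

# Luke's example: 20KB arrives before the sender is ready.
# With only batch.size=16KB acting as the cap, 16KB ships and 4KB stays behind:
print(accumulate_while_blocked([4096] * 5, 16 * 1024))    # -> [16384, 4096]
# With a larger batch.max.size=256KB cap, the whole 20KB goes out in one request:
print(accumulate_while_blocked([4096] * 5, 256 * 1024))   # -> [20480]
```

Each extra batch costs a round trip, which is why allowing larger batches while the sender is blocked can help throughput.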
> > >
> > > On Thu, Dec 9, 2021 at 7:53 AM Artem Livshits
> > >  wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > 11. That was my initial thinking as well, but in a discussion some
> > people
> > > > pointed out the change of behavior in some scenarios.  E.g. if
> someone
> > > for
> > > > some reason really wants batches to be at least 16KB and sets large
> > > > linger.ms, and most of the time the batches are filled quickly
> enough
> > > and
> > > > they observe a certain latency.  Then they upgrade their client with
> a
> > > > default size of 256KB and the latency increases.  This could be seen as
> a
> > > > regression.  It could be fixed by just reducing linger.ms to specify
> > the
> > > > expected latency, but still could be seen as a disruption by some
> > users.
> > > > The other reason to have 2 sizes is to avoid allocating large buffers
> > > > upfront.
> > > >
> > > > -Artem
> > > >
> > > > On Wed, Dec 8, 2021 at 3:07 PM Jun Rao 
> > wrote:
> > > >
> > > > > Hi, Artem,
> > > > >
> > > > > Thanks for the reply.
> > > > >
> > > > > 11. Got it. To me, batch.size is really used for throughput and not
> > for
> > > > > latency guarantees. There is no guarantee when 16KB will be
> > > accumulated.
> > > > > So, if users want any latency guarantee, they will need to specify
> > > > > linger.ms accordingly.
> > > > > Then, batch.size can just be used to tune for throughput.
> > > > >
> > > > > 20. Could we also describe the unit of compression? Is
> > > > > it batch.initial.size, batch.size or batch.max.size?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Wed, Dec 8, 2021 at 9:58 AM Artem Livshits
> > > > >  wrote:
> > > > >
> > > > > > Hi Jun,
> > > > > >
> > 

Re: [DISCUSS] KIP-714: Client metrics and observability

2021-12-13 Thread Kirk True
Hi Jun,

On Thu, Dec 9, 2021, at 2:28 PM, Jun Rao wrote:
> 13. Using OpenTelemetry. Does that require runtime dependency
> on OpenTelemetry library? How good is the compatibility story
> of OpenTelemetry? This is important since an application could have other
> OpenTelemetry dependencies than the Kafka client.

The current design is that the OpenTelemetry JARs would ship with the client. 
Perhaps we can design the client such that the JARs aren't even loaded if the 
user has opted out. The user could even exclude the JARs from their 
dependencies if they so wished.

I can't speak to the compatibility of the libraries. Is it possible that we 
include a shaded version?

Thanks,
Kirk
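The "not even loaded unless opted in" idea Kirk describes can be sketched with a guarded dynamic import. This is purely illustrative (the config key matches KIP-714's proposed `enable.metrics.push`, but the loading mechanics and module name here are hypothetical, and the real client is Java, where the analogue would be reflection-based class loading):

```python
import importlib

def create_telemetry_reporter(config, module_name="opentelemetry"):
    """Return the telemetry module if metrics push is enabled, else None.

    When the user opts out, the module is never imported, so the
    dependency does not even need to be on the path.
    """
    if not config.get("enable.metrics.push", False):
        return None  # opt-out: dependency is never touched
    return importlib.import_module(module_name)

# Opted out: no import is attempted, so a missing telemetry dependency is harmless.
print(create_telemetry_reporter({"enable.metrics.push": False}))  # -> None
```

The same pattern also lets users exclude the JARs from their build entirely, as Kirk suggests, without breaking clients that never push metrics.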

> 
> 14. The proposal listed idempotence=true. This is more of a configuration
> than a metric. Are we including that as a metric? What other configurations
> are we including? Should we separate the configurations from the metrics?
> 
> Thanks,
> 
> Jun
> 
> On Mon, Nov 29, 2021 at 7:34 AM Magnus Edenhill  wrote:
> 
> > Hey Bob,
> >
> > That's a good point.
> >
> > Request type labels were considered but, since they're already tracked by
> > broker-side metrics,
> > they were left out so as to avoid metric duplication. However, those metrics
> > are not per connection,
> > so they won't be that useful in practice for troubleshooting specific
> > client instances.
> >
> > I'll add the request_type label to the relevant metrics.
> >
> > Thanks,
> > Magnus
> >
> >
> > Den tis 23 nov. 2021 kl 19:20 skrev Bob Barrett
> > :
> >
> > > Hi Magnus,
> > >
> > > Thanks for the thorough KIP, this seems very useful.
> > >
> > > Would it make sense to include the request type as a label for the
> > > `client.request.success`, `client.request.errors` and
> > `client.request.rtt`
> > > metrics? I think it would be very useful to see which specific requests
> > are
> > > succeeding and failing for a client. One specific case I can think of
> > where
> > > this could be useful is producer batch timeouts. If a Java application
> > does
> > > not enable producer client logs (unfortunately, in my experience this
> > > happens more often than it should), the application logs will only
> > contain
> > > the expiration error message, but no information about what is causing
> > the
> > > timeout. The requests might all be succeeding but taking too long to
> > > process batches, or metadata requests might be failing, or some or all
> > > produce requests might be failing (if the bootstrap servers are reachable
> > > from the client but one or more other brokers are not, for example). If
> > the
> > > cluster operator is able to identify the specific requests that are slow
> > or
> > > failing for a client, they will be better able to diagnose the issue
> > > causing batch timeouts.
> > >
> > > One drawback I can think of is that this will increase the cardinality of
> > > the request metrics. But any given client is only going to use a small
> > > subset of the request types, and since we already have partition labels
> > for
> > > the topic-level metrics, I think request labels will still make up a
> > > relatively small percentage of the set of metrics.
> > >
> > > Thanks,
> > > Bob
> > >
> > > On Mon, Nov 22, 2021 at 2:08 AM Viktor Somogyi-Vass <
> > > viktorsomo...@gmail.com>
> > > wrote:
> > >
> > > > Hi Magnus,
> > > >
> > > > I think this is a very useful addition. We also have a similar (but
> > much
> > > > more simplistic) implementation of this. Maybe I missed it in the KIP
> > but
> > > > what about adding metrics about the subscription cache itself? That I
> > > think
> > > > would improve its usability and debuggability as we'd be able to see
> > its
> > > > performance, hit/miss rates, eviction counts and others.
> > > >
> > > > Best,
> > > > Viktor
> > > >
> > > > On Thu, Nov 18, 2021 at 5:12 PM Magnus Edenhill 
> > > > wrote:
> > > >
> > > > > Hi Mickael,
> > > > >
> > > > > see inline.
> > > > >
> > > > > Den ons 10 nov. 2021 kl 15:21 skrev Mickael Maison <
> > > > > mickael.mai...@gmail.com
> > > > > >:
> > > > >
> > > > > > Hi Magnus,
> > > > > >
> > > > > > I see you've addressed some of the points I raised above but some
> > (4,
> > > > > > 5) have not been addressed yet.
> > > > > >
> > > > >
> > > > > Re 4) How will the user/app know metrics are being sent.
> > > > >
> > > > > One possibility is to add a JMX metric (thus for user consumption)
> > for
> > > > the
> > > > > number of metric pushes the
> > > > > client has performed, or perhaps the number of metrics subscriptions
> > > > > currently being collected.
> > > > > Would that be sufficient?
> > > > >
> > > > > Re 5) Metric sizes and rates
> > > > >
> > > > > A worst case scenario for a producer that is producing to 50 unique
> > > > topics
> > > > > and emitting all standard metrics yields
> > > > > a serialized size of around 100KB prior to compression, which
> > > compresses
> > > > > down to about 20-30% of that depending
> > > > > on compression type and topic name uniqueness.
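Magnus's back-of-envelope numbers can be checked with a line of arithmetic (the 20-30% compression ratio is his estimate, not a measured figure):

```python
# ~100KB serialized metrics for a 50-topic producer, compressing to ~20-30%.
serialized_kb = 100
compressed_range_kb = (serialized_kb * 0.20, serialized_kb * 0.30)
print(f"~{compressed_range_kb[0]:.0f}-{compressed_range_kb[1]:.0f} KB per push")
```

So a worst-case push lands somewhere around 20-30KB on the wire, depending on compression type.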

Re: [DISCUSS] KIP-795: Add public APIs for AbstractCoordinator

2021-12-13 Thread Hector Geraldino (BLOOMBERG/ 919 3RD A)
Hi kafka-devs,

I would like to request a second review of the proposed changes in KIP-795: Add public 
APIs for AbstractCoordinator 
[ 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-795%3A+Add+public+APIs+for
+AbstractCoordinator ]

I've amended the KIP addressing Tom's feedback, and also opened a PR with the 
proposed changes [ https://github.com/apache/kafka/pull/11515 ]. There's also a 
PR for the kafka-monitor tool [ 
https://github.com/linkedin/kafka-monitor/pull/355 ] that demonstrates how we 
leveraged  AbstractCoordinator to implement a High-Availability mode for it, 
with the intention of showing how this feature can be leveraged by other 
services within the Kafka ecosystem and outside it.

Thanks for your time and consideration.
Hector
 
From: dev@kafka.apache.org At: 11/29/21 13:31:26 UTC-5:00To:  
dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-795: Add public APIs for AbstractCoordinator

Hello again Tom, kafka devs

First, congrats on becoming a PMC member! That's so cool.

Since your last reply I've updated the KIP trying to address some of your 
suggestions. A few more details have been added to the motivation section, and 
I've also gone ahead and opened a draft pull request with the changes I think are 
needed for this KIP, with the hope that it makes it easier to discuss the 
general approach and any other concerns the community may have.

KIP: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-795%3A+Add+public+APIs+for
+AbstractCoordinator
PR: https://github.com/apache/kafka/pull/11515

Looking forward for some community feedback.

Regards,
Hector

From: dev@kafka.apache.org At: 11/11/21 17:15:17 UTC-5:00To:  
dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-795: Add public APIs for AbstractCoordinator

Hi Tom,

Thanks for taking the time to review the KIP. 

I think it's reasonable to ask if Kafka's Group Coordination protocol should be 
used for use cases other than the distributed event log. This was actually 
briefly addressed by Gwen Shapira during her presentation at the Strange Loop 
conference in '18 (a link to the video is included in the KIP), in which she 
explains the protocol internals in greater detail.

We should also keep in mind that this protocol is already being used for other 
use cases outside of core Kafka: Confluent Schema Registry uses it to determine 
leadership between members of a cluster, Kafka Connect uses it for task 
assignments, same with Kafka Stream for partition and task distributions, and 
so on. So a public, stable API, not just for new use cases (like ours) 
but also for existing ones, is IMHO a good thing to have. I'll amend the KIP and add a 
bit more details to the motivation and alternatives sections, so the usefulness 
of this KIP is better understood.

Now, for the first point of your technical observations (regarding 
protocolTypes()), I don't think it matters in this context, as the protocol 
name and subtype are only relevant in the context of a consumer group and group 
rebalance. It really doesn't matter if two different libraries decide to name 
their protocols the same.

For item #2, I was under the impression that, because these classes all 
implement the org.apache.kafka.common.protocol.[Message, ApiMessage] interface, 
they are implicitly part of the Kafka protocol and the top-level API. Isn't 
that really the case?

And finally, for #3, the goal I had in mind when creating this KIP was a small 
one: to provide an interface that users can rely on when extending the 
AbstractCoordinator. So my thought was that, while the AbstractCoordinator 
itself uses some internal APIs (like ConsumerNetworkClient, ConsumerMetadata 
and so on) those can remain internal. But it probably makes sense to at least 
explore the possibility of moving the whole AbstractCoordinator class to be 
part of the public API. I'll do that exercise, see what it entails, and update 
the KIP with my findings.


Thanks again!
Hector


From: dev@kafka.apache.org At: 11/10/21 06:43:59 UTC-5:00To:  Hector Geraldino 
(BLOOMBERG/ 919 3RD A ) ,  dev@kafka.apache.org
Subject: Re: [DISCUSS] KIP-795: Add public APIs for AbstractCoordinator

Hi Hector,

Thanks for the KIP.

At a high level, I think the question to be answered by the community is
"Should Kafka really be providing this kind of cluster management API?".
While Kafka clients need this to provide their functionality it's a
different thing to expose that as a public API of the project, which is
otherwise about providing a distributed event log/data streaming
platform. Having a public
API brings a significant commitment to API compatibility, which could
impair the ability of the project to change the API in order to make
improvements to the Kafka clients. The current AbstractCoordinator not
being a supported API means we don't currently have to reason about
compatibility here. So I think it would help the motivation section of the
KIP to describe in a bit more detail the use case(s) you have for
implementing your own coordinators.

Re: [VOTE] KIP-805: Add range and scan query support in IQ v2

2021-12-13 Thread Vasiliki Papavasileiou
Hello everyone,

Thank you for the discussion and for voting.

The vote on KIP-805 passes with: 3 binding +1 (John, Matthias, and
Guozhang), 1 non-binding +1 (Luke), no vetoes.
The vote is now closed.

If anyone has objections later on, please don't hesitate to let me know.

I will proceed with implementing the PR for the queries.

Thank you,
Vicky

On Mon, Dec 13, 2021 at 10:49 AM Vasiliki Papavasileiou <
vpapavasile...@confluent.io> wrote:

> Hello everyone,
>
> Thank you for your comments. I applied the changes.
> Hope things are clearer now.
>
> Best,
> Vicky
>
> On Mon, Dec 13, 2021 at 8:24 AM Luke Chen  wrote:
>
>> Hi Vicky,
>>
>> I checked the KIP again, and found there's something that might need
>> some description.
>> This is my first time seeing the term *scan query*.
>> I think it's not a publicly known term (or it's just me?), and might need
>> some description for it.
>> And for the "range query", I think you can also have some words for it.
>>
>> Thank you.
>> Luke
>>
>> On Sat, Dec 11, 2021 at 2:26 AM Guozhang Wang  wrote:
>>
>> > Thanks Vicky,
>> >
>> > I'd suggest we change the KIP title as "add range and scan query over
>> > kv-store in IQv2" just for clarification, otherwise I'm +1.
>> >
>> > Guozhang
>> >
>> > On Wed, Dec 8, 2021 at 4:18 PM Matthias J. Sax 
>> wrote:
>> >
>> > > Thanks for the KIP.
>> > >
>> > > +1 (binding)
>> > >
>> > > On 12/5/21 7:03 PM, Luke Chen wrote:
>> > > > Hi Vasiliki,
>> > > >
>> > > > Thanks for the KIP!
>> > > > It makes sense to have the range and scan query in IQv2, as in IQv1.
>> > > >
>> > > > +1 (non-binding)
>> > > >
>> > > > Thank you.
>> > > > Luke
>> > > >
>> > > > On Thu, Dec 2, 2021 at 5:41 AM John Roesler 
>> > wrote:
>> > > >
>> > > >> Thanks for the KIP, Vicky!
>> > > >>
>> > > >> I’m +1 (binding)
>> > > >>
>> > > >> -John
>> > > >>
>> > > >> On Tue, Nov 30, 2021, at 14:51, Vasiliki Papavasileiou wrote:
>> > > >>> Hello everyone,
>> > > >>>
>> > > >>> I would like to start a vote for KIP-805 that adds range and scan
>> > > >> KeyValue
>> > > >>> queries in IQv2.
>> > > >>>
>> > > >>> The KIP can be found here:
>> > > >>>
>> > > >>
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-805%3A+Add+range+and+scan+query+support+in+IQ+v2
>> > > >>>
>> > > >>> Cheers!
>> > > >>> Vicky
>> > > >>
>> > > >
>> > >
>> >
>> >
>> > --
>> > -- Guozhang
>> >
>>
>


Re: [DISCUSS] Please review the 3.1.0 blog post

2021-12-13 Thread Mickael Maison
Hi David,

It looks good, I just noticed a typo:
"KIP-733" should be "KIP-773"

Thanks

On Sun, Dec 12, 2021 at 4:05 PM David Jacot  wrote:
>
> Hi Luke,
>
> Thanks for your feedback. I have found and fixed the issue. It was
> actually
> due to the formatting of the title of the AK 3.0 blog post.
>
> Best,
> David
>
> On Sat, Dec 11, 2021 at 9:44 AM Luke Chen  wrote:
>
> > Oh, sorry! I have a typo in your name!
> > Sorry, David! >.<
> >
> > Luke
> >
> > On Sat, Dec 11, 2021 at 4:42 PM Luke Chen  wrote:
> >
> >> Hi Davie,
> >>
> >> Thanks for drafting the release announcement post.
> >> I've checked the content, and looks good to me.
> >> But I think the header section: "What's New in Apache..." is not
> >> formatted properly.
> >> I checked the previous blog post, and it should be a hyperlink just like
> >> the "Main" kind of font.
> >>
> >> [image: image.png]
> >>
> >> Thank you.
> >> Luke
> >>
> >>
> >> On Sat, Dec 11, 2021 at 5:51 AM David Jacot 
> >> wrote:
> >>
> >>> I have put the wrong link in my previous email. Here is the public one:
> >>>
> >>>
> >>> https://blogs.apache.org/preview/kafka/?previewEntry=what-s-new-in-apache7
> >>>
> >>> Best,
> >>> David
> >>>
> >>> On Fri, Dec 10, 2021 at 10:35 PM David Jacot 
> >>> wrote:
> >>> >
> >>> > Hello all,
> >>> >
> >>> > I have prepared a draft of the release announcement post for the
> >>> > Apache Kafka 3.1.0 release:
> >>> >
> >>> >
> >>> https://blogs.apache.org/roller-ui/authoring/preview/kafka/?previewEntry=what-s-new-in-apache7
> >>> >
> >>> > I would greatly appreciate your reviews if you have a moment.
> >>> >
> >>> > Thanks,
> >>> > David
> >>>
> >>


Re: [VOTE] KIP-769: Connect APIs to list all connector plugins and retrieve their configuration definitions

2021-12-13 Thread Chris Egerton
Thanks Mickael, this looks great. +1 non-binding.

Cheers,

Chris

On Mon, Dec 13, 2021 at 5:18 AM Mickael Maison 
wrote:

> Hi all,
>
> I'd like to start a vote on KIP-769 which proposes adding new
> endpoints to the Connect REST API to list all connectors plugins and
> retrieve their configurations.
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-769%3A+Connect+APIs+to+list+all+connector+plugins+and+retrieve+their+configuration+definitions
>
> Please take a look and let me know if you have any feedback.
>
> Thanks,
> Mickael
>


[jira] [Created] (KAFKA-13541) Make IQv2 query/store interface type safe

2021-12-13 Thread Patrick Stuedi (Jira)
Patrick Stuedi created KAFKA-13541:
--

 Summary: Make IQv2 query/store interface type safe
 Key: KAFKA-13541
 URL: https://issues.apache.org/jira/browse/KAFKA-13541
 Project: Kafka
  Issue Type: Sub-task
Reporter: Patrick Stuedi
Assignee: Patrick Stuedi


Currently the new IQv2 interface allows applications to query state stores 
using subclasses of the Query type. Unfortunately, there is no way 
to check that the type parameter of the query matches the type of the relevant 
store the query is executed on. As a consequence, stores have to perform a set of 
unsafe casts.

This ticket is to explore ways to make the query interface type safe, so that 
type mismatches are detected at compile time.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13540) UniformStickyPartitioner leads to uneven Kafka partitions

2021-12-13 Thread nk2242696 (Jira)
nk2242696 created KAFKA-13540:
-

 Summary: UniformStickyPartitioner leads to uneven Kafka partitions
 Key: KAFKA-13540
 URL: https://issues.apache.org/jira/browse/KAFKA-13540
 Project: Kafka
  Issue Type: Bug
  Components: clients, producer 
Affects Versions: 2.4.1
Reporter: nk2242696
 Attachments: MicrosoftTeams-image (1).png

Kafka topic with 20 partitions, a 24-hour TTL, and a replication factor of 3.
Using UniformStickyPartitioner, we expected each partition to be roughly the
same size, but the realized size for some of the partitions is almost double.

!MicrosoftTeams-image (1).png!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (KAFKA-13539) Improve propagation and processing of SSL handshake failures

2021-12-13 Thread Rajini Sivaram (Jira)
Rajini Sivaram created KAFKA-13539:
--

 Summary: Improve propagation and processing of SSL handshake 
failures
 Key: KAFKA-13539
 URL: https://issues.apache.org/jira/browse/KAFKA-13539
 Project: Kafka
  Issue Type: Bug
  Components: security
Affects Versions: 3.1.0
Reporter: Rajini Sivaram
Assignee: Rajini Sivaram
 Fix For: 3.2.0


When the server fails an SSL handshake and closes its connection, we 
attempt to report this to clients on a best-effort basis. However, our tests 
assume that the peer always detects the failure. This may not be the case when 
there are delays. It would be good to improve the reliability of handshake failure 
reporting.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


Re: [VOTE] KIP-805: Add range and scan query support in IQ v2

2021-12-13 Thread Vasiliki Papavasileiou
Hello everyone,

Thank you for your comments. I applied the changes.
Hope things are clearer now.

Best,
Vicky

On Mon, Dec 13, 2021 at 8:24 AM Luke Chen  wrote:

> Hi Vicky,
>
> I checked the KIP again, and found there's something that might need
> some description.
> This is my first time seeing the term *scan query*.
> I think it's not a publicly known term (or it's just me?), and might need
> some description for it.
> And for the "range query", I think you can also have some words for it.
>
> Thank you.
> Luke
>
> On Sat, Dec 11, 2021 at 2:26 AM Guozhang Wang  wrote:
>
> > Thanks Vicky,
> >
> > I'd suggest we change the KIP title as "add range and scan query over
> > kv-store in IQv2" just for clarification, otherwise I'm +1.
> >
> > Guozhang
> >
> > On Wed, Dec 8, 2021 at 4:18 PM Matthias J. Sax  wrote:
> >
> > > Thanks for the KIP.
> > >
> > > +1 (binding)
> > >
> > > On 12/5/21 7:03 PM, Luke Chen wrote:
> > > > Hi Vasiliki,
> > > >
> > > > Thanks for the KIP!
> > > > It makes sense to have the range and scan query in IQv2, as in IQv1.
> > > >
> > > > +1 (non-binding)
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > > On Thu, Dec 2, 2021 at 5:41 AM John Roesler 
> > wrote:
> > > >
> > > >> Thanks for the KIP, Vicky!
> > > >>
> > > >> I’m +1 (binding)
> > > >>
> > > >> -John
> > > >>
> > > >> On Tue, Nov 30, 2021, at 14:51, Vasiliki Papavasileiou wrote:
> > > >>> Hello everyone,
> > > >>>
> > > >>> I would like to start a vote for KIP-805 that adds range and scan
> > > >> KeyValue
> > > >>> queries in IQv2.
> > > >>>
> > > >>> The KIP can be found here:
> > > >>>
> > > >>
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-805%3A+Add+range+and+scan+query+support+in+IQ+v2
> > > >>>
> > > >>> Cheers!
> > > >>> Vicky
> > > >>
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


[VOTE] KIP-769: Connect APIs to list all connector plugins and retrieve their configuration definitions

2021-12-13 Thread Mickael Maison
Hi all,

I'd like to start a vote on KIP-769 which proposes adding new
endpoints to the Connect REST API to list all connectors plugins and
retrieve their configurations.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-769%3A+Connect+APIs+to+list+all+connector+plugins+and+retrieve+their+configuration+definitions

Please take a look and let me know if you have any feedback.

Thanks,
Mickael


Re: [VOTE] KIP-784: Add top-level error code field to DescribeLogDirsResponse

2021-12-13 Thread Mickael Maison
Bumping this thread another time.

This is a very minor change to make DescribeLogDirsResponse consistent
with the other APIs.
Let me know if you have any feedback.

Thanks,
Mickael

On Mon, Nov 22, 2021 at 10:29 AM Tom Bentley  wrote:
>
> Hi Mickael,
>
> It's pretty low value, but I think consistency is a useful trait, and it's
> easily achievable here.
>
> +1 (binding).
>
> Kind regards,
>
> Tom
>
>
> On Thu, Nov 18, 2021 at 2:56 PM Mickael Maison 
> wrote:
>
> > Bumping this thread.
> >
> > Let me know if you have any feedback.
> >
> > Thanks,
> > Mickael
> >
> > On Wed, Oct 27, 2021 at 3:25 PM Luke Chen  wrote:
> > >
> > > Hi Mickael,
> > > Thanks for the KIP.
> > > It's good to keep it consistent with others, to have top-level error
> > field.
> > >
> > > + 1 (non-binding)
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Wed, Oct 27, 2021 at 9:01 PM Mickael Maison  > >
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'd like to start the vote on this minor KIP.
> > > >
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-784%3A+Add+top-level+error+code+field+to+DescribeLogDirsResponse
> > > >
> > > > Please take a look, vote or let me know if you have any feedback.
> > > >
> > > > Thanks
> > > >
> >
> >


Re: [VOTE] KIP-806: Add session and window query over KV-store in IQv2

2021-12-13 Thread Luke Chen
Hi Patrick,

Thanks for the KIP!

I have some comments, in addition to Guozhang's comments:
4. The parameter names `windowLower` and `windowUpper` are kind of
ambiguous to me. Could we come up with better names for them, like
`windowStartTime` and `windowEndTime`? Or perhaps we don't need the "window"
prefix at all, and could use just `startTime` and `endTime`?
5. Why can't we support a window range query with a key within a time range?
You might need to explain this in the KIP.

Thank you.
Luke


On Sat, Dec 11, 2021 at 7:54 AM Guozhang Wang  wrote:

> Hi Patrick,
>
> I made a pass on the KIP and have a few comments below:
>
> 1. The `WindowRangeQuery` has a private constructor while the
> `WindowKeyQuery` has not, is that intentional?
>
> 2. The `WindowRangeQuery` seems not to allow ranging over both window and
> key, but only window with a fixed key, in that case it seems pretty much
> the same as the other (ignoring the constructor), since we know we would
> only have a single `key` value in the returned iterator, and hence it seems
> returning in the form of `WindowStoreIterator` is also fine as the key
> is fixed and hence no need to maintain it in the returned iterator. I'm
> wondering should we actually support ranging over keys as well in
> `WindowRangeQuery`?
>
> 3. The KIP title mentions both session and window, but the APIs only
> involve window stores. However, the return type `WindowStoreIterator` is
> only for window stores, not session stores, so I feel we would still have
> some differences in the session window query interface?
>
>
> Guozhang
>
> On Fri, Dec 10, 2021 at 1:32 PM Patrick Stuedi
> 
> wrote:
>
> > Hi everyone,
> >
> > I would like to start the vote for KIP-806 that adds window and session
> > query support to query KV-stores using IQv2.
> >
> > The KIP can be found here:
> > https://cwiki.apache.org/confluence/x/LJaqCw
> >
> > Skipping the discussion phase as this KIP is following the same pattern
> as
> > the previously submitted KIP-805 (KIP:
> > https://cwiki.apache.org/confluence/x/85OqCw, Discussion:
> > https://tinyurl.com/msp5mcb2). Of course concerns/comments can still be
> > brought up in this thread.
> >
> > -Patrick
> >
>
>
> --
> -- Guozhang
>


Re: [VOTE] KIP-805: Add range and scan query support in IQ v2

2021-12-13 Thread Luke Chen
Hi Vicky,

I checked the KIP again, and found there's something that might need
some description.
This is my first time seeing the term *scan query*.
I think it's not a publicly known term (or it's just me?), and might need
some description for it.
And for the "range query", I think you can also have some words for it.

Thank you.
Luke

On Sat, Dec 11, 2021 at 2:26 AM Guozhang Wang  wrote:

> Thanks Vicky,
>
> I'd suggest we change the KIP title as "add range and scan query over
> kv-store in IQv2" just for clarification, otherwise I'm +1.
>
> Guozhang
>
> On Wed, Dec 8, 2021 at 4:18 PM Matthias J. Sax  wrote:
>
> > Thanks for the KIP.
> >
> > +1 (binding)
> >
> > On 12/5/21 7:03 PM, Luke Chen wrote:
> > > Hi Vasiliki,
> > >
> > > Thanks for the KIP!
> > > It makes sense to have the range and scan query in IQv2, as in IQv1.
> > >
> > > +1 (non-binding)
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Thu, Dec 2, 2021 at 5:41 AM John Roesler 
> wrote:
> > >
> > >> Thanks for the KIP, Vicky!
> > >>
> > >> I’m +1 (binding)
> > >>
> > >> -John
> > >>
> > >> On Tue, Nov 30, 2021, at 14:51, Vasiliki Papavasileiou wrote:
> > >>> Hello everyone,
> > >>>
> > >>> I would like to start a vote for KIP-805 that adds range and scan
> > >> KeyValue
> > >>> queries in IQv2.
> > >>>
> > >>> The KIP can be found here:
> > >>>
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-805%3A+Add+range+and+scan+query+support+in+IQ+v2
> > >>>
> > >>> Cheers!
> > >>> Vicky
> > >>
> > >
> >
>
>
> --
> -- Guozhang
>