[jira] [Resolved] (KAFKA-13483) Stale Missing ISR on a partition after a zookeeper timeout

2022-06-16 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-13483.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

> Stale Missing ISR on a partition after a zookeeper timeout
> --
>
> Key: KAFKA-13483
> URL: https://issues.apache.org/jira/browse/KAFKA-13483
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 2.8.1
>Reporter: F Méthot
>Priority: Major
> Fix For: 3.1.0
>
>
> We hit a situation where we had a stale missing ISR on a single partition of 
> an output changelog topic after a "broker to zookeeper" connection timed out 
> in our production system. This ticket shows the logs of what happened and a 
> workaround that got us out of this situation.
>  
> *Cluster config*
> 7 Kafka Brokers v2.8.1  (k8s bitnami)
> 3 Zookeeper v3.6.2 (k8s bitnami)
> kubernetes v1.20.6
>  
> *Processing pattern:*
> {code:java}
> source topic 
>  -> KStream application: update 40 stores backed by 
>-> data-##-changelog topics {code}
>  
> All topics have {*}10 partitions{*}, {*}3 replicas{*}, *min.isr 2*
> After a broker-to-zookeeper connection timed out (see logs below), the ISR went 
> missing for many topic partitions.
> Almost all partitions recovered a few milliseconds later, as the connection 
> to zk was re-established.
> Except for partition 3 of *one* of the 40 data-##-changelog topics:
> it stayed under-replicated overnight, preventing any progress from 
> the source topic's partition 3 of the kstream app, while at the same time halting 
> production of data for the 39 other changelog topics on partition 3 (because 
> they also rely on partition 3 of the input topic).
> +*Successful Workaround*+
> We ran kafka-reassign-partitions.sh on that partition, with the exact same 
> replicas config, and the ISR came back normal, in a matter of milliseconds.
> {code:java}
> kafka-reassign-partitions.sh --bootstrap-server 
> kafka.kafka.svc.cluster.local:9092 --reassignment-json-file 
> ./replicas-data-23-changlog.json {code}
> where replicas-data-23-changlog.json contains the original ISR config:
> {code:java}
> {"version":1,"partitions":[{"topic":"data-23-changelog","partition":3,"replicas":[1007,1005,1003]}]}
>  {code}
>  
> +*Questions:*+
> Would you be able to provide an explanation of why that specific partition did 
> not recover like the others after the zk timeout failure?
> Or could it be a bug?
> We are glad the workaround worked, but is there an explanation of why it did?
> Otherwise, what should have been done to address this issue?
>  
> +*Observed summary of the logs*+
>  
> {code:java}
> [2021-11-20 20:21:42,577] WARN Client session timed out, have not heard from 
> server in 26677ms for sessionid 0x286f5260006 
> (org.apache.zookeeper.ClientCnxn)
> [2021-11-20 20:21:42,582] INFO Client session timed out, have not heard from 
> server in 26677ms for sessionid 0x286f5260006, closing socket connection 
> and attempting reconnect (org.apache.zookeeper.ClientCnxn)
> [2021-11-20 20:21:44,644] INFO Opening socket connection to server 
> zookeeper.kafka.svc.cluster.local
> [2021-11-20 20:21:44,646] INFO Socket connection established, initiating 
> session, client: , server: zookeeper.kafka.svc.cluster.local 
> (org.apache.zookeeper.ClientCnxn)
> [2021-11-20 20:21:44,649] INFO Session establishment complete on server 
> zookeeper.kafka.svc.cluster.local, sessionid = 0x286f5260006, negotiated 
> timeout = 4 (org.apache.zookeeper.ClientCnxn)
>  
> [2021-11-20 20:21:57,133] INFO [ReplicaFetcher replicaId=1007, leaderId=1001, 
> fetcherId=0] Shutting down (kafka.server.ReplicaFetcherThread)
> [2021-11-20 20:21:57,137] INFO [ReplicaFetcher replicaId=1007, leaderId=1001, 
> fetcherId=0] Error sending fetch request (sessionId=1896541533, epoch=50199) 
> to node 1001: (org.apache.kafka.clients.FetchSessionHandler)
> java.io.IOException: Client was shutdown before response was read
>         at 
> org.apache.kafka.clients.NetworkClientUtils.sendAndReceive(NetworkClientUtils.java:109)
>         at 
> kafka.server.ReplicaFetcherBlockingSend.sendRequest(ReplicaFetcherBlockingSend.scala:110)
>         at 
> kafka.server.ReplicaFetcherThread.fetchFromLeader(ReplicaFetcherThread.scala:217)
>         at 
> kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:325)
>         at 
> kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:141)
>         at 
> kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3$adapted(AbstractFetcherThread.scala:140)
>         at scala.Option.foreach(Option.scala:407)
>         at 
> kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:140)
>         at 
> 

[jira] [Resolved] (KAFKA-13720) Few topic partitions remain under replicated after broker lose connectivity to zookeeper

2022-06-16 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-13720.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

> Few topic partitions remain under replicated after broker lose connectivity 
> to zookeeper
> 
>
> Key: KAFKA-13720
> URL: https://issues.apache.org/jira/browse/KAFKA-13720
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Affects Versions: 2.7.1
>Reporter: Dhirendra Singh
>Priority: Major
> Fix For: 3.1.0
>
>
> A few topic partitions remain under-replicated after brokers lose connectivity 
> to zookeeper.
> It only happens when brokers lose connectivity to zookeeper and it results in 
> a change of active controller. The issue does not always occur, but happens randomly.
> The issue never occurs when the active controller does not change after brokers 
> lose connectivity to zookeeper.
> I found the following error message in the log file:
> [2022-02-28 04:01:20,217] WARN [Partition __consumer_offsets-4 broker=1] 
> Controller failed to update ISR to PendingExpandIsr(isr=Set(1), 
> newInSyncReplicaId=2) due to unexpected UNKNOWN_SERVER_ERROR. Retrying. 
> (kafka.cluster.Partition)
> [2022-02-28 04:01:20,217] ERROR [broker-1-to-controller] Uncaught error in 
> request completion: (org.apache.kafka.clients.NetworkClient)
> java.lang.IllegalStateException: Failed to enqueue `AlterIsr` request with 
> state LeaderAndIsr(leader=1, leaderEpoch=2728, isr=List(1, 2), 
> zkVersion=4719) for partition __consumer_offsets-4
> at kafka.cluster.Partition.sendAlterIsrRequest(Partition.scala:1403)
> at 
> kafka.cluster.Partition.$anonfun$handleAlterIsrResponse$1(Partition.scala:1438)
> at kafka.cluster.Partition.handleAlterIsrResponse(Partition.scala:1417)
> at 
> kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1(Partition.scala:1398)
> at 
> kafka.cluster.Partition.$anonfun$sendAlterIsrRequest$1$adapted(Partition.scala:1398)
> at 
> kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8(AlterIsrManager.scala:166)
> at 
> kafka.server.AlterIsrManagerImpl.$anonfun$handleAlterIsrResponse$8$adapted(AlterIsrManager.scala:163)
> at scala.collection.immutable.List.foreach(List.scala:333)
> at 
> kafka.server.AlterIsrManagerImpl.handleAlterIsrResponse(AlterIsrManager.scala:163)
> at 
> kafka.server.AlterIsrManagerImpl.responseHandler$1(AlterIsrManager.scala:94)
> at 
> kafka.server.AlterIsrManagerImpl.$anonfun$sendRequest$2(AlterIsrManager.scala:104)
> at 
> kafka.server.BrokerToControllerRequestThread.handleResponse(BrokerToControllerChannelManagerImpl.scala:175)
> at 
> kafka.server.BrokerToControllerRequestThread.$anonfun$generateRequests$1(BrokerToControllerChannelManagerImpl.scala:158)
> at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:109)
> at 
> org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:586)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:578)
> at kafka.common.InterBrokerSendThread.doWork(InterBrokerSendThread.scala:71)
> at 
> kafka.server.BrokerToControllerRequestThread.doWork(BrokerToControllerChannelManagerImpl.scala:183)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:96)
>  
> The under-replicated partition count goes to zero after the controller broker is 
> restarted, but this requires manual intervention.
> The expectation is that when brokers reconnect with zookeeper, the cluster should 
> come back to a stable state with an under-replicated partition count of zero by 
> itself, without any manual intervention.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #1013

2022-06-16 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-14006) WorkerConnector tests should be parameterized by connector type

2022-06-16 Thread Chris Egerton (Jira)
Chris Egerton created KAFKA-14006:
-

 Summary: WorkerConnector tests should be parameterized by 
connector type
 Key: KAFKA-14006
 URL: https://issues.apache.org/jira/browse/KAFKA-14006
 Project: Kafka
  Issue Type: Test
  Components: KafkaConnect
Reporter: Chris Egerton


The {{WorkerConnectorTest}} test suite has several cases that could apply to 
both sink and source connectors, but each case is only run for one or the 
other. We should parameterize these tests to run every applicable case for both 
a sink and a source connector. Any case that requires one or the other can 
become a no-op by checking which type of connector is being used and, if 
it's not the required type, returning at the beginning of the test method.
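
For illustration, a minimal JUnit 4 sketch of the pattern described above (class and 
test names here are hypothetical, not the actual WorkerConnectorTest code):

{code:java}
// Hypothetical sketch of parameterizing a test suite by connector type.
// Names are illustrative only; this is not the actual WorkerConnectorTest code.
import static org.junit.Assert.assertNotNull;

import java.util.Arrays;
import java.util.Collection;

import org.junit.Test;
import org.junit.runner.RunWith;
import org.junit.runners.Parameterized;

@RunWith(Parameterized.class)
public class ConnectorTypeParameterizedExample {

    public enum ConnectorType { SINK, SOURCE }

    @Parameterized.Parameters(name = "{0}")
    public static Collection<Object[]> parameters() {
        return Arrays.asList(new Object[][] {{ConnectorType.SINK}, {ConnectorType.SOURCE}});
    }

    private final ConnectorType connectorType;

    public ConnectorTypeParameterizedExample(ConnectorType connectorType) {
        this.connectorType = connectorType;
    }

    @Test
    public void testCaseApplicableToBothTypes() {
        // Runs once for SINK and once for SOURCE.
        assertNotNull(connectorType);
    }

    @Test
    public void testSourceOnlyCase() {
        if (connectorType != ConnectorType.SOURCE) {
            return; // no-op for sink connectors, as described above
        }
        // Source-specific assertions would go here.
    }
}
{code}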



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #1012

2022-06-16 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 564565 lines...]
[2022-06-16T23:52:15.651Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:854:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-06-16T23:52:15.651Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:890:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-06-16T23:52:15.651Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:919:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-06-16T23:52:15.651Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/KStream.java:939:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-06-16T23:52:17.093Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/Produced.java:84:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-06-16T23:52:17.093Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/Produced.java:136:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-06-16T23:52:17.093Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/Produced.java:147:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-06-16T23:52:17.093Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/Repartitioned.java:101:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-06-16T23:52:17.093Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/kstream/Repartitioned.java:167:
 warning - Tag @link: reference not found: DefaultPartitioner
[2022-06-16T23:52:17.093Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java:443:
 warning - @statestore.cache.max.bytes is an unknown tag.
[2022-06-16T23:52:17.093Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java:443:
 warning - @statestore.cache.max.bytes is an unknown tag.
[2022-06-16T23:52:17.093Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/StreamsConfig.java:443:
 warning - @statestore.cache.max.bytes is an unknown tag.
[2022-06-16T23:52:17.093Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/TopologyConfig.java:61:
 warning - Tag @link: missing '#': "org.apache.kafka.streams.StreamsBuilder()"
[2022-06-16T23:52:17.093Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/TopologyConfig.java:61:
 warning - Tag @link: can't find org.apache.kafka.streams.StreamsBuilder() in 
org.apache.kafka.streams.TopologyConfig
[2022-06-16T23:52:17.093Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/query/Position.java:44:
 warning - Tag @link: can't find query(Query,
[2022-06-16T23:52:17.093Z]  PositionBound, boolean) in 
org.apache.kafka.streams.processor.StateStore
[2022-06-16T23:52:17.093Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:44:
 warning - Tag @link: can't find query(Query, PositionBound, boolean) in 
org.apache.kafka.streams.processor.StateStore
[2022-06-16T23:52:17.093Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:36:
 warning - Tag @link: can't find query(Query, PositionBound, boolean) in 
org.apache.kafka.streams.processor.StateStore
[2022-06-16T23:52:17.093Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:57:
 warning - Tag @link: can't find query(Query, PositionBound, boolean) in 
org.apache.kafka.streams.processor.StateStore
[2022-06-16T23:52:17.093Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:74:
 warning - Tag @link: can't find query(Query, PositionBound, boolean) in 
org.apache.kafka.streams.processor.StateStore
[2022-06-16T23:52:17.093Z] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_trunk/streams/src/main/java/org/apache/kafka/streams/query/QueryResult.java:110:
 warning - Tag @link: reference not found: this#getResult()
[2022-06-16T23:52:17.093Z] 

Re: [DISCUSS] Apache Kafka 3.3.0 Release

2022-06-16 Thread José Armando García Sancio
Hi Divij,

On Thu, Jun 16, 2022 at 1:37 AM Divij Vaidya  wrote:
> *Question#1*: Do we only track the KIPs over here that are blockers for
> release or do we track the non-KIP JIRA tickets as well?

This page documents the KIPs and Jira issues I am tracking for the
3.3.0 release.
https://cwiki.apache.org/confluence/x/-xahD

> If we don't track the JIRA tickets, please ignore the following, but if we
> do, I would like to propose that we fix/merge the following before release:
> 1. https://github.com/apache/kafka/pull/12228 -> Fixes multiple memory
> leaks.
> 2. https://github.com/apache/kafka/pull/12184 -> Fixes an edge case where a
> specific configuration for quota values could lead to errors.

This is the Jira query I am using to track issues that need to be
fixed by 3.3.0:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20fixVersion%20%3D%203.3.0%20AND%20status%20not%20in%20(resolved%2C%20closed)%20ORDER%20BY%20priority%20DESC%2C%20status%20DESC%2C%20updated%20DESC%20%20%20%20%20%20

If you think those PRs need to be reviewed and merged before feature
freeze or code freeze please feel free to add 3.3.0 to the fixVersion
of the Jira. I don't have time to review those PRs this week but I'll
try to take a look next week.

> *Question#2*: As a non-committer, is there anything that I could help with
> for the release process?

Thanks for volunteering to help. I would suggest looking at the issues
in the search above and working on any issue that interests you and is
not already assigned.
-- 
-José


Re: [DISCUSS] KIP-714: Client metrics and observability

2022-06-16 Thread Jun Rao
Hi, Kirk,

Thanks for the reply. A couple of more comments.

(1) "Another perspective is that these two sets of metrics serve different
purposes and/or have different audiences, which argues that they should
maintain their individuality and purpose. " Hmm, I am wondering if those
metrics are really for different audiences and purposes? For example, if
the operator detected an issue through a client metric collected through
the server, the operator may need to communicate that back to the client.
It would be weird if that same metric is not visible on the client side.

(2) If we could standardize the names on the server side, do we need to
enforce a naming convention for all clients?

Thanks,

Jun

On Thu, Jun 16, 2022 at 12:00 PM Kirk True  wrote:

> Hi Jun,
>
> I'll try to answer the questions posed...
>
> On Tue, Jun 7, 2022, at 4:32 PM, Jun Rao wrote:
> > Hi, Magnus,
> >
> > Thanks for the reply.
> >
> > So, the standard set of generic metrics is just a recommendation and not
> a
> > requirement? This sounds good to me since it makes the adoption of the
> KIP
> > easier.
>
> I believe that was the intent, yes.
>
> > Regarding the metric names, I have two concerns.
>
> (I'm splitting these two up for readability...)
>
> > (1) If a client already
> > has an existing metric similar to the standard one, duplicating the
> metric
> > seems to be confusing.
>
> Agreed. I'm dealing with that situation as I write the Java client
> implementation.
>
> The existing Java client exposes a set of metrics via JMX. The updated
> Java client will introduce a second set of metrics, which instead are
> exposed via sending them to the broker. There is substantial overlap between
> the two sets of metrics, and in a few places in the code under development,
> there are essentially two separate calls to update metrics: one for the
> JMX-bound metrics and one for the broker-bound metrics.
>
> To be candid, I have gone back-and-forth on that design. From one
> perspective, it could be argued that the set of client metrics should be
> standardized across a given client, regardless of how those metrics are
> exposed for consumption. Another perspective is that these two sets of
> metrics serve different purposes and/or have different audiences, which
> argues that they should maintain their individuality and purpose. Your
> inputs/suggestions are certainly welcome!
>
> > (2) If a client needs to implement a standard metric
> > that doesn't exist yet, using a naming convention (e.g., using dash vs
> dot)
> > different from other existing metrics also seems a bit confusing. It
> seems
> > that the main benefit of having standard metric names across clients is
> for
> > better server side monitoring. Could we do the standardization in the
> > plugin on the server?
>
> I think the expectation is that the plugin implementation will perform
> transformation of metric names, if needed, to fit in with an organization's
> monitoring naming standards. Perhaps we need to call that out in the KIP
> itself.
>
> Thanks,
> Kirk
>
> >
> > Thanks,
> >
> > Jun
> >
> >
> >
> > On Tue, Jun 7, 2022 at 6:53 AM Magnus Edenhill 
> wrote:
> >
> > > Hey Jun,
> > >
> > > I've clarified the scope of the standard metrics in the KIP, but
> basically:
> > >
> > >  * We define a standard set of generic metrics that should be relevant
> to
> > > most client implementations, e.g., each producer implementation
> probably
> > > has some sort of per-partition message queue.
> > >  * A client implementation should strive to implement as many of the
> > > standard metrics as possible, but only the ones that make sense.
> > >  * For metrics that are not in the standard set, a client maintainer
> can
> > > choose to either submit a KIP to add additional standard metrics - if
> > > they're relevant, or go ahead and add custom metrics that are specific
> to
> > > that client implementation. These custom metrics will have a prefix
> > > specific to that client implementation, as opposed to the standard
> metric
> > > set that resides under "org.apache.kafka...". E.g.,
> > > "se.edenhill.librdkafka" or whatever.
> > >  * Existing non-KIP-714 metrics should remain untouched. In some cases
> we
> > > might be able to use the same meter given it is compatible with the
> > > standard metric set definition, in other cases a semi-duplicate meter
> may
> > > be needed. Thus this will not affect the metrics exposed through JMX,
> or
> > > vice versa.
> > >
> > > Thanks,
> > > Magnus
> > >
> > >
> > >
> > > On Wed, Jun 1, 2022 at 18:55, Jun Rao  wrote:
> > >
> > > > Hi, Magnus,
> > > >
> > > > 51. Just to clarify my question.  (1) Are standard metrics required
> for
> > > > every client for this KIP to function?  (2) Are we converting
> existing
> > > java
> > > > metrics to the standard metrics and deprecating the old ones? If so,
> > > could
> > > > we list all existing java metrics that need to be renamed and the
> > > > corresponding new name?
> > > >
> > > > Thanks,
> > > >
> > > > 

Re: [DISCUSS] KIP-847: Add ProducerCount metrics

2022-06-16 Thread Artem Livshits
Hi Ismael,

Thank you for your feedback.  Yes, this is counting the number of producer
ids tracked by the partition and broker.  Other options I was thinking of
are the following:

- IdempotentProducerCount
- TransactionalProducerCount
- ProducerIdCount

Let me know if one of these seems better; I'm open to other name
suggestions as well.

-Artem

On Wed, Jun 15, 2022 at 11:49 PM Ismael Juma  wrote:

> Thanks for the KIP.
>
> ProducerCount seems like a misleading name since producers without a
> producer id are not counted. Is this meant to count the number of producer
> IDs tracked by the broker?
>
> Ismael
>
> On Wed, Jun 15, 2022, 3:12 PM Artem Livshits  .invalid>
> wrote:
>
> > Hello,
> >
> > I'd like to start a discussion on the KIP-847:
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-847%3A+Add+ProducerCount+metrics
> > .
> >
> > -Artem
> >
>


Re: [VOTE] KIP-834: Pause / Resume KafkaStreams Topologies

2022-06-16 Thread Jim Hughes
Hi all,

As an update, I wanted to say that I updated my PR to include this
feedback.  Let me know if I should clarify anything on the KIP itself.

Cheers,

Jim

On Thu, Jun 2, 2022 at 12:36 AM Sophie Blee-Goldman
 wrote:

> Hey Jim, thanks for the update. I'm on the same side as Guozhang here, as
> I've expressed during
> the original discussion I think it would be confusing and possibly harmful
> to continue *any* kind of
> processing or action within Streams while it is "paused". In fact I sort of
> assumed we were including
> active task restoration under the umbrella of standby tasks when we decided
> to pause those as well,
> but since they are technically different I can see why we might want to
> consider them separately.
>
> I would say that for now we should just keep the semantics simple and
> obvious, and if users express
> a desire to pause applications from active processing but allow them to
> catch up as restoring actives,
> lagging standbys, warmup tasks, or so on then we can always add that
> functionality to the feature later on
>
> On Wed, Jun 1, 2022 at 11:41 AM Guozhang Wang  wrote:
>
> > Hello Jim,
> >
> > I think If our primary goal would be to reduce resource utilization and
> > potentially to stop the streaming pipeline for investigating possible
> bugs
> > etc, then we should also pause active tasks' restoration as well since
> that
> > 1) may still use resources, and 2) may load in bad data.
> >
> > Guozhang
> >
> >
> >
> >
> >
> >
> >
> > On Wed, Jun 1, 2022 at 5:53 AM Jim Hughes 
> > wrote:
> >
> > > Hi all,
> > >
> > > While reviewing my PR for KIP-834, Bruno noticed a case that we may not
> > > have discussed enough.*
> > >
> > > During the discussion, we decided that standby tasks would be paused.
> In
> > > order to do this, there are changes to the StoreChangelogReader around
> > > where it does restorations.  Bruno noticed that the restoration of
> active
> > > tasks is not paused in my PR.
> > >
> > > From my point of view, I was hoping to let active tasks
> > restore/consume/etc
> > > in order that the Kafka Streams instance could transition to RUNNING
> > > (assuming that it was started paused).  I believe Bruno's position is
> > that
> > > if we are pausing restoration for standby tasks, then restoration
> should
> > be
> > > paused for active tasks as well.
> > >
> > > Since this point hasn't been discussed like this, the KIP is unclear
> > about
> > > this detail.
> > >
> > > What do folks think?
> > >
> > > Thanks in advance,
> > >
> > > Jim
> > >
> > > * https://github.com/apache/kafka/pull/12161#discussion_r886732983
> > >
> > > On Mon, May 16, 2022 at 11:07 AM Jim Hughes 
> > wrote:
> > >
> > > > Hi all,
> > > >
> > > >
> > > > With 5 binding votes (John, Bruno, Sophie, Matthias, Bill) and 4
> > > > non-binding votes (Guozhang, Luke, Leah, Walker), the vote for
> KIP-834
> > > > passes!
> > > >
> > > >
> > > > Thanks all for the great discussion.
> > > >
> > > > I have a PR up here: https://github.com/apache/kafka/pull/12161
> > > >
> > > >
> > > > Thanks in advance for feedback on the PR!
> > > >
> > > >
> > > > Cheers,
> > > >
> > > >
> > > > JIm
> > > >
> > > > On Fri, May 13, 2022 at 12:04 PM Walker Carlson
> > > >  wrote:
> > > >
> > > >> +1 from me (non-binding)
> > > >>
> > > >> Walker
> > > >>
> > > >> On Wed, May 11, 2022 at 12:36 PM Leah Thomas
> > >  > > >> >
> > > >> wrote:
> > > >>
> > > >> > Thanks Jim, great discussion. +1 from me (non-binding)
> > > >> >
> > > >> > Cheers,
> > > >> > Leah
> > > >> >
> > > >> > On Wed, May 11, 2022 at 10:14 AM Bill Bejeck 
> > > wrote:
> > > >> >
> > > >> > > Thanks for the KIP!
> > > >> > >
> > > >> > > +1 (binding)
> > > >> > >
> > > >> > > -Bill
> > > >> > >
> > > >> > > On Wed, May 11, 2022 at 9:36 AM Luke Chen 
> > > wrote:
> > > >> > >
> > > >> > > > Hi Jim,
> > > >> > > >
> > > >> > > > I'm +1. (please add some note in KIP about the stream
> resetting
> > > tool
> > > >> > > can't
> > > >> > > > be used in paused state)
> > > >> > > > Thanks for the KIP!
> > > >> > > >
> > > >> > > > Luke
> > > >> > > >
> > > >> > > > On Wed, May 11, 2022 at 9:09 AM Guozhang Wang <
> > wangg...@gmail.com
> > > >
> > > >> > > wrote:
> > > >> > > >
> > > >> > > > > Thanks Jim. +1 from me.
> > > >> > > > >
> > > >> > > > > On Tue, May 10, 2022 at 4:51 PM Matthias J. Sax <
> > > mj...@apache.org
> > > >> >
> > > >> > > > wrote:
> > > >> > > > >
> > > >> > > > > > I had one minor question on the discuss thread. It's
> mainly
> > > >> about
> > > >> > > > > > clarifying and document the user contract. I am fine
> either
> > > way.
> > > >> > > > > >
> > > >> > > > > > +1 (binding)
> > > >> > > > > >
> > > >> > > > > >
> > > >> > > > > > -Matthias
> > > >> > > > > >
> > > >> > > > > > On 5/10/22 12:32 PM, Sophie Blee-Goldman wrote:
> > > >> > > > > > > Thanks for the KIP! +1 (binding)
> > > >> > > > > > >
> > > >> > > > > > > On Tue, May 10, 2022, 12:24 PM Bruno Cadonna <
> > > >> cado...@apache.org
> > > >> > >
> > > >> > > > > 

Re: [DISCUSS] KIP-714: Client metrics and observability

2022-06-16 Thread Kirk True
Hi Jun,

I'll try to answer the questions posed...

On Tue, Jun 7, 2022, at 4:32 PM, Jun Rao wrote:
> Hi, Magnus,
> 
> Thanks for the reply.
> 
> So, the standard set of generic metrics is just a recommendation and not a
> requirement? This sounds good to me since it makes the adoption of the KIP
> easier.

I believe that was the intent, yes.

> Regarding the metric names, I have two concerns.

(I'm splitting these two up for readability...)

> (1) If a client already
> has an existing metric similar to the standard one, duplicating the metric
> seems to be confusing.

Agreed. I'm dealing with that situation as I write the Java client 
implementation.

The existing Java client exposes a set of metrics via JMX. The updated Java 
client will introduce a second set of metrics, which instead are exposed via 
sending them to the broker. There is substantial overlap between the two sets of 
metrics, and in a few places in the code under development, there are 
essentially two separate calls to update metrics: one for the JMX-bound metrics 
and one for the broker-bound metrics.

To be candid, I have gone back-and-forth on that design. From one perspective, 
it could be argued that the set of client metrics should be standardized across 
a given client, regardless of how those metrics are exposed for consumption. 
Another perspective is that these two sets of metrics serve different purposes 
and/or have different audiences, which argues that they should maintain their 
individuality and purpose. Your inputs/suggestions are certainly welcome! 

> (2) If a client needs to implement a standard metric
> that doesn't exist yet, using a naming convention (e.g., using dash vs dot)
> different from other existing metrics also seems a bit confusing. It seems
> that the main benefit of having standard metric names across clients is for
> better server side monitoring. Could we do the standardization in the
> plugin on the server?

I think the expectation is that the plugin implementation will perform 
transformation of metric names, if needed, to fit in with an organization's 
monitoring naming standards. Perhaps we need to call that out in the KIP itself.
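
As a purely hypothetical illustration (the class and metric names below are invented 
for this example and are not part of the KIP), such a plugin-side transformation could 
be as simple as a lookup table plus a fallback rewrite:

{code:java}
// Hypothetical example only: remap standardized client metric names to an
// organization's own naming convention on the broker-side plugin. The metric
// names used below are invented for illustration.
import java.util.Map;

public class MetricNameMapper {

    private static final Map<String, String> RENAMES = Map.of(
            "org.apache.kafka.producer.record.queue.bytes", "myorg_kafka_producer_queue_bytes",
            "org.apache.kafka.consumer.poll.latency", "myorg_kafka_consumer_poll_latency_ms");

    public static String toOrgName(String standardName) {
        // Fall back to a generic rewrite (dots to underscores) when no explicit mapping exists.
        return RENAMES.getOrDefault(standardName, standardName.replace('.', '_'));
    }

    public static void main(String[] args) {
        System.out.println(toOrgName("org.apache.kafka.producer.record.queue.bytes"));
    }
}
{code}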

Thanks,
Kirk

> 
> Thanks,
> 
> Jun
> 
> 
> 
> On Tue, Jun 7, 2022 at 6:53 AM Magnus Edenhill  wrote:
> 
> > Hey Jun,
> >
> > I've clarified the scope of the standard metrics in the KIP, but basically:
> >
> >  * We define a standard set of generic metrics that should be relevant to
> > most client implementations, e.g., each producer implementation probably
> > has some sort of per-partition message queue.
> >  * A client implementation should strive to implement as many of the
> > standard metrics as possible, but only the ones that make sense.
> >  * For metrics that are not in the standard set, a client maintainer can
> > choose to either submit a KIP to add additional standard metrics - if
> > they're relevant, or go ahead and add custom metrics that are specific to
> > that client implementation. These custom metrics will have a prefix
> > specific to that client implementation, as opposed to the standard metric
> > set that resides under "org.apache.kafka...". E.g.,
> > "se.edenhill.librdkafka" or whatever.
> >  * Existing non-KIP-714 metrics should remain untouched. In some cases we
> > might be able to use the same meter given it is compatible with the
> > standard metric set definition, in other cases a semi-duplicate meter may
> > be needed. Thus this will not affect the metrics exposed through JMX, or
> > vice versa.
> >
> > Thanks,
> > Magnus
> >
> >
> >
> > On Wed, Jun 1, 2022 at 18:55, Jun Rao  wrote:
> >
> > > Hi, Magnus,
> > >
> > > 51. Just to clarify my question.  (1) Are standard metrics required for
> > > every client for this KIP to function?  (2) Are we converting existing
> > java
> > > metrics to the standard metrics and deprecating the old ones? If so,
> > could
> > > we list all existing java metrics that need to be renamed and the
> > > corresponding new name?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Tue, May 31, 2022 at 3:29 PM Jun Rao  wrote:
> > >
> > > > Hi, Magnus,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > 51. I think it's fine to have a list of recommended metrics for every
> > > > client to implement. I am just not sure that standardizing on the
> > metric
> > > > names across all clients is practical. The list of common metrics in
> > the
> > > > KIP have completely different names from the java metric names. Some of
> > > > them have different types. For example, some of the common metrics
> > have a
> > > > type of histogram, but the java client metrics don't use histogram in
> > > > general. Requiring the operator to translate those names and understand
> > > the
> > > > subtle differences across clients seem to cause more confusion during
> > > > troubleshooting.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Tue, May 31, 2022 at 5:02 AM Magnus Edenhill 
> > > > wrote:
> > > >
> > > >> Den fre 20 

[jira] [Created] (KAFKA-14005) LogCleaner doesn't clean log if there is no dirty range

2022-06-16 Thread Vincent Jiang (Jira)
Vincent Jiang created KAFKA-14005:
-

 Summary: LogCleaner doesn't clean log if there is no dirty range
 Key: KAFKA-14005
 URL: https://issues.apache.org/jira/browse/KAFKA-14005
 Project: Kafka
  Issue Type: Bug
Reporter: Vincent Jiang


When there is no dirty range to clean (firstDirtyOffset == 
firstUncleanableOffset), buildOffsetMap for the dirty range returns an empty offset 
map, with map.latestOffset = -1.

 

Then the target cleaning offset range becomes [startOffset, map.latestOffset + 1) = 
[startOffset, 0), hence no segments are cleaned.

 

The correct cleaning offset range should be [startOffset, firstDirtyOffset], so 
that the log can be cleaned again to remove abort/commit markers, or tombstones.
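
To make the arithmetic concrete, here is a small illustration with made-up offsets 
(this is not LogCleaner code):

{code:java}
// Illustration of the range collapse described above, with made-up offsets.
public class EmptyDirtyRangeExample {
    public static void main(String[] args) {
        long startOffset = 100L;       // first offset eligible for cleaning
        long firstDirtyOffset = 500L;  // == firstUncleanableOffset in this scenario
        long mapLatestOffset = -1L;    // empty offset map built for the (empty) dirty range

        long actualEndOffset = mapLatestOffset + 1;  // 0 -> range [100, 0) is empty
        long expectedEndOffset = firstDirtyOffset;   // 500 -> range [100, 500) could be re-cleaned

        System.out.println("cleans anything with actual bound:   " + (startOffset < actualEndOffset));   // false
        System.out.println("cleans anything with expected bound: " + (startOffset < expectedEndOffset)); // true
    }
}
{code}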

 

The LogCleanerTest.FakeOffsetMap.clear() method has a bug - it doesn't reset 
lastOffset. This bug causes test cases like testAbortMarkerRemoval() to pass 
false-positively.

  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-14004) Part 3

2022-06-16 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-14004:
-

 Summary: Part 3
 Key: KAFKA-14004
 URL: https://issues.apache.org/jira/browse/KAFKA-14004
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov


This sub-task is used to track the third pull request in a series of pull 
requests which will modify test files in the Streams module as part of moving it 
from JUnit 4 to JUnit 5.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-14003) Part 2

2022-06-16 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-14003:
-

 Summary: Part 2
 Key: KAFKA-14003
 URL: https://issues.apache.org/jira/browse/KAFKA-14003
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov


This sub-task is used to track the second pull request in a series of pull 
requests which will modify test files in the Streams module as part of moving it 
from JUnit 4 to JUnit 5.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-14002) Update Zookeeper client version to 3.8.0

2022-06-16 Thread Kobi Hikri (Jira)
Kobi Hikri created KAFKA-14002:
--

 Summary: Update Zookeeper client version to 3.8.0
 Key: KAFKA-14002
 URL: https://issues.apache.org/jira/browse/KAFKA-14002
 Project: Kafka
  Issue Type: Improvement
Reporter: Kobi Hikri
 Fix For: 3.2.1


We need to update Kafka to use ZooKeeper client version 3.8.0, as it includes 
important bug fixes.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [VOTE] KIP-840: Config file option for MessageReader/MessageFormatter in ConsoleProducer/ConsoleConsumer

2022-06-16 Thread Alexandre Garnier
Hi everyone.

Anyone wants to give a last binding vote for this KIP?

Thanks.

On Tue, Jun 7, 2022 at 14:53, Alexandre Garnier  wrote:

> Hi!
>
> A little reminder to vote for this KIP.
>
> Thanks.
>
>
> On Wed, Jun 1, 2022 at 10:58, Alexandre Garnier  wrote:
> >
> > Hi everyone!
> >
> > I propose to start voting for KIP-840:
> > https://cwiki.apache.org/confluence/x/bBqhD
> >
> > Thanks,
> > --
> > Alex
>


[jira] [Resolved] (KAFKA-13873) Add ability to Pause / Resume KafkaStreams Topologies

2022-06-16 Thread Bruno Cadonna (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruno Cadonna resolved KAFKA-13873.
---
Resolution: Fixed

> Add ability to Pause / Resume KafkaStreams Topologies
> -
>
> Key: KAFKA-13873
> URL: https://issues.apache.org/jira/browse/KAFKA-13873
> Project: Kafka
>  Issue Type: New Feature
>  Components: streams
>Reporter: Jim Hughes
>Assignee: Jim Hughes
>Priority: Major
>  Labels: kip
> Fix For: 3.3.0
>
>
> In order to reduce resources used or modify data pipelines, users may want to 
> pause processing temporarily.  Presently, this would require stopping the 
> entire KafkaStreams instance (or instances).  
> This work would add the ability to pause and resume topologies.  When the 
> need to pause processing has passed, users should be able to resume 
> processing.
> KIP-834: 
> [https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=211882832]
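
For reference, a minimal usage sketch of the API added here, assuming the 
pause()/resume()/isPaused() methods described in KIP-834 (topic names and configs 
below are placeholders):

{code:java}
// Minimal usage sketch, assuming the KafkaStreams pause()/resume()/isPaused()
// methods proposed by KIP-834; topic names and configs are placeholders.
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class PauseResumeExample {
    public static void main(String[] args) throws InterruptedException {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "pause-resume-example");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.stream("input-topic").to("output-topic");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();

        streams.pause();                         // stop processing without shutting down
        System.out.println(streams.isPaused());  // true

        Thread.sleep(10_000);                    // e.g. wait out a maintenance window

        streams.resume();                        // pick up processing again
        streams.close();
    }
}
{code}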



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (KAFKA-14001) Part 1

2022-06-16 Thread Christo Lolov (Jira)
Christo Lolov created KAFKA-14001:
-

 Summary: Part 1
 Key: KAFKA-14001
 URL: https://issues.apache.org/jira/browse/KAFKA-14001
 Project: Kafka
  Issue Type: Sub-task
Reporter: Christo Lolov


This sub-task is used to track the first pull request in a series of pull 
requests which will modify test files in the Streams module as part of moving it 
from JUnit 4 to JUnit 5.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13916) Fenced replicas should not be allowed to join the ISR in KRaft (KIP-841)

2022-06-16 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13916?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-13916.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

> Fenced replicas should not be allowed to join the ISR in KRaft (KIP-841)
> 
>
> Key: KAFKA-13916
> URL: https://issues.apache.org/jira/browse/KAFKA-13916
> Project: Kafka
>  Issue Type: Improvement
>Reporter: David Jacot
>Assignee: David Jacot
>Priority: Major
> Fix For: 3.3.0
>
>
> KIP: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-841%3A+Fenced+replicas+should+not+be+allowed+to+join+the+ISR+in+KRaft



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [DISCUSS] Should we automatically close stale PRs?

2022-06-16 Thread Divij Vaidya
FWIW, I think the "triaged" label is a great idea for reducing the
workload on committers. It would perhaps help reduce our open PR count
and increase the velocity of contributions to the project.

--
Divij Vaidya



On Wed, Jun 8, 2022 at 3:49 PM Viktor Somogyi-Vass
 wrote:

> >One thing that might make sense to do maybe is to add frequent
> contributors
> >with the "triage" role, so they could label PRs they reviewed and they can
> >be taken by committers for a further review and potential merge. What do
> >you think?
>
> In addition to labeling commits as stale I think the opposite as said above
> (triage and label new PRs) is a very good idea too. We often try to review
> each other's commits in the team before/after publishing them upstream.
> Such commits I think would filter the incoming PRs well and make overall
> quality better.
> Would it be possible to do this? I'd be happy to be a triager.
>
> On Sun, Feb 27, 2022 at 4:23 AM Guozhang Wang  wrote:
>
> > Hey David,
> >
> > Just reviving on this thread, do you have some final decision on this now
> > with all the feedbacks received so far?
> >
> > On Sun, Feb 13, 2022 at 8:41 PM Ismael Juma  wrote:
> >
> > > Hi David,
> > >
> > > I think it's a good idea to use the bot for auto closing stale PRs. The
> > > ideal flow would be:
> > >
> > > 1. Write a comment and add stale label
> > > 2. If user responds saying that the PR is still valid, the stale label
> is
> > > removed
> > > 3. Otherwise, the PR is closed
> > >
> > > Thanks,
> > > Ismael
> > >
> > > On Sat, Feb 5, 2022, 2:22 AM David Jacot  wrote:
> > >
> > > > Hi team,
> > > >
> > > > I find our ever-growing backlog of PRs a little frustrating, don't
> > > > you? I just made a pass over the whole list and a huge chunk
> > > > of the PRs are abandoned, outdated or irrelevant to the
> > > > current code base. For instance, we still have PRs opened
> > > > back in 2015.
> > > >
> > > > There is now a Github Action [1] for automatically marking
> > > > PRs as stale and automatically closing them as well. How
> > > > would the community feel about enabling this? I think that
> > > > we could mark a PR as stale after one year and close it
> > > > a month later if there are no new activities. Reopening a
> > > > closed PR is really easy, so there is no real harm in closing
> > > > it.
> > > >
> > > > [1] https://github.com/actions/stale
> > > >
> > >
> >
> >
> > --
> > -- Guozhang
> >
>


Re: [DISCUSS] Apache Kafka 3.3.0 Release

2022-06-16 Thread Divij Vaidya
Hello

*Question#1*: Do we only track the KIPs over here that are blockers for
release or do we track the non-KIP JIRA tickets as well?

If we don't track the JIRA tickets, please ignore the following, but if we
do, I would like to propose that we fix/merge the following before release:
1. https://github.com/apache/kafka/pull/12228 -> Fixes multiple memory
leaks.
2. https://github.com/apache/kafka/pull/12184 -> Fixes an edge case where a
specific configuration for quota values could lead to errors.

*Question#2*: As a non-committer, is there anything that I could help with
for the release process?

Regards,
Divij Vaidya



On Wed, Jun 15, 2022 at 11:10 PM José Armando García Sancio
 wrote:

> Hi all,
>
> This is a friendly reminder that the KIP freeze date is today, June 15th,
> 2022.
>
> The feature freeze date is July 6th, 2022.
>
> Thanks,
> -José
>


Re: [DISCUSS] KIP-847: Add ProducerCount metrics

2022-06-16 Thread Ismael Juma
Thanks for the KIP.

ProducerCount seems like a misleading name since producers without a
producer id are not counted. Is this meant to count the number of producer
IDs tracked by the broker?

Ismael

On Wed, Jun 15, 2022, 3:12 PM Artem Livshits 
wrote:

> Hello,
>
> I'd like to start a discussion on the KIP-847:
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-847%3A+Add+ProducerCount+metrics
> .
>
> -Artem
>