[jira] [Created] (KAFKA-13960) ERROR Shutdown broker because all log dirs in c:\kafka\kafka-logs have failed (kafka.log.LogManager)

2022-06-03 Thread rohit k (Jira)
rohit k created KAFKA-13960:
---

 Summary: ERROR Shutdown broker because all log dirs in 
c:\kafka\kafka-logs have failed (kafka.log.LogManager)
 Key: KAFKA-13960
 URL: https://issues.apache.org/jira/browse/KAFKA-13960
 Project: Kafka
  Issue Type: Bug
  Components: admin
Affects Versions: 3.1.1
Reporter: rohit k
 Fix For: 3.1.1
 Attachments: kafka error.png

I am using Kafka for a chat application with Django REST Framework and React
Native. While building my API I needed an endpoint for group deletion.
Whenever I create a group, it creates a new topic, and the group deletion
endpoint deletes that topic. After deleting a topic I got an error and cannot
recover. Whenever I start the Kafka server I get "ERROR Shutdown broker
because all log dirs in c:\kafka\kafka-logs have failed
(kafka.log.LogManager)". I tried changing log.dirs but the error persists.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13926) Proposal to have "HasField" predicate for kafka connect

2022-06-03 Thread Kumud Kumar Srivatsava Tirupati (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kumud Kumar Srivatsava Tirupati resolved KAFKA-13926.
-
Resolution: Won't Fix

Dropping in favor of improving the existing SMTs as per the discussion.

https://lists.apache.org/thread/odbj7793plyz7xxyy6d71c3xn7zng49f

> Proposal to have "HasField" predicate for kafka connect
> ---
>
> Key: KAFKA-13926
> URL: https://issues.apache.org/jira/browse/KAFKA-13926
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Kumud Kumar Srivatsava Tirupati
>Assignee: Kumud Kumar Srivatsava Tirupati
>Priority: Major
>
> Hello,
> Today's Connect predicates enable checks on the record metadata. However,
> this can be limiting considering *many inbuilt and custom transformations
> that we (the community) use are more key/value-centric*.
> Some use-cases this can solve:
>  * Data type conversions of certain pre-identified fields for records coming 
> across datasets only if those fields exist. [Ex: TimestampConverter can be 
> run only if the specified date field exists irrespective of the record 
> metadata]
>  * Skip running a certain transform if a given field does/does not exist. A lot
> of inbuilt transforms raise exceptions (Ex: the InsertField transform if the
> field already exists), thereby breaking the task. Giving this control enables
> users to consciously configure for such cases.
>  * Even though some inbuilt transforms explicitly handle these cases, it 
> would still be an unnecessary pass-through loop.
>  * Considering each connector usually deals with multiple datasets (Even 100s 
> for a database CDC connector), metadata-centric predicate checking will be 
> somewhat limiting when we talk about such pre-identified custom metadata 
> fields in the records.
> I know some of these cases can be handled within the transforms themselves, but
> that defeats the purpose of having predicates.
> We have built this predicate for ourselves and it has proven extremely helpful.
> Please let me know your thoughts so that I can raise a PR.
>  
> KIP: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-845%3A+%27HasField%27+predicate+for+kafka+connect
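The field-existence check at the heart of the proposed predicate can be sketched in plain Python. This is an illustrative sketch only: real Connect predicates implement the Java `Predicate<R>` interface and operate on Connect records/Structs rather than plain dicts, and the dotted-path handling here sidesteps the field names with dots discussed in the thread.

```python
def has_field(value, path):
    """Return True if a dotted-path field exists in a nested dict value.

    Sketch of the check a HasField-style predicate would perform; the
    record value is modeled as a plain dict for illustration.
    """
    current = value
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return False
        current = current[part]
    return True

record_value = {"id": 7, "meta": {"created_at": "2022-06-03"}}
print(has_field(record_value, "meta.created_at"))  # True
print(has_field(record_value, "ssn"))              # False
```

A predicate built on such a check could then gate any SMT in the chain, e.g. running TimestampConverter only when the date field is present.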





Re: [DISCUSS] KIP-845: HasField predicate for kafka connect

2022-06-03 Thread Kumud Kumar Srivatsava Tirupati
Hi Chris,
Thanks for your comment. I might have misunderstood the filter SMT. Makes
sense. Dropping this KIP for now. I will look at improving the
existing SMTs separately.

*---*
*Thanks and Regards,*
*Kumud Kumar Srivatsava Tirupati*


On Sat, 4 Jun 2022 at 03:57, Chris Egerton  wrote:

> Hi Kumud,
>
> I believe using the Filter SMT with a predicate would also cause the
> record to be dropped from the pipeline. AFAIK there is no way to skip the
> entire transformation chain besides applying a predicate to every SMT in
> the chain. If there's a reasonable use case for wanting to do that based on
> the presence of a field in a record that wouldn't be addressed by updating
> the SMTs that come with Connect to be friendlier when the field they're
> acting on does/does not exist, then we definitely could continue to pursue
> the HasField predicate.
>
> Cheers,
>
> Chris
>
> On Fri, Jun 3, 2022 at 10:52 AM Kumud Kumar Srivatsava Tirupati <
> kumudkumartirup...@gmail.com> wrote:
>
>> Hi Chris,
>> Thanks for the explanation. This clears my thoughts.
>> I can now agree that the concerns are totally different for SMTs and
>> predicates.
>> I also do agree with you that this might encourage SMTs to be poorly
>> designed.
>>
>> Do you see this worth considering just for the filter use case?
>> ['errors.tolerance' would make it drop from the pipeline instead of
>> skipping from the transformation chain]. Maybe by further limiting its
>> functionality to not support nested fields. I can now discard the other
>> use-cases I mentioned in the KIP in favor of "Improving the corresponding
>> SMTs".
>>
>> *---*
>> *Thanks and Regards,*
>> *Kumud Kumar Srivatsava Tirupati*
>> *Ph : +91-8686073938*
>>
>> On Fri, 3 Jun 2022 at 18:04, Chris Egerton 
>> wrote:
>>
>>> Hi Kumud,
>>>
>>> Responses inline.
>>>
>>> > But, I still believe this logic of predicate checking shouldn't be
>>> made a
>>> part of each individual SMT. After all, that is what the predicates are
>>> for
>>> right?
>>>
>>> I don't quite agree. I think the benefit of predicates is that they can
>>> allow you to selectively apply a transformation based on conditions that
>>> don't directly relate to the behavior of the transformation. For example,
>>> "mask field 'ID' if the 'SSN' header is present". The use cases identified
>>> in the KIP don't quite align with this; instead, they're more about working
>>> around undesirable behavior in transformations that makes them difficult to
>>> use in some pipelines. If an SMT is difficult to use, it's better to just
>>> improve the SMT directly instead of adding a layer of indirection on top of
>>> it.
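Chris's example above ("mask field 'ID' if the 'SSN' header is present") can be expressed today with the existing HasHeaderKey predicate; a hedged sketch of the connector configuration (the transform and predicate aliases are illustrative, the class and property names are the real Connect ones):

```properties
transforms=maskId
transforms.maskId.type=org.apache.kafka.connect.transforms.MaskField$Value
transforms.maskId.fields=ID
transforms.maskId.predicate=hasSsnHeader
predicates=hasSsnHeader
predicates.hasSsnHeader.type=org.apache.kafka.connect.transforms.predicates.HasHeaderKey
predicates.hasSsnHeader.name=SSN
```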
>>>
>>> > `HasField` predicate is something very similar but is more powerful
>>> because
>>> it allows users to skip transformations at SMT level or drop them from
>>> the
>>> transformation chain altogether using the current `
>>> org.apache.kafka.connect.transforms.Filter`. So, the use cases are not
>>> just
>>> limited to skipping some SMTs.
>>>
>>> This use case should be included in the KIP if it's a significant
>>> motivating factor for adding this predicate. But it's also possible to drop
>>> entire records from a pipeline today by setting the 'errors.tolerance'
>>> property to 'all' and allowing all records that cause exceptions to be
>>> thrown in the transformation chain to be skipped/written to a DLQ/logged.
>>>
>>> Ultimately, I'm worried that we're encouraging bad habits with SMT
>>> developers by establishing a precedent where core logic that most users
>>> would expect to be present is left out in favor of a predicate, which makes
>>> things harder to configure and can confound new users by adding a layer of
>>> complexity in the form of another pluggable interface.
>>>
>>> Cheers,
>>>
>>> Chris
>>>
>>> On Thu, Jun 2, 2022 at 4:24 AM Kumud Kumar Srivatsava Tirupati <
>>> kumudkumartirup...@gmail.com> wrote:
>>>
 Hi Chris,
 Thanks for your comments.

 I am totally aligned with your comment on nested field names which
 include
 dots. I will incorporate the same based on how the KIP-821 discussion
 goes
 (maybe this parser could be a utility that can be reused in other areas
 as
 well).

 But, I still believe this logic of predicate checking shouldn't be made
 a
 part of each individual SMT. After all, that is what the predicates are
 for
 right?
 If you see, the current predicates that we have are:
 org.apache.kafka.connect.transforms.predicates.TopicNameMatches
 org.apache.kafka.connect.transforms.predicates.HasHeaderKey
 org.apache.kafka.connect.transforms.predicates.RecordIsTombstone
 They only allow you to filter/transform based on the metadata of the
 record.

 `HasField` predicate is something very similar but is more powerful
 because
 it allows users to skip transformations at SMT level or drop them from
 the
 transformation chain altogether using the current `
 org.apache.kafka.connect.tran

Re: Re-run Jenkins build

2022-06-03 Thread Luke Chen
Hi José,

I think you should have the same permissions as I have, and I can log in
to Jenkins.
Anyway, I can help you re-run the Jenkins build.
Just point me to the Jenkins build link.
Or you can actually merge with the latest trunk to trigger the build.

Thank you.
Luke

On Sat, Jun 4, 2022 at 5:52 AM José Armando García Sancio
 wrote:

> Hey all,
>
> I am trying to re-run a Jenkins build. It looks like my Apache login
> doesn't work with Jenkins. Do I need to ask a Kafka PMC to add me to
> the Jenkins infrastructure? I see the following from an Apache wiki
> page.
>
> How do I get an account?
>
> Jenkins uses the Apache LDAP servers for authentication. To get access
> to Jenkins, the committer must be a member of the hudson-jobadmin
> group. This is done using the Whimsy Tool which all PMC Chairs can
> change.
>
> You must subscribe to users@infra and builds@ to receive notifications
> of Jenkins upgrades, outages, etc.
>
>
> https://cwiki.apache.org/confluence/display/INFRA/Jenkins#Jenkins-HowdoIgetanaccount
>
> Thanks,
> --
> -José
>


Re: [DISCUSS] KIP-831: Add metric for log recovery progress

2022-06-03 Thread Luke Chen
Hi Jun,

Thanks for the comment.

I've updated the KIP as:
1. remainingLogsToRecover*y* -> remainingLogsToRecover
2. remainingSegmentsToRecover*y* -> remainingSegmentsToRecover
3. The description of remainingSegmentsToRecover: The remaining segments
for the current log assigned to the recovery thread.

Thank you.
Luke
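The relationship between the two KIP-831 gauges — one shared count of logs left to recover, and one per-thread count of segments left in the log that thread is currently loading — can be modeled roughly as below. This is a hedged sketch, not the broker implementation; class and method names are assumptions.

```python
import threading

class RecoveryMetrics:
    """Toy model of the log recovery gauges discussed in KIP-831."""

    def __init__(self, total_logs):
        self._lock = threading.Lock()
        # remainingLogsToRecover: shared across all recovery threads
        self.remaining_logs_to_recover = total_logs
        # remainingSegmentsToRecover: one gauge per recovery thread, for
        # the log that thread is currently loading
        self.remaining_segments_to_recover = {}

    def start_log(self, thread_id, num_segments):
        # The segment count is only known once the log is loaded,
        # which is why the gauge is per current log, not global.
        with self._lock:
            self.remaining_segments_to_recover[thread_id] = num_segments

    def segment_done(self, thread_id):
        with self._lock:
            self.remaining_segments_to_recover[thread_id] -= 1

    def log_done(self, thread_id):
        with self._lock:
            self.remaining_logs_to_recover -= 1
            del self.remaining_segments_to_recover[thread_id]

metrics = RecoveryMetrics(total_logs=2)
metrics.start_log("thread-1", num_segments=3)
metrics.segment_done("thread-1")
print(metrics.remaining_segments_to_recover["thread-1"])  # 2
metrics.log_done("thread-1")
print(metrics.remaining_logs_to_recover)  # 1
```

Removing a thread's entry in `log_done` mirrors the KIP's decision to drop the metrics once recovery completes.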

On Sat, Jun 4, 2022 at 12:54 AM Jun Rao  wrote:

> Hi, Luke,
>
> Thanks for the explanation.
>
> 10. It makes sense to me now. Instead of using a longer name, perhaps we
> could keep the current name, but make the description clear that it's the
> remaining segments for the current log assigned to a thread. Also, would it
> be better to use ToRecover instead of ToRecovery?
>
> Thanks,
>
> Jun
>
> On Fri, Jun 3, 2022 at 1:18 AM Luke Chen  wrote:
>
> > Hi Jun,
> >
> > > how do we implement kafka.log
> >
> >
> :type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+),
> > which tracks at the segment level?
> >
> > It looks like the name is misleading.
> > Suppose we have 2 log recovery threads
> > (num.recovery.threads.per.data.dir=2), and 10 logs to iterate through.
> > As mentioned before, we don't know how many segments are in each log
> > until the log is iterated (loaded).
> > So, when thread-1 iterates logA, it gets the segments to recover, and
> > exposes the number via the `remainingSegmentsToRecovery` metric.
> > And the same for thread-2 iterating logB.
> > That is, the number shown in `remainingSegmentsToRecovery` is actually
> > the remaining segments to recover "in a specific log".
> >
> > Maybe I should rename it: remainingSegmentsToRecovery ->
> > remainingSegmentsToRecoverInCurrentLog
> > WDYT?
> >
> > Thank you.
> > Luke
> >
> > On Fri, Jun 3, 2022 at 1:27 AM Jun Rao  wrote:
> >
> > > Hi, Luke,
> > >
> > > Thanks for the reply.
> > >
> > > 10. You are saying it's difficult to track the number of segments to
> > > recover. But how do we
> > > implement
> > >
> >
> kafka.log:type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+),
> > > which tracks at the segment level?
> > >
> > > Jun
> > >
> > > On Thu, Jun 2, 2022 at 3:39 AM Luke Chen  wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > Thanks for the comment.
> > > >
> > > > Yes, I've tried to work on this way to track the number of remaining
> > > > segments, but it will change the design in UnifiedLog, so I only
> track
> > > the
> > > > logs number.
> > > > Currently, we will load all segments and recover those segments if
> > needed
> > > > "during creating UnifiedLog instance". And also get the log offsets
> > here
> > > > <
> > > >
> > >
> >
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/UnifiedLog.scala#L1819-L1842
> > > > >
> > > > .
> > > > That is, if we want to get all segments to be recovered before
> running
> > > log
> > > > recovery, we need to break the logic in UnifiedLog, to create a
> partial
> > > > UnifiedLog instance, and add more info to it later, which I think is
> > just
> > > > making the codes more complicated.
> > > >
> > > >
> > > > Thank you.
> > > > Luke
> > > >
> > > >
> > > >
> > > > On Thu, May 26, 2022 at 2:57 AM Jun Rao 
> > > wrote:
> > > >
> > > > > Hi, Luke,
> > > > >
> > > > > Thanks for the KIP. Just one comment.
> > > > >
> > > > > 10. For kafka.log:type=LogManager,name=remainingLogsToRecovery,
> could
> > > we
> > > > > instead track the number of remaining segments? This monitors the
> > > > progress
> > > > > at a finer granularity and is also consistent with the thread level
> > > > metric.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Wed, May 25, 2022 at 7:47 AM Tom Bentley 
> > > wrote:
> > > > >
> > > > > > Thanks Luke! LGTM.
> > > > > >
> > > > > > On Sun, 22 May 2022 at 05:18, Luke Chen 
> wrote:
> > > > > >
> > > > > > > Hi Tom and Raman,
> > > > > > >
> > > > > > > Thanks for your comments.
> > > > > > >
> > > > > > > > 1. There's not a JIRA for this KIP (or the JIRA link needs
> > > > updating).
> > > > > > > 2. Similarly the link to this discussion thread needs updating.
> > > > > > > > Please update the links to JIRA and the discussion thread.
> > > > > > >
> > > > > > > Yes, thanks for the reminder. I've updated the KIP.
> > > > > > >
> > > > > > > > 3. I wonder whether we need to keep these metrics (with value
> > 0)
> > > > once
> > > > > > the
> > > > > > > broker enters the running state. Do you see it as valuable? A
> > > benefit
> > > > > of
> > > > > > > removing the metrics would be a reduction on storage required
> for
> > > > > metric
> > > > > > > stores which are recording these metrics.
> > > > > > >
> > > > > > > Yes, removing the metrics after log recovery completed is a
> good
> > > > idea.
> > > > > > > Updated the KIP.
> > > > > > >
> > > > > > > > 4. I think the KIP's public interfaces section could be a bit
> > > > > clearer.
> > > > > > > Previous KIPs which added metrics usually used a table, with
> the
> > > > MBean
> > > > > > > name, 

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #973

2022-06-03 Thread Apache Jenkins Server
See 




Re-run Jenkins build

2022-06-03 Thread José Armando García Sancio
Hey all,

I am trying to re-run a Jenkins build. It looks like my Apache login
doesn't work with Jenkins. Do I need to ask a Kafka PMC to add me to
the Jenkins infrastructure? I see the following from an Apache wiki
page.

How do I get an account?

Jenkins uses the Apache LDAP servers for authentication. To get access
to Jenkins, the committer must be a member of the hudson-jobadmin
group. This is done using the Whimsy Tool which all PMC Chairs can
change.

You must subscribe to users@infra and builds@ to receive notifications
of Jenkins upgrades, outages, etc.

https://cwiki.apache.org/confluence/display/INFRA/Jenkins#Jenkins-HowdoIgetanaccount

Thanks,
-- 
-José


Re: [DISCUSS] Apache Kafka 3.3.0 Release

2022-06-03 Thread José Armando García Sancio
Thanks for the updates everyone. I added KIP-618, KIP-841, KIP-827 and
KIP-834 to the planned KIPs for 3.3.0.

David Jacot, it looks like KIP-841 is not linked from the "Kafka
Improvement Proposals" page:
https://cwiki.apache.org/confluence/x/4QwIAw

-- 
-José


Jenkins build is unstable: Kafka » Kafka Branch Builder » trunk #972

2022-06-03 Thread Apache Jenkins Server
See 




Re: [DISCUSS] Apache Kafka 3.3.0 Release

2022-06-03 Thread Jim Hughes
Hi José,

Can we add KIP-834 as well? The vote has passed and I have a PR in progress.

Cheers,

Jim

On Fri, Jun 3, 2022 at 1:05 PM Mickael Maison 
wrote:

> Hi José,
>
> Can we also add KIP-827? The vote has passed and I opened a PR today.
>
> Thanks,
> Mickael
>
> On Fri, Jun 3, 2022 at 5:43 PM David Jacot 
> wrote:
> >
> > Hi José,
> >
> > KIP-841 has been accepted. Could we add it to the release plan?
> >
> > Thanks,
> > David
> >
> > On Wed, May 11, 2022 at 11:04 PM Chris Egerton 
> wrote:
> > >
> > > Hi José,
> > >
> > > Could we add KIP-618 (
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-618%3A+Exactly-Once+Support+for+Source+Connectors
> )
> > > to the release plan? It's been blocked on review for almost a year
> now. If
> > > we can't make time to review it before the 3.3.0 release I'll probably
> > > close my PRs and abandon the feature.
> > >
> > > Best,
> > >
> > > Chris
> > >
> > > On Wed, May 11, 2022 at 4:44 PM José Armando García Sancio
> > >  wrote:
> > >
> > > > Great.
> > > >
> > > > I went ahead and created a release page for 3.3.0:
> > > > https://cwiki.apache.org/confluence/x/-xahD
> > > >
> > > > The planned KIP content is based on the list of KIPs targeting 3.3.0
> > > > in https://cwiki.apache.org/confluence/x/4QwIAw. Please take a look
> at
> > > > the list and let me know if I missed your KIP.
> > > >
> > > > The 3.3.0 release plan page also enumerates the potential release
> > > > dates. Here is a summary of them:
> > > > 1. KIP Freeze June 15th, 2022
> > > > 2. Feature Freeze July 6th, 2022
> > > > 3. Code Freeze July 20th, 2022
> > > >
> > > > Thanks and let me know what you think,
> > > > -José
> > > >
>


Re: [DISCUSS] KIP-827: Expose logdirs total and usable space via Kafka API

2022-06-03 Thread Cong Ding
Thank you, Mickael. One more question: are you imagining this
tooling/automation calling the API at a very low frequency, since
high-frequency calls to this API are prohibitively expensive? Can you
give some examples of low-frequency use cases? I can think of some
high-frequency use cases that are valid here, but I had a hard time
coming up with low-frequency ones.

The one you give in the KIP is validating whether disk resize
operations have been completed. However, this looks like a
high-frequency call use case to me because we need to keep monitoring
disk usage before and after resizing.

Cong

On Fri, Jun 3, 2022 at 5:22 AM Mickael Maison  wrote:
>
> Hi Cong,
>
> Maybe some people can do without this KIP.
> But in many cases, especially around tooling and automation, it's
> useful to be able to retrieve disk utilization values via the Kafka
> API rather than interfacing with a metrics system.
>
> Does that clarify the motivation?
>
> Thanks,
> Mickael
>
> On Wed, Jun 1, 2022 at 7:10 PM Cong Ding  wrote:
> >
> > Thanks for the explanation. I think the question is that if we have disk
> > utilization in our environment, what is the use case for KIP-827? The disk
> > utilization in our environment can already do the job. Is there anything I
> > missed?
> >
> > Thanks,
> > Cong
> >
> > On Tue, May 31, 2022 at 2:57 AM Mickael Maison 
> > wrote:
> >
> > > Hi Cong,
> > >
> > > Kafka does not expose disk utilization metrics. This is something you
> > > need to provide in your environment. You definitely should have a
> > > mechanism for exposing metrics from your Kafka broker hosts, and you
> > > should absolutely monitor disk usage and have appropriate alerts.
> > >
> > > Thanks,
> > > Mickael
> > >
> > > On Thu, May 26, 2022 at 7:34 PM Jun Rao  wrote:
> > > >
> > > > Hi, Igor,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > I agree that this KIP could be useful for improving the tool for moving
> > > > data across disks. It would be useful to clarify on the main motivation
> > > of
> > > > the KIP. Also, DescribeLogDirsResponse already includes the size of each
> > > > partition on a disk. So, it seems that UsableBytes is redundant since
> > > it's
> > > > derivable.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Thu, May 26, 2022 at 3:30 AM Igor Soarez  wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > This can also be quite useful to make better use of existing
> > > functionality
> > > > > in the Kafka API — moving replicas between log directories via
> > > > > ALTER_REPLICA_LOG_DIRS. If usable space information is also available
> > > the
> > > > > caller can make better decisions using the same API. It means a more
> > > > > consistent way of interacting with Kafka to manage replicas locations
> > > > > within a broker without having to correlate Kafka metrics with
> > > information
> > > > > from the Kafka API.
> > > > >
> > > > > --
> > > > > Igor
> > > > >
> > > > > On Wed, May 25, 2022, at 8:16 PM, Jun Rao wrote:
> > > > > > Hi, Mickael,
> > > > > >
> > > > > > Thanks for the KIP.  Since this is mostly for monitoring and
> > > alerting,
> > > > > > could we expose them as metrics instead of as part of the API? We
> > > already
> > > > > > have a size metric per log. Perhaps we could extend that to add
> > > > > used/total
> > > > > > metrics per disk?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Thu, May 19, 2022 at 10:21 PM Raman Verma
> > >  > > > > >
> > > > > > wrote:
> > > > > >
> > > > > >> Hello Mikael,
> > > > > >>
> > > > > >> Thanks for the KIP.
> > > > > >>
> > > > > >> I see that the API response contains some information about each
> > > > > partition.
> > > > > >> ```
> > > > > >> { "name": "PartitionSize", "type": "int64", "versions": "0+",
> > > > > >>   "about": "The size of the log segments in this partition in
> > > bytes." }
> > > > > >> ```
> > > > > >> Can this be summed up to provide a used space in a `log.dir`
> > > > > >> This will also be specific to a `log.dir` (for the case where
> > > multiple
> > > > > >> log.dir are hosted on the same underlying device)
> > > > > >>
> > > > > >> On Thu, May 19, 2022 at 10:21 AM Cong Ding
> > > 
> > > > > >> wrote:
> > > > > >> >
> > > > > >> > Hey Mickael,
> > > > > >> >
> > > > > >> > Great KIP!
> > > > > >> >
> > > > > >> > I have one question:
> > > > > >> >
> > > > > >> > You mentioned "DescribeLogDirs is usually a low volume API. This
> > > > > change
> > > > > >> > should not
> > > > > >> > significantly affect the latency of this API." and "That would
> > > allow
> > > > > to
> > > > > >> > easily validate whether disk operations (like a resize), or topic
> > > > > >> deletion
> > > > > >> > (log deletion only happen after a short delay) have completed." I
> > > > > wonder
> > > > > >> if
> > > > > >> > there is an existing metric/API that can allow administrators to
> > > > > >> determine
> > > > > >> > whether we need to resize? If adm

Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #971

2022-06-03 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 632902 lines...]
[2022-06-03T13:19:03.009Z] 
[2022-06-03T13:19:03.009Z] 
org.apache.kafka.streams.integration.SuppressionDurabilityIntegrationTest > 
shouldRecoverBufferAfterShutdown[exactly_once_v2] STARTED
[2022-06-03T13:20:00.037Z] 
[2022-06-03T13:20:00.037Z] 
org.apache.kafka.streams.integration.SuppressionDurabilityIntegrationTest > 
shouldRecoverBufferAfterShutdown[exactly_once_v2] PASSED
[2022-06-03T13:20:00.037Z] 
[2022-06-03T13:20:00.037Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testLeftInner[caching enabled = true] STARTED
[2022-06-03T13:20:02.808Z] 
[2022-06-03T13:20:02.808Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testLeftInner[caching enabled = true] PASSED
[2022-06-03T13:20:02.808Z] 
[2022-06-03T13:20:02.808Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testLeftOuter[caching enabled = true] STARTED
[2022-06-03T13:20:09.766Z] 
[2022-06-03T13:20:09.766Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testLeftOuter[caching enabled = true] PASSED
[2022-06-03T13:20:09.766Z] 
[2022-06-03T13:20:09.766Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testLeftLeft[caching enabled = true] STARTED
[2022-06-03T13:20:16.804Z] 
[2022-06-03T13:20:16.804Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testLeftLeft[caching enabled = true] PASSED
[2022-06-03T13:20:16.804Z] 
[2022-06-03T13:20:16.804Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testOuterLeft[caching enabled = true] STARTED
[2022-06-03T13:20:23.869Z] 
[2022-06-03T13:20:23.869Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testOuterLeft[caching enabled = true] PASSED
[2022-06-03T13:20:23.869Z] 
[2022-06-03T13:20:23.869Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testInner[caching enabled = true] STARTED
[2022-06-03T13:20:30.992Z] 
[2022-06-03T13:20:30.992Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testInner[caching enabled = true] PASSED
[2022-06-03T13:20:30.992Z] 
[2022-06-03T13:20:30.992Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testOuter[caching enabled = true] STARTED
[2022-06-03T13:20:38.185Z] 
[2022-06-03T13:20:38.185Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testOuter[caching enabled = true] PASSED
[2022-06-03T13:20:38.185Z] 
[2022-06-03T13:20:38.185Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testLeft[caching enabled = true] STARTED
[2022-06-03T13:20:45.309Z] 
[2022-06-03T13:20:45.309Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testLeft[caching enabled = true] PASSED
[2022-06-03T13:20:45.309Z] 
[2022-06-03T13:20:45.309Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testInnerInner[caching enabled = true] STARTED
[2022-06-03T13:20:51.014Z] 
[2022-06-03T13:20:51.014Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testInnerInner[caching enabled = true] PASSED
[2022-06-03T13:20:51.014Z] 
[2022-06-03T13:20:51.014Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testInnerOuter[caching enabled = true] STARTED
[2022-06-03T13:20:56.886Z] 
[2022-06-03T13:20:56.886Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testInnerOuter[caching enabled = true] PASSED
[2022-06-03T13:20:56.886Z] 
[2022-06-03T13:20:56.886Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testInnerLeft[caching enabled = true] STARTED
[2022-06-03T13:21:04.013Z] 
[2022-06-03T13:21:04.013Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testInnerLeft[caching enabled = true] PASSED
[2022-06-03T13:21:04.013Z] 
[2022-06-03T13:21:04.013Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testOuterInner[caching enabled = true] STARTED
[2022-06-03T13:21:10.973Z] 
[2022-06-03T13:21:10.973Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testOuterInner[caching enabled = true] PASSED
[2022-06-03T13:21:10.973Z] 
[2022-06-03T13:21:10.973Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testOuterOuter[caching enabled = true] STARTED
[2022-06-03T13:21:17.932Z] 
[2022-06-03T13:21:17.932Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testOuterOuter[caching enabled = true] PASSED
[2022-06-03T13:21:17.932Z] 
[2022-06-03T13:21:17.932Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testLeftInner[caching enabled = false] STARTED
[2022-06-03T13:21:24.905Z] 
[2022-06-03T13:21:24.905Z] 
org.apache.kafka.streams.integration.TableTableJoinIntegrationTest > 
testLeftInner[caching enabled = false] PASSED
[202

[jira] [Created] (KAFKA-13959) Controller should unfence Broker with busy metadata log

2022-06-03 Thread Jose Armando Garcia Sancio (Jira)
Jose Armando Garcia Sancio created KAFKA-13959:
--

 Summary: Controller should unfence Broker with busy metadata log
 Key: KAFKA-13959
 URL: https://issues.apache.org/jira/browse/KAFKA-13959
 Project: Kafka
  Issue Type: Bug
  Components: kraft
Affects Versions: 3.3.0
Reporter: Jose Armando Garcia Sancio


https://issues.apache.org/jira/browse/KAFKA-13955 showed that it is possible 
for the controller to not unfence a broker if the committed offset keeps 
increasing.

 

One solution to this problem is to require the broker to only catch up to the
last committed offset as of when it last sent the heartbeat. For example:
 # The broker sends a heartbeat with a current offset of {{Y}}. The last
committed offset is {{X}}. The controller remembers this last committed
offset; call it {{X'}}.
 # The broker sends another heartbeat with a current offset of {{Z}}. Unfence
the broker if {{Z >= X}} or {{Z >= X'}}.
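The two heartbeat steps above reduce to a simple offset comparison, sketched below. Names are assumptions for illustration; the real check would live in the controller's heartbeat handling.

```python
def should_unfence(broker_offset, committed_offset,
                   committed_at_last_heartbeat):
    """Unfence if the broker has caught up to the current committed
    offset X, or to X' -- the committed offset remembered from when the
    broker's previous heartbeat arrived."""
    return (broker_offset >= committed_offset
            or broker_offset >= committed_at_last_heartbeat)

# Busy metadata log: the committed offset advanced from 100 to 120
# between heartbeats, but the broker reached 105 >= X' (100), so it
# is unfenced instead of chasing a moving target forever.
print(should_unfence(105, 120, 100))  # True
print(should_unfence(90, 120, 100))   # False
```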





Re: [DISCUSS] Apache Kafka 3.3.0 Release

2022-06-03 Thread Mickael Maison
Hi José,

Can we also add KIP-827? The vote has passed and I opened a PR today.

Thanks,
Mickael

On Fri, Jun 3, 2022 at 5:43 PM David Jacot  wrote:
>
> Hi José,
>
> KIP-841 has been accepted. Could we add it to the release plan?
>
> Thanks,
> David
>
> On Wed, May 11, 2022 at 11:04 PM Chris Egerton  
> wrote:
> >
> > Hi José,
> >
> > Could we add KIP-618 (
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-618%3A+Exactly-Once+Support+for+Source+Connectors)
> > to the release plan? It's been blocked on review for almost a year now. If
> > we can't make time to review it before the 3.3.0 release I'll probably
> > close my PRs and abandon the feature.
> >
> > Best,
> >
> > Chris
> >
> > On Wed, May 11, 2022 at 4:44 PM José Armando García Sancio
> >  wrote:
> >
> > > Great.
> > >
> > > I went ahead and created a release page for 3.3.0:
> > > https://cwiki.apache.org/confluence/x/-xahD
> > >
> > > The planned KIP content is based on the list of KIPs targeting 3.3.0
> > > in https://cwiki.apache.org/confluence/x/4QwIAw. Please take a look at
> > > the list and let me know if I missed your KIP.
> > >
> > > The 3.3.0 release plan page also enumerates the potential release
> > > dates. Here is a summary of them:
> > > 1. KIP Freeze June 15th, 2022
> > > 2. Feature Freeze July 6th, 2022
> > > 3. Code Freeze July 20th, 2022
> > >
> > > Thanks and let me know what you think,
> > > -José
> > >


Re: [DISCUSS] KIP-831: Add metric for log recovery progress

2022-06-03 Thread Jun Rao
Hi, Luke,

Thanks for the explanation.

10. It makes sense to me now. Instead of using a longer name, perhaps we
could keep the current name, but make the description clear that it's the
remaining segments for the current log assigned to a thread. Also, would it
be better to use ToRecover instead of ToRecovery?

Thanks,

Jun

On Fri, Jun 3, 2022 at 1:18 AM Luke Chen  wrote:

> Hi Jun,
>
> > how do we implement kafka.log
>
> :type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+),
> which tracks at the segment level?
>
> It looks like the name is misleading.
> Suppose we have 2 log recovery threads
> (num.recovery.threads.per.data.dir=2), and 10 logs to iterate through.
> As mentioned before, we don't know how many segments are in each log until
> the log is iterated (loaded).
> So, when thread-1 iterates logA, it gets the segments to recover, and
> exposes the number via the `remainingSegmentsToRecovery` metric.
> And the same for thread-2 iterating logB.
> That is, the number shown in `remainingSegmentsToRecovery` is actually the
> remaining segments to recover "in a specific log".
>
> Maybe I should rename it: remainingSegmentsToRecovery ->
> remainingSegmentsToRecoverInCurrentLog
> WDYT?
>
> Thank you.
> Luke
>
> On Fri, Jun 3, 2022 at 1:27 AM Jun Rao  wrote:
>
> > Hi, Luke,
> >
> > Thanks for the reply.
> >
> > 10. You are saying it's difficult to track the number of segments to
> > recover. But how do we
> > implement
> >
> kafka.log:type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+),
> > which tracks at the segment level?
> >
> > Jun
> >
> > On Thu, Jun 2, 2022 at 3:39 AM Luke Chen  wrote:
> >
> > > Hi Jun,
> > >
> > > Thanks for the comment.
> > >
> > > Yes, I've tried to work on this way to track the number of remaining
> > > segments, but it will change the design in UnifiedLog, so I only track
> > the
> > > logs number.
> > > Currently, we will load all segments and recover those segments if
> needed
> > > "during creating UnifiedLog instance". And also get the log offsets
> here
> > > <
> > >
> >
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/UnifiedLog.scala#L1819-L1842
> > > >
> > > .
> > > That is, if we want to get all segments to be recovered before running
> > log
> > > recovery, we need to break the logic in UnifiedLog, to create a partial
> > > UnifiedLog instance, and add more info to it later, which I think is
> just
> > > making the codes more complicated.
> > >
> > >
> > > Thank you.
> > > Luke
> > >
> > >
> > >
> > > On Thu, May 26, 2022 at 2:57 AM Jun Rao 
> > wrote:
> > >
> > > > Hi, Luke,
> > > >
> > > > Thanks for the KIP. Just one comment.
> > > >
> > > > 10. For kafka.log:type=LogManager,name=remainingLogsToRecovery, could
> > we
> > > > instead track the number of remaining segments? This monitors the
> > > progress
> > > > at a finer granularity and is also consistent with the thread level
> > > metric.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Wed, May 25, 2022 at 7:47 AM Tom Bentley 
> > wrote:
> > > >
> > > > > Thanks Luke! LGTM.
> > > > >
> > > > > On Sun, 22 May 2022 at 05:18, Luke Chen  wrote:
> > > > >
> > > > > > Hi Tom and Raman,
> > > > > >
> > > > > > Thanks for your comments.
> > > > > >
> > > > > > > 1. There's not a JIRA for this KIP (or the JIRA link needs
> > > updating).
> > > > > > 2. Similarly the link to this discussion thread needs updating.
> > > > > > > Please update the links to JIRA and the discussion thread.
> > > > > >
> > > > > > Yes, thanks for the reminder. I've updated the KIP.
> > > > > >
> > > > > > > 3. I wonder whether we need to keep these metrics (with value
> 0)
> > > once
> > > > > the
> > > > > > broker enters the running state. Do you see it as valuable? A
> > benefit
> > > > of
> > > > > > removing the metrics would be a reduction on storage required for
> > > > metric
> > > > > > stores which are recording these metrics.
> > > > > >
> > > > > > Yes, removing the metrics after log recovery completed is a good
> > > idea.
> > > > > > Updated the KIP.
> > > > > >
> > > > > > > 4. I think the KIP's public interfaces section could be a bit
> > > > clearer.
> > > > > > Previous KIPs which added metrics usually used a table, with the
> > > MBean
> > > > > > name, metric type and description. SeeKIP-551 for example (or
> > > KIP-748,
> > > > > > KIP-608). Similarly you could use a table in the proposed changes
> > > > section
> > > > > > rather than describing the tree you'd see in an MBean console.
> > > > > >
> > > > > > Good point! Updated the KIP to use a table to list the MBean
> name,
> > > > metric
> > > > > > type and descriptions.
> > > > > >
> > > > > >
> > > > > > Thank you.
> > > > > > Luke
> > > > > >
> > > > > > On Fri, May 20, 2022 at 9:13 AM Raman Verma
> > > >  > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Luke,
> > > > > > >
> > > > > > > The change is useful and simple. Thanks.
> > > > > > > Please up

[jira] [Resolved] (KAFKA-13803) Refactor Leader API Access

2022-06-03 Thread Jun Rao (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jun Rao resolved KAFKA-13803.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

merged the PR to trunk

> Refactor Leader API Access
> --
>
> Key: KAFKA-13803
> URL: https://issues.apache.org/jira/browse/KAFKA-13803
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Rittika Adhikari
>Assignee: Rittika Adhikari
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, AbstractFetcherThread has a series of protected APIs which control 
> access to the Leader. ReplicaFetcherThread and ReplicaAlterLogDirsThread 
> respectively override these protected APIs and handle access to the Leader in 
> a remote broker leader and a local leader context.
> We propose to move these protected APIs to a LeaderEndPoint interface, which 
> will serve all fetches from the Leader. We will implement a 
> RemoteLeaderEndPoint and a LocalLeaderEndPoint accordingly. This change will 
> greatly simplify our existing follower fetch code.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Kafka producer fails with expiring 2 record(s) for myTopic-1:120004 ms has passed since batch creation

2022-06-03 Thread Kuttaiah Robin


I have an issue with the Kafka producer in production where I see the below
error:

*"Publish failed, Expiring 2 record(s) for myTopic-1:120004 ms has passed
since batch creation [org.apache.kafka.common.errors.TimeoutException:
Expiring 2 record(s) for myTopic-1:120004 ms has passed since batch
creation]"*

The Kafka brokers are Confluent 5.3.2 and the kafka-client is Apache 2.3.1.
The producer configs explicitly specified in my code are below; the rest are
defaults.

batch.size = 102400
linger.ms = 100
compression.type = lz4
acks = all

*Sample Java Code*

ProducerRecord<String, String> rec =
    new ProducerRecord<>("myTopic", 1, "myKey", "json-payload-here");
producer.send(rec, new ProducerCallback(jsonPayload));

private class ProducerCallback implements Callback {

  private String _ME = "onCompletion";
  private String jsonPayload;

  public ProducerCallback(String jsonPayload) {
    this.jsonPayload = jsonPayload;
  }

  @Override
  public void onCompletion(RecordMetadata recordMetadata, Exception e) {
    if (e == null) {
      LOG.logp(Level.FINEST, _CL, _ME, "Published kafka event " + jsonPayload);
    } else {
      // Note: Exception is logged here.
      LOG.log(Level.SEVERE, "Publish failed, " + e.getMessage(), e);
    }
  }
}

*Couple of questions*

   1. Load is not heavy in production; it is moderate as of now and might
   become heavy in later stages. Am I missing some producer config to rectify
   the above issue?
   2. Assuming 2 records have expired in the batch, is there a way I can
   get those expired records in Java, so that I can get the payload and key
   to republish them?

Thanks, appreciate your help in advance.
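For question 2, one approach is to capture the whole ProducerRecord (not just the payload) in the callback, and queue it for republishing when the error is a TimeoutException. The sketch below uses minimal stand-in types instead of the real org.apache.kafka.clients.producer classes so it is self-contained; names like `retryQueue` are illustrative. In a real client you would also look at `delivery.timeout.ms` (kafka-client 2.1+), which governs how long a batch may sit before expiring, and accept that republishing gives at-least-once (possibly duplicated) delivery.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Minimal stand-ins for the Kafka client types, for illustration only.
class TimeoutException extends RuntimeException {
    TimeoutException(String msg) { super(msg); }
}

class ProducerRecord {
    final String topic; final Integer partition; final String key; final String value;
    ProducerRecord(String topic, Integer partition, String key, String value) {
        this.topic = topic; this.partition = partition; this.key = key; this.value = value;
    }
}

public class RetryOnExpirySketch {
    // Records whose batches expired, kept so key AND value are available to republish.
    static final Queue<ProducerRecord> retryQueue = new ArrayDeque<>();

    // The callback closes over the full ProducerRecord, not just the JSON payload.
    static void onCompletion(ProducerRecord record, Exception e) {
        if (e instanceof TimeoutException) {
            retryQueue.add(record);   // republish later; may produce duplicates
        } else if (e != null) {
            // non-retriable failure: log or dead-letter as appropriate
        }
    }

    public static void main(String[] args) {
        ProducerRecord rec = new ProducerRecord("myTopic", 1, "myKey", "json-payload-here");
        onCompletion(rec, new TimeoutException("Expiring 2 record(s) for myTopic-1"));
        System.out.println("records queued for republish: " + retryQueue.size()); // prints 1
    }
}
```

A draining loop elsewhere can then call `producer.send` again for each queued record.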


Re: [DISCUSS] Apache Kafka 3.3.0 Release

2022-06-03 Thread David Jacot
Hi José,

KIP-841 has been accepted. Could we add it to the release plan?

Thanks,
David

On Wed, May 11, 2022 at 11:04 PM Chris Egerton  wrote:
>
> Hi José,
>
> Could we add KIP-618 (
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-618%3A+Exactly-Once+Support+for+Source+Connectors)
> to the release plan? It's been blocked on review for almost a year now. If
> we can't make time to review it before the 3.3.0 release I'll probably
> close my PRs and abandon the feature.
>
> Best,
>
> Chris
>
> On Wed, May 11, 2022 at 4:44 PM José Armando García Sancio
>  wrote:
>
> > Great.
> >
> > I went ahead and created a release page for 3.3.0:
> > https://cwiki.apache.org/confluence/x/-xahD
> >
> > The planned KIP content is based on the list of KIPs targeting 3.3.0
> > in https://cwiki.apache.org/confluence/x/4QwIAw. Please take a look at
> > the list and let me know if I missed your KIP.
> >
> > The 3.3.0 release plan page also enumerates the potential release
> > dates. Here is a summary of them:
> > 1. KIP Freeze June 15th, 2022
> > 2. Feature Freeze July 6th, 2022
> > 3. Code Freeze July 20th, 2022
> >
> > Thanks and let me know what you think,
> > -José
> >


Re: [VOTE] KIP-841: Fenced replicas should not be allowed to join the ISR in KRaft

2022-06-03 Thread David Jacot
Thanks all!

The vote passes with binding +1 votes from Colin, David, José and myself.

David

On Fri, Jun 3, 2022 at 2:14 AM José Armando García Sancio
 wrote:
>
> Thanks for proposing this improvement David Jacot. I think it is going
> to make the graceful shutdown process much more efficient.
>
> +1 (binding) from me.


Re: Log4j 1.2 update timeline?

2022-06-03 Thread Cleveland, Michael
Any word on this timeline by chance?


Michael Cleveland (He/Him/His)
Senior Mac Support Engineer, Engineering
Working from Syracuse, New York
Office: (516) 600-0927
BreadFinancial.com
Alliance Data is now Bread Financial



From: Cleveland, Michael 
Date: Thursday, June 2, 2022 at 10:16 AM
To: dev@kafka.apache.org 
Subject: Log4j 1.2 update timeline?
Good morning,

I was wondering if there was a timeline when the version of Log4j will be 
updated when using Kafka?

Thank you,
Michael Cleveland

__
The information contained in this e-mail message and any attachments may be 
privileged and confidential. If the reader of this message is not the intended 
recipient or an agent responsible for delivering it to the intended recipient, 
you are hereby notified that any review, dissemination, distribution or copying 
of this communication is strictly prohibited. If you have received this 
communication in error, please notify the sender immediately by replying to 
this e-mail and delete the message and any attachments from your computer.
__

[jira] [Resolved] (KAFKA-13883) KIP-835: Monitor Quorum

2022-06-03 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13883.

Resolution: Fixed

> KIP-835: Monitor Quorum
> ---
>
> Key: KAFKA-13883
> URL: https://issues.apache.org/jira/browse/KAFKA-13883
> Project: Kafka
>  Issue Type: Improvement
>  Components: kraft
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>
> Tracking issue for the implementation of KIP-835.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Resolved] (KAFKA-13918) Schedule or cancel nooprecord write on metadata version change

2022-06-03 Thread Jose Armando Garcia Sancio (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Armando Garcia Sancio resolved KAFKA-13918.

Resolution: Duplicate

> Schedule or cancel nooprecord write on metadata version change
> --
>
> Key: KAFKA-13918
> URL: https://issues.apache.org/jira/browse/KAFKA-13918
> Project: Kafka
>  Issue Type: Sub-task
>  Components: controller
>Reporter: Jose Armando Garcia Sancio
>Assignee: Jose Armando Garcia Sancio
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [DISCUSS] KIP-845: HasField predicate for kafka connect

2022-06-03 Thread Kumud Kumar Srivatsava Tirupati
Hi Chris,
Thanks for the explanation. This clears my thoughts.
I can now agree that the concerns are totally different for SMTs and
predicates.
I also do agree with you that this might encourage SMTs to be poorly
designed.

Do you see this worth considering just for the filter use case?
['errors.tolerance' would make it drop from the pipeline instead of
skipping from the transformation chain]. Maybe by further limiting its
functionality to not support nested fields. I can now discard the other
use-cases I mentioned in the KIP in favor of "Improving the corresponding
SMTs".

*---*
*Thanks and Regards,*
*Kumud Kumar Srivatsava Tirupati*
*Ph : +91-8686073938*

On Fri, 3 Jun 2022 at 18:04, Chris Egerton  wrote:

> Hi Kumud,
>
> Responses inline.
>
> > But, I still believe this logic of predicate checking shouldn't be made a
> part of each individual SMT. After all, that is what the predicates are for
> right?
>
> I don't quite agree. I think the benefit of predicates is that they can
> allow you to selectively apply a transformation based on conditions that
> don't directly relate to the behavior of the transformation. For example,
> "mask field 'ID' if the 'SSN' header is present". The use cases identified
> in the KIP don't quite align with this; instead, they're more about working
> around undesirable behavior in transformations that makes them difficult to
> use in some pipelines. If an SMT is difficult to use, it's better to just
> improve the SMT directly instead of adding a layer of indirection on top of
> it.
>
> > `HasField` predicate is something very similar but is more powerful
> because
> it allows users to skip transformations at SMT level or drop them from the
> transformation chain altogether using the current `
> org.apache.kafka.connect.transforms.Filter`. So, the use cases are not just
> limited to skipping some SMTs.
>
> This use case should be included in the KIP if it's a significant
> motivating factor for adding this predicate. But it's also possible to drop
> entire records from a pipeline today by setting the 'errors.tolerance'
> property to 'all' and allowing all records that cause exceptions to be
> thrown in the transformation chain to be skipped/written to a DLQ/logged.
>
> Ultimately, I'm worried that we're encouraging bad habits with SMT
> developers by establishing a precedent where core logic that most users
> would expect to be present is left out in favor of a predicate, which makes
> things harder to configure and can confound new users by adding a layer of
> complexity in the form of another pluggable interface.
>
> Cheers,
>
> Chris
>
> On Thu, Jun 2, 2022 at 4:24 AM Kumud Kumar Srivatsava Tirupati <
> kumudkumartirup...@gmail.com> wrote:
>
>> Hi Chris,
>> Thanks for your comments.
>>
>> I am totally aligned with your comment on nested field names which include
>> dots. I will incorporate the same based on how the KIP-821 discussion goes
>> (maybe this parser could be a utility that can be reused in other areas as
>> well).
>>
>> But, I still believe this logic of predicate checking shouldn't be made a
>> part of each individual SMT. After all, that is what the predicates are
>> for
>> right?
>> If you see, the current predicates that we have are:
>> org.apache.kafka.connect.transforms.predicates.TopicNameMatches
>> org.apache.kafka.connect.transforms.predicates.HasHeaderKey
>> org.apache.kafka.connect.transforms.predicates.RecordIsTombstone
>> They only allow you to filter/transform based on the metadata of the record.
>>
>> `HasField` predicate is something very similar but is more powerful
>> because
>> it allows users to skip transformations at SMT level or drop them from the
>> transformation chain altogether using the current `
>> org.apache.kafka.connect.transforms.Filter`. So, the use cases are not
>> just
>> limited to skipping some SMTs.
>> If this makes sense, I should probably add this and give an example in the
>> KIP as well.
>>
>> *---*
>> *Thanks and Regards,*
>> *Kumud Kumar Srivatsava Tirupati*
>>
>>
>> On Wed, 1 Jun 2022 at 07:09, Chris Egerton 
>> wrote:
>>
>> > Hi Kumud,
>> >
>> > Thanks for the KIP. I'm a little bit skeptical about the necessity for
>> this
>> > predicate but I think we may be able to satisfy your requirements with a
>> > slightly different approach. The motivation section deals largely with
>> > skipping the invocation of SMTs that expect a certain field to be
>> present
>> > in a record, and will fail if it is not present. This seems like a
>> > reasonable use case; even when your data has a fixed schema, optional
>> > fields are still possible, and preventing SMTs from being used on
>> optional
>> > fields seems unnecessarily restrictive. In fact, it seems like such a
>> > reasonable use case that I wonder if it'd be worth investing in new
>> > SMT-level properties to handle this case, instead of requiring users to
>> > configure a predicate separately alongside them? Something like an
>> > "on.field.missing" property with options of

Re: [DISCUSS] KIP-845: HasField predicate for kafka connect

2022-06-03 Thread Chris Egerton
Hi Kumud,

Responses inline.

> But, I still believe this logic of predicate checking shouldn't be made a
part of each individual SMT. After all, that is what the predicates are for
right?

I don't quite agree. I think the benefit of predicates is that they can
allow you to selectively apply a transformation based on conditions that
don't directly relate to the behavior of the transformation. For example,
"mask field 'ID' if the 'SSN' header is present". The use cases identified
in the KIP don't quite align with this; instead, they're more about working
around undesirable behavior in transformations that makes them difficult to
use in some pipelines. If an SMT is difficult to use, it's better to just
improve the SMT directly instead of adding a layer of indirection on top of
it.

> `HasField` predicate is something very similar but is more powerful
because
it allows users to skip transformations at SMT level or drop them from the
transformation chain altogether using the current `
org.apache.kafka.connect.transforms.Filter`. So, the use cases are not just
limited to skipping some SMTs.

This use case should be included in the KIP if it's a significant
motivating factor for adding this predicate. But it's also possible to drop
entire records from a pipeline today by setting the 'errors.tolerance'
property to 'all' and allowing all records that cause exceptions to be
thrown in the transformation chain to be skipped/written to a DLQ/logged.

Ultimately, I'm worried that we're encouraging bad habits with SMT
developers by establishing a precedent where core logic that most users
would expect to be present is left out in favor of a predicate, which makes
things harder to configure and can confound new users by adding a layer of
complexity in the form of another pluggable interface.

Cheers,

Chris

On Thu, Jun 2, 2022 at 4:24 AM Kumud Kumar Srivatsava Tirupati <
kumudkumartirup...@gmail.com> wrote:

> Hi Chris,
> Thanks for your comments.
>
> I am totally aligned with your comment on nested field names which include
> dots. I will incorporate the same based on how the KIP-821 discussion goes
> (maybe this parser could be a utility that can be reused in other areas as
> well).
>
> But, I still believe this logic of predicate checking shouldn't be made a
> part of each individual SMT. After all, that is what the predicates are for
> right?
> If you see, the current predicates that we have are:
> org.apache.kafka.connect.transforms.predicates.TopicNameMatches
> org.apache.kafka.connect.transforms.predicates.HasHeaderKey
> org.apache.kafka.connect.transforms.predicates.RecordIsTombstone
> They only allow you to filter/transform based on the metadata of the record.
>
> `HasField` predicate is something very similar but is more powerful because
> it allows users to skip transformations at SMT level or drop them from the
> transformation chain altogether using the current `
> org.apache.kafka.connect.transforms.Filter`. So, the use cases are not just
> limited to skipping some SMTs.
> If this makes sense, I should probably add this and give an example in the
> KIP as well.
>
> *---*
> *Thanks and Regards,*
> *Kumud Kumar Srivatsava Tirupati*
>
>
> On Wed, 1 Jun 2022 at 07:09, Chris Egerton 
> wrote:
>
> > Hi Kumud,
> >
> > Thanks for the KIP. I'm a little bit skeptical about the necessity for
> this
> > predicate but I think we may be able to satisfy your requirements with a
> > slightly different approach. The motivation section deals largely with
> > skipping the invocation of SMTs that expect a certain field to be present
> > in a record, and will fail if it is not present. This seems like a
> > reasonable use case; even when your data has a fixed schema, optional
> > fields are still possible, and preventing SMTs from being used on
> optional
> > fields seems unnecessarily restrictive. In fact, it seems like such a
> > reasonable use case that I wonder if it'd be worth investing in new
> > SMT-level properties to handle this case, instead of requiring users to
> > configure a predicate separately alongside them? Something like an
> > "on.field.missing" property with options of "skip", "fail", and/or "warn"
> > could do the trick.
> >
> > It's also worth noting that the proposed syntax for the "field.path"
> > property in the HasField predicate would make it impossible to refer to
> > field names that have dots in them. It hasn't been finalized yet but if
> we
> > do end up going this route, we should probably use the same syntax that's
> > agreed upon for KIP-821 [1].
> >
> > [1] -
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-821%3A+Connect+Transforms+support+for+nested+structures
> >
> > Cheers,
> >
> > Chris
> >
> > On Thu, May 26, 2022 at 5:28 AM Sagar  wrote:
> >
> > > Hi Kumud,
> > >
> > > Thanks for that. I don't have any other comments at this point on the
> > KIP.
> > > LGTM overall.
> > >
> > > Thanks!
> > > Sagar.
> > >
> > > On Wed, May 25, 2022 at 5:14 PM Sagar 
> wrote:
> > >
> > > > T
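For readers following along, here is a self-contained sketch of the core check a HasField predicate would perform. It deliberately models a record as a plain Map and supports only top-level fields (in line with the suggestion above to drop nested-field support); the real thing would implement org.apache.kafka.connect.transforms.predicates.Predicate with configure()/config()/close() and inspect Struct or schemaless Map values. All names here are illustrative.

```java
import java.util.Map;
import java.util.function.Predicate;

// Standalone sketch: the test() logic of a hypothetical HasField predicate.
public class HasFieldSketch {
    // Returns a predicate that is true when the top-level field is present.
    static Predicate<Map<String, Object>> hasField(String fieldName) {
        return record -> record != null && record.containsKey(fieldName);
    }

    public static void main(String[] args) {
        Predicate<Map<String, Object>> p = hasField("ssn");
        System.out.println(p.test(Map.of("ssn", "123", "id", 1))); // true
        System.out.println(p.test(Map.of("id", 1)));               // false
    }
}
```

Combined with a Filter transformation guarded by such a predicate, records lacking the field could be dropped from the pipeline rather than merely skipping one SMT.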

Re: [DISCUSS] KIP-827: Expose logdirs total and usable space via Kafka API

2022-06-03 Thread Mickael Maison
Hi Cong,

Maybe some people can do without this KIP.
But in many cases, especially around tooling and automation, it's
useful to be able to retrieve disk utilization values via the Kafka
API rather than interfacing with a metrics system.

Does that clarify the motivation?

Thanks,
Mickael

On Wed, Jun 1, 2022 at 7:10 PM Cong Ding  wrote:
>
> Thanks for the explanation. I think the question is: if we already have disk
> utilization metrics in our environment, what is the use case for KIP-827? The
> disk utilization metrics in our environment can already do the job. Is there
> anything I missed?
>
> Thanks,
> Cong
>
> On Tue, May 31, 2022 at 2:57 AM Mickael Maison 
> wrote:
>
> > Hi Cong,
> >
> > Kafka does not expose disk utilization metrics. This is something you
> > need to provide in your environment. You definitively should have a
> > mechanism for exposing metrics from your Kafka broker hosts and you
> > should absolutely monitor disk usage and have appropriate alerts.
> >
> > Thanks,
> > Mickael
> >
> > On Thu, May 26, 2022 at 7:34 PM Jun Rao  wrote:
> > >
> > > Hi, Igor,
> > >
> > > Thanks for the reply.
> > >
> > > I agree that this KIP could be useful for improving the tool for moving
> > > data across disks. It would be useful to clarify on the main motivation
> > of
> > > the KIP. Also, DescribeLogDirsResponse already includes the size of each
> > > partition on a disk. So, it seems that UsableBytes is redundant since
> > it's
> > > derivable.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Thu, May 26, 2022 at 3:30 AM Igor Soarez  wrote:
> > >
> > > > Hi,
> > > >
> > > > This can also be quite useful to make better use of existing
> > functionality
> > > > in the Kafka API — moving replicas between log directories via
> > > > ALTER_REPLICA_LOG_DIRS. If usable space information is also available
> > the
> > > > caller can make better decisions using the same API. It means a more
> > > > consistent way of interacting with Kafka to manage replicas locations
> > > > within a broker without having to correlate Kafka metrics with
> > information
> > > > from the Kafka API.
> > > >
> > > > --
> > > > Igor
> > > >
> > > > On Wed, May 25, 2022, at 8:16 PM, Jun Rao wrote:
> > > > > Hi, Mickael,
> > > > >
> > > > > Thanks for the KIP.  Since this is mostly for monitoring and
> > alerting,
> > > > > could we expose them as metrics instead of as part of the API? We
> > already
> > > > > have a size metric per log. Perhaps we could extend that to add
> > > > used/total
> > > > > metrics per disk?
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jun
> > > > >
> > > > > On Thu, May 19, 2022 at 10:21 PM Raman Verma
> >  > > > >
> > > > > wrote:
> > > > >
> > > > >> Hello Mikael,
> > > > >>
> > > > >> Thanks for the KIP.
> > > > >>
> > > > >> I see that the API response contains some information about each
> > > > partition.
> > > > >> ```
> > > > >> { "name": "PartitionSize", "type": "int64", "versions": "0+",
> > > > >>   "about": "The size of the log segments in this partition in
> > bytes." }
> > > > >> ```
> > > > >> Can this be summed up to provide a used space in a `log.dir`
> > > > >> This will also be specific to a `log.dir` (for the case where
> > multiple
> > > > >> log.dir are hosted on the same underlying device)
> > > > >>
> > > > >> On Thu, May 19, 2022 at 10:21 AM Cong Ding
> > 
> > > > >> wrote:
> > > > >> >
> > > > >> > Hey Mickael,
> > > > >> >
> > > > >> > Great KIP!
> > > > >> >
> > > > >> > I have one question:
> > > > >> >
> > > > >> > You mentioned "DescribeLogDirs is usually a low volume API. This
> > > > change
> > > > >> > should not
> > > > >> > significantly affect the latency of this API." and "That would
> > allow
> > > > to
> > > > >> > easily validate whether disk operations (like a resize), or topic
> > > > >> deletion
> > > > >> > (log deletion only happen after a short delay) have completed." I
> > > > wonder
> > > > >> if
> > > > >> > there is an existing metric/API that can allow administrators to
> > > > >> determine
> > > > >> > whether we need to resize? If administrators use this API to
> > determine
> > > > >> > whether we need a resize, would this API become a high-volume
> > API? I
> > > > >> > understand we don't want this API to be a high-volume one because
> > the
> > > > API
> > > > >> > is already costly by returning `"name": "Topics"`.
> > > > >> >
> > > > >> > Cong
> > > > >> >
> > > > >> > On Thu, Apr 7, 2022 at 2:17 AM Mickael Maison <
> > > > mickael.mai...@gmail.com>
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Hi,
> > > > >> > >
> > > > >> > > I wrote a small KIP to expose the total and usable space of
> > logdirs
> > > > >> > > via the DescribeLogDirs API:
> > > > >> > >
> > > > >> > >
> > > > >>
> > > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-827%3A+Expose+logdirs+total+and+usable+space+via+Kafka+API
> > > > >> > >
> > > > >> > > Please take a look and let me know if you have any feedback.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Mickael
> 
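To make the proposal concrete, the new per-log-dir fields in the DescribeLogDirs response could look something like the sketch below, in the same schema style as the existing PartitionSize field quoted above. The field names (TotalBytes, UsableBytes — the latter mentioned earlier in this thread) and versions are illustrative only; the KIP text is authoritative.

```json
{ "name": "TotalBytes", "type": "int64", "versions": "0+",
  "about": "The total size in bytes of the volume the log directory is in." },
{ "name": "UsableBytes", "type": "int64", "versions": "0+",
  "about": "The usable (free) size in bytes of the volume the log directory is in." }
```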

[jira] [Created] (KAFKA-13958) Expose logdirs total and usable space via Kafka API

2022-06-03 Thread Mickael Maison (Jira)
Mickael Maison created KAFKA-13958:
--

 Summary: Expose logdirs total and usable space via Kafka API
 Key: KAFKA-13958
 URL: https://issues.apache.org/jira/browse/KAFKA-13958
 Project: Kafka
  Issue Type: Bug
Reporter: Mickael Maison
Assignee: Mickael Maison


JIRA for KIP-827: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-827%3A+Expose+logdirs+total+and+usable+space+via+Kafka+API



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: Log4j 1.2 update timeline?

2022-06-03 Thread Tom Bentley
Hi Michael,

KIP-653 seeks to update the log4jv1 dependency to log4jv2. Following a
discussion a month or so ago [1] I think the plan is to do this in Kafka
4.0. The discussion for KIP-833 [2] suggests we'll see Kafka 3.5 followed
by Kafka 4.0, which should be around August 2023 (assuming we stick to the
usual time-based releases).

If you're asking for log4shell reasons, Kafka 3.2.0 and 3.1.1 use reload4j
instead of the original log4jv1 implementation.

Hopefully this answers your question.

Kind regards,

Tom

[1]: https://lists.apache.org/thread/qo1y3249xldt4cpg6r8zkcq5m1q32bf1
[2]: https://lists.apache.org/thread/90zkqvmmw3y8j6tkgbg3md78m7hs4yn6

On Thu, 2 Jun 2022 at 18:03, Cleveland, Michael <
michael.clevel...@breadfinancial.com> wrote:

> Good morning,
>
> I was wondering if there was a timeline when the version of Log4j will be
> updated when using Kafka?
>
> Thank you,
> Michael Cleveland
>


Re: [DISCUSS] KIP-831: Add metric for log recovery progress

2022-06-03 Thread Luke Chen
Hi Jun,

> how do we implement
> kafka.log:type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+),
> which tracks at the segment level?

It looks like the name is misleading.
Suppose we have 2 log recovery threads (num.recovery.threads.per.data.dir=2)
and 10 logs to iterate through.
As mentioned before, we don't know how many segments are in each log until
the log is iterated (loaded).
So, when thread-1 iterates logA, it gets the segments to recover and
exposes the number via the `remainingSegmentsToRecovery` metric.
The same goes for thread-2 iterating logB.
That is, the value shown in `remainingSegmentsToRecovery` is actually the
remaining segments to recover "in a specific log".

Maybe I should rename it: remainingSegmentsToRecovery ->
remainingSegmentsToRecoverInCurrentLog
WDYT?

Thank you.
Luke
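A minimal sketch of the bookkeeping described above, assuming each recovery thread publishes the remaining-segment count for the log it is currently loading. The thread names, method names, and the plain map standing in for JMX gauges are all illustrative; the real broker would register these values under kafka.log:type=LogManager.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the state behind a per-thread remainingSegmentsToRecover(InCurrentLog) gauge.
public class RecoveryProgressSketch {
    // threadName -> segments still to recover in the log that thread is currently loading
    static final Map<String, Integer> remainingSegments = new ConcurrentHashMap<>();

    // Called when a recovery thread starts iterating a log and learns its segment count.
    static void startLog(String threadName, int segmentCount) {
        remainingSegments.put(threadName, segmentCount);
    }

    // Called after the thread recovers each segment of its current log.
    static void segmentRecovered(String threadName) {
        remainingSegments.computeIfPresent(threadName, (t, n) -> n - 1);
    }

    public static void main(String[] args) {
        startLog("recovery-thread-1", 3);   // thread-1 starts loading logA: 3 segments
        segmentRecovered("recovery-thread-1");
        startLog("recovery-thread-2", 5);   // thread-2 starts loading logB: 5 segments
        System.out.println(remainingSegments.get("recovery-thread-1")); // prints 2
        System.out.println(remainingSegments.get("recovery-thread-2")); // prints 5
    }
}
```

This also shows why the gauge is per current log: the count resets each time a thread moves on to the next log, since total segments across all logs are unknown up front.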

On Fri, Jun 3, 2022 at 1:27 AM Jun Rao  wrote:

> Hi, Luke,
>
> Thanks for the reply.
>
> 10. You are saying it's difficult to track the number of segments to
> recover. But how do we
> implement
> kafka.log:type=LogManager,name=remainingSegmentsToRecovery,dir=([-._\/\w\d\s]+),threadNum=([0-9]+),
> which tracks at the segment level?
>
> Jun
>
> On Thu, Jun 2, 2022 at 3:39 AM Luke Chen  wrote:
>
> > Hi Jun,
> >
> > Thanks for the comment.
> >
> > Yes, I tried tracking the number of remaining segments this way, but it
> > would change the design of UnifiedLog, so I only track the number of logs.
> > Currently, we will load all segments and recover those segments if needed
> > "during creating UnifiedLog instance". And also get the log offsets here
> > <
> >
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/UnifiedLog.scala#L1819-L1842
> > >
> > .
> > That is, if we want to get all segments to be recovered before running log
> > recovery, we need to break the logic in UnifiedLog to create a partial
> > UnifiedLog instance and add more info to it later, which I think just
> > makes the code more complicated.
> >
> >
> > Thank you.
> > Luke
> >
> >
> >
> > On Thu, May 26, 2022 at 2:57 AM Jun Rao 
> wrote:
> >
> > > Hi, Luke,
> > >
> > > Thanks for the KIP. Just one comment.
> > >
> > > 10. For kafka.log:type=LogManager,name=remainingLogsToRecovery, could
> we
> > > instead track the number of remaining segments? This monitors the
> > progress
> > > at a finer granularity and is also consistent with the thread level
> > metric.
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Wed, May 25, 2022 at 7:47 AM Tom Bentley 
> wrote:
> > >
> > > > Thanks Luke! LGTM.
> > > >
> > > > On Sun, 22 May 2022 at 05:18, Luke Chen  wrote:
> > > >
> > > > > Hi Tom and Raman,
> > > > >
> > > > > Thanks for your comments.
> > > > >
> > > > > > 1. There's not a JIRA for this KIP (or the JIRA link needs
> > updating).
> > > > > 2. Similarly the link to this discussion thread needs updating.
> > > > > > Please update the links to JIRA and the discussion thread.
> > > > >
> > > > > Yes, thanks for the reminder. I've updated the KIP.
> > > > >
> > > > > > 3. I wonder whether we need to keep these metrics (with value 0)
> > once
> > > > the
> > > > > broker enters the running state. Do you see it as valuable? A
> benefit
> > > of
> > > > > removing the metrics would be a reduction on storage required for
> > > metric
> > > > > stores which are recording these metrics.
> > > > >
> > > > > Yes, removing the metrics after log recovery completed is a good
> > idea.
> > > > > Updated the KIP.
> > > > >
> > > > > > 4. I think the KIP's public interfaces section could be a bit clearer.
> > > > > > Previous KIPs which added metrics usually used a table, with the MBean
> > > > > > name, metric type and description. See KIP-551 for example (or KIP-748,
> > > > > > KIP-608). Similarly you could use a table in the proposed changes section
> > > > > > rather than describing the tree you'd see in an MBean console.
> > > > >
> > > > > Good point! Updated the KIP to use a table listing the MBean name,
> > > > > metric type and descriptions.
> > > > >
> > > > >
> > > > > Thank you.
> > > > > Luke
> > > > >
> > > > > On Fri, May 20, 2022 at 9:13 AM Raman Verma wrote:
> > > > >
> > > > > > Hi Luke,
> > > > > >
> > > > > > The change is useful and simple. Thanks.
> > > > > > Please update the links to JIRA and the discussion thread.
> > > > > >
> > > > > > Best Regards,
> > > > > > Raman Verma
> > > > > >
> > > > > > On Thu, May 19, 2022 at 8:57 AM Tom Bentley wrote:
> > > > > > >
> > > > > > > Hi Luke,
> > > > > > >
> > > > > > > Thanks for the KIP. I think the idea makes sense and would provide
> > > > > > > useful observability of log recovery. I have a few comments.
> > > > > > >
> > > > > > > 1. There's not a JIRA for this KIP (or the JIRA link needs updating).
> > > > > > > 2. Similarly the link to this discussion thread needs updating.
> > > > > > > 3. I wonder whether we need to keep these metrics (with value
>
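[Editorial note: the metric discussed above, a segment-level gauge that is finer-grained than the per-log gauge, and that is removed once recovery completes, can be sketched as a minimal model. This is a hypothetical Python illustration only, not Kafka's actual Scala implementation in LogManager; the class and method names are invented.]

```python
# Hypothetical sketch of the proposed recovery gauges: a per-log count
# ("remainingLogsToRecover") alongside the finer-grained per-segment count
# ("remainingSegmentsToRecover"). Names are invented for illustration.

class RecoveryMetrics:
    def __init__(self, segments_per_log):
        # segments_per_log: dict mapping log name -> segments left to recover
        self.remaining_logs = dict(segments_per_log)
        self.remaining_segments = sum(segments_per_log.values())

    def segment_recovered(self, log_name):
        """Called after each segment is recovered; drops fully recovered logs."""
        self.remaining_logs[log_name] -= 1
        self.remaining_segments -= 1
        if self.remaining_logs[log_name] == 0:
            del self.remaining_logs[log_name]

    @property
    def done(self):
        # Once recovery finishes, the gauges would be unregistered entirely
        # rather than left at 0, per the discussion above.
        return self.remaining_segments == 0


metrics = RecoveryMetrics({"topic-0": 3, "topic-1": 1})
metrics.segment_recovered("topic-1")
print(len(metrics.remaining_logs), metrics.remaining_segments)  # 1 3
```

The segment-level count moves on every recovered segment, so a metrics store sees steady progress even while a single large log is still being recovered.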

Re: [DISCUSS] KIP-841: Fenced replicas should not be allowed to join the ISR in KRaft

2022-06-03 Thread David Jacot
That's correct.

David

On Fri, Jun 3, 2022 at 2:11 AM José Armando García Sancio wrote:
>
> David Jacot wrote:
> > At the moment, the KIP stipulates that the broker remains in
> > InControlledShutdown state until it is re-registered with a new
> > incarnation id. This implies that a broker can be both fenced and in
> > controlled shutdown state. We could make them mutually exclusive but I
> > think that there is value in the current proposal because we are able
> > to differentiate if a broker was fenced due to the controlled shutdown
> > or not.
>
> Thanks David. Is this the reason why the BrokerRegistrationChangeRecord says:
>
> > { "name": "InControlledShutdown", "type": "int8", "versions": "1+", 
> > "taggedVersions": "1+", "tag": 1,
> > "about": "0 if no change, 1 if the broker is in controlled shutdown." }
>
> In other words, the only way to change InControlledShutdown to false
> is to create a new registration with a new incarnation id.
>
> > The broker will leave this state when it registers itself with a new 
> > incarnation id.
>
> -José
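[Editorial note: the state machine described in this exchange, a broker that can be simultaneously fenced and in controlled shutdown, with a new incarnation id as the only way out of InControlledShutdown, can be sketched as a minimal model. This is a hypothetical Python illustration, not the KRaft controller's actual code; `apply_change` and `re_register` are invented names, and the assumption that entering controlled shutdown also fences the broker follows the discussion above.]

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class BrokerRegistration:
    incarnation_id: str
    fenced: bool = False
    in_controlled_shutdown: bool = False

def apply_change(reg, in_controlled_shutdown_delta):
    """Apply a BrokerRegistrationChangeRecord-style delta:
    0 = no change, 1 = the broker entered controlled shutdown."""
    if in_controlled_shutdown_delta == 1:
        # Assumption from the thread: the broker is then both fenced
        # and in controlled shutdown -- the states are not exclusive.
        return replace(reg, fenced=True, in_controlled_shutdown=True)
    return reg

def re_register(reg, new_incarnation_id):
    """A fresh registration with a new incarnation id resets the state;
    this is the only transition that clears InControlledShutdown."""
    return BrokerRegistration(incarnation_id=new_incarnation_id)


reg = BrokerRegistration("inc-1")
reg = apply_change(reg, 1)       # broker begins controlled shutdown
assert reg.fenced and reg.in_controlled_shutdown
reg = re_register(reg, "inc-2")  # only path out of InControlledShutdown
assert not reg.in_controlled_shutdown
```

Keeping the two flags separate is what lets the controller distinguish a broker fenced because of controlled shutdown from one fenced for any other reason.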