Permission to contribute to Apache Kafka

2023-06-26 Thread kafka
Hi,

I'd like to request permissions to contribute to Apache Kafka. My account 
details are as follows:

# Wiki

Email: gaurav_naru...@apple.com 
Username: gnarula

# JIRA

Email: gaurav_naru...@apple.com 
Username: gnarula

Regards,
Gaurav

Re: [DISCUSS] KIP-941: Range queries to accept null lower and upper bounds

2023-06-26 Thread Lucia Cerchie
Thanks for asking for clarification, Sophie; that gives me guidance on
improving the KIP! Here's the updated version, including the JIRA link:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-941%3A+Range+queries+to+accept+null+lower+and+upper+bounds


On Thu, Jun 22, 2023 at 12:57 PM Sophie Blee-Goldman 
wrote:

> Hey Lucia, thanks for the KIP! Just some minor notes:
>
> I'm in favor of the proposal overall, at least I think so -- for someone
> not intimately familiar with the new IQ API and *RangeQuery* class, the KIP
> was a bit difficult to follow and I had to read between the lines
> to figure out what the old behavior was and what the new and improved logic
> would do.
>
> It would be good to state clearly in the beginning what happens when null
> is passed in right now, and what will happen after this KIP is implemented.
> For example in the "Public Interfaces" section, I couldn't tell if the
> middle sentence was describing what was changing, or what it was changing
> *to.*
>
> One last little thing: can you link to the jira ticket at the top? And
> please create one if it doesn't already exist -- it helps people figure out
> when a KIP has been implemented and in which versions, as well as navigate
> from the KIP to the actual code that was merged. Things can change during
> implementation and the KIP document is how most people read up on new
> features, but almost all of us are probably guilty of forgetting to update
> the KIP document. So it's important to be able to find the code when in
> doubt.
>
> Otherwise nice KIP!
>
> On Thu, Jun 22, 2023 at 8:19 AM Lucia Cerchie
> 
> wrote:
>
> > Thanks Kirk and John for the valuable feedback!
> >
> > John, I'll update the KIP to reflect that nuance you mention -- yes it is
> > just about making the withRange method more permissive. Thanks for the
> > testing file as well, I'll be sure to write my test cases there.
> >
> > On Wed, Jun 21, 2023 at 10:50 AM Kirk True  wrote:
> >
> > > Hi John/Lucia,
> > >
> > > Thanks for the feedback!
> > >
> > > Of course I only noticed the private-ness of the RangeQuery constructor
> > > moments after sending my email ¯\_(ツ)_/¯
> > >
> > > Just to be clear, I’m happy with the proposed change as it conforms to
> > > Postel’s Law ;) Apologies that it was worded tersely.
> > >
> > > Thanks,
> > > Kirk
> > >
> > > > On Jun 21, 2023, at 10:20 AM, John Roesler 
> > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > Thanks for the KIP, Lucia! This is a nice change.
> > > >
> > > > To Kirk's question (1), the example is a bit misleading. The typical
> > > case that would ease user pain is specifically using "null" to indicate
> > an
> > > open-ended range, especially since null is not a valid key.
> > > >
> > > > I could additionally see an empty string as being nice, but the
> actual
> > > API is generic, not String, so there's no meaningful concept of
> > > empty/blank/whitespace that we could check for, just null or not.
> > > >
> > > > Regarding (2), there's no public factory that takes Optional
> > parameters.
> > > I think you're looking at the private constructor. An alternative Lucia
> > > could consider is to instead propose adding a new factory like
> > > `withRange(Optional lower, Optional upper)`.
> > > >
> > > > FWIW, I'd be in favor of this KIP as proposed.
> > > >
> > > > A couple of smaller notes:
> > > >
> > > > 3. In the compatibility notes, I wasn't sure what "web request" was
> > > referring to. I think you just mean that all existing valid API calls
> > will
> > > continue to work the same, and we're only making the withRange method
> > more
> > > permissive with its arguments.
> > > >
> > > > 4. For the Test Plan, I wrote some tests that validate these queries
> > > against every kind and configuration of store possible. Please add your
> > new
> > > test cases to that one to make absolutely sure it'll work for every
> > store.
> > > Obviously, you may also want to add some specific unit tests in
> addition.
> > > >
> > > > See
> > >
> >
> https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/integration/IQv2StoreIntegrationTest.java
> > > >
> > > > Thanks again!
> > > > -John
> > > >
> > > > On 6/21/23 12:00, Kirk True wrote:
> > > >> Hi Lucia,
> > > >> One question:
> > > >> 1. Since the proposed implementation change for withRange() method
> > uses
> > > Optional.ofNullable() (which only catches nulls and not
> blank/whitespace
> > > strings), wouldn’t users still need to have code like that in the
> > example?
> > > >> 2. Why don't users create RangeQuery objects that use Optional
> > > directly? What’s the benefit of introducing what appears to be a very
> > thin
> > > utility facade?
> > > >> Thanks,
> > > >> Kirk
> > > >>> On Jun 21, 2023, at 9:51 AM, Kirk True  wrote:
> > > >>>
> > > >>> Hi Lucia,
> > > >>>
> > > >>> Thanks for the KIP!
> > > >>>
> > > >>> The KIP wasn’t in the email and I didn’t see it on the main KIP
> > > directory. Here it is:
> 

[jira] [Created] (KAFKA-15126) Change range queries to accept null lower and upper bounds

2023-06-26 Thread Lucia Cerchie (Jira)
Lucia Cerchie created KAFKA-15126:
-

 Summary: Change range queries to accept null lower and upper bounds
 Key: KAFKA-15126
 URL: https://issues.apache.org/jira/browse/KAFKA-15126
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: Lucia Cerchie
Assignee: Lucia Cerchie


{color:#1d1c1d}When web client requests come in with query params, it's common 
for those params to be null. We want developers to just be able to pass in the 
upper/lower bounds if they want instead of implementing their own logic to 
avoid getting the whole range (which will happen if they leave the params 
null). {color}

{color:#1d1c1d}An example of the logic they can avoid using after this KIP is 
implemented is below:{color}
{code:java}
// Note: the RangeQuery type parameters were stripped by the mailing-list archive;
// the value type is shown generically here.
private <V> RangeQuery<String, V> createRangeQuery(String lower, String upper) {
    if (isBlank(lower) && isBlank(upper)) {
        return RangeQuery.withNoBounds();
    } else if (!isBlank(lower) && isBlank(upper)) {
        return RangeQuery.withLowerBound(lower);
    } else if (isBlank(lower) && !isBlank(upper)) {
        return RangeQuery.withUpperBound(upper);
    } else {
        return RangeQuery.withRange(lower, upper);
    }
}
{code}
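
For contrast, a minimal sketch (not part of the original ticket text) of what the same call site could reduce to once withRange() accepts null bounds; the type parameters are assumed for illustration:

{code:java}
// Hypothetical post-KIP usage: a null bound simply means "unbounded on that side",
// so the branching above is no longer needed.
private <V> RangeQuery<String, V> createRangeQuery(String lower, String upper) {
    return RangeQuery.withRange(lower, upper);
}
{code}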
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15125) Break down SocketServer classes into separate files

2023-06-26 Thread Jae Wie (Jira)
Jae Wie created KAFKA-15125:
---

 Summary: Break down SocketServer classes into separate files
 Key: KAFKA-15125
 URL: https://issues.apache.org/jira/browse/KAFKA-15125
 Project: Kafka
  Issue Type: Task
  Components: network
Reporter: Jae Wie


Since SocketServer is almost 2000 lines of code, its classes should be moved 
into separate files to make them more manageable.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [kafka-site] divijvaidya commented on pull request #528: MINOR: Add statmenet about ZK deprecation to 3.5 release blog post

2023-06-26 Thread via GitHub


divijvaidya commented on PR #528:
URL: https://github.com/apache/kafka-site/pull/528#issuecomment-1607874433

   > I believe the most important points are the following:
   - update the blog post (with minimum information and link to the docs)
   - update the docs with ZK deprecation section
   -- explain what deprecation means etc etc (everything mentioned above)
   -- provide tentative timeline for 4.0 (and make clear, it's tentative)
   
   
   My concerns are covered by the points here, so I am good.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka-site] mjsax commented on pull request #528: MINOR: Add statmenet about ZK deprecation to 3.5 release blog post

2023-06-26 Thread via GitHub


mjsax commented on PR #528:
URL: https://github.com/apache/kafka-site/pull/528#issuecomment-1607861293

   @mimaison @ijuma @divijvaidya -- any thoughts? Would be nice to move this 
forward.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Resolved] (KAFKA-15122) Moving partitions between log dirs leads to kafka.log:type=Log metrics being deleted

2023-06-26 Thread Mickael Maison (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mickael Maison resolved KAFKA-15122.

Fix Version/s: 3.5.0
   Resolution: Duplicate

Duplicate of https://issues.apache.org/jira/browse/KAFKA-14544 which is fixed 
in 3.5.0

> Moving partitions between log dirs leads to kafka.log:type=Log metrics being 
> deleted
> 
>
> Key: KAFKA-15122
> URL: https://issues.apache.org/jira/browse/KAFKA-15122
> Project: Kafka
>  Issue Type: Task
>Affects Versions: 3.5.0
>Reporter: Mickael Maison
>Priority: Major
> Fix For: 3.5.0
>
>
> # Start a broker with 2 log directories
> # Create a topic-partition
> Metrics with the following names are created: 
> kafka.log:type=Log,name=Size,topic=,partition=0
> # Using kafka-reassign-partitions move that partition to the other log 
> directory
> A tag is-future=true is added to the existing metrics, 
> kafka.log:type=Log,name=Size,topic=,partition=0,is-future=true
> # Using kafka-reassign-partitions move that partition back to its original 
> log directory
> The metrics are deleted!
> I don't expect the metrics to be renamed during the first reassignment. The 
> metrics should not be deleted during the second reassignment, the topic still 
> exists. Restarting the broker resolves the issue.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-940: Broker extension point for validating record contents at produce time

2023-06-26 Thread Andrew Schofield
Hi Edo,
Thanks for the KIP. Looks like a useful improvement. I have some 
comments/questions.

4) For a new interface, I wonder whether it would be better to use 
TopicIdPartition rather
than TopicPartition. Topic IDs are gradually spreading across the public 
interfaces for Kafka.

5) The new topic config is called `record.validation.policy`. The javadoc for 
the validationPolicy()
method says `validation.policy`.

6) I’m surprised that you need a `HeaderProxy` interface when `Headers` and 
`Header` are
already interfaces. I would have expected it was possible to create proxy 
instances of the
headers using the existing interfaces with a little cunning.

Thanks,
Andrew

> On 26 Jun 2023, at 16:05, Edoardo Comar  wrote:
>
> Hi Kirk,
> thanks for your comments.
>
>> 1. Does record.validation.policy.class.name support multiple classes, or 
>> just one? I’m probably not wrapping my head around it,
>> but I’d imagine different policies for different families or groupings of 
>> topics, thus the need for supporting multiple policies.
>> But if there are multiple then you run the risk of conflicts of ownership of 
>> validation, so ¯\_(ツ)_/¯
>
> We have only identified the use case for a single policy, but as
> mentioned in the Rejected Alternatives section of the KIP, we think a
> Facade-like policy could be written by a Kafka admin to dispatch
> validation to different policies. Allowing multiple policies to be
> specified would introduce complexities (as you noted), and we wanted
> to avoid making too many assumptions without having a strong use-case
> for this support.
>
>> 2. Is there any concern that a validation class may alter the contents of 
>> the ByteBuffer of the key and/or value?
>> Perhaps that’s already handled and/or outside the scope of this KIP?
>
> Good point. This behaviour isn’t defined, and what happens could
> change between Kafka versions (or depending on compression settings).
> We have modified the Javadoc in the KIP to indicate that the
> ByteBuffers are read-only wrappers, as we don’t intend this plug-point
> to be used for modifying message data. We also spotted that this was a
> problem for returning headers (as the common Header interface returns
> a byte array from one of its methods), and have updated the Javadoc
> accordingly.
>
>
>> 3. What is the benefit to introducing the inner TopicMetadata and 
>> RecordProxy interfaces vs.
>> passing the TopicPartition, String (validation policy), and Record into the 
>> validate() method directly?
>
> We wanted to avoid Record as the package
> org.apache.kafka.common.record is documented as *not* being a
> published API, and doesn’t necessarily maintain compatibility between
> versions. This was highlighted as a potential problem during the
> discussion of KIP-729.
>
> We designed the API using interfaces as arguments to the methods, so
> that further properties can be added in the future without breaking
> existing implementations.
>
>
> On Wed, 21 Jun 2023 at 17:08, Kirk True  wrote:
>>
>> Hi Edo/Adrian!
>>
>> Thanks for the KIP.
>>
>> I have some questions, and apologies that they may fall under the “stupid” 
>> column because I’m not that familiar with this area :)
>>
>> 1. Does record.validation.policy.class.name support multiple classes, or 
>> just one? I’m probably not wrapping my head around it, but I’d imagine 
>> different policies for different families or groupings of topics, thus the 
>> need for supporting multiple policies. But if there are multiple then you 
>> run the risk of conflicts of ownership of validation, so ¯\_(ツ)_/¯
>>
>> 2. Is there any concern that a validation class may alter the contents of 
>> the ByteBuffer of the key and/or value? Perhaps that’s already handled 
>> and/or outside the scope of this KIP?
>>
>> 3. What is the benefit to introducing the inner TopicMetadata and 
>> RecordProxy interfaces vs. passing the TopicPartition, String (validation 
>> policy), and Record into the validate() method directly?
>>
>> Thanks,
>> Kirk
>>
>>> On Jun 20, 2023, at 2:28 AM, Edoardo Comar  wrote:
>>>
>>> Thanks Николай,
>>> We’d like to open a vote next week.
>>> Hopefully getting some more feedback before then.
>>>
>>> Edo
>>>
>>>
>>> On Wed, 7 Jun 2023 at 19:15, Николай Ижиков  wrote:
>>>
 Hello.

 As author of one of related KIPs I’m +1 for this change.
 Long waited feature.

> 7 июня 2023 г., в 19:02, Edoardo Comar  написал(а):
>
> Dear all,
> Adrian and I would like to start a discussion thread on
>
> KIP-940: Broker extension point for validating record contents at
 produce time
>
>
 https://cwiki.apache.org/confluence/display/KAFKA/KIP-940%3A+Broker+extension+point+for+validating+record+contents+at+produce+time
>
> This KIP proposes a new broker-side extension point (a “record
 validation policy”) that can be used to reject records published by a
 misconfigured client.
> Though general, it is aimed at the common, best-practice 

[jira] [Created] (KAFKA-15124) KRaft snapshot freeze should never perform copy

2023-06-26 Thread Jira
José Armando García Sancio created KAFKA-15124:
--

 Summary: KRaft snapshot freeze should never perform copy
 Key: KAFKA-15124
 URL: https://issues.apache.org/jira/browse/KAFKA-15124
 Project: Kafka
  Issue Type: Sub-task
Reporter: José Armando García Sancio
Assignee: José Armando García Sancio






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15123) Add tests for ChunkedBytesStream

2023-06-26 Thread Divij Vaidya (Jira)
Divij Vaidya created KAFKA-15123:


 Summary: Add tests for ChunkedBytesStream
 Key: KAFKA-15123
 URL: https://issues.apache.org/jira/browse/KAFKA-15123
 Project: Kafka
  Issue Type: Improvement
Reporter: Divij Vaidya


We need to add test cases against the public interfaces of this class to cover 
scenarios such as int overflow for input parameters.

Refer: [https://github.com/apache/kafka/pull/12948#discussion_r1242345211] 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15122) Moving partitions between log dirs leads to kafka.log:type=Log metrics being deleted

2023-06-26 Thread Mickael Maison (Jira)
Mickael Maison created KAFKA-15122:
--

 Summary: Moving partitions between log dirs leads to 
kafka.log:type=Log metrics being deleted
 Key: KAFKA-15122
 URL: https://issues.apache.org/jira/browse/KAFKA-15122
 Project: Kafka
  Issue Type: Task
Affects Versions: 3.5.0
Reporter: Mickael Maison


# Start a broker with 2 log directories
# Create a topic-partition
Metrics with the following names are created: 
kafka.log:type=Log,name=Size,topic=,partition=0
# Using kafka-reassign-partitions move that partition to the other log directory
A tag isFuture=true is added to the existing metrics, 
kafka.log:type=Log,name=Size,topic=,partition=0,isFuture=true
# Using kafka-reassign-partitions move that partition back to its original log 
directory
The metrics are deleted!

I don't expect the metrics to be renamed during the first reassignment. The 
metrics should not be deleted during the second reassignment either, as the topic still 
exists. Restarting the broker resolves the issue.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-940: Broker extension point for validating record contents at produce time

2023-06-26 Thread Edoardo Comar
Hi Kirk,
thanks for your comments.

> 1. Does record.validation.policy.class.name support multiple classes, or just 
> one? I’m probably not wrapping my head around it,
> but I’d imagine different policies for different families or groupings of 
> topics, thus the need for supporting multiple policies.
> But if there are multiple then you run the risk of conflicts of ownership of 
> validation, so ¯\_(ツ)_/¯

We have only identified the use case for a single policy, but as
mentioned in the Rejected Alternatives section of the KIP, we think a
Facade-like policy could be written by a Kafka admin to dispatch
validation to different policies. Allowing multiple policies to be
specified would introduce complexities (as you noted), and we wanted
to avoid making too many assumptions without having a strong use-case
for this support.

> 2. Is there any concern that a validation class may alter the contents of the 
> ByteBuffer of the key and/or value?
> Perhaps that’s already handled and/or outside the scope of this KIP?

Good point. This behaviour isn’t defined, and what happens could
change between Kafka versions (or depending on compression settings).
We have modified the Javadoc in the KIP to indicate that the
ByteBuffers are read-only wrappers, as we don’t intend this plug-point
to be used for modifying message data. We also spotted that this was a
problem for returning headers (as the common Header interface returns
a byte array from one of its methods), and have updated the Javadoc
accordingly.


> 3. What is the benefit to introducing the inner TopicMetadata and RecordProxy 
> interfaces vs.
> passing the TopicPartition, String (validation policy), and Record into the 
> validate() method directly?

We wanted to avoid Record as the package
org.apache.kafka.common.record is documented as *not* being a
published API, and doesn’t necessarily maintain compatibility between
versions. This was highlighted as a potential problem during the
discussion of KIP-729.

We designed the API using interfaces as arguments to the methods, so
that further properties can be added in the future without breaking
existing implementations.
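
For readers following along, here is a rough sketch of the shape of plug-in being
discussed. All interface and method names below are assumptions for illustration,
not the KIP's actual API:

{code:java}
import java.io.Closeable;
import java.nio.ByteBuffer;
import org.apache.kafka.common.Configurable;
import org.apache.kafka.common.TopicPartition;

// Illustrative sketch only. A broker-side policy, configured per cluster via
// something like record.validation.policy.class.name, that can reject records
// at produce time based on topic metadata and (read-only) record contents.
public interface RecordValidationPolicy extends Configurable, Closeable {

    interface TopicMetadata {
        TopicPartition topicPartition();
        String validationPolicy();   // value of the topic's record.validation.policy config
    }

    interface RecordProxy {
        ByteBuffer key();            // read-only views; the plug-in must not mutate record data
        ByteBuffer value();
    }

    /**
     * Throws an exception if the record should be rejected; the broker maps it
     * to an error in the produce response.
     */
    void validate(TopicMetadata topic, RecordProxy record);
}
{code}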


On Wed, 21 Jun 2023 at 17:08, Kirk True  wrote:
>
> Hi Edo/Adrian!
>
> Thanks for the KIP.
>
> I have some questions, and apologies that they may fall under the “stupid” 
> column because I’m not that familiar with this area :)
>
> 1. Does record.validation.policy.class.name support multiple classes, or just 
> one? I’m probably not wrapping my head around it, but I’d imagine different 
> policies for different families or groupings of topics, thus the need for 
> supporting multiple policies. But if there are multiple then you run the risk 
> of conflicts of ownership of validation, so ¯\_(ツ)_/¯
>
> 2. Is there any concern that a validation class may alter the contents of the 
> ByteBuffer of the key and/or value? Perhaps that’s already handled and/or 
> outside the scope of this KIP?
>
> 3. What is the benefit to introducing the inner TopicMetadata and RecordProxy 
> interfaces vs. passing the TopicPartition, String (validation policy), and 
> Record into the validate() method directly?
>
> Thanks,
> Kirk
>
> > On Jun 20, 2023, at 2:28 AM, Edoardo Comar  wrote:
> >
> > Thanks Николай,
> > We’d like to open a vote next week.
> > Hopefully getting some more feedback before then.
> >
> > Edo
> >
> >
> > On Wed, 7 Jun 2023 at 19:15, Николай Ижиков  wrote:
> >
> >> Hello.
> >>
> >> As author of one of related KIPs I’m +1 for this change.
> >> Long waited feature.
> >>
> >>> 7 июня 2023 г., в 19:02, Edoardo Comar  написал(а):
> >>>
> >>> Dear all,
> >>> Adrian and I would like to start a discussion thread on
> >>>
> >>> KIP-940: Broker extension point for validating record contents at
> >> produce time
> >>>
> >>>
> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-940%3A+Broker+extension+point+for+validating+record+contents+at+produce+time
> >>>
> >>> This KIP proposes a new broker-side extension point (a “record
> >> validation policy”) that can be used to reject records published by a
> >> misconfigured client.
> >>> Though general, it is aimed at the common, best-practice use case of
> >> defining Kafka message formats with schemas maintained in a schema 
> >> registry.
> >>>
> >>> Please post your feedback, thanks !
> >>>
> >>> Edoardo & Adrian
> >>>
> >>> Unless otherwise stated above:
> >>>
> >>> IBM United Kingdom Limited
> >>> Registered in England and Wales with number 741598
> >>> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
> >>
> >>
>


Re: Permissions to contribute to Apache Kafka

2023-06-26 Thread Chris Egerton
Hi Erik,

Thanks for contributing to Apache Kafka! You should be good to go now. I
also took the liberty of assigning KAFKA-14972 to you; feel free to
unassign if you do not plan to work on that ticket.

Cheers,

Chris

On Mon, Jun 26, 2023 at 9:57 AM Erik van Oosten
 wrote:

> Dear reader,
>
> I would like to create a KIP and understand I need to request
> permissions for that.
>
> my wiki username: e.vanoos...@chello.nl  (note, this is /not/ my email
> address)
> my Jira username: erikvanoosten
>
> Kind regards,
>  Erik.
>
>
> --
> Erik van Oosten
> e.vanoos...@grons.nl
> https://day-to-day-stuff.blogspot.com
>


Permissions to contribute to Apache Kafka

2023-06-26 Thread Erik van Oosten

Dear reader,

I would like to create a KIP and understand I need to request 
permissions for that.


my wiki username: e.vanoos...@chello.nl  (note, this is /not/ my email 
address)

my Jira username: erikvanoosten

Kind regards,
    Erik.


--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com


Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-06-26 Thread Igor Soarez
Hi Colin,

Thanks for your support with getting this over the line and that’s
great re the preliminary pass! Thanks also for sharing your
thoughts; I've had a careful look at each of them and I'm sharing my
comments below.

I agree, it is important to avoid a perf hit on non-JBOD.
I've opted for tagged fields in:

 - Assignment.Directory in PartitionRecord and PartitionChangeRecord
 - OnlineLogDirs in RegisterBrokerRecord,
   BrokerRegistrationChangeRecord and BrokerRegistrationRequest
 - OfflineLogDirs in BrokerHeartbeatRequest

I don't think we should use UUID.Zero to refer to the first log
directory because this value also indicates "unknown, or no log dir
yet assigned". We can work with the default value by gating on JBOD
configuration — determined by log.dirs (Broker side) and
BrokerRegistration (Controller side). In non-JBOD:
 - The single logdir won't have a UUID
 - BrokerRegistration doesn't list any log dir UUIDs
 - AssignReplicasToDirs is never used

Directory reassignment will work the same way as in ZK mode, but
with the difference that the promotion of the future replica
requires an AssignReplicasToDirs request to update the assignment.
I've tried to improve the description of this operation and
included a diagram to illustrate it.

I've renamed LogDirsOfflined to OfflineLogDirs in the
BrokerHeartbeatRequest. This field was named differently because
it's only used for log directories that have become offline but are
not yet represented as offline in the metadata, from the Broker's
point of view — as opposed to always listing the full set of offline
log dirs.

I don't think we should identify log directories using system
paths, because those may be arbitrary. A set of storage devices may
be mounted and re-mounted on the same set of system paths using a
different order every time. Kafka only cares about the content, not
the location of the log directories.

I think I have overcomplicated this by trying to identify offline
log directories. In ZK mode we don't care to do this, and we
shouldn't do it in KRaft either. What we need to know is if there
are any offline log directories, to prevent re-streaming the offline
replicas into the remaining online log dirs. In ZK mode, the 'isNew'
flag is used to prevent the Broker from creating partitions when any
logdir is offline unless they're new. In KRaft the Controller can
reset the assignment to UUID.Zero for replicas in log dirs not
listed in the broker registration only when the broker registration
indicates no offline log dirs. So I've updated the KIP to:
 - Remove directory.ids from meta.properties
 - Change OfflineLogDirs in RegisterBrokerRecord,
   BrokerRegistrationChangeRecord and BrokerRegistrationRequest
   to a boolean
 - Describe this behavior in the Controller and Broker
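
To make that Controller-side reset concrete, a purely illustrative sketch
(the helper names and shapes are assumptions, not the KIP's record schema):

{code:java}
import java.util.Map;
import java.util.Set;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.Uuid;

// Illustrative only. When a broker registers and reports no offline log dirs,
// replicas assigned to a directory the broker did not register are reset to
// UUID.Zero ("unknown / no log dir yet assigned") so they can be re-placed.
final class DirectoryAssignmentReset {
    static void maybeReset(boolean brokerReportsOfflineDirs,
                           Set<Uuid> registeredLogDirs,
                           Map<TopicPartition, Uuid> replicaToDir) {
        if (brokerReportsOfflineDirs) {
            return; // an offline dir may still hold these replicas; avoid re-streaming them
        }
        replicaToDir.replaceAll((tp, dir) ->
                registeredLogDirs.contains(dir) ? dir : Uuid.ZERO_UUID);
    }
}
{code}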

It’s been a lot of work to get here and I’m similarly excited to get
this released soon! The vote has been open over the last week
and I'm happy to give it another two weeks to gather any other thoughts without
rushing. Thanks again for the support and input!

Best,

--
Igor




[jira] [Created] (KAFKA-15121) FileStreamSourceConnector and FileStreamSinkConnector should implement KIP-875 APIs

2023-06-26 Thread Yash Mayya (Jira)
Yash Mayya created KAFKA-15121:
--

 Summary: FileStreamSourceConnector and FileStreamSinkConnector 
should implement KIP-875 APIs
 Key: KAFKA-15121
 URL: https://issues.apache.org/jira/browse/KAFKA-15121
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Yash Mayya
Assignee: Yash Mayya


[https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect]
 introduced the new SourceConnector::alterOffsets and 
SinkConnector::alterOffsets APIs. The FileStreamSourceConnector and 
FileStreamSinkConnector should implement these new methods to improve the user 
experience when modifying offsets for these connectors and also to serve as an 
example for other connectors.
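
For context (not part of the ticket text), a sketch of roughly what such an override
could look like for FileStreamSourceConnector. The method signature follows KIP-875 as
proposed, and the validation shown is an assumption about a reasonable implementation
rather than the merged code:

{code:java}
// Sketch only; assumed to live inside FileStreamSourceConnector. The source offsets
// for this connector store the read position under a "position" key, so an override
// can at least validate that any user-supplied offset is a non-negative long.
@Override
public boolean alterOffsets(Map<String, String> connectorConfig,
                            Map<Map<String, ?>, Map<String, ?>> offsets) {
    for (Map<String, ?> offset : offsets.values()) {
        if (offset == null) {
            continue; // a null offset means the offset for that source partition is being reset
        }
        Object position = offset.get("position");
        if (!(position instanceof Long) || (Long) position < 0) {
            throw new ConnectException("Offset 'position' must be a non-negative long");
        }
    }
    return true; // the framework, not the connector, actually persists the offsets
}
{code}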



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: How to Integrate MySQL database with Kafka? Need Demo

2023-06-26 Thread Avani Panchal
Hi Sagar,

Thank you for the information; it cleared up our confusion.
I also saw lots of links to Kafka documentation, but I am not sure
which document I should use.
Could you please share the right link where I can read the documentation?

Thanks,
Avani Panchal


On Mon, Jun 26, 2023 at 1:48 PM Sagar  wrote:

> Hey Avani,
>
> Kafka Connect  is the
> tool
> to use when you want to stream data to/from Kafka via external systems. One
> would typically configure connectors which allow streaming data to/from
> Kafka. There are 2 types of connectors:
> 1) Source Connectors: Which stream data from external systems like
> databases etc to Kafka and
> 2) Sink Connectors: Which stream data from Kafka to external systems.
>
> Since you want to stream data from MySQL to SQL Server, with Kafka Connect
> it would be a 2 step process:
>
> 1) Capture changes from MySQL to Kafka using connectors like JDBC source
> connector or Debezium MySQL connector.
> 2) Once the data is in Kafka, you can use JDBC sink connectors to stream
> data from Kafka topics to the tables in SQL Server.
>
> Note that this is a very simplified view of how you can achieve your goal
> of streaming changes from MySQL to SQL Server and I would recommend reading
> the documentation of the individual connectors and the Kafka Connect
> framework to understand how to make it work for your usecase.
>
> Thanks for your interest on Apache Kafka!
>
> Thanks!
> Sagar.
>
>
> On Mon, Jun 26, 2023 at 11:42 AM Avani Panchal
>  wrote:
>
> > Hi,
> > In my application I  want to sync my client's data to my SQL server. at
> > client place the database is MYSQL.
> >
> > How can I achieve this using Kafka? I read a lot of documents but I don't
> > understand which setup I need and how I can achieve it.
> >
> > I was also wondering about "Book a demo with Kafka" but didn't find it.
> >
> > Please help me.
> >
> > Thank you,
> > Avani
> >
>


Re: How to Integrate MySQL database with Kafka? Need Demo

2023-06-26 Thread Avani Panchal
Hi Josep Prat,

Thank you for your reply; I had been wondering about this for many days, and you
solved my problem.
So you are from "Aiven"; does your company provide a demo for Kafka?

Thanks,
Avani Panchal


On Mon, Jun 26, 2023 at 1:46 PM Josep Prat 
wrote:

> Hi Avani,
>
> Thanks for taking interest in Apache Kafka.
> Regarding the example code involving a MySQL database as a sink you can
> take a look at this blog post:
>
> https://aiven.io/developer/db-technology-migration-with-apache-kafka-and-kafka-connect
> The example in there contains more moving pieces, but I believe it contains
> your use case. Disclaimer, I'm employed by Aiven.
>
> Regarding the "Book a demo with Kafka" you won't find this on the
> kafka.apache.org page, because this is the project's website. For
> commercial offerings you should head over to the websites of the companies
> that offer Apache Kafka as a service. aiven.io, confluent.io and
> instaclustr.com should have an option to request a demo.
>
> Best,
>
> On Mon, Jun 26, 2023 at 8:12 AM Avani Panchal
>  wrote:
>
> > Hi,
> > In my application I  want to sync my client's data to my SQL server. at
> > client place the database is MYSQL.
> >
> > How can I achieve this using Kafka? I read a lot of documents but I don't
> > understand which setup I need and how I can achieve it.
> >
> > I was also wondering about "Book a demo with Kafka" but didn't find it.
> >
> > Please help me.
> >
> > Thank you,
> > Avani
> >
>
>
> --
> [image: Aiven]
>
> *Josep Prat*
> Open Source Engineering Director, *Aiven*
> josep.p...@aiven.io   |   +491715557497
> aiven.io   |   https://twitter.com/aiven_io
> *Aiven Deutschland GmbH*
> Alexanderufer 3-7, 10117 Berlin
> Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
> Amtsgericht Charlottenburg, HRB 209739 B
>


[jira] [Created] (KAFKA-15120) Optimizing Recovery Time for Non-Transactional and Idempotent Partitions in Kafka

2023-06-26 Thread weiguang liu (Jira)
weiguang liu created KAFKA-15120:


 Summary: Optimizing Recovery Time for Non-Transactional and 
Idempotent Partitions in Kafka
 Key: KAFKA-15120
 URL: https://issues.apache.org/jira/browse/KAFKA-15120
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: weiguang liu


Kafka's recovery logic involves rebuilding the index and producer state from the 
log segments after the recovery point. In scenarios where a broker has a large 
number of partitions, the recovery time can become very long. For example, when 
a broker has 1,000 partitions and the average log segment size is 1GB, the 
broker may require reading as much as 500GB of log data for recovery, which can 
be unbearable. Most of the partitions might not be using transactions and 
idempotency, so can we consider using a recovery method that starts from the 
recovery point for those partitions that do not use transactions and 
idempotency, instead of starting the recovery from the beginning of the entire 
log segment? My understanding is that for non-transactional and idempotent 
partitions, the index is append-only and can be completely recovered from the 
recovery point, rather than from the start offset of the log segment. I am not 
sure what the potential risks of this approach might be or why the community 
did not consider it.
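
A minimal sketch (not part of the original report) of the gating suggested above,
as illustrative Java rather than existing Kafka code:

{code:java}
// Where recovery of an unflushed segment starts, under the proposal above:
// partitions that never use idempotence or transactions have no producer state
// to rebuild, and index entries are append-only, so recovery could resume at the
// recovery point instead of the segment's base offset.
long recoveryStartOffset(long segmentBaseOffset, long recoveryPoint, boolean usesIdempotenceOrTransactions) {
    return usesIdempotenceOrTransactions ? segmentBaseOffset : recoveryPoint;
}
{code}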



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: How to Integrate MySQL database with Kafka? Need Demo

2023-06-26 Thread Sagar
Hey Avani,

Kafka Connect  is the tool
to use when you want to stream data to/from Kafka via external systems. One
would typically configure connectors which allow streaming data to/from
Kafka. There are 2 types of connectors:
1) Source Connectors: Which stream data from external systems like
databases etc to Kafka and
2) Sink Connectors: Which stream data from Kafka to external systems.

Since you want to stream data from MySQL to SQL Server, with Kafka Connect
it would be a 2 step process:

1) Capture changes from MySQL to Kafka using connectors like JDBC source
connector or Debezium MySQL connector.
2) Once the data is in Kafka, you can use JDBC sink connectors to stream
data from Kafka topics to the tables in SQL Server.

Note that this is a very simplified view of how you can achieve your goal
of streaming changes from MySQL to SQL Server and I would recommend reading
the documentation of the individual connectors and the Kafka Connect
framework to understand how to make it work for your usecase.
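
To make step 2 concrete, here is a minimal illustrative sink configuration
(standalone-mode properties). The connector class and property names below belong
to the Confluent JDBC sink connector plugin and are shown only as an example, so
check the documentation of whichever connector you actually deploy:

{code}
# Example only: stream records from a Kafka topic into SQL Server.
name=sqlserver-sink
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
topics=inventory.customers
connection.url=jdbc:sqlserver://sqlserver-host:1433;databaseName=inventory
connection.user=connect_user
connection.password=********
insert.mode=upsert
pk.mode=record_key
auto.create=true
{code}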

Thanks for your interest on Apache Kafka!

Thanks!
Sagar.


On Mon, Jun 26, 2023 at 11:42 AM Avani Panchal
 wrote:

> Hi,
> In my application I  want to sync my client's data to my SQL server. at
> client place the database is MYSQL.
>
> How can I achieve this using Kafka? I read a lot of documents but I don't
> understand which setup I need and how I can achieve it.
>
> I was also wondering about "Book a demo with Kafka" but didn't find it.
>
> Please help me.
>
> Thank you,
> Avani
>


Re: How to Integrate MySQL database with Kafka? Need Demo

2023-06-26 Thread Josep Prat
Hi Avani,

Thanks for taking interest in Apache Kafka.
Regarding the example code involving a MySQL database as a sink you can
take a look at this blog post:
https://aiven.io/developer/db-technology-migration-with-apache-kafka-and-kafka-connect
The example in there contains more moving pieces, but I believe it contains
your use case. Disclaimer, I'm employed by Aiven.

Regarding the "Book a demo with Kafka" you won't find this on the
kafka.apache.org page, because this is the project's website. For
commercial offerings you should head over to the websites of the companies
that offer Apache Kafka as a service. aiven.io, confluent.io and
instaclustr.com should have an option to request a demo.

Best,

On Mon, Jun 26, 2023 at 8:12 AM Avani Panchal
 wrote:

> Hi,
> In my application I  want to sync my client's data to my SQL server. at
> client place the database is MYSQL.
>
> How can I achieve this using Kafka? I read a lot of documents but I don't
> understand which setup I need and how I can achieve it.
>
> I was also wondering about "Book a demo with Kafka" but didn't find it.
>
> Please help me.
>
> Thank you,
> Avani
>


-- 
[image: Aiven] 

*Josep Prat*
Open Source Engineering Director, *Aiven*
josep.p...@aiven.io   |   +491715557497
aiven.io   |   https://twitter.com/aiven_io
*Aiven Deutschland GmbH*
Alexanderufer 3-7, 10117 Berlin
Geschäftsführer: Oskari Saarenmaa & Hannu Valtonen
Amtsgericht Charlottenburg, HRB 209739 B


Review request for Java Kafka consumer

2023-06-26 Thread Erik van Oosten

Dear Kafka developers,

I submitted https://github.com/apache/kafka/pull/13914 to fix a long-standing 
problem: the Kafka consumer on the JVM is not usable from 
asynchronous runtimes such as Kotlin co-routines and ZIO.


Your review is much appreciated.

Kind regards,
    Erik.


--
Erik van Oosten
e.vanoos...@grons.nl
https://day-to-day-stuff.blogspot.com



Re: [DISCUSS] KIP-714: Client metrics and observability

2023-06-26 Thread Andrew Schofield
Hi Kirk,
Thanks for your comments.

1) I agree that this KIP is not aiming to create a new first-class construct of 
a unique, managed, per-client instance ID.
I’ll add some clarifying words.

2) I can see what the KIP is trying to say about the Terminating flag, but it 
doesn’t quite do it for me. Essentially,
a terminating client with an active metrics subscription can send a final 
metrics push without waiting for the
interval to elapse. However, there’s a chance that the subscription has changed 
and the PushTelemetry RPC fails
with INVALID_SUBSCRIPTION_ID. Then, the client is supposed to get a new 
subscription ID and presumably
send its terminating metrics with this new ID without waiting for the push 
interval to elapse. I will update the text.

3) The KIP is not explicit about the regular expression matcher for matching 
client selectors. I will change it to
call out RE2/J in line with KIP-848. This is also a user-provided, server-side 
regular expression match.

4) I think you’re right about the inclusion of temporality in the 
GetTelemetrySubscriptions response. A client would
be expected to support both cumulative or delta, although initially the broker 
will only use delta. However, it’s quite
an important part of the OTLP metrics specification. I think there is benefit 
in supporting this in KIP-714 clients
to enable temporality to be used by brokers without requiring another round of 
client version upgrades.

5) The ClientTelemetry interface gives a level of indirection between the 
MetricsReporter and the
ClientTelemetryReceiver. A MetricsReporter could implement 
ClientTelemetryReceiver directly, but
the implementation of the CTR could equally well be a separate object.
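
As an illustration of that indirection (interface and method names follow the KIP-714
proposal and may change; treat this as a sketch, not the final API):

{code:java}
import java.nio.ByteBuffer;
import java.util.List;
import java.util.Map;
import org.apache.kafka.common.metrics.KafkaMetric;
import org.apache.kafka.common.metrics.MetricsReporter;
import org.apache.kafka.server.telemetry.ClientTelemetry;
import org.apache.kafka.server.telemetry.ClientTelemetryReceiver;

// Sketch only: a broker-side MetricsReporter plugin that also implements
// ClientTelemetry, handing the broker a receiver for client-pushed metrics.
public class ExampleTelemetryReporter implements MetricsReporter, ClientTelemetry {

    @Override
    public ClientTelemetryReceiver clientReceiver() {
        // The receiver could equally well be a separate object; a lambda here
        // just shows the level of indirection described above.
        return (context, payload) -> exportToBackend(payload.data());
    }

    private void exportToBackend(ByteBuffer serializedOtlpMetrics) {
        // forward the OTLP payload to the operator's metrics backend
    }

    // Remaining MetricsReporter methods are no-ops for this sketch.
    @Override public void configure(Map<String, ?> configs) { }
    @Override public void init(List<KafkaMetric> metrics) { }
    @Override public void metricChange(KafkaMetric metric) { }
    @Override public void metricRemoval(KafkaMetric metric) { }
    @Override public void close() { }
}
{code}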

Thanks for helping to tighten up the KIP.

Andrew

> On 20 Jun 2023, at 16:47, Kirk True  wrote:
>
> Hi Andrew,
>
>
>
>> On Jun 13, 2023, at 8:06 AM, Andrew Schofield 
>>  wrote:
>>
>> Hi,
>> I would like to start a new discussion thread on KIP-714: Client metrics and 
>> observability.
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-714%3A+Client+metrics+and+observability
>>
>> I have edited the proposal significantly to reduce the scope. The overall 
>> mechanism for client metric subscriptions is unchanged, but the
>> KIP is now based on the existing client metrics, rather than introducing new 
>> metrics. The purpose remains helping cluster operators
>> investigate performance problems experienced by clients without requiring 
>> changes to the client application code or configuration.
>>
>> Thanks,
>> Andrew
>
> Thanks for the KIP updates. A few questions:
>
> 1. The concept of a client instance ID is somewhat similar to the unique 
> producer ID that is created for transactional producers. Can we augment the 
> name or somehow clarify that this client instance ID is only for use by 
> telemetry? The pandora’s box alternative is to make the creation, management, 
> etc. of a unique, per-client instance ID a first-class construct. I assume 
> that’s something we don’t want to bite off in this KIP ;)
>
> 2. I’m having trouble understanding where this provision for the terminating 
> flag would be useful:
>
>> The Terminating flag may be reused upon the next expiry of PushIntervalMs.
>
> In the happy path, the terminating flag is set once at time of application 
> shutdown by the close() method of a client. A buggy/nefarious client may send 
> multiple push telemetry requests with the terminating flag set to skirt 
> throttling. What’s the use case where an application would want to send a 
> second request with the terminating flag set after PushInteralMs?
>
> 3. KIP-848 introduces a new flavor of regex for topic subscriptions. Is that 
> what we plan to adopt for the regex used by the subscription match?
>
> 4. What’s the benefit of having the broker specify the delta temporality if 
> it’s (for now) always delta, besides API protocol bumping?
>
> 5. What is gained by the existence of the ClientTelemetry interface? Why not 
> let interested parties implement ClientTelemetryReceiver directly?
>
> Thanks!




How to Integrate MySQL database with Kafka? Need Demo

2023-06-26 Thread Avani Panchal
Hi,
In my application I want to sync my client's data to my SQL server. At the
client's site the database is MySQL.

How can I achieve this using Kafka? I read a lot of documents but I don't
understand which setup I need and how I can achieve it.

I was also wondering about "Book a demo with Kafka" but didn't find it.

Please help me.

Thank you,
Avani