Re: [VOTE] KIP-858: Handle JBOD broker disk failure in KRaft

2023-07-03 Thread Colin McCabe
On Mon, Jun 26, 2023, at 05:08, Igor Soarez wrote:
> Hi Colin,
>
> Thanks for your support with getting this over the line and that’s
> great re the preliminary pass! Thanks also for sharing your
> thoughts; I've had a careful look at each of them and I'm sharing my
> comments below.
>
> I agree, it is important to avoid a perf hit on non-JBOD.
> I've opted for tagged fields in:
>
>  - Assignment.Directory in PartitionRecord and PartitionChangeRecord
>  - OnlineLogDirs in RegisterBrokerRecord,
>    BrokerRegistrationChangeRecord and BrokerRegistrationRequest
>  - OfflineLogDirs in BrokerHeartbeatRequest
>
> I don't think we should use UUID.Zero to refer to the first log
> directory because this value also indicates "unknown, or no log dir
> yet assigned". We can work with the default value by gating on JBOD
> configuration — determined by log.dirs (Broker side) and
> BrokerRegistration (Controller side).

Hi Igor,

If we want to reserve zero for unknown, we can do that. Then "1" can be 
reserved for the first log directory. But it's unclear to me why we need an 
"unknown" value (see below).

> In non-JBOD:
>  - The single logdir won't have a UUID
>  - BrokerRegistration doesn't list any log dir UUIDs
>  - AssignReplicasToDirs is never used
>
> Directory reassignment will work the same way as in ZK mode, but
> with the difference that the promotion of the future replica
> requires an AssignReplicasToDirs request to update the assignment.
> I've tried to improve the description of this operation and
> included a diagram to illustrate it.
>
> I've renamed LogDirsOfflined to OfflineLogDirs in the
> BrokerHeartbeatRequest. This field was named differently because
> it's only used for log directories that have become offline but are
> not yet represented as offline in the metadata, from the Broker's
> point of view — as opposed to always listing the full set of offline
> log dirs.
>
> I don't think we should identify log directories using system
> paths, because those may be arbitrary. A set of storage devices may
> be mounted and re-mounted on the same set of system paths using a
> different order every time. Kafka only cares about the content, not
> the location of the log directories.

The problem is that the only thing the broker can really reliably determine is 
the UUIDs of the directories that it can read. The UUID of directories that it 
can't read can only be inferred. And the inference can't be done with 100% 
reliability.

>
> I think I have overcomplicated this by trying to identify offline
> log directories. In ZK mode we don't care to do this, and we
> shouldn't do it in KRaft either.

I agree that identifying the UUIDs of the directories that are present (as 
opposed to those that are absent) seems much cleaner. As I mentioned above, we 
can't reliably find missing or broken directories anyway.

> What we need to know is if there
> are any offline log directories, to prevent re-streaming the offline
> replicas into the remaining online log dirs. In ZK mode, the 'isNew'
> flag is used to prevent the Broker from creating partitions when any
> logdir is offline unless they're new. In KRaft the Controller can
> reset the assignment to UUID.Zero for replicas in log dirs not
> listed in the broker registration only when the broker registration
> indicates no offline log dirs.

In the previous email I asked, "who is responsible for assigning replicas to 
broker directories?" Can you clarify what the answer is to that? If the answer 
is the controller, there is no need for an "unknown" state for assignments, 
since the controller can simply choose an assignment immediately when it 
creates a replica.

The broker has access to the same metadata, of course, and already knows what 
directory a replica is supposed to be in. If that directory doesn't exist, then 
the replica is offline. There is no need to modify replicas to have an 
"unknown" assignment state.

best,
Colin


> So I've updated the KIP to:
>  - Remove directory.ids from meta.properties
>  - Change OfflineLogDirs in RegisterBrokerRecord,
>    BrokerRegistrationChangeRecord and BrokerRegistrationRequest
>    to a boolean
>  - Describe this behavior in the Controller and the Broker
>
> It’s been a lot of work to get here and I’m similarly excited to get
> this released soon! The vote has been open over the last week
> and I'm happy to give it another two weeks to gather any other thoughts without
> rushing. Thanks again for the support and input!
>
> Best,
>
> --
> Igor


Re: [DISCUSS] KIP-942: Add Power(ppc64le) support

2023-07-03 Thread Colin McCabe
I agree with Divij. A nightly Apache Kafka build for PowerPC would be welcome. 
But it shouldn't run on every build, since the extra time and complexity would 
not be worth it.

By the way, are there any features or plugins we don't intend to support on 
PPC? If there are, this KIP would be a good place to spell them out.

Naively, I would think all of our Java and Scala code should work on PPC 
without changes. However, there may be library dependencies that don't exist on 
PPC. (We have to remember that the last desktop PowerPC chip that an average 
user could buy shipped in 2005.)

best,
Colin


On Mon, Jun 19, 2023, at 23:12, Vaibhav Nazare wrote:
> Thank you for the response, Divij.
>
> 1. We are going to use ASF infra-provided nodes for better availability 
> and stability, as there are 3 Power9 nodes managed officially by the ASF 
> infra team.
> Ref: https://issues.apache.org/jira/browse/INFRA-24663
> https://jenkins-ccos.apache.org/view/Shared%20-%20ppc64le%20nodes/
> Previously used Power node details for the apache/kafka CI:
> RAM- 16GB
> VCPUs- 8 VCPU
> Disk- 160GB
> For the shared VMs, we need to check with the ASF infra team for details.
>
> 2. We can run nightly builds once or twice a day at specific times 
> instead of on every commit.
> 3. apache/camel https://builds.apache.org/job/Camel/job/el/ has already 
> enabled CI for the Power platform; they are using the same H/W resources:
> RAM- 16GB
> VCPUs- 8 VCPU
> Disk- 160GB
>
> -Original Message-
> From: Divij Vaidya  
> Sent: Monday, June 19, 2023 10:20 PM
> To: dev@kafka.apache.org
> Subject: [EXTERNAL] Re: [DISCUSS] KIP-942: Add Power(ppc64le) support
>
> Thank you for the KIP Vaibhav.
>
> 1. Builds for power architecture were intentionally disabled in the 
> past since the infrastructure was flaky [1]. Could you please add to 
> the KIP what has changed since then?
> 2. What do you think about an alternative solution where we run a 
> nightly build for this platform instead of running the CI with every 
> PR/commit?
> 3. To bolster the case for this KIP, could you please add information 
> from other Apache projects who are already running CI for this 
> platform? Is their CI stable on Apache Infra hosts?
>
>
> [1] https://github.com/apache/kafka/pull/12380 
>
> --
> Divij Vaidya
>
>
>
> On Mon, Jun 19, 2023 at 12:30 PM Vaibhav Nazare 
>  wrote:
>
>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-942%3A+Add+Power%28ppc64le%29+support
>>


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1969

2023-07-03 Thread Apache Jenkins Server
See 




Re: [VOTE] KIP-793: Allow sink connectors to be used with topic-mutating SMTs

2023-07-03 Thread Greg Harris
Hey Yash,

Thanks so much for your effort in the design and discussion phase!

+1 (non-binding)

Greg

On Mon, Jul 3, 2023 at 7:19 AM Chris Egerton  wrote:
>
> Hi Yash,
>
> Thanks for the KIP! +1 (binding)
>
> Cheers,
>
> Chris
>
> On Mon, Jul 3, 2023 at 7:02 AM Yash Mayya  wrote:
>
> > Hi all,
> >
> > I'd like to start a vote on KIP-793 which enables sink connector
> > implementations to be used with SMTs that mutate the topic / partition /
> > offset information of a record.
> >
> > KIP -
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-793%3A+Allow+sink+connectors+to+be+used+with+topic-mutating+SMTs
> >
> > Discussion thread -
> > https://lists.apache.org/thread/dfo3spv0xtd7vby075qoxvcwsgx5nkj8
> >
> > Thanks,
> > Yash
> >


[GitHub] [kafka-site] divijvaidya merged pull request #530: Revert "MINOR: Fix typo in coding-guide.html (#529)"

2023-07-03 Thread via GitHub


divijvaidya merged PR #530:
URL: https://github.com/apache/kafka-site/pull/530


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #1968

2023-07-03 Thread Apache Jenkins Server
See 




[jira] [Created] (KAFKA-15144) Checkpoint downstreamOffset stuck to 1

2023-07-03 Thread Edoardo Comar (Jira)
Edoardo Comar created KAFKA-15144:
-

 Summary: Checkpoint downstreamOffset stuck to 1
 Key: KAFKA-15144
 URL: https://issues.apache.org/jira/browse/KAFKA-15144
 Project: Kafka
  Issue Type: Bug
  Components: mirrormaker
Reporter: Edoardo Comar
 Attachments: edo-connect-mirror-maker-sourcetarget.properties

Steps to reproduce:

1. Start the source cluster.
2. Start the target cluster.
3. Start connect-mirror-maker.sh using a config like the attached.
4. Create a topic in the source cluster.
5. Produce a few messages.
6. Consume them all with auto-commit enabled.

Then dump the Checkpoint topic content, e.g.

% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
source.checkpoints.internal --from-beginning --formatter 
org.apache.kafka.connect.mirror.formatters.CheckpointFormatter

Checkpoint\{consumerGroupId=edogroup, 
topicPartition=source.vf-mirroring-test-edo-0, upstreamOffset=3, 
downstreamOffset=1, metadata=}


The downstreamOffset remains at 1, whereas in a fresh cluster pair, with the 
source topic created while MM2 is running, I'd expect the downstreamOffset to 
match the upstreamOffset.

 

Dumping the offset sync topic, e.g.

% bin/kafka-console-consumer.sh --bootstrap-server localhost:9192 --topic 
mm2-offset-syncs.source.internal --from-beginning --formatter 
org.apache.kafka.connect.mirror.formatters.OffsetSyncFormatter

shows matching initial offsets

OffsetSync\{topicPartition=vf-mirroring-test-edo-0, upstreamOffset=0, 
downstreamOffset=0}
 
 
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[GitHub] [kafka-site] divijvaidya opened a new pull request, #530: Revert "MINOR: Fix typo in coding-guide.html (#529)"

2023-07-03 Thread via GitHub


divijvaidya opened a new pull request, #530:
URL: https://github.com/apache/kafka-site/pull/530

   This reverts commit 4de394f2ff19eef742b9e08098e329da2c7d0146.
   
   I incorrectly merged a PR that introduced a typo! Reverting that change.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka-site] divijvaidya commented on pull request #529: Update coding-guide.html

2023-07-03 Thread via GitHub


divijvaidya commented on PR #529:
URL: https://github.com/apache/kafka-site/pull/529#issuecomment-1618645568

   > Am I missing something? This is not fixing a typo, this is introducing 
one! It's replacing them with themnn
   
   Ah no! You are right. I was being hasty and confused the changed version with 
the earlier one. I will fix it. Sorry about this.
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [DISCUSS] KIP-892: Transactional Semantics for StateStores

2023-07-03 Thread Nick Telford
Hi Bruno

Yes, that's correct, although the impact on IQ is not something I had
considered.

What about atomically updating the state store from the transaction
> buffer every commit interval and writing the checkpoint (thus, flushing
> the memtable) every configured amount of data and/or number of commit
> intervals?
>

I'm not quite sure I follow. Are you suggesting that we add an additional
config for the max number of commit intervals between checkpoints? That
way, we would checkpoint *either* when the transaction buffers are nearly
full, *OR* whenever a certain number of commit intervals have elapsed,
whichever comes first?
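
To make that concrete, here's a rough sketch of the combined trigger, with hypothetical
names (neither this helper nor an interval-count config exists in the KIP today):

    // Checkpoint (and flush the transaction buffers) when either condition holds,
    // whichever comes first.
    static boolean shouldCheckpoint(long bufferBytes, long estimatedBytesPerInterval,
                                    long maxBufferBytes, int intervalsSinceCheckpoint,
                                    int maxCommitIntervals) {
        boolean bufferNearlyFull = bufferBytes + estimatedBytesPerInterval > maxBufferBytes;
        boolean intervalLimitReached = intervalsSinceCheckpoint >= maxCommitIntervals;
        return bufferNearlyFull || intervalLimitReached;
    }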

That certainly seems reasonable, although this re-ignites an earlier debate
about whether a config should be measured in "number of commit intervals",
instead of just an absolute time.

FWIW, I realised that this issue is the reason I was pursuing the Atomic
Checkpoints, as it de-couples memtable flush from checkpointing, which
enables us to just checkpoint on every commit without any performance
impact. Atomic Checkpointing is definitely the "best" solution, but I'm not
sure if this is enough to bring it back into this KIP.

I'm currently working on moving all the transactional logic directly into
RocksDBStore itself, which does away with the StateStore#newTransaction
method, and reduces the number of new classes introduced, significantly
reducing the complexity. If it works, and the complexity is drastically
reduced, I may try bringing back Atomic Checkpoints into this KIP.

Regards,
Nick

On Mon, 3 Jul 2023 at 15:27, Bruno Cadonna  wrote:

> Hi Nick,
>
> Thanks for the insights! Very interesting!
>
> As far as I understand, you want to atomically update the state store
> from the transaction buffer, flush the memtable of a state store and
> write the checkpoint not after the commit time elapsed but after the
> transaction buffer reached a size that would lead to exceeding
> statestore.transaction.buffer.max.bytes before the next commit interval
> ends.
> That means, the Kafka transaction would commit every commit interval but
> the state store will only be atomically updated roughly every
> statestore.transaction.buffer.max.bytes of data. Also IQ would then only
> see new data roughly every statestore.transaction.buffer.max.bytes.
> After a failure the state store needs to restore up to
> statestore.transaction.buffer.max.bytes.
>
> Is this correct?
>
> What about atomically updating the state store from the transaction
> buffer every commit interval and writing the checkpoint (thus, flushing
> the memtable) every configured amount of data and/or number of commit
> intervals? In such a way, we would have the same delay for records
> appearing in output topics and IQ because both would appear when the
> Kafka transaction is committed. However, after a failure the state store
> still needs to restore up to statestore.transaction.buffer.max.bytes and
> it might restore data that is already in the state store because the
> checkpoint lags behind the last stable offset (i.e. the last committed
> offset) of the changelog topics. Restoring data that is already in the
> state store is idempotent, so EOS should not be violated.
> This solution needs at least one new config to specify when a checkpoint
> should be written.
>
>
>
> A small correction to your previous e-mail that does not change anything
> you said: Under ALOS the default commit interval is 30 seconds, not five
> seconds.
>
>
> Best,
> Bruno
>
>
> On 01.07.23 12:37, Nick Telford wrote:
> > Hi everyone,
> >
> > I've begun performance testing my branch on our staging environment,
> > putting it through its paces in our non-trivial application. I'm already
> > observing the same increased flush rate that we saw the last time we
> > attempted to use a version of this KIP, but this time, I think I know
> why.
> >
> > Pre-KIP-892, StreamTask#postCommit, which is called at the end of the
> Task
> > commit process, has the following behaviour:
> >
> > - Under ALOS: checkpoint the state stores. This includes
> > flushing memtables in RocksDB. This is acceptable because the default
> > commit.interval.ms is 5 seconds, so forcibly flushing memtables
> every 5
> > seconds is acceptable for most applications.
> > - Under EOS: checkpointing is not done, *unless* it's being forced,
> due
> > to e.g. the Task closing or being revoked. This means that under
> normal
> > processing conditions, the state stores will not be checkpointed,
> and will
> > not have memtables flushed at all, unless RocksDB decides to flush
> them on
> > its own. Checkpointing stores and force-flushing their memtables is
> only
> > done when a Task is being closed.
> >
> > Under EOS, KIP-892 needs to checkpoint stores on at least *some* normal
> > Task commits, in order to write the RocksDB transaction buffers to the
> > state stores, and to ensure the offsets are synced to disk to prevent
> > restores from getting out of hand.

[GitHub] [kafka-site] mimaison commented on pull request #529: Update coding-guide.html

2023-07-03 Thread via GitHub


mimaison commented on PR #529:
URL: https://github.com/apache/kafka-site/pull/529#issuecomment-1618628168

   Am I missing something? This is not fixing a typo, this is introducing one! 
It's replacing `them` with `themnn`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka-site] divijvaidya commented on pull request #529: Update coding-guide.html

2023-07-03 Thread via GitHub


divijvaidya commented on PR #529:
URL: https://github.com/apache/kafka-site/pull/529#issuecomment-1618574725

   Thank you for fixing this typo @imranhirey!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [kafka-site] divijvaidya merged pull request #529: Update coding-guide.html

2023-07-03 Thread via GitHub


divijvaidya merged PR #529:
URL: https://github.com/apache/kafka-site/pull/529


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[jira] [Created] (KAFKA-15143) FixedKeyMockProcessorContext is missing from test-utils

2023-07-03 Thread Tomasz Kaszuba (Jira)
Tomasz Kaszuba created KAFKA-15143:
--

 Summary: FixedKeyMockProcessorContext is missing from test-utils
 Key: KAFKA-15143
 URL: https://issues.apache.org/jira/browse/KAFKA-15143
 Project: Kafka
  Issue Type: Bug
  Components: streams-test-utils
Affects Versions: 3.5.0
Reporter: Tomasz Kaszuba


I am trying to test a ContextualFixedKeyProcessor but it is not possible to 
call the init method from within a unit test since the MockProcessorContext 
doesn't implement  
{code:java}
FixedKeyProcessorContext {code}
but only
{code:java}
ProcessorContext
{code}
Shouldn't there also be a *MockFixedKeyProcessorContext* in the test utils?
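
For illustration, the kind of test that cannot currently be written (the processor class is hypothetical):
{code:java}
// FixedKeyProcessor.init expects a FixedKeyProcessorContext, but test-utils only
// provides MockProcessorContext, which implements ProcessorContext, so the init
// call below does not compile.
FixedKeyProcessor<String, String, String> processor = new MyFixedKeyProcessor();
MockProcessorContext<String, String> context = new MockProcessorContext<>();
processor.init(context); // error: MockProcessorContext is not a FixedKeyProcessorContext
{code}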



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-892: Transactional Semantics for StateStores

2023-07-03 Thread Bruno Cadonna

Hi Nick,

Thanks for the insights! Very interesting!

As far as I understand, you want to atomically update the state store 
from the transaction buffer, flush the memtable of a state store and 
write the checkpoint not after the commit time elapsed but after the 
transaction buffer reached a size that would lead to exceeding 
statestore.transaction.buffer.max.bytes before the next commit interval 
ends.
That means, the Kafka transaction would commit every commit interval but 
the state store will only be atomically updated roughly every 
statestore.transaction.buffer.max.bytes of data. Also IQ would then only 
see new data roughly every statestore.transaction.buffer.max.bytes.
After a failure the state store needs to restore up to 
statestore.transaction.buffer.max.bytes.


Is this correct?

What about atomically updating the state store from the transaction 
buffer every commit interval and writing the checkpoint (thus, flushing 
the memtable) every configured amount of data and/or number of commit 
intervals? In such a way, we would have the same delay for records 
appearing in output topics and IQ because both would appear when the 
Kafka transaction is committed. However, after a failure the state store 
still needs to restore up to statestore.transaction.buffer.max.bytes and 
it might restore data that is already in the state store because the 
checkpoint lags behind the last stable offset (i.e. the last committed 
offset) of the changelog topics. Restoring data that is already in the 
state store is idempotent, so EOS should not be violated.
This solution needs at least one new config to specify when a checkpoint 
should be written.




A small correction to your previous e-mail that does not change anything 
you said: Under ALOS the default commit interval is 30 seconds, not five 
seconds.



Best,
Bruno


On 01.07.23 12:37, Nick Telford wrote:

Hi everyone,

I've begun performance testing my branch on our staging environment,
putting it through its paces in our non-trivial application. I'm already
observing the same increased flush rate that we saw the last time we
attempted to use a version of this KIP, but this time, I think I know why.

Pre-KIP-892, StreamTask#postCommit, which is called at the end of the Task
commit process, has the following behaviour:

- Under ALOS: checkpoint the state stores. This includes
flushing memtables in RocksDB. This is acceptable because the default
commit.interval.ms is 5 seconds, so forcibly flushing memtables every 5
seconds is acceptable for most applications.
- Under EOS: checkpointing is not done, *unless* it's being forced, due
to e.g. the Task closing or being revoked. This means that under normal
processing conditions, the state stores will not be checkpointed, and will
not have memtables flushed at all, unless RocksDB decides to flush them on
its own. Checkpointing stores and force-flushing their memtables is only
done when a Task is being closed.

Under EOS, KIP-892 needs to checkpoint stores on at least *some* normal
Task commits, in order to write the RocksDB transaction buffers to the
state stores, and to ensure the offsets are synced to disk to prevent
restores from getting out of hand. Consequently, my current implementation
calls maybeCheckpoint on *every* Task commit, which is far too frequent.
This causes checkpoints every 10,000 records, which is a change in flush
behaviour, potentially causing performance problems for some applications.

I'm looking into possible solutions, and I'm currently leaning towards
using the statestore.transaction.buffer.max.bytes configuration to
checkpoint Tasks once we are likely to exceed it. This would complement the
existing "early Task commit" functionality that this configuration
provides, in the following way:

- Currently, we use statestore.transaction.buffer.max.bytes to force an
early Task commit if processing more records would cause our state store
transactions to exceed the memory assigned to them.
- New functionality: when a Task *does* commit, we will not checkpoint
the stores (and hence flush the transaction buffers) unless we expect to
cross the statestore.transaction.buffer.max.bytes threshold before the next
commit

I'm also open to suggestions.

Regards,
Nick

On Thu, 22 Jun 2023 at 14:06, Nick Telford  wrote:


Hi Bruno!

3.
By "less predictable for users", I meant in terms of understanding the
performance profile under various circumstances. The more complex the
solution, the more difficult it would be for users to understand the
performance they see. For example, spilling records to disk when the
transaction buffer reaches a threshold would, I expect, reduce write
throughput. This reduction in write throughput could be unexpected, and
potentially difficult to diagnose/understand for users.
At the moment, I think the "early commit" concept is relatively
straightforward; it's easy to document, and conceptually fairly obvious to
u

Re: [VOTE] KIP-793: Allow sink connectors to be used with topic-mutating SMTs

2023-07-03 Thread Chris Egerton
Hi Yash,

Thanks for the KIP! +1 (binding)

Cheers,

Chris

On Mon, Jul 3, 2023 at 7:02 AM Yash Mayya  wrote:

> Hi all,
>
> I'd like to start a vote on KIP-793 which enables sink connector
> implementations to be used with SMTs that mutate the topic / partition /
> offset information of a record.
>
> KIP -
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-793%3A+Allow+sink+connectors+to+be+used+with+topic-mutating+SMTs
>
> Discussion thread -
> https://lists.apache.org/thread/dfo3spv0xtd7vby075qoxvcwsgx5nkj8
>
> Thanks,
> Yash
>


Re: Subscribe to Kafka dev mailing list

2023-07-03 Thread Divij Vaidya
Hello

You sent this email to the wrong email address. Please find instruction on
how to subscribe at https://kafka.apache.org/contact

--
Divij Vaidya



On Sun, Jul 2, 2023 at 10:59 PM 最红 <1559063...@qq.com.invalid> wrote:

> Subscribe to Kafka dev mailing list


Re: [DISCUSS] KIP-943: Add independent "offset.storage.segment.bytes" for connect-distributed.properties

2023-07-03 Thread hudeqi
Is anyone following this KIP? Bump this thread.

Re: [DISCUSS] Release plan for Apache Kafka 3.5.1 release

2023-07-03 Thread Divij Vaidya
Satish - Thank you for catching that. It is now fixed.

David - Please refer to the security@kafka mailing thread with "Reg CVE
2023-34455" where it was proposed to have an early release for 3.5.1. The
rationale of releasing 3.5.1 early is to have a version of Kafka released
which does not have any known CVE, specifically
https://issues.apache.org/jira/browse/KAFKA-15096. Separately, I am going
to start a PR today to update the CVE list with more information on this
CVE and the potential workaround.

--
Divij Vaidya



On Mon, Jul 3, 2023 at 2:00 PM David Jacot 
wrote:

> Hi Divij,
>
> Thanks for the release plan.
>
> I wonder if we should wait a little more as 3.5.0 was released on June
> 15th. Releasing 3.5.1 a month after seems not enough in order to have time
> to catch bugs in 3.5.0. I think that we usually release the first minor
> release ~3 months after the major one. Is there a reason to release it in
> July?
>
> As a side note, we don't have a formal code freeze for minor releases.
>
> Best,
> David
>
> On Mon, Jul 3, 2023 at 1:51 PM Divij Vaidya 
> wrote:
>
> > Hi folks
> >
> > Here's the release plan for
> > https://cwiki.apache.org/confluence/display/KAFKA/Release+plan+3.5.1
> >
> > 3.5.1 will be a bug fix release which also addresses some of the CVEs
> such
> > as CVE-2023-34455 [1] in snappy-java. If all goes smoothly, I am
> estimating
> > a release date in the 3rd or 4th week of July. I will continue to post
> > important updates on the mailing list and you can also follow the
> progress
> > on the release plan wiki above.
> >
> > *Call for action* 📢
> >
> > If you think that a commit from the trunk should be backported to 3.5.1,
> > please let me know. Note that we usually backport only the critical bug
> > fixes which don't have a production workaround and security fixes. Note
> > that code freeze is on 9th July and no new commits will be added to the
> > 3.5.1 release after that.
> >
> > *Important dates *📅
> >
> > 9th July - Code freeze for 3.5.1
> > 10th July - First release candidate is published for voting
> > 18th July - Expected completion of release
> >
> > --
> > Divij Vaidya
> > Release Manager for Apache Kafka 3.5.1
> >
> > [1] https://nvd.nist.gov/vuln/detail/CVE-2023-34455
> >
>


[jira] [Created] (KAFKA-15142) Add Client Metadata to RemoteStorageFetchInfo

2023-07-03 Thread Jorge Esteban Quilcate Otoya (Jira)
Jorge Esteban Quilcate Otoya created KAFKA-15142:


 Summary: Add Client Metadata to RemoteStorageFetchInfo
 Key: KAFKA-15142
 URL: https://issues.apache.org/jira/browse/KAFKA-15142
 Project: Kafka
  Issue Type: Improvement
  Components: core
Reporter: Jorge Esteban Quilcate Otoya
Assignee: Jorge Esteban Quilcate Otoya


Once Tiered Storage is deployed, it will be important to understand how remote 
data is accessed and what consumption patterns emerge on each deployment.

To do this, tiered storage logs/metrics could provide more context about which 
client is fetching which partition/offset range and when.

At the moment, Client metadata is not propagated to the tiered-storage 
framework. To fix this, {{RemoteStorageFetchInfo}} can be extended with 
{{Optional[ClientMetadata]}} available on {{{}FetchParams{}}}, and have these 
bits of data available to improve the logging/metrics when fetching.
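
A purely illustrative sketch of how this could be used once propagated (the field and accessor 
names below are hypothetical until the change lands):
{code:java}
// RemoteStorageFetchInfo would carry the Optional<ClientMetadata> already present on
// FetchParams, letting the tiered-storage fetch path attribute remote reads to a client.
remoteFetchInfo.clientMetadata.ifPresent(meta ->
    log.debug("Remote fetch requested by clientId={} from rack={}", meta.clientId(), meta.rackId()));
{code}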



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-852 Optimize calculation of size for log in remote tier

2023-07-03 Thread Satish Duggana
Thanks Divij for taking the feedback and updating the motivation
section in the KIP.

One more comment on Alternative solution-3: the con is not valid, as
that will not affect the broker restart times, as discussed in the
earlier email in this thread. You may want to update that.
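
For anyone skimming the archive, the change under discussion boils down to an aggregate-size
query on the metadata manager instead of a full listing. A purely illustrative sketch (the
actual name and signature are defined by the KIP, not by this sketch):

    // Total size in bytes of the remote segments for a partition, so the retention
    // check does not need to iterate over every segment's metadata.
    long remoteLogSize(TopicIdPartition topicIdPartition, int leaderEpoch);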

~Satish.

On Sun, 2 Jul 2023 at 01:03, Divij Vaidya  wrote:
>
> Thank you folks for reviewing this KIP.
>
> Satish, I have modified the motivation to make it more clear. Now it says,
> "Since the main feature of tiered storage is storing a large amount of
> data, we expect num_remote_segments to be large. A frequent linear scan
> (i.e. listing all segment metadata) could be expensive/slower because of
> the underlying storage used by RemoteLogMetadataManager. This slowness to
> list all segment metadata could result in the loss of availability"
>
> Jun, Kamal, Satish, if you don't have any further concerns, I would
> appreciate a vote for this KIP in the voting thread -
> https://lists.apache.org/thread/soz00990gvzodv7oyqj4ysvktrqy6xfk
>
> --
> Divij Vaidya
>
>
>
> On Sat, Jul 1, 2023 at 6:16 AM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > Hi Divij,
> >
> > Thanks for the explanation. LGTM.
> >
> > --
> > Kamal
> >
> > On Sat, Jul 1, 2023 at 7:28 AM Satish Duggana 
> > wrote:
> >
> > > Hi Divij,
> > > I am fine with having an API to compute the size as I mentioned in my
> > > earlier reply in this mail thread. But I have the below comment for
> > > the motivation for this KIP.
> > >
> > > As you discussed offline, the main issue here is listing calls for
> > > remote log segment metadata is slower because of the storage used for
> > > RLMM. These can be avoided with this new API.
> > >
> > > Please add this in the motivation section as it is one of the main
> > > motivations for the KIP.
> > >
> > > Thanks,
> > > Satish.
> > >
> > > On Sat, 1 Jul 2023 at 01:43, Jun Rao  wrote:
> > > >
> > > > Hi, Divij,
> > > >
> > > > Sorry for the late reply.
> > > >
> > > > Given your explanation, the new API sounds reasonable to me. Is that
> > > enough
> > > > to build the external metadata layer for the remote segments or do you
> > > need
> > > > some additional API changes?
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Fri, Jun 9, 2023 at 7:08 AM Divij Vaidya 
> > > wrote:
> > > >
> > > > > Thank you for looking into this Kamal.
> > > > >
> > > > > You are right in saying that a cold start (i.e. leadership failover
> > or
> > > > > broker startup) does not impact the broker startup duration. But it
> > > does
> > > > > have the following impact:
> > > > > 1. It leads to a burst of full-scan requests to RLMM in case multiple
> > > > > leadership failovers occur at the same time. Even if the RLMM
> > > > > implementation has the capability to serve the total size from an
> > index
> > > > > (and hence handle this burst), we wouldn't be able to use it since
> > the
> > > > > current API necessarily calls for a full scan.
> > > > > 2. The archival (copying of data to tiered storage) process will
> > have a
> > > > > delayed start. The delayed start of archival could lead to local
> > build
> > > up
> > > > > of data which may lead to disk full.
> > > > >
> > > > > The disadvantage of adding this new API is that every provider will
> > > have to
> > > > > implement it, agreed. But I believe that this tradeoff is worthwhile
> > > since
> > > > > the default implementation could be the same as you mentioned, i.e.
> > > keeping
> > > > > cumulative in-memory count.
> > > > >
> > > > > --
> > > > > Divij Vaidya
> > > > >
> > > > >
> > > > >
> > > > > On Sun, Jun 4, 2023 at 5:48 PM Kamal Chandraprakash <
> > > > > kamal.chandraprak...@gmail.com> wrote:
> > > > >
> > > > > > Hi Divij,
> > > > > >
> > > > > > Thanks for the KIP! Sorry for the late reply.
> > > > > >
> > > > > > Can you explain the rejected alternative-3?
> > > > > > Store the cumulative size of remote tier log in-memory at
> > > > > RemoteLogManager
> > > > > > "*Cons*: Every time a broker starts-up, it will scan through all
> > the
> > > > > > segments in the remote tier to initialise the in-memory value. This
> > > would
> > > > > > increase the broker start-up time."
> > > > > >
> > > > > > Keeping the source of truth to determine the remote-log-size in the
> > > > > leader
> > > > > > would be consistent across different implementations of the plugin.
> > > The
> > > > > > concern posted in the KIP is that we are calculating the
> > > remote-log-size
> > > > > on
> > > > > > each iteration of the cleaner thread (say 5 mins). If we calculate
> > > only
> > > > > > once during broker startup or during the leadership reassignment,
> > do
> > > we
> > > > > > still need the cache?
> > > > > >
> > > > > > The broker startup-time won't be affected by the remote log manager
> > > > > > initialisation. The broker continue to start accepting the new
> > > > > > produce/fetch requests, while the RLM thread in the background can
> > > > > > determ

Re: [DISCUSS] Release plan for Apache Kafka 3.5.1 release

2023-07-03 Thread David Jacot
Hi Divij,

Thanks for the release plan.

I wonder if we should wait a little more as 3.5.0 was released on June
15th. Releasing 3.5.1 a month after seems not enough in order to have time
to catch bugs in 3.5.0. I think that we usually release the first minor
release ~3 months after the major one. Is there a reason to release it in
July?

As a side note, we don't have a formal code freeze for minor releases.

Best,
David

On Mon, Jul 3, 2023 at 1:51 PM Divij Vaidya  wrote:

> Hi folks
>
> Here's the release plan for
> https://cwiki.apache.org/confluence/display/KAFKA/Release+plan+3.5.1
>
> 3.5.1 will be a bug fix release which also addresses some of the CVEs such
> as CVE-2023-34455 [1] in snappy-java. If all goes smoothly, I am estimating
> a release date in the 3rd or 4th week of July. I will continue to post
> important updates on the mailing list and you can also follow the progress
> on the release plan wiki above.
>
> *Call for action* 📢
>
> If you think that a commit from the trunk should be backported to 3.5.1,
> please let me know. Note that we usually backport only the critical bug
> fixes which don't have a production workaround and security fixes. Note
> that code freeze is on 9th July and no new commits will be added to the
> 3.5.1 release after that.
>
> *Important dates *📅
>
> 9th July - Code freeze for 3.5.1
> 10th July - First release candidate is published for voting
> 18th July - Expected completion of release
>
> --
> Divij Vaidya
> Release Manager for Apache Kafka 3.5.1
>
> [1] https://nvd.nist.gov/vuln/detail/CVE-2023-34455
>


Re: [DISCUSS] Release plan for Apache Kafka 3.5.1 release

2023-07-03 Thread Satish Duggana
Hi Divij,
Thanks for sharing the 3.5.1 release plan.

The "Open/Blocked/Merged Issues" sections in the wiki seem to point to
3.4.1 release instead of 3.5.1 release.

~Satish.

On Mon, 3 Jul 2023 at 17:22, Divij Vaidya  wrote:
>
> Hi folks
>
> Here's the release plan for
> https://cwiki.apache.org/confluence/display/KAFKA/Release+plan+3.5.1
>
> 3.5.1 will be a bug fix release which also addresses some of the CVEs such
> as CVE-2023-34455 [1] in snappy-java. If all goes smoothly, I am estimating
> a release date in the 3rd or 4th week of July. I will continue to post
> important updates on the mailing list and you can also follow the progress
> on the release plan wiki above.
>
> *Call for action* 📢
>
> If you think that a commit from the trunk should be backported to 3.5.1,
> please let me know. Note that we usually backport only the critical bug
> fixes which don't have a production work around and security fixes. Note
> that code freeze is on 9th July and no new commits will be added to the 3.5
> .1 release after that.
>
> *Important dates *📅
>
> 9th July - Code freeze for 3.5.1
> 10th July - First release candidate is published for voting
> 18th July - Expected completion of release
>
> --
> Divij Vaidya
> Release Manager for Apache Kafka 3.5.1
>
> [1] https://nvd.nist.gov/vuln/detail/CVE-2023-34455


[DISCUSS] Release plan for Apache Kafka 3.5.1 release

2023-07-03 Thread Divij Vaidya
Hi folks

Here's the release plan for
https://cwiki.apache.org/confluence/display/KAFKA/Release+plan+3.5.1

3.5.1 will be a bug fix release which also addresses some of the CVEs such
as CVE-2023-34455 [1] in snappy-java. If all goes smoothly, I am estimating
a release date in the 3rd or 4th week of July. I will continue to post
important updates on the mailing list and you can also follow the progress
on the release plan wiki above.

*Call for action* 📢

If you think that a commit from the trunk should be backported to 3.5.1,
please let me know. Note that we usually backport only the critical bug
fixes which don't have a production workaround and security fixes. Note
that code freeze is on 9th July and no new commits will be added to the
3.5.1 release after that.

*Important dates *📅

9th July - Code freeze for 3.5.1
10th July - First release candidate is published for voting
18th July - Expected completion of release

--
Divij Vaidya
Release Manager for Apache Kafka 3.5.1

[1] https://nvd.nist.gov/vuln/detail/CVE-2023-34455


[VOTE] KIP-793: Allow sink connectors to be used with topic-mutating SMTs

2023-07-03 Thread Yash Mayya
Hi all,

I'd like to start a vote on KIP-793 which enables sink connector
implementations to be used with SMTs that mutate the topic / partition /
offset information of a record.

KIP -
https://cwiki.apache.org/confluence/display/KAFKA/KIP-793%3A+Allow+sink+connectors+to+be+used+with+topic-mutating+SMTs

Discussion thread -
https://lists.apache.org/thread/dfo3spv0xtd7vby075qoxvcwsgx5nkj8

Thanks,
Yash


Re: [DISCUSS] KIP-793: Sink Connectors: Support topic-mutating SMTs for async connectors (preCommit users)

2023-07-03 Thread Yash Mayya
Hi Chris,

Thanks for pointing that out, I hadn't realized that the SubmittedRecords
class has almost exactly the same semantics needed for handling offset
commits in the per-sink record ack API case. However, I agree that it isn't
worth the tradeoff and we've already discussed the backward compatibility
concerns imposed on connector developers if we were to consider deprecating
/ removing the preCommit hook in favor of a new ack-based API.
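
For readers following the thread, a rough illustration of how the accessors added by the KIP
are meant to be used (connector code entirely hypothetical): batch and route by the
post-transform topic, but report committable offsets against the original Kafka coordinates.

    // Fragment of a hypothetical SinkTask; start/stop/version and error handling omitted.
    private final Map<TopicPartition, OffsetAndMetadata> committable = new HashMap<>();

    @Override
    public void put(Collection<SinkRecord> records) {
        for (SinkRecord record : records) {
            writeToSink(record.topic(), record); // post-transform routing, as mutated by SMTs
            TopicPartition original =
                new TopicPartition(record.originalTopic(), record.originalKafkaPartition());
            committable.put(original, new OffsetAndMetadata(record.originalKafkaOffset() + 1));
        }
    }

    @Override
    public Map<TopicPartition, OffsetAndMetadata> preCommit(Map<TopicPartition, OffsetAndMetadata> currentOffsets) {
        return new HashMap<>(committable);
    }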

Thanks,
Yash

On Thu, Jun 29, 2023 at 7:31 PM Chris Egerton 
wrote:

> Hi Yash,
>
> Thanks for your continued work on this tricky feature. I have no further
> comments or suggestions on the KIP and am ready to vote in favor of it.
>
> That said, I did want to quickly respond to this comment:
>
> > On a side note, this also means that the per sink record ack API
> that was proposed earlier wouldn't really work for this case since Kafka
> consumers themselves don't support per message acknowledgement semantics
> (and any sort of manual book-keeping based on offset linearity in a topic
> partition would be affected by things like log compaction, control records
> for transactional use cases etc.) right?
>
> I believe we could still use the SubmittedRecords class [1] (with some
> small tweaks) to track ack'd messages and the latest-committable offsets
> per topic partition, without relying on assumptions about offsets for
> consecutive records consumed from Kafka always differing by one. But at
> this point I think that, although this approach does come with the
> advantage of also enabling fine-grained metrics on record delivery to the
> sink system, it's not worth the tradeoff in intuition since it's less clear
> why users should prefer that API instead of using SinkTask::preCommit.
>
> [1] -
>
> https://github.com/apache/kafka/blob/12be344fdd3b20f338ccab87933b89049ce202a4/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/SubmittedRecords.java
>
> Cheers,
>
> Chris
>
> On Wed, Jun 21, 2023 at 9:46 AM Yash Mayya  wrote:
>
> > Hi Chris,
> >
> > Firstly, thanks for sharing your detailed thoughts on this thorny issue!
> > Point taken on Kafka Connect being a brownfield project and I guess we
> > might just need to trade off elegant / "clean" interfaces for fixing this
> > gap in functionality. Also, thanks for calling out all the existing
> > cross-plugin interactions and also the fact that connectors are not and
> > should not be developed in silos ignoring the rest of the ecosystem. That
> > said, here are my thoughts:
> >
> > > we could replace these methods with headers that the
> > > Connect runtime automatically injects into records directly
> > > before dispatching them to SinkTask::put.
> >
> > Hm, that's an interesting idea to get around the need for connectors to
> > handle potential 'NoSuchMethodError's in calls to
> > SinkRecord::originalTopic/originalKafkaPartition/originalKafkaOffset.
> > However, I'm inclined to agree that retrieving these values from the
> record
> > headers seems even less intuitive and I'm okay with adding this to the
> > rejected alternatives list.
> >
> > > we can consider eliminating the overridden
> > > SinkTask::open/close methods
> >
> > I tried to further explore the idea of keeping just the existing
> > SinkTask::open / SinkTask::close methods but only calling them with
> > post-transform topic partitions and ended up coming to the same
> conclusion
> > that you did earlier in this thread :)
> >
> > The overloaded SinkTask::open / SinkTask::close are currently the biggest
> > sticking points with the latest iteration of this KIP and I'd prefer this
> > elimination for now. The primary reasoning is that the information from
> > open / close on pre-transform topic partitions can be combined with the
> per
> > record information of both pre-transform and post-transform topic
> > partitions to handle most practical use cases without significantly
> > muddying the sink connector related public interfaces. The argument that
> > this makes it harder for sink connectors to deal with post-transform
> topic
> > partitions (i.e. in terms of grouping together or batching records for
> > writing to the sink system) can be countered with the fact that it'll be
> > similarly challenging even with the overloaded method approach of calling
> > open / close with both pre-transform and post-transform topic partitions
> > since the batching would be done on post-transform topic partitions
> whereas
> > offset tracking and reporting for commits would be done on pre-transform
> > topic partitions (and the two won't necessarily serially advance in
> > lockstep). On a side note, this also means that the per sink record ack
> API
> > that was proposed earlier wouldn't really work for this case since Kafka
> > consumers themselves don't support per message acknowledgement semantics
> > (and any sort of manual book-keeping based on offset linearity in a topic
> > partition would be affected by things like log compaction, control
> records
> > for

[jira] [Resolved] (KAFKA-15131) Improve RemoteStorageManager exception handling documentation

2023-07-03 Thread Jorge Esteban Quilcate Otoya (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jorge Esteban Quilcate Otoya resolved KAFKA-15131.
--
Resolution: Fixed

> Improve RemoteStorageManager exception handling documentation
> -
>
> Key: KAFKA-15131
> URL: https://issues.apache.org/jira/browse/KAFKA-15131
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Jorge Esteban Quilcate Otoya
>Priority: Major
>  Labels: tiered-storage
>
> As discussed here [1], the RemoteStorageManager javadocs require clarification 
> regarding error handling:
>  * Remove ambiguity on `RemoteResourceNotFoundException` description
>  * Describe when `RemoteResourceNotFoundException` can/should be thrown
>  * Describe expectations for idempotent operations when copying/deleting
>  
> [1] 
> https://issues.apache.org/jira/browse/KAFKA-7739?focusedCommentId=17720936&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17720936



--
This message was sent by Atlassian Jira
(v8.20.10#820010)