Hi, Viktor,

Thanks for the reply.

JR1. (A) and (B) Yes, your summary matches my thinking.

(C) "Generally I think that (i) (ii) (iii) and (iv) may be addressed with an aggressive tiered storage consolidation (the first approach)". Hmm, I am confused by the above statement. By "the first approach", do you mean aggressive tiering with faster segment rolling through the existing RLMM? I don't think the existing RLMM is designed to solve these issues, due to inefficiencies in cost, metadata propagation and metadata storage, as we previously discussed.

JR11. I was thinking we leave the existing RLMM as is and continue to use it for classic topics. We design a new, more efficient metadata management component, independent of RLMM. This new component will be the only metadata component that diskless topics depend on.

Jun

On Tue, May 12, 2026 at 8:43 AM Viktor Somogyi-Vass <[email protected]> wrote:

> Hi Jun,
>
> JR1
> (1)-(2)-(3) I'd address these together, and let me explain our current idea to solve the tiny object problem, because I'm not sure if we're 100% talking about the same thing. I have two approaches in mind for TS consolidation ((A) and (B)) and I'm not sure if we're both assuming the same idea, so let's clarify this.
>
> (A)
> This is our current assumption. It uses local disks (creating classic local logs with UnifiedLog) to consolidate logs into the classic log format, and uses RSM and RLMM to store them in tiered storage. This way we're not limited by the need to have short rollovers. Local logs become a form of staging environment to serve reads and accumulate records for tiered storage. This means that:
> (a) Once a message is consolidated into the classic log format, we can use it for serving lagging consumers. Diskless reads should really be used for the head of the log, and after a few seconds logs should be consolidated.
> (b) The real cost is much closer to that 87.5% (and in fact the google sheet I shared also assumes this model) because we have more freedom in choosing the retention parameters of the classic log.
> (c) Metadata is smaller, as we only need to keep diskless segments until the tiered offset surpasses the individual batches' offsets.
> (d) RLMM metadata is also somewhat manageable due to the larger segment sizes, but it's still possible to run into the metadata explosion problem.
> (e) It needs to rebuild this local log on reassignment to serve lagging consumers effectively, so reassignment is a bit messier.
> (f) It's not optimal when partitions have a single replica: on failure we can only fall back to diskless mode until the partition is reassigned to a functioning broker.
>
> (B)
> Compared to the above, there is an alternative approach, which is to consolidate when diskless segments expire (after 15 minutes, for instance). In that case your points seem to fit better, as:
> (a) we can only use the classic, consolidated logs to serve lagging consumers after they have been tiered
> (b) to be more efficient with lagging consumers we have to stick to a short rollover
> (c) it's more costly due to the short rollovers
> (d) the RLMM bottleneck still exists due to the short rollovers
> (e) it's not a given that we use local disks for transforming logs, as we can do it in memory too (which can be ineffective and more expensive), but perhaps the "chunked transfer encoding" that S3 supports, or something similar with other providers, is a cost-effective way. If we know the final size in advance, we can upload data in chunks and still get billed for 1 put.
> (f) reassignment or failover is cleaner and faster, as there isn't a need to rebuild local caches.
>
> (C)
> Apart from the first 2 approaches there is a 3rd, which is WAL merging. To understand your points, let me summarize what I could gather so far as reasons for WAL merging (and please correct me if I missed something):
> (i) protecting consumer lag: small WAL files create inefficient objects for lagging consumers, so larger objects should be more efficient
> (ii) avoiding the RLMM replay bottleneck: managing small segments with RLMM is very inefficient (100s of GB of metadata)
> (iii) reducing batch metadata overhead: merging WAL files may reduce the metadata we need to store, but it depends on the merge algorithm and how we can compact batch data
> (iv) cost effectiveness: retrieving merged WAL files reduces the number of get requests to object storage
> (v) architectural redundancy with RLMM: ideally we wouldn't need 2 solutions to 2 somewhat similar problems (tiered storage and diskless)
>
> Generally I think that (i), (ii), (iii) and (iv) may be addressed with an aggressive tiered storage consolidation (the first approach), so the only remaining gap would be (v). I also agree that having 2 different solutions for metadata handling isn't ideal and perhaps there is a possibility of improvement here. It should be possible to redesign RLMM to be more similar to the diskless coordinator, or to design a common solution.
>
> JR11
> "If we support merging in the diskless coordinator, I wonder how useful RLMM is. It seems simpler to manage all metadata from the object store in a single place."
>
> Could you please clarify this a little bit? Do you think that we should replace the RLMM with a solution that is more similar to the diskless coordinator, or deprecate tiered storage altogether in favor of diskless? I'm not sure which option you're referring to:
> (1) Unify tiered storage and diskless under a single storage layer (and possibly deprecate tiered storage in favor of diskless with merged WAL segments).
> (2) Create a smart coordinator instead of RLMM and possibly unify metadata coordination with diskless.
> (3) Keep tiered storage and diskless separate with their own solutions for metadata (probably not optimal).
>
> Thanks,
> Viktor
>
> On Fri, May 1, 2026 at 9:08 PM Jun Rao via dev <[email protected]> wrote:
>
>> Hi, Viktor and Greg,
>>
>> Thanks for the reply.
>>
>> JR1.
>> 1) Thanks for verifying the cost estimation. I noticed a bug in my earlier calculation: I estimated the per-broker network transfer rate at 2MB/sec. It should be 4MB/sec. If I correct it, the estimated savings are similar to yours.
>> The cost for transferring 4MB through the network is 4 * 2 * 10^-5 = $8 * 10^-5.
>> If it's replaced with 2 S3 puts, the cost is $1 * 10^-5. The savings are about 87.5%.
>> If it's replaced with 6 S3 puts, the cost is $3 * 10^-5. The savings are 62.5%.
>> Savings are still significantly lower when using RLMM.
>>
>> "To me it seems that Greg's previous suggestion for a 15 min rollover may be a bit too much. With 1 hour we can achieve better cost saving and less coordinator metadata being stored."
>> This solves the cost issue, but it has other implications (see point 2) below).
>>
>> 2) "Yes, I think this is to be expected and a lot depends on the implementation. Ideally segments or chunks should be cached to minimize the number of times segments are pulled from remote storage."
>> In a classic topic, when a consumer lags, its requests are served either from the local cache or from large objects in the object store. With the current design in a diskless topic, lagging consumer requests might be served from tiny 500-byte objects. This will significantly slow down the consumer's catch-up, which is not expected user behavior. Ideally, we don't want those tiny objects to last more than a few minutes, let alone an hour.
>>
>> 3) "I think if my calculations are correct (and we use a 60 minute window), then metadata generation should be slower, please see the google sheet I linked above. I think given that traffic, the current topic based RLMM should be able to handle it."
>> Why is a 60 minute window used? RLMM metadata needs to be retained for the longest retention time among all topics. This means that the retention window can be weeks instead of 1 hour, and that RLMM might need to replay over 100GB of data during reassignment, which is not what it is designed for.
>>
>> JR10. "Your example of 100,000 1kb/s partitions is a borderline case, where there are some configurations which are not viable due to scale or cost, and some that are. It would be up to the operator to tune their cluster, by changing diskless.segment.ms, dividing up the cluster, or switching to a more scalable RLMM implementation."
>> A broker with 4MB/sec produce throughput can probably be considered high throughput. Even with 4K partitions per broker, we could still achieve an 87.5% cost saving as listed above, with the right implementation. So, ideally, it would be useful to support that as well.
>>
>> JR11. "We had a short conversation with Greg and we came to the conclusion that because of the explosiveness of diskless metadata, it may be worth revisiting the merging case as it can indeed buy us some more cost saving for the added complexity."
>> If we support merging in the diskless coordinator, I wonder how useful RLMM is. It seems simpler to manage all metadata from the object store in a single place.
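The savings figures discussed in this thread can be sanity-checked with a short script. The prices are the ones assumed in the discussion ($0.02/GB network transfer, $0.005 per 1000 S3 PUTs), not authoritative AWS list prices, and the helper name is mine:

```python
# Sanity check of the replication-vs-S3-PUT savings figures in this thread.
# Prices are the ones assumed in the discussion, not current AWS list prices.
NET_COST_PER_MB = 0.02 / 1000      # $0.02/GB -> $2 * 10^-5 per MB
PUT_COST = 0.005 / 1000            # $0.005 per 1000 requests -> $0.5 * 10^-5 each

def put_savings(mb_transferred: float, num_puts: int) -> float:
    """Fraction of the network replication cost saved when the transfer
    is replaced by `num_puts` S3 PUTs."""
    net_cost = mb_transferred * NET_COST_PER_MB
    return (net_cost - num_puts * PUT_COST) / net_cost

print(put_savings(4, 2))   # ~0.875 -> the 87.5% case (4MB, 2 PUTs)
print(put_savings(4, 6))   # ~0.625 -> the 62.5% case (4MB, 6 PUTs)
print(put_savings(2, 2))   # ~0.75  -> the earlier 2MB, 2-PUT case
print(put_savings(2, 6))   # ~0.25  -> the earlier 2MB, 6-PUT case
```

This reproduces both the 2MB figures from the earlier message and the corrected 4MB figures above.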
>>
>> Jun
>>
>> On Mon, Apr 27, 2026 at 4:17 PM Greg Harris <[email protected]> wrote:
>>
>> > Hi Jun,
>> >
>> > Thank you for scrutinizing the scalability of the current direct-to-tiered-storage strategy, and its metadata scalability.
>> >
>> > One of our implicit assumptions with this design was that users are able to choose between the Diskless and Classic mechanisms, and that in any situation where the Diskless design was deficient, Classic topics could continue to be used.
>> > This was originally applied to low-latency use-cases, but now applies to low-throughput use-cases too. When the throughput on a topic is low, the benefit of using Diskless is also low, because it is proportional to the amount of data transferred, and it is more likely that the batch overhead of the topics is significant.
>> > In other words, we've been treating cost-effective support for arbitrarily low throughput topics as a non-goal.
>> >
>> > Your example of 100,000 1kb/s partitions is a borderline case, where there are some configurations which are not viable due to scale or cost, and some that are. It would be up to the operator to tune their cluster, by changing diskless.segment.ms, dividing up the cluster, or switching to a more scalable RLMM implementation.
>> >
>> > Do you think we should have cost-effective support for arbitrarily low-throughput partitions in Diskless? How much total demand is there for partitions where batches are >1kb but the partition throughput is <1kb/s?
>> >
>> > Thanks,
>> > Greg
>> >
>> > On Fri, Apr 24, 2026 at 10:23 AM Viktor Somogyi-Vass <[email protected]> wrote:
>> >
>> >> Hi Jun,
>> >>
>> >> Regarding JR1.
>> >> We had a short conversation with Greg and we came to the conclusion that because of the explosiveness of diskless metadata, it may be worth revisiting the merging case, as it can indeed buy us some more cost saving for the added complexity. Also, it would support smaller topics and we could somewhat manage the tiered storage consolidation costs. I think that we would still need to consolidate WAL segments into tiered storage. The reasons are: to limit WAL metadata, to be able to dynamically enable/disable diskless, and to be compatible with existing and future TS improvements.
>> >> I'll try to refresh KIP-1165 and build it into the calculator above (if it's possible at all :) ) and come back to you.
>> >> Regardless, I just wanted to give a short update in the meantime; looking forward to your answer.
>> >>
>> >> Best,
>> >> Viktor
>> >>
>> >> On Fri, Apr 24, 2026 at 3:46 PM Viktor Somogyi-Vass <[email protected]> wrote:
>> >>
>> >> > Hi Jun,
>> >> >
>> >> > Thanks for the quick reply.
>> >> >
>> >> > JR1.
>> >> > 1) Thanks for putting the numbers together. While your calculation seems to be correct in the sense that 6 PUTs would worsen the cost saving benefits, I think that in a byte-for-byte comparison there is a bigger difference. The reason is that the 4 tiered storage puts transfer much more data compared to the small WAL segments, so in practice there should be fewer TS puts.
>> >> > I made a google sheet calculator for this which I'd like to share with you:
>> >> > https://docs.google.com/spreadsheets/d/127GOTWfFSN27B5ezif14GPj8KtrghjBqsXG9GG6NxhI/edit?gid=749470906#gid=749470906
>> >> > Please copy the sheet to modify the values.
>> >> > About my findings: I was trying to create a cluster model similar to the one discussed here previously, to see how cost varies over different segment rollovers. To me it seems that Greg's previous suggestion for a 15 min rollover may be a bit too much. With 1 hour we can achieve better cost saving and less coordinator metadata being stored. I have also tried to account for the producer batch metadata generated by diskless partitions, but to me it seems like a lower number than Greg's original numbers.
>> >> >
>> >> > 2) "Note that local storage could be lost on reassigned partitions. In that case, lagging reads can only be served from the object store."
>> >> > Yes, I think this is to be expected and a lot depends on the implementation. Ideally segments or chunks should be cached to minimize the number of times segments are pulled from remote storage.
>> >> >
>> >> > "The 2MB/sec I quoted is for a specific broker. Depending on the broker instance type, a broker may only be able to handle low 10s of MB/sec of data. So, 2MB/sec overhead is significant."
>> >> > Yes, I had indeed misunderstood; however, I have updated my calculator sheet with the metadata calculation. Overall, the number of tiered storage segments created seems to be much lower than in your calculations, given the parameters of the cluster you specified earlier. Please take a look; I'd really like to understand the thinking here because this is a crucial point.
>> >> >
>> >> > 3) I think if my calculations are correct (and we use a 60 minute window), then metadata generation should be slower; please see the google sheet I linked above. I think given that traffic, the current topic based RLMM should be able to handle it.
>> >> > In the case where we would need to make the RLMM capable of handling traffic similar to the diskless coordinator, then you're right, we probably should consider how we can improve it. I think there are multiple possibilities as you mentioned, but ideally there should be a common implementation for metadata coordination that could handle these cases.
>> >> >
>> >> > JR7.
>> >> > Yes, your expectation is totally reasonable; we should expect the get and put operations to be strongly consistent for the read-after-write scenarios. And I think that since major cloud providers offer strongly consistent object storage, it should be sufficient for a wide user group. So we could shrink the scope of the KIP a bit this way and avoid adding complexity that is needed mostly on the margin.
>> >> > I expect though that "list" can stay eventually consistent, as the KIP relies on it only for garbage collection, where it is fine if a few segments are collected only in the next iteration.
>> >> >
>> >> > JR3.
>> >> > Since Greg hasn't replied yet, I'll try to catch up with him and formulate an answer next week.
>> >> >
>> >> > Best,
>> >> > Viktor
>> >> >
>> >> > On Tue, Apr 21, 2026 at 8:16 PM Jun Rao via dev <[email protected]> wrote:
>> >> >
>> >> >> Hi, Viktor,
>> >> >>
>> >> >> Thanks for the reply.
>> >> >>
>> >> >> JR1.
>> >> >> 1) "So while it seems to be significant that we tripled the number of PUTs, cost-wise it doesn't seem to be significant."
>> >> >> Let's compare the savings achieved by replacing network replication transfer with S3 puts in AWS.
>> >> >> network transfer cost: $0.02/GB = $2 * 10^-5/MB
>> >> >> S3 put cost: $0.005 per 1000 requests = $0.5 * 10^-5/request
>> >> >>
>> >> >> The KIP batches data up to 4MB. So, let's assume that we write 2MB S3 objects on average.
>> >> >>
>> >> >> The cost for transferring 2MB through the network is 2 * 2 * 10^-5 = $4 * 10^-5.
>> >> >> If it's replaced with 2 S3 puts, the cost is $1 * 10^-5. The savings are about 75%.
>> >> >> If it's replaced with 6 S3 puts, the cost is $3 * 10^-5. The savings are 25%. As you can see, the savings are significantly lower.
>> >> >>
>> >> >> 2) "Therefore we could expect classic local segments to be present which could be used for catching up consumers."
>> >> >> Note that local storage could be lost on reassigned partitions. In that case, lagging reads can only be served from the object store.
>> >> >>
>> >> >> "Regarding the amount of metadata: 2MB/sec is well below the 2GB/s throughput that Greg calculated previously, so I think it should be manageable for a cluster with that amount of throughput,"
>> >> >> It seems that you didn't make the correct comparison. The 2GB/s that Greg mentioned is the throughput for the whole cluster. The 2MB/sec I quoted is for a specific broker. Depending on the broker instance type, a broker may only be able to handle low 10s of MB/sec of data. So, 2MB/sec overhead is significant.
>> >> >>
>> >> >> 3) "I'd separate it from the discussion of diskless core and perhaps we could address it in a separate KIP as it is mostly a redesign of the RLMM."
>> >> >> Those problems don't exist in the existing usage of RLMM. They manifest because diskless tries to use RLMM in a way it wasn't designed for (there is at least a 20X increase in metadata). It would be useful to consider whether fixing those problems in RLMM or using a new approach is better. For example, KIP-1164 already introduces a snapshotting mechanism. Adding another snapshotting mechanism to RLMM seems redundant.
>> >> >>
>> >> >> JR7. A typical object store supports 3 operations: puts, gets and lists. Which operations used by diskless can be eventually consistent? I'd expect that a get should always see the result of the latest put.
>> >> >>
>> >> >> Jun
>> >> >>
>> >> >> On Mon, Apr 20, 2026 at 8:14 AM Viktor Somogyi-Vass <[email protected]> wrote:
>> >> >>
>> >> >> > Hi Jun,
>> >> >> >
>> >> >> > I'd like to add my thoughts too until Greg has time to respond.
>> >> >> >
>> >> >> > JR1. I also think there are shortcomings in the current tiered storage design, around the RLMM.
>> >> >> > 1) I think this is a correct observation; however, if my calculations are correct, it actually comes down to a negligible amount of cost. Taking the AWS pricing sheet at
>> >> >> > https://aws.amazon.com/s3/pricing/?nc2=h_pr_s3&trk=aebc39a1-139c-43bb-8354-211ac811b83a&sc_channel=ps
>> >> >> > it seems like the difference between 6 and 2 PUTs per second is ~$52 for a month. The calculation follows as: 6*60*60*24*30*0.005/1000 - 2*60*60*24*30*0.005/1000 = $51.84. So while it seems to be significant that we tripled the number of PUTs, cost-wise it doesn't seem to be significant.
>> >> >> > 2) Reflecting on your original problem: the tiered storage consolidation process should be continuously running and transforming WAL segments into classic logs. Therefore we could expect classic local segments to be present which could be used for catching up consumers. So they would only switch to WAL reading when they're close to the end of the log. Since this offset space should be cached, the reads from there should be fast.
>> >> >> > Regarding the amount of metadata: 2MB/sec is well below the 2GB/s throughput that Greg calculated previously, so I think it should be manageable for a cluster with that amount of throughput, although I agree with your comment that the current topic based tiered metadata manager isn't optimal and we could develop a better solution.
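The ~$52/month figure above follows directly from the per-request PUT price quoted in this thread; a quick sketch (the helper name is mine):

```python
# Check of the ~$52/month figure: the monthly cost difference between a
# steady 6 PUTs/sec and 2 PUTs/sec, at the $0.005-per-1000-requests price
# assumed in this thread.
PUT_COST = 0.005 / 1000            # dollars per PUT request

def monthly_put_cost(puts_per_sec: float, days: int = 30) -> float:
    """Dollar cost of S3 PUTs over a month at a constant request rate."""
    return puts_per_sec * 60 * 60 * 24 * days * PUT_COST

delta = monthly_put_cost(6) - monthly_put_cost(2)
print(round(delta, 2))             # ~51.84, matching the calculation above
```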
>> >> >> > 3) Tied to the previous point, I agree that your comments are absolutely valid; however, similarly, I'd separate it from the discussion of the diskless core and perhaps we could address it in a separate KIP, as it is mostly a redesign of the RLMM.
>> >> >> >
>> >> >> > JR2. Ack. We will raise a KIP in the near future.
>> >> >> >
>> >> >> > JR3. I'd leave answering this to Greg, as I don't have too much context on this one.
>> >> >> >
>> >> >> > JR7. I think this could be similar to the tiered storage design, so any coordinator operation should be strongly consistent (since we're using classic topics there). Therefore the WAL segment storage layer could be eventually consistent, as we store its metadata in a strongly consistent manner. I'm not sure though if this was the answer you were looking for?
>> >> >> >
>> >> >> > Best,
>> >> >> > Viktor
>> >> >> >
>> >> >> > On Thu, Mar 26, 2026 at 11:43 PM Jun Rao via dev <[email protected]> wrote:
>> >> >> >
>> >> >> >> Hi, Greg,
>> >> >> >>
>> >> >> >> Thanks for the reply.
>> >> >> >>
>> >> >> >> JR1. Rolling log segments every 15 minutes addresses the 3 concerns I listed, but it introduces some new issues because it doesn't quite fit the design of the current tiered storage.
>> >> >> >> (a) The current tiered storage design stores a single partition per object. If we roll a log segment every 15 minutes, with 4K partitions per broker, this means an additional 4 S3 puts per second. The diskless design aims for 2 S3 puts per second. So, this triples the S3 put cost and reduces the savings benefits.
>> >> >> >> (b) With tiered storage, each broker essentially needs to read the tier metadata from all tier metadata partitions if the number of user partitions exceeds 50. Assume that we generate 100 bytes of tier metadata per partition every 15 minutes, that each broker has 4K partitions, and a cluster of 500 brokers. Each broker needs to receive tier metadata at a rate of 100 * 4K * 500 / (15 * 60) = 200KB/sec. A broker hosting one of the 50 tier metadata topic partitions needs to send out metadata at 100 * 4K * 500 / 50 * 500 / (15 * 60) = 2MB/sec. This increases unnecessary network and CPU overhead.
>> >> >> >> (c) Tiered storage doesn't support snapshots. A restarted broker needs to replay the tier metadata log from the beginning to build the tier metadata state. Suppose that the tier metadata log is kept for 7 days. The total amount of tier metadata that needs to be replayed is 200KB * 7 * 24 * 3600 = 120GB.
>> >> >> >> Does the merging optimization you mentioned address those new concerns? If so, could you describe how it works?
>> >> >> >>
>> >> >> >> JR2. It's fine to cover the default partition assignment strategy for diskless topics in a separate KIP. However, since this is essential for achieving the cost saving goal, we need a solution before releasing the diskless KIP.
>> >> >> >>
>> >> >> >> JR3. Sounds good. Could you document how this works?
>> >> >> >>
>> >> >> >> JR7. Could you describe which parts of the operation can be eventually consistent?
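The tier metadata traffic in JR1 (b) and (c) above can be reproduced with a back-of-envelope script, using the thread's inputs (100 bytes of metadata per partition per 15-minute roll, 4K partitions per broker, 500 brokers, 50 tier metadata partitions); the exact results are slightly above the rounded figures quoted in the thread:

```python
# Back-of-envelope for the tier metadata traffic in JR1 (b) and (c) above.
# All inputs come from the thread; only the variable names are mine.
METADATA_BYTES = 100          # tier metadata per partition per roll
PARTITIONS_PER_BROKER = 4_000
BROKERS = 500
METADATA_PARTITIONS = 50      # tier metadata topic partitions
ROLL_SEC = 15 * 60            # 15-minute segment roll

cluster_bytes_per_roll = METADATA_BYTES * PARTITIONS_PER_BROKER * BROKERS

# (b) Every broker must consume the full cluster-wide tier metadata stream:
recv_rate = cluster_bytes_per_roll / ROLL_SEC
print(round(recv_rate / 1e3))      # ~222 KB/s (rounded to 200KB/sec above)

# (b) A broker hosting one of the 50 metadata partitions fans its share
# out to all 500 brokers:
send_rate = cluster_bytes_per_roll / METADATA_PARTITIONS * BROKERS / ROLL_SEC
print(round(send_rate / 1e6, 1))   # ~2.2 MB/s ("2MB/sec" above)

# (c) Replaying 7 days of tier metadata on broker restart:
replay_bytes = recv_rate * 7 * 24 * 3600
print(round(replay_bytes / 1e9))   # ~134 GB (the rounded 200KB/s gives ~120GB)
```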
>> >> >> >> >> >> >> >> Jun >> >> >> >> >> >> >> >> On Thu, Mar 19, 2026 at 1:35 PM Greg Harris < >> [email protected]> >> >> >> wrote: >> >> >> >> >> >> >> >> > Hi Jun, >> >> >> >> > >> >> >> >> > Thanks for your comments! >> >> >> >> > >> >> >> >> > JR1: >> >> >> >> > You are correct that the segment rolling configurations are >> >> currently >> >> >> >> > critical to balance the scalability of Diskless and Tiered >> >> Storage, >> >> >> as >> >> >> >> > larger roll configurations benefit tiered storage, and smaller >> >> roll >> >> >> >> > configurations benefit Diskless. >> >> >> >> > >> >> >> >> > To address your points specifically: >> >> >> >> > (1) A Diskless topic which is cost-competitive with an >> equivalent >> >> >> >> Classic >> >> >> >> > topic will have a metadata size <1% of the data size. A cluster >> >> >> storing >> >> >> >> > 360GB of metadata will have >36TB of data under management and >> a >> >> >> >> retention >> >> >> >> > of 5hr implies a throughput of >2GB/s. This will require >> multiple >> >> >> >> Diskless >> >> >> >> > coordinators, which can share the load of storing the Diskless >> >> >> metadata, >> >> >> >> > and serving Diskless requests. >> >> >> >> > (2) Catching up consumers are intended to be served from tiered >> >> >> storage >> >> >> >> > and local segment caches. Brokers which are building their >> local >> >> >> segment >> >> >> >> > caches will have to read many files, but will amortize those >> >> reads by >> >> >> >> > receiving data for multiple partitions in a single read. >> >> >> >> > (3) This is a fundamental downside of storing data from >> multiple >> >> >> topics >> >> >> >> in >> >> >> >> > a single object, similar to classic segments. We can implement >> a >> >> >> >> > configurable cluster-wide maximum roll time, which would set >> the >> >> >> slowest >> >> >> >> > cadence at which Tiered Storage segments are rolled from >> Diskless >> >> >> >> segments. 
>> >> >> >> > If an individual partition has more aggressive roll settings, >> it >> >> may >> >> >> be >> >> >> >> > rolled earlier. >> >> >> >> > This configuration would permit the cluster operator to >> >> approximately >> >> >> >> > bound the number of diskless WAL segments, which bounds the >> total >> >> >> size >> >> >> >> of >> >> >> >> > the WAL segments, disk cache, diskless coordinator state, and >> >> >> excessive >> >> >> >> > retention window. For example, a diskless.segment.ms >> <https://urldefense.com/v3/__http://diskless.segment.ms__;!!Ayb5sqE7!qD6UWpGNFDAUbr00WyBVsibHKHuiQKFjLSaOflC2lBt2rFw-s6OPvGrHyI1HZlkWV6j9UbNDluPtSxE$> >> >> < >> https://urldefense.com/v3/__http://diskless.segment.ms__;!!Ayb5sqE7!t2RHh2_lmpuV6wxO0CCQLMMuOcTLHitt0IY8HqA28tFdgk8EUF9qkqvS2l-vEXgJv_x1x3jBLey8-wOdb3oIbw$ >> > >> >> of 15 minutes >> >> >> >> would >> >> >> >> > reduce the metadata storage to 18GB, WAL segments to 1.8TB, and >> >> >> permit >> >> >> >> > short-retention data to be physically deleted as soon as ~15 >> >> minutes >> >> >> >> after >> >> >> >> > being produced. >> >> >> >> > Of course, this will reduce the size of the tiered storage >> >> segments >> >> >> for >> >> >> >> > topics that have low throughput, and where segment.ms >> <https://urldefense.com/v3/__http://segment.ms__;!!Ayb5sqE7!qD6UWpGNFDAUbr00WyBVsibHKHuiQKFjLSaOflC2lBt2rFw-s6OPvGrHyI1HZlkWV6j9UbNDyo9_OLg$> >> >> < >> https://urldefense.com/v3/__http://segment.ms__;!!Ayb5sqE7!t2RHh2_lmpuV6wxO0CCQLMMuOcTLHitt0IY8HqA28tFdgk8EUF9qkqvS2l-vEXgJv_x1x3jBLey8-wPVjk2MJw$ >> > >> >> > >> >> >> >> > diskless.segment.ms >> <https://urldefense.com/v3/__http://diskless.segment.ms__;!!Ayb5sqE7!qD6UWpGNFDAUbr00WyBVsibHKHuiQKFjLSaOflC2lBt2rFw-s6OPvGrHyI1HZlkWV6j9UbNDluPtSxE$> >> >> < >> https://urldefense.com/v3/__http://diskless.segment.ms__;!!Ayb5sqE7!t2RHh2_lmpuV6wxO0CCQLMMuOcTLHitt0IY8HqA28tFdgk8EUF9qkqvS2l-vEXgJv_x1x3jBLey8-wOdb3oIbw$ >> >, >> >> increasing overhead in the RLMM. 
We can perform >> >> >> >> > merging/optimization of Tiered Storage segments to achieve the >> >> >> per-topic >> >> >> >> > segment.ms >> <https://urldefense.com/v3/__http://segment.ms__;!!Ayb5sqE7!qD6UWpGNFDAUbr00WyBVsibHKHuiQKFjLSaOflC2lBt2rFw-s6OPvGrHyI1HZlkWV6j9UbNDyo9_OLg$> >> >> < >> https://urldefense.com/v3/__http://segment.ms__;!!Ayb5sqE7!t2RHh2_lmpuV6wxO0CCQLMMuOcTLHitt0IY8HqA28tFdgk8EUF9qkqvS2l-vEXgJv_x1x3jBLey8-wPVjk2MJw$ >> > >> >> . >> >> >> >> > There were some reasons why we retracted the prior file-merging >> >> >> >> approach, >> >> >> >> > and why merging in tiered storage appears better: >> >> >> >> > * Rewriting files requires mutability for existing data, which >> >> adds >> >> >> >> > complexity. Diskless batches or Remote Log Segments would need >> to >> >> be >> >> >> >> made >> >> >> >> > mutable, and the remote log will be made mutable in KIP-1272 >> [1] >> >> >> >> > * Because a WAL Segment can contain batches from multiple >> Diskless >> >> >> >> > Coordinators, multiple coordinators must also be involved in >> the >> >> >> merging >> >> >> >> > step. The Tiered Storage design has exclusive ownership for >> remote >> >> >> log >> >> >> >> > segments within the RLMM. >> >> >> >> > * Diskless file merging competes for resources with >> >> latency-sensitive >> >> >> >> > producers and hot consumers. Tiered storage file merging >> competes >> >> for >> >> >> >> > resources with lagging consumers, which are typically less >> latency >> >> >> >> > sensitive. >> >> >> >> > * Implementing merging in Tiered Storage allows this >> optimization >> >> to >> >> >> >> > benefit both classic topics and diskless topics, covering both >> >> high >> >> >> and >> >> >> >> low >> >> >> >> > throughput partitions. 
> > * Remote log segments may be optimized over much longer time windows, rather than performing optimization once in the first few hours of the life of a WAL segment and then freezing the arrangement of the data until it is deleted.
> > * File merging will need to rely on heuristics, which should be configurable by the user. Multi-partition heuristics are more complicated to describe and reason about than single-partition heuristics.
> > What do you think of this alternative?
> >
> > JR2:
> > Yes, the current default partition assignment strategy will need some improvement. This problem with Diskless WAL segments is analogous to the Classic topics' dense inter-broker connection graph.
> > The natural solution to this seems to be some sort of cellular design, where the replica placements tend to locate partitions in similar groups. Partitions in the same cell can generally share the same WAL Segments and the same Diskless Coordinator requests. This would also benefit Classic topics, which would need fewer connections and fetch requests.
> > Such a feature is out of scope for this KIP; either we will publish a follow-up KIP, or let operators and community tooling address this.
> >
> > JR3:
> > Yes, we will replace the ISR/ELR election logic for diskless topics, as they no longer rely on replicas for data integrity. We will fully model the state/lifecycle of the diskless replicas in KRaft, and choose how we display this to clients.
> > For backwards compatibility, clients using older metadata requests should see diskless topics, but interpret them as classic topics. We could tell older clients that the leader is in the ISR, even if it just started building its cache.
> > For clients using the latest metadata, they should see the true state of the diskless partition: which nodes can accept produce/fetch/sharefetch requests, which ranges of offsets are cached on-broker, etc. This could also be used to break apart the "leader" field into more granular fields, now that leadership has changed meaning.
> >
> > JR4:
> > Yes, we can replace the empty fetch requests to the leader nodes with cache hint fields in the requests to the Diskless Coordinator, and rely on the coordinator to distribute cache hints to all replicas. This should be low-overhead, and eliminate the inter-broker communication for brokers which only host Diskless topics.
> >
> > JR5.1:
> > You are correct and this text was ambiguous, only specifying that the controller waits for the sync to be complete. This section is now updated to explicitly say that local segments are built from object storage.
> >
> > JR5.2:
> > Extending the JR2 discussion, reassignment of diskless topics would generally happen within a cell, where the marginal cost of reading an additional partition is very low.
> > When cells are re-balanced and a partition is migrated between cells, there is a brief time (until the next Tiered Storage segment roll) when the marginal cost is doubled. This should be infrequent and well-amortized by other topics which aren't being re-balanced between cells.
> >
> > JR6.1:
> > We plan to move data from Diskless to Tiered Storage. Once the data is in Tiered Storage, it can be compacted using the functionality described in KIP-1272 [1]
> >
> > JR6.2:
> > We will add details for this soon.
> >
> > JR7:
> > We specify the requirement of eventual consistency to allow Diskless Topics to be used with other object storage implementations which aren't the three major public clouds, such as self-managed software or weaker consistency caches.
> >
> > Thanks,
> > Greg
> >
> > [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-1272%3A+Support+compacted+topic+in+tiered+storage
> >
> > On Fri, Mar 6, 2026 at 4:14 PM Jun Rao via dev <[email protected]> wrote:
> >
> > > Hi, Ivan,
> > >
> > > Thanks for the KIP. A few comments below.
> > >
> > > JR1. I am concerned about the usage of the current tiered storage to control the number of small WAL files. Current tiered storage only tiers the data when a segment rolls, which can take hours. This causes three problems. (1) Much more metadata needs to be stored and maintained, which increases the cost. Suppose that each segment rolls every 5 hours, each partition generates 2 WAL files per second, and each WAL file's metadata takes 100 bytes. Each partition will generate 5 * 3.6K * 2 * 100 = 3.6MB of metadata. In a cluster with 100K partitions, this translates to 360GB of metadata stored on the diskless coordinators.
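JR1's per-partition arithmetic checks out; as a sketch (Python, figures taken from the estimate above):

```python
# Sanity check of JR1's metadata estimate.
roll_interval_s = 5 * 3600       # segment rolls every 5 hours
wal_files_per_s = 2              # WAL files generated per second per partition
bytes_per_file = 100             # metadata bytes per WAL file

per_partition_bytes = roll_interval_s * wal_files_per_s * bytes_per_file
print(per_partition_bytes / 1e6, "MB per partition")       # 3.6 MB

partitions = 100_000
print(per_partition_bytes * partitions / 1e9, "GB total")  # 360.0 GB
```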
> > > (2) A catching-up consumer's performance degrades since it's forced to read data from many small WAL files. (3) The data in WAL files could be retained much longer than the retention time. Since the small WAL files aren't completely deleted until all partitions' data in them is obsolete, the deletion of the WAL files could be delayed by hours or more. If a WAL file includes a partition with a low retention time, the retention contract could be violated significantly. The earlier design of the KIP included a separate object merging process that combines small WAL files much more aggressively than tiered storage, which seems to be a much better choice.
> > >
> > > JR2. I don't think the current default partition assignment strategy for classic topics works for diskless topics. The current strategy tries to spread the replicas to as many brokers as possible. For example, if a broker has 100 partitions, their replicas could be spread over 100 brokers. If the broker generates a WAL file with 100 partitions, this WAL file will be read 100 times, once by each broker. S3 read cost is 1/12 of the cost of an S3 put. This assignment strategy will increase the S3 cost by about 8X, which is prohibitive. We need to design a cost-effective assignment strategy for diskless topics.
> > >
> > > JR3. We need to think through the leader election logic with diskless topics.
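The ~8X figure in JR2 above follows directly from the read/put price ratio (a hedged sketch; the 1/12 ratio is the one quoted in the thread, and actual S3 request pricing varies by region and storage class):

```python
# Rough check of JR2's ~8X S3 cost claim.
put_cost = 1.0                 # cost of one S3 PUT (normalized)
get_cost = put_cost / 12       # S3 GET is ~1/12 the cost of a PUT (per thread)
readers = 100                  # one WAL file fetched by 100 brokers

extra_read_cost = readers * get_cost   # ~8.3 PUT-equivalents
print(f"reads add ~{extra_read_cost:.1f}x the PUT cost")  # ~8.3x ("about 8X")
```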
> > > The KIP tries to reuse the ISR logic for classic topics, but it doesn't seem very natural.
> > > JR3.1 In classic topics, the leader is always in the ISR. In diskless topics, the KIP says that a leader could be out of sync.
> > > JR3.2 The existing leader election logic based on ISR/ELR mainly tries to preserve previously acknowledged data. With diskless topics, since the object store provides durability, this logic seems no longer needed. The existing min.isr and unclean leader election logic also don't apply.
> > >
> > > JR4. "Despite that there is no inter-broker replication, replicas will still issue FetchRequest to leaders. Leaders will respond with empty (no records) FetchResponse."
> > > This seems unnatural. Could we avoid issuing inter-broker fetch requests for diskless topics?
> > >
> > > JR5. "The replica reassignment will follow the same flow as in classic topic:"
> > > JR5.1 Is this true? Since the inter-broker fetch response is always empty, it doesn't seem the current reassignment flow works for diskless topics. Also, since the source of the data is the object store, it seems more natural for a replica to back-fill the data from the object store, instead of other replicas. This will also incur lower costs.
> > > JR5.2 How do we prevent reassignment on diskless topics from causing the same cost issue described in JR2?
> > >
> > > JR6. "
> > > In other functional aspects, diskless topics are indistinguishable from classic topics. This includes durability guarantees, ordering guarantees, transactional and non-transactional producer API, consumer API, consumer groups, share groups, data retention (deletion & compact),"
> > > JR6.1 Could you describe how compacted diskless topics are supported?
> > > JR6.2 Neither this KIP nor KIP-1164 describes the transactional support in detail.
> > >
> > > JR7. "Object Storage: A shared, durable, concurrent, and eventually consistent storage supporting arbitrary sized byte values and a minimal set of atomic operations: put, delete, list, and ranged get."
> > > It seems that the object storage in all three major public clouds is strongly consistent.
> > >
> > > Jun
> > >
> > > On Mon, Mar 2, 2026 at 5:43 AM Ivan Yurchenko <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > The parent KIP-1150 was voted for and accepted. Let's now focus on the technical details presented in this KIP-1163 and also in KIP-1164: Diskless Coordinator [1].
> > > >
> > > > Best,
> > > > Ivan
> > > >
> > > > [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-1164%3A+Diskless+Coordinator
> > > >
> > > > On Wed, Apr 23, 2025, at 11:41, Ivan Yurchenko wrote:
> > > > > Hi all!
> > > > >
> > > > > We want to start the discussion thread for KIP-1163: Diskless Core [1], which is a sub-KIP for KIP-1150 [2].
> > > > >
> > > > > Let's use the main KIP-1150 discuss thread [3] for high-level questions, motivation, and general direction of the feature, and this thread for particular details of implementation.
> > > > >
> > > > > Best,
> > > > > Ivan
> > > > >
> > > > > [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-1163%3A+Diskless+Core
> > > > > [2] https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics
> > > > > [3] https://lists.apache.org/thread/ljxc495nf39myp28pmf77sm2xydwjm6d
