Hi, Viktor,

Thanks for the reply.

JR1. (A) and (B) Yes, your summary matches my thinking.

(C) "Generally I think that (i) (ii) (iii) and (iv) may be addressed with an aggressive tiered storage consolidation (the first approach)". Hmm, I am confused by the above statement. By "the first approach", do you mean aggressive tiering with faster segment rolling through the existing RLMM? I don't think the existing RLMM is designed to solve these issues, due to inefficiencies in cost, metadata propagation and metadata storage, as we previously discussed.

JR11. I was thinking we leave the existing RLMM as is and continue to use it for classic topics. We design a new, more efficient metadata management component, independent of RLMM. This new component will be the only metadata component that diskless topics depend on.

Jun

On Tue, May 12, 2026 at 8:43 AM Viktor Somogyi-Vass <[email protected]> wrote:

> Hi Jun,
>
> JR1
> (1)-(2)-(3) I'd address these together, and let me explain our current idea to solve the tiny object problem, because I'm not sure if we're 100% talking about the same thing. I have two approaches in mind for TS consolidation ((A) and (B)) and I'm not sure if we're both assuming the same idea, so let's clarify this.
>
> (A)
> This is our current assumption. It uses local disks (creating classic local logs with UnifiedLog) to consolidate logs into the classic log format, and uses RSM and RLMM to store them in tiered storage. This way we're not limited by the need to have short rollovers. Local logs become a form of staging environment to serve reads and accumulate records for tiered storage. This means that:
> (a) Once a message is consolidated into the classic log format, we can use it for serving lagging consumers. Diskless reads should really be used for the head of the log, and after a few seconds logs should be consolidated.
> (b) The real cost is much closer to that 87.5% (and in fact the google sheet I shared also assumes this model) because we have more freedom in choosing the retention parameters of the classic log.
> (c) Metadata is smaller, as we only need to keep diskless segments until the tiered offset surpasses the individual batches' offsets.
> (d) RLMM metadata is also somewhat manageable due to the larger segment sizes, but it's still possible to run into the metadata explosion problem.
> (e) It needs to rebuild this local log on reassignment to serve lagging consumers effectively, so reassignment is a bit messier.
> (f) It's not optimal when partitions have a single replica: on failure we can only fall back to diskless mode until the partition is reassigned to a functioning broker.
>
> (B)
> Compared to the above, there is an alternative approach, which is to consolidate when diskless segments expire (after 15 minutes, for instance). In that case your points seem to fit better, as:
> (a) we can only use the classic, consolidated logs to serve lagging consumers after they have been tiered
> (b) to be more efficient with lagging consumers we have to stick to a short rollover
> (c) it's more costly due to the short rollovers
> (d) the RLMM bottleneck still exists due to the short rollovers
> (e) it's not a given that we use local disks for transforming logs, as we can do it in memory too (which can be ineffective and more expensive), but perhaps the "chunked transfer encoding" that S3 supports, or something similar with other providers, is a cost-effective way. If we know the final size in advance, we can upload data in chunks and still get billed for 1 put.
> (f) reassignment or failover is cleaner and faster, as there isn't a need to rebuild local caches.
>
> (C)
> Apart from the first 2 approaches there is a 3rd, which is WAL merging. To understand your points, let me summarize what I could gather so far as reasons for WAL merging (and please correct me if I missed something):
> (i) protecting consumer lag: small WAL files create inefficient objects for lagging consumers, so larger objects should be more efficient
> (ii) avoiding the RLMM replay bottleneck: managing small segments with RLMM is very inefficient (100s of GB of metadata)
> (iii) reducing batch metadata overhead: merging WAL files may reduce the metadata we need to store, but it depends on the merge algorithm and how we can compact batch data
> (iv) cost effectiveness: retrieving merged WAL files reduces the number of get requests to object storage
> (v) architectural redundancy with RLMM: ideally we wouldn't need 2 solutions to 2 somewhat similar problems (tiered storage and diskless)
>
> Generally I think that (i), (ii), (iii) and (iv) may be addressed with an aggressive tiered storage consolidation (the first approach), so the only remaining gap would be (v). I also agree that having 2 different solutions for metadata handling isn't ideal and perhaps there is a possibility of improvement here. It should be possible to redesign RLMM to be more similar to the diskless coordinator, or to design a common solution.
>
> JR11
> "If we support merging in the diskless coordinator, I wonder how useful RLMM is. It seems simpler to manage all metadata from the object store in a single place."
>
> Could you please clarify this a little bit? Do you think that we should replace the RLMM with a solution that is more similar to the diskless coordinator, or deprecate tiered storage altogether in favor of diskless? I'm not sure which option you're referring to:
> (1) Unify tiered storage and diskless under a single storage layer (and possibly deprecate tiered storage in favor of diskless with merged WAL segments).
> (2) Create a smart coordinator instead of RLMM and possibly unify metadata coordination with diskless.
> (3) Keep tiered storage and diskless separate with their own solutions for metadata (probably not optimal).
>
> Thanks,
> Viktor
>
> On Fri, May 1, 2026 at 9:08 PM Jun Rao via dev <[email protected]> wrote:
>
>> Hi, Viktor and Greg,
>>
>> Thanks for the reply.
>>
>> JR1.
>> 1) Thanks for verifying the cost estimation. I noticed a bug in my earlier calculation: I estimated the per-broker network transfer rate at 2MB/sec. It should be 4MB/sec. If I correct it, the estimated savings are similar to yours.
>> The cost for transferring 4MB through the network is 4 * 2 * 10^-5 = $8 * 10^-5.
>> If it's replaced with 2 S3 puts, the cost is $1 * 10^-5. The savings are about 87.5%.
>> If it's replaced with 6 S3 puts, the cost is $3 * 10^-5. The savings are 62.5%.
>> Savings are still significantly lower when using RLMM.
>>
>> "To me it seems that Greg's previous suggestion for a 15 min rollover may be a bit too much. With 1 hour we can achieve better cost saving and less coordinator metadata being stored."
>> This solves the cost issue, but it has other implications (see point 2) below).
>>
>> 2) "Yes, I think this is to be expected and a lot depends on the implementation. Ideally segments or chunks should be cached to minimize the number of times segments are pulled from remote storage."
>> In a classic topic, when a consumer lags, its requests are served either from the local cache or from large objects in the object store. With the current design in a diskless topic, lagging consumer requests might be served from tiny 500-byte objects. This will significantly slow down the consumer's catch-up, which is not expected user behavior. Ideally, we don't want those tiny objects to last more than a few minutes, let alone an hour.
>>
>> 3) "I think if my calculations are correct (and we use a 60 minute window), then metadata generation should be slower, please see the google sheet I linked above. I think given that traffic, the current topic based RLMM should be able to handle it."
>> Why is a 60 minute window used? RLMM metadata needs to be retained for the longest retention time among all topics. This means that the retention window can be weeks instead of 1 hour, and that RLMM might need to replay over 100GB of data during reassignment, which is not what it is designed for.
>>
>> JR10. "Your example of 100,000 1kb/s partitions is a borderline case, where there are some configurations which are not viable due to scale or cost, and some that are. It would be up to the operator to tune their cluster, by changing diskless.segment.ms, dividing up the cluster, or switching to a more scalable RLMM implementation."
>> A broker with 4MB/sec produce throughput can probably be considered high throughput. Even with 4K partitions per broker, we could still achieve an 87.5% cost saving as listed above, with the right implementation. So, ideally, it would be useful to support that as well.
>>
>> JR11. "We had a short conversation with Greg and we came to the conclusion that because of the explosiveness of diskless metadata, it may be worth revisiting the merging case as it can indeed buy us some more cost saving for the added complexity."
>> If we support merging in the diskless coordinator, I wonder how useful RLMM is. It seems simpler to manage all metadata from the object store in a single place.
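The savings figures discussed in this thread can be sanity-checked with a short script. The prices are the ones assumed in the discussion ($0.02/GB network transfer, $0.005 per 1000 S3 PUTs), not authoritative AWS list prices, and the helper name is mine:

```python
# Sanity check of the replication-vs-S3-PUT savings figures in this thread.
# Prices are the ones assumed in the discussion, not current AWS list prices.
NET_COST_PER_MB = 0.02 / 1000      # $0.02/GB -> $2 * 10^-5 per MB
PUT_COST = 0.005 / 1000            # $0.005 per 1000 requests -> $0.5 * 10^-5 each

def put_savings(mb_transferred: float, num_puts: int) -> float:
    """Fraction of the network replication cost saved when the transfer
    is replaced by `num_puts` S3 PUTs."""
    net_cost = mb_transferred * NET_COST_PER_MB
    return (net_cost - num_puts * PUT_COST) / net_cost

print(put_savings(4, 2))   # ~0.875 -> the 87.5% case (4MB, 2 PUTs)
print(put_savings(4, 6))   # ~0.625 -> the 62.5% case (4MB, 6 PUTs)
print(put_savings(2, 2))   # ~0.75  -> the earlier 2MB, 2-PUT case
print(put_savings(2, 6))   # ~0.25  -> the earlier 2MB, 6-PUT case
```

This reproduces both the 2MB figures from the earlier message and the corrected 4MB figures above.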
>>
>> Jun
>>
>> On Mon, Apr 27, 2026 at 4:17 PM Greg Harris <[email protected]> wrote:
>>
>> > Hi Jun,
>> >
>> > Thank you for scrutinizing the scalability of the current direct-to-tiered-storage strategy, and its metadata scalability.
>> >
>> > One of our implicit assumptions with this design was that users are able to choose between the Diskless and Classic mechanisms, and that in any situation where the Diskless design was deficient, Classic topics could continue to be used.
>> > This was originally applied to low-latency use-cases, but now applies to low-throughput use-cases too. When the throughput on a topic is low, the benefit of using Diskless is also low, because it is proportional to the amount of data transferred, and it is more likely that the batch overhead of the topics is significant.
>> > In other words, we've been treating cost-effective support for arbitrarily low throughput topics as a non-goal.
>> >
>> > Your example of 100,000 1kb/s partitions is a borderline case, where there are some configurations which are not viable due to scale or cost, and some that are. It would be up to the operator to tune their cluster, by changing diskless.segment.ms, dividing up the cluster, or switching to a more scalable RLMM implementation.
>> >
>> > Do you think we should have cost-effective support for arbitrarily low-throughput partitions in Diskless? How much total demand is there for partitions where batches are >1kb but the partition throughput is <1kb/s?
>> >
>> > Thanks,
>> > Greg
>> >
>> > On Fri, Apr 24, 2026 at 10:23 AM Viktor Somogyi-Vass <[email protected]> wrote:
>> >
>> >> Hi Jun,
>> >>
>> >> Regarding JR1.
>> >> We had a short conversation with Greg and we came to the conclusion that because of the explosiveness of diskless metadata, it may be worth revisiting the merging case, as it can indeed buy us some more cost saving for the added complexity. Also, it would support smaller topics and we could somewhat manage the tiered storage consolidation costs. I think that we would still need to consolidate WAL segments into tiered storage. The reasons are: to limit WAL metadata, to be able to dynamically enable/disable diskless, and to be compatible with existing and future TS improvements.
>> >> I'll try to refresh KIP-1165 and build it into the calculator above (if it's possible at all :) ) and come back to you.
>> >> Regardless, I just wanted to give a short update in the meantime; looking forward to your answer.
>> >>
>> >> Best,
>> >> Viktor
>> >>
>> >> On Fri, Apr 24, 2026 at 3:46 PM Viktor Somogyi-Vass <[email protected]> wrote:
>> >>
>> >> > Hi Jun,
>> >> >
>> >> > Thanks for the quick reply.
>> >> >
>> >> > JR1.
>> >> > 1) Thanks for putting the numbers together. While your calculation seems to be correct in the sense that 6 PUTs would worsen the cost saving benefits, I think that in a byte-for-byte comparison there is a bigger difference. The reason is that the 4 tiered storage puts transfer much more data compared to the small WAL segments, so in practice there should be fewer TS puts.
>> >> > I made a google sheet calculator for this which I'd like to share with you:
>> >> > https://docs.google.com/spreadsheets/d/127GOTWfFSN27B5ezif14GPj8KtrghjBqsXG9GG6NxhI/edit?gid=749470906#gid=749470906
>> >> > Please copy the sheet to modify the values.
>> >> > About my findings: I was trying to create a cluster model similar to the one discussed here previously, to see how cost varies over different segment rollovers. To me it seems that Greg's previous suggestion for a 15 min rollover may be a bit too much. With 1 hour we can achieve better cost saving and less coordinator metadata being stored. I have also tried to account for the producer batch metadata generated by diskless partitions, but to me it seems like a lower number than Greg's original numbers.
>> >> >
>> >> > 2) "Note that local storage could be lost on reassigned partitions. In that case, lagging reads can only be served from the object store."
>> >> > Yes, I think this is to be expected and a lot depends on the implementation. Ideally segments or chunks should be cached to minimize the number of times segments are pulled from remote storage.
>> >> >
>> >> > "The 2MB/sec I quoted is for a specific broker. Depending on the broker instance type, a broker may only be able to handle low 10s of MB/sec of data. So, 2MB/sec overhead is significant."
>> >> > Yes, I had indeed misunderstood; however, I have updated my calculator sheet with the metadata calculation. Overall, the number of tiered storage segments created seems to be much lower than in your calculations, given the parameters of the cluster you specified earlier. Please take a look; I'd really like to understand the thinking here because this is a crucial point.
>> >> >
>> >> > 3) I think if my calculations are correct (and we use a 60 minute window), then metadata generation should be slower; please see the google sheet I linked above. I think given that traffic, the current topic based RLMM should be able to handle it.
>> >> > In the case where we would need to make the RLMM capable of handling traffic similar to the diskless coordinator, then you're right, we probably should consider how we can improve it. I think there are multiple possibilities as you mentioned, but ideally there should be a common implementation for metadata coordination that could handle these cases.
>> >> >
>> >> > JR7.
>> >> > Yes, your expectation is totally reasonable; we should expect the get and put operations to be strongly consistent for the read-after-write scenarios. And I think that since major cloud providers offer strongly consistent object storage, it should be sufficient for a wide user group. So we could shrink the scope of the KIP a bit this way and avoid adding complexity that is needed mostly on the margin.
>> >> > I expect though that "list" can stay eventually consistent, as the KIP relies on it only for garbage collection, where it is fine if a few segments are collected only in the next iteration.
>> >> >
>> >> > JR3.
>> >> > Since Greg hasn't replied yet, I'll try to catch up with him and formulate an answer next week.
>> >> >
>> >> > Best,
>> >> > Viktor
>> >> >
>> >> > On Tue, Apr 21, 2026 at 8:16 PM Jun Rao via dev <[email protected]> wrote:
>> >> >
>> >> >> Hi, Viktor,
>> >> >>
>> >> >> Thanks for the reply.
>> >> >>
>> >> >> JR1.
>> >> >> 1) "So while it seems to be significant that we tripled the number of PUTs, cost-wise it doesn't seem to be significant."
>> >> >> Let's compare the savings achieved by replacing network replication transfer with S3 puts in AWS.
>> >> >> network transfer cost: $0.02/GB = $2 * 10^-5/MB
>> >> >> S3 put cost: $0.005 per 1000 requests = $0.5 * 10^-5/request
>> >> >>
>> >> >> The KIP batches data up to 4MB. So, let's assume that we write 2MB S3 objects on average.
>> >> >>
>> >> >> The cost for transferring 2MB through the network is 2 * 2 * 10^-5 = $4 * 10^-5.
>> >> >> If it's replaced with 2 S3 puts, the cost is $1 * 10^-5. The savings are about 75%.
>> >> >> If it's replaced with 6 S3 puts, the cost is $3 * 10^-5. The savings are 25%. As you can see, the savings are significantly lower.
>> >> >>
>> >> >> 2) "Therefore we could expect classic local segments to be present which could be used for catching up consumers."
>> >> >> Note that local storage could be lost on reassigned partitions. In that case, lagging reads can only be served from the object store.
>> >> >>
>> >> >> "Regarding the amount of metadata: 2MB/sec is well below the 2GB/s throughput that Greg calculated previously, so I think it should be manageable for a cluster with that amount of throughput,"
>> >> >> It seems that you didn't make the correct comparison. The 2GB/s that Greg mentioned is the throughput for the whole cluster. The 2MB/sec I quoted is for a specific broker. Depending on the broker instance type, a broker may only be able to handle low 10s of MB/sec of data. So, 2MB/sec overhead is significant.
>> >> >>
>> >> >> 3) "I'd separate it from the discussion of diskless core and perhaps we could address it in a separate KIP as it is mostly a redesign of the RLMM."
>> >> >> Those problems don't exist in the existing usage of RLMM. They manifest because diskless tries to use RLMM in a way it wasn't designed for (there is at least a 20X increase in metadata). It would be useful to consider whether fixing those problems in RLMM or using a new approach is better. For example, KIP-1164 already introduces a snapshotting mechanism. Adding another snapshotting mechanism to RLMM seems redundant.
>> >> >>
>> >> >> JR7. A typical object store supports 3 operations: puts, gets and lists. Which operations used by diskless can be eventually consistent? I'd expect that a get should always see the result of the latest put.
>> >> >>
>> >> >> Jun
>> >> >>
>> >> >> On Mon, Apr 20, 2026 at 8:14 AM Viktor Somogyi-Vass <[email protected]> wrote:
>> >> >>
>> >> >> > Hi Jun,
>> >> >> >
>> >> >> > I'd like to add my thoughts too until Greg has time to respond.
>> >> >> >
>> >> >> > JR1. I also think there are shortcomings in the current tiered storage design, around the RLMM.
>> >> >> > 1) I think this is a correct observation; however, if my calculations are correct, it actually comes down to a negligible amount of cost. Taking the AWS pricing sheet at
>> >> >> > https://aws.amazon.com/s3/pricing/?nc2=h_pr_s3&trk=aebc39a1-139c-43bb-8354-211ac811b83a&sc_channel=ps
>> >> >> > it seems like the difference between 6 and 2 PUTs per second is ~$52 for a month. The calculation follows as: 6*60*60*24*30*0.005/1000 - 2*60*60*24*30*0.005/1000 = $51.84. So while it seems to be significant that we tripled the number of PUTs, cost-wise it doesn't seem to be significant.
>> >> >> > 2) Reflecting on your original problem: the tiered storage consolidation process should be continuously running and transforming WAL segments into classic logs. Therefore we could expect classic local segments to be present which could be used for catching up consumers. So they would only switch to WAL reading when they're close to the end of the log. Since this offset space should be cached, the reads from there should be fast.
>> >> >> > Regarding the amount of metadata: 2MB/sec is well below the 2GB/s throughput that Greg calculated previously, so I think it should be manageable for a cluster with that amount of throughput, although I agree with your comment that the current topic based tiered metadata manager isn't optimal and we could develop a better solution.
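The ~$52/month figure above follows directly from the per-request PUT price quoted in this thread; a quick sketch (the helper name is mine):

```python
# Check of the ~$52/month figure: the monthly cost difference between a
# steady 6 PUTs/sec and 2 PUTs/sec, at the $0.005-per-1000-requests price
# assumed in this thread.
PUT_COST = 0.005 / 1000            # dollars per PUT request

def monthly_put_cost(puts_per_sec: float, days: int = 30) -> float:
    """Dollar cost of S3 PUTs over a month at a constant request rate."""
    return puts_per_sec * 60 * 60 * 24 * days * PUT_COST

delta = monthly_put_cost(6) - monthly_put_cost(2)
print(round(delta, 2))             # ~51.84, matching the calculation above
```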
>> >> >> > 3) Tied to the previous point, I agree that your comments are absolutely valid; however, similarly, I'd separate it from the discussion of the diskless core and perhaps we could address it in a separate KIP, as it is mostly a redesign of the RLMM.
>> >> >> >
>> >> >> > JR2. Ack. We will raise a KIP in the near future.
>> >> >> >
>> >> >> > JR3. I'd leave answering this to Greg, as I don't have too much context on this one.
>> >> >> >
>> >> >> > JR7. I think this could be similar to the tiered storage design, so any coordinator operation should be strongly consistent (since we're using classic topics there). Therefore the WAL segment storage layer could be eventually consistent, as we store its metadata in a strongly consistent manner. I'm not sure though if this was the answer you were looking for?
>> >> >> >
>> >> >> > Best,
>> >> >> > Viktor
>> >> >> >
>> >> >> > On Thu, Mar 26, 2026 at 11:43 PM Jun Rao via dev <[email protected]> wrote:
>> >> >> >
>> >> >> >> Hi, Greg,
>> >> >> >>
>> >> >> >> Thanks for the reply.
>> >> >> >>
>> >> >> >> JR1. Rolling log segments every 15 minutes addresses the 3 concerns I listed, but it introduces some new issues because it doesn't quite fit the design of the current tiered storage.
>> >> >> >> (a) The current tiered storage design stores a single partition per object. If we roll a log segment every 15 minutes, with 4K partitions per broker, this means an additional 4 S3 puts per second. The diskless design aims for 2 S3 puts per second. So, this triples the S3 put cost and reduces the savings benefits.
>> >> >> >> (b) With tiered storage, each broker essentially needs to read the tier metadata from all tier metadata partitions if the number of user partitions exceeds 50. Assume that we generate 100 bytes of tier metadata per partition every 15 minutes, that each broker has 4K partitions, and a cluster of 500 brokers. Each broker needs to receive tier metadata at a rate of 100 * 4K * 500 / (15 * 60) = 200KB/sec. A broker hosting one of the 50 tier metadata topic partitions needs to send out metadata at 100 * 4K * 500 / 50 * 500 / (15 * 60) = 2MB/sec. This increases unnecessary network and CPU overhead.
>> >> >> >> (c) Tiered storage doesn't support snapshots. A restarted broker needs to replay the tier metadata log from the beginning to build the tier metadata state. Suppose that the tier metadata log is kept for 7 days. The total amount of tier metadata that needs to be replayed is 200KB * 7 * 24 * 3600 = 120GB.
>> >> >> >> Does the merging optimization you mentioned address those new concerns? If so, could you describe how it works?
>> >> >> >>
>> >> >> >> JR2. It's fine to cover the default partition assignment strategy for diskless topics in a separate KIP. However, since this is essential for achieving the cost saving goal, we need a solution before releasing the diskless KIP.
>> >> >> >>
>> >> >> >> JR3. Sounds good. Could you document how this works?
>> >> >> >>
>> >> >> >> JR7. Could you describe which parts of the operation can be eventually consistent?
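The tier metadata traffic in JR1 (b) and (c) above can be reproduced with a back-of-envelope script, using the thread's inputs (100 bytes of metadata per partition per 15-minute roll, 4K partitions per broker, 500 brokers, 50 tier metadata partitions); the exact results are slightly above the rounded figures quoted in the thread:

```python
# Back-of-envelope for the tier metadata traffic in JR1 (b) and (c) above.
# All inputs come from the thread; only the variable names are mine.
METADATA_BYTES = 100          # tier metadata per partition per roll
PARTITIONS_PER_BROKER = 4_000
BROKERS = 500
METADATA_PARTITIONS = 50      # tier metadata topic partitions
ROLL_SEC = 15 * 60            # 15-minute segment roll

cluster_bytes_per_roll = METADATA_BYTES * PARTITIONS_PER_BROKER * BROKERS

# (b) Every broker must consume the full cluster-wide tier metadata stream:
recv_rate = cluster_bytes_per_roll / ROLL_SEC
print(round(recv_rate / 1e3))      # ~222 KB/s (rounded to 200KB/sec above)

# (b) A broker hosting one of the 50 metadata partitions fans its share
# out to all 500 brokers:
send_rate = cluster_bytes_per_roll / METADATA_PARTITIONS * BROKERS / ROLL_SEC
print(round(send_rate / 1e6, 1))   # ~2.2 MB/s ("2MB/sec" above)

# (c) Replaying 7 days of tier metadata on broker restart:
replay_bytes = recv_rate * 7 * 24 * 3600
print(round(replay_bytes / 1e9))   # ~134 GB (the rounded 200KB/s gives ~120GB)
```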
>> >> >> >> >> >> >> >> Jun >> >> >> >> >> >> >> >> On Thu, Mar 19, 2026 at 1:35 PM Greg Harris < >> [email protected]> >> >> >> wrote: >> >> >> >> >> >> >> >> > Hi Jun, >> >> >> >> > >> >> >> >> > Thanks for your comments! >> >> >> >> > >> >> >> >> > JR1: >> >> >> >> > You are correct that the segment rolling configurations are >> >> currently >> >> >> >> > critical to balance the scalability of Diskless and Tiered >> >> Storage, >> >> >> as >> >> >> >> > larger roll configurations benefit tiered storage, and smaller >> >> roll >> >> >> >> > configurations benefit Diskless. >> >> >> >> > >> >> >> >> > To address your points specifically: >> >> >> >> > (1) A Diskless topic which is cost-competitive with an >> equivalent >> >> >> >> Classic >> >> >> >> > topic will have a metadata size <1% of the data size. A cluster >> >> >> storing >> >> >> >> > 360GB of metadata will have >36TB of data under management and >> a >> >> >> >> retention >> >> >> >> > of 5hr implies a throughput of >2GB/s. This will require >> multiple >> >> >> >> Diskless >> >> >> >> > coordinators, which can share the load of storing the Diskless >> >> >> metadata, >> >> >> >> > and serving Diskless requests. >> >> >> >> > (2) Catching up consumers are intended to be served from tiered >> >> >> storage >> >> >> >> > and local segment caches. Brokers which are building their >> local >> >> >> segment >> >> >> >> > caches will have to read many files, but will amortize those >> >> reads by >> >> >> >> > receiving data for multiple partitions in a single read. >> >> >> >> > (3) This is a fundamental downside of storing data from >> multiple >> >> >> topics >> >> >> >> in >> >> >> >> > a single object, similar to classic segments. We can implement >> a >> >> >> >> > configurable cluster-wide maximum roll time, which would set >> the >> >> >> slowest >> >> >> >> > cadence at which Tiered Storage segments are rolled from >> Diskless >> >> >> >> segments. 
>> >> >> >> > If an individual partition has more aggressive roll settings, >> it >> >> may >> >> >> be >> >> >> >> > rolled earlier. >> >> >> >> > This configuration would permit the cluster operator to >> >> approximately >> >> >> >> > bound the number of diskless WAL segments, which bounds the >> total >> >> >> size >> >> >> >> of >> >> >> >> > the WAL segments, disk cache, diskless coordinator state, and >> >> >> excessive >> >> >> >> > retention window. For example, a diskless.segment.ms >> <https://urldefense.com/v3/__http://diskless.segment.ms__;!!Ayb5sqE7!qD6UWpGNFDAUbr00WyBVsibHKHuiQKFjLSaOflC2lBt2rFw-s6OPvGrHyI1HZlkWV6j9UbNDluPtSxE$> >> >> < >> https://urldefense.com/v3/__http://diskless.segment.ms__;!!Ayb5sqE7!t2RHh2_lmpuV6wxO0CCQLMMuOcTLHitt0IY8HqA28tFdgk8EUF9qkqvS2l-vEXgJv_x1x3jBLey8-wOdb3oIbw$ >> > >> >> of 15 minutes >> >> >> >> would >> >> >> >> > reduce the metadata storage to 18GB, WAL segments to 1.8TB, and >> >> >> permit >> >> >> >> > short-retention data to be physically deleted as soon as ~15 >> >> minutes >> >> >> >> after >> >> >> >> > being produced. >> >> >> >> > Of course, this will reduce the size of the tiered storage >> >> segments >> >> >> for >> >> >> >> > topics that have low throughput, and where segment.ms >> <https://urldefense.com/v3/__http://segment.ms__;!!Ayb5sqE7!qD6UWpGNFDAUbr00WyBVsibHKHuiQKFjLSaOflC2lBt2rFw-s6OPvGrHyI1HZlkWV6j9UbNDyo9_OLg$> >> >> < >> https://urldefense.com/v3/__http://segment.ms__;!!Ayb5sqE7!t2RHh2_lmpuV6wxO0CCQLMMuOcTLHitt0IY8HqA28tFdgk8EUF9qkqvS2l-vEXgJv_x1x3jBLey8-wPVjk2MJw$ >> > >> >> > >> >> >> >> > diskless.segment.ms >> <https://urldefense.com/v3/__http://diskless.segment.ms__;!!Ayb5sqE7!qD6UWpGNFDAUbr00WyBVsibHKHuiQKFjLSaOflC2lBt2rFw-s6OPvGrHyI1HZlkWV6j9UbNDluPtSxE$> >> >> < >> https://urldefense.com/v3/__http://diskless.segment.ms__;!!Ayb5sqE7!t2RHh2_lmpuV6wxO0CCQLMMuOcTLHitt0IY8HqA28tFdgk8EUF9qkqvS2l-vEXgJv_x1x3jBLey8-wOdb3oIbw$ >> >, >> >> increasing overhead in the RLMM. 
We can perform >> >> >> >> > merging/optimization of Tiered Storage segments to achieve the >> >> >> per-topic >> >> >> >> > segment.ms >> <https://urldefense.com/v3/__http://segment.ms__;!!Ayb5sqE7!qD6UWpGNFDAUbr00WyBVsibHKHuiQKFjLSaOflC2lBt2rFw-s6OPvGrHyI1HZlkWV6j9UbNDyo9_OLg$> >> >> < >> https://urldefense.com/v3/__http://segment.ms__;!!Ayb5sqE7!t2RHh2_lmpuV6wxO0CCQLMMuOcTLHitt0IY8HqA28tFdgk8EUF9qkqvS2l-vEXgJv_x1x3jBLey8-wPVjk2MJw$ >> > >> >> . >> >> >> >> > There were some reasons why we retracted the prior file-merging >> >> >> >> approach, >> >> >> >> > and why merging in tiered storage appears better: >> >> >> >> > * Rewriting files requires mutability for existing data, which >> >> adds >> >> >> >> > complexity. Diskless batches or Remote Log Segments would need >> to >> >> be >> >> >> >> made >> >> >> >> > mutable, and the remote log will be made mutable in KIP-1272 >> [1] >> >> >> >> > * Because a WAL Segment can contain batches from multiple >> Diskless >> >> >> >> > Coordinators, multiple coordinators must also be involved in >> the >> >> >> merging >> >> >> >> > step. The Tiered Storage design has exclusive ownership for >> remote >> >> >> log >> >> >> >> > segments within the RLMM. >> >> >> >> > * Diskless file merging competes for resources with >> >> latency-sensitive >> >> >> >> > producers and hot consumers. Tiered storage file merging >> competes >> >> for >> >> >> >> > resources with lagging consumers, which are typically less >> latency >> >> >> >> > sensitive. >> >> >> >> > * Implementing merging in Tiered Storage allows this >> optimization >> >> to >> >> >> >> > benefit both classic topics and diskless topics, covering both >> >> high >> >> >> and >> >> >> >> low >> >> >> >> > throughput partitions. 
> > * Remote log segments may be optimized over much longer time windows, rather than performing optimization once in the first few hours of the life of a WAL segment and then freezing the arrangement of the data until it is deleted.
> > * File merging will need to rely on heuristics, which should be configurable by the user. Multi-partition heuristics are more complicated to describe and reason about than single-partition heuristics.
> > What do you think of this alternative?
> >
> > JR2:
> > Yes, the current default partition assignment strategy will need some improvement. This problem with Diskless WAL segments is analogous to the Classic topics' dense inter-broker connection graph.
> > The natural solution to this seems to be some sort of cellular design, where the replica placements tend to locate partitions in similar groups. Partitions in the same cell can generally share the same WAL Segments and the same Diskless Coordinator requests. This would also benefit Classic topics, which would need fewer connections and fetch requests.
> > Such a feature is out of scope for this KIP; either we will publish a follow-up KIP, or let operators and community tooling address this.
> >
> > JR3:
> > Yes, we will replace the ISR/ELR election logic for diskless topics, as they no longer rely on replicas for data integrity. We will fully model the state/lifecycle of the diskless replicas in KRaft, and choose how we display this to clients.
> > For backwards compatibility, clients using older metadata requests should see diskless topics, but interpret them as classic topics. We could tell older clients that the leader is in the ISR, even if it just started building its cache.
> > For clients using the latest metadata, they should see the true state of the diskless partition: which nodes can accept produce/fetch/sharefetch requests, which ranges of offsets are cached on-broker, etc. This could also be used to break apart the "leader" field into more granular fields, now that leadership has changed meaning.
> >
> > JR4:
> > Yes, we can replace the empty fetch requests to the leader nodes with cache hint fields in the requests to the Diskless Coordinator, and rely on the coordinator to distribute cache hints to all replicas. This should be low-overhead, and eliminate the inter-broker communication for brokers which only host Diskless topics.
> >
> > JR5.1:
> > You are correct and this text was ambiguous, only specifying that the controller waits for the sync to be complete. This section is now updated to explicitly say that local segments are built from object storage.
> >
> > JR5.2:
> > Extending the JR2 discussion, reassignment of diskless topics would generally happen within a cell, where the marginal cost of reading an additional partition is very low.
> > When cells are re-balanced and a partition is migrated between cells, there is a brief time (until the next Tiered Storage segment roll) when the marginal cost is doubled. This should be infrequent and well-amortized by other topics which aren't being re-balanced between cells.
> >
> > JR6.1:
> > We plan to move data from Diskless to Tiered Storage. Once the data is in Tiered Storage, it can be compacted using the functionality described in KIP-1272 [1]
> >
> > JR6.2:
> > We will add details for this soon.
> >
> > JR7:
> > We specify the requirement of eventual consistency to allow Diskless Topics to be used with other object storage implementations which aren't the three major public clouds, such as self-managed software or weaker consistency caches.
> >
> > Thanks,
> > Greg
> >
> > [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-1272%3A+Support+compacted+topic+in+tiered+storage
> >
> > On Fri, Mar 6, 2026 at 4:14 PM Jun Rao via dev <[email protected]> wrote:
> >
> > > Hi, Ivan,
> > >
> > > Thanks for the KIP. A few comments below.
> > >
> > > JR1. I am concerned about the usage of the current tiered storage to control the number of small WAL files. Current tiered storage only tiers the data when a segment rolls, which can take hours. This causes three problems. (1) Much more metadata needs to be stored and maintained, which increases the cost. Suppose that each segment rolls every 5 hours, each partition generates 2 WAL files per second, and each WAL file's metadata takes 100 bytes. Each partition will generate 5 * 3.6K * 2 * 100 = 3.6MB of metadata. In a cluster with 100K partitions, this translates to 360GB of metadata stored on the diskless coordinators.
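JR1's per-partition arithmetic checks out; as a sketch (Python, figures taken from the estimate above):

```python
# Sanity check of JR1's metadata estimate.
roll_interval_s = 5 * 3600       # segment rolls every 5 hours
wal_files_per_s = 2              # WAL files generated per second per partition
bytes_per_file = 100             # metadata bytes per WAL file

per_partition_bytes = roll_interval_s * wal_files_per_s * bytes_per_file
print(per_partition_bytes / 1e6, "MB per partition")       # 3.6 MB

partitions = 100_000
print(per_partition_bytes * partitions / 1e9, "GB total")  # 360.0 GB
```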
> > > (2) A catching-up consumer's performance degrades since it's forced to read data from many small WAL files. (3) The data in WAL files could be retained much longer than the retention time. Since the small WAL files aren't completely deleted until all partitions' data in them is obsolete, the deletion of the WAL files could be delayed by hours or more. If a WAL file includes a partition with a low retention time, the retention contract could be violated significantly. The earlier design of the KIP included a separate object merging process that combines small WAL files much more aggressively than tiered storage, which seems to be a much better choice.
> > >
> > > JR2. I don't think the current default partition assignment strategy for classic topics works for diskless topics. The current strategy tries to spread the replicas to as many brokers as possible. For example, if a broker has 100 partitions, their replicas could be spread over 100 brokers. If the broker generates a WAL file with 100 partitions, this WAL file will be read 100 times, once by each broker. S3 read cost is 1/12 of the cost of an S3 put. This assignment strategy will increase the S3 cost by about 8X, which is prohibitive. We need to design a cost-effective assignment strategy for diskless topics.
> > >
> > > JR3. We need to think through the leader election logic with diskless topics.
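The ~8X figure in JR2 above follows directly from the read/put price ratio (a hedged sketch; the 1/12 ratio is the one quoted in the thread, and actual S3 request pricing varies by region and storage class):

```python
# Rough check of JR2's ~8X S3 cost claim.
put_cost = 1.0                 # cost of one S3 PUT (normalized)
get_cost = put_cost / 12       # S3 GET is ~1/12 the cost of a PUT (per thread)
readers = 100                  # one WAL file fetched by 100 brokers

extra_read_cost = readers * get_cost   # ~8.3 PUT-equivalents
print(f"reads add ~{extra_read_cost:.1f}x the PUT cost")  # ~8.3x ("about 8X")
```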
> > > The KIP tries to reuse the ISR logic for classic topics, but it doesn't seem very natural.
> > > JR3.1 In classic topics, the leader is always in the ISR. In diskless topics, the KIP says that a leader could be out of sync.
> > > JR3.2 The existing leader election logic based on ISR/ELR mainly tries to preserve previously acknowledged data. With diskless topics, since the object store provides durability, this logic seems no longer needed. The existing min.isr and unclean leader election logic also don't apply.
> > >
> > > JR4. "Despite that there is no inter-broker replication, replicas will still issue FetchRequest to leaders. Leaders will respond with empty (no records) FetchResponse."
> > > This seems unnatural. Could we avoid issuing inter-broker fetch requests for diskless topics?
> > >
> > > JR5. "The replica reassignment will follow the same flow as in classic topic:"
> > > JR5.1 Is this true? Since the inter-broker fetch response is always empty, it doesn't seem the current reassignment flow works for diskless topics. Also, since the source of the data is the object store, it seems more natural for a replica to back-fill the data from the object store, instead of other replicas. This will also incur lower costs.
> > > JR5.2 How do we prevent reassignment on diskless topics from causing the same cost issue described in JR2?
> > >
> > > JR6. "
> > > In other functional aspects, diskless topics are indistinguishable from classic topics. This includes durability guarantees, ordering guarantees, transactional and non-transactional producer API, consumer API, consumer groups, share groups, data retention (deletion & compact),"
> > > JR6.1 Could you describe how compacted diskless topics are supported?
> > > JR6.2 Neither this KIP nor KIP-1164 describes the transactional support in detail.
> > >
> > > JR7. "Object Storage: A shared, durable, concurrent, and eventually consistent storage supporting arbitrary sized byte values and a minimal set of atomic operations: put, delete, list, and ranged get."
> > > It seems that the object storage in all three major public clouds is strongly consistent.
> > >
> > > Jun
> > >
> > > On Mon, Mar 2, 2026 at 5:43 AM Ivan Yurchenko <[email protected]> wrote:
> > >
> > > > Hi all,
> > > >
> > > > The parent KIP-1150 was voted for and accepted. Let's now focus on the technical details presented in this KIP-1163 and also in KIP-1164: Diskless Coordinator [1].
> > > >
> > > > Best,
> > > > Ivan
> > > >
> > > > [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-1164%3A+Diskless+Coordinator
> > > >
> > > > On Wed, Apr 23, 2025, at 11:41, Ivan Yurchenko wrote:
> > > > > Hi all!
> > > > >
> > > > > We want to start the discussion thread for KIP-1163: Diskless Core [1], which is a sub-KIP for KIP-1150 [2].
> > > > >
> > > > > Let's use the main KIP-1150 discuss thread [3] for high-level questions, motivation, and general direction of the feature, and this thread for particular details of implementation.
> > > > >
> > > > > Best,
> > > > > Ivan
> > > > >
> > > > > [1] https://cwiki.apache.org/confluence/display/KAFKA/KIP-1163%3A+Diskless+Core
> > > > > [2] https://cwiki.apache.org/confluence/display/KAFKA/KIP-1150%3A+Diskless+Topics
> > > > > [3] https://lists.apache.org/thread/ljxc495nf39myp28pmf77sm2xydwjm6d
