I would argue that other languages should not be a blocker here. I can speak for iceberg-go: implementing this as a native feature there is doable.
Implementing Mumbling natively in Go is, in the worst case, ~1–2 weeks of isolated work in a new internal package; in the best case (Roaring upstream accepts it), it is days of glue code. I would expect a similar cost for other languages (primarily Java and C++). There is no language-specific risk here: the format is deliberately simple, the Rust prototype is small, the spec has concrete byte-level test vectors, and the work does not touch any other iceberg-go packages.

Best,
Andrei

On Tue, Apr 21, 2026 at 15:30, Maximilian Michels <[email protected]> wrote:

> Hi Ryan,
>
> Thanks for the detailed analysis. The storage savings (for the sparse range) and the general memory savings over Roaring are compelling.
>
> My main concern would be having to maintain our own bitmap format across all implementations of Iceberg. I suppose it would be mainly Java and Rust, as we can leverage Rust bindings for other languages, but Roaring already has implementations for every language Iceberg supports today.
>
> If we can include Mumbling as part of Roaring, this becomes a no-brainer.
>
> -Max
>
> On Tue, Apr 21, 2026 at 1:02 AM Ryan Blue <[email protected]> wrote:
> >
> > Hi everyone,
> >
> > For the v4 adaptive metadata tree work, we are planning on embedding bitmaps in the root manifest that act as metadata/manifest deletion vectors (MDVs). Amogh looked into how much space this would take in the manifests and we found that the Roaring format is pretty large at the scale we're targeting. When we compare it to raw bitmaps, we would be storing an extra 500-2,000 bytes per bitmap. As a result, I tried to see if we could use the ideas from Roaring, but with smaller containers to fit better with our more limited use case: manifests that contain roughly 50,000 entries (a single Roaring container). Since it is like Roaring but smaller, I've been calling the new format Mumbling.
> >
> > You can view the results comparing Roaring, raw bitmaps, and Mumbling. The results look promising: compressed sizes track much more closely to the raw bitmap and the format has smaller overhead in memory than even Roaring because of the more granular containers.
> >
> > The next steps are to discuss whether we want to use this format. To do that, I've written up a Mumbling spec document so that it is clear what exactly the format is doing. That should help us evaluate the design choices and the cost of implementing this.
> >
> > I think that we should move forward with this bitmap format. It would save quite a bit of space in the root manifest and it is a fairly simple spec. My size tests used an implementation in Rust that is fairly compact so it is not a huge amount of work. I've also reached out and we may be able to partner with the Roaring community to make this a part of the larger standard.
> >
> > Please take a look and discuss. Thanks,
> >
> > Ryan
