Hi Ryan,

Thanks for the detailed analysis. The storage savings (for the sparse range) and the general memory savings over Roaring are compelling.
My main concern would be having to maintain our own bitmap format across all implementations of Iceberg. I suppose it would be mainly Java and Rust, as we can leverage Rust bindings for other languages, but Roaring already has implementations for every language Iceberg supports today. If we can include Mumbling as part of Roaring, this becomes a no-brainer.

-Max

On Tue, Apr 21, 2026 at 1:02 AM Ryan Blue <[email protected]> wrote:
>
> Hi everyone,
>
> For the v4 adaptive metadata tree work, we are planning on embedding bitmaps
> in the root manifest that act as metadata/manifest deletion vectors (MDVs).
> Amogh looked into how much space this would take in the manifests and we
> found that the Roaring format is pretty large at the scale we're targeting.
> When we compare it to raw bitmaps, we would be storing an extra 500-2,000
> bytes per bitmap. As a result, I tried to see if we could use the ideas from
> Roaring, but with smaller containers to fit better with our more limited use
> case: manifests that contain roughly 50,000 entries (a single Roaring
> container). Since it is like Roaring but smaller, I've been calling the new
> format Mumbling.
>
> You can view the results comparing Roaring, raw bitmaps, and Mumbling. The
> results look promising: compressed sizes track much more closely to the raw
> bitmap and the format has smaller overhead in memory than even Roaring
> because of the more granular containers.
>
> The next steps are to discuss whether we want to use this format. To do that,
> I've written up a Mumbling spec document so that it is clear what exactly the
> format is doing. That should help us evaluate the design choices and the cost
> of implementing this.
>
> I think that we should move forward with this bitmap format. It would save
> quite a bit of space in the root manifest and it is a fairly simple spec. My
> size tests used an implementation in Rust that is fairly compact so it is not
> a huge amount of work. I've also reached out and we may be able to partner
> with the Roaring community to make this a part of the larger standard.
>
> Please take a look and discuss. Thanks,
>
> Ryan
