Hi everyone, For the v4 adaptive metadata tree work, we are planning on embedding bitmaps in the root manifest that act as metadata/manifest deletion vectors (MDVs). Amogh looked into how much space this would take in the manifests and we found that the Roaring format is pretty large at the scale we're targeting. When we compare it to raw bitmaps, we would be storing an extra 500-2,000 bytes per bitmap. As a result, I tried to see if we could use the ideas from Roaring, but with smaller containers to fit better with our more limited use case: manifests that contain roughly 50,000 entries (a single Roaring container). Since it is like Roaring but smaller, I've been calling the new format Mumbling.
You can view the results comparing Roaring, raw bitmaps, and Mumbling <https://docs.google.com/spreadsheets/d/1jOCMS6LYxqFcSKBa848tos2vhRBzZJQN3RqyUa0ElSY/edit?gid=0#gid=0>. The results look promising: compressed sizes track much more closely to the raw bitmap and the format has smaller overhead in memory than even Roaring because of the more granular containers. The next steps are to discuss whether we want to use this format. To do that, I've written up a Mumbling spec <https://gist.github.com/rdblue/20ff3fd91df03483ed84e6de26583743> document so that it is clear what exactly the format is doing. That should help us evaluate the design choices <https://gist.github.com/rdblue/20ff3fd91df03483ed84e6de26583743#design-choices> and the cost of implementing this. I think that we should move forward with this bitmap format. It would save quite a bit of space in the root manifest and it is a fairly simple spec. My size tests used an implementation in Rust <https://gist.github.com/rdblue/c989c8c2fecee951528973cbcc183a9b> that is fairly compact so it is not a huge amount of work. I've also reached out and we may be able to partner with the Roaring community to make this a part of the larger standard. Please take a look and discuss. Thanks, Ryan
