You bet, glad to help.  

Zillions of small files indeed present a relatively higher metadata overhead, 
and can be problematic in multiple ways.  When using RGW, indexless buckets may 
be advantageous.  

Another phenomenon is space amplification. With, say, a 1 GB file/object, a 
partially full last allocated block is a trivial amount of wasted space, 
sometimes called internal fragmentation.  As files get smaller, that waste 
becomes an increasingly large fraction of the space actually consumed. 
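To put rough numbers on that, here is a minimal sketch that assumes space is 
handed out in whole units of a fixed allocation size and ignores metadata, 
compression and any BlueStore-specific packing of small writes.  The function 
names are hypothetical, and 64 KiB / 4 KiB are just example allocation units, 
not a statement about any particular release's defaults.

```python
# Simplified model: every object is rounded up to whole allocation units.
# Ignores metadata, compression and BlueStore's own small-write handling.

def allocated_bytes(object_size: int, min_alloc_size: int) -> int:
    """Round object_size up to a whole number of min_alloc_size units."""
    return -(-object_size // min_alloc_size) * min_alloc_size


def space_amplification(object_size: int, min_alloc_size: int) -> float:
    """Bytes allocated divided by bytes stored (1.0 means no waste)."""
    return allocated_bytes(object_size, min_alloc_size) / object_size


if __name__ == "__main__":
    KiB = 1024
    # Example allocation units only (hypothetical values for illustration).
    for alloc in (64 * KiB, 4 * KiB):
        for size in (1 * KiB, 10 * KiB, 1_000_000_000):
            amp = space_amplification(size, alloc)
            print(f"{size:>13,} B with {alloc // KiB:>2} KiB units -> {amp:.5f}x")
```

With a 1 KiB object and 64 KiB units, 63 KiB of every 64 KiB allocated is 
padding (64x); with 4 KiB units it drops to 4x.  For the 1 GB object the 
padding is well under 0.01% of the total.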

Mark’s sheet is terrific for visualizing this:

https://docs.google.com/spreadsheets/d/1rpGfScgG-GLoIGMJWDixEkqs-On9w8nAUToPQjN8bDI/edit?usp=sharing

Work was done a couple of releases ago to allow lowering the default 
min_alloc_size, largely because of this inefficiency with small RGW objects.  
A subtle additional factor that is often missed is that RADOS writes full 
stripes, which adds another layer of potential wasted space; misaligned or 
larger EC profiles can make it worse compared with replication.  
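A rough way to see the stripe effect is the simplified model below.  It 
assumes an EC object is padded to a whole number of stripes of k * stripe_unit 
bytes, that each of the k data and m coding shards stores one stripe_unit per 
stripe, and that each shard is then rounded up to min_alloc_size on its OSD.  
The 4 KiB stripe unit and allocation size are assumed example values, the 
function names are hypothetical, and real Ceph/BlueStore behavior has more 
nuance than this.

```python
# Simplified raw-space model: replication vs. erasure coding for one object.
# Assumes whole-stripe padding and per-shard rounding to min_alloc_size.

def ceil_to(n: int, unit: int) -> int:
    """Round n up to a multiple of unit."""
    return -(-n // unit) * unit


def replicated_raw(object_size: int, replicas: int, min_alloc: int) -> int:
    """Each replica is a full copy, rounded up to allocation units."""
    return replicas * ceil_to(object_size, min_alloc)


def ec_raw(object_size: int, k: int, m: int, stripe_unit: int, min_alloc: int) -> int:
    """Pad to whole stripes, then store one shard per data/coding chunk."""
    stripes = -(-object_size // (k * stripe_unit))  # full stripes written
    shard_bytes = stripes * stripe_unit             # bytes per shard (k + m shards)
    return (k + m) * ceil_to(shard_bytes, min_alloc)


if __name__ == "__main__":
    KiB, MiB = 1024, 1024 * 1024
    SU = ALLOC = 4 * KiB  # assumed example values
    for size in (4 * KiB, 20 * KiB, 64 * MiB):
        rep = replicated_raw(size, 3, ALLOC) / size
        ec42 = ec_raw(size, 4, 2, SU, ALLOC) / size
        ec83 = ec_raw(size, 8, 3, SU, ALLOC) / size
        print(f"{size:>10,} B: 3x rep {rep:.2f}x, EC 4+2 {ec42:.2f}x, EC 8+3 {ec83:.2f}x")
```

Under these assumptions a 4 KiB object costs 3.0x raw with 3x replication but 
6.0x with EC 4+2 and 11.0x with EC 8+3, because each shard still gets a full 
stripe unit.  A misaligned 20 KiB object in EC 4+2 spills into a second stripe 
and lands at 2.4x, while a 64 MiB object reaches the nominal 1.5x.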


> On Feb 25, 2022, at 4:18 AM, Bobby <italienisch1...@gmail.com> wrote:
> 
> 
> 
> thanks Anthony and Janne....exactly what I have been looking for!
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
