In article <mailman.3442.1371389433.3114.python-l...@python.org>,
 Chris “Kwpolska” Warrick <kwpol...@gmail.com> wrote:

> (I’m using wc -c to count the bytes in all files there are.  du is
> unaccurate with files smaller than 4096 bytes.)

It's not that du is not accurate, it's that it's measuring something 
different.  It's measuring how much disk space the file is using.  For 
most files, that's the number of characters in the file rounded up to a 
full block.  For large files, I believe it also includes the overhead of 
indirect blocks or extent trees.  And, finally, for sparse files, it 
takes into account that some logical blocks in the file may not be 
mapped to any physical storage.
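
To make the distinction concrete, here is a minimal sketch in Python, 
assuming a Unix-like system where os.stat() exposes st_blocks in 
512-byte units.  The function names are mine, purely for illustration.  
The allocated figure only approximates du: du also counts the 
directories themselves and counts hard-linked files once.

```python
import os


def apparent_and_allocated(path):
    """Return (logical length in bytes, bytes allocated on disk)."""
    st = os.lstat(path)
    # st_size is the logical length, i.e. what "wc -c" counts.
    # st_blocks is the number of 512-byte units allocated (Unix only),
    # which is roughly what du reports for a single file.
    return st.st_size, st.st_blocks * 512


def tree_totals(root):
    """Sum both measures over every regular file under root."""
    total_size = 0
    total_allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # skip symlinks rather than follow them
            size, allocated = apparent_and_allocated(full)
            total_size += size
            total_allocated += allocated
    return total_size, total_allocated
```

On a typical filesystem, a 10-byte file shows a size of 10 but an 
allocation of one whole block, while a sparse file can show an 
allocation far smaller than its size.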

So, whether you want to use "du" or "wc -c" depends on what you're 
trying to measure.  If you want to know how much disk space you're 
using, du is the right tool.  If you want to know how much data will be 
transmitted if the file is serialized (i.e. packed in a tarball or sent 
via a "{hg,git} clone" operation), then "wc -c" is what you want.

All that being said, for the vast majority of cases (and I would be 
astonished if this did not hold for every real-life VCS repo), the 
difference between what wc and du tell you is not worth worrying about.  
And du is going to be a heck of a lot faster.
-- 
http://mail.python.org/mailman/listinfo/python-list
