In article <mailman.3442.1371389433.3114.python-l...@python.org>, Chris "Kwpolska" Warrick <kwpol...@gmail.com> wrote:
> (I'm using wc -c to count the bytes in all files there are. du is
> inaccurate with files smaller than 4096 bytes.)

It's not that du is inaccurate, it's that it's measuring something different. It's measuring how much disk space the file is using. For most files, that's the number of bytes in the file rounded up to a full block. For large files, I believe it also includes the overhead of indirect blocks or extent trees. And, finally, for sparse files, it takes into account that some logical blocks in the file may not be mapped to any physical storage.

So, whether you want to use "du" or "wc -c" depends on what you're trying to measure. If you want to know how much disk space you're using, du is the right tool. If you want to know how much data will be transmitted if the file is serialized (i.e. packed in a tarball or sent via a "{hg,git} clone" operation), then "wc -c" is what you want.

All that being said, for the vast majority of cases (and I would be astonished if this were not true for any real-life vcs repo), the difference between what wc and du tell you is not worth worrying about. And du is going to be a heck of a lot faster.
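
If you want to see the two numbers side by side, here is a minimal Python sketch. It is only an illustration, not what du or wc actually do internally. The helper name sizes() is made up for this example, and it assumes a Unix-like system where os.stat() exposes st_blocks and counts it in 512-byte units (true on Linux and most Unixes as far as I know, but not guaranteed everywhere, and not available on Windows).

```python
import os
import tempfile


def sizes(path):
    """Return (apparent_bytes, allocated_bytes) for path.

    apparent_bytes is the logical length, i.e. what "wc -c" counts.
    allocated_bytes approximates what du measures. ASSUMPTION: st_blocks
    is in 512-byte units, which holds on typical Unix-like systems.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # A tiny file: the logical size is 6 bytes, but the filesystem
        # usually hands out space in whole blocks.
        small = os.path.join(d, "small.txt")
        with open(small, "wb") as f:
            f.write(b"hello\n")

        # A file with a hole: seek past 1 MiB, then write a single byte.
        # On filesystems that support sparse files the hole takes no space.
        sparse = os.path.join(d, "sparse.bin")
        with open(sparse, "wb") as f:
            f.seek(1024 * 1024)
            f.write(b"x")

        for p in (small, sparse):
            apparent, allocated = sizes(p)
            print(f"{os.path.basename(p)}: apparent={apparent} "
                  f"allocated={allocated}")
```

The exact allocated figures depend on the filesystem and its block size, so I'm deliberately not quoting any output here. The point is only that the first number is what gets serialized and the second is what the disk gives up.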
-- http://mail.python.org/mailman/listinfo/python-list