Eric Blake wrote:

> [adding the upstream coreutils list]
>
> According to Barry Kelly on 11/23/2008 6:24 AM:
> > I have a problem with du running out of memory.
> >
> > I'm feeding it a list of null-separated file names via standard input,
> > to a command-line that looks like:
> >
> >   du -b --files0-from=-
> >
> > The problem is that when du is run in this way, it leaks memory like a
> > sieve. I feed it about 4.7 million paths but eventually it falls over as
> > it hits the 32-bit address space limit.
>
> That's because du must keep track of which files it has visited, so that
> it can determine whether to recount or ignore hard links that visit a file

That's why I said this:

> > Now, I can understand why a du -c might want to exclude excess hard
> > links to files, but that at most requires a hash table for device &
> > inode pairs - it's hard to see why 4.7 million entries would cause OOM

And 4.7 million inode and device pairs, assuming 64-bit inodes and 16-bit
device data (major & minor), even including alignment (so 16 bytes per
pair), only adds up to about 75MB of data. That shouldn't exhaust a 2GB
address space.

> already seen. The upstream ls source code was recently changed to store
> this information only for command line arguments, rather than every file
> visited; I wonder if a similar change for du would make sense.

A "visited" hash table would still be required for calculating '-c' though.

-- Barry

-- 
http://barrkel.blogspot.com/
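
P.S. For anyone who wants to check the estimate, here is a rough sketch in Python. It is not how du is implemented; the 16-byte entry layout (a 64-bit inode plus the device data padded out to 64 bits) is just the assumption from my message, and the names `table_bytes` and `count_unique_bytes` are made up for illustration. The second function shows the kind of "visited" set keyed on device and inode that I have in mind for skipping extra hard links.

```python
import os
import struct

# Assumed packed entry: 64-bit inode + device data padded to 64 bits.
# This is the layout assumed in the message, not du's real structure.
ENTRY = struct.Struct("=QQ")  # 16 bytes per (inode, device) pair


def table_bytes(n_entries, entry_size=ENTRY.size):
    """Raw payload size of a table holding n_entries packed pairs."""
    return n_entries * entry_size


def count_unique_bytes(paths):
    """Sum apparent sizes, counting each (device, inode) pair only once."""
    seen = set()
    total = 0
    for path in paths:
        st = os.lstat(path)
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # another hard link to a file already counted
        seen.add(key)
        total += st.st_size
    return total


if __name__ == "__main__":
    # 4.7 million entries at 16 bytes each, in MB (10**6 bytes)
    print(table_bytes(4_700_000) / 1e6)
```

Note that a Python set of tuples costs far more than 16 bytes per entry; the 75MB figure only applies to a packed table like one a C program could keep, and a real hash table adds some overhead for empty slots on top of that.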