Eric Blake wrote: > On 07/01/2010 06:44 PM, Paul Eggert wrote: >> I've just gone through "du", and it's fresh in my >> mind, so I thought I'd run through what I needed there, to see how >> well it maps to the proposal. >> >> I needed three kinds of hash tables. >> >> * Table A is indexed by device number (a dev_t value) and the value is a >> secondary hash table. Typically the number of devices is relatively >> small. >> >> * Table B is indexed by inode number (an ino_t value) and has no value. >> All that matters is whether the key is present. So it is a set, >> not a general mapping. Often there are many, many inode numbers. >> >> * The gnulib hash package uses void * keys, but having a void * value >> that exists only to point to an ino_t value wastes memory. So, if >> an inode number is small (less than M, say), it maps to itself. If >> it is large, it is a key into a third table (table C) whose values >> are mapped inode numbers. (Values in table C are always M or >> greater.) The mapped inode number (whether taken from table C, or >> used directly) is cast to void * and then used as the index into >> table B. In practice, most inode numbers are less than M so this is >> a storage win. > > We need to be careful on cygwin. > > $ ls -i | head > 1125899907178954 ABOUT-NLS > 28147497671067760 AUTHORS > 1970324837308495 COPYING > 9851624184873962 ChangeLog > 4785074604197847 ChangeLog-2005 > 1970324837180710 ChangeLog-2006 > 1970324837078885 ChangeLog-2007 > 2251799813904421 ChangeLog-2008 > 281474977732635 GNUmakefile > 1125899907781258 GNUmakefile~ > > sizeof(void*) == sizeof(size_t) == 4, but sizeof(ino_t) == 8, and most > inodes are quite randomly dispersed but definitely larger than 4 bytes. > Does your scheme work well at mapping cygwin's 8-byte inodes into > 4-byte pointers?
It's better not to modify algorithms in order to optimize for systems with 4-byte pointers. We can find/use a set/hash package that can efficiently handle keys of type uintmax_t regardless of pointer size. Combine that with a slight re-design suggested by Paul and (before him, privately) by Tim Waugh -- to map device numbers to inode hash tables rather than to small integers that need to be encoded -- and you get all the benefits with much less complexity.
