There seem to be some performance issues with the Windows cache manager code in OpenAFS, at least on Windows XP.
With a status cache size of 10000 and a directory tree (which crosses multiple volumes) containing 185000 files and directories, OpenAFS 1.2.5 on Windows XP can easily be made to run very poorly. I wrote a Perl script that prints every file and directory beneath the root of that tree. The only thing it should do to each entry is check whether it is a file or a directory.

For the first ~10000 directory entries, the script progresses reasonably and seems comparable to a somewhat recent CVS build of OpenAFS on RedHat Linux. Once more than 10000 entries have been processed, it slows down quickly and runs at a snail's pace compared to the same script on OpenAFS under Linux.

I added some logging to the status cache code on the Windows machine (src/WINNT/afsd/cm_scache.c), and it looks like the cm_GetNewSCache() function takes longer and longer to reuse existing status cache entries. The function appears to have to traverse further and further through cm_scacheLRULastp (I assume that this is some kind of Least Recently Used list) before it finds a status cache entry with a reference count of 0 that it can reuse. The number of entries skipped per call goes up far more often than it goes down: it reaches the hundreds pretty quickly and climbs steadily to well over 2000 on each call to cm_GetNewSCache(). I did not have the patience to let it run through all 185000 files and directories to determine the maximum number of entries skipped per call. By the time the script is obviously running slower than when it started, the AFS client service is using ~100% of the CPU.
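To make the suspected behaviour concrete, here is a minimal sketch of the kind of scan I think is happening. It is not the code from cm_scache.c: the entry_t and lru_t types and the lru_* function names are hypothetical stand-ins, and the only thing it models is "walk from the least recently used end, skip anything with a nonzero reference count, reuse the first free entry". If entries near the tail keep a nonzero count and never move, every call pays for walking past all of them again.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a status cache entry.
 * This is NOT the real structure used in cm_scache.c. */
typedef struct entry {
    struct entry *prev;  /* toward the most recently used end */
    struct entry *next;  /* toward the least recently used end */
    int refCount;        /* > 0 means the entry is in use and cannot be reused */
} entry_t;

typedef struct {
    entry_t *first;      /* most recently used entry */
    entry_t *last;       /* least recently used entry */
} lru_t;

/* Insert an entry at the most recently used end of the list. */
static void lru_push_front(lru_t *l, entry_t *e)
{
    e->prev = NULL;
    e->next = l->first;
    if (l->first != NULL)
        l->first->prev = e;
    else
        l->last = e;
    l->first = e;
}

/* Remove an entry from wherever it currently sits in the list. */
static void lru_unlink(lru_t *l, entry_t *e)
{
    if (e->prev != NULL)
        e->prev->next = e->next;
    else
        l->first = e->next;
    if (e->next != NULL)
        e->next->prev = e->prev;
    else
        l->last = e->prev;
    e->prev = e->next = NULL;
}

/* Walk from the least recently used end toward the head and return the
 * first entry whose refCount is 0, moving it to the head of the list.
 * *skipped receives the number of in-use entries passed over, which is
 * the per-call cost that my logging showed growing. Returns NULL if
 * every entry is in use. */
static entry_t *lru_recycle(lru_t *l, size_t *skipped)
{
    size_t n = 0;

    assert(skipped != NULL);
    for (entry_t *e = l->last; e != NULL; e = e->prev) {
        if (e->refCount == 0) {
            lru_unlink(l, e);
            lru_push_front(l, e);
            *skipped = n;
            return e;
        }
        n++;  /* entry is still referenced; skip it and keep walking */
    }
    *skipped = n;
    return NULL;
}
```

In this sketch the skipped entries stay at the tail, so a run of entries whose counts never drop back to 0 makes every later lru_recycle() call linear in the length of that run. Whether that is what cm_GetNewSCache() really does is exactly what I am unsure about.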
I have not tried running the AFS client under a debugger to see whether cm_GetNewSCache() is the only thing using a lot of CPU, but this seems like a good place to start looking for a problem in the cache manager code on Windows. I was hoping to keep track of the increments and decrements to the reference counts on the various status cache entries, but the source code did not seem to lend itself to that. Does anyone have an idea of how the AFS Windows cache manager code is supposed to handle the reference count on status cache entries and/or how status cache entries are maintained in the LRU list?

I also realized that the Windows cache manager code appears to be completely separate from the mainstream cache manager code. Does anyone have any thoughts about whether it would be a good idea to try patching the Windows AFS client to use the mainstream cache manager code? I realize that the Windows client has a very different data cache (and possibly many more differences), but right now there doesn't seem to be much work being done on improving the Windows cache manager code.

Ryan Lantzer
