Sean Hamilton wrote: > In my case I have a webserver serving up a few dozen files of about 10 MB > each. While yes it is true that I could purchase more memory, and I could > purchase more drives and stripe them, I am more interested in the fact that > this server is constantly grinding away because it has found a weakness in > the caching algorithm.
This is a problem in Linux, and in SVR4 as well. This is the main reason that scheduler classes were implemented for SVR4, and the main reason the "fixed" scheduler class was implemented in SVR4.0.2/SVR4.2, as of UnixWare. The system guarantees time to the programs running under this scheduling class, so whatever other software pages out, the program then has the CPU time to page back in. This was UnixWare's response to "move mouse, wiggle cursor", when the X server performance went in the toilet.

The reason the UnixWare X server performance was in the toilet is that the UnixWare "ld" program mmap's the .o files, and then seeks around in them to resolve symbols, rather than reading the parts in and building in-core data structures. The result is predictable: the linker, when linking any sufficiently large set of object files, thrashes all other pages out of core as it moves around accessing pages in these files.

The obvious answer to this problem -- and your problem -- is to implement a working set size quota on a per-vnode basis, as I previously suggested. By doing this, you cause the new pages in the large objects to replace the old pages in the large objects, and your other data does not get thrashed out of core.

This works great on UnixWare, but it requires modification of the VM system in order to provide counting for the number of buffers that are hung off a given file object, with the result that the code is not binary compatible. (This is why the modification never made it into UnixWare, even though it was tested and found to solve the "ld/X server problem" without leading to more thrashing, without needing to introduce a new scheduling class, and without making less CPU time available to other applications on the system just so the X server has sufficient time to thrash its pages back in.)

This approach will also work for your problem, which is that your several 10M files thrash everything else out of the cache, including the executable pages.
Note that this approach need not have any effect under normal conditions: it triggers only when available memory hits some low watermark, so in the normal, non-triggered case we are talking about a single "if" test in the page replacement path to implement it.

> After further thought, I propose something much simpler: when the kernel is
> hinted that access will be sequential, it should stop caching when there is
> little cache space available, instead of throwing away old blocks, or be
> much more hesitant to throw away old blocks. Consider that in almost all
> cases where access is sequential, as reading continues, the chances of the
> read being aborted increase: ie, users downloading files, directory tree
> traversal, etc. Since the likelihood of the first byte being read is very
> high, and the next one less high, and the next less yet, etc, it seems to
> make sense to tune the caching algorithm to accommodate this.

This is much harder to implement. Specifically, the sequential nature of the access is heuristically detected, and it is this heuristic, not the madvise, which is at issue. If this heuristic did *not* get triggered, then you would lose your read-ahead. Therefore it's not something that can be easily turned off.

Second, the VM and buffer cache are unified in FreeBSD. This means that you can not "reserve" buffer cache entries that are then not cached in VM objects, in order to cause the entries to turn over. Even if you were able to do this, through some serious kludge, you would not be able to differentiate the things that needed to be thrown out to make room for the transient pages, which leaves you in the same boat you were in before.

> While discussing disks, I have a minor complaint: at least on IDE systems,
> when doing something like an untar, the entire system is painfully
> unresponsive, even though CPU load is low. I presume this is because when an
> executable is run, it needs to sit and wait for the disk.
> Wouldn't it make sense to give very high disk priority to executables?
> Isn't that worth the extra seeks?

Actually, the reason for this is that the data portion of tagged writes can not disconnect from the drive in ATA, and therefore the tagged command queueing that helps you on reads does not help you for writes. The ATA protocol is broken. If you switched to a SCSI disk, you would see this problem go away. This was recently discussed in detail with a MAXTOR drive engineer in a thread on the FreeBSD-FS mailing list.

In the limit, though, all disk requests come about as a result of executables making demands.

Another serious issue that can cause this depends on it being a particular case, specifically the untarring of the "ports" tree. If this is what you are talking about, the problem has to do with breadth-first vs. depth-first storage, and therefore caching, on packing vs. unpacking of the tar archive. One approach to making this less of a problem is to reorder the archive index (and its contents, to match, if you are using serial media, like tape, to access the archive). This has been discussed before in great detail; only the people who care about it taking a long time have (so far) been unwilling to write the "archive optimizer". 8-).

-- Terry