Hi!

On most of our boxes, running Fedora kernels (different versions, see below) and OpenAFS 1.3.82 to 1.4.1, we are suffering from the problem that after some uptime, dirty filesystem buffers are no longer flushed periodically. I see this problem only on machines running either OpenAFS or a Software-Suspend2 enabled kernel. (Maybe there is a relation?) Boxes without those extensions behave normally.

On machines that show the problem, buffers are flushed only based on the amount of dirty memory (dirty_ratio and dirty_background_ratio) or when doing a "sync"; they are no longer flushed periodically. There is one box where even "sync" flushes only a fraction of the dirty buffers [2].

I am not using vmware, and neither laptop_mode nor custom /proc/sys/vm/dirty* settings are involved. The problems seem to arise after a few days of uptime; a freshly booted system doesn't show the symptoms. (At least it doesn't show them when I'm looking out for them. ;))

I've written a small test-script to visualize the behavior. [3] The script creates a 200MB file, monitors nr_dirty in /proc/vmstat and executes sync after some time. The output looks like this:

-------------------- snip! bad: --------------------
Linux slime.wu-wien.ac.at 2.6.14-1.1653_FC4smp #1 SMP Tue Dec 13 21:46:01 EST 2005 i686 i686 i386 GNU/Linux
12:22:46 up 10 days, 18:12, 37 users, load average: 0.17, 0.15, 0.09
12:22:46 start: head -c 200000000 /dev/zero >/var/tmp/dirty-buffers.EFYFF13399 # nr_dirty 1076
12:22:46 # nr_dirty 1805
12:22:47 end: head -c 200000000 /dev/zero >/var/tmp/dirty-buffers.EFYFF13399 # nr_dirty 31061
12:22:51 # nr_dirty 25671
12:22:56 # nr_dirty 25724
12:23:01 # nr_dirty 25724
12:23:06 # nr_dirty 25724
12:23:11 # nr_dirty 25724
12:23:16 # nr_dirty 25724
12:23:21 # nr_dirty 25724
12:23:26 # nr_dirty 25724
12:23:31 # nr_dirty 25724
12:23:36 # nr_dirty 25724
12:23:41 # nr_dirty 25724
12:23:47 # nr_dirty 25725
12:23:52 # nr_dirty 25726
12:23:57 # nr_dirty 25728
12:24:02 # nr_dirty 25728
12:24:07 # nr_dirty 25728
12:24:12 # nr_dirty 25728
12:24:12 # nr_dirty 25728
12:24:12 start: sync # nr_dirty 25728
12:24:12 end: sync # nr_dirty 23566
12:24:17 # nr_dirty 23566
12:24:22 # nr_dirty 23582
12:24:27 # nr_dirty 23583
12:24:32 # nr_dirty 23583
12:24:37 # nr_dirty 23583
12:24:42 # nr_dirty 23583
12:24:47 # nr_dirty 23583
12:24:52 # nr_dirty 23583
12:24:57 # nr_dirty 23583
-------------------- snip! --------------------

While writing the temp-file, some buffers are flushed. (31061->25671) But after writing is completed, the 25000 buffers remain dirty and are not flushed after 30 secs, as I would expect. The sync causes the dirty buffers to shrink from 25728 to 23566, but I'd expect sync to bring them down to near 0.

Here is the output of another system with a lower uptime that doesn't show that behavior yet:

-------------------- snip! good: --------------------
Linux roaster.wu-wien.ac.at 2.6.16-1.2111_1.99.rhfc5.cubbi_suspend2_8ksmp #1 SMP Mon May 15 09:47:41 CEST 2006 i686 i686 i386 GNU/Linux
18:56:50 up 23:31, 5 users, load average: 0.01, 0.03, 0.06
18:56:50 # nr_dirty 4
18:56:50 start: head -c 200000000 /dev/zero >/tmp/dirty-buffers.wxtSuY5341 # nr_dirty 4
18:56:50 end: head -c 200000000 /dev/zero >/tmp/dirty-buffers.wxtSuY5341 # nr_dirty 25256
18:56:55 # nr_dirty 25256
18:57:00 # nr_dirty 25257
18:57:05 # nr_dirty 25257
18:57:10 # nr_dirty 25257
18:57:15 # nr_dirty 25257
18:57:20 # nr_dirty 25257
18:57:25 # nr_dirty 24232
18:57:30 # nr_dirty 1
18:57:35 # nr_dirty 1
18:57:40 # nr_dirty 1
18:57:45 # nr_dirty 1
18:57:50 # nr_dirty 1
18:57:55 # nr_dirty 1
18:58:00 # nr_dirty 1
18:58:05 # nr_dirty 0
18:58:10 # nr_dirty 0
18:58:15 # nr_dirty 1
18:58:15 # nr_dirty 1
18:58:15 start: sync # nr_dirty 1
18:58:15 end: sync # nr_dirty 0
18:58:20 # nr_dirty 0
18:58:25 # nr_dirty 0
18:58:30 # nr_dirty 0
18:58:35 # nr_dirty 0
18:58:40 # nr_dirty 0
18:58:45 # nr_dirty 0
18:58:50 # nr_dirty 0
18:58:55 # nr_dirty 0
18:59:00 # nr_dirty 0
-------------------- snip! --------------------

As I'm seeing this error on many Fedora-based systems (FC3, FC4 or FC5), using different OpenAFS versions, I guess there must be others that have the same problem (but maybe didn't notice it yet).

Any help would be greatly appreciated.

Cheers,
--leo

P.S.: I already reported it to LKML but didn't get a response.
:(

[1] Test results: http://leo.kloburg.at/tmp/dirty-buffers/
[2] sync flushes only a fraction of the dirty buffers: http://leo.kloburg.at/tmp/dirty-buffers/bad_slime-2.6.14-1.1653_FC4smp.txt
[3] Test-script to visualize the error: http://leo.kloburg.at/tmp/dirty-buffers/dirty-buffers.sh

-- 
-----------------------------------------------------------------------
[EMAIL PROTECTED] Fax: +43-1-31336-906050
Zentrum fuer Informatikdienste - Wirtschaftsuniversitaet Wien - Austria
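
For anyone who wants to check their own boxes without fetching [3]: the measurement boils down to sampling the nr_dirty counter in /proc/vmstat at a fixed interval and checking whether it drops once the 30 seconds mentioned above have passed. The following is only a rough Python sketch of that idea, not the script from [3]. The function names, the 5-second interval, the 1000-page minimum and the 90% tolerance are all assumptions chosen for illustration.

```python
import time


def parse_vmstat(text):
    """Turn the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_nr_dirty(path="/proc/vmstat"):
    """Return the current nr_dirty value, or None if the counter is missing."""
    with open(path) as f:
        return parse_vmstat(f.read()).get("nr_dirty")


def looks_stuck(samples, interval_s, expire_s=30, min_pages=1000, tolerance=0.9):
    """Heuristic (an assumption, not a kernel rule): report True when the
    samples cover more than expire_s seconds, started with at least min_pages
    dirty pages, and the last sample is still above tolerance * first sample."""
    window_s = (len(samples) - 1) * interval_s
    if len(samples) < 2 or window_s <= expire_s:
        return False  # window too short to judge
    return samples[0] >= min_pages and samples[-1] > tolerance * samples[0]


def monitor(count=12, interval_s=5, path="/proc/vmstat"):
    """Print one timestamped nr_dirty sample per interval and return them all."""
    samples = []
    for _ in range(count):
        value = read_nr_dirty(path)
        print(time.strftime("%H:%M:%S"), "# nr_dirty", value)
        samples.append(value)
        time.sleep(interval_s)
    return samples
```

A possible way to use it: write a large file, call monitor() right afterwards, and pass the returned list to looks_stuck() with the same interval. Fed the values from the two traces above, the heuristic flags the "bad" one and not the "good" one, but the thresholds are arbitrary and would need tuning on other machines.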