Re: [OpenAFS-devel] Cache inconsistency in client 1.4.8 and above

Felix Frank Sun, 26 Apr 2009 23:05:06 -0700

- Felix's approach is to set the flag in writepage, and to prevent


writepage_sync, more specifically. I can't see how we might lock writepage
itself. That may remain a problem, see below.

re-entry into either writepage or entry in osi_VM_StoreAllSegments for
the same file if it is set.  This looks sound.  The net effect differs
from Chaskiel's suggestion in that it 1) disables
osi_VM_StoreAllSegments on the same file for callers other than
doPartialWrite (probably a good idea), and 2) prevents concurrent
writepage calls within the same file (which might already be the
case).
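
For readers following along, the flag scheme described above can be sketched
in user-space C. This is only an illustration under assumptions, not the
actual patch: struct file_state, try_begin_writeback, end_writeback and the
stand-in functions writepage_sync and store_all_segments are hypothetical
names, and a C11 atomic stands in for whatever locking the kernel code would
really use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-file state; NOT the real OpenAFS vcache. */
struct file_state {
    atomic_bool writeback_in_progress; /* the "flag" from the discussion */
    int pages_written;                 /* only touched while the flag is held */
};

void file_state_init(struct file_state *f)
{
    atomic_init(&f->writeback_in_progress, false);
    f->pages_written = 0;
}

/* Claim the flag. Returns false if another writeback already holds it. */
bool try_begin_writeback(struct file_state *f)
{
    bool expected = false;
    return atomic_compare_exchange_strong(&f->writeback_in_progress,
                                          &expected, true);
}

/* Release the flag. The caller must currently hold it. */
void end_writeback(struct file_state *f)
{
    bool was_set = atomic_exchange(&f->writeback_in_progress, false);
    assert(was_set);
    (void)was_set; /* silence unused warning when NDEBUG is defined */
}

/* Stand-in for the page write path: refuses re-entry for the same file. */
bool writepage_sync(struct file_state *f)
{
    if (!try_begin_writeback(f))
        return false;          /* a writeback on this file is in flight */
    f->pages_written++;        /* placeholder for the real page store */
    end_writeback(f);
    return true;
}

/*
 * Stand-in for the store-all-segments path: it only checks the flag and
 * backs off while a page writeback is in flight. The check is not atomic
 * with the store itself, so real code would need more than this.
 */
bool store_all_segments(struct file_state *f)
{
    return !atomic_load(&f->writeback_in_progress);
}
```

The point of the sketch is only the shape of the guard: the write path sets
and clears the flag, and both entry points decline to proceed while it is set.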

Issues that remain:
- I think Felix still sees some deadlocks and data inconsistencies
with 2.6.18, but I can't reproduce with 2.6.29 or 2.6.30


There is reproducible data loss during the mmap test. The amount of loss
appears to scale linearly with the size of physical memory. Corruption
seems to start right above 1/3 of memory size.

Deadlocks still appear to occur above 1/2 memory size.

- I see extreme slowness with random mmap writes - nothing really new
here.  But Felix reports that he doesn't see this with his older
2.6.18 kernels, which is interesting.  We're probably doing something
that's not quite right for newer kernels.  Would be interesting to
bisect if I had a machine that could boot that range of kernels.


It's not exactly "older"; it's Red Hat's current 2.6.18-128.1.6.el5.

The data corruption issues in 1.4.10 are (from my point of view) more severe
than the possible deadlocks. Is the approach of not writing back pages at
inconvenient times actually valid?

Regards
 - Felix
