Re: fsync(2) manual and hdd write caching
On Thu, Oct 28, 2010, per...@pluto.rain.com wrote:
> Ivan Voras wrote:
>
> > ... The problem is actually pretty hard - since AFAIK SoftUpdates
> > doesn't have "checkpoints" in the sense that it groups writes and
> > all data "before" can be guaranteed to be on-disk, the problem is
> > *when* to issue BIO_FLUSH requests.
>
> Seems to me the originally-stated problem -- making fsync(2)
> do what it claims to do -- is not hard at all. Just issue a
> BIO_FLUSH request as the final step in handling fsync(2).

Yes, for correctness, fsync(2) needs to flush the relevant parts of the
disk's volatile write cache before returning. If it doesn't,
applications like databases can fail if there is a power loss.

Unfortunately, this isn't really practical. First, performance is poor:
you generally can't flush a particular sector without flushing the
entire write cache, and many disks (including all ATA disks) don't
differentiate between volatile and non-volatile caches. Second, many
disks ignore the command.

So the status quo for all the major Unix variants is apparently to favor
performance over correctness. However, FlushFileBuffers() in Windows
does the right thing and flushes the disk write cache, and I've heard
that ZFS and ext4 also do the right thing (subject to the correctness of
the disk controller, of course). So FreeBSD isn't any worse than most
of the world here.

FreeBSD used to turn off disk write caches by default, but many people
complained about FreeBSD being slow. Far fewer people complain about
corruptions due to power failure. Usually people who require stronger
reliability guarantees invest in replicated storage and battery backups
anyway.

Note that the "broken" behavior is still protective against kernel and
application crashes -- just not power failures and certain types of disk
faults. An informative article on the topic is here:

http://www.postgresql.org/docs/9.0/static/wal-reliability.html

> While we're at it, perhaps do the same in close(2).
> I _hope_ we are already doing it in unmount(2).

close(2) is a different beast; flushes would be too expensive, and they
aren't needed except for NFS. Apps are expected to use fsync(2) if they
require it.

___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"
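
For reference, the application-side pattern referred to above ("apps are
expected to use fsync(2) if they require it") could look roughly like
the sketch below. write_durably() is a hypothetical helper name, not
something from the tree or from this thread, and the file mode is an
arbitrary choice. As discussed, a successful return only means the data
was handed to the device; with the drive's write cache enabled it may
still not be safe from a power failure.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): write len bytes from buf to
 * path, then call fsync(2) before close(2).
 * Returns 0 on success, -1 on error with errno set by the failing call.
 */
int
write_durably(const char *path, const void *buf, size_t len)
{
	const char *p = buf;
	int fd;

	assert(path != NULL && buf != NULL);

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd == -1)
		return (-1);

	/* write(2) may be short or interrupted, so loop until done. */
	while (len > 0) {
		ssize_t n = write(fd, p, len);

		if (n == -1) {
			if (errno == EINTR)
				continue;
			(void)close(fd);
			return (-1);
		}
		p += n;
		len -= (size_t)n;
	}

	/*
	 * Ask the kernel to push this file's data and attributes to the
	 * device. Whether the drive's own cache is flushed as well is
	 * exactly the open question in this thread.
	 */
	if (fsync(fd) == -1) {
		(void)close(fd);
		return (-1);
	}
	return (close(fd));
}
```

A caller that treats a -1 return as "not durable" at least covers kernel
and application crashes, which is the protection described above.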
Re: fsync(2) manual and hdd write caching
Ivan Voras wrote:

> ... The problem is actually pretty hard - since AFAIK SoftUpdates
> doesn't have "checkpoints" in the sense that it groups writes and
> all data "before" can guaranteed to be on-disk, the problem is
> *when* to issue BIO_FLUSH requests.

Seems to me the originally-stated problem -- making fsync(2)
do what it claims to do -- is not hard at all. Just issue a
BIO_FLUSH request as the final step in handling fsync(2).

While we're at it, perhaps do the same in close(2).
I _hope_ we are already doing it in unmount(2).
Re: fsync(2) manual and hdd write caching
On 10/27/10 12:11, Bruce Cran wrote:
> On Wed, 27 Oct 2010 02:00:51 -0700
> per...@pluto.rain.com wrote:
>
>> Short of mounting synchronously, with the attendant performance
>> hit, would it not make sense for fsync(2) to issue ATA_FLUSHCACHE
>> or SCSI "SYNCHRONIZE CACHE" after it has finished writing data
>> to the drive? Surely the low-level capability to issue those
>> commands must already exist, else we would have no way to safely
>> prepare for power off.
>
> mounting synchronously won't help, will it? As I understand it that
> just makes sure that data is sent straight to disk and not left in
> memory; the data will still be stored in the HDD cache for a
> while.

Correct.

The problem is actually pretty hard - since AFAIK SoftUpdates doesn't
have "checkpoints" in the sense that it groups writes and all data
"before" can be guaranteed to be on-disk, the problem is *when* to issue
BIO_FLUSH requests.

One possible solution is to simply decide on a heuristic like: "ok,
doing BIO_FLUSH all the time will destroy performance, we will only do
it for every metadata write". Possibly with a sysctl tunable or
per-mount option.
Re: fsync(2) manual and hdd write caching
On Wed, 27 Oct 2010 02:00:51 -0700
per...@pluto.rain.com wrote:

> Short of mounting synchronously, with the attendant performance
> hit, would it not make sense for fsync(2) to issue ATA_FLUSHCACHE
> or SCSI "SYNCHRONIZE CACHE" after it has finished writing data
> to the drive? Surely the low-level capability to issue those
> commands must already exist, else we would have no way to safely
> prepare for power off.

mounting synchronously won't help, will it? As I understand it that
just makes sure that data is sent straight to disk and not left in
memory; the data will still be stored in the HDD cache for a while.

--
Bruce Cran
Re: fsync(2) manual and hdd write caching
Ivan Voras wrote:

> fsync(2) actually does behave as advertised, "causes all modified
> data and attributes of fd to be moved to a permanent storage
> device". It is the problem of the "permanent storage device"
> if it caches this data further.

IMO, volatile RAM without battery backup cannot reasonably be
considered a "permanent storage device", regardless of where
it is physically located.

Short of mounting synchronously, with the attendant performance
hit, would it not make sense for fsync(2) to issue ATA_FLUSHCACHE
or SCSI "SYNCHRONIZE CACHE" after it has finished writing data
to the drive? Surely the low-level capability to issue those
commands must already exist, else we would have no way to safely
prepare for power off.
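
As an illustration of the "fsync, then ask the drive to flush its
cache" idea seen from userland: Mac OS X exposes a separate fcntl(2)
command, F_FULLFSYNC, for this purpose. The sketch below is not from
this thread and says nothing about how FreeBSD does or should implement
it. It assumes F_FULLFSYNC is defined by <fcntl.h>; where it is not
(FreeBSD, for one), it falls back to plain fsync(2). flush_to_media()
is a hypothetical name.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper (illustration only): flush fd as far down the
 * stack as the platform allows. Returns 0 on success, -1 on error.
 */
int
flush_to_media(int fd)
{
#ifdef F_FULLFSYNC
	/*
	 * Where available, ask for the data to be written through the
	 * drive's cache as well. Not every filesystem or device honors
	 * this, so fall back to fsync(2) if it fails.
	 */
	if (fcntl(fd, F_FULLFSYNC) == 0)
		return (0);
#endif
	/* Portable path: flushes kernel buffers for fd only. */
	return (fsync(fd));
}
```

Keeping the stronger flush in a separate call, rather than inside every
fsync(2), lets an application pay the cost of a full cache flush only
where it needs the guarantee.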
Re: fsync(2) manual and hdd write caching
On Tue, Oct 26, 2010 at 4:40 PM, Alexander Best wrote:
> On Wed Oct 27 10, Bruce Cran wrote:
>> On Tue, 26 Oct 2010 21:36:18 +
>> Alexander Best wrote:
>>
>> > since there's a thread on freebsd-questions@ concerning fsync(2) and
>> > the fact that hdd write caching can cause this syscall to basically
>> > be a no op, could somebody please copy the BUGS section from sync(2)
>> > to fsync(2)?
>>
>> Shouldn't the BUGS section of sync(2) be removed?
>>
>> "The sync() system call may return before the buffers are completely
>> flushed."
>>
>> But from
>> http://www.opengroup.org/onlinepubs/009695399/functions/sync.html :
>>
>> "The writing, although scheduled, is not necessarily complete upon
>> return from sync()."
>>
>> That would suggest it's not actually a bug.
>
> well...you are right on the one hand. but still this should be documented imo.
> how about turning BUGS into a CAVEATS section and then adding that section to
> fsync(2)?
>
> the reason posix mentions this sync/fsync behavior is probably the fact that
> they know that this cannot be avoided. so that statement seems itself to be a
> caveat rather than a feature. ;)

Just a sidenote, but that's POSIX 2004[.6?] spec, not POSIX 2008.1
(which is the most current spec -- http://www.unix.org/2008edition/ ).
I double checked and the wording didn't differ for the fsync(2) system
interface, but it could differ in others.

HTH,
-Garrett
Re: fsync(2) manual and hdd write caching
On Wed Oct 27 10, Bruce Cran wrote:
> On Tue, 26 Oct 2010 21:36:18 +
> Alexander Best wrote:
>
> > since there's a thread on freebsd-questions@ concerning fsync(2) and
> > the fact that hdd write caching can cause this syscall to basically
> > be a no op, could somebody please copy the BUGS section from sync(2)
> > to fsync(2)?
>
> Shouldn't the BUGS section of sync(2) be removed?
>
> "The sync() system call may return before the buffers are completely
> flushed."
>
> But from
> http://www.opengroup.org/onlinepubs/009695399/functions/sync.html :
>
> "The writing, although scheduled, is not necessarily complete upon
> return from sync()."
>
> That would suggest it's not actually a bug.

well...you are right on the one hand. but still this should be documented imo.
how about turning BUGS into a CAVEATS section and then adding that section to
fsync(2)?

the reason posix mentions this sync/fsync behavior is probably the fact that
they know that this cannot be avoided. so that statement seems itself to be a
caveat rather than a feature. ;)

cheers.
alex

>
> --
> Bruce Cran

--
a13x
Re: fsync(2) manual and hdd write caching
On Tue, 26 Oct 2010 21:36:18 +
Alexander Best wrote:

> since there's a thread on freebsd-questions@ concerning fsync(2) and
> the fact that hdd write caching can cause this syscall to basically
> be a no op, could somebody please copy the BUGS section from sync(2)
> to fsync(2)?

Shouldn't the BUGS section of sync(2) be removed?

"The sync() system call may return before the buffers are completely
flushed."

But from
http://www.opengroup.org/onlinepubs/009695399/functions/sync.html :

"The writing, although scheduled, is not necessarily complete upon
return from sync()."

That would suggest it's not actually a bug.

--
Bruce Cran
Re: fsync(2) manual and hdd write caching
On Wed, 27 Oct 2010 01:19:18 +0200
Ivan Voras wrote:

> fsync(2) actually does behave as advertised, "causes all modified data
> and attributes of fd to be moved to a permanent storage device". It
> is the problem of the "permanent storage device" if it caches this
> data further.

http://www.opengroup.org/onlinepubs/009695399/functions/fsync.html at
first suggests it should flush write caches, but does allow for
implementations that don't:

"The fsync() function is intended to force a physical write of data
from the buffer cache, and to assure that after a system crash or other
failure that all data up to the time of the fsync() call is recorded on
the disk."

...

"In the middle ground between these extremes, fsync() might or might
not actually cause data to be written where it is safe from a power
failure."

--
Bruce Cran
Re: fsync(2) manual and hdd write caching
On 10/26/10 23:36, Alexander Best wrote:
> hi there,
>
> since there's a thread on freebsd-questions@ concerning fsync(2) and
> the fact that hdd write caching can cause this syscall to basically
> be a no op, could somebody please copy the BUGS section from sync(2)
> to fsync(2)?

I don't think they are the same. The "buffers" of sync(2) are not those
from the discussion on fsync(2) safety. Or more correctly, they are,
but those two calls work on a different scope.

fsync(2) actually does behave as advertised, "causes all modified data
and attributes of fd to be moved to a permanent storage device". It is
the problem of the "permanent storage device" if it caches this data
further.
fsync(2) manual and hdd write caching
hi there,

since there's a thread on freebsd-questions@ concerning fsync(2) and
the fact that hdd write caching can cause this syscall to basically
be a no op, could somebody please copy the BUGS section from sync(2)
to fsync(2)?

cheers.
alex

--
a13x