Re: FFS write coalescing
On Mon, 03 Dec 2012, Chuck Silvers wrote:
> the genfs code also never writes clean pages to disk, even though for
> RAID5 storage it would likely be more efficient to write clean pages
> that are in the same stripe as dirty pages if that would avoid issuing
> partial-stripe writes.  (which is basically another way of saying what
> david said.)

Perhaps there should be a way for block devices to report at least three
block sizes:

 a) smallest possible block size (512 for almost all disks)
 b) smallest efficient block size and alignment (4k for modern disks,
    stripe size for RAID)
 c) largest possible size (a device- and bus-dependent variant of MAXPHYS)

Then the file system could use (b) to know when it's a good idea to
combine dirty and clean pages into the same write.

--apb (Alan Barrett)
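A rough sketch of what such a report could look like (this is purely
hypothetical, not an existing NetBSD interface; the struct and function
names are invented for illustration). It also shows how a file system
might use size (b) to round a dirty byte range out to "efficient"
boundaries before deciding whether to pull in adjacent clean pages:

```c
/*
 * Hypothetical three-size geometry report, per Alan's suggestion.
 * Not a real NetBSD API; names are made up for this sketch.
 */
#include <assert.h>
#include <stdint.h>

struct blk_geometry {
	uint32_t bg_minsize;	/* (a) smallest possible block, e.g. 512 */
	uint32_t bg_effsize;	/* (b) smallest efficient block/alignment:
				 *     4k for modern disks, stripe size for RAID */
	uint32_t bg_maxsize;	/* (c) largest possible transfer (MAXPHYS-ish) */
};

/*
 * Round the byte range [off, off+len) outward to bg_effsize boundaries,
 * so a write covers whole "efficient" units.  Returns the new start
 * offset and stores the new length.  Assumes bg_effsize is a power of 2.
 */
static uint64_t
round_to_efficient(const struct blk_geometry *bg, uint64_t off,
    uint64_t len, uint64_t *newlen)
{
	uint64_t mask = (uint64_t)bg->bg_effsize - 1;
	uint64_t start = off & ~mask;
	uint64_t end = (off + len + mask) & ~mask;

	*newlen = end - start;
	return start;
}
```

With a 64k stripe, a dirty range covering 32k blocks 1 and 2 (bytes
32k-128k) rounds out to bytes 0-128k, telling the file system which
clean pages it would need to include to avoid a partial-stripe write.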
Re: FFS write coalescing
On Mon, Dec 03, 2012 at 06:21:30PM +0100, Edgar Fuß wrote:
> I could find out myself by digging through the source, but probably
> someone here knows the answer off the top of his head:
> When FFS does write coalescing, will it try to align the resulting 64k
> chunk?  I.e., if I have 32k blocks and I write blocks 1, 2, 3, 4; will
> it write (1,2) and (3,4) or 1, (2,3) and 4?
> Of course, the background for my question is RAID stripe alignment.

currently no, the genfs_putpages() code that (almost?) all the file
systems use at this point doesn't align disk writes to anything larger
than a page boundary.

the genfs code also never writes clean pages to disk, even though for
RAID5 storage it would likely be more efficient to write clean pages
that are in the same stripe as dirty pages if that would avoid issuing
partial-stripe writes.  (which is basically another way of saying what
david said.)

-Chuck
Re: FFS write coalescing
On Mon, Dec 03, 2012 at 06:21:30PM +0100, Edgar Fuß wrote:
> When FFS does write coalescing, will it try to align the resulting 64k
> chunk?  I.e., if I have 32k blocks and I write blocks 1, 2, 3, 4; will
> it write (1,2) and (3,4) or 1, (2,3) and 4?
> Of course, the background for my question is RAID stripe alignment.

With that thought, for RAID5 in particular, you'd want the FS code to
indicate to the disk that it has some of the nearby data in memory.
That would save the read of the parity data.  (Which would be really
horrid to implement!)

Perhaps the 'disk' should give some 'good parameters' for writes to the
FS code when the filesystem is mounted.

	David

-- 
David Laight: da...@l8s.co.uk
FFS write coalescing
I could find out myself by digging through the source, but probably
someone here knows the answer off the top of his head:

When FFS does write coalescing, will it try to align the resulting 64k
chunk?  I.e., if I have 32k blocks and I write blocks 1, 2, 3, 4; will
it write (1,2) and (3,4) or 1, (2,3) and 4?

Of course, the background for my question is RAID stripe alignment.
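As a worked version of the example in the question (a standalone
illustration, not file system code): block n of a 32k-block file starts
at byte n*32k, so a coalescer that aligns to 64k boundaries groups
blocks by which 64k-aligned chunk their start offset falls in.

```c
/*
 * Which 64k-aligned chunk does each 32k block fall in?  Blocks that
 * share a chunk index would be coalesced by an aligning coalescer.
 */
#include <assert.h>

#define BLKSIZE (32 * 1024)	/* file system block size */
#define CHUNK   (64 * 1024)	/* coalesced write size */

static int
aligned_chunk(int blkno)
{
	return (blkno * BLKSIZE) / CHUNK;
}
```

For blocks 1, 2, 3, 4 this gives chunk indices 0, 1, 1, 2: block 1
alone, blocks 2 and 3 together, block 4 alone.  So an aligning
coalescer would write 1, (2,3) and 4, whereas naive pairing from the
first dirty block gives (1,2) and (3,4), which straddle chunk (and
hence stripe) boundaries.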
Re: core statement on fexecve, O_EXEC, and O_SEARCH
Alan Barrett wrote:
> The fexecve function could be implemented entirely in libc, via
> execve(2) on a file name of the form "/proc/self/fd/".  Any security
> concerns around fexecve() also apply to exec of /proc/self/fd/.

I gave this approach a try.  There is an unexpected issue: for a reason
I cannot figure out, namei() does not resolve /proc/self/fd/.  Here is
a ktrace:

   810      1 t_fexecve CALL  open(0x8048db6,0,0)
   810      1 t_fexecve NAMI  "/usr/bin/touch"
   810      1 t_fexecve RET   open 3
   810      1 t_fexecve CALL  getpid
   810      1 t_fexecve RET   getpid 810/0x32a, 924/0x39c
   810      1 t_fexecve CALL  execve(0xbfbfe66f,0xbfbfea98,0xbfbfeaa4)
   810      1 t_fexecve NAMI  "/proc/self/fd/3"
   810      1 t_fexecve RET   execve -1 errno 2 No such file or directory

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
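For reference, a minimal sketch of the libc approach being tested here
(the function names fd_exec_path/fexecve_compat are invented for this
sketch; it assumes procfs is mounted and, per the ktrace above, that
namei() can resolve the path, which is exactly what fails):

```c
/*
 * Sketch of fexecve() in userland via /proc/self/fd/<n>.
 * Hypothetical helper names; not the actual libc implementation.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build "/proc/self/fd/<fd>" into buf; -1 if it doesn't fit. */
static int
fd_exec_path(int fd, char *buf, size_t buflen)
{
	int n = snprintf(buf, buflen, "/proc/self/fd/%d", fd);

	return (n < 0 || (size_t)n >= buflen) ? -1 : 0;
}

static int
fexecve_compat(int fd, char *const argv[], char *const envp[])
{
	char path[32];

	if (fd_exec_path(fd, path, sizeof(path)) == -1)
		return -1;
	/* execve() only returns on error, e.g. the ENOENT seen above. */
	return execve(path, argv, envp);
}
```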
Re: Problem identified: WAPL/RAIDframe performance problems
mo...@rodents-montreal.org (Mouse) writes:
>>>> What I care about is the largest size "sector" that will (in the
>>>> ordinary course of things anyway) be written atomically.
>>> Then those are 512-byte-sector drives [...]
>> No; because I can do 4K atomic writes, I want to know about that.
> And, can't you do that with traditional drives, drives which really do
> have 512-byte sectors?  Do a 4K transfer and you write 8 physical
> sectors with no opportunity for any other operation to see the write
> partially done.  Is that wrong, or am I missing something else?

The drive could partially complete the write, i.e. if one of the latter
sectors has a write error or if the drive is powered down in the middle
of the operation.  Sure, you would know about it.  But in case of a
crash you can't rely on data consistency.

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."