Re: FFS write coalescing

2012-12-03 Thread Alan Barrett

On Mon, 03 Dec 2012, Chuck Silvers wrote:

> the genfs code also never writes clean pages to disk, even though for
> RAID5 storage it would likely be more efficient to write clean pages
> that are in the same stripe as dirty pages if that would avoid issuing
> partial-stripe writes.  (which is basically another way of saying
> what david said.)


Perhaps there should be a way for block devices to report at least three
block sizes:

a) smallest possible block size (512 for almost all disks)

b) smallest efficient block size and alignment (4k for modern disks,
stripe size for raid)

c) largest possible size (a device and bus-dependent variant of MAXPHYS)

Then the file system could use (b) to know when it's a good idea to
combine dirty and clean pages into the same write.

--apb (Alan Barrett)


Re: FFS write coalescing

2012-12-03 Thread Chuck Silvers
On Mon, Dec 03, 2012 at 06:21:30PM +0100, Edgar Fuß wrote:
> I could find out myself by digging through the source, but probably someone
> here knows the answer off the top of his head:
> When FFS does write coalescing, will it try to align the resulting 64k chunk?
> I.e., if I have 32k blocks and I write blocks 1, 2, 3, 4; will it write (1,2)
> and (3,4) or 1, (2,3) and 4?
> Of course, the background for my question is RAID stripe alignment.

currently no, the genfs_putpages() code that (almost?) all the file systems
use at this point doesn't align disk writes to anything larger than
a page boundary.

the genfs code also never writes clean pages to disk, even though for
RAID5 storage it would likely be more efficient to write clean pages
that are in the same stripe as dirty pages if that would avoid issuing
partial-stripe writes.  (which is basically another way of saying
what david said.)

-Chuck


Re: FFS write coalescing

2012-12-03 Thread David Laight
On Mon, Dec 03, 2012 at 06:21:30PM +0100, Edgar Fuß wrote:
> When FFS does write coalescing, will it try to align the resulting 64k chunk?
> I.e., if I have 32k blocks and I write blocks 1, 2, 3, 4; will it write (1,2)
> and (3,4) or 1, (2,3) and 4?
> Of course, the background for my question is RAID stripe alignment.

With that thought, for RAID5 in particular, you'd want the FS code
to indicate to the disk that it had some of the nearby data in memory.
That would save the read of the parity data.
(Which would be really horrid to implement!)

Perhaps the 'disk' should give some 'good parameters' for writes to the
FS code when the filesystem is mounted.

David

-- 
David Laight: da...@l8s.co.uk


FFS write coalescing

2012-12-03 Thread Edgar Fuß
I could find out myself by digging through the source, but probably someone
here knows the answer off the top of his head:
When FFS does write coalescing, will it try to align the resulting 64k chunk?
I.e., if I have 32k blocks and I write blocks 1, 2, 3, 4; will it write (1,2)
and (3,4) or 1, (2,3) and 4?
Of course, the background for my question is RAID stripe alignment.


Re: core statement on fexecve, O_EXEC, and O_SEARCH

2012-12-03 Thread Emmanuel Dreyfus
Alan Barrett  wrote:

> The fexecve function could be implemented entirely in libc, 
> via execve(2) on a file name of the form "/proc/self/fd/". 
> Any security concerns around fexecve() also apply to exec of 
> /proc/self/fd/.

I gave this approach a try. There is an unexpected issue:
for a reason I cannot figure out, namei() does not resolve
/proc/self/fd/. Here is a ktrace:

   810  1 t_fexecve CALL  open(0x8048db6,0,0)
   810  1 t_fexecve NAMI  "/usr/bin/touch"
   810  1 t_fexecve RET   open 3
   810  1 t_fexecve CALL  getpid
   810  1 t_fexecve RET   getpid 810/0x32a, 924/0x39c
   810  1 t_fexecve CALL  execve(0xbfbfe66f,0xbfbfea98,0xbfbfeaa4)
   810  1 t_fexecve NAMI  "/proc/self/fd/3"
   810  1 t_fexecve RET   execve -1 errno 2 No such file or directory


-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: Problem identified: WAPL/RAIDframe performance problems

2012-12-03 Thread Michael van Elst
mo...@rodents-montreal.org (Mouse) writes:

>>>> things.  What I care about is the largest size "sector" that will (in
>>>  ^^^
>>>> the ordinary course of things anyway) be written atomically.
>>> Then those are 512-byte-sector drives [...]
>> No; because I can do 4K atomic writes, I want to know about that.

> And, can't you do that with traditional drives, drives which really do
> have 512-byte sectors?  Do a 4K transfer and you write 8 physical
> sectors with no opportunity for any other operation to see the write
> partially done.  Is that wrong, or am I missing something else?

The drive could partially complete the write, i.e. if one of the
latter sectors has a write error or if the drive is powered down
in the middle of the operation.

Sure, you would know about it. But in case of a crash you can't rely
on data consistency.

-- 
Michael van Elst
Internet: mlel...@serpens.de
"A potential Snark may lurk in every tree."