Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-09 Thread Manuel Bouyer
On Tue, Oct 09, 2012 at 01:36:07PM +, Manuel Bouyer wrote:
> [...]
> Log Message:
> Support transfers of up to MACHINE_MAXPHYS in all pciide variants, and ahci.
> wd(4) limits its maxphys depending on the drive's capability (64k sectors
> for LBA48, 256 sectors for LBA and 128 sectors for older devices).
> 
> I assumed all pciide controllers could do MACHINE_MAXPHYS transfers, but
> this may not be true. The capabilities of each controller variant should be
> looked at more closely.

transfers with dd on the raw character device work fine, but I get this
panic when booting multiuser, both with pciide and ahci controllers:
Starting ntpd.
panic: bad chunksize 139264, iochunk 524288, request size 139264
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 8025695d cs 8 rflags 246 cr2 7f7ff701dd90 ilevel 0 rsp fe810b8a3790
curlwp 0xfe81af50d480 pid 383 lid 1 lowest kstack 0xfe810b8a
Stopped in pid 383.1 (ntpd) at  netbsd:breakpoint+0x5:  leave
db{7}> tr
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x1f2
printf_nolog() at netbsd:printf_nolog
ra_startio() at netbsd:ra_startio+0x11b
uvm_ra_request() at netbsd:uvm_ra_request+0x1a9
uvn_get() at netbsd:uvn_get+0xf1
uvm_fault_internal() at netbsd:uvm_fault_internal+0xcab
uvm_fault_wire() at netbsd:uvm_fault_wire+0x53
uvm_map_pageable_all() at netbsd:uvm_map_pageable_all+0x19d
syscall() at netbsd:syscall+0x94
--- syscall (number 242) ---

any idea ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-09 Thread Thor Lancelot Simon
On Tue, Oct 09, 2012 at 03:50:18PM +0200, Manuel Bouyer wrote:
> On Tue, Oct 09, 2012 at 01:36:07PM +, Manuel Bouyer wrote:
> > [...]
> > Log Message:
> > Support transfers of up to MACHINE_MAXPHYS in all pciide variants, and ahci.
> > wd(4) limits its maxphys depending on the drives's capability (64k sectors
> > for LBA48, 256 sectors for LBA and 128 sectors for older devices).
> > 
> > I assumed all pciide controllers could do MACHINE_MAXPHYS transfers, but
> > this may not be true. The capabilities of each controller variants should be
> > looked at more closely.
> 
> transfers with dd on the raw character device works fine, but I get this
> panic when booting multiuser, both with pciide and ahci controllers:
> Starting ntpd.
> panic: bad chunksize 139264, iochunk 524288, request size 139264

Hm.

This will be an attempt to transfer an entire file -- 139264 will be the
entire file size of something, probably an executable it's paging in.

I saw this before -- I can't remember what caused it -- the "bad chunksize"
is, I think, because it's not power-of-2.  The cause will be in the UVM
readahead code.

I could swear I fixed at least one cause of this already.

Thor


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-09 Thread Manuel Bouyer
On Tue, Oct 09, 2012 at 11:04:20AM -0400, Thor Lancelot Simon wrote:
> Hm.
> 
> This will be an attempt to transfer an entire file -- 139264 will be the
> entire file size of something, probably an executable it's paging in.
> 
> I saw this before -- I can't remember what caused it -- the "bad chunksize"
> is, I think, because it's not power-of-2.  The cause will be in the UVM
> readahead code.

It is. It looks like this code is different in tls-maxphys and HEAD.

I wonder if we really want to read in power-of-2 chunks here, or just
a multiple of the page size. Reading the code which calls ra_startio(),
I'm not sure why the chunk size would need to be a power of 2.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-09 Thread Thor Lancelot Simon
On Tue, Oct 09, 2012 at 05:21:17PM +0200, Manuel Bouyer wrote:
> On Tue, Oct 09, 2012 at 11:04:20AM -0400, Thor Lancelot Simon wrote:
> > Hm.
> > 
> > This will be an attempt to transfer an entire file -- 139264 will be the
> > entire file size of something, probably an executable it's paging in.
> > 
> > I saw this before -- I can't remember what caused it -- the "bad chunksize"
> > is, I think, because it's not power-of-2.  The cause will be in the UVM
> > readahead code.
> 
> It is. It looks like this code is different in tls-maxphys and HEAD.

Is it different in tls-maxphys-base and HEAD?  I played with this code
quite a bit while debugging previously, I believe.

> I wonder if we really want to read in power-of-2 chunks here, or just
> multiple of page size. Reading the code which calls ra_startio(),
> I'm not sure why size of chunksize would be a power-of-2.

I wondered that myself.  I think multiple of pagesize is probably okay,
but -- IIRC -- I couldn't convince myself we'd always do the right thing
with the residual.  Try it?

Thor


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-09 Thread Manuel Bouyer
On Tue, Oct 09, 2012 at 11:24:13AM -0400, Thor Lancelot Simon wrote:
> On Tue, Oct 09, 2012 at 05:21:17PM +0200, Manuel Bouyer wrote:
> > On Tue, Oct 09, 2012 at 11:04:20AM -0400, Thor Lancelot Simon wrote:
> > > Hm.
> > > 
> > > This will be an attempt to transfer an entire file -- 139264 will be the
> > > entire file size of something, probably an executable it's paging in.
> > > 
> > > I saw this before -- I can't remember what caused it -- the "bad 
> > > chunksize"
> > > is, I think, because it's not power-of-2.  The cause will be in the UVM
> > > readahead code.
> > 
> > It is. It looks like this code is different in tls-maxphys and HEAD.
> 
> Is it different in tls-maxphys-base and HEAD?  I played with this code
> quite a bit while debugging previously, I believe.

Well, the code which causes the panic changed from

	const size_t chunksize = RA_IOCHUNK;

to

	const size_t chunksize = MIN(chunksz, round_page(sz));

so the power-of-2 check changed from checking that a #define constant
has the right value, to sanity-checking the values we got from the upper
layers. And AFAIK the callers don't make any effort for these values to
be a power of 2, though they may make sure they are page-size aligned
(I'm not even sure of that: the end of the file is not necessarily
page-aligned ...)

> 
> > I wonder if we really want to read in power-of-2 chunks here, or just
> > multiple of page size. Reading the code which calls ra_startio(),
> > I'm not sure why size of chunksize would be a power-of-2.
> 
> I wondered that myself.  I think multiple of pagesize is probably okay,
> but -- IIRC -- I couldn't convince myself we'd always do the right thing
> with the residual.  Try it?

I'll have a closer look.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-09 Thread Chuck Silvers
On Tue, Oct 09, 2012 at 11:24:13AM -0400, Thor Lancelot Simon wrote:
> On Tue, Oct 09, 2012 at 05:21:17PM +0200, Manuel Bouyer wrote:
> > On Tue, Oct 09, 2012 at 11:04:20AM -0400, Thor Lancelot Simon wrote:
> > > Hm.
> > > 
> > > This will be an attempt to transfer an entire file -- 139264 will be the
> > > entire file size of something, probably an executable it's paging in.
> > > 
> > > I saw this before -- I can't remember what caused it -- the "bad 
> > > chunksize"
> > > is, I think, because it's not power-of-2.  The cause will be in the UVM
> > > readahead code.
> > 
> > It is. It looks like this code is different in tls-maxphys and HEAD.
> 
> Is it different in tls-maxphys-base and HEAD?  I played with this code
> quite a bit while debugging previously, I believe.
> 
> > I wonder if we really want to read in power-of-2 chunks here, or just
> > multiple of page size. Reading the code which calls ra_startio(),
> > I'm not sure why size of chunksize would be a power-of-2.
> 
> I wondered that myself.  I think multiple of pagesize is probably okay,
> but -- IIRC -- I couldn't convince myself we'd always do the right thing
> with the residual.  Try it?

the underlying mechanism used by the UVM read ahead code (pgo_get / VOP_GETPAGES)
can handle any number of pages, not just power-of-2-sized groups of pages.
there's no reason for the read ahead code to limit itself that way.

-Chuck


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-09 Thread Thor Lancelot Simon
On Tue, Oct 09, 2012 at 05:35:44PM +0200, Manuel Bouyer wrote:
> On Tue, Oct 09, 2012 at 11:24:13AM -0400, Thor Lancelot Simon wrote:
> > On Tue, Oct 09, 2012 at 05:21:17PM +0200, Manuel Bouyer wrote:
> > > On Tue, Oct 09, 2012 at 11:04:20AM -0400, Thor Lancelot Simon wrote:
> > > > Hm.
> > > > 
> > > > This will be an attempt to transfer an entire file -- 139264 will be the
> > > > entire file size of something, probably an executable it's paging in.
> > > > 
> > > > I saw this before -- I can't remember what caused it -- the "bad 
> > > > chunksize"
> > > > is, I think, because it's not power-of-2.  The cause will be in the UVM
> > > > readahead code.
> > > 
> > > It is. It looks like this code is different in tls-maxphys and HEAD.
> > 
> > Is it different in tls-maxphys-base and HEAD?  I played with this code
> > quite a bit while debugging previously, I believe.
> 
> Well, the code which cause the panic changed from
>   const size_t chunksize = RA_IOCHUNK;
> to
>   const size_t chunksize = MIN(chunksz, round_page(sz));
> 
> so the power-of-2 check changes from checking that a #define constant
> has the right value, to sanity-checking that the values we got from upper
> layers. And AFAIK then callers don't make any effort for these values to
> be a power-of-2, but it may make sure that it's page-size aligned
> (I'm not even sure of that: the end of the file is not necesserely
> page-aligned ...)

But the chunk-size really has two effects: we round short transfers *up*
to (a multiple of) the chunk-size if the file is large enough
(because intermediate layers lose the actual requested I/O size even
for read(), again IIRC) and we break large readaheads *down* into multiple
reads of chunk-size.

That's the reason for the somewhat odd definition.  So I think the power
of 2 check should in fact go, now that I've thought it through more.

It seems to me that if we got the residual at the end of the file right
before, we probably still do now, but that could definitely use another
set of eyes.

Thor


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-09 Thread Manuel Bouyer
On Tue, Oct 09, 2012 at 09:20:46AM -0700, Chuck Silvers wrote:
> > [...]
> > I wondered that myself.  I think multiple of pagesize is probably okay,
> > but -- IIRC -- I couldn't convince myself we'd always do the right thing
> > with the residual.  Try it?
> 
> the underlying mechanism used by the UVM read ahead code (pgo_get / 
> VOP_GETPAGES)
> can handle any number of pages, not just power-of-2-sized groups of pages.
> there's no reason for the read ahead code to limit itself that way.

The reason was the computation of the transfer length:
bytelen = ((off + chunksize) & -(off_t)chunksize) - off;

but as off and chunksize are both a multiple of PAGE_SIZE, using chunksize
as bytelen is just fine. I just removed this and use chunksize as the size
of the read-ahead, and it seems to work fine. 

Now, iostat -x shows that sequential read of a large file (either with cat
or dd bs=1m) is done in 32k transfers at the disk level. So I guess something
is not working properly here ...

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-09 Thread Thor Lancelot Simon
On Tue, Oct 09, 2012 at 10:15:06PM +0200, Manuel Bouyer wrote:
> 
> Now, iostat -x shows that sequential read of a large file (either with cat
> or dd bs=1m) is done in 32k transfers at the disk level. So I guess something
> is not working properly here ...

I saw that too, though I managed to get 64K transfers.  I have been meaning
to test with mmap, since that is more like the code path through the upper
layers that produces the attempt to page in an entire executable.

I've looked all through every layer I can identify and I do not see where
the seemingly still remaining 64K restriction comes from.  I thought it
was FFS, but in fact *every vestige* of the maxcontig or "maximum extent
size" use seems to have been excised from our FFS code.  So it's a mystery
to me.

Chuck, any ideas?

Thor


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-09 Thread Manuel Bouyer
On Tue, Oct 09, 2012 at 04:54:41PM -0400, Thor Lancelot Simon wrote:
> On Tue, Oct 09, 2012 at 10:15:06PM +0200, Manuel Bouyer wrote:
> > 
> > Now, iostat -x shows that sequential read of a large file (either with cat
> > or dd bs=1m) is done in 32k transfers at the disk level. So I guess 
> > something
> > is not working properly here ...
> 
> I saw that too, though I managed to get 64K transfers.  I have been meaning
> to test with mmap, since that is more like the code path through the upper
> layers that produces the attempt to page in an entire executable.
> 
> I've looked all through every layer I can identify and I do not see where
> the seemingly still remaining 64K restriction comes from.  I thought it
> was FFS, but in fact *every vestige* of the maxcontig or "maximum extent
> size" use seems to have been excised from our FFS code.  So it's a mystery
> to me.

There is still a reference to MAXPHYS in ufs_bmaparray(), which, if I got
it right, will limit the *runp in VOP_BMAP(), which will limit iobytes
in genfs_do_io().

but that doesn't explain why I see 32k xfers. And, to make things worse,
I see 64k if I start the big file read with debug enabled, so it looks
timing-dependent :(

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-09 Thread Manuel Bouyer
On Tue, Oct 09, 2012 at 11:13:52PM +0200, Manuel Bouyer wrote:
> There is still a reference to MAXPHYS in ufs_bmaparray(), which, if I got
> it right, will limit the *runp in VOP_BMAP(), which will limit iobytes
> in genfs_do_io().
> 
> but that doens't explain why I see 32k xfers. And, to make things worse,
> I see 64k if I start the big file read with debug enabled, so it looks
> timing-dependant :(

OK, after fixing the issue I created in ra_startio(), and replacing the
MAXPHYS above with mnt_maxphys, I get:
device   read KB/t  r/s  time   MB/s  write KB/t  w/s  time  MB/s
wd0         504.22  129  0.90  63.45        0.00    0  0.90  0.00

with a 'cat big_file > /dev/null'
writes are still limited to 64k ...

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-09 Thread Chuck Silvers
On Tue, Oct 09, 2012 at 11:50:27PM +0200, Manuel Bouyer wrote:
> On Tue, Oct 09, 2012 at 11:13:52PM +0200, Manuel Bouyer wrote:
> > There is still a reference to MAXPHYS in ufs_bmaparray(), which, if I got
> > it right, will limit the *runp in VOP_BMAP(), which will limit iobytes
> > in genfs_do_io().
> > 
> > but that doens't explain why I see 32k xfers. And, to make things worse,
> > I see 64k if I start the big file read with debug enabled, so it looks
> > timing-dependant :(
> 
> OK, after fixing the issue I created in ra_startio(), and replacing the
> MAXPHYS above with mnt_maxphys, I get:
> device   read KB/t  r/s  time   MB/s  write KB/t  w/s  time  MB/s
> wd0         504.22  129  0.90  63.45        0.00    0  0.90  0.00
> 
> with a 'cat big_file > /dev/null'
> writes are still limited to 64k ...

I would hope that cat'ing a file to /dev/null wouldn't result in any writes.  :-)
I assume you meant 'cat big_file > other_file' ?

if so, then the reason for the 64k writes would be this block of code in
ffs_write():

	if (!async && oldoff >> 16 != uio->uio_offset >> 16) {
		mutex_enter(vp->v_interlock);
		error = VOP_PUTPAGES(vp, (oldoff >> 16) << 16,
		    (uio->uio_offset >> 16) << 16,
		    PGO_CLEANIT | PGO_JOURNALLOCKED | PGO_LAZY);
		if (error)
			break;
	}


there's a similar block in many file systems at this point.  when I wrote that
I intended to replace it with something better before very long, but I just
never got back to it, alas.

-Chuck


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-10 Thread Manuel Bouyer
On Tue, Oct 09, 2012 at 05:59:06PM -0700, Chuck Silvers wrote:
> > [...]
> > with a 'cat big_file > /dev/null'
> > writes are still limited to 64k ...
> 
> I would hope that cat'ing a file to /dev/null wouldn't result in any writes.  
> :-)
> I assume you meant 'cat big_file > other_file' ?

I use: dd if=/dev/zero of=bigfile bs=1g count=7

> 
> if so, then the reason for the 64k writes would be this block of code in 
> ffs_write():
> 
>   if (!async && oldoff >> 16 != uio->uio_offset >> 16) {
>   mutex_enter(vp->v_interlock);
>   error = VOP_PUTPAGES(vp, (oldoff >> 16) << 16,
>   (uio->uio_offset >> 16) << 16,
>   PGO_CLEANIT | PGO_JOURNALLOCKED | PGO_LAZY);
>   if (error)
>   break;
>   }
> 

that's it. I did s/16/32/g in the code above and now I get 128k writes.

> 
> there's a similar block in many file systems at this point.  when I wrote that
> I intended to replace it with something better before very long, but I just 
> never
> got back to it, alas.

I'm not sure what the best way to handle this would be.
If we assume that maxphys is a power of 2, we could use a maxphys-derived
mask here. Otherwise, maybe we should compute and cache the largest power-of-2
value below maxphys in v_mount, as is done in vfs_vnops.c:vn_ra_allocctx()
(actually, this would remove ra_iochunk as we could use the mount point's
value)

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-10 Thread Thor Lancelot Simon
On Wed, Oct 10, 2012 at 11:34:48AM +0200, Manuel Bouyer wrote:
> 
> I'm not sure what the best way to handle this would be.
> If we assume that maxphys is a power of 2, we could use a maxphys-derived
> mask here. Otherwise, maybe we should compute and cache the largest power-of-2
> value below maxphys in v_mount, as is done in vfs_vnops.c:vn_ra_allocctx()
> (actually, this would remove ra_iochunk as we could use the mount point's
> value)

I originally did that.  There were some negative performance consequences.

With two of the SCSI (RAID, actually) controllers I have on hand, the DMA
descriptor format effectively imposes a restriction of 192K on transfer
size.  But these are exactly the sort of devices that really like to see
large writes.  The next lower power of two is 128k...

I think a lot of the clever math with shifts and masks should perhaps go
away.  Any efficiency improvements it yields are, I think, swamped by the
5,000 function calls we make to do a single I/O.

Thor


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-10 Thread Manuel Bouyer
On Wed, Oct 10, 2012 at 08:17:05AM -0400, Thor Lancelot Simon wrote:
> On Wed, Oct 10, 2012 at 11:34:48AM +0200, Manuel Bouyer wrote:
> > 
> > I'm not sure what the best way to handle this would be.
> > If we assume that maxphys is a power of 2, we could use a maxphys-derived
> > mask here. Otherwise, maybe we should compute and cache the largest 
> > power-of-2
> > value below maxphys in v_mount, as is done in vfs_vnops.c:vn_ra_allocctx()
> > (actually, this would remove ra_iochunk as we could use the mount point's
> > value)
> 
> I originally did that.  There were some negative performance consequences.
> 
> With two of the SCSI (RAID, actually) controllers I have on hand, the DMA
> descriptor format effectively imposes a restriction of 192K on transfer
> size.  But these are exactly the sort of devices that really like to see
> large writes.  The next lower power of two is 128k...
> 
> I think a lot of the clever math with shifts and masks should perhaps go
> away.  Any efficiency improvements it yields are, I think, swamped by the
> 5,000 function calls we make to do a single I/O.

I suspect UVM wants to do batches of reads or writes in power-of-2 sizes,
aligned on power-of-2 boundaries. Or at least that's my feeling after playing
with the readahead code, and I wouldn't be surprised if the write
code had the same requirement.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-10 Thread Thor Lancelot Simon
On Wed, Oct 10, 2012 at 11:34:48AM +0200, Manuel Bouyer wrote:
> On Tue, Oct 09, 2012 at 05:59:06PM -0700, Chuck Silvers wrote:
> > > [...]
> > > with a 'cat big_file > /dev/null'
> > > writes are still limited to 64k ...
> > 
> > I would hope that cat'ing a file to /dev/null wouldn't result in any 
> > writes.  :-)
> > I assume you meant 'cat big_file > other_file' ?
> 
> I use: dd if=/dev/zero of=bigfile bs=1g count=7
> 
> > 
> > if so, then the reason for the 64k writes would be this block of code in 
> > ffs_write():
> > 
> > if (!async && oldoff >> 16 != uio->uio_offset >> 16) {
> > mutex_enter(vp->v_interlock);
> > error = VOP_PUTPAGES(vp, (oldoff >> 16) << 16,
> > (uio->uio_offset >> 16) << 16,
> > PGO_CLEANIT | PGO_JOURNALLOCKED | PGO_LAZY);
> > if (error)
> > break;
> > }
> > 
> 
> that's it. I did s/16/32/g in the code above and now I get 128k writes.

32?  Not 17?

Thor


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-10 Thread Manuel Bouyer
On Wed, Oct 10, 2012 at 10:48:10AM -0400, Thor Lancelot Simon wrote:
> On Wed, Oct 10, 2012 at 11:34:48AM +0200, Manuel Bouyer wrote:
> > On Tue, Oct 09, 2012 at 05:59:06PM -0700, Chuck Silvers wrote:
> > > > [...]
> > > > with a 'cat big_file > /dev/null'
> > > > writes are still limited to 64k ...
> > > 
> > > I would hope that cat'ing a file to /dev/null wouldn't result in any 
> > > writes.  :-)
> > > I assume you meant 'cat big_file > other_file' ?
> > 
> > I use: dd if=/dev/zero of=bigfile bs=1g count=7
> > 
> > > 
> > > if so, then the reason for the 64k writes would be this block of code in 
> > > ffs_write():
> > > 
> > >   if (!async && oldoff >> 16 != uio->uio_offset >> 16) {
> > >   mutex_enter(vp->v_interlock);
> > >   error = VOP_PUTPAGES(vp, (oldoff >> 16) << 16,
> > >   (uio->uio_offset >> 16) << 16,
> > >   PGO_CLEANIT | PGO_JOURNALLOCKED | PGO_LAZY);
> > >   if (error)
> > >   break;
> > >   }
> > > 
> > 
> > that's it. I did s/16/32/g in the code above and now I get 128k writes.
> 
> 32?  Not 17?

Yes, it was 17, of course.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-10 Thread Manuel Bouyer
On Wed, Oct 10, 2012 at 04:08:55PM +0200, Manuel Bouyer wrote:
> [...]
> I suspect UVM wants to do batch of read or writes in power of 2 size,
> aligned on power of 2 boundaries. Or this is my feeling after playing
> with the readahead code, and I wouldn't be surprised if the write
> code had the same requirement.

I tried replacing the code in ufs_readwrite():

	if (!async && oldoff >> 17 != uio->uio_offset >> 17) {
		mutex_enter(vp->v_interlock);
		error = VOP_PUTPAGES(vp, (oldoff >> 17) << 17,
		    (uio->uio_offset >> 17) << 17,
		    PGO_CLEANIT | PGO_JOURNALLOCKED | PGO_LAZY);
		if (error)
			break;
	}

with:

	int maxphys;

	if (vp->v_mount && vp->v_mount->mnt_maxphys)
		maxphys = vp->v_mount->mnt_maxphys;
	else
		maxphys = MAXPHYS;
	if (!async && uio->uio_offset - oldoff >= maxphys) {
		mutex_enter(vp->v_interlock);
		error = VOP_PUTPAGES(vp, trunc_page(oldoff),
		    trunc_page(uio->uio_offset),
		    PGO_CLEANIT | PGO_JOURNALLOCKED | PGO_LAZY);
		if (error)
			break;
	}

but I only get 16k writes. So I suspect the power-of-2 alignment is
important here.

Also, I did hit:
panic: kernel diagnostic assertion "npages <= (MAXPHYS >> PAGE_SHIFT)" failed: file "/dsk/l1/misc/bouyer/tls-maxphys/src/sys/uvm/uvm_pager.c", line 212
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip 8025695d cs 8 rflags 246 cr2 80006c7c9ab4 ilevel 0 rsp fe810b929f90
curlwp 0xfe810b94a780 pid 0 lid 93 lowest kstack 0xfe810b927000
Stopped in pid 0.93 (system) at netbsd:breakpoint+0x5:  leave
db{6}> tr
breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x1f2
kern_assert() at netbsd:kern_assert+0x48
uvm_pagermapin() at netbsd:uvm_pagermapin+0x207
genfs_gop_write() at netbsd:genfs_gop_write+0x2f
genfs_do_putpages() at netbsd:genfs_do_putpages+0xda6
VOP_PUTPAGES() at netbsd:VOP_PUTPAGES+0x3a
uvm_pageout() at netbsd:uvm_pageout+0x22a


I guess the emergva-related code (allocation and use) should use MACHINE_MAXPHYS
instead of MAXPHYS.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-10 Thread Thor Lancelot Simon
On Tue, Oct 09, 2012 at 05:59:06PM -0700, Chuck Silvers wrote:
> 
> if so, then the reason for the 64k writes would be this block of code in 
> ffs_write():
> 
>   if (!async && oldoff >> 16 != uio->uio_offset >> 16) {
>   mutex_enter(vp->v_interlock);
>   error = VOP_PUTPAGES(vp, (oldoff >> 16) << 16,
>   (uio->uio_offset >> 16) << 16,
>   PGO_CLEANIT | PGO_JOURNALLOCKED | PGO_LAZY);
>   if (error)
>   break;
>   }

Upon reflection, I do not understand how this works.

Consider a write() of 131072 starting at file offset 0.

oldoff >> 16 will be 0; uio->uio_offset will be 131072 after the call to
ubc_uiomove(), unless that is the source of my misunderstanding.
Otherwise, I don't see how uio->uio_offset gets updated at all.

Now, we VOP_PUTPAGES(vp, 0, (131072 >> 16) << 16), which is of course
VOP_PUTPAGES(vp, 0, 131072).  So why does this only push out 64K?

Thor


Re: CVS commit: [tls-maxphys] src/sys/dev

2012-10-10 Thread Manuel Bouyer
On Wed, Oct 10, 2012 at 11:48:37AM -0400, Thor Lancelot Simon wrote:
> On Tue, Oct 09, 2012 at 05:59:06PM -0700, Chuck Silvers wrote:
> > 
> > if so, then the reason for the 64k writes would be this block of code in 
> > ffs_write():
> > 
> > if (!async && oldoff >> 16 != uio->uio_offset >> 16) {
> > mutex_enter(vp->v_interlock);
> > error = VOP_PUTPAGES(vp, (oldoff >> 16) << 16,
> > (uio->uio_offset >> 16) << 16,
> > PGO_CLEANIT | PGO_JOURNALLOCKED | PGO_LAZY);
> > if (error)
> > break;
> > }
> 
> Upon reflection, I do not understand how this works.
> 
> Consider a write() of 131072 starting at file offset 0.
> 
> oldoff >> 16 will be 0; uio->uio_offset will be 131072, unless
> this is the source of my misunderstanding, after the call to ubc_uiomove().
> Otherwise, I don't see how uio->uio_offset gets updated at all.
> 
> Now, we VOP_PUTPAGES(vp, 0, (131072 >> 16) << 16)) which is of course
> VOP_PUTPAGES(vp, 0, 131072).  So why does this only push out 64K?

I don't think you can have oldoff at 0 and uio->uio_offset at 131072,
because the write is split into smaller chunks by the
while (uio->uio_resid > 0) { } loop. The chunk size will be:

	bytelen = MIN(fs->fs_bsize - blkoffset, uio->uio_resid);

so we get data from userland in chunks of at most fs->fs_bsize at a time,
and when we cross a 64k boundary we start a write.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--