Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-11-22 Thread Andriy Gapon
on 23/11/2010 08:14 Alexander Zagrebin said the following:
> It seems that this patch hasn't been merged into RELENG_8.
> Is there a chance that it will be merged before 8.2-RELEASE?

Yes.  MFC timer is ticking.

-- 
Andriy Gapon


RE: 8.1-STABLE: zfs and sendfile: problem still exists

2010-11-22 Thread Alexander Zagrebin
> -Original Message-
> From: Andriy Gapon [mailto:a...@freebsd.org] 
> Sent: Saturday, October 30, 2010 1:53 PM
> To: Artemiev Igor
> Cc: freebsd-stable@freebsd.org; freebsd...@freebsd.org; 
> Alexander Zagrebin
> Subject: Re: 8.1-STABLE: zfs and sendfile: problem still exists
> 
> 
> Heh, next try.
> 
> Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> ===
> --- 
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> (revision 214318)
> +++ 
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> (working copy)
> @@ -67,6 +67,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  /*
>   * Programming rules.
> @@ -464,7 +465,7 @@
>   uiomove_fromphys(&m, off, bytes, uio);
>   VM_OBJECT_LOCK(obj);
>   vm_page_wakeup(m);
> - } else if (m != NULL && uio->uio_segflg == UIO_NOCOPY) {
> + } else if (uio->uio_segflg == UIO_NOCOPY) {
>   /*
>* The code below is here to make 
> sendfile(2) work
>* correctly with ZFS. As pointed out by ups@
> @@ -474,9 +475,23 @@
>*/
>   KASSERT(off == 0,
>   ("unexpected offset in mappedread 
> for sendfile"));
> - if (vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
> + if (m != NULL && 
> vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
>   goto again;
> - vm_page_busy(m);
> + if (m == NULL) {
> + m = vm_page_alloc(obj, 
> OFF_TO_IDX(start),
> + VM_ALLOC_NOBUSY | VM_ALLOC_NORMAL);
> + if (m == NULL) {
> + VM_OBJECT_UNLOCK(obj);
> + VM_WAIT;
> + VM_OBJECT_LOCK(obj);
> + goto again;
> + }
> + } else {
> + vm_page_lock_queues();
> + vm_page_wire(m);
> + vm_page_unlock_queues();
> + }
> + vm_page_io_start(m);
>   VM_OBJECT_UNLOCK(obj);
>   if (dirbytes > 0) {
>   error = dmu_read_uio(os, zp->z_id, uio,
> @@ -494,7 +509,10 @@
>   VM_OBJECT_LOCK(obj);
>   if (error == 0)
>   m->valid = VM_PAGE_BITS_ALL;
> - vm_page_wakeup(m);
> + vm_page_io_finish(m);
> + vm_page_lock_queues();
> + vm_page_unwire(m, 0);
> + vm_page_unlock_queues();
>   if (error == 0) {
>   uio->uio_resid -= bytes;
>   uio->uio_offset += bytes;
> 

It seems that this patch hasn't been merged into RELENG_8.
Is there a chance that it will be merged before 8.2-RELEASE?

-- 
Alexander Zagrebin



Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-11-01 Thread Willem Jan Withagen

On 2010-11-01 8:30, Andriy Gapon wrote:

First and foremost, the double-caching issue for ZFS+sendfile on FreeBSD is
still there and no resolution for this issue is on the horizon.  So, you have to
account for the fact that twice as much memory is needed for this use case,
whether you plan your system, configure it, or tune it.

Second, with recent head and stable/8, the ARC should not be the primary victim of
memory pressure; the ARC reclaim thread and the page daemon should cooperate in
freeing/recycling memory.

Nothing much to add.


Although this discussion started due to issues with serving files through
web-type services, there are more applications that use sendfile.


For one, I noticed that I had once enabled sendfile in my Samba config.
Given this discussion, I see little advantage in keeping it that way.

But I'm open to other suggestions.

--WjW





Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-11-01 Thread Andriy Gapon
on 31/10/2010 11:02 Alexander Zagrebin said the following:
> I have a question.
> When we transfer a file via sendfile, the current code allocates
> memory that is marked inactive. For example, if the file is 100 MB in size,
> then 100 MB of memory will be allocated.
> If we have to transfer this file again later, this memory will be used
> as a cache, and no disk I/O will be required.
> The memory will be freed if the file is deleted or the operating system
> needs additional memory.
> Have I understood correctly?
> If so, I continue...
> Such behaviour is good if the files are relatively small.
> Suppose we have to transfer a large file (for example, larger
> than the amount of physical memory).
> While transferring, the inactive memory will grow, pressing on the ARC.
> When the size of the ARC falls to its minimum (vfs.zfs.arc_min), the
> inactive memory will be reused.
> So, when the transfer is complete, we have:
> 1. No free memory
> 2. The ARC is at its minimal size (which is bad)
> 3. The inactive memory contains only the _tail_ of the file (which is bad too)
> Now, if we have to transfer this file again, then
> 1. there is no (or little) of the file's data in the ARC (the ARC is too small)
> 2. the inactive memory doesn't contain the head part of the file
> So the file's data will be read from disk again and again...
> Also, I've noticed that inactive memory is freed relatively slowly,
> so if large files are accessed frequently, the system will run
> under very suboptimal conditions.
> That's just my opinion...
> Can you comment on this?
> 

First and foremost, the double-caching issue for ZFS+sendfile on FreeBSD is
still there and no resolution for this issue is on the horizon.  So, you have to
account for the fact that twice as much memory is needed for this use case,
whether you plan your system, configure it, or tune it.

Second, with recent head and stable/8, the ARC should not be the primary victim of
memory pressure; the ARC reclaim thread and the page daemon should cooperate in
freeing/recycling memory.

Nothing much to add.

-- 
Andriy Gapon


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-31 Thread Ronald Klop
On Sun, 31 Oct 2010 10:02:44 +0100, Alexander Zagrebin   
wrote:



>> I apologize for my haste, it should have been VM_ALLOC_WIRED.
>
> Ok, applied and tested under some load (~1200 active connections,
> outgoing ~80MB/s). The patch works as expected and I have noted no side
> effects.  Just one question: should the Active memory counter grow if
> some pages are "hot" (during multiple sendfile calls on one file)?

Pages used by sendfile are marked as Inactive for faster
reclamation on demand.


I have a question.
When we transfer a file via sendfile, the current code allocates
memory that is marked inactive. For example, if the file is 100 MB in size,
then 100 MB of memory will be allocated.
If we have to transfer this file again later, this memory will be used
as a cache, and no disk I/O will be required.
The memory will be freed if the file is deleted or the operating system
needs additional memory.
Have I understood correctly?
If so, I continue...
Such behaviour is good if the files are relatively small.
Suppose we have to transfer a large file (for example, larger
than the amount of physical memory).
While transferring, the inactive memory will grow, pressing on the ARC.
When the size of the ARC falls to its minimum (vfs.zfs.arc_min), the
inactive memory will be reused.
So, when the transfer is complete, we have:
1. No free memory
2. The ARC is at its minimal size (which is bad)
3. The inactive memory contains only the _tail_ of the file (which is bad too)
Now, if we have to transfer this file again, then
1. there is no (or little) of the file's data in the ARC (the ARC is too small)
2. the inactive memory doesn't contain the head part of the file
So the file's data will be read from disk again and again...
Also, I've noticed that inactive memory is freed relatively slowly,
so if large files are accessed frequently, the system will run
under very suboptimal conditions.
That's just my opinion...
Can you comment on this?



Add more RAM?


RE: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-31 Thread Alexander Zagrebin
> >> I apologize for my haste, it should have been VM_ALLOC_WIRED.
> > 
> > Ok, applied and tested under some load (~1200 active connections,
> > outgoing ~80MB/s). The patch works as expected and I have noted no side
> > effects.  Just one question: should the Active memory counter grow if
> > some pages are "hot" (during multiple sendfile calls on one file)?
> 
> Pages used by sendfile are marked as Inactive for faster 
> reclamation on demand.

I have a question.
When we transfer a file via sendfile, the current code allocates
memory that is marked inactive. For example, if the file is 100 MB in size,
then 100 MB of memory will be allocated.
If we have to transfer this file again later, this memory will be used
as a cache, and no disk I/O will be required.
The memory will be freed if the file is deleted or the operating system
needs additional memory.
Have I understood correctly?
If so, I continue...
Such behaviour is good if the files are relatively small.
Suppose we have to transfer a large file (for example, larger
than the amount of physical memory).
While transferring, the inactive memory will grow, pressing on the ARC.
When the size of the ARC falls to its minimum (vfs.zfs.arc_min), the
inactive memory will be reused.
So, when the transfer is complete, we have:
1. No free memory
2. The ARC is at its minimal size (which is bad)
3. The inactive memory contains only the _tail_ of the file (which is bad too)
Now, if we have to transfer this file again, then
1. there is no (or little) of the file's data in the ARC (the ARC is too small)
2. the inactive memory doesn't contain the head part of the file
So the file's data will be read from disk again and again...
Also, I've noticed that inactive memory is freed relatively slowly,
so if large files are accessed frequently, the system will run
under very suboptimal conditions.
That's just my opinion...
Can you comment on this?
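
A small observation aid, not part of any patch in this thread: the sketch below
samples the counters in question (ARC size, the ARC floor, and the inactive page
count) so one can watch them before and after a large transfer. The sysctl names
are the ones used elsewhere in this thread plus the standard FreeBSD VM
statistics; the value widths are assumed, so both 64-bit and 32-bit reads are
attempted.

#include <sys/types.h>
#include <sys/sysctl.h>

#include <stdio.h>

/* Read an unsigned sysctl whose width we do not know up front. */
static unsigned long long
read_counter(const char *name)
{
    unsigned long long v64 = 0;
    unsigned int v32 = 0;
    size_t len;

    len = sizeof(v64);
    if (sysctlbyname(name, &v64, &len, NULL, 0) == 0 && len == sizeof(v64))
        return (v64);
    len = sizeof(v32);
    if (sysctlbyname(name, &v32, &len, NULL, 0) == 0)
        return (v32);
    return (0);
}

int
main(void)
{
    /* ARC size vs. its configured floor, and the inactive page count. */
    printf("ARC size:       %llu bytes\n",
        read_counter("kstat.zfs.misc.arcstats.size"));
    printf("ARC minimum:    %llu bytes\n",
        read_counter("vfs.zfs.arc_min"));
    printf("Inactive pages: %llu\n",
        read_counter("vm.stats.vm.v_inactive_count"));
    return (0);
}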

-- 
Alexander Zagrebin



Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-30 Thread Andriy Gapon
on 30/10/2010 22:01 Artemiev Igor said the following:
> On Sat, Oct 30, 2010 at 05:43:54PM +0300, Andriy Gapon wrote:
> 
>> I apologize for my haste, it should have been VM_ALLOC_WIRED.
> 
> Ok, applied and tested under some load (~1200 active connections, outgoing
> ~80MB/s). The patch works as expected and I have noted no side effects.  Just
> one question: should the Active memory counter grow if some pages are "hot"
> (during multiple sendfile calls on one file)?

Pages used by sendfile are marked as Inactive for faster reclamation on demand.

-- 
Andriy Gapon


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-30 Thread Andriy Gapon
on 31/10/2010 02:37 Kostik Belousov said the following:
> On Sat, Oct 30, 2010 at 05:43:54PM +0300, Andriy Gapon wrote:
>> on 30/10/2010 14:25 Artemiev Igor said the following:
>>> On Sat, Oct 30, 2010 at 01:33:00PM +0300, Andriy Gapon wrote:
 on 30/10/2010 13:12 Artemiev Igor said the following:
> On Sat, Oct 30, 2010 at 12:52:54PM +0300, Andriy Gapon wrote:
>
>> Heh, next try.
>
> Got a panic, "vm_page_unwire: invalid wire count: 0"

 Oh, thank you for testing - forgot another piece (VM_ALLOC_WIRE for 
 vm_page_alloc):
>>>
>>> Yep, it work. But VM_ALLOC_WIRE not exists in RELENG_8, therefore i 
>>> slightly modified your patch:
>>
>> I apologize for my haste, it should have been VM_ALLOC_WIRED.
>> Here is a corrected patch:
>> Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
>> ===
>> --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c   
>> (revision 214318)
>> +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c   
>> (working copy)
>> @@ -67,6 +67,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>
>>  /*
>>   * Programming rules.
>> @@ -464,7 +465,7 @@
>>  uiomove_fromphys(&m, off, bytes, uio);
>>  VM_OBJECT_LOCK(obj);
>>  vm_page_wakeup(m);
>> -} else if (m != NULL && uio->uio_segflg == UIO_NOCOPY) {
>> +} else if (uio->uio_segflg == UIO_NOCOPY) {
>>  /*
>>   * The code below is here to make sendfile(2) work
>>   * correctly with ZFS. As pointed out by ups@
>> @@ -474,9 +475,23 @@
>>   */
>>  KASSERT(off == 0,
>>  ("unexpected offset in mappedread for sendfile"));
>> -if (vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
>> +if (m != NULL && vm_page_sleep_if_busy(m, FALSE, 
>> "zfsmrb"))
>>  goto again;
>> -vm_page_busy(m);
>> +if (m == NULL) {
>> +m = vm_page_alloc(obj, OFF_TO_IDX(start),
>> +VM_ALLOC_NOBUSY | VM_ALLOC_WIRED | 
>> VM_ALLOC_NORMAL);
>> +if (m == NULL) {
>> +VM_OBJECT_UNLOCK(obj);
>> +VM_WAIT;
>> +VM_OBJECT_LOCK(obj);
>> +goto again;
>> +}
>> +} else {
>> +vm_page_lock_queues();
>> +vm_page_wire(m);
>> +vm_page_unlock_queues();
>> +}
>> +vm_page_io_start(m);
> Why wire the page if it is busied?

Eh?  Because it is not?

-- 
Andriy Gapon


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-30 Thread Kostik Belousov
On Sat, Oct 30, 2010 at 05:43:54PM +0300, Andriy Gapon wrote:
> on 30/10/2010 14:25 Artemiev Igor said the following:
> > On Sat, Oct 30, 2010 at 01:33:00PM +0300, Andriy Gapon wrote:
> >> on 30/10/2010 13:12 Artemiev Igor said the following:
> >>> On Sat, Oct 30, 2010 at 12:52:54PM +0300, Andriy Gapon wrote:
> >>>
>  Heh, next try.
> >>>
> >>> Got a panic, "vm_page_unwire: invalid wire count: 0"
> >>
> >> Oh, thank you for testing - forgot another piece (VM_ALLOC_WIRE for 
> >> vm_page_alloc):
> > 
> > Yep, it work. But VM_ALLOC_WIRE not exists in RELENG_8, therefore i 
> > slightly modified your patch:
> 
> I apologize for my haste, it should have been VM_ALLOC_WIRED.
> Here is a corrected patch:
> Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> ===
> --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> (revision 214318)
> +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> (working copy)
> @@ -67,6 +67,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  /*
>   * Programming rules.
> @@ -464,7 +465,7 @@
>   uiomove_fromphys(&m, off, bytes, uio);
>   VM_OBJECT_LOCK(obj);
>   vm_page_wakeup(m);
> - } else if (m != NULL && uio->uio_segflg == UIO_NOCOPY) {
> + } else if (uio->uio_segflg == UIO_NOCOPY) {
>   /*
>* The code below is here to make sendfile(2) work
>* correctly with ZFS. As pointed out by ups@
> @@ -474,9 +475,23 @@
>*/
>   KASSERT(off == 0,
>   ("unexpected offset in mappedread for sendfile"));
> - if (vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
> + if (m != NULL && vm_page_sleep_if_busy(m, FALSE, 
> "zfsmrb"))
>   goto again;
> - vm_page_busy(m);
> + if (m == NULL) {
> + m = vm_page_alloc(obj, OFF_TO_IDX(start),
> + VM_ALLOC_NOBUSY | VM_ALLOC_WIRED | 
> VM_ALLOC_NORMAL);
> + if (m == NULL) {
> + VM_OBJECT_UNLOCK(obj);
> + VM_WAIT;
> + VM_OBJECT_LOCK(obj);
> + goto again;
> + }
> + } else {
> + vm_page_lock_queues();
> + vm_page_wire(m);
> + vm_page_unlock_queues();
> + }
> + vm_page_io_start(m);
Why wire the page if it is busied?




Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-30 Thread Artemiev Igor
On Sat, Oct 30, 2010 at 05:43:54PM +0300, Andriy Gapon wrote:

> I apologize for my haste, it should have been VM_ALLOC_WIRED.

Ok, applied and tested under some load (~1200 active connections, outgoing
~80MB/s). The patch works as expected and I have noted no side effects.  Just
one question: should the Active memory counter grow if some pages are "hot"
(during multiple sendfile calls on one file)?


RE: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-30 Thread Alexander Zagrebin
> >> Oh, thank you for testing - forgot another piece 
> (VM_ALLOC_WIRE for vm_page_alloc):
> > 
> > Yep, it work. But VM_ALLOC_WIRE not exists in RELENG_8, 
> therefore i slightly modified your patch:
> 
> I apologize for my haste, it should have been VM_ALLOC_WIRED.
> Here is a corrected patch:
> Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> ===
> --- 
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> (revision 214318)
> +++ 
> sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> (working copy)
> @@ -67,6 +67,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  /*
>   * Programming rules.
> @@ -464,7 +465,7 @@
>   uiomove_fromphys(&m, off, bytes, uio);
>   VM_OBJECT_LOCK(obj);
>   vm_page_wakeup(m);
> - } else if (m != NULL && uio->uio_segflg == UIO_NOCOPY) {
> + } else if (uio->uio_segflg == UIO_NOCOPY) {
>   /*
>* The code below is here to make 
> sendfile(2) work
>* correctly with ZFS. As pointed out by ups@
> @@ -474,9 +475,23 @@
>*/
>   KASSERT(off == 0,
>   ("unexpected offset in mappedread 
> for sendfile"));
> - if (vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
> + if (m != NULL && 
> vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
>   goto again;
> - vm_page_busy(m);
> + if (m == NULL) {
> + m = vm_page_alloc(obj, 
> OFF_TO_IDX(start),
> + VM_ALLOC_NOBUSY | 
> VM_ALLOC_WIRED | VM_ALLOC_NORMAL);
> + if (m == NULL) {
> + VM_OBJECT_UNLOCK(obj);
> + VM_WAIT;
> + VM_OBJECT_LOCK(obj);
> + goto again;
> + }
> + } else {
> + vm_page_lock_queues();
> + vm_page_wire(m);
> + vm_page_unlock_queues();
> + }
> + vm_page_io_start(m);
>   VM_OBJECT_UNLOCK(obj);
>   if (dirbytes > 0) {
>   error = dmu_read_uio(os, zp->z_id, uio,
> @@ -494,7 +509,10 @@
>   VM_OBJECT_LOCK(obj);
>   if (error == 0)
>   m->valid = VM_PAGE_BITS_ALL;
> - vm_page_wakeup(m);
> + vm_page_io_finish(m);
> + vm_page_lock_queues();
> + vm_page_unwire(m, 0);
> + vm_page_unlock_queues();
>   if (error == 0) {
>   uio->uio_resid -= bytes;
>   uio->uio_offset += bytes;
> 

Big thanks to Andriy, Igor, and all who have paid attention to this problem.
I've tried this patch on a test system running under VirtualBox,
and it seems that it solves the problem.
I'll try to test this patch under real conditions today.

-- 
Alexander Zagrebin



Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-30 Thread Andriy Gapon
on 30/10/2010 14:25 Artemiev Igor said the following:
> On Sat, Oct 30, 2010 at 01:33:00PM +0300, Andriy Gapon wrote:
>> on 30/10/2010 13:12 Artemiev Igor said the following:
>>> On Sat, Oct 30, 2010 at 12:52:54PM +0300, Andriy Gapon wrote:
>>>
 Heh, next try.
>>>
>>> Got a panic, "vm_page_unwire: invalid wire count: 0"
>>
>> Oh, thank you for testing - forgot another piece (VM_ALLOC_WIRE for 
>> vm_page_alloc):
> 
> Yep, it works. But VM_ALLOC_WIRE does not exist in RELENG_8, therefore I slightly
> modified your patch:

I apologize for my haste, it should have been VM_ALLOC_WIRED.
Here is a corrected patch:
Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
===
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c  (revision 214318)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c  (working copy)
@@ -67,6 +67,7 @@
 #include 
 #include 
 #include 
+#include 

 /*
  * Programming rules.
@@ -464,7 +465,7 @@
uiomove_fromphys(&m, off, bytes, uio);
VM_OBJECT_LOCK(obj);
vm_page_wakeup(m);
-   } else if (m != NULL && uio->uio_segflg == UIO_NOCOPY) {
+   } else if (uio->uio_segflg == UIO_NOCOPY) {
/*
 * The code below is here to make sendfile(2) work
 * correctly with ZFS. As pointed out by ups@
@@ -474,9 +475,23 @@
 */
KASSERT(off == 0,
("unexpected offset in mappedread for sendfile"));
-   if (vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
+   if (m != NULL && vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
goto again;
-   vm_page_busy(m);
+   if (m == NULL) {
+   m = vm_page_alloc(obj, OFF_TO_IDX(start),
+   VM_ALLOC_NOBUSY | VM_ALLOC_WIRED | VM_ALLOC_NORMAL);
+   if (m == NULL) {
+   VM_OBJECT_UNLOCK(obj);
+   VM_WAIT;
+   VM_OBJECT_LOCK(obj);
+   goto again;
+   }
+   } else {
+   vm_page_lock_queues();
+   vm_page_wire(m);
+   vm_page_unlock_queues();
+   }
+   vm_page_io_start(m);
VM_OBJECT_UNLOCK(obj);
if (dirbytes > 0) {
error = dmu_read_uio(os, zp->z_id, uio,
@@ -494,7 +509,10 @@
VM_OBJECT_LOCK(obj);
if (error == 0)
m->valid = VM_PAGE_BITS_ALL;
-   vm_page_wakeup(m);
+   vm_page_io_finish(m);
+   vm_page_lock_queues();
+   vm_page_unwire(m, 0);
+   vm_page_unlock_queues();
if (error == 0) {
uio->uio_resid -= bytes;
uio->uio_offset += bytes;

-- 
Andriy Gapon


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-30 Thread Artemiev Igor
On Sat, Oct 30, 2010 at 01:33:00PM +0300, Andriy Gapon wrote:
> on 30/10/2010 13:12 Artemiev Igor said the following:
> > On Sat, Oct 30, 2010 at 12:52:54PM +0300, Andriy Gapon wrote:
> > 
> >> Heh, next try.
> > 
> > Got a panic, "vm_page_unwire: invalid wire count: 0"
> 
> Oh, thank you for testing - forgot another piece (VM_ALLOC_WIRE for 
> vm_page_alloc):

Yep, it works. But VM_ALLOC_WIRE does not exist in RELENG_8, therefore I slightly
modified your patch:

--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c.orig 
2010-10-30 11:56:41.621138440 +0200
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c  2010-10-30 
12:49:32.858692096 +0200
@@ -67,6 +67,7 @@
 #include 
 #include 
 #include 
+#include 

 /*
  * Programming rules.
@@ -464,7 +465,7 @@
uiomove_fromphys(&m, off, bytes, uio);
VM_OBJECT_LOCK(obj);
vm_page_wakeup(m);
-   } else if (m != NULL && uio->uio_segflg == UIO_NOCOPY) {
+   } else if (uio->uio_segflg == UIO_NOCOPY) {
/*
 * The code below is here to make sendfile(2) work
 * correctly with ZFS. As pointed out by ups@
@@ -474,9 +475,23 @@
 */
KASSERT(off == 0,
("unexpected offset in mappedread for sendfile"));
-   if (vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
+if (m != NULL && vm_page_sleep_if_busy(m, FALSE, 
"zfsmrb"))
goto again;
-   vm_page_busy(m);
+if (m == NULL) {
+m = vm_page_alloc(obj, OFF_TO_IDX(start),
+VM_ALLOC_NOBUSY | VM_ALLOC_NORMAL);
+if (m == NULL) {
+VM_OBJECT_UNLOCK(obj);
+VM_WAIT;
+VM_OBJECT_LOCK(obj);
+goto again;
+}
+   }
+vm_page_lock_queues();
+vm_page_wire(m);
+vm_page_unlock_queues();
+
+vm_page_io_start(m);
VM_OBJECT_UNLOCK(obj);
if (dirbytes > 0) {
error = dmu_read_uio(os, zp->z_id, uio,
@@ -494,6 +509,10 @@
VM_OBJECT_LOCK(obj);
if (error == 0)
m->valid = VM_PAGE_BITS_ALL;
+vm_page_io_finish(m);
+vm_page_lock_queues();
+vm_page_unwire(m, 0);
+vm_page_unlock_queues();
vm_page_wakeup(m);
if (error == 0) {
uio->uio_resid -= bytes;



Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-30 Thread Andriy Gapon
on 30/10/2010 13:12 Artemiev Igor said the following:
> On Sat, Oct 30, 2010 at 12:52:54PM +0300, Andriy Gapon wrote:
> 
>> Heh, next try.
> 
> Got a panic, "vm_page_unwire: invalid wire count: 0"

Oh, thank you for testing - forgot another piece (VM_ALLOC_WIRE for 
vm_page_alloc):

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
===
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c  (revision 
214318)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c  (working copy)
@@ -67,6 +67,7 @@
 #include 
 #include 
 #include 
+#include 

 /*
  * Programming rules.
@@ -464,7 +465,7 @@
uiomove_fromphys(&m, off, bytes, uio);
VM_OBJECT_LOCK(obj);
vm_page_wakeup(m);
-   } else if (m != NULL && uio->uio_segflg == UIO_NOCOPY) {
+   } else if (uio->uio_segflg == UIO_NOCOPY) {
/*
 * The code below is here to make sendfile(2) work
 * correctly with ZFS. As pointed out by ups@
@@ -474,9 +475,23 @@
 */
KASSERT(off == 0,
("unexpected offset in mappedread for sendfile"));
-   if (vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
+   if (m != NULL && vm_page_sleep_if_busy(m, FALSE, 
"zfsmrb"))
goto again;
-   vm_page_busy(m);
+   if (m == NULL) {
+   m = vm_page_alloc(obj, OFF_TO_IDX(start),
+   VM_ALLOC_NOBUSY | VM_ALLOC_WIRE | 
VM_ALLOC_NORMAL);
+   if (m == NULL) {
+   VM_OBJECT_UNLOCK(obj);
+   VM_WAIT;
+   VM_OBJECT_LOCK(obj);
+   goto again;
+   }
+   } else {
+   vm_page_lock_queues();
+   vm_page_wire(m);
+   vm_page_unlock_queues();
+   }
+   vm_page_io_start(m);
VM_OBJECT_UNLOCK(obj);
if (dirbytes > 0) {
error = dmu_read_uio(os, zp->z_id, uio,
@@ -494,7 +509,10 @@
VM_OBJECT_LOCK(obj);
if (error == 0)
m->valid = VM_PAGE_BITS_ALL;
-   vm_page_wakeup(m);
+   vm_page_io_finish(m);
+   vm_page_lock_queues();
+   vm_page_unwire(m, 0);
+   vm_page_unlock_queues();
if (error == 0) {
uio->uio_resid -= bytes;
uio->uio_offset += bytes;


-- 
Andriy Gapon


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-30 Thread Artemiev Igor
On Sat, Oct 30, 2010 at 12:52:54PM +0300, Andriy Gapon wrote:

> Heh, next try.

Got a panic, "vm_page_unwire: invalid wire count: 0"


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-30 Thread Artemiev Igor
On Sat, Oct 30, 2010 at 11:25:05AM +0300, Andriy Gapon wrote:
> > Note: I have only compile tested the patch.
> 
> Missed one NULL check.
> 
> Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> ===
> --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> (revision 214318)
> +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> (working copy)
> @@ -67,6 +67,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  /*
>   * Programming rules.
> @@ -464,7 +465,7 @@
>   uiomove_fromphys(&m, off, bytes, uio);
>   VM_OBJECT_LOCK(obj);
>   vm_page_wakeup(m);
> - } else if (m != NULL && uio->uio_segflg == UIO_NOCOPY) {
> + } else if (uio->uio_segflg == UIO_NOCOPY) {
>   /*
>* The code below is here to make sendfile(2) work
>* correctly with ZFS. As pointed out by ups@
> @@ -474,8 +475,18 @@
>*/
>   KASSERT(off == 0,
>   ("unexpected offset in mappedread for sendfile"));
> - if (vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
> + if (m != NULL && vm_page_sleep_if_busy(m, FALSE, 
> "zfsmrb"))
>   goto again;
> + if (m == NULL) {
> + m = vm_page_alloc(obj, OFF_TO_IDX(start),
> + VM_ALLOC_NOBUSY | VM_ALLOC_SYSTEM);
> + if (m == NULL) {
> + VM_OBJECT_UNLOCK(obj);
> + VM_WAIT;
> + VM_OBJECT_LOCK(obj);
> + goto again;
> + }
> + }
>   vm_page_busy(m);
>   VM_OBJECT_UNLOCK(obj);
>   if (dirbytes > 0) {

Ok, I tested this patch. It works :) freebsd_zfs_read is now called
(file_size/MAXBSIZE) times.  Thanks!


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-30 Thread Andriy Gapon

Heh, next try.

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
===
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c  (revision 
214318)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c  (working copy)
@@ -67,6 +67,7 @@
 #include 
 #include 
 #include 
+#include 

 /*
  * Programming rules.
@@ -464,7 +465,7 @@
uiomove_fromphys(&m, off, bytes, uio);
VM_OBJECT_LOCK(obj);
vm_page_wakeup(m);
-   } else if (m != NULL && uio->uio_segflg == UIO_NOCOPY) {
+   } else if (uio->uio_segflg == UIO_NOCOPY) {
/*
 * The code below is here to make sendfile(2) work
 * correctly with ZFS. As pointed out by ups@
@@ -474,9 +475,23 @@
 */
KASSERT(off == 0,
("unexpected offset in mappedread for sendfile"));
-   if (vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
+   if (m != NULL && vm_page_sleep_if_busy(m, FALSE, 
"zfsmrb"))
goto again;
-   vm_page_busy(m);
+   if (m == NULL) {
+   m = vm_page_alloc(obj, OFF_TO_IDX(start),
+   VM_ALLOC_NOBUSY | VM_ALLOC_NORMAL);
+   if (m == NULL) {
+   VM_OBJECT_UNLOCK(obj);
+   VM_WAIT;
+   VM_OBJECT_LOCK(obj);
+   goto again;
+   }
+   } else {
+   vm_page_lock_queues();
+   vm_page_wire(m);
+   vm_page_unlock_queues();
+   }
+   vm_page_io_start(m);
VM_OBJECT_UNLOCK(obj);
if (dirbytes > 0) {
error = dmu_read_uio(os, zp->z_id, uio,
@@ -494,7 +509,10 @@
VM_OBJECT_LOCK(obj);
if (error == 0)
m->valid = VM_PAGE_BITS_ALL;
-   vm_page_wakeup(m);
+   vm_page_io_finish(m);
+   vm_page_lock_queues();
+   vm_page_unwire(m, 0);
+   vm_page_unlock_queues();
if (error == 0) {
uio->uio_resid -= bytes;
uio->uio_offset += bytes;

-- 
Andriy Gapon


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-30 Thread Andriy Gapon
on 30/10/2010 11:16 Andriy Gapon said the following:
> on 30/10/2010 11:16 Andriy Gapon said the following:
>> Or maybe something like the following?
>> It looks a little bit cleaner to me, but still is not perfect, as I have not
>> handled unnecessary busy-ing of the pages where something more lightweight 
>> could
>> have sufficed (e.g. wiring and shared busying).
> 
> Note: I have only compile tested the patch.

Missed one NULL check.

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
===
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c  (revision 
214318)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c  (working copy)
@@ -67,6 +67,7 @@
 #include 
 #include 
 #include 
+#include 

 /*
  * Programming rules.
@@ -464,7 +465,7 @@
uiomove_fromphys(&m, off, bytes, uio);
VM_OBJECT_LOCK(obj);
vm_page_wakeup(m);
-   } else if (m != NULL && uio->uio_segflg == UIO_NOCOPY) {
+   } else if (uio->uio_segflg == UIO_NOCOPY) {
/*
 * The code below is here to make sendfile(2) work
 * correctly with ZFS. As pointed out by ups@
@@ -474,8 +475,18 @@
 */
KASSERT(off == 0,
("unexpected offset in mappedread for sendfile"));
-   if (vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
+   if (m != NULL && vm_page_sleep_if_busy(m, FALSE, 
"zfsmrb"))
goto again;
+   if (m == NULL) {
+   m = vm_page_alloc(obj, OFF_TO_IDX(start),
+   VM_ALLOC_NOBUSY | VM_ALLOC_SYSTEM);
+   if (m == NULL) {
+   VM_OBJECT_UNLOCK(obj);
+   VM_WAIT;
+   VM_OBJECT_LOCK(obj);
+   goto again;
+   }
+   }
vm_page_busy(m);
VM_OBJECT_UNLOCK(obj);
if (dirbytes > 0) {


-- 
Andriy Gapon


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-30 Thread Andriy Gapon
on 30/10/2010 11:16 Andriy Gapon said the following:
> Or maybe something like the following?
> It looks a little bit cleaner to me, but still is not perfect, as I have not
> handled unnecessary busy-ing of the pages where something more lightweight 
> could
> have sufficed (e.g. wiring and shared busying).

Note: I have only compile tested the patch.

> Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> ===
> --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> (revision 214318)
> +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> (working copy)
> @@ -67,6 +67,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
> 
>  /*
>   * Programming rules.
> @@ -464,7 +465,7 @@
>   uiomove_fromphys(&m, off, bytes, uio);
>   VM_OBJECT_LOCK(obj);
>   vm_page_wakeup(m);
> - } else if (m != NULL && uio->uio_segflg == UIO_NOCOPY) {
> + } else if (uio->uio_segflg == UIO_NOCOPY) {
>   /*
>* The code below is here to make sendfile(2) work
>* correctly with ZFS. As pointed out by ups@
> @@ -476,6 +477,16 @@
>   ("unexpected offset in mappedread for sendfile"));
>   if (vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
>   goto again;
> + if (m == NULL) {
> + m = vm_page_alloc(obj, OFF_TO_IDX(start),
> + VM_ALLOC_NOBUSY | VM_ALLOC_SYSTEM);
> + if (m == NULL) {
> + VM_OBJECT_UNLOCK(obj);
> + VM_WAIT;
> + VM_OBJECT_LOCK(obj);
> + goto again;
> + }
> + }
>   vm_page_busy(m);
>   VM_OBJECT_UNLOCK(obj);
>   if (dirbytes > 0) {
> 
> 


-- 
Andriy Gapon


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-30 Thread Andriy Gapon
on 29/10/2010 20:51 Artemiev Igor said the following:
> On Fri, Oct 29, 2010 at 07:06:03PM +0300, Andriy Gapon wrote:
>> Probably yes, but have to be careful there.
>> First, do vm_page_grab only for UIO_NOCOPY case.
>> Second, the first page is already "shared busy" after vm_page_io_start() 
>> call in
>> kern_sendfile; so you might need VM_ALLOC_IGN_SBUSY for that page to avoid a 
>> deadlock.
> 
> RELENG_8 doesn't have VM_ALLOC_IGN_SBUSY; it appeared only in HEAD.
> Can you review this patch and check whether I have understood correctly? (I
> didn't test it yet.)
> 
> --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c.orig   
> 2010-10-29 18:18:23.921078337 +0200
> +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
> 2010-10-29 19:23:48.142513084 +0200
> @@ -449,7 +449,7 @@
>   int bytes = MIN(PAGESIZE - off, len);
>  
>  again:
> - if ((m = vm_page_lookup(obj, OFF_TO_IDX(start))) != NULL &&
> + if (uio->uio_segflg != UIO_NOCOPY && (m = vm_page_lookup(obj, 
> OFF_TO_IDX(start))) != NULL &&
>   vm_page_is_valid(m, off, bytes)) {
>   if (vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
>   goto again;
> @@ -464,7 +464,7 @@
>   uiomove_fromphys(&m, off, bytes, uio);
>   VM_OBJECT_LOCK(obj);
>   vm_page_wakeup(m);
> - } else if (m != NULL && uio->uio_segflg == UIO_NOCOPY) {
> + } else if (uio->uio_segflg == UIO_NOCOPY) {
>   /*
>* The code below is here to make sendfile(2) work
>* correctly with ZFS. As pointed out by ups@
> @@ -472,11 +472,9 @@
>* but it pessimize performance of sendfile/UFS, that's
>* why I handle this special case in ZFS code.
>*/
> - KASSERT(off == 0,
> - ("unexpected offset in mappedread for sendfile"));
> - if (vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
> - goto again;
> - vm_page_busy(m);
> + if((m = vm_page_lookup(obj, OFF_TO_IDX(start))) == NULL 
> || !vm_page_is_valid(m, off, bytes)) 
> + m = vm_page_grab(obj, OFF_TO_IDX(start), 
> VM_ALLOC_NORMAL|VM_ALLOC_RETRY);
> +
>   VM_OBJECT_UNLOCK(obj);
>   if (dirbytes > 0) {
>   error = dmu_read_uio(os, zp->z_id, uio,

Or maybe something like the following?
It looks a little bit cleaner to me, but still is not perfect, as I have not
handled unnecessary busy-ing of the pages where something more lightweight could
have sufficed (e.g. wiring and shared busying).

Index: sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
===
--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c  (revision 
214318)
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c  (working copy)
@@ -67,6 +67,7 @@
 #include 
 #include 
 #include 
+#include 

 /*
  * Programming rules.
@@ -464,7 +465,7 @@
uiomove_fromphys(&m, off, bytes, uio);
VM_OBJECT_LOCK(obj);
vm_page_wakeup(m);
-   } else if (m != NULL && uio->uio_segflg == UIO_NOCOPY) {
+   } else if (uio->uio_segflg == UIO_NOCOPY) {
/*
 * The code below is here to make sendfile(2) work
 * correctly with ZFS. As pointed out by ups@
@@ -476,6 +477,16 @@
("unexpected offset in mappedread for sendfile"));
if (vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
goto again;
+   if (m == NULL) {
+   m = vm_page_alloc(obj, OFF_TO_IDX(start),
+   VM_ALLOC_NOBUSY | VM_ALLOC_SYSTEM);
+   if (m == NULL) {
+   VM_OBJECT_UNLOCK(obj);
+   VM_WAIT;
+   VM_OBJECT_LOCK(obj);
+   goto again;
+   }
+   }
vm_page_busy(m);
VM_OBJECT_UNLOCK(obj);
if (dirbytes > 0) {


-- 
Andriy Gapon


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-30 Thread Andriy Gapon
on 29/10/2010 17:41 Andriy Gapon said the following:
> on 29/10/2010 15:36 Andriy Gapon said the following:
>> on 29/10/2010 12:04 Artemiev Igor said the following:
>>> Yep, this problem exists. You may work around it by bumping
>>> net.inet.tcp.sendspace up to 128k.  ZFS sendfile is very inefficient. I have
>>> made a small investigation via DTrace: it reads MAXBSIZE chunks, but maps
>>> only one page (4K) into the VM.  I.e. if you have a 512K file, sendfile
>>> calls freebsd_zfs_read 128 times.
>>
>> What svn revision of FreeBSD source tree did you test?
>>
> 
> Ah, I think I see what's going on.
> Either sendfile should (have an option to) use VOP_GETPAGES to request data 
> or ZFS
> mappedread should use vm_grab_page instead of vm_lookup_page for UIO_NOCOPY 
> case.
> Currently ZFS would read a whole FS block into ARC, but populate only one page
> with data and for the rest it would just wastefully do uiomove(UIO_NOCOPY) 
> from
> ARC data.
> So, e.g. zpool iostat would show that there are only few actual reads from a 
> pool.
>  The rest of the time must be spent churning over the data already in ARC and
> doing page-per-VOP_READ copies from it.

Hmm, I investigated the issue some more and now I wouldn't put all the blame on
ZFS.  Indeed, perhaps ZFS is very inefficient here; perhaps it does extra
looping and extra copying.  However, those operations should not lead to such a
significant slowdown, but mostly to increased CPU usage.

So, it looks like sendfile spends most of its time in sbwait().
Of course, the "erratic" behavior of ZFS does contribute to that.
It's this code in kern_sendfile that gets triggered by ZFS:
if (pg->valid && vm_page_is_valid(pg, pgoff, xfsize))
VM_OBJECT_UNLOCK(obj);
else if (m != NULL)
error = EAGAIN; /* send what we already got */
else ...

Essentially, data is not only read from ZFS page by page, but it is also mostly
sent one page-sized chunk at a time.

P.S. just stating the obvious, kind of :-)
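
For reference, the userland side of all of this is a single sendfile(2) call per
file; the kernel-side page handling discussed above happens entirely inside that
one call. A minimal sketch of how a sender might drive it on FreeBSD (the
connected socket and the file name are hypothetical, and error handling is
trimmed):

#include <sys/types.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/uio.h>

#include <fcntl.h>
#include <unistd.h>

/* Push a whole file down an already-connected socket s. */
static int
send_whole_file(int s, const char *path)
{
    struct stat st;
    off_t off, sent;
    int fd;

    if ((fd = open(path, O_RDONLY)) == -1)
        return (-1);
    if (fstat(fd, &st) == -1) {
        close(fd);
        return (-1);
    }
    for (off = 0; off < st.st_size; off += sent) {
        sent = 0;
        /*
         * sendfile(2) may send less than asked (e.g. on EINTR or a
         * non-blocking socket); *sbytes reports how much went out.
         */
        if (sendfile(fd, s, off, (size_t)(st.st_size - off),
            NULL, &sent, 0) == -1 && sent == 0) {
            close(fd);
            return (-1);
        }
    }
    close(fd);
    return (0);
}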
-- 
Andriy Gapon


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Kostik Belousov
On Fri, Oct 29, 2010 at 06:22:54PM +0300, Andriy Gapon wrote:
> on 29/10/2010 18:17 Kostik Belousov said the following:
> > On Fri, Oct 29, 2010 at 06:05:26PM +0300, Andriy Gapon wrote:
> >> on 29/10/2010 17:53 Kostik Belousov said the following:
> >>> Could it be the priming of the vm object pages content ?
> >>
> >> Sorry, not familiar with this term.
> >> Do you mean prepopulation of vm object with valid pages?
> >>
> >>> Due to double-buffering, and (possibly false) optimization to only
> >>
> >> What optimization?
> > On zfs vnode read, the page from the corresponding vm object is only
> > populated with the vnode data if the page already exists in the
> > object.
> 
> Do you mean a specific type of read?
> For "normal" reads it's the other way around - if the page already exists and 
> is
> valid, then we read from the page, not from ARC.
Let me repeat it once more:
ZFS does not properly cache the vnode data content in the page cache
("cache" is used here in a weaker sense, not meaning the FreeBSD 'cached'
memory, but a cache in the more common sense). Not doing the optimization
I mentioned would mean always allocating the pages and making them
(partially) valid for each read call.
> 
> > Not doing the optimization would be to allocate the page unconditionally
> > on the read if not already present, and copy the data from ARC to the page.
> >>
> >>> perform double-buffering when vm object already has some data cached,
> >>> reads can prime vm object page list before file is mmapped or
> >>> sendfile-ed.
> >>>
> >>
> >> No double-buffering is done to optimize anything. Double-buffering
> >> is a consequence of having page cache and ARC. The special
> >> "double-buffering code" is to just handle that fact - e.g. making
> >> sure that VOP_READ reads data from page cache instead of ARC if it's
> >> possible that the data in them differs (i.e. page cache has more
> >> recent data).
> >>
> >> So, if I understood the term 'priming' correctly, no priming should
> >> ever occur.
> > The priming is done on the first call to VOP_READ() with the right
> > offset after the page is allocated.
> 
> Again, what is priming?
Filling the cache with the appropriate content.




Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Artemiev Igor
On Fri, Oct 29, 2010 at 07:06:03PM +0300, Andriy Gapon wrote:
> Probably yes, but have to be careful there.
> First, do vm_page_grab only for UIO_NOCOPY case.
> Second, the first page is already "shared busy" after vm_page_io_start() call 
> in
> kern_sendfile; so you might need VM_ALLOC_IGN_SBUSY for that page to avoid a 
> deadlock.

RELENG_8 doesn't have VM_ALLOC_IGN_SBUSY; it appeared only in HEAD.
Can you review this patch and check whether I have understood correctly? (I didn't
test it yet.)

--- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c.orig 
2010-10-29 18:18:23.921078337 +0200
+++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c  2010-10-29 
19:23:48.142513084 +0200
@@ -449,7 +449,7 @@
int bytes = MIN(PAGESIZE - off, len);
 
 again:
-   if ((m = vm_page_lookup(obj, OFF_TO_IDX(start))) != NULL &&
+   if (uio->uio_segflg != UIO_NOCOPY && (m = vm_page_lookup(obj, 
OFF_TO_IDX(start))) != NULL &&
vm_page_is_valid(m, off, bytes)) {
if (vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
goto again;
@@ -464,7 +464,7 @@
uiomove_fromphys(&m, off, bytes, uio);
VM_OBJECT_LOCK(obj);
vm_page_wakeup(m);
-   } else if (m != NULL && uio->uio_segflg == UIO_NOCOPY) {
+   } else if (uio->uio_segflg == UIO_NOCOPY) {
/*
 * The code below is here to make sendfile(2) work
 * correctly with ZFS. As pointed out by ups@
@@ -472,11 +472,9 @@
 * but it pessimize performance of sendfile/UFS, that's
 * why I handle this special case in ZFS code.
 */
-   KASSERT(off == 0,
-   ("unexpected offset in mappedread for sendfile"));
-   if (vm_page_sleep_if_busy(m, FALSE, "zfsmrb"))
-   goto again;
-   vm_page_busy(m);
+   if((m = vm_page_lookup(obj, OFF_TO_IDX(start))) == NULL 
|| !vm_page_is_valid(m, off, bytes)) 
+   m = vm_page_grab(obj, OFF_TO_IDX(start), 
VM_ALLOC_NORMAL|VM_ALLOC_RETRY);
+
VM_OBJECT_UNLOCK(obj);
if (dirbytes > 0) {
error = dmu_read_uio(os, zp->z_id, uio,


RE: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Alexander Zagrebin
> > Can you reproduce the problem on your system?
> 
> I can't reproduce it on mine.  Note the resilvering was induced from
> some unrelated disk swaps/tests I was performing, and ftpd is already
> enabled via inetd on this system.
> 
> What ZFS tunings have you applied to your system?  Can you provide
> output from "sysctl -a kstat.zfs.misc.arcstats" before and after a
> transfer which exhibits the initial slowdown?

It's an amd64 Intel Atom-based system with 2 GB of RAM.
/boot/loader.conf contains nothing special:

vm.kmem_size="1536M"
vfs.zfs.prefetch_disable="1"


$ dd if=/dev/random of=test bs=1m count=50; sysctl -a
kstat.zfs.misc.arcstats; fetch -o /dev/null http://localhost/test; sysctl -a
kstat.zfs.misc.arcstats
50+0 records in
50+0 records out
52428800 bytes transferred in 2.956783 secs (17731705 bytes/sec)
kstat.zfs.misc.arcstats.hits: 10889409
kstat.zfs.misc.arcstats.misses: 2482562
kstat.zfs.misc.arcstats.demand_data_hits: 7920924
kstat.zfs.misc.arcstats.demand_data_misses: 1587278
kstat.zfs.misc.arcstats.demand_metadata_hits: 2968455
kstat.zfs.misc.arcstats.demand_metadata_misses: 895284
kstat.zfs.misc.arcstats.prefetch_data_hits: 0
kstat.zfs.misc.arcstats.prefetch_data_misses: 0
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 30
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 0
kstat.zfs.misc.arcstats.mru_hits: 5596211
kstat.zfs.misc.arcstats.mru_ghost_hits: 199040
kstat.zfs.misc.arcstats.mfu_hits: 5293198
kstat.zfs.misc.arcstats.mfu_ghost_hits: 481006
kstat.zfs.misc.arcstats.allocated: 2985083
kstat.zfs.misc.arcstats.deleted: 1901535
kstat.zfs.misc.arcstats.stolen: 1269643
kstat.zfs.misc.arcstats.recycle_miss: 464100
kstat.zfs.misc.arcstats.mutex_miss: 658
kstat.zfs.misc.arcstats.evict_skip: 148879
kstat.zfs.misc.arcstats.evict_l2_cached: 0
kstat.zfs.misc.arcstats.evict_l2_eligible: 150609301504
kstat.zfs.misc.arcstats.evict_l2_ineligible: 36864
kstat.zfs.misc.arcstats.hash_elements: 91782
kstat.zfs.misc.arcstats.hash_elements_max: 168546
kstat.zfs.misc.arcstats.hash_collisions: 2058158
kstat.zfs.misc.arcstats.hash_chains: 23888
kstat.zfs.misc.arcstats.hash_chain_max: 18
kstat.zfs.misc.arcstats.p: 807441359
kstat.zfs.misc.arcstats.c: 1006632960
kstat.zfs.misc.arcstats.c_min: 125829120
kstat.zfs.misc.arcstats.c_max: 1006632960
kstat.zfs.misc.arcstats.size: 1006690472
kstat.zfs.misc.arcstats.hdr_size: 20252216
kstat.zfs.misc.arcstats.data_size: 917198336
kstat.zfs.misc.arcstats.other_size: 69239920
kstat.zfs.misc.arcstats.l2_hits: 0
kstat.zfs.misc.arcstats.l2_misses: 0
kstat.zfs.misc.arcstats.l2_feeds: 0
kstat.zfs.misc.arcstats.l2_rw_clash: 0
kstat.zfs.misc.arcstats.l2_read_bytes: 0
kstat.zfs.misc.arcstats.l2_write_bytes: 0
kstat.zfs.misc.arcstats.l2_writes_sent: 0
kstat.zfs.misc.arcstats.l2_writes_done: 0
kstat.zfs.misc.arcstats.l2_writes_error: 0
kstat.zfs.misc.arcstats.l2_writes_hdr_miss: 0
kstat.zfs.misc.arcstats.l2_evict_lock_retry: 0
kstat.zfs.misc.arcstats.l2_evict_reading: 0
kstat.zfs.misc.arcstats.l2_free_on_write: 0
kstat.zfs.misc.arcstats.l2_abort_lowmem: 0
kstat.zfs.misc.arcstats.l2_cksum_bad: 0
kstat.zfs.misc.arcstats.l2_io_error: 0
kstat.zfs.misc.arcstats.l2_size: 0
kstat.zfs.misc.arcstats.l2_hdr_size: 0
kstat.zfs.misc.arcstats.memory_throttle_count: 9
kstat.zfs.misc.arcstats.l2_write_trylock_fail: 0
kstat.zfs.misc.arcstats.l2_write_passed_headroom: 0
kstat.zfs.misc.arcstats.l2_write_spa_mismatch: 0
kstat.zfs.misc.arcstats.l2_write_in_l2: 0
kstat.zfs.misc.arcstats.l2_write_io_in_progress: 0
kstat.zfs.misc.arcstats.l2_write_not_cacheable: 30
kstat.zfs.misc.arcstats.l2_write_full: 0
kstat.zfs.misc.arcstats.l2_write_buffer_iter: 0
kstat.zfs.misc.arcstats.l2_write_pios: 0
kstat.zfs.misc.arcstats.l2_write_buffer_bytes_scanned: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_iter: 0
kstat.zfs.misc.arcstats.l2_write_buffer_list_null_iter: 0
/dev/null 100% of   50 MB  119 kBps
00m00s
kstat.zfs.misc.arcstats.hits: 10928358
kstat.zfs.misc.arcstats.misses: 2486504
kstat.zfs.misc.arcstats.demand_data_hits: 7959052
kstat.zfs.misc.arcstats.demand_data_misses: 1590868
kstat.zfs.misc.arcstats.demand_metadata_hits: 2969276
kstat.zfs.misc.arcstats.demand_metadata_misses: 895636
kstat.zfs.misc.arcstats.prefetch_data_hits: 0
kstat.zfs.misc.arcstats.prefetch_data_misses: 0
kstat.zfs.misc.arcstats.prefetch_metadata_hits: 30
kstat.zfs.misc.arcstats.prefetch_metadata_misses: 0
kstat.zfs.misc.arcstats.mru_hits: 5601378
kstat.zfs.misc.arcstats.mru_ghost_hits: 199211
kstat.zfs.misc.arcstats.mfu_hits: 5326980
kstat.zfs.misc.arcstats.mfu_ghost_hits: 482037
kstat.zfs.misc.arcstats.allocated: 2989914
kstat.zfs.misc.arcstats.deleted: 1904492
kstat.zfs.misc.arcstats.stolen: 1272047
kstat.zfs.misc.arcstats.recycle_miss: 464306
kstat.zfs.misc.arcstats.mutex_miss: 658
kstat.zfs.misc.arcstats.evict_skip: 148880
kstat.zfs.misc.arcstats.evict_l2_cached: 0
kstat.zfs.misc.arcstats.evict_l2_eligible: 150970209280
kstat.zfs.misc.arcstats.evict_l2_ineligible: 36864
kstat.zfs.misc.arcstats.hash_

Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Andriy Gapon
on 29/10/2010 18:26 Artemiev Igor said the following:
> On Fri, Oct 29, 2010 at 05:41:59PM +0300, Andriy Gapon wrote:
> 
>> What svn revision of FreeBSD source tree did you test?
> 
> r213936. Revision seems a little old.
> 
>> Ah, I think I see what's going on.
>> Either sendfile should (have an option to) use VOP_GETPAGES to request data 
>> or ZFS
>> mappedread should use vm_grab_page instead of vm_lookup_page for UIO_NOCOPY 
>> case.
>> Currently ZFS would read a whole FS block into ARC, but populate only one 
>> page
>> with data and for the rest it would just wastefully do uiomove(UIO_NOCOPY) 
>> from
>> ARC data.
>> So, e.g. zpool iostat would show that there are only few actual reads from a 
>> pool.
>>  The rest of the time must be spent churning over the data already in ARC and
>> doing page-per-VOP_READ copies from it.
> I can test it, but what allocflags? VM_ALLOC_RETRY|VM_ALLOC_NORMAL?

Probably yes, but you have to be careful there.
First, do vm_page_grab only for the UIO_NOCOPY case.
Second, the first page is already "shared busy" after the vm_page_io_start() call
in kern_sendfile, so you might need VM_ALLOC_IGN_SBUSY for that page to avoid a
deadlock.

I think that it may be good to separate UIO_NOCOPY/sendfile case from mappedread
into a function of its own.


P.S. doing VOP_GETPAGES instead of vn_rdwr() in kern_sendfile() might be a 
better
idea still.  But there are some additional details to that, e.g. a mount/fs flag
to tell which mechanism is preferred.  Because, as I've been told, vn_rdwr() has
better performance than VOP_GETPAGES.  Although, I don't understand why it
could/should be that way.

-- 
Andriy Gapon


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Artemiev Igor
On Fri, Oct 29, 2010 at 05:41:59PM +0300, Andriy Gapon wrote:

> What svn revision of FreeBSD source tree did you test?

r213936. Revision seems a little old.

> Ah, I think I see what's going on.
> Either sendfile should (have an option to) use VOP_GETPAGES to request data 
> or ZFS
> mappedread should use vm_grab_page instead of vm_lookup_page for UIO_NOCOPY 
> case.
> Currently ZFS would read a whole FS block into ARC, but populate only one page
> with data and for the rest it would just wastefully do uiomove(UIO_NOCOPY) 
> from
> ARC data.
> So, e.g. zpool iostat would show that there are only few actual reads from a 
> pool.
>  The rest of the time must be spent churning over the data already in ARC and
> doing page-per-VOP_READ copies from it.
I can test it, but what allocflags? VM_ALLOC_RETRY|VM_ALLOC_NORMAL?


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Andriy Gapon
on 29/10/2010 18:17 Kostik Belousov said the following:
> On Fri, Oct 29, 2010 at 06:05:26PM +0300, Andriy Gapon wrote:
>> on 29/10/2010 17:53 Kostik Belousov said the following:
>>> Could it be the priming of the vm object pages content ?
>>
>> Sorry, not familiar with this term.
>> Do you mean prepopulation of vm object with valid pages?
>>
>>> Due to double-buffering, and (possibly false) optimization to only
>>
>> What optimization?
> On zfs vnode read, the page from the corresponding vm object is only
> populated with the vnode data if the page already exists in the
> object.

Do you mean a specific type of read?
For "normal" reads it's the other way around - if the page already exists and is
valid, then we read from the page, not from ARC.

> Not doing the optimization would be to allocate the page unconditionally
> on the read if not already present, and copy the data from ARC to the page.
>>
>>> perform double-buffering when vm object already has some data cached,
>>> reads can prime vm object page list before file is mmapped or
>>> sendfile-ed.
>>>
>>
>> No double-buffering is done to optimize anything. Double-buffering
>> is a consequence of having page cache and ARC. The special
>> "double-buffering code" is to just handle that fact - e.g. making
>> sure that VOP_READ reads data from page cache instead of ARC if it's
>> possible that the data in them differs (i.e. page cache has more
>> recent data).
>>
>> So, if I understood the term 'priming' correctly, no priming should
>> ever occur.
> The priming is done on the first call to VOP_READ() with the right
> offset after the page is allocated.

Again, what is priming?

-- 
Andriy Gapon


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Kostik Belousov
On Fri, Oct 29, 2010 at 06:05:26PM +0300, Andriy Gapon wrote:
> on 29/10/2010 17:53 Kostik Belousov said the following:
> > Could it be the priming of the vm object pages content ?
> 
> Sorry, not familiar with this term.
> Do you mean prepopulation of vm object with valid pages?
> 
> > Due to double-buffering, and (possibly false) optimization to only
> 
> What optimization?
On zfs vnode read, the page from the corresponding vm object is only
populated with the vnode data if the page already exists in the
object.

Not doing the optimization would be to allocate the page unconditionally
on the read if not already present, and copy the data from ARC to the page.
> 
> > perform double-buffering when vm object already has some data cached,
> > reads can prime vm object page list before file is mmapped or
> > sendfile-ed.
> > 
> 
> No double-buffering is done to optimize anything. Double-buffering
> is a consequence of having page cache and ARC. The special
> "double-buffering code" is to just handle that fact - e.g. making
> sure that VOP_READ reads data from page cache instead of ARC if it's
> possible that the data in them differs (i.e. page cache has more
> recent data).
>
> So, if I understood the term 'priming' correctly, no priming should
> ever occur.
The priming is done on the first call to VOP_READ() with the right
offset after the page is allocated.
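
One way to observe the effect described here from userland, assuming nothing
beyond standard FreeBSD interfaces: read a file once with read(2), then mmap(2)
it and ask mincore(2) how many of its pages are resident in the vnode's VM
object. On a filesystem that populates the page cache on read, most pages show
up resident; if the read only primes pages that already existed, the count stays
low. A rough sketch (the path comes from the command line; error handling is
trimmed):

#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    char buf[65536], *vec;
    struct stat st;
    size_t npages, resident, i;
    long pgsz;
    void *p;
    int fd;

    if (argc != 2)
        return (1);
    if ((fd = open(argv[1], O_RDONLY)) == -1 || fstat(fd, &st) == -1)
        return (1);

    /* Read the whole file once through the regular read(2) path. */
    while (read(fd, buf, sizeof(buf)) > 0)
        ;

    pgsz = sysconf(_SC_PAGESIZE);
    npages = (st.st_size + pgsz - 1) / pgsz;
    p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    vec = malloc(npages);
    if (p == MAP_FAILED || vec == NULL ||
        mincore(p, (size_t)st.st_size, vec) == -1)
        return (1);

    /* Count pages the VM object already holds; mmap itself faults nothing in. */
    for (resident = 0, i = 0; i < npages; i++)
        if (vec[i] & MINCORE_INCORE)
            resident++;
    printf("%zu of %zu pages resident after read(2)\n", resident, npages);
    return (0);
}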




Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Andriy Gapon
on 29/10/2010 17:53 Kostik Belousov said the following:
> Could it be the priming of the vm object pages content ?

Sorry, not familiar with this term.
Do you mean prepopulation of vm object with valid pages?

> Due to double-buffering, and (possibly false) optimization to only

What optimization?

> perform double-buffering when vm object already has some data cached,
> reads can prime vm object page list before file is mmapped or
> sendfile-ed.
> 

No double-buffering is done to optimize anything.  Double-buffering is a
consequence of having page cache and ARC.  The special "double-buffering code" is
to just handle that fact - e.g. making sure that VOP_READ reads data from page
cache instead of ARC if it's possible that the data in them differs (i.e. page
cache has more recent data).

So, if I understood the term 'priming' correctly, no priming should ever occur.

-- 
Andriy Gapon


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Kostik Belousov
On Fri, Oct 29, 2010 at 06:31:21PM +0400, Alexander Zagrebin wrote:
> > > I've tried the nginx with
> > > disabled sendfile (the nginx.conf contains "sendfile off;"):
> > > 
> > > $ dd if=/dev/random of=test bs=1m count=100
> > > 100+0 records in
> > > 100+0 records out
> > > 104857600 bytes transferred in 5.892504 secs (17795083 bytes/sec)
> > > $ fetch -o /dev/null http://localhost/test
> > > /dev/null 100% of  100 MB   41 MBps
> > > $ fetch -o /dev/null http://localhost/test
> > > /dev/null 100% of  100 MB   44 MBps
> > > $ fetch -o /dev/null http://localhost/test
> > > /dev/null 100% of  100 MB   44 MBps
> > > 
> > 
> > I am really surprised with such a bad performance of sendfile.
> > Will you be able to profile the issue further?
> 
> Yes.
> 
> > I will also try to think of some measurements.
> 
> The transfer rate is too low for the _first_ attempt only.
> Further attempts demonstrate a reasonable transfer rate.
> For example, nginx with "sendfile on;":
> 
> $ dd if=/dev/random of=test bs=1m count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes transferred in 5.855305 secs (17908136 bytes/sec)
> $ fetch -o /dev/null http://localhost/test
> /dev/null   3% of  100 MB  118 kBps
> 13m50s^C
> fetch: transfer interrupted
> $ fetch -o /dev/null http://localhost/test
> /dev/null 100% of  100 MB   39 MBps
> 
> If the file has not been accessed for some time, then everything
> repeats:
> The first attempt - the transfer rate is too low
> Further attempts - no problems
> 
> Can you reproduce the problem on your system?

Could it be the priming of the vm object pages content ?
Due to double-buffering, and (possibly false) optimization to only
perform double-buffering when vm object already has some data cached,
reads can prime vm object page list before file is mmapped or
sendfile-ed.





Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Jeremy Chadwick
On Fri, Oct 29, 2010 at 06:31:21PM +0400, Alexander Zagrebin wrote:
> > > I've tried the nginx with
> > > disabled sendfile (the nginx.conf contains "sendfile off;"):
> > > 
> > > $ dd if=/dev/random of=test bs=1m count=100
> > > 100+0 records in
> > > 100+0 records out
> > > 104857600 bytes transferred in 5.892504 secs (17795083 bytes/sec)
> > > $ fetch -o /dev/null http://localhost/test
> > > /dev/null 100% of  100 MB   41 MBps
> > > $ fetch -o /dev/null http://localhost/test
> > > /dev/null 100% of  100 MB   44 MBps
> > > $ fetch -o /dev/null http://localhost/test
> > > /dev/null 100% of  100 MB   44 MBps
> > > 
> > 
> > I am really surprised with such a bad performance of sendfile.
> > Will you be able to profile the issue further?
> 
> Yes.
> 
> > I will also try to think of some measurements.
> 
> The transfer rate is too low for the _first_ attempt only.
> Further attempts demonstrate a reasonable transfer rate.
> For example, nginx with "sendfile on;":
> 
> $ dd if=/dev/random of=test bs=1m count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes transferred in 5.855305 secs (17908136 bytes/sec)
> $ fetch -o /dev/null http://localhost/test
> /dev/null   3% of  100 MB  118 kBps
> 13m50s^C
> fetch: transfer interrupted
> $ fetch -o /dev/null http://localhost/test
> /dev/null 100% of  100 MB   39 MBps
> 
> If the file has not been accessed for some time, then everything
> repeats:
> The first attempt - the transfer rate is too low
> Further attempts - no problems
> 
> Can you reproduce the problem on your system?

I can't reproduce it on mine.  Note the resilvering was induced from
some unrelated disk swaps/tests I was performing, and ftpd is already
enabled via inetd on this system.

icarus# uname -a
FreeBSD icarus.home.lan 8.1-STABLE FreeBSD 8.1-STABLE #0: Sat Oct 16 07:10:54 PDT 2010 r...@icarus.home.lan:/usr/obj/usr/src/sys/X7SBA_RELENG_8_amd64  amd64
icarus# df -k
Filesystem   1024-blocks  Used Avail Capacity  Mounted on
/dev/ada0s1a 101297445180848013048%/
devfs  1 1 0   100%/dev
/dev/ada0s1d12186190103986  11107310 1%/var
/dev/ada0s1e 4058062  5468   3727950 0%/tmp
/dev/ada0s1f 8395622   1918300   580567425%/usr
data/cvs   686338517   289 686338228 0%/cvs
data/home  687130693792465 686338228 0%/home
data/storage   957080511 270742283 68633822828%/storage
icarus# zpool status
  pool: data
 state: ONLINE
 scrub: resilver completed after 0h43m with 0 errors on Sun Oct 17 10:11:19 2010
config:

NAMESTATE READ WRITE CKSUM
dataONLINE   0 0 0
  mirrorONLINE   0 0 0
ada1ONLINE   0 0 0
ada2ONLINE   0 0 0  258G resilvered

errors: No known data errors

icarus# pw useradd ftp -g users -u 2000 -s /bin/csh
icarus# mkdir /home/ftp
icarus# chown ftp:users /home/ftp
icarus# cd /home/ftp
icarus# dd if=/dev/urandom of=test bs=1m count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 1.384421 secs (75741116 bytes/sec)
icarus# chown ftp:users test
icarus# ls -l test
-rw-r--r--  1 ftp  users  104857600 Oct 29 07:41 test
icarus# date ; fetch -o /dev/null ftp://localhost/test
Fri Oct 29 07:45:47 PDT 2010
/dev/null 100% of  100 MB  174 MBps
icarus# date ; fetch -o /dev/null ftp://localhost/test
Fri Oct 29 07:45:48 PDT 2010
/dev/null 100% of  100 MB  156 MBps
icarus# date ; fetch -o /dev/null ftp://localhost/test
Fri Oct 29 07:45:49 PDT 2010
/dev/null 100% of  100 MB  170 MBps
icarus# date ; fetch -o /dev/null ftp://localhost/test
Fri Oct 29 07:45:50 PDT 2010
/dev/null 100% of  100 MB  155 MBps
icarus# date ; fetch -o /dev/null ftp://localhost/test
Fri Oct 29 07:45:52 PDT 2010
/dev/null 100% of  100 MB  151 MBps

icarus# dd if=/dev/urandom of=test2 bs=1m count=500
500+0 records in
500+0 records out
524288000 bytes transferred in 6.947780 secs (75461228 bytes/sec)
icarus# chown ftp:users test2
icarus# ls -l test2
-rw-r--r--  1 ftp  users  524288000 Oct 29 07:46 test2
icarus# date ; fetch -o /dev/null ftp://localhost/test2
Fri Oct 29 07:47:19 PDT 2010
/dev/null 100% of  500 MB  148 MBps
icarus# date ; fetch -o /dev/null ftp://localhost/test2
Fri Oct 29 07:47:24 PDT 2010
/dev/null 100% of  500 MB  175 MBps
icarus# date ; fetch -o /dev/null ftp://localhost/test2
Fri Oct 29 07:47:30 PDT 2010
/dev/null 100% of  500 MB  164 MBps

What ZFS tunin

Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Andriy Gapon
on 29/10/2010 15:36 Andriy Gapon said the following:
> on 29/10/2010 12:04 Artemiev Igor said the following:
>> Yep, this problem exists. You may workaround it via bumping up
>> net.inet.tcp.sendspace up to 128k.  zfs sendfile is very ineffective. I have
>> made a small investigation via DTrace, it reads MAXBSIZE chunks, but map in 
>> vm
>> only one page (4K).  I.e. if you have a file with size 512K, sendfile make
>> calls freebsd_zfs_read 128 times.
> 
> What svn revision of FreeBSD source tree did you test?
> 

Ah, I think I see what's going on.
Either sendfile should (have an option to) use VOP_GETPAGES to request data, or
ZFS mappedread should use vm_grab_page instead of vm_lookup_page for the
UIO_NOCOPY case.
Currently ZFS would read a whole FS block into ARC, but populate only one page
with data; for the rest it would just wastefully do uiomove(UIO_NOCOPY) from
ARC data.
So, e.g. zpool iostat would show that there are only a few actual reads from the
pool.  The rest of the time must be spent churning over the data already in ARC
and doing page-per-VOP_READ copies from it.
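
A back-of-envelope model of that cost, as a small stand-alone C program; the
4 KiB page size and 128 KiB ZFS recordsize are assumed values for
illustration, and the 512 KiB file size is the example from earlier in the
thread:

/*
 * Back-of-envelope model of the behaviour described above: sendfile drives
 * one VOP_READ per page, while the pool itself is read only once per FS
 * block; the rest of the work churns over data already in ARC.
 */
#include <stdio.h>

int
main(void)
{
        const long pagesize = 4L * 1024;          /* assumed 4 KiB pages */
        const long recordsize = 128L * 1024;      /* assumed default ZFS recordsize */
        const long filesize = 512L * 1024;        /* the 512K example file */

        long vop_reads = filesize / pagesize;     /* per-page VOP_READ calls */
        long pool_reads = filesize / recordsize;  /* blocks actually read into ARC */

        printf("VOP_READ calls issued by sendfile: %ld\n", vop_reads);
        printf("FS blocks read from the pool:      %ld\n", pool_reads);
        printf("ARC re-walks per block:            %ld\n",
            vop_reads / pool_reads);
        return (0);
}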

-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


RE: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Alexander Zagrebin
> > I've tried the nginx with
> > disabled sendfile (the nginx.conf contains "sendfile off;"):
> > 
> > $ dd if=/dev/random of=test bs=1m count=100
> > 100+0 records in
> > 100+0 records out
> > 104857600 bytes transferred in 5.892504 secs (17795083 bytes/sec)
> > $ fetch -o /dev/null http://localhost/test
> > /dev/null 100% of  100 MB   41 MBps
> > $ fetch -o /dev/null http://localhost/test
> > /dev/null 100% of  100 MB   44 MBps
> > $ fetch -o /dev/null http://localhost/test
> > /dev/null 100% of  100 MB   44 MBps
> > 
> 
> I am really surprised with such a bad performance of sendfile.
> Will you be able to profile the issue further?

Yes.

> I will also try to think of some measurements.

The transfer rate is too low for the _first_ attempt only.
Further attempts demonstrate a reasonable transfer rate.
For example, nginx with "sendfile on;":

$ dd if=/dev/random of=test bs=1m count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 5.855305 secs (17908136 bytes/sec)
$ fetch -o /dev/null http://localhost/test
/dev/null   3% of  100 MB  118 kBps
13m50s^C
fetch: transfer interrupted
$ fetch -o /dev/null http://localhost/test
/dev/null 100% of  100 MB   39 MBps

If the file has not been accessed for some time, then everything
repeats:
The first attempt - the transfer rate is too low
Further attempts - no problems

Can you reproduce the problem on your system?

-- 
Alexander Zagrebin

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Andriy Gapon
on 29/10/2010 16:14 Alexander Zagrebin said the following:
>>> I've noticed that ZFS on 8.1-STABLE still has problems with sendfile.
>>
>> Which svn revision, just in case?
> 
> 8.1-STABLE
> The source tree was updated 2010-10-27

OK, good.

>>> When accessing a file at first time the transfer speed is too low, but
>>> on following attempts the transfer speed is normal.
>>>
>>> How to repeat:
>>>
>>> $ dd if=/dev/random of=/tmp/test bs=1m count=100
>>> 100+0 records in
>>> 100+0 records out
>>> 104857600 bytes transferred in 5.933945 secs (17670807 bytes/sec)
>>> $ sudo env LC_ALL=C /usr/libexec/ftpd -D
>>>
>>> The first attempt to fetch file:
>>>
>>> $ fetch -o /dev/null ftp://localhost/tmp/test
>>> /dev/null   1% of  100 MB  118 kBps
>>> 14m07s^C
>>> fetch: transfer interrupted
>>>
>>> The transfer rate is too low (approx. 120 kBps), but any subsequent attempts
>>> are success:
>>>
>>> $ fetch -o /dev/null ftp://localhost/tmp/test
>>> /dev/null 100% of  100 MB   42 MBps
>>> $ fetch -o /dev/null ftp://localhost/tmp/test
>>> /dev/null 100% of  100 MB   47 MBps
>>
>> Can you do an experiment with the same structure but sendfile 
>> excluded?
> 
> IMHO, ftpd doesn't have an option to disable sendfile.

Seems so.
The source could be hacked (unconditional goto oldway in libexec/ftpd/ftpd.c), but
anyway.

> I've tried the nginx with
> disabled sendfile (the nginx.conf contains "sendfile off;"):
> 
> $ dd if=/dev/random of=test bs=1m count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes transferred in 5.892504 secs (17795083 bytes/sec)
> $ fetch -o /dev/null http://localhost/test
> /dev/null 100% of  100 MB   41 MBps
> $ fetch -o /dev/null http://localhost/test
> /dev/null 100% of  100 MB   44 MBps
> $ fetch -o /dev/null http://localhost/test
> /dev/null 100% of  100 MB   44 MBps
> 

I am really surprised by such bad performance of sendfile.
Will you be able to profile the issue further?
I will also try to think of some measurements.

-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


RE: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Alexander Zagrebin
> > I've noticed that ZFS on 8.1-STABLE still has problems with sendfile.
> 
> Which svn revision, just in case?

8.1-STABLE
The source tree was updated 2010-10-27

> > When accessing a file at first time the transfer speed is too low, but
> > on following attempts the transfer speed is normal.
> > 
> > How to repeat:
> > 
> > $ dd if=/dev/random of=/tmp/test bs=1m count=100
> > 100+0 records in
> > 100+0 records out
> > 104857600 bytes transferred in 5.933945 secs (17670807 bytes/sec)
> > $ sudo env LC_ALL=C /usr/libexec/ftpd -D
> > 
> > The first attempt to fetch file:
> > 
> > $ fetch -o /dev/null ftp://localhost/tmp/test
> > /dev/null   1% of  100 MB  118 kBps
> > 14m07s^C
> > fetch: transfer interrupted
> > 
> > The transfer rate is too low (approx. 120 kBps), but any subsequent attempts
> > are success:
> > 
> > $ fetch -o /dev/null ftp://localhost/tmp/test
> > /dev/null 100% of  100 MB   42 MBps
> > $ fetch -o /dev/null ftp://localhost/tmp/test
> > /dev/null 100% of  100 MB   47 MBps
> 
> Can you do an experiment with the same structure but sendfile 
> excluded?

IMHO, ftpd doesn't have an option to disable sendfile. I've tried nginx with
disabled sendfile (the nginx.conf contains "sendfile off;"):

$ dd if=/dev/random of=test bs=1m count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 5.892504 secs (17795083 bytes/sec)
$ fetch -o /dev/null http://localhost/test
/dev/null 100% of  100 MB   41 MBps
$ fetch -o /dev/null http://localhost/test
/dev/null 100% of  100 MB   44 MBps
$ fetch -o /dev/null http://localhost/test
/dev/null 100% of  100 MB   44 MBps

-- 
Alexander Zagrebin

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Andriy Gapon
on 29/10/2010 12:04 Artemiev Igor said the following:
> Yep, this problem exists. You may workaround it via bumping up
> net.inet.tcp.sendspace up to 128k.  zfs sendfile is very ineffective. I have
> made a small investigation via DTrace, it reads MAXBSIZE chunks, but map in vm
> only one page (4K).  I.e. if you have a file with size 512K, sendfile make
> calls freebsd_zfs_read 128 times.

What svn revision of FreeBSD source tree did you test?

-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Andriy Gapon
on 28/10/2010 08:57 Alexander Zagrebin said the following:
> Hi!
> 
> I've noticed that ZFS on 8.1-STABLE still has problems with sendfile.

Which svn revision, just in case?

> When accessing a file at first time the transfer speed is too low, but
> on following attempts the transfer speed is normal.
> 
> How to repeat:
> 
> $ dd if=/dev/random of=/tmp/test bs=1m count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes transferred in 5.933945 secs (17670807 bytes/sec)
> $ sudo env LC_ALL=C /usr/libexec/ftpd -D
> 
> The first attempt to fetch file:
> 
> $ fetch -o /dev/null ftp://localhost/tmp/test
> /dev/null   1% of  100 MB  118 kBps
> 14m07s^C
> fetch: transfer interrupted
> 
> The transfer rate is too low (approx. 120 kBps), but any subsequent attempts
> are success:
> 
> $ fetch -o /dev/null ftp://localhost/tmp/test
> /dev/null 100% of  100 MB   42 MBps
> $ fetch -o /dev/null ftp://localhost/tmp/test
> /dev/null 100% of  100 MB   47 MBps

Can you do an experiment with the same structure but sendfile excluded?

-- 
Andriy Gapon
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"


Re: 8.1-STABLE: zfs and sendfile: problem still exists

2010-10-29 Thread Artemiev Igor
On Thu, Oct 28, 2010 at 09:57:22AM +0400, Alexander Zagrebin wrote:
> Hi!
> 
> I've noticed that ZFS on 8.1-STABLE still has problems with sendfile.
> When accessing a file at first time the transfer speed is too low, but
> on following attempts the transfer speed is normal.
...
> I've tried ftpd and nginx with "sendfile on". The behavior is the same.
> After disabling using sendfile in nginx ("sendfile off") the problem has
> gone.

Yep, this problem exists. You may work around it by bumping
net.inet.tcp.sendspace up to 128k.  zfs sendfile is very ineffective. I have
made a small investigation via DTrace: it reads MAXBSIZE chunks, but maps only
one page (4K) into the vm object.  I.e. if you have a file of size 512K,
sendfile makes 128 calls to freebsd_zfs_read.
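
For reference, a minimal C sketch of applying the sysctl workaround mentioned
above via sysctlbyname(3); the equivalent one-liner is
"sysctl net.inet.tcp.sendspace=131072", and in practice that is what one would
run.  Error messages and the choice of 128 KiB are illustrative; setting the
value requires root.

/*
 * Minimal sketch: read net.inet.tcp.sendspace and then raise it to 128 KiB.
 */
#include <sys/types.h>
#include <sys/sysctl.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
        const char *name = "net.inet.tcp.sendspace";
        unsigned int cur, want = 128 * 1024;
        size_t len = sizeof(cur);

        if (sysctlbyname(name, &cur, &len, NULL, 0) == -1) {
                fprintf(stderr, "read %s: %s\n", name, strerror(errno));
                return (1);
        }
        printf("%s is currently %u\n", name, cur);

        if (sysctlbyname(name, NULL, NULL, &want, sizeof(want)) == -1) {
                fprintf(stderr, "set %s: %s\n", name, strerror(errno));
                return (1);
        }
        printf("%s set to %u\n", name, want);
        return (0);
}
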
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"