Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-12 Thread John Baldwin
On Wednesday, September 10, 2014 11:29:21 PM Slawa Olhovchenkov wrote:
 On Wed, Sep 10, 2014 at 02:29:30PM -0400, John Baldwin wrote:
   Application must change behaviour when reach limit (run GC, switch
   algorithm and etc.).
   But mmap of big data file for scaning and processing don't touch this
   limit.
   Must be mmap of some temoprary file touch this limit? I don't know.
   Must be MAP_ANON touch this limit? I think yes.
   How to make distinction of mmap data file for processing and mmap
   temporary file for bypass this limit? I don't know.
  
  Consider also mmap() of shm_open(SHM_ANON).
 
 No, shm limited separatly. mmap of shm_open(SHM_ANON) don't need to adjust
 this limit.

It would be another trivial way of escaping RLIMIT_DATA however.

   PS: question about jemalloc. How I can stop jemalloc to give back
   memory to OS for some application?
  
  You would have to hack pages_purge() in chunk_mmap.c to always return true
  and not call madvise().
 
 And got memory leak?

No, that doesn't leak memory.  jemalloc() considers that address range free 
regardless and will always reuse it for a future allocation.  This is just a 
hint to let the VM system take back the backing pages because they are 
currently not in use by the application, and if the address is reused, the 
current contents of the pages are not important (can be thrown away).

 I am talk about change behaviour from periodic mmap and madvise memory
 to only mmap when need and leave for future use when freed by
 aplication. I think (may be wrong -- please correct) in some cases
 mmap/madvise will be too often.

The madvise() does not cause a future mmap().  You would have to munmap() to 
require a future mmap().  Instead, the madvise() can result in the page being 
released back to the system.  Later when that address is reused by jemalloc, 
the process would fault on the address requiring the VM to locate a new page.  
Disabling the madvise() only prevents that fault from occurring (However, 
leaving all the pages around and marked as in-use may increase memory pressure 
and cause you to swap and still have to fault the page back in in the future, 
but at greater expense.)

-- 
John Baldwin
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-10 Thread John Baldwin
On Tuesday, September 09, 2014 02:25:18 PM Slawa Olhovchenkov wrote:
 On Mon, Sep 08, 2014 at 11:17:51AM -0400, John Baldwin wrote:
  On Sunday, September 07, 2014 04:56:49 PM Slawa Olhovchenkov wrote:
   PS: very bad that 'data limit' don't anymore reflect application
   memory consumer. and very small application can adapt to 'no memory'
   from system.
  
  You can use RLIMIT_AS instead of RLIMIT_DATA.  jemalloc can also be
  configured to use sbrk(), though I think there's no way to prevent it
  from falling back to mmap(MAP_ANON) if sbrk() fails.
 
 Formally you are right. Realy this is scorn. And I don't know how
 rightly. May be account in RLIMIT_DATA MAP_ANON? Anyway firefox don't
 run GC if malloc fail and this is bad.

It was not intended as scorn, simply stating what options you have
available as alternatives to what you are asking for.  The Firefox
behavior certainly seems unfortunate. :(  However, using RLIMIT_AS
for that should work as the mmap() call malloc() makes will fail
causing malloc() to fail.  However, you need RLIMIT_AS to be a bit
larger than what you would set RLIMIT_DATA to.

As to having MAP_ANON honor RLIMIT_DATA, I'm not sure how well that
would work.  I assume that was discussed as an option in the previous
threads on the topic of RLIMIT_DATA being useless with jemalloc.
I don't recall it though.  Perhaps Alan can speak to whether or not
that is a good idea?

-- 
John Baldwin
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-10 Thread Slawa Olhovchenkov
On Wed, Sep 10, 2014 at 10:44:46AM -0400, John Baldwin wrote:

 On Tuesday, September 09, 2014 02:25:18 PM Slawa Olhovchenkov wrote:
  On Mon, Sep 08, 2014 at 11:17:51AM -0400, John Baldwin wrote:
   On Sunday, September 07, 2014 04:56:49 PM Slawa Olhovchenkov wrote:
PS: very bad that 'data limit' don't anymore reflect application
memory consumer. and very small application can adapt to 'no memory'
from system.
   
   You can use RLIMIT_AS instead of RLIMIT_DATA.  jemalloc can also be
   configured to use sbrk(), though I think there's no way to prevent it
   from falling back to mmap(MAP_ANON) if sbrk() fails.
  
  Formally you are right. Realy this is scorn. And I don't know how
  rightly. May be account in RLIMIT_DATA MAP_ANON? Anyway firefox don't
  run GC if malloc fail and this is bad.
 
 It was not intended as scorn, simply stating what options you have

Hmm, this is may bad english and try to translate some russian sentence.
I am try to do some tuning for satisfaction me.
You proposal do some limiting, but don't delight.

 available as alternatives to what you are asking for.  The Firefox
 behavior certainly seems unfortunate. :(  However, using RLIMIT_AS
 for that should work as the mmap() call malloc() makes will fail
 causing malloc() to fail.  However, you need RLIMIT_AS to be a bit
 larger than what you would set RLIMIT_DATA to.
 
 As to having MAP_ANON honor RLIMIT_DATA, I'm not sure how well that
 would work.  I assume that was discussed as an option in the previous
 threads on the topic of RLIMIT_DATA being useless with jemalloc.
 I don't recall it though.  Perhaps Alan can speak to whether or not
 that is a good idea?

What is sense of RLIMIT_DATA (as I understand)?
This is signaling to application of memory consumption.
Limiting RSS don't visible to application.
Application must change behaviour when reach limit (run GC, switch
algorithm and etc.).
But mmap of big data file for scaning and processing don't touch this
limit.
Must be mmap of some temoprary file touch this limit? I don't know.
Must be MAP_ANON touch this limit? I think yes.
How to make distinction of mmap data file for processing and mmap
temporary file for bypass this limit? I don't know.

And again, most application don't handle correctly NULL of malloc().
I think this is depreciate of all.

PS: question about jemalloc. How I can stop jemalloc to give back
memory to OS for some application?
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-10 Thread John Baldwin
On Wednesday, September 10, 2014 08:30:58 PM Slawa Olhovchenkov wrote:
 On Wed, Sep 10, 2014 at 10:44:46AM -0400, John Baldwin wrote:
  On Tuesday, September 09, 2014 02:25:18 PM Slawa Olhovchenkov wrote:
   On Mon, Sep 08, 2014 at 11:17:51AM -0400, John Baldwin wrote:
On Sunday, September 07, 2014 04:56:49 PM Slawa Olhovchenkov wrote:
 PS: very bad that 'data limit' don't anymore reflect application
 memory consumer. and very small application can adapt to 'no memory'
 from system.

You can use RLIMIT_AS instead of RLIMIT_DATA.  jemalloc can also be
configured to use sbrk(), though I think there's no way to prevent it
from falling back to mmap(MAP_ANON) if sbrk() fails.
   
   Formally you are right. Realy this is scorn. And I don't know how
   rightly. May be account in RLIMIT_DATA MAP_ANON? Anyway firefox don't
   run GC if malloc fail and this is bad.
  
  It was not intended as scorn, simply stating what options you have
 
 Hmm, this is may bad english and try to translate some russian sentence.
 I am try to do some tuning for satisfaction me.
 You proposal do some limiting, but don't delight.

Ok.
 
  available as alternatives to what you are asking for.  The Firefox
  behavior certainly seems unfortunate. :(  However, using RLIMIT_AS
  for that should work as the mmap() call malloc() makes will fail
  causing malloc() to fail.  However, you need RLIMIT_AS to be a bit
  larger than what you would set RLIMIT_DATA to.
  
  As to having MAP_ANON honor RLIMIT_DATA, I'm not sure how well that
  would work.  I assume that was discussed as an option in the previous
  threads on the topic of RLIMIT_DATA being useless with jemalloc.
  I don't recall it though.  Perhaps Alan can speak to whether or not
  that is a good idea?
 
 What is sense of RLIMIT_DATA (as I understand)?
 This is signaling to application of memory consumption.

Yes, traditionally it only affected malloc() and not other portions of the 
address space like the stack (RLIMIT_STACK) or anything in shared libraries 
(including text/data/bss).

 Limiting RSS don't visible to application.

Yes, RLIMIT_RSS is not often useful because it is not synchronous with the
RSS changing.

 Application must change behaviour when reach limit (run GC, switch
 algorithm and etc.).
 But mmap of big data file for scaning and processing don't touch this
 limit.
 Must be mmap of some temoprary file touch this limit? I don't know.
 Must be MAP_ANON touch this limit? I think yes.
 How to make distinction of mmap data file for processing and mmap
 temporary file for bypass this limit? I don't know.

Consider also mmap() of shm_open(SHM_ANON).

 And again, most application don't handle correctly NULL of malloc().
 I think this is depreciate of all.

This is true, but it seems Firefox might, and you could set RLIMIT_AS
for Firefox in particular so it gets adequate feedback.  I do think if
you can determine the correct value for RLIMIT_AS for Firefox that
mmap(MAP_ANON) would fail and result in NULL from malloc().

 PS: question about jemalloc. How I can stop jemalloc to give back
 memory to OS for some application?

You would have to hack pages_purge() in chunk_mmap.c to always return true and 
not call madvise().

-- 
John Baldwin
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-10 Thread Slawa Olhovchenkov
On Wed, Sep 10, 2014 at 02:29:30PM -0400, John Baldwin wrote:

  Application must change behaviour when reach limit (run GC, switch
  algorithm and etc.).
  But mmap of big data file for scaning and processing don't touch this
  limit.
  Must be mmap of some temoprary file touch this limit? I don't know.
  Must be MAP_ANON touch this limit? I think yes.
  How to make distinction of mmap data file for processing and mmap
  temporary file for bypass this limit? I don't know.
 
 Consider also mmap() of shm_open(SHM_ANON).

No, shm limited separatly. mmap of shm_open(SHM_ANON) don't need to adjust
this limit.

  And again, most application don't handle correctly NULL of malloc().
  I think this is depreciate of all.
 
 This is true, but it seems Firefox might, and you could set RLIMIT_AS
 for Firefox in particular so it gets adequate feedback.  I do think if
 you can determine the correct value for RLIMIT_AS for Firefox that
 mmap(MAP_ANON) would fail and result in NULL from malloc().

I am do some time ago firefox limiting by ulimit -v, firefox just
crashing. Don't run GC or something else.

  PS: question about jemalloc. How I can stop jemalloc to give back
  memory to OS for some application?
 
 You would have to hack pages_purge() in chunk_mmap.c to always return true 
 and 
 not call madvise().

And got memory leak?
I am talk about change behaviour from periodic mmap and madvise memory
to only mmap when need and leave for future use when freed by
aplication. I think (may be wrong -- please correct) in some cases
mmap/madvise will be too often.

___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-09 Thread Slawa Olhovchenkov
On Mon, Sep 08, 2014 at 11:17:51AM -0400, John Baldwin wrote:

 On Sunday, September 07, 2014 04:56:49 PM Slawa Olhovchenkov wrote:
  PS: very bad that 'data limit' don't anymore reflect application
  memory consumer. and very small application can adapt to 'no memory'
  from system.
 
 You can use RLIMIT_AS instead of RLIMIT_DATA.  jemalloc can also be 
 configured 
 to use sbrk(), though I think there's no way to prevent it from falling back 
 to mmap(MAP_ANON) if sbrk() fails.

Formally you are right. Realy this is scorn. And I don't know how
rightly. May be account in RLIMIT_DATA MAP_ANON? Anyway firefox don't
run GC if malloc fail and this is bad.

___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-08 Thread John Baldwin
On Sunday, September 07, 2014 04:56:49 PM Slawa Olhovchenkov wrote:
 PS: very bad that 'data limit' don't anymore reflect application
 memory consumer. and very small application can adapt to 'no memory'
 from system.

You can use RLIMIT_AS instead of RLIMIT_DATA.  jemalloc can also be configured 
to use sbrk(), though I think there's no way to prevent it from falling back 
to mmap(MAP_ANON) if sbrk() fails.

-- 
John Baldwin
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-08 Thread Slawa Olhovchenkov
On Fri, Sep 05, 2014 at 12:23:26PM +0300, Andriy Gapon wrote:

 on 04/09/2014 04:18 Steven Hartland said the following:
  Indeed that would be interesting, but we might find that its quite memory 
  size
  dependent given the scaling so confirming HW details would be nice too.
  
  I'd also be interested to know who wins the free race between the VM and ARC
  when using that value.
 
 BTW, I've written a small silly program that tests for a problem that affected
 me in the distant past: http://people.freebsd.org/~avg/arc-vs-swap.c
 
 It gobbles almost all of the memory and then just sits on it never accessing 
 it
 again.  At the same time it repeatedly reads blocks of data from a large file.
 The idea is that eventually the unused memory should be pushed out to the swap
 and the ARC is allowed to grow to accommodate for the data being read.
 
 I run this program on a freshly booted system without any other applications.
 Prior to r270759 the system behaves as expected.  Although the pace of 
 shifting
 balance between the ARC and the swap-backed pages is quite slow.
 After r270759 and with the default tuning the ARC always sits at its minimum
 size.  To me this is a regression.
 
 To summarize: I really appreciate the improvements that you are making here
 https://reviews.freebsd.org/D702
 Thanks!
 
 P.S.
 I wish there was an easy way to make the page cache and the ARC aware of each 
 other.

I think no single way for any workload.
For some workloads ARC is prefered.
For some -- RSS is prefered.
May be need some tunable for elastics factor ARC/RSS?

PS: very bad that 'data limit' don't anymore reflect application
memory consumer. and very small application can adapt to 'no memory'
from system.
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-08 Thread Nikolai Lifanov

On 2014-09-03 21:18, Steven Hartland wrote:

- Original Message - From: Andriy Gapon a...@freebsd.org



on 03/09/2014 23:22 Nikolai Lifanov said the following:

On 09/03/14 15:22, John Baldwin wrote:

On Wednesday, September 03, 2014 11:05:04 AM Nikolai Lifanov wrote:

On 09/03/14 04:09, Steven Hartland wrote:

I'm looking to MFC this change so wanted to check if
anyone had an final feedback / objections?

I know we currently have Alan's feedback on changing
the #ifdef __i386__ to #ifndef UMA_MD_SMALL_ALLOC
which sounds sensible but waiting Peter to comment on.

   Regards
   Steve


I have no technical input, but this change improves ARC usefulness 
for

me quite a bit. I would like to see the improvement in 10-STABLE.


Can you verify that the current 10-STABLE (as of today) with all the 
various pagedaemon fixes still has ARC issues for your workload?




It doesn't have any issues, but I noticed the improvement on CURRENT. 
I

observed that just after this change, my package builder is much more
likely to retain MFU and not evict useful things from there (the port
tree) after large builds.
However, I run a lot more 10.0-RELEASE than CURRENT and I would like 
to

see this improvement release-bound.

I would be happy to test this on 10-STABLE if you think that this is
relevant.



As noted before, unfortunately, this commit (plus its fixups) contains 
at least
two related but distinct changes.  So, to separate the wheat from the 
chaff,

could you please try to comment out the following block in
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, function 
arc_reclaim_needed:


   if (kmem_free_count()  zfs_arc_free_target) {
   DTRACE_PROBE2(arc__reclaim_freetarget, uint64_t,
   kmem_free_count(), uint64_t, zfs_arc_free_target);
   return (1);
   }

Alternatively, I think that the same effect can be achieved by setting 
sysctl

vfs.zfs.arc_free_target to the same value as vm.stats.vm.v_free_min.


Thats correct that would achieve the same thing.

It's interesting to me whether you would still see the better 
performance or if

that improvement would be undone.


Indeed that would be interesting, but we might find that its quite 
memory size

dependent given the scaling so confirming HW details would be nice too.

I'd also be interested to know who wins the free race between the VM 
and ARC

when using that value.

For those following this thread but not the review, I've added some 
additional

information there which you might be interested in:
https://reviews.freebsd.org/D702

   Regards
   Steve


I had time to re-test both the stock condition after the improvements 
and the condition in which 
vfs.zfs.arc_free_target=vm.stats.vm.v_free_min. It seems that MFU is 
more likely to be reduced in the second case.


- Nikolai Lifanov
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-05 Thread Andriy Gapon
on 04/09/2014 04:18 Steven Hartland said the following:
 Indeed that would be interesting, but we might find that its quite memory size
 dependent given the scaling so confirming HW details would be nice too.
 
 I'd also be interested to know who wins the free race between the VM and ARC
 when using that value.

BTW, I've written a small silly program that tests for a problem that affected
me in the distant past: http://people.freebsd.org/~avg/arc-vs-swap.c

It gobbles almost all of the memory and then just sits on it never accessing it
again.  At the same time it repeatedly reads blocks of data from a large file.
The idea is that eventually the unused memory should be pushed out to the swap
and the ARC is allowed to grow to accommodate for the data being read.

I run this program on a freshly booted system without any other applications.
Prior to r270759 the system behaves as expected.  Although the pace of shifting
balance between the ARC and the swap-backed pages is quite slow.
After r270759 and with the default tuning the ARC always sits at its minimum
size.  To me this is a regression.

To summarize: I really appreciate the improvements that you are making here
https://reviews.freebsd.org/D702
Thanks!

P.S.
I wish there was an easy way to make the page cache and the ARC aware of each 
other.
-- 
Andriy Gapon
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-04 Thread Nikolai Lifanov
On 09/03/14 21:18, Steven Hartland wrote:
 
 - Original Message - From: Andriy Gapon a...@freebsd.org
 
 
 on 03/09/2014 23:22 Nikolai Lifanov said the following:
 On 09/03/14 15:22, John Baldwin wrote:
 On Wednesday, September 03, 2014 11:05:04 AM Nikolai Lifanov wrote:
 On 09/03/14 04:09, Steven Hartland wrote:
 I'm looking to MFC this change so wanted to check if
 anyone had an final feedback / objections?

 I know we currently have Alan's feedback on changing
 the #ifdef __i386__ to #ifndef UMA_MD_SMALL_ALLOC
 which sounds sensible but waiting Peter to comment on.

Regards
Steve

 I have no technical input, but this change improves ARC usefulness for
 me quite a bit. I would like to see the improvement in 10-STABLE.

 Can you verify that the current 10-STABLE (as of today) with all the
 various pagedaemon fixes still has ARC issues for your workload?


 It doesn't have any issues, but I noticed the improvement on CURRENT. I
 observed that just after this change, my package builder is much more
 likely to retain MFU and not evict useful things from there (the port
 tree) after large builds.
 However, I run a lot more 10.0-RELEASE than CURRENT and I would like to
 see this improvement release-bound.

 I would be happy to test this on 10-STABLE if you think that this is
 relevant.


 As noted before, unfortunately, this commit (plus its fixups) contains
 at least
 two related but distinct changes.  So, to separate the wheat from the
 chaff,
 could you please try to comment out the following block in
 sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, function
 arc_reclaim_needed:

if (kmem_free_count()  zfs_arc_free_target) {
DTRACE_PROBE2(arc__reclaim_freetarget, uint64_t,
kmem_free_count(), uint64_t, zfs_arc_free_target);
return (1);
}

 Alternatively, I think that the same effect can be achieved by setting
 sysctl
 vfs.zfs.arc_free_target to the same value as vm.stats.vm.v_free_min.
 
 Thats correct that would achieve the same thing.
 
 It's interesting to me whether you would still see the better
 performance or if
 that improvement would be undone.
 
 Indeed that would be interesting, but we might find that its quite
 memory size
 dependent given the scaling so confirming HW details would be nice too.
 
 I'd also be interested to know who wins the free race between the VM and
 ARC
 when using that value.
 
 For those following this thread but not the review, I've added some
 additional
 information there which you might be interested in:
 https://reviews.freebsd.org/D702
 
Regards
Steve

Just an update: I'm in the middle of testing this. I have to finish a
large bulk build to observe the behavior one way or another.

I have 32G of physical memory and 2x16G dedicated swap SSDs (L2ARC
wasn't very useful, but I should probably retest this) on this machine.
My ARC is usually at 14G with ~5G of MFU full of things I benefit from
keeping there (port trees, base jails). Builds themselves happen in
tmpfs and I usually have around 1.5G - 4G Free memory (unless building
something like pypy).

- Nikolai Lifanov
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-03 Thread Steven Hartland
- Original Message - 
From: Andriy Gapon a...@freebsd.org



on 02/09/2014 20:43 Steven Hartland said the following:

- Original Message - From: Andriy Gapon a...@freebsd.org

And the newly added kmem_foo() functions probably do not belong in
cddl/compat/opensolaris as Solaris / illumos does not have those functions.


They could be moved but their current location keeps all the kmem
related functions neatly in one place. Spreading them around IMO
would just make things hard to find.


BTW, here is some of my old WIP that completely removed the pre-existing kmem_*
functions and made the related code much closer to that in illumos.
Unfortunately, I will now have hard time merging my changes with your change.

https://github.com/avg-I/freebsd/compare/wip/hc/kmem_size-memguard-fix
(esp.commits e0cf2f7 and becf087)


Looking good, I'm all for eliminating the differences between the two
code bases as that will make things easier to maintain in the future,
so something we should definitely try to get in for 11.

   Regards
   Steve
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-03 Thread Andriy Gapon
on 03/09/2014 11:09 Steven Hartland said the following:
 I'm looking to MFC this change so wanted to check if
 anyone had an final feedback / objections?

I think that your changes went in a bit prematurely (little review), so perhaps
MFC would be premature as well.

 I know we currently have Alan's feedback on changing
 the #ifdef __i386__ to #ifndef UMA_MD_SMALL_ALLOC
 which sounds sensible but waiting Peter to comment on.
 
Regards
Steve
 


-- 
Andriy Gapon
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-03 Thread Steven Hartland
- Original Message - 
From: Andriy Gapon a...@freebsd.org



on 03/09/2014 11:09 Steven Hartland said the following:

I'm looking to MFC this change so wanted to check if
anyone had an final feedback / objections?


I think that your changes went in a bit prematurely (little review), so perhaps
MFC would be premature as well.


Its a change which really needs to make it into 10.1 IMO
due to its impact on users so I don't really want to hold
off too long.

If anyone has any substantiated reason to then off course
I'll hold off.

   Regards
   Steve
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-03 Thread Andriy Gapon
on 03/09/2014 15:17 Steven Hartland said the following:
 - Original Message - From: Andriy Gapon a...@freebsd.org
 
 on 03/09/2014 11:09 Steven Hartland said the following:
 I'm looking to MFC this change so wanted to check if
 anyone had an final feedback / objections?

 I think that your changes went in a bit prematurely (little review), so 
 perhaps
 MFC would be premature as well.
 
 Its a change which really needs to make it into 10.1 IMO

I think that this is the arguable point.
As I've mentioned before I have not noticed, perhaps through the fault of my
own, any reports that users need this change after Alan's pagedaemon fix(es).
Also, there is no confirmation yet that after this change ARC does not give up
its buffers too easily.

 due to its impact on users so I don't really want to hold
 off too long.
 
 If anyone has any substantiated reason to then off course
 I'll hold off.

-- 
Andriy Gapon
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-03 Thread Nikolai Lifanov
On 09/03/14 04:09, Steven Hartland wrote:
 I'm looking to MFC this change so wanted to check if
 anyone had an final feedback / objections?
 
 I know we currently have Alan's feedback on changing
 the #ifdef __i386__ to #ifndef UMA_MD_SMALL_ALLOC
 which sounds sensible but waiting Peter to comment on.
 
Regards
Steve

I have no technical input, but this change improves ARC usefulness for
me quite a bit. I would like to see the improvement in 10-STABLE.

- Nikolai Lifanov

___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-03 Thread Andriy Gapon
on 03/09/2014 17:58 Andriy Gapon said the following:
 on 03/09/2014 15:17 Steven Hartland said the following:
 - Original Message - From: Andriy Gapon a...@freebsd.org

 on 03/09/2014 11:09 Steven Hartland said the following:
 I'm looking to MFC this change so wanted to check if
 anyone had an final feedback / objections?

 I think that your changes went in a bit prematurely (little review), so 
 perhaps
 MFC would be premature as well.

 Its a change which really needs to make it into 10.1 IMO
 
 I think that this is the arguable point.
 As I've mentioned before I have not noticed, perhaps through the fault of my
 own, any reports that users need this change after Alan's pagedaemon fix(es).
 Also, there is no confirmation yet that after this change ARC does not give up
 its buffers too easily.
 
 due to its impact on users so I don't really want to hold
 off too long.

 If anyone has any substantiated reason to then off course
 I'll hold off.

Based on our parallel conversation I feel a need to clarify my position.
The commit in question has multiple changes in it:
1. removal of KVA check, which was later correctly restored for i386
2. addition of DTrace probes
3. zfs_arc_free_target check and all the support code for it

So, #1 plus later fixes is obviously correct.
#2 is useful and I like it.
#3 is what I have great doubts about.  All of what I said in the previous emails
applies to #3 exclusively.

-- 
Andriy Gapon
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-03 Thread John Baldwin
On Wednesday, September 03, 2014 11:05:04 AM Nikolai Lifanov wrote:
 On 09/03/14 04:09, Steven Hartland wrote:
  I'm looking to MFC this change so wanted to check if
  anyone had an final feedback / objections?
  
  I know we currently have Alan's feedback on changing
  the #ifdef __i386__ to #ifndef UMA_MD_SMALL_ALLOC
  which sounds sensible but waiting Peter to comment on.
  
 Regards
 Steve
 
 I have no technical input, but this change improves ARC usefulness for
 me quite a bit. I would like to see the improvement in 10-STABLE.

Can you verify that the current 10-STABLE (as of today) with all the various 
pagedaemon fixes still has ARC issues for your workload?

-- 
John Baldwin
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-03 Thread Nikolai Lifanov
On 09/03/14 15:22, John Baldwin wrote:
 On Wednesday, September 03, 2014 11:05:04 AM Nikolai Lifanov wrote:
 On 09/03/14 04:09, Steven Hartland wrote:
 I'm looking to MFC this change so wanted to check if
 anyone had an final feedback / objections?

 I know we currently have Alan's feedback on changing
 the #ifdef __i386__ to #ifndef UMA_MD_SMALL_ALLOC
 which sounds sensible but waiting Peter to comment on.

Regards
Steve

 I have no technical input, but this change improves ARC usefulness for
 me quite a bit. I would like to see the improvement in 10-STABLE.
 
 Can you verify that the current 10-STABLE (as of today) with all the various 
 pagedaemon fixes still has ARC issues for your workload?
 

It doesn't have any issues, but I noticed the improvement on CURRENT. I
observed that just after this change, my package builder is much more
likely to retain MFU and not evict useful things from there (the port
tree) after large builds.
However, I run a lot more 10.0-RELEASE than CURRENT and I would like to
see this improvement release-bound.

I would be happy to test this on 10-STABLE if you think that this is
relevant.

- Nikolai Lifanov

___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-03 Thread Andriy Gapon
on 03/09/2014 23:22 Nikolai Lifanov said the following:
 On 09/03/14 15:22, John Baldwin wrote:
 On Wednesday, September 03, 2014 11:05:04 AM Nikolai Lifanov wrote:
 On 09/03/14 04:09, Steven Hartland wrote:
 I'm looking to MFC this change so wanted to check if
 anyone had an final feedback / objections?

 I know we currently have Alan's feedback on changing
 the #ifdef __i386__ to #ifndef UMA_MD_SMALL_ALLOC
 which sounds sensible but waiting Peter to comment on.

Regards
Steve

 I have no technical input, but this change improves ARC usefulness for
 me quite a bit. I would like to see the improvement in 10-STABLE.

 Can you verify that the current 10-STABLE (as of today) with all the various 
 pagedaemon fixes still has ARC issues for your workload?

 
 It doesn't have any issues, but I noticed the improvement on CURRENT. I
 observed that just after this change, my package builder is much more
 likely to retain MFU and not evict useful things from there (the port
 tree) after large builds.
 However, I run a lot more 10.0-RELEASE than CURRENT and I would like to
 see this improvement release-bound.
 
 I would be happy to test this on 10-STABLE if you think that this is
 relevant.


As noted before, unfortunately, this commit (plus its fixups) contains at least
two related but distinct changes.  So, to separate the wheat from the chaff,
could you please try to comment out the following block in
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, function 
arc_reclaim_needed:

if (kmem_free_count()  zfs_arc_free_target) {
DTRACE_PROBE2(arc__reclaim_freetarget, uint64_t,
kmem_free_count(), uint64_t, zfs_arc_free_target);
return (1);
}

Alternatively, I think that the same effect can be achieved by setting sysctl
vfs.zfs.arc_free_target to the same value as vm.stats.vm.v_free_min.

It's interesting to me whether you would still see the better performance or if
that improvement would be undone.

Thanks!
-- 
Andriy Gapon
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-03 Thread Steven Hartland


- Original Message - 
From: Andriy Gapon a...@freebsd.org




on 03/09/2014 23:22 Nikolai Lifanov said the following:

On 09/03/14 15:22, John Baldwin wrote:

On Wednesday, September 03, 2014 11:05:04 AM Nikolai Lifanov wrote:

On 09/03/14 04:09, Steven Hartland wrote:

I'm looking to MFC this change so wanted to check if
anyone had an final feedback / objections?

I know we currently have Alan's feedback on changing
the #ifdef __i386__ to #ifndef UMA_MD_SMALL_ALLOC
which sounds sensible but waiting Peter to comment on.

   Regards
   Steve


I have no technical input, but this change improves ARC usefulness for
me quite a bit. I would like to see the improvement in 10-STABLE.


Can you verify that the current 10-STABLE (as of today) with all the various 
pagedaemon fixes still has ARC issues for your workload?




It doesn't have any issues, but I noticed the improvement on CURRENT. I
observed that just after this change, my package builder is much more
likely to retain MFU and not evict useful things from there (the port
tree) after large builds.
However, I run a lot more 10.0-RELEASE than CURRENT and I would like to
see this improvement release-bound.

I would be happy to test this on 10-STABLE if you think that this is
relevant.



As noted before, unfortunately, this commit (plus its fixups) contains at least
two related but distinct changes.  So, to separate the wheat from the chaff,
could you please try to comment out the following block in
sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c, function 
arc_reclaim_needed:

   if (kmem_free_count()  zfs_arc_free_target) {
   DTRACE_PROBE2(arc__reclaim_freetarget, uint64_t,
   kmem_free_count(), uint64_t, zfs_arc_free_target);
   return (1);
   }

Alternatively, I think that the same effect can be achieved by setting sysctl
vfs.zfs.arc_free_target to the same value as vm.stats.vm.v_free_min.


Thats correct that would achieve the same thing.


It's interesting to me whether you would still see the better performance or if
that improvement would be undone.


Indeed that would be interesting, but we might find that its quite memory size
dependent given the scaling so confirming HW details would be nice too.

I'd also be interested to know who wins the free race between the VM and ARC
when using that value.

For those following this thread but not the review, I've added some additional
information there which you might be interested in:
https://reviews.freebsd.org/D702

   Regards
   Steve
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-02 Thread John Baldwin
On Saturday, August 30, 2014 1:37:43 pm Peter Wemm wrote:
 On Saturday 30 August 2014 02:03:42 Steven Hartland wrote:
 I'm very disappointed in the attention to detail and errors in the commit.  
 I'm almost at the point where I want to ask for the whole thing to be backed 
 out.

I would not be too supportive of that.  This PR has been open for a long, long 
time with many users using patches from it in production loads that were 
greatly improved by the changes and clamoring on the lists multiple times to 
get someone to look at it.  avg@ contributed quite a bit of time to diagnose 
this with Karl early on, but other developers aside from Steven did not.  It 
also was not hard to explain to Karl the meaning of 'cache + free' in the bug 
follow-ups itself (though I believe avg@ had tried this before and it didn't 
sink in that time for some reason).

I know Steven has since committed a fix, but if there are still concerns, I 
think it would be best to not just revert this entirely but to spend some time  
fixing the remaining issues.  Clearly this issue affects a lot of users and 
the earlier fixes to pagedaemon were not sufficient to fix their issues alone.

-- 
John Baldwin
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-02 Thread Andriy Gapon
on 02/09/2014 19:01 John Baldwin said the following:
 I know Steven has since committed a fix, but if there are still concerns, I 
 think it would be best to not just revert this entirely but to spend some 
 time  
 fixing the remaining issues.  Clearly this issue affects a lot of users and 
 the earlier fixes to pagedaemon were not sufficient to fix their issues alone.

I am not sure that that is the case.  I could have very well missed an evidence
of that, but then I'd appreciate a pointer or two to such reports.
I am certainly sure that a large number of reports about ZFS vs swapping issue
appeared after the pagedaemon problem was introduced.

I have also missed any theoretical justification for the patch.  That is, an
explanation of how the patch interacts with the pagedaemon and improves things.
The empirical evidence could be insufficient, because it's easy to tilt the
balance such that the ARC gives in too easily.  But people who were affected by
the opposite problem could be different from people who would be affected by the
new problem, because of differences in system characteristics such as amount of
RAM, workload patterns, working set sizes, etc.

Having said that, I do not ask for the changes to be reverted, but I'll probably
get back after I have sufficient time to look at the patch and also to test its
effect on my systems.  This might not be very soon though.

P.S. I think that there was no technical reason to initialize the newly
introduced parameters via SYSINIT mechanism.  I think that the initialization
could be just done in arc_init.  And the newly added kmem_foo() functions
probably do not belong in cddl/compat/opensolaris as Solaris / illumos does not
have those functions.  I think that in this case e.g. vm_cnt.v_free_target can
just be used directly by the FreeBSD-specific ARC code.
-- 
Andriy Gapon
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-02 Thread Alan Cox
On 09/02/2014 11:01, John Baldwin wrote:
 On Saturday, August 30, 2014 1:37:43 pm Peter Wemm wrote:
 On Saturday 30 August 2014 02:03:42 Steven Hartland wrote:
 I'm very disappointed in the attention to detail and errors in the commit.  
 I'm almost at the point where I want to ask for the whole thing to be backed 
 out.
 I would not be too supportive of that.  This PR has been open for a long, 
 long 
 time with many users using patches from it in production loads that were 
 greatly improved by the changes and clamoring on the lists multiple times to 
 get someone to look at it.  avg@ contributed quite a bit of time to diagnose 
 this with Karl early on, but other developers aside from Steven did not.  It 
 also was not hard to explain to Karl the meaning of 'cache + free' in the bug 
 follow-ups itself (though I believe avg@ had tried this before and it didn't 
 sink in that time for some reason).

 I know Steven has since committed a fix, but if there are still concerns, I 
 think it would be best to not just revert this entirely but to spend some 
 time  
 fixing the remaining issues.  Clearly this issue affects a lot of users and 
 the earlier fixes to pagedaemon were not sufficient to fix their issues alone.


The patch actually makes two completely orthogonal changes at once, and
one of those changes has no connection to the page daemon.  I suspect
that is why some people have said that their issues were not addressed
by the page daemon fix.

Prior to this patch, we were limiting the ARC size to 3/4 the kmem
map/arena/object size on all architectures, including 64-bit,
uma_small_alloc-enabled architectures where such a limit makes no
sense.  Consequently, some people were complaining, Why is 1/4 of my
memory going unused?


___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-02 Thread Steven Hartland
- Original Message - 
From: Alan Cox a...@rice.edu



On 09/02/2014 11:01, John Baldwin wrote:

On Saturday, August 30, 2014 1:37:43 pm Peter Wemm wrote:

On Saturday 30 August 2014 02:03:42 Steven Hartland wrote:
I'm very disappointed in the attention to detail and errors in the commit.  
I'm almost at the point where I want to ask for the whole thing to be backed 
out.
I would not be too supportive of that.  This PR has been open for a long, long 
time with many users using patches from it in production loads that were 
greatly improved by the changes and clamoring on the lists multiple times to 
get someone to look at it.  avg@ contributed quite a bit of time to diagnose 
this with Karl early on, but other developers aside from Steven did not.  It 
also was not hard to explain to Karl the meaning of 'cache + free' in the bug 
follow-ups itself (though I believe avg@ had tried this before and it didn't 
sink in that time for some reason).


I know Steven has since committed a fix, but if there are still concerns, I 
think it would be best to not just revert this entirely but to spend some time  
fixing the remaining issues.  Clearly this issue affects a lot of users and 
the earlier fixes to pagedaemon were not sufficient to fix their issues alone.




The patch actually makes two completely orthogonal changes at once, and
one of those changes has no connection to the page daemon.  I suspect
that is why some people have said that their issues were not addressed
by the page daemon fix.

Prior to this patch, we were limiting the ARC size to 3/4 the kmem
map/arena/object size on all architectures, including 64-bit,
uma_small_alloc-enabled architectures where such a limit makes no
sense.  Consequently, some people were complaining, Why is 1/4 of my
memory going unused?


This is exactly the problem which lead me into investigating the issue.

It should be noted that for i386, as requested by Peter, this limitation
has been re-applied.

   Regards
   Steve
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-02 Thread Steven Hartland
- Original Message - 
From: Andriy Gapon a...@freebsd.org



on 02/09/2014 19:01 John Baldwin said the following:
I know Steven has since committed a fix, but if there are still 
concerns, I
think it would be best to not just revert this entirely but to spend 
some time
fixing the remaining issues.  Clearly this issue affects a lot of 
users and
the earlier fixes to pagedaemon were not sufficient to fix their 
issues alone.


I am not sure that that is the case.  I could have very well missed an 
evidence

of that, but then I'd appreciate a pointer or two to such reports.
I am certainly sure that a large number of reports about ZFS vs 
swapping issue

appeared after the pagedaemon problem was introduced.

I have also missed any theoretical justification for the patch. 
That is, an
explanation of how the patch interacts with the pagedaemon and 
improves things.
The empirical evidence could be insufficient, because it's easy to 
tilt the
balance such that the ARC gives in too easily.  But people who were 
affected by
the opposite problem could be different from people who would be 
affected by the
new problem, because of differences in system characteristics such as 
amount of

RAM, workload patterns, working set sizes, etc.

Having said that, I do not ask for the changes to be reverted, but 
I'll probably
get back after I have sufficient time to look at the patch and also to 
test its

effect on my systems.  This might not be very soon though.

P.S. I think that there was no technical reason to initialize the 
newly

introduced parameters via SYSINIT mechanism.


I created the SYSINT as the values are not available early enough
for the sysctl.


I think that the initialization could be just done in arc_init.


I thought I'd tested this when doing the initial implementation
but it was a long week so I just did a quick re-test and confirmed
arc_init is called before the pagedaemon is initialised so
it can't be used in this case.

On a side note it would be nice if ARC sysctls, which are currently
done via arc_init, where cleaned up as currently the limits aren't
enforced correctly when manually changed, as well as being split
up into two locations making them hard to follow. Something I intend
to look at when I get some free time.


And the newly added kmem_foo() functions probably do not belong in
cddl/compat/opensolaris as Solaris / illumos does not have those 
functions.


They could be moved but their current location keeps all the kmem
related functions neatly in one place. Spreading them around IMO
would just make things hard to find.


I think that in this case e.g. vm_cnt.v_free_target can just be
used directly by the FreeBSD-specific ARC code.


It could but as above keeping everything in one place makes it to
find and hence MFC as this area has seen changes which will require
all those fields renamed. It also means if the logic for free pages
changes in the future there's only one place it needs to be changed.

For those interested there's also an open review on additional
changes in this area: https://reviews.freebsd.org/D702

   Regards
   Steve 


___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-02 Thread Alan Cox
On 09/02/2014 12:34, Steven Hartland wrote:
 - Original Message - From: Alan Cox a...@rice.edu

 On 09/02/2014 11:01, John Baldwin wrote:
 On Saturday, August 30, 2014 1:37:43 pm Peter Wemm wrote:
 On Saturday 30 August 2014 02:03:42 Steven Hartland wrote:
 I'm very disappointed in the attention to detail and errors in the
 commit.  I'm almost at the point where I want to ask for the whole
 thing to be backed out.
 I would not be too supportive of that.  This PR has been open for a
 long, long time with many users using patches from it in production
 loads that were greatly improved by the changes and clamoring on the
 lists multiple times to get someone to look at it.  avg@ contributed
 quite a bit of time to diagnose this with Karl early on, but other
 developers aside from Steven did not.  It also was not hard to
 explain to Karl the meaning of 'cache + free' in the bug follow-ups
 itself (though I believe avg@ had tried this before and it didn't
 sink in that time for some reason).

 I know Steven has since committed a fix, but if there are still
 concerns, I think it would be best to not just revert this entirely
 but to spend some time  fixing the remaining issues.  Clearly this
 issue affects a lot of users and the earlier fixes to pagedaemon
 were not sufficient to fix their issues alone.


 The patch actually makes two completely orthogonal changes at once, and
 one of those changes has no connection to the page daemon.  I suspect
 that is why some people have said that their issues were not addressed
 by the page daemon fix.

 Prior to this patch, we were limiting the ARC size to 3/4 the kmem
 map/arena/object size on all architectures, including 64-bit,
 uma_small_alloc-enabled architectures where such a limit makes no
 sense.  Consequently, some people were complaining, Why is 1/4 of my
 memory going unused?

 This is exactly the problem which lead me into investigating the issue.



Is there any evidence that anything other than eliminating the KVA limit
is needed on machines where the page daemon isn't broken?


 It should be noted that for i386, as requested by Peter, this limitation
 has been re-applied.



Unlike Solaris, we run on a few 32-bit architectures, besides i386, that
don't have a direct map or a full 4GB address space for the kernel.  So,
for FreeBSD, this needs to be more general than just i386.  I would
suggest using '#ifndef UMA_MD_SMALL_ALLOC' as being the closest thing
that we have to what you want here.  This check will allow any machine,
including 32-bit machines that allocate some kernel memory through a
direct map, to opt out of the kmem map/arena/object limit.

Finally, the Solaris KVA check is written to avoid the possibility of
integer overflow.  However, the FreeBSD version is not.  For example,
suppose that I setup an i386 machine with something approaching a
2GB/2GB user/kernel split, 3 * kmem_size() could overflow.

___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-02 Thread Steven Hartland
- Original Message - 
From: Alan Cox a...@rice.edu

On 09/02/2014 12:34, Steven Hartland wrote:

- Original Message - From: Alan Cox a...@rice.edu

The patch actually makes two completely orthogonal changes at once, and
one of those changes has no connection to the page daemon.  I suspect
that is why some people have said that their issues were not addressed
by the page daemon fix.

Prior to this patch, we were limiting the ARC size to 3/4 the kmem
map/arena/object size on all architectures, including 64-bit,
uma_small_alloc-enabled architectures where such a limit makes no
sense.  Consequently, some people were complaining, Why is 1/4 of my
memory going unused?


This is exactly the problem which lead me into investigating the issue.



Is there any evidence that anything other than eliminating the KVA limit
is needed on machines where the page daemon isn't broken?


In my direct experience, I would have to say no, I can't speak for others.


It should be noted that for i386, as requested by Peter, this limitation
has been re-applied.



Unlike Solaris, we run on a few 32-bit architectures, besides i386, that
don't have a direct map or a full 4GB address space for the kernel.  So,
for FreeBSD, this needs to be more general than just i386.  I would
suggest using '#ifndef UMA_MD_SMALL_ALLOC' as being the closest thing
that we have to what you want here.  This check will allow any machine,
including 32-bit machines that allocate some kernel memory through a
direct map, to opt out of the kmem map/arena/object limit.


I'm not and don't profess to be an expert in this domain as you
know ;-) So I'm to defer to you guys, Peter would you agree with
this?


Finally, the Solaris KVA check is written to avoid the possibility of
integer overflow.  However, the FreeBSD version is not.  For example,
suppose that I setup an i386 machine with something approaching a
2GB/2GB user/kernel split, 3 * kmem_size() could overflow.


The returns of said functions are uint64_t so I'm not sure where
the overflow would be when working with 2GB values?

We seem to be missing some of the following, which doesn't impact
us directly but may affect understanding of the current illumos
code as its not in the tree:
https://github.com/illumos/illumos-gate/commit/94dd93aee32d1616436eb51fb7b58094b9a8d3e8

   Regards
   Steve
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-09-02 Thread Alan Cox
On 09/02/2014 14:09, Steven Hartland wrote:
 - Original Message - From: Alan Cox a...@rice.edu
 On 09/02/2014 12:34, Steven Hartland wrote:
 - Original Message - From: Alan Cox a...@rice.edu
 The patch actually makes two completely orthogonal changes at once,
 and
 one of those changes has no connection to the page daemon.  I suspect
 that is why some people have said that their issues were not addressed
 by the page daemon fix.

 Prior to this patch, we were limiting the ARC size to 3/4 the kmem
 map/arena/object size on all architectures, including 64-bit,
 uma_small_alloc-enabled architectures where such a limit makes no
 sense.  Consequently, some people were complaining, Why is 1/4 of my
 memory going unused?

 This is exactly the problem which lead me into investigating the issue.


 Is there any evidence that anything other than eliminating the KVA limit
 is needed on machines where the page daemon isn't broken?

 In my direct experience, I would have to say no, I can't speak for
 others.

 It should be noted that for i386, as requested by Peter, this
 limitation
 has been re-applied.


 Unlike Solaris, we run on a few 32-bit architectures, besides i386, that
 don't have a direct map or a full 4GB address space for the kernel.  So,
 for FreeBSD, this needs to be more general than just i386.  I would
 suggest using '#ifndef UMA_MD_SMALL_ALLOC' as being the closest thing
 that we have to what you want here.  This check will allow any machine,
 including 32-bit machines that allocate some kernel memory through a
 direct map, to opt out of the kmem map/arena/object limit.

 I'm not and don't profess to be an expert in this domain as you
 know ;-) So I'm to defer to you guys, Peter would you agree with
 this?

 Finally, the Solaris KVA check is written to avoid the possibility of
 integer overflow.  However, the FreeBSD version is not.  For example,
 suppose that I setup an i386 machine with something approaching a
 2GB/2GB user/kernel split, 3 * kmem_size() could overflow.

 The returns of said functions are uint64_t so I'm not sure where
 the overflow would be when working with 2GB values?



Ah, I didn't see the uint64_t.  I only looked at the calculation and
assumed vm_offset_t was being used.  So, yes, you are safe from overflow.


 We seem to be missing some of the following, which doesn't impact
 us directly but may affect understanding of the current illumos
 code as its not in the tree:
 https://github.com/illumos/illumos-gate/commit/94dd93aee32d1616436eb51fb7b58094b9a8d3e8


Regards
Steve



___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-08-30 Thread Peter Wemm
On Saturday 30 August 2014 02:03:42 Steven Hartland wrote:
 - Original Message -
 From: Peter Wemm pe...@wemm.org
 
  On Friday 29 August 2014 21:42:15 Steven Hartland wrote:

 
  If this function returns non-zerp, ARC is given back:
  
  static int
  arc_reclaim_needed(void)
  {
  
  if (kmem_free_count()  zfs_arc_free_target) {
  
  return (1);
  
  }
  
   /*
   * Cooperate with pagedaemon when it's time for it to scan
   * and reclaim some pages.
   */
  
  if (vm_paging_needed()) {
  
  return (1);
  
  }
  
  ie: if v_free (ignoring v_cache free pages) gets below the threshold,
  stop
  evertyhing and discard ARC pages.
  
  The vm_paging_needed() code is a NO-OP at this point. It can never
  return
  
  true.  Consider:
  vm_cnt.v_free_target = 4 * vm_cnt.v_free_min +
  
  vm_cnt.v_free_reserved;
  vs
  
  vm_pageout_wakeup_thresh = (vm_cnt.v_free_min / 10) * 11;
  
  zfs_arc_free_target defaults to vm_cnt.v_free_target, which is 400% of
  v_free_min, and compares it against the smaller v_free pool.
  
  vm_paging_needed() compares the total free pool (v_free + v_cache)
  against the
  smaller wakeup threshold - 110% of v_free_min.
  
  Comparing a larger value against a smaller target than the previous
  test will
  never succeed unless you manually change the arc_free_target sysctl.
 
 I'm aware of the values involved, and as I said what you're proposing
 was more akin to where I started, but I was informed that it had already
 been tested and didn't work well.

And Karl also said that his tests are on machines that have no v_cache, so 
he's not testing the scenario.

The code, as written, is wrong.  It's as simple as that.

The logic is wrong.

You've introduced dead code.

Your code changes introduce a scenario that CAUSES one of the very problems 
you're using as a justtification for the changes.

Your own testers have admitted that they don't test the scenario that the 
problem exists with.

  Also, what about the magic numbers here:
  u_int zfs_arc_free_target = (1  19); /* default before pagedaemon
  init only */
 
 That is just a total fall back case and should never be triggered unless
 as the comment states the pagedaemon isn't initialised.
 
  That's half a million pages, or 2GB of physical ram on a 4K page size
  system
  How is this going to work on early boot in the machines in the cluster
  with
  less than 2GB of ram?
 
 Its there to ensure that ARC doesn't run wild ARC for the few
 milliseconds
 / seconds before pagedaemon is initalised.
 
 We can change the value no problem, what would you suggest 116 aka
 256MB?

Please stop picking magic numbers out of thin air.  You are working with file 
system and VM - critical parts of the system.  This is NOT the place to be 
screwing around with things you don't understand.  alc@ was trying to be 
polite.

 Thanks for all the feedback, its great to have my understanding of
 how things work in this area confirmed by those who know.

 Hopefully we'll be able to get to the bottom of this with everyones
 help and get a solid fix for these issues that have plaged 10 into
 10.1 :)

I'm very disappointed in the attention to detail and errors in the commit.  
I'm almost at the point where I want to ask for the whole thing to be backed 
out.

-- 
Peter Wemm - pe...@wemm.org; pe...@freebsd.org; pe...@yahoo-inc.com; KI6FJV
UTF-8: for when a ' or ... just won\342\200\231t do\342\200\246

signature.asc
Description: This is a digitally signed message part.


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-08-30 Thread Steven Hartland
- Original Message - 
From: Peter Wemm pe...@wemm.org

snip...
 I'm aware of the values involved, and as I said what you're 
 proposing
 was more akin to where I started, but I was informed that it had 
 already

 been tested and didn't work well.

And Karl also said that his tests are on machines that have no 
v_cache, so

he's not testing the scenario.

The code, as written, is wrong.  It's as simple as that.

The logic is wrong.

You've introduced dead code.

Your code changes introduce a scenario that CAUSES one of the very 
problems

you're using as a justtification for the changes.

Your own testers have admitted that they don't test the scenario that 
the

problem exists with.


Wooo hold on there, I'm trying to react to feedback and come to a proper
solution. I've already said that my initial version was very much like 
what
I believe your requesting its changed to, but I reacted to feedback on 
that.


Now if that feedback was inaccurate or miss-guided I'm sorry, and I'm
thankfull for your input as the domain expert.

The PR was created quite some time ago, and it wasn't till I encountered 
a
related issue in a live system the other weekend that it got some 
attention.


Its clearly an important issue that needs resolving so lets work 
together

to come up with a fix everyones happy with.


  Also, what about the magic numbers here:
  u_int zfs_arc_free_target = (1  19); /* default before 
  pagedaemon

  init only */

 That is just a total fall back case and should never be triggered 
 unless

 as the comment states the pagedaemon isn't initialised.

  That's half a million pages, or 2GB of physical ram on a 4K page 
  size

  system
  How is this going to work on early boot in the machines in the 
  cluster

  with
  less than 2GB of ram?

 Its there to ensure that ARC doesn't run wild ARC for the few
 milliseconds
 / seconds before pagedaemon is initalised.

 We can change the value no problem, what would you suggest 116 aka
 256MB?

Please stop picking magic numbers out of thin air.  You are working 
with file
system and VM - critical parts of the system.  This is NOT the place 
to be
screwing around with things you don't understand.  alc@ was trying to 
be

polite.


Please help me out here, I'm trying to do the right thing so I'm looking
to you guys for advice.


 Thanks for all the feedback, its great to have my understanding of
 how things work in this area confirmed by those who know.

 Hopefully we'll be able to get to the bottom of this with everyones
 help and get a solid fix for these issues that have plaged 10 into
 10.1 :)

I'm very disappointed in the attention to detail and errors in the 
commit.
I'm almost at the point where I want to ask for the whole thing to be 
backed

out.


I'm not disagreeing with anyone, I'm simply trying to accure the 
information
on how to update this that will result in the best for users, so 
sweeping

statements like this just make me feel bad for trying to help :(

I'm more than happy to address any concerns people have.

I've kicked this off with a webrev https://reviews.freebsd.org/D700

   Regards
   Steve 


___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-08-29 Thread Steven Hartland

On Thursday 28 August 2014 17:30:17 Alan Cox wrote:
 On 08/28/2014 16:15, Matthew D. Fuller wrote:
  On Thu, Aug 28, 2014 at 10:11:39PM +0100 I heard the voice of
 
  Steven Hartland, and lo! it spake thus:
  Its very likely applicable to stable/9 although I've never used 9
  myself, we jumped from 9 direct to 10.
 
  This is actually hitting two different issues from the two bugs:
 
  - 191510 is about ARC isn't greedy enough on huge-memory 
  machines,

 
and from the osreldate that bug was filed on 9.2, so presumably 
  is

applicable.
 
  - 187594 is about ARC is too greedy (probably mostly on 
  not-so-huge

 
machines) and starves/drives the rest of the system into swap. 
  That

I believe came about as a result of some unrelated change in the
10.x stream that upset the previous balance between ARC and the 
  rest

of the VM, so isn't a problem on 9.x.

 10.0 had a bug in the page daemon that was fixed in 10-STABLE about
 three months ago (r265945).  The ARC was not the only thing affected 
 by

this bug.

I'm concerned about potential unintended consequences of this change.

Before, arc reclaim was driven by vm_paging_needed(), which was:
vm_paging_needed(void)
{
return (vm_cnt.v_free_count + vm_cnt.v_cache_count 
vm_pageout_wakeup_thresh);
}

Now it's ignoring the v_cache_count and looking exclusively at 
v_free_count.
cache pages are free pages that just happen to have known contents. 
If I
read this change right, zfs arc will now discard checksummed cache 
pages to

make room for non-checksummed pages:


That test is still there so if it needs to it will still trigger.

However that often a lower level as vm_pageout_wakeup_thresh is only 
110%
of min free, where as zfs_arc_free_target is based of target free which 
is

4 * (min free + reserved).


+   if (kmem_free_count()  zfs_arc_free_target) {
+   return (1);
+   }
...
+kmem_free_count(void)
+{
+   return (vm_cnt.v_free_count);
+}

This seems like a pretty substantial behavior change.  I'm concerned 
that it

doesn't appear to count all the forms of free pages.

I haven't seen the problems with the over-aggressive ARC since the 
page daemon
bug was fixed.  It's been working fine under pretty abusive loads in 
the freebsd

cluster after that fix.


Others have also confirmed that even with r265945 they can still trigger
performance issue.

In addition without it we still have loads of RAM sat their unused, in 
my
particular experience we have 40GB of 192GB sitting their unused and 
that

was with a stable build from last weekend.

With the patch we confirmed that both RAM usage and performance for 
those

seeing that issue are resolved, with no reported regressions.

(I should know better than to fire a reply off before full fact 
checking, but

this commit worries me..)


Not a problem, its great to know people pay attention to changes, and 
raise
their concerns. Always better to have a discussion about potential 
issues

than to wait for a problem to occur.

Hopefully the above gives you some piece of mind, but if you still have 
any

concerns I'm all ears.

Out of interest would it be possible to update machines in the cluster 
to

see how their workload reacts to the change?

   Regards
   Steve 


___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-08-29 Thread Bryan Drewery
On 8/28/2014 6:22 PM, Peter Wemm wrote:
[...]
 vm_paging_needed(void)
 {
 return (vm_cnt.v_free_count + vm_cnt.v_cache_count 
 vm_pageout_wakeup_thresh);
 }

By the way there is a signed to unsigned comparison here.

-- 
Regards,
Bryan Drewery



signature.asc
Description: OpenPGP digital signature


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-08-29 Thread Peter Wemm
On Friday 29 August 2014 11:54:42 Alan Cox wrote:
 On 08/29/2014 03:32, Steven Hartland wrote:
  On Thursday 28 August 2014 17:30:17 Alan Cox wrote:
   On 08/28/2014 16:15, Matthew D. Fuller wrote:
On Thu, Aug 28, 2014 at 10:11:39PM +0100 I heard the voice of

Steven Hartland, and lo! it spake thus:
Its very likely applicable to stable/9 although I've never used 9
myself, we jumped from 9 direct to 10.

This is actually hitting two different issues from the two bugs:

- 191510 is about ARC isn't greedy enough on huge-memory  
  
  machines,
  
  and from the osreldate that bug was filed on 9.2, so presumably

is

  applicable.

- 187594 is about ARC is too greedy (probably mostly on  
  
  not-so-huge
  
  machines) and starves/drives the rest of the system into swap.

That

  I believe came about as a result of some unrelated change in the
  10.x stream that upset the previous balance between ARC and the

rest

  of the VM, so isn't a problem on 9.x.
   
   10.0 had a bug in the page daemon that was fixed in 10-STABLE about
   three months ago (r265945).  The ARC was not the only thing
  
  affected  by
  this bug.
  
  I'm concerned about potential unintended consequences of this change.
  
  Before, arc reclaim was driven by vm_paging_needed(), which was:
  vm_paging_needed(void)
  {
  
  return (vm_cnt.v_free_count + vm_cnt.v_cache_count 
  
  vm_pageout_wakeup_thresh);
  
  }
  
  Now it's ignoring the v_cache_count and looking exclusively at
  v_free_count.
  cache pages are free pages that just happen to have known contents.
  If I
  read this change right, zfs arc will now discard checksummed cache
  pages to
  
  make room for non-checksummed pages:
  That test is still there so if it needs to it will still trigger.
  
  However that often a lower level as vm_pageout_wakeup_thresh is only 110%
  of min free, where as zfs_arc_free_target is based of target free
  which is
  4 * (min free + reserved).
  
  +   if (kmem_free_count()  zfs_arc_free_target) {
  +   return (1);
  +   }
  ...
  +kmem_free_count(void)
  +{
  +   return (vm_cnt.v_free_count);
  +}
  
  This seems like a pretty substantial behavior change.  I'm concerned
  that it
  doesn't appear to count all the forms of free pages.
  
  I haven't seen the problems with the over-aggressive ARC since the
  page daemon
  bug was fixed.  It's been working fine under pretty abusive loads in
  the freebsd
  cluster after that fix.
  
  Others have also confirmed that even with r265945 they can still trigger
  performance issue.
  
  In addition without it we still have loads of RAM sat their unused, in my
  particular experience we have 40GB of 192GB sitting their unused and that
  was with a stable build from last weekend.
 
 The Solaris code only imposed this limit on 32-bit machines where the
 available kernel virtual address space may be much less than the
 available physical memory.  Previously, FreeBSD imposed this limit on
 both 32-bit and 64-bit machines.  Now, it imposes it on neither.  Why
 continue to do this differently from Solaris?

Since the question was asked below, we don't have zfs machines in the cluster 
running i386.  We can barely get them to boot as it is due to kva pressure.  
We have to reduce/cap physical memory and change the user/kernel virtual split 
from 3:1 to 2.5:1.5. 

We do run zfs on small amd64 machines with 2G of ram, but I can't imagine it 
working on the 10G i386 PAE machines that we have.


  With the patch we confirmed that both RAM usage and performance for those
  seeing that issue are resolved, with no reported regressions.
  
  (I should know better than to fire a reply off before full fact
  checking, but
  this commit worries me..)
  
  Not a problem, its great to know people pay attention to changes, and
  raise
  their concerns. Always better to have a discussion about potential issues
  than to wait for a problem to occur.
  
  Hopefully the above gives you some piece of mind, but if you still
  have any
  concerns I'm all ears.
 
 You didn't really address Peter's initial technical issue.  Peter
 correctly observed that cache pages are just another flavor of free
 pages.  Whenever the VM system is checking the number of free pages
 against any of the thresholds, it always uses the sum of v_cache_count
 and v_free_count.  So, to anyone familiar with the VM system, like
 Peter, what you've done, which is to derive a threshold from
 v_free_target but only compare v_free_count to that threshold, looks
 highly suspect.

I think I'd like to see something like this:

Index: cddl/compat/opensolaris/kern/opensolaris_kmem.c
===
--- cddl/compat/opensolaris/kern/opensolaris_kmem.c (revision 270824)
+++ cddl/compat/opensolaris/kern/opensolaris_kmem.c (working copy)
@@ -152,7 +152,8 @@
 

Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-08-29 Thread Steven Hartland
- Original Message - 
From: Alan Cox a...@rice.edu



You didn't really address Peter's initial technical issue.  Peter
correctly observed that cache pages are just another flavor of free
pages.  Whenever the VM system is checking the number of free pages
against any of the thresholds, it always uses the sum of v_cache_count
and v_free_count.  So, to anyone familiar with the VM system, like
Peter, what you've done, which is to derive a threshold from
v_free_target but only compare v_free_count to that threshold, looks
highly suspect.

That said, I can easily believe that your patch works better than the
existing code, because it is closer in spirit to my interpretation of
what the Solaris code does.  Specifically, I believe that the Solaris
code starts trimming the ARC before the Solaris page daemon starts
writing dirty pages to secondary storage.  Now, you've made FreeBSD do
the same.  However, you've expressed it in a way that looks broken.

To wrap up, I think that you can easily write this in a way that
simultaneously behaves like Solaris and doesn't look wrong to a VM 
expert.


More details in my last reply on this but in short it seems this has
already been tried and it didn't work.

I'd be interested in what domain experts think about why that is?

In the mean time I've asked Karl to see if he could retest with
this change to confirm counting cache pages along with free does
indeed still cause a problem.

For those that want to catch up on what has already tested see the
original PR:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594

   Regards
   Steve 


___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-08-29 Thread Steven Hartland
- Original Message - 
From: Steven Hartland s...@freebsd.org



- Original Message - 
From: Alan Cox a...@rice.edu



You didn't really address Peter's initial technical issue.  Peter
correctly observed that cache pages are just another flavor of free
pages.  Whenever the VM system is checking the number of free pages
against any of the thresholds, it always uses the sum of 
v_cache_count

and v_free_count.  So, to anyone familiar with the VM system, like
Peter, what you've done, which is to derive a threshold from
v_free_target but only compare v_free_count to that threshold, looks
highly suspect.

That said, I can easily believe that your patch works better than the
existing code, because it is closer in spirit to my interpretation of
what the Solaris code does.  Specifically, I believe that the Solaris
code starts trimming the ARC before the Solaris page daemon starts
writing dirty pages to secondary storage.  Now, you've made FreeBSD 
do

the same.  However, you've expressed it in a way that looks broken.

To wrap up, I think that you can easily write this in a way that
simultaneously behaves like Solaris and doesn't look wrong to a VM 
expert.


More details in my last reply on this but in short it seems this has
already been tried and it didn't work.

I'd be interested in what domain experts think about why that is?

In the mean time I've asked Karl to see if he could retest with
this change to confirm counting cache pages along with free does
indeed still cause a problem.

For those that want to catch up on what has already tested see the
original PR:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594


Copying in Karl's response from the PR for easy access:

I can give it a shot on my test system but I will note that for my 
production
machine right now that would be a no-op as there are no cache pages in 
use.

Getting that on the production system with a high level of load over the
holiday is rather unlikely due to it being Labor Day, so all I will have 
are my
synthetics.  I do have some time over the long weekend to run that test 
and

should be able to do so if you think it's useful, but I am wary of that
approach being correct in the general case due to my previous 
experience.


My commentary on that discussion point and reasoning originally is here
(http://lists.freebsd.org/pipermail/freebsd-fs/2014-March/019084.html); 
along
that thread you'll see a vmstat output with the current paradigm showing 
that
cache doesn't grow without boundary, as many suggested it would if I 
didn't
include those pages as free and thus available for ARC to attempt to 
invade
(effectively trying to force the VM system to evict them from the cache 
list by
having it wake up first.)  Indeed, up at comment #31 is the patch on 
that
production system that has been up for about four months and you can see 
only a

few pages are in the cached state.  That is fairly typical.

What is the expected reason for concern about including them in the free 
count
for ARC (when they're not in terms of what wakes up the VM system as a 
whole.)
If the expected area of concern is that cache pages will grow without 
boundary

(or close to it) the evidence from both my production machine and others
strongly suggests that's incorrect.

My work suggests strongly that the most-likely to be correct behavior 
for the
majority of workloads is achieved when both the ARC paring routine and 
VM

page-cleaning system wake up at the same time.  That occurs when
vm_cnt.v_free_count is invaded.  Attempting to bias the outcome to force 
either
the VM system or the ARC cache to do something first appears to increase 
the

circumstances under which bad behavior occurs.


___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-08-29 Thread Steven Hartland


- Original Message - 
From: Peter Wemm pe...@wemm.org

On Friday 29 August 2014 20:51:03 Steven Hartland wrote:

snip..

 Does Karl's explaination as to why this doesn't work above change 
 your

 mind?

Actually no, I would expect the code as committed would *cause* the
undesirable behavior that Karl described.

ie: access a few large files and cause them to reside in cache.  Say 
50GB or so

on a 200G ram machine.  We now have the state where:

v_cache = 50GB
v_free = 1MB

The rest of the vm system looks at vm_paging_needed(), which is:  do 
we have
enough v_cache + v_free?  Since there's 50.001GB free, the answer is 
no.
It'll let v_free run right down to v_free_min because of the giant 
pool of

v_cache just sitting there, waiting to be used.

The zfs change, as committed will ignore all the free memory in the 
form of
v_cache.. and will be freaking out about how low v_free is getting and 
will be

sacrificing ARC in order to put more memory into the v_free pool.

As long as ARC keeps sacrificing itself this way, the free pages in 
the v_cache
pool won't get used.  When ARC finally runs out of pages to give up to 
v_free,
the kernel will start using the free pages from v_cache.  Eventually 
it'll run
down that v_cache free pool and arc will be in a bare minimum state 
while this

is happening.

Meanwhile, ZFS ARC will be crippled.  This has consequences - it does 
RCU like
things from ARC to keep fragmentation under control.  With ARC 
crippled,
fragmentation will increase because there's less opportunistic 
gathering of

data from ARC.

Granted, you have to get things freed from active/inactive to the 
cache state,

but once it's there, depending on the worlkload, it'll mess with ARC.


There's already a vm_paging_needed() check in there below so this will 
already

be dealt with will it not?

   Regards
   Steve 


___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-08-29 Thread Peter Wemm
On Friday 29 August 2014 21:42:15 Steven Hartland wrote:
 - Original Message -
 From: Peter Wemm pe...@wemm.org
 
  On Friday 29 August 2014 20:51:03 Steven Hartland wrote:
 snip..
 
   Does Karl's explaination as to why this doesn't work above change
   your
   mind?
  
  Actually no, I would expect the code as committed would *cause* the
  undesirable behavior that Karl described.
  
  ie: access a few large files and cause them to reside in cache.  Say
  50GB or so
  on a 200G ram machine.  We now have the state where:
  
  v_cache = 50GB
  v_free = 1MB
  
  The rest of the vm system looks at vm_paging_needed(), which is:  do
  we have
  enough v_cache + v_free?  Since there's 50.001GB free, the answer is
  no.
  It'll let v_free run right down to v_free_min because of the giant
  pool of
  v_cache just sitting there, waiting to be used.
  
  The zfs change, as committed will ignore all the free memory in the
  form of
  v_cache.. and will be freaking out about how low v_free is getting and
  will be
  sacrificing ARC in order to put more memory into the v_free pool.
  
  As long as ARC keeps sacrificing itself this way, the free pages in
  the v_cache
  pool won't get used.  When ARC finally runs out of pages to give up to
  v_free,
  the kernel will start using the free pages from v_cache.  Eventually
  it'll run
  down that v_cache free pool and arc will be in a bare minimum state
  while this
  is happening.
  
  Meanwhile, ZFS ARC will be crippled.  This has consequences - it does
  RCU like
  things from ARC to keep fragmentation under control.  With ARC
  crippled,
  fragmentation will increase because there's less opportunistic
  gathering of
  data from ARC.
  
  Granted, you have to get things freed from active/inactive to the
  cache state,
  but once it's there, depending on the worlkload, it'll mess with ARC.
 
 There's already a vm_paging_needed() check in there below so this will
 already
 be dealt with will it not?

No.

If you read the code that you changed, you won't get that far. The v_free test 
comes before vm_paging_needed(), and if the v_free test triggers then ARC will 
return pages and not look at the rest of the function.

If this function returns non-zerp, ARC is given back:

static int
arc_reclaim_needed(void)
{
if (kmem_free_count()  zfs_arc_free_target) {
return (1);
}
 /*
 * Cooperate with pagedaemon when it's time for it to scan
 * and reclaim some pages.
 */
if (vm_paging_needed()) {
return (1);
}

ie: if v_free (ignoring v_cache free pages) gets below the threshold, stop 
evertyhing and discard ARC pages. 

The vm_paging_needed() code is a NO-OP at this point. It can never return 
true.  Consider:
vm_cnt.v_free_target = 4 * vm_cnt.v_free_min + vm_cnt.v_free_reserved;
vs
vm_pageout_wakeup_thresh = (vm_cnt.v_free_min / 10) * 11;

zfs_arc_free_target defaults to vm_cnt.v_free_target, which is 400% of 
v_free_min, and compares it against the smaller v_free pool.

vm_paging_needed() compares the total free pool (v_free + v_cache) against the 
smaller wakeup threshold - 110% of v_free_min.

Comparing a larger value against a smaller target than the previous test will 
never succeed unless you manually change the arc_free_target sysctl.


Also, what about the magic numbers here:
u_int zfs_arc_free_target = (1  19); /* default before pagedaemon init only 
*/

That's half a million pages, or 2GB of physical ram on a 4K page size system  
How is this going to work on early boot in the machines in the cluster with 
less than 2GB of ram?

-- 
Peter Wemm - pe...@wemm.org; pe...@freebsd.org; pe...@yahoo-inc.com; KI6FJV
UTF-8: for when a ' or ... just won\342\200\231t do\342\200\246

signature.asc
Description: This is a digitally signed message part.


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-08-29 Thread Steven Hartland


- Original Message - 
From: Peter Wemm pe...@wemm.org

On Friday 29 August 2014 21:42:15 Steven Hartland wrote:
 - Original Message -
 From: Peter Wemm pe...@wemm.org

  On Friday 29 August 2014 20:51:03 Steven Hartland wrote:
 snip..

   Does Karl's explaination as to why this doesn't work above 
   change

   your
   mind?
 
  Actually no, I would expect the code as committed would *cause* 
  the

  undesirable behavior that Karl described.
 
  ie: access a few large files and cause them to reside in cache. 
  Say

  50GB or so
  on a 200G ram machine.  We now have the state where:
 
  v_cache = 50GB
  v_free = 1MB
 
  The rest of the vm system looks at vm_paging_needed(), which is: 
  do

  we have
  enough v_cache + v_free?  Since there's 50.001GB free, the 
  answer is

  no.
  It'll let v_free run right down to v_free_min because of the giant
  pool of
  v_cache just sitting there, waiting to be used.
 
  The zfs change, as committed will ignore all the free memory in 
  the

  form of
  v_cache.. and will be freaking out about how low v_free is getting 
  and

  will be
  sacrificing ARC in order to put more memory into the v_free pool.
 
  As long as ARC keeps sacrificing itself this way, the free pages 
  in

  the v_cache
  pool won't get used.  When ARC finally runs out of pages to give 
  up to

  v_free,
  the kernel will start using the free pages from v_cache. 
  Eventually

  it'll run
  down that v_cache free pool and arc will be in a bare minimum 
  state

  while this
  is happening.
 
  Meanwhile, ZFS ARC will be crippled.  This has consequences - it 
  does

  RCU like
  things from ARC to keep fragmentation under control.  With ARC
  crippled,
  fragmentation will increase because there's less opportunistic
  gathering of
  data from ARC.
 
  Granted, you have to get things freed from active/inactive to the
  cache state,
  but once it's there, depending on the worlkload, it'll mess with 
  ARC.


 There's already a vm_paging_needed() check in there below so this 
 will

 already be dealt with will it not?

No.

If you read the code that you changed, you won't get that far. The 
v_free test
comes before vm_paging_needed(), and if the v_free test triggers then 
ARC will

return pages and not look at the rest of the function.


Sorry I should have phrased that question better, prior to the change
vm_paging_needed() was the top test, ignoring needfeed, but it still 
causing

performance issues.

Surely with all other return 1 cases triggering at what should be a 
higher level
of free memory we should never have seen the performance issues, but 
users where
reporting it lots. So there's still some mystery surrounding why was 
this

happening?


If this function returns non-zerp, ARC is given back:

static int
arc_reclaim_needed(void)
{
if (kmem_free_count()  zfs_arc_free_target) {
return (1);
}
 /*
 * Cooperate with pagedaemon when it's time for it to scan
 * and reclaim some pages.
 */
if (vm_paging_needed()) {
return (1);
}

ie: if v_free (ignoring v_cache free pages) gets below the threshold, 
stop

evertyhing and discard ARC pages.

The vm_paging_needed() code is a NO-OP at this point. It can never 
return

true.  Consider:
vm_cnt.v_free_target = 4 * vm_cnt.v_free_min + 
vm_cnt.v_free_reserved;

vs
vm_pageout_wakeup_thresh = (vm_cnt.v_free_min / 10) * 11;

zfs_arc_free_target defaults to vm_cnt.v_free_target, which is 400% of
v_free_min, and compares it against the smaller v_free pool.

vm_paging_needed() compares the total free pool (v_free + v_cache) 
against the

smaller wakeup threshold - 110% of v_free_min.

Comparing a larger value against a smaller target than the previous 
test will

never succeed unless you manually change the arc_free_target sysctl.


I'm aware of the values involved, and as I said what you're proposing
was more akin to where I started, but I was informed that it had already
been tested and didn't work well.

So as I'm sure you'll appreciate given that information I opted to trust
the real life tests.

Now its totally possible there could be something in the tests that
was skewing the result, but as it still indicated the performance issue
was still there, where as using the current values it wasn't, I opted
for that vs. what I believed was the more technically correct value.

Now you've confirmed what I initially thought should be the correct 
values

are indeed so; I've asked Karl to retest, so we can confirm that any of
changes that went into stable/10 after that point haven't changed this
behaviour.

Hope that makes sense?


Also, what about the magic numbers here:
u_int zfs_arc_free_target = (1  19); /* default before pagedaemon
init only */


That is just a total fall back case and should never be triggered unless
as the comment states the pagedaemon isn't initialised.

That's half a million pages, or 2GB of physical ram on 

svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-08-28 Thread Steven Hartland
Author: smh
Date: Thu Aug 28 19:50:08 2014
New Revision: 270759
URL: http://svnweb.freebsd.org/changeset/base/270759

Log:
  Refactor ZFS ARC reclaim logic to be more VM cooperative
  
  Prior to this change we triggered ARC reclaim when kmem usage passed 3/4
  of the total available, as indicated by vmem_size(kmem_arena, VMEM_ALLOC).
  
  This could lead large amounts of unused RAM e.g. on a 192GB machine with
  ARC the only major RAM consumer, 40GB of RAM would remain unused.
  
  The old method has also been seen to result in extreme RAM usage under
  certain loads, causing poor performance and stalls.
  
  We now trigger ARC reclaim when the number of free pages drops below the
  value defined by the new sysctl vfs.zfs.arc_free_target, which defaults
  to the value of vm.v_free_target.
  
  Credit to Karl Denninger for the original patch on which this update was
  based.
  
  PR:   191510 and 187594
  Tested by:dteske
  MFC after:1 week
  Relnotes: yes
  Sponsored by: Multiplay

Modified:
  head/sys/cddl/compat/opensolaris/kern/opensolaris_kmem.c
  head/sys/cddl/compat/opensolaris/sys/kmem.h
  head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
  head/sys/vm/vm_pageout.c

Modified: head/sys/cddl/compat/opensolaris/kern/opensolaris_kmem.c
==
--- head/sys/cddl/compat/opensolaris/kern/opensolaris_kmem.cThu Aug 28 
18:59:39 2014(r270758)
+++ head/sys/cddl/compat/opensolaris/kern/opensolaris_kmem.cThu Aug 28 
19:50:08 2014(r270759)
@@ -126,18 +126,47 @@ kmem_size_init(void *unused __unused)
 }
 SYSINIT(kmem_size_init, SI_SUB_KMEM, SI_ORDER_ANY, kmem_size_init, NULL);
 
-uint64_t
-kmem_size(void)
+/*
+ * The return values from kmem_free_* are only valid once the pagedaemon
+ * has been initialised, before then they return 0.
+ * 
+ * To ensure the returns are valid the caller can use a SYSINIT with
+ * subsystem set to SI_SUB_KTHREAD_PAGE and an order of at least
+ * SI_ORDER_SECOND.
+ */
+u_int
+kmem_free_target(void)
 {
 
-   return (kmem_size_val);
+   return (vm_cnt.v_free_target);
+}
+
+u_int
+kmem_free_min(void)
+{
+
+   return (vm_cnt.v_free_min);
+}
+
+u_int
+kmem_free_count(void)
+{
+
+   return (vm_cnt.v_free_count);
+}
+
+u_int
+kmem_page_count(void)
+{
+
+   return (vm_cnt.v_page_count);
 }
 
 uint64_t
-kmem_used(void)
+kmem_size(void)
 {
 
-   return (vmem_size(kmem_arena, VMEM_ALLOC));
+   return (kmem_size_val);
 }
 
 static int

Modified: head/sys/cddl/compat/opensolaris/sys/kmem.h
==
--- head/sys/cddl/compat/opensolaris/sys/kmem.h Thu Aug 28 18:59:39 2014
(r270758)
+++ head/sys/cddl/compat/opensolaris/sys/kmem.h Thu Aug 28 19:50:08 2014
(r270759)
@@ -66,7 +66,16 @@ typedef struct kmem_cache {
 void *zfs_kmem_alloc(size_t size, int kmflags);
 void zfs_kmem_free(void *buf, size_t size);
 uint64_t kmem_size(void);
-uint64_t kmem_used(void);
+u_int kmem_page_count(void);
+
+/*
+ * The return values from kmem_free_* are only valid once the pagedaemon
+ * has been initialised, before then they return 0.
+ */
+u_int kmem_free_count(void);
+u_int kmem_free_target(void);
+u_int kmem_free_min(void);
+
 kmem_cache_t *kmem_cache_create(char *name, size_t bufsize, size_t align,
 int (*constructor)(void *, void *, int), void (*destructor)(void *, void 
*),
 void (*reclaim)(void *) __unused, void *private, vmem_t *vmp, int cflags);

Modified: head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c
==
--- head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c   Thu Aug 28 
18:59:39 2014(r270758)
+++ head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c   Thu Aug 28 
19:50:08 2014(r270759)
@@ -193,9 +193,6 @@ extern int zfs_prefetch_disable;
  */
 static boolean_t arc_warm;
 
-/*
- * These tunables are for performance analysis.
- */
 uint64_t zfs_arc_max;
 uint64_t zfs_arc_min;
 uint64_t zfs_arc_meta_limit = 0;
@@ -204,6 +201,20 @@ int zfs_arc_shrink_shift = 0;
 int zfs_arc_p_min_shift = 0;
 int zfs_disable_dup_eviction = 0;
 uint64_t zfs_arc_average_blocksize = 8 * 1024; /* 8KB */
+u_int zfs_arc_free_target = (1  19); /* default before pagedaemon init only 
*/
+
+static int sysctl_vfs_zfs_arc_free_target(SYSCTL_HANDLER_ARGS);
+
+#ifdef _KERNEL
+static void
+arc_free_target_init(void *unused __unused)
+{
+
+   zfs_arc_free_target = kmem_free_target();
+}
+SYSINIT(arc_free_target_init, SI_SUB_KTHREAD_PAGE, SI_ORDER_ANY,
+arc_free_target_init, NULL);
+#endif
 
 TUNABLE_QUAD(vfs.zfs.arc_meta_limit, zfs_arc_meta_limit);
 SYSCTL_DECL(_vfs_zfs);
@@ -214,6 +225,35 @@ SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, arc_min
 SYSCTL_UQUAD(_vfs_zfs, OID_AUTO, arc_average_blocksize, CTLFLAG_RDTUN,
 zfs_arc_average_blocksize, 0,
 ARC average blocksize);
+/*
+ * We don't 

Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-08-28 Thread Dmitry Morozovsky
Steve,

On Thu, 28 Aug 2014, Steven Hartland wrote:

 Author: smh
 Date: Thu Aug 28 19:50:08 2014
 New Revision: 270759
 URL: http://svnweb.freebsd.org/changeset/base/270759
 
 Log:
   Refactor ZFS ARC reclaim logic to be more VM cooperative

[snip]

   Credit to Karl Denninger for the original patch on which this update was
   based.
   Tested by:  dteske
   MFC after:  1 week
   Relnotes:   yes
   Sponsored by:   Multiplay

Thank you, and also Karl and Devin for this work!

Is this change applicable to stable/9? I suppose, pretty large base of users 
could benefit from this.

-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***

___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-08-28 Thread Steven Hartland
- Original Message - 
From: Dmitry Morozovsky ma...@rinet.ru




Steve,

On Thu, 28 Aug 2014, Steven Hartland wrote:


Author: smh
Date: Thu Aug 28 19:50:08 2014
New Revision: 270759
URL: http://svnweb.freebsd.org/changeset/base/270759

Log:
  Refactor ZFS ARC reclaim logic to be more VM cooperative


[snip]

  Credit to Karl Denninger for the original patch on which this 
update was

  based.
  Tested by: dteske
  MFC after: 1 week
  Relnotes: yes
  Sponsored by: Multiplay


Thank you, and also Karl and Devin for this work!

Is this change applicable to stable/9? I suppose, pretty large base of 
users

could benefit from this.


Its very likely applicable to stable/9 although I've never used 9 
myself,

we jumped from 9 direct to 10.

My target for this is to make the cutoff for 10.1 freeze.

I'll MFC to 9 and possibly 8 as well at that time if relavent.

   Regards
   Steve 


___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-08-28 Thread Matthew D. Fuller
On Thu, Aug 28, 2014 at 10:11:39PM +0100 I heard the voice of
Steven Hartland, and lo! it spake thus:
 
 Its very likely applicable to stable/9 although I've never used 9
 myself, we jumped from 9 direct to 10.

This is actually hitting two different issues from the two bugs:

- 191510 is about ARC isn't greedy enough on huge-memory machines,
  and from the osreldate that bug was filed on 9.2, so presumably is
  applicable.

- 187594 is about ARC is too greedy (probably mostly on not-so-huge
  machines) and starves/drives the rest of the system into swap.  That
  I believe came about as a result of some unrelated change in the
  10.x stream that upset the previous balance between ARC and the rest
  of the VM, so isn't a problem on 9.x.



-- 
Matthew Fuller (MF4839)   |  fulle...@over-yonder.net
Systems/Network Administrator |  http://www.over-yonder.net/~fullermd/
   On the Internet, nobody can hear you scream.
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-08-28 Thread Dmitry Morozovsky
On Thu, 28 Aug 2014, Steven Hartland wrote:

  Thank you, and also Karl and Devin for this work!
  
  Is this change applicable to stable/9? I suppose, pretty large base of users
  could benefit from this.
 
 Its very likely applicable to stable/9 although I've never used 9 myself,
 we jumped from 9 direct to 10.
 
 My target for this is to make the cutoff for 10.1 freeze.
 
 I'll MFC to 9 and possibly 8 as well at that time if relavent.

No rush for other stable branches, surely; also, we're at the EoL (EoS at 
least) for stable/8 -- but if the changeset is not too harsh I see no reason 
not to merge it down there.

As I @work have some loaded machines with ZFS base on at least 9 (while not 
really memory-loaded, not more than 32G -- but sometimes rather heavy loaded 
with userbase apps), I would be glad to test changesets if it's feasible.

Thanks again!

-- 
Sincerely,
D.Marck [DM5020, MCK-RIPE, DM3-RIPN]
[ FreeBSD committer: ma...@freebsd.org ]

*** Dmitry Morozovsky --- D.Marck --- Wild Woozle --- ma...@rinet.ru ***

___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-08-28 Thread Peter Wemm
On Thursday 28 August 2014 17:30:17 Alan Cox wrote:
 On 08/28/2014 16:15, Matthew D. Fuller wrote:
  On Thu, Aug 28, 2014 at 10:11:39PM +0100 I heard the voice of
  
  Steven Hartland, and lo! it spake thus:
  Its very likely applicable to stable/9 although I've never used 9
  myself, we jumped from 9 direct to 10.
  
  This is actually hitting two different issues from the two bugs:
  
  - 191510 is about ARC isn't greedy enough on huge-memory machines,
  
and from the osreldate that bug was filed on 9.2, so presumably is
applicable.
  
  - 187594 is about ARC is too greedy (probably mostly on not-so-huge
  
machines) and starves/drives the rest of the system into swap.  That
I believe came about as a result of some unrelated change in the
10.x stream that upset the previous balance between ARC and the rest
of the VM, so isn't a problem on 9.x.
 
 10.0 had a bug in the page daemon that was fixed in 10-STABLE about
 three months ago (r265945).  The ARC was not the only thing affected by
 this bug.

I'm concerned about potential unintended consequences of this change.

Before, arc reclaim was driven by vm_paging_needed(), which was:
vm_paging_needed(void)
{
return (vm_cnt.v_free_count + vm_cnt.v_cache_count 
vm_pageout_wakeup_thresh);
}
Now it's ignoring the v_cache_count and looking exclusively at v_free_count.  
cache pages are free pages that just happen to have known contents.  If I 
read this change right, zfs arc will now discard checksummed cache pages to 
make room for non-checksummed pages:

+   if (kmem_free_count()  zfs_arc_free_target) {
+   return (1);
+   }
...
+kmem_free_count(void)
+{
+   return (vm_cnt.v_free_count);
+}

This seems like a pretty substantial behavior change.  I'm concerned that it 
doesn't appear to count all the forms of free pages.

I haven't seen the problems with the over-aggressive ARC since the page daemon 
bug was fixed.  It's been working fine under pretty abusive loads in the 
freebsd 
cluster after that fix.

(I should know better than to fire a reply off before full fact checking, but 
this commit worries me..)

-- 
Peter Wemm - pe...@wemm.org; pe...@freebsd.org; pe...@yahoo-inc.com; KI6FJV
UTF-8: for when a ' or ... just won\342\200\231t do\342\200\246

signature.asc
Description: This is a digitally signed message part.


Re: svn commit: r270759 - in head/sys: cddl/compat/opensolaris/kern cddl/compat/opensolaris/sys cddl/contrib/opensolaris/uts/common/fs/zfs vm

2014-08-28 Thread Alexey Dokuchaev
On Thu, Aug 28, 2014 at 10:11:39PM +0100, Steven Hartland wrote:
 I'll MFC to 9 and possibly 8 as well at that time if relavent.

Although I don't use ZFS on stable/8, your care about our best stable
branch :) is much appreciated.

./danfe
___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to svn-src-head-unsubscr...@freebsd.org