Re: ZFS performance regression in FreeBSD 12 ALPHA3->ALPHA4

2018-09-08 Thread Matthew Macy
On Sat, Sep 8, 2018 at 11:03 Cy Schubert  wrote:

> In message , Jakob Alvermark writes:
> >
> >                    Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
> >   ZFS ARC          667M    186M    168M     13M   3825K      0K    295M
> >
> >                               rate    hits  misses   total hits  total misses
> >   arcstats                  : 99%   65636     605    167338494       9317074
> >   arcstats.demand_data      : 57%     431     321     13414675       2117714
> >   arcstats.demand_metadata  : 99%   65175     193    152969480       5344919
> >   arcstats.prefetch_data    :  0%       0      30         3292        401344
> >   arcstats.prefetch_metadata: 32%      30      61       951047       1453097
> >   zfetchstats               :  9%     119    1077       612582      55041789
> >   arcstats.l2               :  0%       0       0            0             0
> >   vdev_cache_stats          :  0%       0       0            0             0
> >
> >
> >
> >
> > This is while a 'make -j8 buildworld' (it has 8 cores) is going.
>
> Overall you have a 94% hit ratio.
>
> slippy$ bc
> scale=4
> 167338494/(167338494+9317074)
> .9472
> slippy$
>
>
> It could be better.
>
> Why is your ZFS ARC so small? Before I answer this, I will first
> describe my own experience.
>
> My machines are seeing something similar to this:
>
>                    Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
>  ZFS ARC           4274M   2329M   1394M     17M     82M      0K    445M
>
>                               rate    hits  misses   total hits  total misses
>  arcstats                  : 97%     614      13    866509066      51853442
>  arcstats.demand_data      :100%      96       0    107658733       3101522
>  arcstats.demand_metadata  : 97%     516      13    755890353      48080146
>  arcstats.prefetch_data    :  0%       0       0       327613        225688
>  arcstats.prefetch_metadata:100%       2       0      2632367        446086
>  zfetchstats               :  6%       6      80      2362709     294731645
>  arcstats.l2               :  0%       0       0            0             0
>  vdev_cache_stats          :  0%       0       0            0             0
>
> This is what you should see. This is with -CURRENT built two days ago.
>
> cwsys$ uname -a
> FreeBSD cwsys 12.0-ALPHA5 FreeBSD 12.0-ALPHA5 #51 r338520M: Thu Sep  6
> 17:44:35 PDT 2018 root@cwsys:/export/obj/opt/src/svn-current/amd64.amd64/sys/BREAK  amd64
> cwsys$
>
> Top reports:
>
> CPU:  0.3% user, 89.9% nice,  9.5% system,  0.3% interrupt,  0.0% idle
> Mem: 678M Active, 344M Inact, 175M Laundry, 6136M Wired, 168M Buf, 598M Free
> ARC: 4247M Total, 2309M MFU, 1386M MRU, 21M Anon, 86M Header, 446M Other
>  3079M Compressed, 5123M Uncompressed, 1.66:1 Ratio
> Swap: 20G Total, 11M Used, 20G Free
>
> This is healthy. It's running a poudriere build.
>
> My laptop:
>
>                    Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
>  ZFS ARC           3175M   1791M    872M     69M    165M      0K    277M
>
>                               rate    hits  misses   total hits  total misses
>  arcstats                  : 99%    3851      26     89082984       5101207
>  arcstats.demand_data      : 99%     345       2      6197930        340186
>  arcstats.demand_metadata  : 99%    3506      24     81391265       4367755
>  arcstats.prefetch_data    :  0%       0       0        11507         30945
>  arcstats.prefetch_metadata:  0%       0       0      1482282        362321
>  zfetchstats               :  2%      12     576       113185      38564546
>  arcstats.l2               :  0%       0       0            0             0
>  vdev_cache_stats          :  0%       0       0            0             0
>
> Similar results after working on a bunch of ports in four VMs last
> night, testing various combinations of options while Heimdal in base is
> private, hence the large ARC remaining this morning.
>
> Currently on the laptop top reports:
>
> CPU:  0.2% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.8% idle
> Mem: 376M Active, 1214M Inact, 5907M Wired, 464M Buf, 259M Free
> ARC: 3175M Total, 1863M MFU, 803M MRU, 69M Anon, 160M Header, 280M Other
>  2330M Compressed, 7881M Uncompressed, 3.38:1 Ratio
> Swap: 22G Total, 22G Free
>
> This is also healthy.
>
> Now for questions:
>
> Do you have any UFS filesystems? Top will report a Buf value. What is it at?
>
> Some background: my /, /usr, and /var are UFS. These are old
> installations; when I install a new machine I dump | rsh new-machine
> restore, change a couple of entries in rc.conf and fstab, rsync ports
> (/usr/local, /var/db...) and boot (I'm terribly impatient). Hence the
> legacy.
>
> I have noticed that when writing a lot to UFS, which increases the size
> of the UFS buffer cache, my ARC will shrink to 1 GB or even less. But
> this is during a -j8 installworld to /, a test partition, an i386
> partition, and a number of VMs on UFS on a zpool and other VMs using ZFS on 

Re: ZFS performance regression in FreeBSD 12 ALPHA3->ALPHA4

2018-09-08 Thread Cy Schubert
In message , Matthew Macy writes:
>
> On Sat, Sep 8, 2018 at 11:03 Cy Schubert  wrote:
>
> > In message , Jakob Alvermark writes:

> > >
> > > I will test the patch below and report back.
> >
> > Agreed, though IMO your workload and your environment need to be
> > understood first. What concerns me about the patch is what impact it
> > will have on other workloads. Evicting only metadata and not data could
> > impact buildworld -DNO_CLEAN, for example. I do -DNO_CLEAN
> > buildworlds, sometimes -DWORLDFAST. Adjusting vfs.zfs.arc_meta_limit to
> > the same value as vfs.zfs.arc_max improved my buildworld/installworld
> > performance. In addition, disabling atime for the ZFS dataset containing
> > /usr/obj also improved buildworld/installworld performance by reducing
> > unnecessary (IMO) metadata writes. I think evicting metadata only might
> > cause a new set of problems for different workloads. (Maybe this should
> > be a sysctl?)
> >
>
> Mark's suggested change would just restore the default behavior from before
> balanced pruning was imported. Balanced pruning is dependent on code that
> didn't exist in FreeBSD - unfortunately when I did the import I did not
> pull in all the dependencies. Mid freeze, the least disruptive path is to
> disable the new behavior. Post branch we can restore it as a default,
> possibly contingent on the amount of memory in the system. There is
> precedent for this with prefetch.

Ok.
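For readers wondering what the "maybe this should be a sysctl" suggestion, or a default "contingent on the amount of memory in the system", would look like: such switches are normally exposed through FreeBSD's sysctl/tunable machinery. The kernel-side fragment below is only a hypothetical sketch of that pattern; the variable name, default and description are invented for illustration and are not the actual change under discussion.

/*
 * Hypothetical sketch only: exposing a "balanced ARC pruning" switch as a
 * vfs.zfs.* knob, following the usual FreeBSD pattern.  CTLFLAG_RWTUN makes
 * it both a loader tunable and a runtime sysctl.  The variable name, default
 * and description are invented; this is not the actual patch discussed here.
 */
#include <sys/param.h>
#include <sys/kernel.h>
#include <sys/sysctl.h>

SYSCTL_DECL(_vfs_zfs);

/* 0 = legacy behaviour (prune metadata only), 1 = balanced pruning. */
static int zfs_arc_balanced_prune = 0;
SYSCTL_INT(_vfs_zfs, OID_AUTO, arc_balanced_prune, CTLFLAG_RWTUN,
    &zfs_arc_balanced_prune, 0,
    "Enable balanced (data and metadata) ARC pruning");

A knob declared with CTLFLAG_RWTUN can be set from /boot/loader.conf at boot or changed at runtime with sysctl(8), which is how existing ZFS knobs such as vfs.zfs.prefetch_disable, the precedent mentioned above, are exposed.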


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  http://www.FreeBSD.org

The need of the many outweighs the greed of the few.





Re: intr_machdep.c:176:2: error: use of undeclared identifier 'interrupt_sorted'

2018-09-08 Thread Michael Butler
On 9/8/18 3:43 PM, Konstantin Belousov wrote:
> On Sat, Sep 08, 2018 at 02:07:41PM -0400, Michael Butler wrote:
>> On 8/31/18 1:28 AM, Konstantin Belousov wrote:
>>> On Fri, Aug 31, 2018 at 12:21:02AM -0400, Michael Butler wrote:
>>
>>  [ .. snip .. ]
>>
>>>> I see another problem after using Ian's workaround of moving the #ifdef
>>>> SMP; it seems I now run out of kernel stack on an i386 (Pentium-III)
>>>> machine with only 512MB of RAM:
>>>>
>>>> Aug 29 23:29:19 sarah kernel: vm_thread_new: kstack allocation failed
>>>> Aug 29 23:29:26 sarah kernel: vm_thread_new: kstack allocation failed
>>>> Aug 29 23:29:30 sarah kernel: vm_thread_new: kstack allocation failed
>>>> Aug 29 23:29:38 sarah kernel: vm_thread_new: kstack allocation failed
>>>> Aug 29 23:29:38 sarah kernel: vm_thread_new: kstack allocation failed
>>>> Aug 29 23:29:40 sarah kernel: vm_thread_new: kstack allocation failed
>>>
>>> What is the kernel revision for "now"?  What was the previous revision
>>> where the kstack allocation failures did not happen?
>>>
>>> Also, what is the workload?
>>
>> Sorry for the delay. Any version at or after SVN r338360 would either a)
>> not boot at all or b) crash shortly after boot with a swarm of messages
>> as above. It was stable before that.
>>
>> Unfortunately, this machine is remote and, being as old as it is, has no
>> remote console facility. 'nextboot' has been my savior ;-)
>>
>> It is a 700MHz Pentium-III with 512MB of RAM and has 3 used interfaces,
>> local ethernet (FXP), GIF for an IPv6 tunnel to HE and TAP for an
>> OpenVPN endpoint. It has IPFW compiled into the kernel and acts as a
>> router/firewall with few actual applications running.
>>
>> As another data point, I manually reversed both SVN r338360 and r338415
>> (a related change) and it is now stable running at SVN r338520,
> 
> It is very improbable.  I do not see how r338360 could affect KVA allocation.
> Double-check that you booted the right kernels.
> 

FreeBSD sarah.protected-networks.net 12.0-ALPHA5 FreeBSD 12.0-ALPHA5 #14 r338520M: Thu Sep  6 21:35:31 EDT 2018

'svn diff' reports the only changes being the two reversals I noted above,

imb



Re: intr_machdep.c:176:2: error: use of undeclared identifier 'interrupt_sorted'

2018-09-08 Thread Konstantin Belousov
On Sat, Sep 08, 2018 at 02:07:41PM -0400, Michael Butler wrote:
> On 8/31/18 1:28 AM, Konstantin Belousov wrote:
> > On Fri, Aug 31, 2018 at 12:21:02AM -0400, Michael Butler wrote:
> 
>  [ .. snip .. ]
> 
> >> I see another problem after using Ian's workaround of moving the #ifdef
> >> SMP; it seems I now run out of kernel stack on an i386 (Pentium-III)
> >> machine with only 512MB of RAM:
> >>
> >> Aug 29 23:29:19 sarah kernel: vm_thread_new: kstack allocation failed
> >> Aug 29 23:29:26 sarah kernel: vm_thread_new: kstack allocation failed
> >> Aug 29 23:29:30 sarah kernel: vm_thread_new: kstack allocation failed
> >> Aug 29 23:29:38 sarah kernel: vm_thread_new: kstack allocation failed
> >> Aug 29 23:29:38 sarah kernel: vm_thread_new: kstack allocation failed
> >> Aug 29 23:29:40 sarah kernel: vm_thread_new: kstack allocation failed
> > 
> > What is the kernel revision for "now"?  What was the previous revision
> > where the kstack allocation failures did not happen?
> > 
> > Also, what is the workload?
> 
> Sorry for the delay. Any version at or after SVN r338360 would either a)
> not boot at all or b) crash shortly after boot with a swarm of messages
> as above. It was stable before that.
> 
> Unfortunately, this machine is remote and, being as old as it is, has no
> remote console facility. 'nextboot' has been my savior ;-)
> 
> It is a 700MHz Pentium-III with 512MB of RAM and has 3 used interfaces,
> local ethernet (FXP), GIF for an IPv6 tunnel to HE and TAP for an
> OpenVPN endpoint. It has IPFW compiled into the kernel and acts as a
> router/firewall with few actual applications running.
> 
> As another data point, I manually reversed both SVN r338360 and r338415
> (a related change) and it is now stable running at SVN r338520,

It is very improbable.  I do not see how r338360 could affect KVA allocation.
Double-check that you booted the right kernels.


Re: intr_machdep.c:176:2: error: use of undeclared identifier 'interrupt_sorted'

2018-09-08 Thread Michael Butler
On 8/31/18 1:28 AM, Konstantin Belousov wrote:
> On Fri, Aug 31, 2018 at 12:21:02AM -0400, Michael Butler wrote:

 [ .. snip .. ]

>> I see another problem after using Ian's workaround of moving the #ifdef
>> SMP; it seems I now run out of kernel stack on an i386 (Pentium-III)
>> machine with only 512MB of RAM:
>>
>> Aug 29 23:29:19 sarah kernel: vm_thread_new: kstack allocation failed
>> Aug 29 23:29:26 sarah kernel: vm_thread_new: kstack allocation failed
>> Aug 29 23:29:30 sarah kernel: vm_thread_new: kstack allocation failed
>> Aug 29 23:29:38 sarah kernel: vm_thread_new: kstack allocation failed
>> Aug 29 23:29:38 sarah kernel: vm_thread_new: kstack allocation failed
>> Aug 29 23:29:40 sarah kernel: vm_thread_new: kstack allocation failed
> 
> What is the kernel revision for "now"?  What was the previous revision
> where the kstack allocation failures did not happen?
> 
> Also, what is the workload?

Sorry for the delay. Any version at or after SVN r338360 would either a)
not boot at all or b) crash shortly after boot with a swarm of messages
as above. It was stable before that.

Unfortunately, this machine is remote and, being as old as it is, has no
remote console facility. 'nextboot' has been my savior ;-)

It is a 700MHz Pentium-III with 512MB of RAM and has 3 used interfaces,
local ethernet (FXP), GIF for an IPv6 tunnel to HE and TAP for an
OpenVPN endpoint. It has IPFW compiled into the kernel and acts as a
router/firewall with few actual applications running.

As another data point, I manually reversed both SVN r338360 and r338415
(a related change) and it is now stable running at SVN r338520,

imb
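For context, the build error in the thread's subject is the usual symptom of a symbol that is declared under #ifdef SMP but referenced where that guard is missing (or the reverse); Ian's workaround of moving the #ifdef, mentioned above, presumably makes the declaration and its uses agree again. A stripped-down, hypothetical illustration of the pattern (simplified names, not the actual sys/x86/x86/intr_machdep.c code):

/*
 * Hypothetical illustration only, not the real sys/x86/x86/intr_machdep.c
 * code.  If the array is declared under #ifdef SMP but referenced outside
 * such a guard, a UP (non-SMP) kernel build fails with "use of undeclared
 * identifier"; guarding declaration and use consistently fixes it.
 */
#include <stdio.h>

/* #define SMP */			/* commented out: a UP kernel config */
#define NIRQS 64			/* invented size for the example */

#ifdef SMP
static int interrupt_sorted[NIRQS];	/* only exists on SMP kernels */
#endif

static void
intr_balance_report(void)
{
#ifdef SMP				/* every use needs the same guard */
	printf("first sorted interrupt slot: %d\n", interrupt_sorted[0]);
#else
	printf("UP kernel: interrupt balancing not compiled in\n");
#endif
}

int
main(void)
{
	intr_balance_report();
	return (0);
}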



Re: ZFS performance regression in FreeBSD 12 ALPHA3->ALPHA4

2018-09-08 Thread Cy Schubert
In message , Jakob Alvermark writes:
>
>                    Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
>   ZFS ARC          667M    186M    168M     13M   3825K      0K    295M
>
>                               rate    hits  misses   total hits  total misses
>   arcstats                  : 99%   65636     605    167338494       9317074
>   arcstats.demand_data      : 57%     431     321     13414675       2117714
>   arcstats.demand_metadata  : 99%   65175     193    152969480       5344919
>   arcstats.prefetch_data    :  0%       0      30         3292        401344
>   arcstats.prefetch_metadata: 32%      30      61       951047       1453097
>   zfetchstats               :  9%     119    1077       612582      55041789
>   arcstats.l2               :  0%       0       0            0             0
>   vdev_cache_stats          :  0%       0       0            0             0
>
>
>
>
> This is while a 'make -j8 buildworld' (it has 8 cores) is going.

Overall you have a 94% hit ratio.

slippy$ bc
scale=4
167338494/(167338494+9317074)
.9472
slippy$ 


It could be better.
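The same arithmetic can be pulled straight from the ARC kstats. A minimal sketch, assuming the kstat.zfs.misc.arcstats.hits and kstat.zfs.misc.arcstats.misses sysctls exported by FreeBSD's ZFS:

/*
 * Minimal sketch: compute the overall ARC hit ratio from the kstat sysctls,
 * the same arithmetic as the bc session above.  Assumes the
 * kstat.zfs.misc.arcstats.{hits,misses} OIDs exported by FreeBSD's ZFS.
 */
#include <sys/types.h>
#include <sys/sysctl.h>

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

static uint64_t
arcstat(const char *name)
{
	uint64_t val;
	size_t len = sizeof(val);

	if (sysctlbyname(name, &val, &len, NULL, 0) != 0) {
		perror(name);
		exit(1);
	}
	return (val);
}

int
main(void)
{
	uint64_t hits = arcstat("kstat.zfs.misc.arcstats.hits");
	uint64_t misses = arcstat("kstat.zfs.misc.arcstats.misses");

	printf("ARC hit ratio: %.4f\n",
	    (double)hits / (double)(hits + misses));
	return (0);
}

Run against the same counters it should reproduce the .9472 figure from the bc session above.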

Why is your ZFS ARC so small? Before I answer this, I will first
describe my own experience.

My machines are seeing something similar to this:

                   Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
 ZFS ARC           4274M   2329M   1394M     17M     82M      0K    445M

                              rate    hits  misses   total hits  total misses
 arcstats                  : 97%     614      13    866509066      51853442
 arcstats.demand_data      :100%      96       0    107658733       3101522
 arcstats.demand_metadata  : 97%     516      13    755890353      48080146
 arcstats.prefetch_data    :  0%       0       0       327613        225688
 arcstats.prefetch_metadata:100%       2       0      2632367        446086
 zfetchstats               :  6%       6      80      2362709     294731645
 arcstats.l2               :  0%       0       0            0             0
 vdev_cache_stats          :  0%       0       0            0             0

This is what you should see. This is with -CURRENT built two days ago.

cwsys$ uname -a
FreeBSD cwsys 12.0-ALPHA5 FreeBSD 12.0-ALPHA5 #51 r338520M: Thu Sep  6 
17:44:35 PDT 2018 root@cwsys:/export/obj/opt/src/svn-current/amd64.amd64/sys/BREAK  amd64
cwsys$ 

Top reports:

CPU:  0.3% user, 89.9% nice,  9.5% system,  0.3% interrupt,  0.0% idle
Mem: 678M Active, 344M Inact, 175M Laundry, 6136M Wired, 168M Buf, 598M Free
ARC: 4247M Total, 2309M MFU, 1386M MRU, 21M Anon, 86M Header, 446M Other
 3079M Compressed, 5123M Uncompressed, 1.66:1 Ratio
Swap: 20G Total, 11M Used, 20G Free

This is healthy. It's running a poudriere build.

My laptop:

                   Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
 ZFS ARC           3175M   1791M    872M     69M    165M      0K    277M

                              rate    hits  misses   total hits  total misses
 arcstats                  : 99%    3851      26     89082984       5101207
 arcstats.demand_data      : 99%     345       2      6197930        340186
 arcstats.demand_metadata  : 99%    3506      24     81391265       4367755
 arcstats.prefetch_data    :  0%       0       0        11507         30945
 arcstats.prefetch_metadata:  0%       0       0      1482282        362321
 zfetchstats               :  2%      12     576       113185      38564546
 arcstats.l2               :  0%       0       0            0             0
 vdev_cache_stats          :  0%       0       0            0             0

Similar results after working on a bunch of ports in four VMs last 
night, testing various combinations of options while Heimdal in base is 
private, hence the large ARC remaining this morning.

Currently on the laptop top reports:

CPU:  0.2% user,  0.0% nice,  0.0% system,  0.0% interrupt, 99.8% idle
Mem: 376M Active, 1214M Inact, 5907M Wired, 464M Buf, 259M Free
ARC: 3175M Total, 1863M MFU, 803M MRU, 69M Anon, 160M Header, 280M Other
 2330M Compressed, 7881M Uncompressed, 3.38:1 Ratio
Swap: 22G Total, 22G Free

This is also healthy.

Now for questions:

Do you have any UFS filesystems? Top will report a Buf value. What is it at?

Some background: my /, /usr, and /var are UFS. These are old
installations; when I install a new machine I dump | rsh new-machine
restore, change a couple of entries in rc.conf and fstab, rsync ports
(/usr/local, /var/db...) and boot (I'm terribly impatient). Hence the
legacy.

I have noticed that when writing a lot to UFS, which increases the size
of the UFS buffer cache, my ARC will shrink to 1 GB or even less. But
this is during a -j8 installworld to /, a test partition, an i386
partition, and a number of VMs on UFS on a zpool and other VMs using ZFS
on the same zpool. My ARC drops rapidly when the UFS filesystems are
actively being written to. UFS and ZFS on the same server will impact
performance unless 

Re: ZFS performance regression in FreeBSD 12 ALPHA3->ALPHA4

2018-09-08 Thread Jakob Alvermark

On 9/7/18 6:06 PM, Mark Johnston wrote:

On Fri, Sep 07, 2018 at 03:40:52PM +0200, Jakob Alvermark wrote:

On 9/6/18 2:28 AM, Mark Johnston wrote:

On Wed, Sep 05, 2018 at 11:15:03PM +0300, Subbsd wrote:

On Wed, Sep 5, 2018 at 5:58 PM Allan Jude  wrote:

On 2018-09-05 10:04, Subbsd wrote:

Hi,

I'm seeing a huge loss in ZFS performance after upgrading FreeBSD 12
to the latest revision (r338466 at the moment), related to the ARC.

I cannot say which revision it was before, except that newvers.sh
pointed to ALPHA3.

Problems are observed if you try to limit ARC. In my case:

vfs.zfs.arc_max="128M"

I know that this is very small. However, for two years with this there
were no problems.

When I send SIGINFO to a process which is currently working with ZFS, I
see "arc_reclaim_waiters_cv":

e.g. when I type:

/bin/csh

I have time (~5 seconds) to press 'ctrl+t' several times before csh is executed:

load: 0.70  cmd: csh 5935 [arc_reclaim_waiters_cv] 1.41r 0.00u 0.00s 0% 3512k
load: 0.70  cmd: csh 5935 [zio->io_cv] 1.69r 0.00u 0.00s 0% 3512k
load: 0.70  cmd: csh 5935 [arc_reclaim_waiters_cv] 1.98r 0.00u 0.01s 0% 3512k
load: 0.73  cmd: csh 5935 [arc_reclaim_waiters_cv] 2.19r 0.00u 0.01s 0% 4156k

same story with find or any other commands:

load: 0.34  cmd: find 5993 [zio->io_cv] 0.99r 0.00u 0.00s 0% 2676k
load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.13r 0.00u 0.00s 0% 2676k
load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.25r 0.00u 0.00s 0% 2680k
load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.38r 0.00u 0.00s 0% 2684k
load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.51r 0.00u 0.00s 0% 2704k
load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.64r 0.00u 0.00s 0% 2716k
load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.78r 0.00u 0.00s 0% 2760k

this problem goes away after increasing vfs.zfs.arc_max


Previously, ZFS was not actually able to evict enough dnodes to keep
your arc_max under 128MB; it would have been much higher based on the
number of open files you had. A recent improvement from upstream ZFS
(r337653 and r337660) was pulled in that fixed this, so setting an
arc_max of 128MB is much more effective now, and that is causing the
side effect of "actually doing what you asked it to do"; in this case,
what you are asking is a bit silly. If you have a working set that is
greater than 128MB, and you ask ZFS to use less than that, it'll have to
constantly try to reclaim memory to keep under that very low bar.


Thanks for the comments. Mark was right when he pointed to r338416 (
https://svnweb.freebsd.org/base/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c?r1=338416=338415=338416
). Commenting out the aggsum_value calls restores normal speed regardless
of the rest of the new code from upstream.
I would like to repeat that the speed with these two lines is not just
slow, but _INCREDIBLY_ slow! This should probably be noted in the
relevant documentation for FreeBSD 12+.
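For readers who have not looked at the ZFS code: an aggsum is a scalable counter where updates land in per-bucket deltas, while reading the exact value (aggsum_value()) has to visit and flush every bucket, which is why calling it on a hot allocation path is comparatively expensive. A much-simplified userland sketch of the idea, not the actual OpenZFS implementation:

/*
 * Much-simplified sketch of the aggsum idea, not the OpenZFS implementation.
 * aggsum_add() touches a single bucket and stays cheap; aggsum_value() must
 * lock and flush every bucket to produce the exact total.
 */
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define AGGSUM_BUCKETS 16		/* stand-in for "a few per CPU" */

struct aggsum_bucket {
	pthread_mutex_t ab_lock;
	int64_t ab_delta;		/* not yet folded into as_total */
};

struct aggsum {
	pthread_mutex_t as_lock;
	int64_t as_total;
	struct aggsum_bucket as_bucket[AGGSUM_BUCKETS];
};

static void
aggsum_init(struct aggsum *as)
{
	int i;

	pthread_mutex_init(&as->as_lock, NULL);
	as->as_total = 0;
	for (i = 0; i < AGGSUM_BUCKETS; i++) {
		pthread_mutex_init(&as->as_bucket[i].ab_lock, NULL);
		as->as_bucket[i].ab_delta = 0;
	}
}

/* Cheap and mostly uncontended: only one bucket lock is taken. */
static void
aggsum_add(struct aggsum *as, int64_t delta, unsigned bucket)
{
	struct aggsum_bucket *b = &as->as_bucket[bucket % AGGSUM_BUCKETS];

	pthread_mutex_lock(&b->ab_lock);
	b->ab_delta += delta;
	pthread_mutex_unlock(&b->ab_lock);
}

/* Expensive: visits every bucket to compute the exact value. */
static int64_t
aggsum_value(struct aggsum *as)
{
	int64_t v;
	int i;

	pthread_mutex_lock(&as->as_lock);
	for (i = 0; i < AGGSUM_BUCKETS; i++) {
		struct aggsum_bucket *b = &as->as_bucket[i];

		pthread_mutex_lock(&b->ab_lock);
		as->as_total += b->ab_delta;
		b->ab_delta = 0;
		pthread_mutex_unlock(&b->ab_lock);
	}
	v = as->as_total;
	pthread_mutex_unlock(&as->as_lock);
	return (v);
}

int
main(void)
{
	struct aggsum arc_size;

	aggsum_init(&arc_size);
	aggsum_add(&arc_size, 128 * 1024, 3);	/* a buffer was cached */
	aggsum_add(&arc_size, -4096, 7);	/* a buffer was evicted */
	printf("exact ARC-size analogue: %jd\n", (intmax_t)aggsum_value(&arc_size));
	return (0);
}

The upstream code also keeps cheap lower/upper bounds (aggsum_compare()) so that the exact value rarely needs to be computed; the sketch omits that.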

Hi,

I am experiencing the same slowness when there is a bit of load on the
system (buildworld, for example), which I haven't seen before.

Is it a regression following a recent kernel update?



Yes.





I have vfs.zfs.arc_max=2G.

Top is reporting

ARC: 607M Total, 140M MFU, 245M MRU, 1060K Anon, 4592K Header, 217M Other
     105M Compressed, 281M Uncompressed, 2.67:1 Ratio

Should I test the patch?

I would be interested in the results, assuming it is indeed a
regression.



This gets more interesting.

Kernel + world was at r338465

I was going to test the patch, but since I had updated the src tree to 
r338499 I built it first without your patch.


Now, at r338499, without the patch, it doesn't seem to hit the 
performance problem.


vfs.zfs.arc_max is still set to 2G

The ARC display in top is around 1000M total; I haven't seen it go above
about 1200M, even if I stress it.




Re: ZFS performance regression in FreeBSD 12 ALPHA3->ALPHA4

2018-09-08 Thread Jakob Alvermark

                   Total     MFU     MRU    Anon     Hdr   L2Hdr   Other
  ZFS ARC          667M    186M    168M     13M   3825K      0K    295M

                              rate    hits  misses   total hits  total misses
  arcstats                  : 99%   65636     605    167338494       9317074
  arcstats.demand_data      : 57%     431     321     13414675       2117714
  arcstats.demand_metadata  : 99%   65175     193    152969480       5344919
  arcstats.prefetch_data    :  0%       0      30         3292        401344
  arcstats.prefetch_metadata: 32%      30      61       951047       1453097
  zfetchstats               :  9%     119    1077       612582      55041789
  arcstats.l2               :  0%       0       0            0             0
  vdev_cache_stats          :  0%       0       0            0             0




This is while a 'make -j8 buildworld' (it has 8 cores) is going.

SSH'ing to the machine while the buildworld is going, it takes 40-60
seconds to get to the shell!


Hitting ^T while waiting: load: 1.06  cmd: zsh 45334 [arc_reclaim_waiters_cv] 56.11r 0.00u 0.10s 0% 5232k


I will test the patch below and report back.


Jakob

On 9/7/18 7:27 PM, Cy Schubert wrote:

I'd be interested in seeing systat -z output.

---
Sent using a tiny phone keyboard.
Apologies for any typos and autocorrect.
Also, this old phone only supports top post. Apologies.

Cy Schubert
 or 
The need of the many outweighs the greed of the few.
---

From: Mark Johnston
Sent: 07/09/2018 09:09
To: Jakob Alvermark
Cc: Subbsd; allanj...@freebsd.org; freebsd-current Current
Subject: Re: ZFS performance regression in FreeBSD 12 ALPHA3->ALPHA4

On Fri, Sep 07, 2018 at 03:40:52PM +0200, Jakob Alvermark wrote:
> On 9/6/18 2:28 AM, Mark Johnston wrote:
> > On Wed, Sep 05, 2018 at 11:15:03PM +0300, Subbsd wrote:
> >> On Wed, Sep 5, 2018 at 5:58 PM Allan Jude  wrote:
> >>> On 2018-09-05 10:04, Subbsd wrote:
>  Hi,
> 
>  I'm seeing a huge loss in ZFS performance after upgrading FreeBSD 12
>  to the latest revision (r338466 at the moment), related to the ARC.
> 
>  I cannot say which revision it was before, except that newvers.sh
>  pointed to ALPHA3.
> 
>  Problems are observed if you try to limit ARC. In my case:
> 
>  vfs.zfs.arc_max="128M"
> 
>  I know that this is very small. However, for two years with this there
>  were no problems.
> 
>  When I send SIGINFO to a process which is currently working with ZFS, I
>  see "arc_reclaim_waiters_cv":
> 
>  e.g. when I type:
> 
>  /bin/csh
> 
>  I have time (~5 seconds) to press 'ctrl+t' several times before csh is executed:
> 
>  load: 0.70  cmd: csh 5935 [arc_reclaim_waiters_cv] 1.41r 0.00u 0.00s 0% 3512k
>  load: 0.70  cmd: csh 5935 [zio->io_cv] 1.69r 0.00u 0.00s 0% 3512k
>  load: 0.70  cmd: csh 5935 [arc_reclaim_waiters_cv] 1.98r 0.00u 0.01s 0% 3512k
>  load: 0.73  cmd: csh 5935 [arc_reclaim_waiters_cv] 2.19r 0.00u 0.01s 0% 4156k
> 
>  same story with find or any other commands:
> 
>  load: 0.34  cmd: find 5993 [zio->io_cv] 0.99r 0.00u 0.00s 0% 2676k
>  load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.13r 0.00u 0.00s 0% 2676k
>  load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.25r 0.00u 0.00s 0% 2680k
>  load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.38r 0.00u 0.00s 0% 2684k
>  load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.51r 0.00u 0.00s 0% 2704k
>  load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.64r 0.00u 0.00s 0% 2716k
>  load: 0.34  cmd: find 5993 [arc_reclaim_waiters_cv] 1.78r 0.00u 0.00s 0% 2760k
> 
>  this problem goes away after increasing vfs.zfs.arc_max
> 
> >>> Previously, ZFS was not actually able to evict enough dnodes to keep
> >>> your arc_max under 128MB; it would have been much higher based on the
> >>> number of open files you had. A recent improvement from upstream ZFS
> >>> (r337653 and r337660) was pulled in that fixed this, so setting an
> >>> arc_max of 128MB is much more effective now, and that is causing the
> >>> side effect of "actually doing what you asked it to do"; in this case,
> >>> what you are asking is a bit silly. If you have a working set that is
> >>> greater than 128MB, and you ask ZFS to use less than that, it'll have to
> >>> constantly try to reclaim memory to keep under that very low bar.
> >>>
> >> Thanks for the comments. Mark was right when he pointed to r338416 (
> >> https://svnweb.freebsd.org/base/head/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/arc.c?r1=338416=338415=338416
> >> ). Commenting out the aggsum_value calls restores normal speed