Artem Belevich wrote:
On Wed, Feb 8, 2012 at 4:28 PM, Jeremy Chadwick
<free...@jdc.parodius.com>  wrote:
On Thu, Feb 09, 2012 at 01:11:36AM +0100, Miroslav Lachman wrote:
...
ARC Size:
          Current Size:             1769 MB (arcsize)
          Target Size (Adaptive):   512 MB (c)
          Min Size (Hard Limit):    512 MB (zfs_arc_min)
          Max Size (Hard Limit):    3584 MB (zfs_arc_max)

The target size keeps dropping to the min size, and after a few more
days the system is so slow that I must reboot the machine. Then it
runs fine for about 107 days and then it all repeats again.

You can see more on the MRTG graphs
http://freebsd.quip.cz/ext/2012/2012-02-08-kiwi-mrtg-12-15/
Links to other useful information are at the top of the page
(arc_summary, top, dmesg, fs usage, loader.conf)

There you can see the nightly backups (higher CPU load starting at
01:13); otherwise the machine is idle.

This corresponds with the ARC target size dropping over the last 5 days
http://freebsd.quip.cz/ext/2012/2012-02-08-kiwi-mrtg-12-15/local_zfs_arcstats_size.html

and with the ARC metadata cache exceeding its limit over the last 5 days
http://freebsd.quip.cz/ext/2012/2012-02-08-kiwi-mrtg-12-15/local_zfs_vfs_meta.html

I don't know what's going on, or whether it is something
known / fixed in newer releases. We are running a few more ZFS
systems on 8.2 without this issue, but those systems are in
different roles.

This sounds like the... damn, what is it called... some kind of internal
"counter" or "ticks" thing within the ZFS code that was discovered to
only begin happening after a certain period of time (which correlated to
some number of days, possibly 107).  I'm sorry that I can't be more
specific, but it's been discussed heavily on the lists in the past, and
fixes for all of that were committed to RELENG_8.

Thank you for your quick response. I am glad that it is fixed in 8.x, so I will upgrade this last old machine in a few weeks. :)

I wish I could
remember the name of the function or macro or variable name it pertained
to, something like LTHAW or TLOCK or something like that.  I would say
"I don't know why I can't remember", but I do know why I can't remember:
because I gave up trying to track all of these problems.

Does someone else remember this issue?  CC'ing Martin who might remember
for certain.

It's LBOLT. :-)

And there was more than one related integer overflow. One of them
manifested itself as the L2ARC feeding thread hogging CPU time after about
a month of uptime. Another one caused an issue with ARC reclaim after 107
days. See more details in this thread:

http://lists.freebsd.org/pipermail/freebsd-fs/2011-May/011584.html
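A rough sketch of where a figure like "107 days" can come from. It assumes, without confirmation from this thread, that LBOLT is computed roughly as (gethrtime() * hz) / NANOSEC, where gethrtime() is a signed 64-bit nanosecond uptime and hz = 1000. Under those assumptions the intermediate product no longer fits in 64 bits after about 106.75 days. The helpers wrap_int64() and lbolt() are hypothetical and exist only for illustration. Signed overflow is undefined behavior in C; the sketch models the two's-complement wraparound commonly seen in practice.

```python
# Back-of-the-envelope check for the "107 days" figure.
# ASSUMPTIONS (not from the thread): LBOLT ~ (gethrtime() * hz) / NANOSEC,
# gethrtime() is a signed 64-bit nanosecond uptime, and hz = 1000.
# wrap_int64() and lbolt() are hypothetical helpers for illustration only.

NANOSEC = 1_000_000_000
SECONDS_PER_DAY = 86_400
INT64_MAX = 2**63 - 1


def days_until_product_overflow(hz: int) -> float:
    """Uptime in days at which gethrtime() * hz no longer fits in an int64."""
    max_ns = INT64_MAX // hz  # largest uptime (ns) whose product still fits
    return max_ns / NANOSEC / SECONDS_PER_DAY


def wrap_int64(x: int) -> int:
    """Model 64-bit two's-complement wraparound (UB in C, typical in practice)."""
    x &= (1 << 64) - 1
    return x - (1 << 64) if x >= (1 << 63) else x


def lbolt(uptime_ns: int, hz: int = 1000) -> int:
    """Assumed LBOLT computation, with the multiply wrapping like int64_t."""
    product = wrap_int64(uptime_ns * hz)
    # C integer division truncates toward zero; emulate that exactly.
    q = abs(product) // NANOSEC
    return q if product >= 0 else -q


if __name__ == "__main__":
    print(f"overflow after {days_until_product_overflow(1000):.2f} days")
    for days in (100, 106, 107):
        print(days, lbolt(days * SECONDS_PER_DAY * NANOSEC))
```

With hz = 1000, the computed tick count in this model stays positive through day 106 and turns into a large negative number before day 107. Time comparisons made against such a value could plausibly misbehave, which fits the ARC reclaim symptom described above.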

Yes, it is exactly this problem. Thank you for the link to this thread. I am subscribed to freebsd-fs@ and I read it almost daily, but I missed this one!

Thanks to both of you!

Miroslav Lachman
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
