Re: [OpenIndiana-discuss] [developer] Re: Memory usage concern

2012-10-22 Thread Daniel Kjar
I have this problem with any VM running on Sol10/Nevada, OpenSolaris, or
OpenIndiana.  I have the ARC restricted now, but for some reason (and
'people at Sun who know these things' mentioned it back when Sun still
existed), when something needs a big devoted chunk of RAM, ZFS fails
miserably at giving up its cache.  If I don't limit the ARC, a few days
of even a single VM running causes the system to start stalling.  With
the ARC limit, everything is peaches.


This is on a system with four dual-core processors and 32 GB of RAM.
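
For reference, the usual way to impose such a cap on Solaris/illumos is a
zfs_arc_max entry in /etc/system; a minimal sketch for a 4 GiB limit (the
right value depends on the VMs' footprint), taking effect at the next boot:

   * /etc/system -- cap the ZFS ARC at 4 GiB (value is in bytes)
   set zfs:zfs_arc_max = 0x100000000

The active cap can be verified afterwards with kstat -p zfs:0:arcstats:c_max.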

On 10/22/12 01:25 AM, David Halko wrote:

On Sat, Oct 20, 2012 at 1:57 PM, Cedric Blancher 
cedric.blanc...@googlemail.com wrote:


On 16 October 2012 01:22, Jason Matthews ja...@broken.net wrote:

-Original Message-
From: Cedric Blancher [mailto:cedric.blanc...@googlemail.com]


IMO you blame the wrong people. You can have the same kind of
problems with any Illumos-based distribution if you activate a
zone and let the machine just sit there for a week or so, or have
a lot of filesystem activity using mmap(). Either way the machines
will choke themselves into memory starvation. The only workarounds
we found are regular reboots (every 24h) or limiting the
ZFS ARC to an absolute minimum.

I don't think you understand. My proxy tier does almost no reads from the
file system. There is no content on the server.

OK, sorry then. Same symptoms, different cause, albeit so bad
that it makes the OS virtually unusable for any serious work.

Ced


This whole discussion sounds bizarre to me. I have a Solaris 10 Update 1
system with over a dozen zones on UFS and 8 GB of RAM, and I don't
experience these types of issues. We reboot once a year, whether we need
to or not.

This is my limited understanding...
- UFS basically consumes all unused memory for its page cache, without ever
telling the OS, but releases it to OS processes when memory is needed. UFS
does not tell the OS that it is using the unused memory as buffer cache, so
you never see it when you check your memory usage.
- ZFS basically consumes all unused memory for the ARC, but tells the OS
when it is taking RAM. The ZFS ARC is supposed to give memory back when
there is pressure to do so. You always know how much memory is really being
used when you check your memory usage.

I had issues in the past where disk I/O started to go through the roof on
UFS filesystems when free memory dropped below 4 GB of RAM... a sure sign
the apps were starving the invisible UFS buffer cache.

I don't really understand how applications can be starved by ZFS unless ZFS
cannot give back the buffers (lots of constant full-table scans?). Have the
Illumos developers really confirmed such a ZFS bug, where it does not
return memory to the OS?

This just does not sound right, to me.

Thanks - Dave H
http://netmgt.blogspot.com/


--
Dr. Daniel Kjar
Assistant Professor of Biology
Division of Mathematics and Natural Sciences
Elmira College
1 Park Place
Elmira, NY 14901
607-735-1826
http://faculty.elmira.edu/dkjar

...humans send their young men to war; ants send their old ladies
-E. O. Wilson





Re: [OpenIndiana-discuss] [developer] Re: Memory usage concern

2012-10-22 Thread Jim Klimov

2012-10-22 18:24, Daniel Kjar wrote:

I have this problem with any VM running on Sol10/Nevada, OpenSolaris, or
OpenIndiana.  I have the ARC restricted now, but for some reason (and
'people at Sun who know these things' mentioned it back when Sun still
existed), when something needs a big devoted chunk of RAM, ZFS fails
miserably at giving up its cache.  If I don't limit the ARC, a few days
of even a single VM running causes the system to start stalling.  With
the ARC limit, everything is peaches.



I think limiting the ARC was the recommended practice for VirtualBox
in either the ZFS Evil Tuning Guide or the VirtualBox manual, or both.

Also remember that VBox requires disk-based swap to be available at
VM start time, in amounts comparable to the VMs' RAM size, so that,
if needed, these processes are guaranteed to be able to swap.
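
For anyone verifying that precondition, the stock swap(1M) tool shows
both views:

   swap -l   # list swap devices, with blocks configured and blocks free
   swap -s   # summary of allocated, reserved, used, and available swap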

On a different note, there are programs which check the system's
available memory and fail on their own if there is not enough, without
ever asking the OS to give them the memory. In this case the ARC gets
no signal to shrink because some process needs the RAM. The rule of
thumb for admins was to estimate or measure the stable RAM requirements
of the running processes, add 1GB for good measure, and limit the ARC
to the rest; a worked example follows.
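
As an illustrative sizing for that rule on Daniel's 32 GB box (the 20 GB
footprint is an assumed number, not a measurement), in /etc/system:

   * stable footprint of VMs + services: ~20 GB, plus 1 GB of headroom
   * 32 GB - 21 GB = 11 GB left for the ARC; 11 * 2^30 = 11811160064
   set zfs:zfs_arc_max = 11811160064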

As for lags and failures in relation to the ARC, I'd guess that getting
and releasing RAM in vast amounts and small chunks fragments the
memory, possibly to the extent that while the needed total amount of
free RAM is available, no contiguous block of the size a process needs
can be allocated (or that there's too much random access and virtual
memory mapping involved, requiring much CPU for each memory access from
userland processes). But that's just an educated guess.
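
A rough way to eyeball that on a live system (arena in-use vs. total
only hints at fragmentation; it doesn't prove it):

   echo ::vmem | mdb -k      # kernel vmem arenas: VA in use vs. total
   echo ::memstat | mdb -k   # overall breakdown of physical pages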

HTH,
//Jim



Re: [OpenIndiana-discuss] [developer] Re: Memory usage concern

2012-10-22 Thread Bob Friesenhahn

On Mon, 22 Oct 2012, David Halko wrote:


I don't really understand how applications can be starved by ZFS unless ZFS
cannot give back the buffers (lots of constant full-table scans?). Have the
Illumos developers really confirmed such a ZFS bug, where it does not
return memory to the OS?


It does take a little time to give back the memory.  There are 
data-intensive usage scenarios where ARC is competing with 
applications.


Application use of mmap() can cause memory issues, since mapped file
data consumes memory in both the ARC and the page cache (i.e., it may
double the memory requirement).
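
A crude way to watch for that double accounting while an mmap()-heavy
workload runs (assuming the mapped copy surfaces in ::memstat's
page-cache row):

   kstat -p zfs:0:arcstats:size   # the ARC's copy of the file data
   echo ::memstat | mdb -k        # the mapped copy shows under 'Page cache'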


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/



Re: [OpenIndiana-discuss] [developer] Re: Memory usage concern

2012-10-22 Thread James Carlson
Daniel Kjar wrote:
 I have this problem with any VM running on Sol10/Nevada, OpenSolaris,
 or OpenIndiana.  I have the ARC restricted now, but for some reason
 (and 'people at Sun who know these things' mentioned it back when Sun
 still existed), when something needs a big devoted chunk of RAM, ZFS
 fails miserably at giving up its cache.  If I don't limit the ARC, a
 few days of even a single VM running causes the system to start
 stalling.  With the ARC limit, everything is peaches.

My understanding is that the two mechanisms are only superficially similar.

UFS was (and is) in bed with the VM implementation, so it's using memory
pages as the cache in the same way that the VM keeps paged-in data in
memory.  UFS giving up memory that's needed for applications is really
the same thing as saying that the VM reallocates memory pages when it
needs to.

ZFS, though, is different.  It's not as much in bed with the VM.  The
ARC is allocated memory, done in the same way that any other kernel
service can allocate memory.  There is a back-pressure ("help! I need
more space") mechanism on normal kernel allocations, and this is what
the ARC relies on to give up memory when necessary, but it occurs much
further down the line than ordinary VM page reallocation.  Thus, when it
does happen, the damage has already been done -- the VM has already been
exhausted of easy-to-find pages, and we're already in a bad state.

So, yes, limiting the ARC is (unfortunately) a good thing.
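
One way to see whether that late back-pressure is actually firing on a
given box is the arcstats kstats (stat names as in 2012-era illumos):

   kstat -p zfs:0:arcstats:c                      # ARC target size
   kstat -p zfs:0:arcstats:size                   # ARC actual size
   kstat -p zfs:0:arcstats:memory_throttle_count  # times allocations were
                                                  # throttled for lack of memory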

-- 
James Carlson 42.703N 71.076W carls...@workingcode.com



Re: [OpenIndiana-discuss] [developer] Re: Memory usage concern

2012-10-22 Thread Richard Elling
On Oct 22, 2012, at 9:13 AM, James Carlson carls...@workingcode.com wrote:

 Daniel Kjar wrote:
 I have this problem with any VM running on Sol10/Nevada, OpenSolaris,
 or OpenIndiana.  I have the ARC restricted now, but for some reason
 (and 'people at Sun who know these things' mentioned it back when Sun
 still existed), when something needs a big devoted chunk of RAM, ZFS
 fails miserably at giving up its cache.  If I don't limit the ARC, a
 few days of even a single VM running causes the system to start
 stalling.  With the ARC limit, everything is peaches.
 
 My understanding is that the two mechanisms are only superficially similar.
 
 UFS was (and is) in bed with the VM implementation, so it's using memory
 pages as the cache in the same way that the VM keeps paged-in data in
 memory.  UFS giving up memory that's needed for applications is really
 the same thing as saying that the VM reallocates memory pages when it
 needs to.

UFS cache is also mostly clean data.

 
 ZFS, though, is different.  It's not as much in bed with the VM.  The
 ARC is allocated memory, done in the same way that any other kernel
 service can allocate memory.  There is a back-pressure ("help! I need
 more space") mechanism on normal kernel allocations, and this is what
 the ARC relies on to give up memory when necessary, but it occurs much
 further down the line than ordinary VM page reallocation.  Thus, when
 it does happen, the damage has already been done -- the VM has already
 been exhausted of easy-to-find pages, and we're already in a bad state.

In a ZFS system, the ARC could contain a large amount of dirty data that
needs to be written to the pool. Giving back memory can involve actual
disk I/O.

 So, yes, limiting the ARC is (unfortunately) a good thing.

Yes, and it can be used to emulate UFS a la SunOS 3.x.

Also, note that the size of an ARC shrink request is tunable, but doesn't scale
well to large or small machines. In other words, if there is a sudden demand
for more memory, the ARC will shrink by arc_c / 2^arc_shrink_shift. If you need
much more than that, another shrink will occur. This can appear to increase the
time needed to shrink, and there is no real mechanism to do otherwise, given
the existing VM design.
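
To make the scaling point concrete (the shift's default value here is an
assumption; read it off the running kernel first):

   echo 'arc_shrink_shift/D' | mdb -k   # commonly 5 on 2012-era systems
   # one shrink step frees arc_c >> arc_shrink_shift, so for example:
   #   arc_c = 16 GB, shift = 5  ->  16 GB / 32 = 512 MB freed per step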
 -- richard

--

richard.ell...@richardelling.com
+1-760-896-4422





Re: [OpenIndiana-discuss] [developer] Re: Memory usage concern

2012-10-21 Thread David Halko
On Sat, Oct 20, 2012 at 1:57 PM, Cedric Blancher 
cedric.blanc...@googlemail.com wrote:

 On 16 October 2012 01:22, Jason Matthews ja...@broken.net wrote:
  -Original Message-
  From: Cedric Blancher [mailto:cedric.blanc...@googlemail.com]
 
  IMO you blame the wrong people. You can have the same kind of
  problems with any Illumos-based distribution if you activate a
  zone and let the machine just sit there for a week or so, or have
  a lot of filesystem activity using mmap(). Either way the machines
  will choke themselves into memory starvation. The only workarounds
  we found are regular reboots (every 24h) or limiting the
  ZFS ARC to an absolute minimum.
 
  I don't think you understand. My proxy tier does almost no reads from the
  file system. There is no content on the server.

 OK, sorry then. Same symptoms, different cause, albeit so bad
 that it makes the OS virtually unusable for any serious work.

 Ced


This whole discussion sounds bizarre to me. I have a Solaris 10 Update 1
system with over a dozen zones on UFS and 8 GB of RAM, and I don't
experience these types of issues. We reboot once a year, whether we need
to or not.

This is my limited understanding...
- UFS basically consumes all unused memory for its page cache, without ever
telling the OS, but releases it to OS processes when memory is needed. UFS
does not tell the OS that it is using the unused memory as buffer cache, so
you never see it when you check your memory usage.
- ZFS basically consumes all unused memory for the ARC, but tells the OS
when it is taking RAM. The ZFS ARC is supposed to give memory back when
there is pressure to do so. You always know how much memory is really being
used when you check your memory usage (see the commands sketched below).
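
That accounting is easy to confirm from a shell; both of the following
report the ARC footprint directly (standard kstat/mdb invocations):

   kstat -p zfs:0:arcstats:size   # bytes currently held by the ARC
   echo ::memstat | mdb -k        # the same memory appears as 'ZFS File Data'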

I had issues in the past where disk I/O started to go through the roof on
UFS filesystems when free memory dropped below 4 GB of RAM... a sure sign
the apps were starving the invisible UFS buffer cache.

I don't really understand how applications can be starved by ZFS unless ZFS
cannot give back the buffers (lots of constant full-table scans?). Have the
Illumos developers really confirmed such a ZFS bug, where it does not
return memory to the OS?

This just does not sound right, to me.

Thanks - Dave H
http://netmgt.blogspot.com/


Re: [OpenIndiana-discuss] [developer] Re: Memory usage concern

2012-10-15 Thread Brendan Gregg
(removing developer)

On Mon, Oct 15, 2012 at 4:01 PM, Cedric Blancher 
cedric.blanc...@googlemail.com wrote:

 On 16 October 2012 00:52, Jason Matthews ja...@broken.net wrote:
 
 
 
  I should also mention that overall network performance degrades over
  time, and the only fix I have found so far is to shut down the zones,
  destroy the vnics, recreate them, and then reboot the zones.

  I do that about every four hours to keep the response times
  reasonable. It really sucks.

 IMO you blame the wrong people. You can have the same kind of problems
 with any Illumos-based distribution if you activate a zone and let the
 machine just sit there for a week or so, or have a lot of filesystem
 activity using mmap(). Either way the machines will choke themselves
 into memory starvation. The only workarounds we found are regular
 reboots (every 24h) or limiting the ZFS ARC to an absolute minimum.


Is this bare-metal or under a VM? We have many mmap()-based workloads
running in production, bare-metal, without issue. The ZFS ARC will reduce
its size (reap) when the system is low on memory. If it doesn't, and needs
workarounds, that's a bug. It can be diagnosed by checking kstats
(arcstats, or using arcstat.pl) and using DTrace.

And by choke, you mean the system pages out applications, such that you
have a rate of anonymous page-ins? I.e., does vmstat -p 1 show a
sustained api rate?
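
For the archive, that check looks like this (api is the anonymous
page-in column; a sustained nonzero rate means application pages are
being faulted back in from swap):

   vmstat -p 1   # paging activity by type: executable (epi/epo/epf),
                 # anonymous (api/apo/apf), and filesystem (fpi/fpo/fpf)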

A full "echo ::kmastat | mdb -k" before reboot may show where the memory
is, or isn't.

Brendan

-- 
Brendan Gregg, Joyent  http://dtrace.org/blogs/brendan


Re: [OpenIndiana-discuss] [developer] Re: Memory usage concern

2012-10-15 Thread Jason Matthews


-Original Message-
From: Brendan Gregg [mailto:brendan.gr...@joyent.com] 


 A full "echo ::kmastat | mdb -k" before reboot may show where the
 memory is or isn't.

I am not at a point where I would normally reboot it, but we are pretty
well deep into memory utilization for the size of the apps that run...

On the proxy tier, it seems that the DCE cache is way out of line.
What are the tunables here?


Thanks,
j.


Here is a dump from a box on the proxy tier...

root@www003:~# echo ::kmastat |mdb -k |sort -nrk 5 |head
[columns: cache name, buf size, bufs in use, bufs total, memory in use,
allocs succeeded, allocs failed]
kmem_va_4096           4096   2964721   2964736  12143558656B    3023570  0
dce_cache               152  73389856  73389862  11561725952B   73389938  0
zio_data_buf_131072  131072     43616     43644   5720506368B     893451  0
kmem_va_16384         16384     51737     51744    847773696B      51969  0
zio_buf_16384         16384     51386     51475    843366400B    5324046  0
tcp_conn_cache         1808     17352     88512    181272576B 1597914226  0
kmem_va_8192           8192     22097     22112    181141504B      37356  0
streams_dblk_1040      1152        42    106253    124346368B 1582704292  0
kmem_bufctl_cache        24   3527424   3527541     86519808B    3750227  0
arc_buf_hdr_t           176    345516    345576     64339968B    1199529  0
root@www003:~# echo ::memstat |mdb -k
Page Summary                Pages        MB   %Tot
------------------------  -------  --------  ----
Kernel                    3721822     14538    44%
ZFS File Data             1448069      5656    17%
Anon                        84512       330     1%
Exec and libs                5085        19     0%
Page cache                  15805        61     0%
Free (cachelist)            22861        89     0%
Free (freelist)           3072717     12002    37%

Total                     8370871     32698
Physical                  8370870     32698

And here is one from the database tier...

root@db001:~# echo ::kmastat |mdb -k |sort -nrk 5 |head
zfs_file_data_8192     8192   2991810   7365904  60341485568B 3495907153  0
kmem_va_4096           4096   4965999   6615520  27097169920B   73888636  0
zio_data_buf_8192      8192   2991120   2991810  24508907520B 3108123433  0
kmem_va_16384         16384    216265   1442656  23636475904B  224812598  0
zfs_file_data_buf (vmem)  24814600192B  24814600192B  24814600192B  15532060657
arc_buf_hdr_t           176  56809623  61605434  11469811712B  489897048  0
zio_buf_131072       131072     83335     83370  10927472640B  624255183  0
zio_buf_16384         16384    213952    214677   3517267968B  866096686  0
kmem_alloc_32            32  54596040  95296625   3122679808B 4035614613  0
kmem_va_8192           8192     61443    213888   1752170496B  483211721  0
root@db001:~# echo ::memstat |mdb -k
Page Summary                Pages        MB   %Tot
------------------------  -------  --------  ----
Kernel                   10971334     42856    44%
ZFS File Data             6059601     23670    24%
Anon                      5686726     22213    23%
Exec and libs                9304        36     0%
Page cache                  53574       209     0%
Free (cachelist)           140715       549     1%
Free (freelist)           2225873      8694     9%

Total                    25147127     98230
Physical                 25147125     98230


Re: [OpenIndiana-discuss] [developer] Re: Memory usage concern

2012-10-15 Thread Brendan Gregg
On Mon, Oct 15, 2012 at 5:52 PM, Jason Matthews ja...@broken.net wrote:



 -Original Message-
 From: Brendan Gregg [mailto:brendan.gr...@joyent.com]


  A full "echo ::kmastat | mdb -k" before reboot may show where the
  memory is or isn't.

 I am not at a point where I would normally reboot it, but we are pretty
 well deep into memory utilization for the size of the apps that run...

 On the proxy tier, it seems that the DCE cache is way out of line.
 What are the tunables here?


This sounds like the DCE cache cleanup issue:
http://smartos.org/2012/02/28/using-flamegraph-to-solve-ip-scaling-issue-dce/

A "netstat -dn | wc -l" should show how many entries are in the cache.
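
A quick cross-check against the kmastat numbers above (a sketch; the
two counts should be of the same order):

   netstat -dn | wc -l                        # entries in the destination cache
   echo ::kmastat | mdb -k | grep dce_cache   # bufs held by the dce kmem cache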

Brendan





 Thanks,
 j.


 Here is a dump from a box on the proxy tier...

 root@www003:~# echo ::kmastat |mdb -k |sort -nrk 5 |head
 kmem_va_4096           4096   2964721   2964736  12143558656B    3023570  0
 dce_cache               152  73389856  73389862  11561725952B   73389938  0
 zio_data_buf_131072  131072     43616     43644   5720506368B     893451  0
 kmem_va_16384         16384     51737     51744    847773696B      51969  0
 zio_buf_16384         16384     51386     51475    843366400B    5324046  0
 tcp_conn_cache         1808     17352     88512    181272576B 1597914226  0
 kmem_va_8192           8192     22097     22112    181141504B      37356  0
 streams_dblk_1040      1152        42    106253    124346368B 1582704292  0
 kmem_bufctl_cache        24   3527424   3527541     86519808B    3750227  0
 arc_buf_hdr_t           176    345516    345576     64339968B    1199529  0
 root@www003:~# echo ::memstat |mdb -k
 Page Summary                Pages        MB   %Tot
 ------------------------  -------  --------  ----
 Kernel                    3721822     14538    44%
 ZFS File Data             1448069      5656    17%
 Anon                        84512       330     1%
 Exec and libs                5085        19     0%
 Page cache                  15805        61     0%
 Free (cachelist)            22861        89     0%
 Free (freelist)           3072717     12002    37%

 Total                     8370871     32698
 Physical                  8370870     32698



-- 
Brendan Gregg, Joyent  http://dtrace.org/blogs/brendan