Re: [OpenIndiana-discuss] [developer] Re: Memory usage concern
I have this problem with any VM running on Sol10, Nevada, OpenSolaris, or OpenIndiana. I have the ARC restricted now, but for some reason (and 'people at Sun that know these things' mentioned it before, back when Sun still existed) when something needs a big devoted chunk of RAM, ZFS fails miserably at giving up its cache. If I don't limit the ARC, a few days of even a single VM running causes the system to start stalling. With the ARC limit, everything is peaches. This is on a system with 4 dual-core processors and 32 GB of RAM.

On 10/22/12 01:25 AM, David Halko wrote:
> On Sat, Oct 20, 2012 at 1:57 PM, Cedric Blancher
> cedric.blanc...@googlemail.com wrote:
>> On 16 October 2012 01:22, Jason Matthews ja...@broken.net wrote:
>>> -----Original Message-----
>>> From: Cedric Blancher [mailto:cedric.blanc...@googlemail.com]
>>>
>>>> IMO you blame the wrong people. You can have the same kind of
>>>> problems with any Illumos-based distribution if you activate a zone
>>>> and let the machine just sit there for a week or so, or have a lot
>>>> of filesystem activity using mmap(). Either way the machines will
>>>> choke themselves to memory starvation. The only workarounds we found
>>>> are regular reboots (every 24h), or limiting the ZFS ARC to an
>>>> absolute minimum.
>>>
>>> I don't think you understand. My proxy tier does almost no reads from
>>> the file system. There is no content on the server.
>>
>> OK, sorry then. Same symptoms, different cause, albeit it's so bad
>> that it makes the OS virtually unusable for any serious work.
>>
>> Ced
>
> This whole discussion sounds bizarre to me. I have a Solaris 10
> Update 1 system with over a dozen zones on UFS with 8 GB RAM and don't
> experience these types of issues. We reboot once a year, whether we
> need to or not.
>
> This is my limited understanding...
>
> - UFS basically consumes all unused memory for paging, without ever
>   telling the OS, but releases it to OS processes when memory is
>   needed. UFS does not tell the OS that it is using the unused memory
>   as buffer cache, so you never see it when you check your memory
>   usage.
>
> - ZFS basically consumes all unused memory for the ARC, but tells the
>   OS when it is taking RAM. The ZFS ARC is supposed to give back memory
>   when there is pressure to do so. You always know how much memory is
>   really being used when you check your memory usage.
>
> I had issues in the past where disk I/O started to go through the roof,
> on UFS filesystems, when free memory got below 4 GB of RAM... a sure
> sign the apps were starving the invisible UFS buffer cache.
>
> I don't really understand how applications can be starved by ZFS unless
> ZFS cannot give back the buffers (lots of constant full-table scans?).
> Have the Illumos developers really confirmed such a ZFS bug, where it
> is not returning memory to the OS? This just does not sound right to
> me.
>
> Thanks
> - Dave H
> http://netmgt.blogspot.com/

-- 
Dr. Daniel Kjar
Assistant Professor of Biology
Division of Mathematics and Natural Sciences
Elmira College
1 Park Place
Elmira, NY 14901
607-735-1826
http://faculty.elmira.edu/dkjar

"...humans send their young men to war; ants send their old ladies"
  -E. O. Wilson

_______________________________________________
OpenIndiana-discuss mailing list
OpenIndiana-discuss@openindiana.org
http://openindiana.org/mailman/listinfo/openindiana-discuss
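[Editorial note: the ARC cap Daniel describes is set via the zfs_arc_max tunable in /etc/system on Solaris/illumos; the value is in bytes and takes effect at the next reboot. A minimal sketch, with the 4 GB figure chosen purely as an illustration rather than taken from Daniel's setup:]

```
* /etc/system fragment: cap the ZFS ARC (example value: 4 GB).
* 4 GB = 4 * 1024^3 bytes = 0x100000000.
set zfs:zfs_arc_max = 0x100000000
```

[On a live system, the current ARC size and ceiling can then be checked with "kstat -p zfs:0:arcstats:size" and "kstat -p zfs:0:arcstats:c_max".]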
Re: [OpenIndiana-discuss] [developer] Re: Memory usage concern
2012-10-22 18:24, Daniel Kjar wrote:
> I have this problem with any VM running on Sol10, Nevada, OpenSolaris,
> or OpenIndiana. I have the ARC restricted now, but for some reason (and
> 'people at Sun that know these things' mentioned it before, back when
> Sun still existed) when something needs a big devoted chunk of RAM, ZFS
> fails miserably at giving up its cache. If I don't limit the ARC, a few
> days of even a single VM running causes the system to start stalling.
> With the ARC limit, everything is peaches.

I think limiting the ARC was the recommended practice for VirtualBox in either the ZFS Evil Tuning Guide or the VirtualBox manuals, or both. Also remember that VBox requires disk-based swap to be available at VM start time, in amounts comparable to the VMs' RAM size, so that these processes are guaranteed to be able to swap if needed.

On a different note, there are programs which check the system's available memory and fail on their own if there's not enough, without ever asking the OS to allocate it. In that case the ARC gets no signal to shrink, because no process actually asked for the RAM.

The rule of thumb for admins was to estimate or measure the stable RAM requirements of the running processes, add 1 GB for good measure, and limit the ARC to the rest.

As for lags and failures related to the ARC, I'd guess that getting and releasing RAM in vast amounts and small chunks fragments the memory, possibly to the extent that while the needed total free RAM is available, no contiguous block of the size a process needs can be allocated (or that there's too much random access and virtual-memory mapping involved, requiring a lot of CPU for each memory access from userland processes). But that's just an educated guess.

HTH,
//Jim
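[Editorial note: Jim's rule of thumb reduces to simple arithmetic. A sketch with made-up figures; the RSS number is whatever you measure on your own system, e.g. with prstat:]

```shell
# ARC sizing rule of thumb: total RAM - stable application RSS - 1 GB headroom.
# All figures below are hypothetical examples.
total_ram_gb=32      # physical memory in the box
app_rss_gb=10        # measured stable working set of all processes
headroom_gb=1        # Jim's "add 1GB for good measure"
arc_max_gb=$(( total_ram_gb - app_rss_gb - headroom_gb ))
arc_max_bytes=$(( arc_max_gb * 1024 * 1024 * 1024 ))
# Emit the /etc/system line that would apply this cap:
echo "set zfs:zfs_arc_max = $arc_max_bytes"
```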
Re: [OpenIndiana-discuss] [developer] Re: Memory usage concern
On Mon, 22 Oct 2012, David Halko wrote:
> I don't really understand how applications can be starved by ZFS unless
> ZFS cannot give back the buffers (lots of constant full-table scans?).
> Have the Illumos developers really confirmed such a ZFS bug, where it
> is not returning memory to the OS?

It does take a little time to give back the memory. There are data-intensive usage scenarios where the ARC competes with applications. Application use of mmap() can cause memory issues, since mapped file data uses memory in both the ARC and the page cache (i.e. it may double the memory requirement).

Bob
-- 
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,    http://www.GraphicsMagick.org/
Re: [OpenIndiana-discuss] [developer] Re: Memory usage concern
Daniel Kjar wrote:
> I have this problem with any VM running on Sol10, Nevada, OpenSolaris,
> or OpenIndiana. I have the ARC restricted now, but for some reason (and
> 'people at Sun that know these things' mentioned it before, back when
> Sun still existed) when something needs a big devoted chunk of RAM, ZFS
> fails miserably at giving up its cache. If I don't limit the ARC, a few
> days of even a single VM running causes the system to start stalling.
> With the ARC limit, everything is peaches.

My understanding is that the two mechanisms are only superficially similar.

UFS was (and is) in bed with the VM implementation, so it uses memory pages as its cache in the same way that the VM keeps paged-in data in memory. UFS giving up memory that's needed for applications is really the same thing as saying that the VM reallocates memory pages when it needs to.

ZFS, though, is different. It's not as much in bed with the VM. The ARC is allocated memory, done in the same way that any other kernel service can allocate memory. There is a back-pressure "help! I need more space" mechanism on normal kernel allocations, and this is what the ARC relies on to give up memory when necessary, but it kicks in much further down the line than ordinary VM page reallocation. Thus, by the time it does happen, the damage has already been done: the VM has already been exhausted of easy-to-find pages, and we're already in a bad state.

So, yes, limiting the ARC is (unfortunately) a good thing.

-- 
James Carlson         42.703N 71.076W         carls...@workingcode.com
Re: [OpenIndiana-discuss] [developer] Re: Memory usage concern
On Oct 22, 2012, at 9:13 AM, James Carlson carls...@workingcode.com wrote:
> Daniel Kjar wrote:
>> I have this problem with any VM running on Sol10, Nevada, OpenSolaris,
>> or OpenIndiana. I have the ARC restricted now, but for some reason
>> (and 'people at Sun that know these things' mentioned it before, back
>> when Sun still existed) when something needs a big devoted chunk of
>> RAM, ZFS fails miserably at giving up its cache. If I don't limit the
>> ARC, a few days of even a single VM running causes the system to start
>> stalling. With the ARC limit, everything is peaches.
>
> My understanding is that the two mechanisms are only superficially
> similar. UFS was (and is) in bed with the VM implementation, so it uses
> memory pages as its cache in the same way that the VM keeps paged-in
> data in memory. UFS giving up memory that's needed for applications is
> really the same thing as saying that the VM reallocates memory pages
> when it needs to.

UFS cache is also mostly clean data.

> ZFS, though, is different. It's not as much in bed with the VM. The ARC
> is allocated memory, done in the same way that any other kernel service
> can allocate memory. There is a back-pressure "help! I need more space"
> mechanism on normal kernel allocations, and this is what the ARC relies
> on to give up memory when necessary, but it kicks in much further down
> the line than ordinary VM page reallocation. Thus, by the time it does
> happen, the damage has already been done: the VM has already been
> exhausted of easy-to-find pages, and we're already in a bad state.

In a ZFS system, the ARC could also contain a large amount of dirty data that needs to be written to the pool, so giving back memory can involve actual disk I/O.

> So, yes, limiting the ARC is (unfortunately) a good thing.

Yes, and it can be used to emulate UFS a la SunOS 3.x.

Also note that the size of an ARC shrink request is tunable, but doesn't scale well to large or small machines. In other words, if there is a sudden demand for more memory, the ARC will shrink by 2^arc_shrink_shift. If you need much more than that, another shrink will occur. This can appear to increase the time needed to shrink, and there is no real mechanism to do otherwise, given the existing VM design.

 -- richard

--
richard.ell...@richardelling.com
+1-760-896-4422
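[Editorial note: reading Richard's "shrink by 2^arc_shrink_shift" as the ARC target dropping by a 1/2^arc_shrink_shift fraction of its current size per pass (which is how the illumos code applies the shift), the arithmetic looks like this; the ARC size and shift value below are made-up examples:]

```shell
# One ARC shrink pass releases roughly (current target) >> arc_shrink_shift
# bytes, so satisfying a large demand takes many passes.
# All figures below are hypothetical examples.
arc_c=$(( 20 * 1024 * 1024 * 1024 ))   # current ARC target: 20 GB
arc_shrink_shift=5                      # example shift value
shrink_step=$(( arc_c >> arc_shrink_shift ))
echo "one shrink pass releases $shrink_step bytes"
# Naive lower bound on passes needed to free 8 GB (the target actually
# drops each pass, so the real count is a little higher):
passes_for_8gb=$(( (8 * 1024 * 1024 * 1024) / shrink_step ))
echo "freeing 8 GB takes at least $passes_for_8gb passes"
```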
Re: [OpenIndiana-discuss] [developer] Re: Memory usage concern
On Sat, Oct 20, 2012 at 1:57 PM, Cedric Blancher cedric.blanc...@googlemail.com wrote:
> On 16 October 2012 01:22, Jason Matthews ja...@broken.net wrote:
>> -----Original Message-----
>> From: Cedric Blancher [mailto:cedric.blanc...@googlemail.com]
>>
>>> IMO you blame the wrong people. You can have the same kind of
>>> problems with any Illumos-based distribution if you activate a zone
>>> and let the machine just sit there for a week or so, or have a lot of
>>> filesystem activity using mmap(). Either way the machines will choke
>>> themselves to memory starvation. The only workarounds we found are
>>> regular reboots (every 24h), or limiting the ZFS ARC to an absolute
>>> minimum.
>>
>> I don't think you understand. My proxy tier does almost no reads from
>> the file system. There is no content on the server.
>
> OK, sorry then. Same symptoms, different cause, albeit it's so bad that
> it makes the OS virtually unusable for any serious work.
>
> Ced

This whole discussion sounds bizarre to me. I have a Solaris 10 Update 1 system with over a dozen zones on UFS with 8 GB RAM and don't experience these types of issues. We reboot once a year, whether we need to or not.

This is my limited understanding...

- UFS basically consumes all unused memory for paging, without ever telling the OS, but releases it to OS processes when memory is needed. UFS does not tell the OS that it is using the unused memory as buffer cache, so you never see it when you check your memory usage.

- ZFS basically consumes all unused memory for the ARC, but tells the OS when it is taking RAM. The ZFS ARC is supposed to give back memory when there is pressure to do so. You always know how much memory is really being used when you check your memory usage.

I had issues in the past where disk I/O started to go through the roof, on UFS filesystems, when free memory got below 4 GB of RAM... a sure sign the apps were starving the invisible UFS buffer cache.

I don't really understand how applications can be starved by ZFS unless ZFS cannot give back the buffers (lots of constant full-table scans?). Have the Illumos developers really confirmed such a ZFS bug, where it is not returning memory to the OS? This just does not sound right to me.

Thanks
- Dave H
http://netmgt.blogspot.com/
Re: [OpenIndiana-discuss] [developer] Re: Memory usage concern
(removing developer)

On Mon, Oct 15, 2012 at 4:01 PM, Cedric Blancher cedric.blanc...@googlemail.com wrote:
> On 16 October 2012 00:52, Jason Matthews ja...@broken.net wrote:
>> I should also mention that overall network performance degrades over
>> time, and the only fix I have found so far is to shut down the zones,
>> destroy the vnics, recreate them, and then reboot the zones. I do that
>> about every four hours to keep the response times reasonable. It
>> really sucks.
>
> IMO you blame the wrong people. You can have the same kind of problems
> with any Illumos-based distribution if you activate a zone and let the
> machine just sit there for a week or so, or have a lot of filesystem
> activity using mmap(). Either way the machines will choke themselves to
> memory starvation. The only workarounds we found are regular reboots
> (every 24h), or limiting the ZFS ARC to an absolute minimum.

Is this bare-metal or under a VM? We have many mmap()-based workloads running in production, bare-metal, without issue.

The ZFS ARC will reduce its size (reap) when the system is low on memory. If it doesn't, and needs workarounds, that's a bug. It can be diagnosed by checking kstats (arcstats, or using arcstat.pl) and using DTrace.

And by "choke", do you mean the system pages out applications, such that you see a rate of anonymous page-ins? I.e., does "vmstat -p 1" show a sustained api rate? A full "echo ::kmastat | mdb -k" before reboot may show where the memory is or isn't.

Brendan

-- 
Brendan Gregg, Joyent                      http://dtrace.org/blogs/brendan
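[Editorial note: Brendan's arcstats check boils down to reading a few kstat counters. On illumos that would be "kstat -p zfs:0:arcstats"; the sketch below runs the same parsing against captured sample output (the byte values are invented) so it can be tried anywhere:]

```shell
# Extract ARC size, current target (c), and ceiling (c_max) from
# "kstat -p" style output (module:instance:name:statistic<TAB>value).
parse_arc() {
  awk -F'\t' '$1 ~ /:(size|c|c_max)$/ { sub(/.*:/, "", $1); print $1 "=" $2 }'
}

# Sample output stands in for: kstat -p zfs:0:arcstats | parse_arc
printf 'zfs:0:arcstats:size\t5931794432\nzfs:0:arcstats:c\t6442450944\nzfs:0:arcstats:c_max\t8589934592\n' | parse_arc
```

A steadily falling c alongside application memory pressure means the reap mechanism is working; a c pinned near c_max while applications page out is the misbehavior being discussed.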
Re: [OpenIndiana-discuss] [developer] Re: Memory usage concern
-----Original Message-----
From: Brendan Gregg [mailto:brendan.gr...@joyent.com]
> A full "echo ::kmastat | mdb -k" before reboot may show where the
> memory is or isn't.

I am not at a point where I would normally reboot it, but we are pretty deep into memory utilization for the size of the apps that run... On the proxy tier, it seems that the DCE cache is way out of line. What are the tunables here?

Thanks,
j.

Here is a dump from a box on the proxy tier...

root@www003:~# echo ::kmastat | mdb -k | sort -nrk 5 | head
kmem_va_4096           4096  2964721  2964736  12143558656B    3023570  0
dce_cache               152 73389856 73389862  11561725952B   73389938  0
zio_data_buf_131072  131072    43616    43644   5720506368B     893451  0
kmem_va_16384         16384    51737    51744    847773696B      51969  0
zio_buf_16384         16384    51386    51475    843366400B    5324046  0
tcp_conn_cache         1808    17352    88512    181272576B 1597914226  0
kmem_va_8192           8192    22097    22112    181141504B      37356  0
streams_dblk_1040      1152       42   106253    124346368B 1582704292  0
kmem_bufctl_cache        24  3527424  3527541     86519808B    3750227  0
arc_buf_hdr_t           176   345516   345576     64339968B    1199529  0

root@www003:~# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
Kernel                    3721822             14538   44%
ZFS File Data             1448069              5656   17%
Anon                        84512               330    1%
Exec and libs                5085                19    0%
Page cache                  15805                61    0%
Free (cachelist)            22861                89    0%
Free (freelist)           3072717             12002   37%
Total                     8370871             32698
Physical                  8370870             32698

And here is one from the database tier...

root@db001:~# echo ::kmastat | mdb -k | sort -nrk 5 | head
zfs_file_data_8192     8192  2991810  7365904  60341485568B 3495907153  0
kmem_va_4096           4096  4965999  6615520  27097169920B   73888636  0
zio_data_buf_8192      8192  2991120  2991810  24508907520B 3108123433  0
kmem_va_16384         16384   216265  1442656  23636475904B  224812598  0
zfs_file_data_buf      24814600192B 24814600192B 24814600192B 15532060657
arc_buf_hdr_t           176 56809623 61605434  11469811712B  489897048  0
zio_buf_131072       131072    83335    83370  10927472640B  624255183  0
zio_buf_16384         16384   213952   214677   3517267968B  866096686  0
kmem_alloc_32            32 54596040 95296625   3122679808B 4035614613  0
kmem_va_8192           8192    61443   213888   1752170496B  483211721  0

root@db001:~# echo ::memstat | mdb -k
Page Summary                Pages                MB  %Tot
Kernel                   10971334             42856   44%
ZFS File Data             6059601             23670   24%
Anon                      5686726             22213   23%
Exec and libs                9304                36    0%
Page cache                  53574               209    0%
Free (cachelist)           140715               549    1%
Free (freelist)           2225873              8694    9%
Total                    25147127             98230
Physical                 25147125             98230
Re: [OpenIndiana-discuss] [developer] Re: Memory usage concern
On Mon, Oct 15, 2012 at 5:52 PM, Jason Matthews ja...@broken.net wrote:
> -----Original Message-----
> From: Brendan Gregg [mailto:brendan.gr...@joyent.com]
>> A full "echo ::kmastat | mdb -k" before reboot may show where the
>> memory is or isn't.
>
> I am not at a point where I would normally reboot it, but we are pretty
> deep into memory utilization for the size of the apps that run... On
> the proxy tier, it seems that the DCE cache is way out of line. What
> are the tunables here?

This sounds like the DCE cache cleanup issue:

http://smartos.org/2012/02/28/using-flamegraph-to-solve-ip-scaling-issue-dce/

A "netstat -dn | wc -l" should show how many entries are in the cache.

Brendan

> Thanks,
> j.
>
> Here is a dump from a box on the proxy tier...
>
> root@www003:~# echo ::kmastat | mdb -k | sort -nrk 5 | head
> kmem_va_4096           4096  2964721  2964736  12143558656B    3023570  0
> dce_cache               152 73389856 73389862  11561725952B   73389938  0
> zio_data_buf_131072  131072    43616    43644   5720506368B     893451  0
> kmem_va_16384         16384    51737    51744    847773696B      51969  0
> zio_buf_16384         16384    51386    51475    843366400B    5324046  0
> tcp_conn_cache         1808    17352    88512    181272576B 1597914226  0
> kmem_va_8192           8192    22097    22112    181141504B      37356  0
> streams_dblk_1040      1152       42   106253    124346368B 1582704292  0
> kmem_bufctl_cache        24  3527424  3527541     86519808B    3750227  0
> arc_buf_hdr_t           176   345516   345576     64339968B    1199529  0
>
> root@www003:~# echo ::memstat | mdb -k
> Page Summary                Pages                MB  %Tot
> Kernel                    3721822             14538   44%
> ZFS File Data             1448069              5656   17%
> Anon                        84512               330    1%
> Exec and libs                5085                19    0%
> Page cache                  15805                61    0%
> Free (cachelist)            22861                89    0%
> Free (freelist)           3072717             12002   37%
> Total                     8370871             32698
> Physical                  8370870             32698

-- 
Brendan Gregg, Joyent                      http://dtrace.org/blogs/brendan
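[Editorial note: Brendan's entry count can be sketched against captured output; the sample text below is invented, and the assumption that "netstat -dn" prints two header lines before the per-destination rows is illustrative only:]

```shell
# Count destination-cache entries from "netstat -dn"-style output,
# skipping an assumed two header lines. Sample input is invented.
count_dce() { sed '1,2d' | wc -l; }

n=$(printf 'Destination Cache Entries: IPv4\nAddress Flags\n10.0.0.1 U\n10.0.0.2 U\n10.0.0.3 U\n' | count_dce)
echo "$n DCE entries"
```

On a live box this would be "netstat -dn | count_dce"; a count in the tens of millions, as the dce_cache line in Jason's kmastat dump suggests, points at the cleanup issue in the linked post.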