Re: [ceph-users] out of memory bluestore osds
Hi Mark,

Thanks a lot for your explanation and clarification. Adjusting osd_memory_target to fit within our systems' RAM did the trick.

Jaime

On 07/08/2019 14:09, Mark Nelson wrote:
> Hi Jaime, we only use the cache size parameters now if you've disabled
> autotuning. [...]
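(For reference, a minimal sketch of what that adjustment could look like on a Luminous host; the 2.5 GiB per OSD figure is only an illustration sized so that 12 OSDs stay under 32 GB of RAM, in line with Mark's 2-2.5GB suggestion below, not a value taken from Jaime's cluster.)

    # ceph.conf on the OSD host: cap each OSD at ~2.5 GiB so that
    # 12 OSDs need roughly 30 GiB, just under the 32 GB of physical RAM
    [osd]
    osd_memory_target = 2684354560

    # push the change to running OSDs; depending on the exact Luminous
    # build this may still require an OSD restart to fully take effect
    ceph tell osd.* injectargs '--osd_memory_target=2684354560'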
Re: [ceph-users] out of memory bluestore osds
Hi Jaime,

We only use the cache size parameters now if you've disabled autotuning. With autotuning we adjust the cache sizes on the fly to try to keep the mapped process memory under the osd_memory_target.

You can set a lower memory target than the default, though you will have far less cache for bluestore onodes and rocksdb, and you may notice that it's slower, especially if you have a big active data set you are processing. I don't usually recommend setting the osd_memory_target below 2GB. At some point the tuner will have shrunk the caches as far as it can and the process memory may start exceeding the target (with our default rocksdb and pglog settings this usually happens somewhere between 1.3 and 1.7GB once the OSD has been sufficiently saturated with IO). Given memory prices right now, I'd still recommend upgrading RAM if you have the ability. You might be able to get away with setting each OSD to 2-2.5GB in your scenario, but you'll be pushing it.

I would not recommend lowering osd_memory_cache_min. You really want the rocksdb indexes/filters fitting in cache, and as many bluestore onodes as you can get. In any event, you'll still be bound by the (currently hardcoded) 64MB cache chunk allocation size in the autotuner, which osd_memory_cache_min can't reduce (and that chunk size is per cache, while osd_memory_cache_min is global for the kv, buffer and rocksdb block caches). I.e. each cache is going to get 64MB plus growth room regardless of how low you set osd_memory_cache_min. That's intentional: we don't want a single SST file in rocksdb to be able to completely blow everything else out of the block cache during compaction, only to quickly become invalid, be removed from the cache, and make it look to the priority cache system like rocksdb doesn't actually need any more memory for cache.

Mark

On 8/7/19 7:44 AM, Jaime Ibar wrote:
> Hi all, we run a Ceph Luminous 12.2.12 cluster with 7 OSD servers,
> 12x4TB disks each. [...]
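(To see where the memory actually goes while tuning, the OSD admin socket can report the current target and a rough per-component breakdown; osd.0 below is just a placeholder for any local OSD id, and the commands assume they are run on the OSD host itself.)

    # confirm the target the running OSD is using
    ceph daemon osd.0 config get osd_memory_target

    # rough per-component memory accounting (bluestore onode/data caches,
    # pglog, etc.) that the priority cache autotuner balances against
    ceph daemon osd.0 dump_mempools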
[ceph-users] out of memory bluestore osds
Hi all,

We run a Ceph Luminous 12.2.12 cluster with 7 OSD servers, 12x4TB disks each. Recently we redeployed the OSDs of one of them using the bluestore backend. However, after this we're facing out-of-memory errors (the kernel invokes the oom-killer) and the OS kills one of the ceph-osd processes. The OSD is restarted automatically and is back online after one minute.

We're running Ubuntu 16.04, kernel 4.15.0-55-generic. The server has 32GB of RAM and a 4GB swap partition. All the disks are HDDs, no SSDs.

The bluestore settings are the default ones:

"osd_memory_target": "4294967296"
"osd_memory_cache_min": "134217728"
"bluestore_cache_size": "0"
"bluestore_cache_size_hdd": "1073741824"
"bluestore_cache_autotune": "true"

As stated in the documentation, bluestore assigns by default 4GB of RAM per OSD (1GB of RAM per 1TB), so in this case 48GB of RAM would be needed. Am I right? Are these the minimum requirements for bluestore?

In case adding more RAM is not an option, can any of osd_memory_target, osd_memory_cache_min or bluestore_cache_size_hdd be decreased to fit our server specs? Would this have any impact on performance?

Thanks
Jaime

--
Jaime Ibar
High Performance & Research Computing, IS Services
Lloyd Building, Trinity College Dublin, Dublin 2, Ireland.
http://www.tchpc.tcd.ie/ | ja...@tchpc.tcd.ie
Tel: +353-1-896-3725
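(Spelling out the arithmetic behind the question: with the default target, the 12 OSDs on this host want roughly 12 x 4 GiB = 48 GiB for the ceph-osd processes alone, well beyond the 32 GB of RAM plus 4 GB of swap available, which is consistent with the oom-killer invocations described above. Lower targets bring the total back within the host's RAM at the cost of cache, as Mark explains above.)

    12 OSDs x 4.0 GiB (default osd_memory_target)  = 48 GiB  -> exceeds 32 GB RAM
    12 OSDs x 2.5 GiB                               = 30 GiB  -> fits, little headroom
    12 OSDs x 2.0 GiB (practical minimum per Mark)  = 24 GiB  -> fits, least cache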