Re: [ceph-users] Re: [luminous] OSD memory usage increase when writing a lot of data to cluster

2017-11-02 Thread Brad Hubbard
On Wed, Nov 1, 2017 at 11:54 PM, Mazzystr  wrote:
> I experienced this as well when testing on a tiny Ceph cluster...
>
> HW spec - 3x
> Intel i7-4770K quad core
> 32GB M.2 SSD
> 8GB memory
> Dell PERC H200
> 6 x 3TB Seagate
> CentOS 7.x
> Ceph 12.x
>
> I also run 3 memory-hungry procs on the Ceph nodes.  Obviously there is a
> memory problem here.  Here are the steps I took to avoid oom-killer killing
> the node ...
>
> /etc/rc.local -
> for i in $(pgrep ceph-mon); do echo -17 > /proc/$i/oom_score_adj; done
> for i in $(pgrep ceph-osd); do echo -17 > /proc/$i/oom_score_adj; done
> for i in $(pgrep ceph-mgr); do echo 50 > /proc/$i/oom_score_adj; done
>
> /etc/sysctl.conf -
> vm.swappiness = 100
> vm.vfs_cache_pressure = 1000

This is generally not a good idea. Just sayin'

$ grep -A17 ^vfs_cache_pressure sysctl/vm.txt
vfs_cache_pressure
------------------

This percentage value controls the tendency of the kernel to reclaim
the memory which is used for caching of directory and inode objects.

At the default value of vfs_cache_pressure=100 the kernel will attempt to
reclaim dentries and inodes at a "fair" rate with respect to pagecache and
swapcache reclaim.  Decreasing vfs_cache_pressure causes the kernel to prefer
to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
never reclaim dentries and inodes due to memory pressure and this can easily
lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
causes the kernel to prefer to reclaim dentries and inodes.

Increasing vfs_cache_pressure significantly beyond 100 may have negative
performance impact. Reclaim code needs to take various locks to find freeable
directory and inode objects. With vfs_cache_pressure=1000, it will look for
ten times more freeable objects than there are.
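
For what it's worth, the current value can be checked and dialled back at
runtime without a reboot (a minimal sketch; the persistent copy still lives
in /etc/sysctl.conf):

$ sysctl vm.vfs_cache_pressure             # show the current value
$ sudo sysctl -w vm.vfs_cache_pressure=100 # revert to the kernel default for this boot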

> vm.min_free_kbytes = 512
>
> /etc/ceph/ceph.conf -
> [osd]
> bluestore_cache_size = 52428800
> bluestore_cache_size_hdd = 52428800
> bluestore_cache_size_ssd = 52428800
> bluestore_cache_kv_max = 52428800
>
> You're going to see memory page-{in,out} skyrocket with this setup but it
> should keep oom-killer at bay until a memory fix can be applied.  Client
> performance to the cluster wasn't spectacular but wasn't terrible.  I was
> seeing +/- 60Mb/sec of bandwidth.
>
> Ultimately I upgraded the nodes to 16GB.
>
> /Chris C
>
> On Tue, Oct 31, 2017 at 10:30 PM, shadow_lin  wrote:
>>
>> Hi Sage,
>> We have tried compiling the latest Ceph source code from GitHub.
>> The build is ceph version 12.2.1-249-g42172a4
>> (42172a443183ffe6b36e85770e53fe678db293bf) luminous (stable).
>> The memory problem seems better, but the memory usage of the OSD still keeps
>> increasing as more data is written into the rbd image, and the memory usage
>> won't drop after the write is stopped.
>> Could you specify in which commit the memory bug was fixed?
>> Thanks
>> 2017-11-01
>> 
>> lin.yunfan
>> 
>>
>> From: Sage Weil 
>> Date: 2017-10-24 20:03
>> Subject: Re: [ceph-users] [luminous] OSD memory usage increase when writing
>> a lot of data to cluster
>> To: "shadow_lin"
>> Cc: "ceph-users"
>>
>> On Tue, 24 Oct 2017, shadow_lin wrote:
>> > Hi All,
>> > The cluster has 24 OSDs with 24 8TB HDDs.
>> > Each OSD server has 2GB RAM and runs 2 OSDs with 2 8TB HDDs. I know the
>> > memory is below the recommended value, but this OSD server is an ARM
>> > server so I can't do anything to add more RAM.
>> > I created a replicated (2 rep) pool and a 20TB image and mounted it to
>> > the test server with an xfs filesystem.
>> >
>> > I have set ceph.conf to this (as other related posts suggested):
>> > [osd]
>> > bluestore_cache_size = 104857600
>> > bluestore_cache_size_hdd = 104857600
>> > bluestore_cache_size_ssd = 104857600
>> > bluestore_cache_kv_max = 103809024
>> >
>> >  osd map cache size = 20
>> > osd map max advance = 10
>> > osd map share max epochs = 10
>> > osd pg epoch persisted max stale = 10
>> > The bluestore cache setting did improve the situation, but if I try to
>> > write 1TB of data to rbd with a dd command (dd if=/dev/zero of=test
>> > bs=1G count=1000), the OSD will eventually be killed by the oom killer.
>> > If I only write about 100G of data at once then everything is fine.
>> >
>> > Why does the OSD memory usage keep increasing while writing?
>> > Is there anything I can do to reduce the memory usage?
>>
>> There is a bluestore memory bug that was fixed just after 12.2.1 was
>> released; it will be fixed in 12.2.2.  In the meantime, you can consider
>> running the latest luminous branch (not fully tested) from
>> https://shaman.ceph.com/builds/ceph/luminous.
>>
>> sage
>>
>>

Re: [ceph-users] Re: [luminous] OSD memory usage increase when writing a lot of data to cluster

2017-11-01 Thread Mazzystr
I experienced this as well when testing on a tiny Ceph cluster...

HW spec - 3x
Intel i7-4770K quad core
32GB M.2 SSD
8GB memory
Dell PERC H200
6 x 3TB Seagate
CentOS 7.x
Ceph 12.x

I also run 3 memory-hungry procs on the Ceph nodes.  Obviously there is a
memory problem here.  Here are the steps I took to avoid oom-killer killing
the node ...

/etc/rc.local -
for i in $(pgrep ceph-mon); do echo -17 > /proc/$i/oom_score_adj; done
for i in $(pgrep ceph-osd); do echo -17 > /proc/$i/oom_score_adj; done
for i in $(pgrep ceph-mgr); do echo 50 > /proc/$i/oom_score_adj; done
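
For reference, /proc/<pid>/oom_score_adj takes values from -1000 (which
disables OOM killing for that process) up to +1000, so -17 is only a small
bias, and the rc.local loops only catch daemons already running at boot. If
the daemons are under systemd, as with the stock Luminous packages, a drop-in
along these lines (a sketch, assuming the usual ceph-osd@.service unit name)
should survive restarts:

# /etc/systemd/system/ceph-osd@.service.d/oom.conf
[Service]
OOMScoreAdjust=-17

# pick up the drop-in; the score applies after the OSD units are restarted
systemctl daemon-reload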

/etc/sysctl.conf -
vm.swappiness = 100
vm.vfs_cache_pressure = 1000
vm.min_free_kbytes = 512
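
As an aside, these can be loaded and verified without a reboot (assuming they
were added to /etc/sysctl.conf as shown):

$ sudo sysctl -p                                                  # apply /etc/sysctl.conf
$ sysctl vm.swappiness vm.vfs_cache_pressure vm.min_free_kbytes   # confirm the values took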

/etc/ceph/ceph.conf -
[osd]
bluestore_cache_size = 52428800
bluestore_cache_size_hdd = 52428800
bluestore_cache_size_ssd = 52428800
bluestore_cache_kv_max = 52428800
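
For scale, 52428800 bytes is 50 MiB of BlueStore cache per OSD; against the
Luminous defaults (roughly 1 GiB for HDD OSDs, if I remember right), the six
OSDs on an 8GB node are together capped at about 6 x 50 MiB = 300 MiB of
cache. The running value can be checked on any one OSD via its admin socket
(osd.0 here is just an example):

$ ceph daemon osd.0 config get bluestore_cache_size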

You're going to see memory page-{in,out} skyrocket with this setup but it
should keep oom-killer at bay until a memory fix can be applied.  Client
performance to the cluster wasn't spectacular but wasn't terrible.  I was
seeing +/- 60Mb/sec of bandwidth.

Ultimately I upgraded the nodes to 16GB.

/Chris C

On Tue, Oct 31, 2017 at 10:30 PM, shadow_lin  wrote:

> Hi Sage,
> We have tried compiling the latest Ceph source code from GitHub.
> The build is ceph version 12.2.1-249-g42172a4
> (42172a443183ffe6b36e85770e53fe678db293bf) luminous (stable).
> The memory problem seems better, but the memory usage of the OSD still keeps
> increasing as more data is written into the rbd image, and the memory usage
> won't drop after the write is stopped.
> Could you specify in which commit the memory bug was fixed?
> Thanks
> 2017-11-01
> --
> lin.yunfan
> --
>
> From: Sage Weil 
> Date: 2017-10-24 20:03
> Subject: Re: [ceph-users] [luminous] OSD memory usage increase when writing
> a lot of data to cluster
> To: "shadow_lin"
> Cc: "ceph-users"
>
> On Tue, 24 Oct 2017, shadow_lin wrote:
> > Hi All,
> > The cluster has 24 OSDs with 24 8TB HDDs.
> > Each OSD server has 2GB RAM and runs 2 OSDs with 2 8TB HDDs. I know the
> > memory is below the recommended value, but this OSD server is an ARM
> > server so I can't do anything to add more RAM.
> > I created a replicated (2 rep) pool and a 20TB image and mounted it to
> > the test server with an xfs filesystem.
> >
> > I have set ceph.conf to this (as other related posts suggested):
> > [osd]
> > bluestore_cache_size = 104857600
> > bluestore_cache_size_hdd = 104857600
> > bluestore_cache_size_ssd = 104857600
> > bluestore_cache_kv_max = 103809024
> >
> >  osd map cache size = 20
> > osd map max advance = 10
> > osd map share max epochs = 10
> > osd pg epoch persisted max stale = 10
> > The bluestore cache setting did improve the situation, but if I try to
> > write 1TB of data to rbd with a dd command (dd if=/dev/zero of=test
> > bs=1G count=1000), the OSD will eventually be killed by the oom killer.
> > If I only write about 100G of data at once then everything is fine.
> >
> > Why does the OSD memory usage keep increasing while writing?
> > Is there anything I can do to reduce the memory usage?
>
> There is a bluestore memory bug that was fixed just after 12.2.1 was
> released; it will be fixed in 12.2.2.  In the meantime, you can consider
> running the latest luminous branch (not fully tested) from
> https://shaman.ceph.com/builds/ceph/luminous.
>
> sage
>
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com