Re: [ceph-users] [Ceph-community] Monitors not in quorum (1 of 3 live)

2019-06-08 Thread Joao Eduardo Luis
(adding ceph-users instead)

On 19/06/07 12:53pm, Lluis Arasanz i Nonell - Adam wrote:
> Hi all,
> 
> I know I have a very old ceph version, but I need some help.
> Also, please understand that English is not my native language, so keep that 
> in mind if something is not explained very well.
> 
> About infra:
> 
> -   Ceph version 0.87.2 (Giant)
> 
> -   5 OSD Servers (127 TB) and 3 Monitors
> 
> -   Centos 7 based
> 
> We had a power supply problem that took one monitor down (power off). After 
> power was restored, this monitor starts ceph-mon, but it does not contact the 
> other 2 monitors. The ceph-mon process logs its version and nothing else, then 
> consumes memory until it is exhausted and the kernel kills the process.
> 
> After some attempts, we decided to deploy a new monitor (mon04), and this 
> was the start of our problems :(
> 
> 4 monitors, 2 up, 1 down and one new but not in the ring... so this is NOT a 
> quorum, and then I lost monitor access (ceph-related commands do not work).
> 
> Then, suddenly, mon02 began starting ceph-mon, but the "ceph-create-keys" 
> module cannot contact the socket. The socket file is created, but "ceph 
> daemon mon.mon02" commands do not show "mon_status", "quorum_status", and some 
> others (only 13 of the 19 commands that mon01 shows).
> 
> I have:
> 
> Monmap (with 4 monitors, mon01 to mon04)
> Monitors keyring
> 
> So I deployed a new mon05 from scratch, made a new monitor filesystem and 
> injected the monmap and keyring...
> 
> mon_status on the 3 monitors shows:
> 
> Mon01:
> [root@mon01 ~]# ceph daemon mon.mon01 mon_status
> { "name": "mon01",
>   "rank": 0,
>   "state": "electing",
>   "election_epoch": 447,
>   "quorum": [],
>   "outside_quorum": [],
>   "extra_probe_peers": [
> "10.10.200.21:6789\/0",
> "10.10.200.24:6789\/0"],
>   "sync_provider": [],
>   "monmap": { "epoch": 20,
>   "fsid": "1aa318df-c6eb-47c5-a80e-2e9e43160e4e",
>   "modified": "2019-06-06 16:45:47.489558",
>   "created": "0.00",
>   "mons": [
> { "rank": 0,
>   "name": "mon01",
>   "addr": "10.10.200.20:6789\/0"},
> { "rank": 1,
>   "name": "mon02",
>   "addr": "10.10.200.21:6789\/0"},
> { "rank": 2,
>   "name": "mon03",
>   "addr": "10.10.200.22:6789\/0"},
> { "rank": 3,
>   "name": "mon04",
>   "addr": "10.10.200.23:6789\/0"}]}}
> 
> Mon04:
> [root@mon04 ~]# ceph daemon mon.mon04 mon_status
> { "name": "mon04",
>   "rank": 3,
>   "state": "probing",
>   "election_epoch": 0,
>   "quorum": [],
>   "outside_quorum": [
> "mon01",
> "mon04"],
>   "extra_probe_peers": [
> "10.10.200.21:6789\/0",
> "10.10.200.22:6789\/0",
> "10.10.200.23:6789\/0",
> "10.10.200.24:6789\/0"],
>   "sync_provider": [],
>   "monmap": { "epoch": 20,
>   "fsid": "1aa318df-c6eb-47c5-a80e-2e9e43160e4e",
>   "modified": "2019-06-06 16:45:47.489558",
>   "created": "0.00",
>   "mons": [
> { "rank": 0,
>  "name": "mon01",
>   "addr": "10.10.200.20:6789\/0"},
> { "rank": 1,
>   "name": "mon02",
>   "addr": "10.10.200.21:6789\/0"},
> { "rank": 2,
>   "name": "mon03",
>   "addr": "10.10.200.22:6789\/0"},
> { "rank": 3,
>   "name": "mon04",
>   "addr": "10.10.200.23:6789\/0"}]}}
> 
> Mon05:
> [root@mon05 ~]# ceph daemon mon.mon05 mon_status
> { "name": "mon05",
>   "rank": -1,
>   "state": "probing",
>   "election_epoch": 0,
>   "quorum": [],
>   "outside_quorum": [
> "mon01",
> "mon04"],
>   "extra_probe_peers": [
> "10.10.200.21:6789\/0",
> "10.10.200.22:6789\/0",
> "10.10.200.23:6789\/0"],
>   "sync_provider": [],
>   "monmap": { "epoch": 20,
>   "fsid": "1aa318df-c6eb-47c5-a80e-2e9e43160e4e",
>   "modified": "2019-06-06 16:45:47.489558",
>   "created": "0.00",
>   "mons": [
> { "rank": 0,
>   "name": "mon01",
>   "addr": "10.10.200.20:6789\/0"},
> { "rank": 1,
>   "name": "mon02",
>   "addr": "10.10.200.21:6789\/0"},
> { "rank": 2,
>   "name": "mon03",
>   "addr": "10.10.200.22:6789\/0"},
> { "rank": 3,
>   "name": "mon04",
>   "addr": "10.10.200.23:6789\/0"}]}}
> 
> 
> mon02 and mon03 do not start ceph-mon.

Why are mon02 and mon03 not starting? That would be something important to
know. You should be able to find that in the logs.
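
A quick way to find out (paths assumed to be the defaults; Giant on CentOS 7 may
still be using sysvinit rather than systemd):

  tail -n 200 /var/log/ceph/ceph-mon.mon02.log
  ceph-mon -i mon02 -d    # run the mon in the foreground with debug output to the terminal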


> How do I tell mon01 and mon04 that mon05 is a new partner in the cluster?

If, instead of injecting a monmap, you just specify the fsid and mon_host, I
think mon05 will find its way into the quorum, and that should address the
issue.

If not, then you can always extract the monmap from the existing monitors, add
mon05 to it, and inject it.
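
A minimal sketch of both approaches; the 10.10.200.24:6789 address for mon05 is
assumed from the extra_probe_peers shown above, so adjust names and addresses to
your setup:

  # Option 1: let mon05 probe its way in -- ceph.conf on mon05
  [global]
  fsid = 1aa318df-c6eb-47c5-a80e-2e9e43160e4e
  mon_host = 10.10.200.20,10.10.200.21,10.10.200.22,10.10.200.23

  # Option 2: edit the monmap by hand (run with the monitors stopped)
  ceph-mon -i mon01 --extract-monmap /tmp/monmap
  monmaptool --add mon05 10.10.200.24:6789 /tmp/monmap
  ceph-mon -i mon05 --inject-monmap /tmp/monmap
  # if needed, inject the same updated map into mon01 and mon04 as well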

Re: [ceph-users] Can I limit OSD memory usage?

2019-06-08 Thread Igor Podlesny
On Sat, 8 Jun 2019 at 04:35, Sergei Genchev  wrote:
>
>  Hi,
>  My OSD processes are constantly getting killed by OOM killer. My
> cluster has 5 servers, each with 18 spinning disks, running 18 OSD
> daemons in 48GB of memory.
>  I was trying to limit OSD cache, according to
> http://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/
>
> [osd]
> bluestore_cache_size_ssd = 1G
> bluestore_cache_size_hdd = 768M
> Yet, my OSDs are using way more memory than that. I have seen as high as 3.2G

Well, it has been widely known for a long time that 640 KB isn't enough for everyone. ;)

Ceph's OSD RAM consumption largely depends on the backing store capacity.
Check out the official recommendations:
http://docs.ceph.com/docs/luminous/start/hardware-recommendations/
-- "ceph-osd: RAM~1GB for 1TB of storage per daemon".

You didn't specify the capacity of the disks, BTW. 2-3 TB each?
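
(For reference, using the numbers quoted above: 48 GB across 18 OSD daemons is
roughly 2.7 GB per OSD, which by that rule of thumb only covers about 2-3 TB of
storage per OSD before any recovery overhead.)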

[...]
>  Is there any way for me to limit how much memory does OSD use?

Try adding this to the same [osd] section:

osd_memory_target = ...Amount_in_Bytes...

Don't set it to 640 KB though. ;-) The minimum recommendations still make
sense, so reduce with caution.
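
A minimal sketch, assuming you want to cap each OSD at about 2 GiB so that 18
daemons fit into 48 GB with some headroom (the exact figure is a judgement call
for your hardware):

  [osd]
  osd_memory_target = 2147483648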

-- 
End of message. Next message?


Re: [ceph-users] radosgw dying

2019-06-08 Thread huang jun
From the error message, I'm inclined to think that 'mon_max_pg_per_osd' was
exceeded. You can check its value; the default is 250, so you can have at most
1500 PGs (250 * 6 OSDs). For replicated pools with size=3 that means 500 PGs
across all pools; you already have 448 PGs, so the next pool can create at most
500 - 448 = 52 PGs.
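
A sketch of how to check and, if appropriate, raise the limit (the centralized
config commands should work on Nautilus, but verify against your release):

  ceph daemon mon.S700028 config get mon_max_pg_per_osd   # current limit on one of your mons
  ceph config set global mon_max_pg_per_osd 300           # hypothetical higher limit

Adding OSDs, or creating the remaining RGW pools with a smaller pg_num, would
also keep you under the limit.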

 wrote on Sat, Jun 8, 2019 at 2:41 PM:
>
> All;
>
> I have a test and demonstration cluster running (3 hosts, MON, MGR, 2x OSD 
> per host), and I'm trying to add a 4th host for gateway purposes.
>
> The radosgw process keeps dying with:
> 2019-06-07 15:59:50.700 7fc4ef273780  0 ceph version 14.2.1 
> (d555a9489eb35f84f2e1ef49b77e19da9d113972) nautilus (stable), process 
> radosgw, pid 17588
> 2019-06-07 15:59:51.358 7fc4ef273780  0 rgw_init_ioctx ERROR: 
> librados::Rados::pool_create returned (34) Numerical result out of range 
> (this can be due to a pool or placement group misconfiguration, e.g. pg_num < 
> pgp_num or mon_max_pg_per_osd exceeded)
> 2019-06-07 15:59:51.396 7fc4ef273780 -1 Couldn't init storage provider (RADOS)
>
> The .rgw.root pool already exists.
>
> ceph status returns:
>   cluster:
> id: 1a8a1693-fa54-4cb3-89d2-7951d4cee6a3
> health: HEALTH_OK
>
>   services:
> mon: 3 daemons, quorum S700028,S700029,S700030 (age 30m)
> mgr: S700028(active, since 47h), standbys: S700030, S700029
> osd: 6 osds: 6 up (since 2d), 6 in (since 3d)
>
>   data:
> pools:   5 pools, 448 pgs
> objects: 12 objects, 1.2 KiB
> usage:   722 GiB used, 65 TiB / 66 TiB avail
> pgs: 448 active+clean
>
> and ceph osd tree returns:
> ID CLASS WEIGHT   TYPE NAME    STATUS REWEIGHT PRI-AFF
> -1   66.17697 root default
> -5   22.05899 host S700029
>  2   hdd 11.02950 osd.2    up  1.0 1.0
>  3   hdd 11.02950 osd.3    up  1.0 1.0
> -7   22.05899 host S700030
>  4   hdd 11.02950 osd.4    up  1.0 1.0
>  5   hdd 11.02950 osd.5    up  1.0 1.0
> -3   22.05899 host s700028
>  0   hdd 11.02950 osd.0    up  1.0 1.0
>  1   hdd 11.02950 osd.1    up  1.0 1.0
>
> Any thoughts on what I'm missing?
>
> Thank you,
>
> Dominic L. Hilsbos, MBA
> Director - Information Technology
> Perform Air International Inc.
> dhils...@performair.com
> www.PerformAir.com
>
>
>



-- 
Thank you!
HuangJun


Re: [ceph-users] Can I limit OSD memory usage?

2019-06-08 Thread huang jun
Were your OSDs OOM-killed while the cluster was doing recovery/backfill, or
just under client I/O?
The config options you mentioned only cover the bluestore cache, and OSD memory
includes many other things, like the pglog, so it's important to know whether
your cluster is doing recovery.
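
A quick way to check, assuming a working ceph CLI on the cluster:

  ceph -s | grep -E 'recover|backfill|degraded'   # any hits here mean recovery/backfill is in progress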

Sergei Genchev wrote on Sat, Jun 8, 2019 at 5:35 AM:
>
>  Hi,
>  My OSD processes are constantly getting killed by OOM killer. My
> cluster has 5 servers, each with 18 spinning disks, running 18 OSD
> daemons in 48GB of memory.
>  I was trying to limit OSD cache, according to
> http://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/
>
> [osd]
> bluestore_cache_size_ssd = 1G
> bluestore_cache_size_hdd = 768M
>
> Yet, my OSDs are using way more memory than that. I have seen as high as 3.2G
>
> KiB Mem : 47877604 total,   310172 free, 45532752 used,  2034680 buff/cache
> KiB Swap:  2097148 total,0 free,  2097148 used.   950224 avail Mem
>
> PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
> COMMAND
>  352516 ceph  20   0 3962504   2.8g   4164 S   2.3  6.1   4:22.98
> ceph-osd
>  350771 ceph  20   0 3668248   2.7g   4724 S   3.0  6.0   3:56.76
> ceph-osd
>  352777 ceph  20   0 3659204   2.7g   4672 S   1.7  5.9   4:10.52
> ceph-osd
>  353578 ceph  20   0 3589484   2.6g   4808 S   4.6  5.8   3:37.54
> ceph-osd
>  352280 ceph  20   0 3577104   2.6g   4704 S   5.9  5.7   3:44.58
> ceph-osd
>  350933 ceph  20   0 3421168   2.5g   4140 S   2.6  5.4   3:38.13
> ceph-osd
>  353678 ceph  20   0 3368664   2.4g   4804 S   4.0  5.3  12:47.12
> ceph-osd
>  350665 ceph  20   0 3364780   2.4g   4716 S   2.6  5.3   4:23.44
> ceph-osd
>  353101 ceph  20   0 3304288   2.4g   4676 S   4.3  5.2   3:16.53
> ceph-osd
>  ...
>
>
>  Is there any way for me to limit how much memory does OSD use?
> Thank you!
>
> ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)



-- 
Thank you!
HuangJun


Re: [ceph-users] Reweight OSD to 0, why doesn't report degraded if UP set under Pool Size

2019-06-08 Thread huang jun
I think the written data will also go to osd.4 in this case.
Because osd.4 is not down, Ceph does not consider the PG to have any OSD down,
and it will replicate the data to all OSDs in the acting/backfill set.
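
A quick way to confirm what a given PG is doing, using the example PG id from
the quoted message below:

  ceph pg map 1.4d      # prints the up set and the acting set for the PG
  ceph pg 1.4d query    # full peering detail for the PG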

Tarek Zegar wrote on Fri, Jun 7, 2019 at 10:37 PM:

> Paul / All
>
> I'm not sure what warning you are referring to; I'm on Nautilus. The
> point I'm getting at is that if you weight out all OSDs on a host in a cluster
> of 3 OSD hosts with 3 OSDs each, crush rule = host, then write to the
> cluster, it *should* imo not just say remapped but undersized / degraded.
>
> See below: 1 out of the 3 OSD hosts has ALL of its OSDs marked out and weight
> = 0. When you write (say using FIO), the PGs *only* have 2 OSDs in them (UP
> set), which is the pool min size. I don't understand why it's not saying
> undersized/degraded; this seems like a bug. Who cares that the Acting Set
> has the 3 original OSDs in it, the actual data is only on 2 OSDs, which is a
> degraded state.
>
> *root@hostadmin:~# ceph -s*
> cluster:
> id: 33d41932-9df2-40ba-8e16-8dedaa4b3ef6
> health: HEALTH_WARN
> application not enabled on 1 pool(s)
>
> services:
> mon: 1 daemons, quorum hostmonitor1 (age 29m)
> mgr: hostmonitor1(active, since 31m)
> osd: 9 osds: 9 up, 6 in; 100 remapped pgs
>
> data:
> pools: 1 pools, 100 pgs
> objects: 520 objects, 2.0 GiB
> usage: 15 GiB used, 75 GiB / 90 GiB avail
> pgs: 520/1560 objects misplaced (33.333%)
> *100 active+clean+remapped*
>
> *root@hostadmin:~# ceph osd tree*
> ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
> -1 0.08817 root default
> -3 0.02939 host hostosd1
> 0 hdd 0.00980 osd.0 up 1.0 1.0
> 3 hdd 0.00980 osd.3 up 1.0 1.0
> 6 hdd 0.00980 osd.6 up 1.0 1.0
> *-5 0.02939 host hostosd2*
> * 1 hdd 0.00980 osd.1 up 0 1.0*
> * 4 hdd 0.00980 osd.4 up 0 1.0*
> * 7 hdd 0.00980 osd.7 up 0 1.0*
> -7 0.02939 host hostosd3
> 2 hdd 0.00980 osd.2 up 1.0 1.0
> 5 hdd 0.00980 osd.5 up 1.0 1.0
> 8 hdd 0.00980 osd.8 up 1.0 1.0
>
>
> *root@hostadmin:~# ceph osd df*
> ID CLASS WEIGHT REWEIGHT SIZE RAW USE DATA OMAP META AVAIL %USE VAR PGS
> STATUS
> 0 hdd 0.00980 1.0 10 GiB 1.7 GiB 765 MiB 12 KiB 1024 MiB 8.2 GiB 17.48
> 1.03 34 up
> 3 hdd 0.00980 1.0 10 GiB 1.7 GiB 765 MiB 12 KiB 1024 MiB 8.2 GiB 17.48
> 1.03 36 up
> 6 hdd 0.00980 1.0 10 GiB 1.6 GiB 593 MiB 4 KiB 1024 MiB 8.4 GiB 15.80
> 0.93 30 up
> * 1 hdd 0.00980 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 up*
> * 4 hdd 0.00980 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 0 up*
> * 7 hdd 0.00980 0 0 B 0 B 0 B 0 B 0 B 0 B 0 0 100 up*
> 2 hdd 0.00980 1.0 10 GiB 1.5 GiB 525 MiB 8 KiB 1024 MiB 8.5 GiB 15.13
> 0.89 20 up
> 5 hdd 0.00980 1.0 10 GiB 1.9 GiB 941 MiB 4 KiB 1024 MiB 8.1 GiB 19.20
> 1.13 43 up
> 8 hdd 0.00980 1.0 10 GiB 1.6 GiB 657 MiB 8 KiB 1024 MiB 8.4 GiB 16.42
> 0.97 37 up
> TOTAL 90 GiB 15 GiB 6.2 GiB 61 KiB 9.0 GiB 75 GiB 16.92
> MIN/MAX VAR: 0.89/1.13 STDDEV: 1.32
> Tarek Zegar
> Senior SDS Engineer
> Email *tze...@us.ibm.com* 
> Mobile *630.974.7172*
>
>
>
>
>
> From: Paul Emmerich 
> To: Tarek Zegar 
> Cc: Ceph Users 
> Date: 06/07/2019 05:25 AM
> Subject: [EXTERNAL] Re: [ceph-users] Reweight OSD to 0, why doesn't
> report degraded if UP set under Pool Size
> --
>
>
>
> remapped no longer triggers a health warning in nautilus.
>
> Your data is still there, it's just on the wrong OSD if that OSD is still
> up and running.
>
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at *https://croit.io*
> 
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> *www.croit.io* 
> Tel: +49 89 1896585 90
>
>
> On Thu, Jun 6, 2019 at 10:48 PM Tarek Zegar <*tze...@us.ibm.com*
> > wrote:
>
>For testing purposes I set a bunch of OSD to 0 weight, this correctly
>forces Ceph to not use said OSD. I took enough out such that the UP set
>only had Pool min size # of OSD (i.e 2 OSD).
>
>Two Questions:
>1. Why doesn't the acting set eventually match the UP set and simply
>point to [6,5] only
>2. Why are none of the PGs marked as undersized and degraded? The data
>is only hosted on 2 OSD rather then Pool size (3), I would expect a
>undersized warning and degraded for PG with data?
>
>Example PG:
>PG 1.4d active+clean+remapped UP= [6,5] Acting = [6,5,4]
>
>OSD Tree:
>ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
>-1 0.08817 root default
>-3 0.02939 host hostosd1
>0 hdd 0.00980 osd.0 up 1.0 1.0
>3 hdd 0.00980 osd.3 up 1.0 1.0
>6 hdd 0.00980 osd.6 up 1.0 1.0
>-5 0.02939 host hostosd2
>1 hdd 0.00980 osd.1 up 0 1.0
>4 hdd 0.00980 osd.4 up 0 1.0

Re: [ceph-users] balancer module makes OSD distribution worse

2019-06-08 Thread huang jun
What does your 'ceph osd df tree' output show? Do the OSDs have the expected number of PGs?

Josh Haft wrote on Fri, Jun 7, 2019 at 9:23 PM:
>
> 95% of usage is CephFS. Remaining is split between RGW and RBD.
>
> On Wed, Jun 5, 2019 at 3:05 PM Gregory Farnum  wrote:
> >
> > I think the mimic balancer doesn't include omap data when trying to
> > balance the cluster. (Because it doesn't get usable omap stats from
> > the cluster anyway; in Nautilus I think it does.) Are you using RGW or
> > CephFS?
> > -Greg
> >
> > On Wed, Jun 5, 2019 at 1:01 PM Josh Haft  wrote:
> > >
> > > Hi everyone,
> > >
> > > On my 13.2.5 cluster, I recently enabled the ceph balancer module in
> > > crush-compat mode. A couple manual 'eval' and 'execute' runs showed
> > > the score improving, so I set the following and enabled the auto
> > > balancer.
> > >
> > > mgr/balancer/crush_compat_metrics:bytes # from
> > > https://github.com/ceph/ceph/pull/20665
> > > mgr/balancer/max_misplaced:0.01
> > > mgr/balancer/mode:crush-compat
> > >
> > > Log messages from the mgr showed lower scores with each iteration, so
> > > I thought things were moving in the right direction.
> > >
> > > Initially my highest-utilized OSD was at 79% and MAXVAR was 1.17. I
> > > let the balancer do its thing for 5 days, at which point my highest
> > > utilized OSD was just over 90% and MAXVAR was about 1.28.
> > >
> > > I do have pretty low PG-per-OSD counts (average of about 60 - that's
> > > next on my list), but I explicitly asked the balancer to use the bytes
> > > metric. Was I just being impatient? Is it expected that usage would go
> > > up overall for a time before starting to trend downward? Is my low PG
> > > count affecting this somehow? I would have expected things to move in
> > > the opposite direction pretty quickly as they do with 'ceph osd
> > > reweight-by-utilization'.
> > >
> > > Thoughts?
> > >
> > > Regards,
> > > Josh



-- 
Thank you!
HuangJun


Re: [ceph-users] balancer module makes OSD distribution worse

2019-06-08 Thread Igor Podlesny
On Thu, 6 Jun 2019 at 03:01, Josh Haft  wrote:
>
> Hi everyone,
>
> On my 13.2.5 cluster, I recently enabled the ceph balancer module in
> crush-compat mode.

Why did you choose compat mode? Don't you want to try another one instead?
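
If you do want to try it, a minimal sketch of switching to upmap mode (this
assumes all clients are Luminous or newer, which the upmap balancer requires):

  ceph balancer off
  ceph osd set-require-min-compat-client luminous
  ceph balancer mode upmap
  ceph balancer on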

-- 
End of message. Next message?


Re: [ceph-users] Can I limit OSD memory usage?

2019-06-08 Thread Paul Emmerich
On Fri, Jun 7, 2019 at 11:35 PM Sergei Genchev  wrote:

>  Hi,
>  My OSD processes are constantly getting killed by OOM killer. My
> cluster has 5 servers, each with 18 spinning disks, running 18 OSD
> daemons in 48GB of memory.
>  I was trying to limit OSD cache, according to
> http://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/
>
> [osd]
> bluestore_cache_size_ssd = 1G
> bluestore_cache_size_hdd = 768M
>

These options are no longer used; set osd_memory_target (default: 4 GB)
instead.
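
For example, on Mimic this can also be applied through the centralized config; a
sketch with a hypothetical ~2.5 GiB per OSD for 18 OSDs in 48 GB (running OSDs
should pick it up at runtime, but verify on your release):

  ceph config set osd osd_memory_target 2684354560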

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


>
> Yet, my OSDs are using way more memory than that. I have seen as high as
> 3.2G
>
> KiB Mem : 47877604 total,   310172 free, 45532752 used,  2034680 buff/cache
> KiB Swap:  2097148 total,0 free,  2097148 used.   950224 avail Mem
>
> PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
> COMMAND
>  352516 ceph  20   0 3962504   2.8g   4164 S   2.3  6.1   4:22.98
> ceph-osd
>  350771 ceph  20   0 3668248   2.7g   4724 S   3.0  6.0   3:56.76
> ceph-osd
>  352777 ceph  20   0 3659204   2.7g   4672 S   1.7  5.9   4:10.52
> ceph-osd
>  353578 ceph  20   0 3589484   2.6g   4808 S   4.6  5.8   3:37.54
> ceph-osd
>  352280 ceph  20   0 3577104   2.6g   4704 S   5.9  5.7   3:44.58
> ceph-osd
>  350933 ceph  20   0 3421168   2.5g   4140 S   2.6  5.4   3:38.13
> ceph-osd
>  353678 ceph  20   0 3368664   2.4g   4804 S   4.0  5.3  12:47.12
> ceph-osd
>  350665 ceph  20   0 3364780   2.4g   4716 S   2.6  5.3   4:23.44
> ceph-osd
>  353101 ceph  20   0 3304288   2.4g   4676 S   4.3  5.2   3:16.53
> ceph-osd
>  ...
>
>
>  Is there any way for me to limit how much memory does OSD use?
> Thank you!
>
> ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic
> (stable)
>


Re: [ceph-users] Can I limit OSD memory usage?

2019-06-08 Thread Sergei Genchev
@Paul Emmerich Thank you very much! That did the trick!


On Sat, Jun 8, 2019 at 4:59 AM huang jun  wrote:
>
> Were your OSDs OOM-killed while the cluster was doing recovery/backfill, or
> just under client I/O?
> The config options you mentioned only cover the bluestore cache, and OSD
> memory includes many other things, like the pglog, so it's important to know
> whether your cluster is doing recovery.
>
Sergei Genchev wrote on Sat, Jun 8, 2019 at 5:35 AM:
> >
> >  Hi,
> >  My OSD processes are constantly getting killed by OOM killer. My
> > cluster has 5 servers, each with 18 spinning disks, running 18 OSD
> > daemons in 48GB of memory.
> >  I was trying to limit OSD cache, according to
> > http://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/
> >
> > [osd]
> > bluestore_cache_size_ssd = 1G
> > bluestore_cache_size_hdd = 768M
> >
> > Yet, my OSDs are using way more memory than that. I have seen as high as 
> > 3.2G
> >
> > KiB Mem : 47877604 total,   310172 free, 45532752 used,  2034680 buff/cache
> > KiB Swap:  2097148 total,0 free,  2097148 used.   950224 avail Mem
> >
> > PID USER  PR  NIVIRTRESSHR S  %CPU %MEM TIME+
> > COMMAND
> >  352516 ceph  20   0 3962504   2.8g   4164 S   2.3  6.1   4:22.98
> > ceph-osd
> >  350771 ceph  20   0 3668248   2.7g   4724 S   3.0  6.0   3:56.76
> > ceph-osd
> >  352777 ceph  20   0 3659204   2.7g   4672 S   1.7  5.9   4:10.52
> > ceph-osd
> >  353578 ceph  20   0 3589484   2.6g   4808 S   4.6  5.8   3:37.54
> > ceph-osd
> >  352280 ceph  20   0 3577104   2.6g   4704 S   5.9  5.7   3:44.58
> > ceph-osd
> >  350933 ceph  20   0 3421168   2.5g   4140 S   2.6  5.4   3:38.13
> > ceph-osd
> >  353678 ceph  20   0 3368664   2.4g   4804 S   4.0  5.3  12:47.12
> > ceph-osd
> >  350665 ceph  20   0 3364780   2.4g   4716 S   2.6  5.3   4:23.44
> > ceph-osd
> >  353101 ceph  20   0 3304288   2.4g   4676 S   4.3  5.2   3:16.53
> > ceph-osd
> >  ...
> >
> >
> >  Is there any way for me to limit how much memory does OSD use?
> > Thank you!
> >
> > ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic 
> > (stable)
>
>
>
> --
> Thank you!
> HuangJun


[ceph-users] OSD caching on EC-pools (heavy cross OSD communication on cached reads)

2019-06-08 Thread jesper
Hi.

I just changed some of my data on CephFS to go to the EC pool instead
of the 3x replicated pool. The data is "write rare / read heavy" data
being served to an HPC cluster.

To my surprise it looks like the OSD memory caching is done at the
"split object level", not at the "assembled object level". As a
consequence, even though the dataset is fully cached in memory, reads
still generate very "heavy" cross-OSD network traffic to reassemble
the objects.

Since (as far as I understand) no changes can go to the underlying
object without going through the primary PG, caching could be done
more effectively at that level.

The caching on the 3x replicated pool does not retrieve all 3 copies to compare
and verify on a read request (or at least I cannot see any network
traffic suggesting that it does).

Is the above configurable? Or would that be a feature/performance request?

Jesper
