[ceph-users] Re: Hardware for new OSD nodes.
Eneko and all,

Regarding my current BlueFS spillover issues, I've just noticed in https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/ that it says:

   "If there is only a small amount of fast storage available (e.g., less than a gigabyte), we recommend using it as a WAL device. If there is more, provisioning a DB device makes more sense. The BlueStore journal will always be placed on the fastest device available, so using a DB device will provide the same benefit that the WAL device would while /also/ allowing additional metadata to be stored there (if it will fit)."

This makes me wonder if I shouldn't just move my DBs onto the HDDs and switch to WAL on the NVMe partitions. Does anybody have any thoughts on this?

BTW, I don't think I have a WAL set up, but I'd really like to check on both the WAL and journal settings to see if I can make any improvements. I also have 150GB left on my mirrored boot drive. I could un-mirror part of this and get 300GB of SATA SSD. Thoughts?

-Dave

Dave Hall
Binghamton University
kdh...@binghamton.edu

On 10/23/2020 6:00 AM, Eneko Lacunza wrote:
> Hi Dave,
>
> El 22/10/20 a las 19:43, Dave Hall escribió:
>>> El 22/10/20 a las 16:48, Dave Hall escribió:
>>>> (BTW, Nautilus 14.2.7 on Debian, non-container.)
>>>>
>>>> We're about to purchase more OSD nodes for our cluster, but I have a
>>>> couple questions about hardware choices. Our original nodes were 8 x
>>>> 12TB SAS drives and a 1.6TB Samsung NVMe card for WAL, DB, etc. We
>>>> chose the NVMe card for performance since it has an 8-lane PCIe
>>>> interface. However, we're currently seeing BlueFS spillovers.
>>>>
>>>> The Tyan chassis we are considering has the option of 4 x U.2 NVMe
>>>> bays - each with 4 PCIe lanes (and 8 SAS bays). It has occurred to me
>>>> that I might stripe 4 x 1TB NVMe drives together to get much more
>>>> space for WAL/DB and a net performance of 16 PCIe lanes. Any thoughts
>>>> on this approach?
>>>
>>> Don't stripe them; if one NVMe fails you'll lose all OSDs. Just use 1
>>> NVMe drive for 2 SAS drives and provision 300GB for WAL/DB for each
>>> OSD (see related threads on this mailing list about why that exact
>>> size). This way, if an NVMe fails, you'll only lose 2 OSDs.
>>
>> I was under the impression that everything that BlueStore puts on the
>> SSD/NVMe could be reconstructed from information on the OSD. Am I
>> mistaken about this? If so, my single 1.6TB NVMe card is equally
>> vulnerable.
>
> I don't think so; that info only exists on that partition, as was the
> case with the filestore journal. Your single 1.6TB NVMe is vulnerable, yes.
>
>>> Also, what size of WAL/DB partitions do you have now, and what
>>> spillover size?
>>
>> I recently posted another question to the list on this topic, since I
>> now have spillover on 7 of 24 OSDs. Since the data layout on the NVMe
>> for BlueStore is not traditional, I've never quite figured out how to
>> get this information. The current partition size is 1.6TB / 12, since
>> we had the possibility to add four more drives to each node. How that
>> was divided between WAL, DB, etc. is something I'd like to be able to
>> understand. However, we're not going to add the extra 4 drives, so
>> expanding the LVM partitions is now a possibility.
>
> Can you paste the warning message? It shows the spillover size. What
> size are the partitions on the NVMe disk (lsblk)?
>
> Cheers
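For anyone trying to answer the "how is the NVMe actually divided up" question, a rough way to see the DB/WAL layout and how much has spilled onto the HDD (assuming Nautilus; osd.7 is only a placeholder id, and 'ceph daemon' has to be run on the host carrying that OSD):

  ceph health detail | grep -i spillover
  # per-OSD BlueFS usage: DB size, DB bytes in use, bytes spilled to the slow device
  ceph daemon osd.7 perf dump bluefs | egrep 'db_total_bytes|db_used_bytes|slow_used_bytes|wal_total_bytes'
  # which partitions/LVs the DB and WAL actually live on
  ceph osd metadata 7 | grep -E 'bluefs_db|bluefs_wal|bluestore_bdev'
  # partition sizes on the NVMe, as Eneko asked
  lsblk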
[ceph-users] Ceph cluster recovering status
Hi,

My cluster crashed when one of my DCs went down. 'ceph -s' doesn't show me the current working status, and nothing has changed for a long time. How can I see what Ceph is really doing?

  cluster:
    health: HEALTH_ERR
            mons fond-beagle,guided-tuna are using a lot of disk space
            1/3 mons down, quorum fond-beagle,guided-tuna
            18/404368 objects unfound (0.004%)
            Reduced data availability: 235 pgs inactive, 72 pgs down, 9 pgs incomplete
            Possible data damage: 3 pgs recovery_unfound
            Degraded data redundancy: 306574/2607020 objects degraded (11.760%), 10 pgs degraded, 10 pgs undersized
            2 pgs not deep-scrubbed in time
            32408 slow ops, oldest one blocked for 62348 sec, daemons [osd.0,osd.10,osd.11,osd.13,osd.14,osd.15,osd.16,osd.17,osd.18,osd.19]... have slow ops.

  services:
    mon: 3 daemons, quorum fond-beagle,guided-tuna (age 31m), out of quorum: alive-lynx
    mgr: fond-beagle(active, since 31m)
    osd: 52 osds: 28 up (since 30m), 28 in (since 11h); 3 remapped pgs

  data:
    pools:   7 pools, 2305 pgs
    objects: 404.37k objects, 1.7 TiB
    usage:   2.7 TiB used, 22 TiB / 24 TiB avail
    pgs:     6.681% pgs unknown
             3.514% pgs not active
             306574/2607020 objects degraded (11.760%)
             18/404368 objects unfound (0.004%)
             2060 active+clean
             154  unknown
             72   down
             9    incomplete
             7    active+undersized+degraded
             3    active+recovery_unfound+undersized+degraded+remapped
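For reference, a few commands that usually show what the cluster is actually doing with inactive/down/unfound PGs (the PG id 2.1a below is only a placeholder - take real ids from 'ceph health detail'):

  ceph -w                       # live stream of cluster/recovery events
  ceph health detail           # lists the affected PG ids
  ceph pg dump_stuck inactive  # PGs stuck inactive/down/incomplete
  ceph pg 2.1a query           # why a specific PG is down or incomplete
  ceph pg 2.1a list_unfound    # which objects are unfound in that PG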
[ceph-users] Re: Ceph not showing full capacity
Yes, there is an imbalance in the PGs assigned to the OSDs.

`ceph osd df` output snip:

ID CLASS WEIGHT  REWEIGHT SIZE    RAW USE DATA    OMAP    META    AVAIL   %USE  VAR  PGS STATUS
 0   hdd 5.45799      1.0 5.5 TiB 3.6 TiB 3.6 TiB 9.7 MiB 4.6 GiB 1.9 TiB 65.94 1.31  13     up
 1   hdd 5.45799      1.0 5.5 TiB 1.0 TiB 1.0 TiB 4.4 MiB 1.3 GiB 4.4 TiB 18.87 0.38   9     up
 2   hdd 5.45799      1.0 5.5 TiB 1.5 TiB 1.5 TiB 4.0 MiB 1.9 GiB 3.9 TiB 28.30 0.56  10     up
 3   hdd 5.45799      1.0 5.5 TiB 2.1 TiB 2.1 TiB 7.7 MiB 2.7 GiB 3.4 TiB 37.70 0.75  12     up
 4   hdd 5.45799      1.0 5.5 TiB 4.1 TiB 4.1 TiB 5.8 MiB 5.2 GiB 1.3 TiB 75.27 1.50  20     up
 5   hdd 5.45799      1.0 5.5 TiB 5.1 TiB 5.1 TiB 5.9 MiB 6.7 GiB 317 GiB 94.32 1.88  18     up
 6   hdd 5.45799      1.0 5.5 TiB 1.5 TiB 1.5 TiB 5.2 MiB 2.0 GiB 3.9 TiB 28.32 0.56   9     up
MIN/MAX VAR: 0.19/1.88  STDDEV: 22.13

On Sun, Oct 25, 2020 at 12:08 AM Stefan Kooman wrote:
> On 2020-10-24 14:53, Amudhan P wrote:
> > Hi,
> >
> > I have created a test Ceph cluster with Ceph Octopus using cephadm.
> >
> > Cluster total RAW disk capacity is 262 TB but it's allowing to use of only
> > 132TB.
> > I have not set quota for any of the pool. what could be the issue?
>
> Unbalance? What does ceph osd df show? How large is the standard deviation?
>
> Gr. Stefan
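If the imbalance is what's eating the capacity, one option (assuming all clients are Luminous or newer, which an Octopus cluster deployed with cephadm normally satisfies) is to let the upmap balancer even out the PG counts, roughly:

  ceph balancer mode upmap
  ceph balancer on
  ceph balancer status
  ceph osd df    # re-check VAR / STDDEV after it has had time to move PGs

The "1 pools have too few placement groups" warning in the original 'ceph -s' output points the same way: with only 289 PGs across 48 OSDs the balancer has little to work with, so raising pg_num on the data pool may be needed as well.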
[ceph-users] Re: Ceph not showing full capacity
On 2020-10-24 14:53, Amudhan P wrote:
> Hi,
>
> I have created a test Ceph cluster with Ceph Octopus using cephadm.
>
> Cluster total RAW disk capacity is 262 TB but it's allowing to use of only
> 132TB.
> I have not set quota for any of the pool. what could be the issue?

Unbalance? What does ceph osd df show? How large is the standard deviation?

Gr. Stefan
[ceph-users] Re: Ceph not showing full capacity
Hi Nathan,

Attached is the crushmap output. Let me know if you find anything odd.

On Sat, Oct 24, 2020 at 6:47 PM Nathan Fish wrote:
> Can you post your crush map? Perhaps some OSDs are in the wrong place.
>
> On Sat, Oct 24, 2020 at 8:51 AM Amudhan P wrote:
> >
> > Hi,
> >
> > I have created a test Ceph cluster with Ceph Octopus using cephadm.
> >
> > Cluster total RAW disk capacity is 262 TB but it's allowing to use of only
> > 132TB.
> > I have not set quota for any of the pool. what could be the issue?
> >
> > Output from :-
> > ceph -s
> >   cluster:
> >     id:     f8bc7682-0d11-11eb-a332-0cc47a5ec98a
> >     health: HEALTH_WARN
> >             clock skew detected on mon.strg-node3, mon.strg-node2
> >             2 backfillfull osd(s)
> >             4 pool(s) backfillfull
> >             1 pools have too few placement groups
> >
> >   services:
> >     mon: 3 daemons, quorum strg-node1,strg-node3,strg-node2 (age 7m)
> >     mgr: strg-node3.jtacbn(active, since 7m), standbys: strg-node1.gtlvyv
> >     mds: cephfs-strg:1 {0=cephfs-strg.strg-node1.lhmeea=up:active} 1 up:standby
> >     osd: 48 osds: 48 up (since 7m), 48 in (since 5d)
> >
> >   task status:
> >     scrub status:
> >       mds.cephfs-strg.strg-node1.lhmeea: idle
> >
> >   data:
> >     pools:   4 pools, 289 pgs
> >     objects: 17.29M objects, 66 TiB
> >     usage:   132 TiB used, 130 TiB / 262 TiB avail
> >     pgs:     288 active+clean
> >              1   active+clean+scrubbing+deep
> >
> > mounted volume shows
> > node1:/          67T   66T  910G  99% /mnt/cephfs

{
    "devices": [
        { "id": 0,  "name": "osd.0",  "class": "hdd" },
        { "id": 1,  "name": "osd.1",  "class": "hdd" },
        { "id": 2,  "name": "osd.2",  "class": "hdd" },
        { "id": 3,  "name": "osd.3",  "class": "hdd" },
        { "id": 4,  "name": "osd.4",  "class": "hdd" },
        { "id": 5,  "name": "osd.5",  "class": "hdd" },
        { "id": 6,  "name": "osd.6",  "class": "hdd" },
        { "id": 7,  "name": "osd.7",  "class": "hdd" },
        { "id": 8,  "name": "osd.8",  "class": "hdd" },
        { "id": 9,  "name": "osd.9",  "class": "hdd" },
        { "id": 10, "name": "osd.10", "class": "hdd" },
        { "id": 11, "name": "osd.11", "class": "hdd" },
        { "id": 12, "name": "osd.12", "class": "hdd" },
        { "id": 13, "name": "osd.13", "class": "hdd" },
        { "id": 14, "name": "osd.14", "class": "hdd" },
        { "id": 15, "name": "osd.15", "class": "hdd" },
        { "id": 16, "name": "osd.16", "class": "hdd" },
        { "id": 17, "name": "osd.17", "class": "hdd" },
        { "id": 18, "name": "osd.18", "class": "hdd" },
        { "id": 19, "name": "osd.19", "class": "hdd" },
        { "id": 20, "name": "osd.20", "class": "hdd" },
        { "id": 21, "name": "osd.21", "class": "hdd" },
        { "id": 22, "name": "osd.22", "class": "hdd" },
        { "id": 23, "name": "osd.23", "class": "hdd" },
        { "id": 24, "name": "osd.24", "class": "hdd" },
        { "id": 25, "name": "osd.25", "class": "hdd" },
        { "id": 26, "name": "osd.26", "class": "hdd" },
        { "id": 27, "name": "osd.27", "class": "hdd" },
        { "id": 28, "name": "osd.28", "class": "hdd" },
        { "id": 29, "name": "osd.29", "class": "hdd" },
        { "id": 30, "name": "osd.30", "class": "hdd" },
        {
[ceph-users] Re: Ceph not showing full capacity
Can you post your crush map? Perhaps some OSDs are in the wrong place.

On Sat, Oct 24, 2020 at 8:51 AM Amudhan P wrote:
>
> Hi,
>
> I have created a test Ceph cluster with Ceph Octopus using cephadm.
>
> Cluster total RAW disk capacity is 262 TB but it's allowing to use of only
> 132TB.
> I have not set quota for any of the pool. what could be the issue?
>
> Output from :-
> ceph -s
>   cluster:
>     id:     f8bc7682-0d11-11eb-a332-0cc47a5ec98a
>     health: HEALTH_WARN
>             clock skew detected on mon.strg-node3, mon.strg-node2
>             2 backfillfull osd(s)
>             4 pool(s) backfillfull
>             1 pools have too few placement groups
>
>   services:
>     mon: 3 daemons, quorum strg-node1,strg-node3,strg-node2 (age 7m)
>     mgr: strg-node3.jtacbn(active, since 7m), standbys: strg-node1.gtlvyv
>     mds: cephfs-strg:1 {0=cephfs-strg.strg-node1.lhmeea=up:active} 1 up:standby
>     osd: 48 osds: 48 up (since 7m), 48 in (since 5d)
>
>   task status:
>     scrub status:
>       mds.cephfs-strg.strg-node1.lhmeea: idle
>
>   data:
>     pools:   4 pools, 289 pgs
>     objects: 17.29M objects, 66 TiB
>     usage:   132 TiB used, 130 TiB / 262 TiB avail
>     pgs:     288 active+clean
>              1   active+clean+scrubbing+deep
>
> mounted volume shows
> node1:/          67T   66T  910G  99% /mnt/cephfs
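For anyone following along, the crush map can be pulled out of the cluster either as JSON or decompiled to plain text (the file names below are just examples):

  ceph osd crush tree                    # quick view of the hierarchy
  ceph osd crush dump > crushmap.json    # JSON dump, the format of the attachment above
  ceph osd getcrushmap -o crushmap.bin
  crushtool -d crushmap.bin -o crushmap.txt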
[ceph-users] Ceph not showing full capacity
Hi,

I have created a test Ceph cluster with Ceph Octopus using cephadm.

The cluster's total raw disk capacity is 262 TB, but it's only allowing 132 TB to be used. I have not set a quota for any of the pools. What could be the issue?

Output from ceph -s:

  cluster:
    id:     f8bc7682-0d11-11eb-a332-0cc47a5ec98a
    health: HEALTH_WARN
            clock skew detected on mon.strg-node3, mon.strg-node2
            2 backfillfull osd(s)
            4 pool(s) backfillfull
            1 pools have too few placement groups

  services:
    mon: 3 daemons, quorum strg-node1,strg-node3,strg-node2 (age 7m)
    mgr: strg-node3.jtacbn(active, since 7m), standbys: strg-node1.gtlvyv
    mds: cephfs-strg:1 {0=cephfs-strg.strg-node1.lhmeea=up:active} 1 up:standby
    osd: 48 osds: 48 up (since 7m), 48 in (since 5d)

  task status:
    scrub status:
      mds.cephfs-strg.strg-node1.lhmeea: idle

  data:
    pools:   4 pools, 289 pgs
    objects: 17.29M objects, 66 TiB
    usage:   132 TiB used, 130 TiB / 262 TiB avail
    pgs:     288 active+clean
             1   active+clean+scrubbing+deep

The mounted volume shows:

  node1:/          67T   66T  910G  99% /mnt/cephfs
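A few things worth checking for a gap like this: the pool replication factor (with a replicated size of 2, 262 TB raw would work out to roughly 131 TB usable, close to the figure above) and the per-pool MAX AVAIL, which Ceph estimates from the fullest OSDs rather than from total raw space, so the two backfillfull OSDs will pull it down further. For example (the pool name is a placeholder - use the names from 'ceph osd pool ls'):

  ceph df detail                       # raw usage vs. per-pool USED / MAX AVAIL
  ceph osd pool ls detail              # size / min_size of each pool
  ceph osd pool get cephfs_data size   # replication factor of the data pool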