[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk
On Mon, Jul 27, 2020 at 08:02:23PM +0200, Mariusz Gronczewski wrote: > Hi, > > I've got a problem on Octopus (15.2.3, debian packages) install, bucket > S3 index shows a file: > > s3cmd ls s3://upvid/255/38355 --recursive > 2020-07-27 17:48 50584342 > > s3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4 > > radosgw-admin bi list also shows it > > { > "type": "plain", > "idx": > "255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4", > "entry": { "name": > "255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4", > "instance": "", "ver": { > "pool": 11, > "epoch": 853842 > }, > "locator": "", > "exists": "true", > "meta": { > "category": 1, > "size": 50584342, > "mtime": "2020-07-27T17:48:27.203008Z", > "etag": "2b31cc8ce8b1fb92a5f65034f2d12581-7", > "storage_class": "", > "owner": "filmweb-app", > "owner_display_name": "filmweb app user", > "content_type": "", > "accounted_size": 50584342, > "user_data": "", > "appendable": "false" > }, > "tag": "_3ubjaztglHXfZr05wZCFCPzebQf-ZFP", > "flags": 0, > "pending_map": [], > "versioned_epoch": 0 > } > }, > > but trying to download it via curl (I've set permissions to public0 only gets > me Does the RADOS object for this still exist? try: radosgw-admin object stat --bucket ... --object '255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4' If that doesn't return, then the backing object is gone, and you have a stale index entry that can be cleaned up in most cases with check bucket. For cases where that doesn't fix it, my recommended way to fix it is write a new 0-byte object to the same name, then delete it. -- Robin Hugh Johnson Gentoo Linux: Dev, Infra Lead, Foundation Treasurer E-Mail : robb...@gentoo.org GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85 GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136 signature.asc Description: PGP signature ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
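A rough sketch of the checks and cleanup described above, assuming the bucket "upvid" and the key from the original post (names are only examples, adjust to your setup):

# radosgw-admin object stat --bucket=upvid --object='255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4'
# radosgw-admin bucket check --bucket=upvid --fix --check-objects
$ touch /tmp/empty
$ s3cmd put /tmp/empty s3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4
$ s3cmd del s3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4

If the "object stat" call errors out, the backing RADOS object is gone and "bucket check --fix" should clear the stale index entry; the put/del pair is the fallback mentioned for cases where it does not.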
[ceph-users] Re: Cluster became unresponsive: e5 handle_auth_request failed to assign global_id
Well, port 6800 is not a monitor port as I just looked up, so I wouldn't look there. Can you use ceph command from another mon ? Also maybe the user you use can't access the admin keyring - as far as I remember that lead to infinetely hanging commands on my test cluster (but was Nautilus, don't know if that changed) - or maybe you used to fire the commands from the folder you used to deploy and didn't admin the machine. Just some thoughts. On 27.07.20 16:28, Илья Борисович Волошин wrote: Here are all the active ports on mon1 (with the exception of sshd and ntpd): # netstat -npl Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name tcp0 0 :3300 0.0.0.0:* LISTEN 1582/ceph-mon tcp0 0 :6789 0.0.0.0:* LISTEN 1582/ceph-mon tcp6 0 0 :::9093 :::*LISTEN 908/alertmanager tcp6 0 0 :::9094 :::*LISTEN 908/alertmanager tcp6 0 0 :::9095 :::*LISTEN 896/prometheus tcp6 0 0 :::9100 :::*LISTEN 906/node_exporter tcp6 0 0 :::3000 :::*LISTEN 882/grafana-server udp6 0 0 :::9094 :::* 908/alertmanager I've tried telnet from mon1 host, can connect to 3300 and 6789: # telnet 3300 Trying ... Connected to . Escape character is '^]'. ceph v2 # telnet 6789 Trying ... Connected to . Escape character is '^]'. ceph v027QQ 6800 and 6801 refuse connection: # telnet 6800 Trying ... telnet: Unable to connect to remote host: Connection refused I don't see any errors in the log related to failures to bind... and all CEPH systemd services are running as far as I can tell: # systemctl list-units -a | grep ceph ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@alertmanager.mon1.service loadedactive running Ceph alertmanager.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5 ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@crash.mon1.service loadedactive running Ceph crash.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5 ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@grafana.mon1.service loadedactive running Ceph grafana.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5 ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@mgr.mon1.peevkl.service loadedactive running Ceph mgr.mon1.peevkl for e30397f0-cc32-11ea-8c8e-000c29469cd5 ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@mon.mon1.service loadedactive running Ceph mon.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5 ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@node-exporter.mon1.service loadedactive running Ceph node-exporter.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5 ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@prometheus.mon1.service loadedactive running Ceph prometheus.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5 system-ceph\x2de30397f0\x2dcc32\x2d11ea\x2d8c8e\x2d000c29469cd5.slice loadedactive active system-ceph\x2de30397f0\x2dcc32\x2d11ea\x2d8c8e\x2d000c29469cd5.slice ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5.target loadedactive activeCeph cluster e30397f0-cc32-11ea-8c8e-000c29469cd5 ceph.target loadedactive activeAll Ceph clusters and services Here are currently active docker images: # docker ps CONTAINER IDIMAGECOMMAND CREATED STATUS PORTS NAMES dfd8dbeccf1eceph/ceph:v15"/usr/bin/ceph-mgr -…" 41 minutes ago Up 41 minutes ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-mgr.mon1.peevkl 9452d1db7ffbceph/ceph:v15"/usr/bin/ceph-mon -…" 3 hours ago Up 3 hours ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-mon.mon1 703ec4a43824prom/prometheus:v2.18.1 "/bin/prometheus --c…" 3 hours ago Up 3 hours ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-prometheus.mon1 d816ec5e645fceph/ceph:v15"/usr/bin/ceph-crash…" 3 hours ago Up 3 hours ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-crash.mon1 38d283ba6424ceph/ceph-grafana:latest "/bin/sh -c 
'grafana…" 3 hours ago Up 3 hours ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-grafana.mon1 cc119ec8f09aprom/node-exporter:v0.18.1 "/bin/node_exporter …" 3 hours ago Up 3 hours ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-node-exporter.mon1 aa1d339c4100prom/alertmanager:v0.20.0"/bin/alertmanager -…" 3 hours ago Up 3 hours
[ceph-users] Re: rbd-nbd stuck request
On Mon, Jul 27, 2020 at 3:08 PM Herbert Alexander Faleiros wrote: > > Hi, > > On Fri, Jul 24, 2020 at 12:37:38PM -0400, Jason Dillaman wrote: > > On Fri, Jul 24, 2020 at 10:45 AM Herbert Alexander Faleiros > > wrote: > > > > > > On Fri, Jul 24, 2020 at 07:28:07PM +0500, Alexander E. Patrakov wrote: > > > > On Fri, Jul 24, 2020 at 6:01 PM Herbert Alexander Faleiros > > > > wrote: > > > > > > > > > > Hi, > > > > > > > > > > is there any way to fix it instead a reboot? > > > > > > > > > > [128632.995249] block nbd0: Possible stuck request b14a04af: > > > > > control (read@2097152,4096B). Runtime 9540 seconds > > > > > [128663.718993] block nbd0: Possible stuck request b14a04af: > > > > > control (read@2097152,4096B). Runtime 9570 seconds > > > > > [128694.434774] block nbd0: Possible stuck request b14a04af: > > > > > control (read@2097152,4096B). Runtime 9600 seconds > > > > > [128725.154515] block nbd0: Possible stuck request b14a04af: > > > > > control (read@2097152,4096B). Runtime 9630 seconds > > > > > > > > > > # ceph -v > > > > > ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) > > > > > luminous (stable) > > > > > > > > > > # rbd-nbd list-mapped > > > > > # > > > > > > > > > > # uname -r > > > > > 5.4.52-050452-generic > > > > > > > > Not enough data to troubleshoot this. Is the rbd-nbd process running? > > > > > > > > I.e.: > > > > > > > > # cat /proc/partitions > > > > # ps axww | grep nbd > > > > > > no nbd on /proc/partitions, ps shows only: > > > > > > root 192324 0.0 0.0 0 0 ?I< 07:12 0:00 > > > [knbd0-recv] > > > > You can restart the "rbd-nbd" daemon by running "rbd-nbd map --device > > /dev/nbd0 " > > works, but when I try to unmap, the command block the terminal and > never end (I cannot even kill it). Same with nbd-client. > > Detail: only happens with journaling enabled. Luminous is EOL -- any chance you can reproduce using an Octopus "rbd-nbd" client? > -- > Herbert > -- Jason ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
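For reference, the re-attach workaround mentioned above looks roughly like this (the pool/image name is a placeholder):

# rbd-nbd map --device /dev/nbd0 rbd/myimage
(wait for the stuck I/O to drain)
# rbd-nbd unmap /dev/nbd0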
[ceph-users] Re: rbd-nbd stuck request
Hi, On Fri, Jul 24, 2020 at 12:37:38PM -0400, Jason Dillaman wrote: > On Fri, Jul 24, 2020 at 10:45 AM Herbert Alexander Faleiros > wrote: > > > > On Fri, Jul 24, 2020 at 07:28:07PM +0500, Alexander E. Patrakov wrote: > > > On Fri, Jul 24, 2020 at 6:01 PM Herbert Alexander Faleiros > > > wrote: > > > > > > > > Hi, > > > > > > > > is there any way to fix it instead a reboot? > > > > > > > > [128632.995249] block nbd0: Possible stuck request b14a04af: > > > > control (read@2097152,4096B). Runtime 9540 seconds > > > > [128663.718993] block nbd0: Possible stuck request b14a04af: > > > > control (read@2097152,4096B). Runtime 9570 seconds > > > > [128694.434774] block nbd0: Possible stuck request b14a04af: > > > > control (read@2097152,4096B). Runtime 9600 seconds > > > > [128725.154515] block nbd0: Possible stuck request b14a04af: > > > > control (read@2097152,4096B). Runtime 9630 seconds > > > > > > > > # ceph -v > > > > ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e) > > > > luminous (stable) > > > > > > > > # rbd-nbd list-mapped > > > > # > > > > > > > > # uname -r > > > > 5.4.52-050452-generic > > > > > > Not enough data to troubleshoot this. Is the rbd-nbd process running? > > > > > > I.e.: > > > > > > # cat /proc/partitions > > > # ps axww | grep nbd > > > > no nbd on /proc/partitions, ps shows only: > > > > root 192324 0.0 0.0 0 0 ?I< 07:12 0:00 > > [knbd0-recv] > > You can restart the "rbd-nbd" daemon by running "rbd-nbd map --device > /dev/nbd0 " works, but when I try to unmap, the command block the terminal and never end (I cannot even kill it). Same with nbd-client. Detail: only happens with journaling enabled. -- Herbert ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] NoSuchKey on key that is visible in s3 list/radosgw bk
Hi, I've got a problem on Octopus (15.2.3, debian packages) install, bucket S3 index shows a file: s3cmd ls s3://upvid/255/38355 --recursive 2020-07-27 17:48 50584342 s3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4 radosgw-admin bi list also shows it { "type": "plain", "idx": "255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4", "entry": { "name": "255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4", "instance": "", "ver": { "pool": 11, "epoch": 853842 }, "locator": "", "exists": "true", "meta": { "category": 1, "size": 50584342, "mtime": "2020-07-27T17:48:27.203008Z", "etag": "2b31cc8ce8b1fb92a5f65034f2d12581-7", "storage_class": "", "owner": "filmweb-app", "owner_display_name": "filmweb app user", "content_type": "", "accounted_size": 50584342, "user_data": "", "appendable": "false" }, "tag": "_3ubjaztglHXfZr05wZCFCPzebQf-ZFP", "flags": 0, "pending_map": [], "versioned_epoch": 0 } }, but trying to download it via curl (I've set permissions to public0 only gets me NoSuchKeyupvidtxe716d-005f1f14cb-e478a-pl-war1e478a-pl-war1-pl (the actually nonexisting files shows access denied in same context) same with other tools: $ s3cmd get s3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4 /tmp download: 's3://upvid/255/38355/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4' -> '/tmp/juz_nie_zyjesz_sezon_2___oficjalny_zwiastun___netflix_mp4' [1 of 1] ERROR: S3 error: 404 (NoSuchKey) cluster health is OK Any ideas what is happening here ? -- Mariusz Gronczewski, Administrator Efigence S. A. ul. Wołoska 9a, 02-583 Warszawa T: [+48] 22 380 13 13 NOC: [+48] 22 380 10 20 E: ad...@efigence.com ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: mimic: much more raw used than reported
Hi Igor, thanks for your answer. I was thinking about that, but as far as I understood, to hit this bug actually requires a partial rewrite to happen. However, these are disk images in storage servers with basically static files, many of which very large (15GB). Therefore, I believe, the vast majority of objects is written to only once and should not be affected by the amplification bug. Is there any way to confirm/rule out that/check how much amplification is happening? I'm wondering if I might be observing something else. Since "ceph osd df tree" does report the actual utilization and I have only one pool on these OSDs, there is no problem with accounting allocated storage to a pool. I know its all used by this one pool. I'm more wondering if its not the known amplification but something else (at least partly) that plays a role here. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Igor Fedotov Sent: 27 July 2020 12:54:02 To: Frank Schilder; ceph-users Subject: Re: [ceph-users] mimic: much more raw used than reported Hi Frank, you might be being hit by https://tracker.ceph.com/issues/44213 In short the root causes are significant space overhead due to high bluestore allocation unit (64K) and EC overwrite design. This is fixed for upcoming Pacific release by using 4K alloc unit but it is unlikely to be backported to earlier releases due to its complexity. To say nothing about the need for OSD redeployment. Hence please expect no fix for mimic. And your raw usage reports might still be not that good since mimic lacks per-pool stats collection https://github.com/ceph/ceph/pull/19454. I.e. your actual raw space usage is higher than reported. To estimate proper raw usage one can use bluestore perf counters (namely bluestore_stored and bluestore_allocated). Summing bluestore_allocated over all involved OSDs will give actual RAW usage. Summing bluestore_stored will provide actual data volume after EC processing, i.e. presumably it should be around 158TiB. Thanks, Igor On 7/26/2020 8:43 PM, Frank Schilder wrote: > Dear fellow cephers, > > I observe a wired problem on our mimic-13.2.8 cluster. We have an EC RBD pool > backed by HDDs. These disks are not in any other pool. I noticed that the > total capacity (=USED+MAX AVAIL) reported by "ceph df detail" has shrunk > recently from 300TiB to 200TiB. Part but by no means all of this can be > explained by imbalance of the data distribution. > > When I compare the output of "ceph df detail" and "ceph osd df tree", I find > 69TiB raw capacity used but not accounted for; see calculations below. These > 69TiB raw are equivalent to 20% usable capacity and I really need it back. > Together with the imbalance, we loose about 30% capacity. > > What is using these extra 69TiB and how can I get it back? 
> > > Some findings: > > These are the 5 largest images in the pool, accounting for a total of 97TiB > out of 119TiB usage: > > # rbd du : > NAMEPROVISIONED USED > one-133 25 TiB 14 TiB > NAMEPROVISIONEDUSED > one-153@222 40 TiB 14 TiB > one-153@228 40 TiB 357 GiB > one-153@235 40 TiB 797 GiB > one-153@241 40 TiB 509 GiB > one-153@242 40 TiB 43 GiB > one-153@243 40 TiB 16 MiB > one-153@244 40 TiB 16 MiB > one-153@245 40 TiB 324 MiB > one-153@246 40 TiB 276 MiB > one-153@247 40 TiB 96 MiB > one-153@248 40 TiB 138 GiB > one-153@249 40 TiB 1.8 GiB > one-153@250 40 TiB 0 B > one-153 40 TiB 204 MiB > 40 TiB 16 TiB > NAME PROVISIONEDUSED > one-391@3 40 TiB 432 MiB > one-391@9 40 TiB 26 GiB > one-391@15 40 TiB 90 GiB > one-391@16 40 TiB 0 B > one-391@17 40 TiB 0 B > one-391@18 40 TiB 0 B > one-391@19 40 TiB 0 B > one-391@20 40 TiB 3.5 TiB > one-391@21 40 TiB 5.4 TiB > one-391@22 40 TiB 5.8 TiB > one-391@23 40 TiB 8.4 TiB > one-391@24 40 TiB 1.4 TiB > one-391 40 TiB 2.2 TiB > 40 TiB 27 TiB > NAME PROVISIONEDUSED > one-394@3 70 TiB 1.4 TiB > one-394@9 70 TiB 2.5 TiB > one-394@15 70 TiB 20 GiB > one-394@16 70 TiB 0 B > one-394@17 70 TiB 0 B > one-394@18 70 TiB 0 B > one-394@19 70 TiB 383 GiB > one-394@20 70 TiB 3.3 TiB > one-394@21 70 TiB 5.0 TiB > one-394@22 70 TiB 5.0 TiB > one-394@23 70 TiB 9.0 TiB > one-394@24 70 TiB 1.6 TiB > one-394 70 TiB 2.5 TiB > 70 TiB 31 TiB > NAMEPROVISIONEDUSED > one-434 25 TiB 9.1 TiB > > The large 70TiB images one-391 and one-394 are currently copied to with ca. > 5TiB per day. > > Output of "ceph df detail" with some columns removed: > > NAME ID USED%USED MAX AVAIL OBJECTS >RAW USED > sr-rbd-data-one-hdd 11 119 TiB 58.4584
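For reference, a minimal way to pull the two counters Igor refers to from a single OSD (osd.84 is just one of the OSDs from the "ceph osd df tree" output; run this on the host that carries it):

# ceph daemon osd.84 perf dump | grep -E '"bluestore_(allocated|stored)"'

bluestore_allocated is what the OSD has actually allocated on disk and bluestore_stored is the logical data after EC encoding, so comparing the two per OSD should show how much of the missing 69TiB is allocation overhead.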
[ceph-users] Re: Cluster became unresponsive: e5 handle_auth_request failed to assign global_id
Here are all the active ports on mon1 (with the exception of sshd and ntpd): # netstat -npl Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name tcp0 0 :3300 0.0.0.0:* LISTEN 1582/ceph-mon tcp0 0 :6789 0.0.0.0:* LISTEN 1582/ceph-mon tcp6 0 0 :::9093 :::*LISTEN 908/alertmanager tcp6 0 0 :::9094 :::*LISTEN 908/alertmanager tcp6 0 0 :::9095 :::*LISTEN 896/prometheus tcp6 0 0 :::9100 :::*LISTEN 906/node_exporter tcp6 0 0 :::3000 :::*LISTEN 882/grafana-server udp6 0 0 :::9094 :::* 908/alertmanager I've tried telnet from mon1 host, can connect to 3300 and 6789: # telnet 3300 Trying ... Connected to . Escape character is '^]'. ceph v2 # telnet 6789 Trying ... Connected to . Escape character is '^]'. ceph v027QQ 6800 and 6801 refuse connection: # telnet 6800 Trying ... telnet: Unable to connect to remote host: Connection refused I don't see any errors in the log related to failures to bind... and all CEPH systemd services are running as far as I can tell: # systemctl list-units -a | grep ceph ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@alertmanager.mon1.service loadedactive running Ceph alertmanager.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5 ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@crash.mon1.service loadedactive running Ceph crash.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5 ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@grafana.mon1.service loadedactive running Ceph grafana.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5 ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@mgr.mon1.peevkl.service loadedactive running Ceph mgr.mon1.peevkl for e30397f0-cc32-11ea-8c8e-000c29469cd5 ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@mon.mon1.service loadedactive running Ceph mon.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5 ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@node-exporter.mon1.service loadedactive running Ceph node-exporter.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5 ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5@prometheus.mon1.service loadedactive running Ceph prometheus.mon1 for e30397f0-cc32-11ea-8c8e-000c29469cd5 system-ceph\x2de30397f0\x2dcc32\x2d11ea\x2d8c8e\x2d000c29469cd5.slice loadedactive active system-ceph\x2de30397f0\x2dcc32\x2d11ea\x2d8c8e\x2d000c29469cd5.slice ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5.target loadedactive activeCeph cluster e30397f0-cc32-11ea-8c8e-000c29469cd5 ceph.target loadedactive activeAll Ceph clusters and services Here are currently active docker images: # docker ps CONTAINER IDIMAGECOMMAND CREATED STATUS PORTS NAMES dfd8dbeccf1eceph/ceph:v15"/usr/bin/ceph-mgr -…" 41 minutes ago Up 41 minutes ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-mgr.mon1.peevkl 9452d1db7ffbceph/ceph:v15"/usr/bin/ceph-mon -…" 3 hours ago Up 3 hours ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-mon.mon1 703ec4a43824prom/prometheus:v2.18.1 "/bin/prometheus --c…" 3 hours ago Up 3 hours ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-prometheus.mon1 d816ec5e645fceph/ceph:v15"/usr/bin/ceph-crash…" 3 hours ago Up 3 hours ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-crash.mon1 38d283ba6424ceph/ceph-grafana:latest "/bin/sh -c 'grafana…" 3 hours ago Up 3 hours ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-grafana.mon1 cc119ec8f09aprom/node-exporter:v0.18.1 "/bin/node_exporter …" 3 hours ago Up 3 hours ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-node-exporter.mon1 aa1d339c4100prom/alertmanager:v0.20.0"/bin/alertmanager -…" 3 hours ago Up 3 hours ceph-e30397f0-cc32-11ea-8c8e-000c29469cd5-alertmanager.mon1 iptables are active, I tried setting all chain policies to ACCEPT (didn't help), the rules are as such: 0 0 CEPH tcp 
-- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:6789 5060 303K CEPH tcp -- * * 0.0.0.0/0 0.0.0.0/0 multiport dports 6800:7300 Chain CEPH includes addresses for monitors and OSDs. Mon, 27 Jul 2020 at 17:07, Dino Godor : > Hi, > > have you tried to locally connect to the ports with netcat (or telnet)? > > Is the
[ceph-users] Re: Fwd: BlueFS assertion ceph_assert(h->file->fnode.ino != 1)
Hi Alexei, just left a comment in the ticket... Thanks, Igor On 7/25/2020 3:31 PM, Aleksei Zakharov wrote: Hi all, I wonder if someone else has faced the issue described on the tracker: https://tracker.ceph.com/issues/45519 We thought that this problem was caused by high OSD fragmentation, until today. For now even OSDs with a fragmentation rating < .3 are affected. We don't use a separate DB/WAL partition in this setup, and log lines like these appear before the failure:
2020-07-25 11:08:22.961 7f6f489d5700 1 bluefs _allocate failed to allocate 0x33dd4c5 on bdev 1, free 0x2bc; fallback to bdev 2
2020-07-25 11:08:22.961 7f6f489d5700 1 bluefs _allocate unable to allocate 0x33dd4c5 on bdev 2, free 0x; fallback to slow device expander
These lines look suspicious to us. We use 4KiB bluefs and bluestore block sizes and store objects of ~1KiB size, and it looks like this makes the issue reproduce much more frequently. But, as I can see on the tracker / telegram channels, different people run into it from time to time, for example: https://paste.ubuntu.com/p/GDCXDrnrtX/ (telegram link https://t.me/ceph_users/376) Was anyone able to identify the root cause and/or find a workaround for it? BTW, ceph would be a nice small-objects storage showing 300-500usec latency if not for this issue and this one: https://tracker.ceph.com/issues/45765 -- Regards, Aleksei Zakharov ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Cluster became unresponsive: e5 handle_auth_request failed to assign global_id
Hi, have you tried to locally connect to the ports with netcat (or telnet)? Is the process listening ? (something like netstat -4ln or the current equivalent thereof) Is the old (new) Firewall maybe still running ? On 27.07.20 16:00, Илья Борисович Волошин wrote: Hello, I've created an Octopus 15.2.4 cluster with 3 monitors and 3 OSDs (6 hosts in total, all ESXi VMs). It lived through a couple of reboots without problem, then I've reconfigured the main host a bit: set iptables-legacy as current option in update-alternatives (this is a Debian10 system), applied a basic ruleset of iptables and restarted docker. After that the cluster became unresponsive (any ceph command hangs indefinitely). I can use admin socket to manipulate config though. Setting debug_ms to 5 I see this in the logs (timestamps cut for readability): 7f4096f41700 5 --2- [v2::3300/0,v1::6789/0] >> [v2::3300/0,v1::6789/0] conn(0x55c21b975800 0x55c21ab45180 unknown :-1 s=START_CONNECT pgs=0 cs=0 l=0 rx=0 tx= 0).send_message enqueueing message m=0x55c21bd84a00 type=67 mon_probe(probe e30397f0-cc32-11ea-8c8e-000c29469cd5 name mon1 mon_release octopus) v7 7f4098744700 1 -- >> [v2::6800/561959008,v1::6801/561959008] conn(0x55c21b974400 msgr2=0x55c21ab45600 unknown :-1 s=STATE_CONNECTING_RE l=0).process reconnect failed to v2:81.200.2 .152:6800/561959008 7f4098744700 2 -- >> [v2::6800/561959008,v1::6801/561959008] conn(0x55c21b974400 msgr2=0x55c21ab45600 unknown :-1 s=STATE_CONNECTING_RE l=0).process connection refused! and this: 7f4098744700 2 --2- [v2::3300/0,v1::6789/0] >> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0 l=1 rx=0 tx=0)._fault on lossy channel, failing 7f4098744700 1 --2- [v2::3300/0,v1::6789/0] >> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0 l=1 rx=0 tx=0).stop 7f4098744700 5 --2- [v2::3300/0,v1::6789/0] >> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0 l=1 rx=0 tx=0).reset_recv_state 7f4098744700 5 --2- [v2::3300/0,v1::6789/0] >> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0 l=1 rx=0 tx=0).reset_security 7f409373a700 1 --2- [v2::3300/0,v1::6789/0] >> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).accept 7f4098744700 1 --2- [v2::3300/0,v1::6789/0] >> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0 7f4098744700 5 --2- [v2::3300/0,v1::6789/0] >> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0).handle_hello received hello: peer_type=8 peer_addr_for_me=v2::3300/0 7f4098744700 5 --2- [v2::3300/0,v1::6789/0] >> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0).handle_hello getsockname says I am :3300 when talking to v2::49012/0 7f4098744700 1 mon.mon1@0(probing) e5 handle_auth_request failed to assign global_id Config (the result of ceph --admin-daemon /run/ceph/e30397f0-cc32-11ea-8c8e-000c29469cd5/ceph-mon.mon1.asok config show): https://pastebin.com/kifMXs9H I can connect to ports 3300 and 6789 with telnet; 6800 and 6801 return 'process connection refused' Setting all iptables policies to ACCEPT didn't change anything. Where should I start digging to fix this problem? I'd like to at least understand why this happened before putting the cluster into production. Any help is appreciated. 
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Cluster became unresponsive: e5 handle_auth_request failed to assign global_id
Hello, I've created an Octopus 15.2.4 cluster with 3 monitors and 3 OSDs (6 hosts in total, all ESXi VMs). It lived through a couple of reboots without problem, then I've reconfigured the main host a bit: set iptables-legacy as current option in update-alternatives (this is a Debian10 system), applied a basic ruleset of iptables and restarted docker. After that the cluster became unresponsive (any ceph command hangs indefinitely). I can use admin socket to manipulate config though. Setting debug_ms to 5 I see this in the logs (timestamps cut for readability): 7f4096f41700 5 --2- [v2::3300/0,v1::6789/0] >> [v2::3300/0,v1::6789/0] conn(0x55c21b975800 0x55c21ab45180 unknown :-1 s=START_CONNECT pgs=0 cs=0 l=0 rx=0 tx= 0).send_message enqueueing message m=0x55c21bd84a00 type=67 mon_probe(probe e30397f0-cc32-11ea-8c8e-000c29469cd5 name mon1 mon_release octopus) v7 7f4098744700 1 -- >> [v2::6800/561959008,v1::6801/561959008] conn(0x55c21b974400 msgr2=0x55c21ab45600 unknown :-1 s=STATE_CONNECTING_RE l=0).process reconnect failed to v2:81.200.2 .152:6800/561959008 7f4098744700 2 -- >> [v2::6800/561959008,v1::6801/561959008] conn(0x55c21b974400 msgr2=0x55c21ab45600 unknown :-1 s=STATE_CONNECTING_RE l=0).process connection refused! and this: 7f4098744700 2 --2- [v2::3300/0,v1::6789/0] >> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0 l=1 rx=0 tx=0)._fault on lossy channel, failing 7f4098744700 1 --2- [v2::3300/0,v1::6789/0] >> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0 l=1 rx=0 tx=0).stop 7f4098744700 5 --2- [v2::3300/0,v1::6789/0] >> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0 l=1 rx=0 tx=0).reset_recv_state 7f4098744700 5 --2- [v2::3300/0,v1::6789/0] >> conn(0x55c21ba38c00 0x55c21bcc5a80 secure :-1 s=AUTH_ACCEPTING pgs=0 cs=0 l=1 rx=0 tx=0).reset_security 7f409373a700 1 --2- [v2::3300/0,v1::6789/0] >> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=NONE pgs=0 cs=0 l=0 rx=0 tx=0).accept 7f4098744700 1 --2- [v2::3300/0,v1::6789/0] >> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=BANNER_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0)._handle_peer_banner_payload supported=0 required=0 7f4098744700 5 --2- [v2::3300/0,v1::6789/0] >> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0).handle_hello received hello: peer_type=8 peer_addr_for_me=v2::3300/0 7f4098744700 5 --2- [v2::3300/0,v1::6789/0] >> conn(0x55c21c0d2800 0x55c21bcc3f80 unknown :-1 s=HELLO_ACCEPTING pgs=0 cs=0 l=0 rx=0 tx=0).handle_hello getsockname says I am :3300 when talking to v2::49012/0 7f4098744700 1 mon.mon1@0(probing) e5 handle_auth_request failed to assign global_id Config (the result of ceph --admin-daemon /run/ceph/e30397f0-cc32-11ea-8c8e-000c29469cd5/ceph-mon.mon1.asok config show): https://pastebin.com/kifMXs9H I can connect to ports 3300 and 6789 with telnet; 6800 and 6801 return 'process connection refused' Setting all iptables policies to ACCEPT didn't change anything. Where should I start digging to fix this problem? I'd like to at least understand why this happened before putting the cluster into production. Any help is appreciated. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
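For what it's worth, the ports probed above are the ones a Ceph node needs reachable; a minimal iptables sketch (source restrictions omitted, adapt to your network):

# iptables -A INPUT -p tcp --dport 3300 -j ACCEPT        (monitor, msgr v2)
# iptables -A INPUT -p tcp --dport 6789 -j ACCEPT        (monitor, msgr v1)
# iptables -A INPUT -p tcp --dport 6800:7300 -j ACCEPT   (mgr/osd/mds daemon range)

Note also that switching update-alternatives between iptables-legacy and iptables-nft while docker is running can leave docker's own rules in the other backend, so a docker restart (or a reboot) after the switch may be needed before the containerized daemons are reachable again.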
[ceph-users] snaptrim blocks IO on ceph nautilus
Hi, since some days I try to debug a problem with snaptrimming under nautilus. I have a cluster with Nautilus (v14.2.10) , 44 Nodes á 24 OSDs á 14 TB I create every day a snapshot for 7 days. Every time the old snapshot is deleting I have bad IO performcance and blocked requests for several seconds until the snaptrim is done. Settings like snaptrim_sleep and osd_pg_max_concurrent_snap_trims don't affect this behavior. In the debug_osd 10/10 log I see the following: 2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edda20 prio 196 cost 0 latency 0.019545 osd_repop_reply(client.22731418.0:615257 3.636 e22457/22372) v2 pg pg[3.636( v 22457'100855 (21737'97756,22457'100855] local-lis/les=22372/22374 n=27762 ec=2842/2839 lis/c 22372/22372 les/c/f 22374/22374/0 22372/22372/22343) [411,36,956,763] r=0 lpr=22372 luod=22457'100854 crt=22457'100855 lcod 22457'100853 mlcod 22457'100853 active+clean+snaptrim_wait trimq=[1d~1]] 2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edda20 finish 2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edc2c0 prio 127 cost 0 latency 0.043165 MOSDScrubReserve(2.2645 RELEASE e22457) v1 pg pg[2.2645( empty local-lis/les=22359/22364 n=0 ec=2403/2403 lis/c 22359/22359 les/c/f 22364/22367/0 22359/22359/22359) [379,411,884,975] r=1 lpr=22359 crt=0'0 active mbc={}] 2020-07-27 11:45:49.976 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557886edc2c0 finish 2020-07-27 11:45:50.039 7fd8b8404700 10 osd.411 pg_epoch: 22457 pg[3.278e( v 22457'99491 (21594'96426,22457'99491] local-lis/les=22359/22362 n=27669 ec=2859/2839 lis/c 22359/22359 les/c/f 22362/22365/0 22359/22359/22343) [411,379,848,924] r=0 lpr=22359 crt=22457'99491 lcod 22457'99489 mlcod 22457'99489 active+clean+snaptrim trimq=[1d~1]] snap_trimmer posting 2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 pg_epoch: 22457 pg[3.278e( v 22457'99493 (21594'96426,22457'99493] local-lis/les=22359/22362 n=27669 ec=2859/2839 lis/c 22359/22359 les/c/f 22362/22365/0 22359/22359/22343) [411,379,848,924] r=0 lpr=22359 luod=22457'99491 crt=22457'99493 lcod 22457'99489 mlcod 22457'99489 active+clean+snaptrim trimq=[1d~1]] snap_trimmer complete 2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557880ac3760 prio 127 cost 663 latency 7.761823 osd_repop(osd.217.0:3025 3.1ca5 e22457/22378) v2 pg pg[3.1ca5( v 22457'100370 (21716'97357,22457'100370] local-lis/les=22378/22379 n=27532 ec=2855/2839 lis/c 22378/22378 les/c/f 22379/22379/0 22378/22378/22378) [217,411,551,1055] r=1 lpr=22378 luod=0'0 lua=22294'16 crt=22457'100370 lcod 22457'100369 active mbc={}] 2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x557880ac3760 finish 2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x5578813e1e40 prio 127 cost 0 latency 7.494296 MOSDScrubReserve(2.37e2 REQUEST e22457) v1 pg pg[2.37e2( empty local-lis/les=22355/22356 n=0 ec=2412/2412 lis/c 22355/22355 les/c/f 22356/22356/0 22355/22355/22355) [245,411,834,768] r=1 lpr=22355 crt=0'0 active mbc={}] 2020-07-27 11:45:57.801 7fd8b8404700 10 osd.411 22457 dequeue_op 0x5578813e1e40 finish the dequeueing of ops works without pauses until the „snap_trimmer posting“ and „snap_trimmer complete“ loglines. This task takes in this example about 7 Seconds. The following operations which are dequeued have now a latency of about this time. I tried to drill down this in the code. (Developers are asked here) It seems, that the PG will be locked for every operation. 
The snap_trimmer "posting" and "complete" messages come from "osd/PrimaryLogPG.cc" on line 4700. This indicates to me that the process of deleting a snapshot object will sometimes take some time. After further poking around, I see in "osd/SnapMapper.cc" the method "SnapMapper::get_next_objects_to_trim", which takes several seconds to finish. I followed this further into "common/map_cacher.hpp", line 94: "int r = driver->get_next(key, );" From there I lost the trail. The slowness is not on all OSDs at the same time. Sometimes these few OSDs are affected, sometimes some others. Restarting an OSD does not help. With luminous and filestore, snapshot deletion was not an issue at all. With nautilus and bluestore this is not acceptable for my use case. I don't know so far whether this is a bluestore-specific problem or some general issue. I wonder a bit why there are no others who have this problem. Regards Manuel ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
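For completeness, the throttles mentioned at the top of this mail (osd_snap_trim_sleep / osd_pg_max_concurrent_snap_trims) are normally applied like this; the values are only examples and, as said above, they did not change the behavior here:

# ceph config set osd osd_snap_trim_sleep 2.0
# ceph config set osd osd_pg_max_concurrent_snap_trims 1
# ceph tell osd.411 injectargs '--osd_snap_trim_sleep 2.0'   (per OSD, without a restart)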
[ceph-users] Re: Push config to all hosts
Hi Cem, Since https://github.com/ceph/ceph/pull/35576 you will be able to tell cephadm to keep your `/etc/ceph/ceph.conf` updated on all hosts by running: # ceph config set mgr mgr/cephadm/manage_etc_ceph_ceph_conf true But this feature has not been released yet, so you will have to wait for v15.2.5. Ricardo Marques From: Cem Zafer Sent: Monday, June 29, 2020 6:37 AM To: ceph-users@ceph.io Subject: [ceph-users] Push config to all hosts Hi, What are the best method(s) to push ceph.conf to all hosts in octopus (15.x)? Thanks. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
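Until 15.2.5 a manual push is needed; a rough sketch, assuming passwordless root ssh to the hosts cephadm knows about (the jq filter and JSON field name are assumptions on my side):

# ceph config generate-minimal-conf > /etc/ceph/ceph.conf
# for h in $(ceph orch host ls --format json | jq -r '.[].hostname'); do scp /etc/ceph/ceph.conf root@$h:/etc/ceph/ceph.conf; done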
[ceph-users] Re: mimic: much more raw used than reported
Frank, suggest to start with perf counter analysis as per the second part of my previous email... Thanks, Igor On 7/27/2020 2:30 PM, Frank Schilder wrote: Hi Igor, thanks for your answer. I was thinking about that, but as far as I understood, to hit this bug actually requires a partial rewrite to happen. However, these are disk images in storage servers with basically static files, many of which very large (15GB). Therefore, I believe, the vast majority of objects is written to only once and should not be affected by the amplification bug. Is there any way to confirm/rule out that/check how much amplification is happening? I'm wondering if I might be observing something else. Since "ceph osd df tree" does report the actual utilization and I have only one pool on these OSDs, there is no problem with accounting allocated storage to a pool. I know its all used by this one pool. I'm more wondering if its not the known amplification but something else (at least partly) that plays a role here. Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Igor Fedotov Sent: 27 July 2020 12:54:02 To: Frank Schilder; ceph-users Subject: Re: [ceph-users] mimic: much more raw used than reported Hi Frank, you might be being hit by https://tracker.ceph.com/issues/44213 In short the root causes are significant space overhead due to high bluestore allocation unit (64K) and EC overwrite design. This is fixed for upcoming Pacific release by using 4K alloc unit but it is unlikely to be backported to earlier releases due to its complexity. To say nothing about the need for OSD redeployment. Hence please expect no fix for mimic. And your raw usage reports might still be not that good since mimic lacks per-pool stats collection https://github.com/ceph/ceph/pull/19454. I.e. your actual raw space usage is higher than reported. To estimate proper raw usage one can use bluestore perf counters (namely bluestore_stored and bluestore_allocated). Summing bluestore_allocated over all involved OSDs will give actual RAW usage. Summing bluestore_stored will provide actual data volume after EC processing, i.e. presumably it should be around 158TiB. Thanks, Igor On 7/26/2020 8:43 PM, Frank Schilder wrote: Dear fellow cephers, I observe a wired problem on our mimic-13.2.8 cluster. We have an EC RBD pool backed by HDDs. These disks are not in any other pool. I noticed that the total capacity (=USED+MAX AVAIL) reported by "ceph df detail" has shrunk recently from 300TiB to 200TiB. Part but by no means all of this can be explained by imbalance of the data distribution. When I compare the output of "ceph df detail" and "ceph osd df tree", I find 69TiB raw capacity used but not accounted for; see calculations below. These 69TiB raw are equivalent to 20% usable capacity and I really need it back. Together with the imbalance, we loose about 30% capacity. What is using these extra 69TiB and how can I get it back? 
Some findings: These are the 5 largest images in the pool, accounting for a total of 97TiB out of 119TiB usage: # rbd du : NAMEPROVISIONED USED one-133 25 TiB 14 TiB NAMEPROVISIONEDUSED one-153@222 40 TiB 14 TiB one-153@228 40 TiB 357 GiB one-153@235 40 TiB 797 GiB one-153@241 40 TiB 509 GiB one-153@242 40 TiB 43 GiB one-153@243 40 TiB 16 MiB one-153@244 40 TiB 16 MiB one-153@245 40 TiB 324 MiB one-153@246 40 TiB 276 MiB one-153@247 40 TiB 96 MiB one-153@248 40 TiB 138 GiB one-153@249 40 TiB 1.8 GiB one-153@250 40 TiB 0 B one-153 40 TiB 204 MiB 40 TiB 16 TiB NAME PROVISIONEDUSED one-391@3 40 TiB 432 MiB one-391@9 40 TiB 26 GiB one-391@15 40 TiB 90 GiB one-391@16 40 TiB 0 B one-391@17 40 TiB 0 B one-391@18 40 TiB 0 B one-391@19 40 TiB 0 B one-391@20 40 TiB 3.5 TiB one-391@21 40 TiB 5.4 TiB one-391@22 40 TiB 5.8 TiB one-391@23 40 TiB 8.4 TiB one-391@24 40 TiB 1.4 TiB one-391 40 TiB 2.2 TiB 40 TiB 27 TiB NAME PROVISIONEDUSED one-394@3 70 TiB 1.4 TiB one-394@9 70 TiB 2.5 TiB one-394@15 70 TiB 20 GiB one-394@16 70 TiB 0 B one-394@17 70 TiB 0 B one-394@18 70 TiB 0 B one-394@19 70 TiB 383 GiB one-394@20 70 TiB 3.3 TiB one-394@21 70 TiB 5.0 TiB one-394@22 70 TiB 5.0 TiB one-394@23 70 TiB 9.0 TiB one-394@24 70 TiB 1.6 TiB one-394 70 TiB 2.5 TiB 70 TiB 31 TiB NAMEPROVISIONEDUSED one-434 25 TiB 9.1 TiB The large 70TiB images one-391 and one-394 are currently copied to with ca. 5TiB per day. Output of "ceph df detail" with some columns removed: NAME ID USED%USED MAX AVAIL OBJECTS RAW USED sr-rbd-data-one-hdd 11 119 TiB 58.45
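A sketch of what that analysis could look like when summed over all OSDs backing the pool (assumes the OSD ids are listed in osd_ids.txt and that your release accepts "ceph tell osd.N perf dump"; on mimic you may have to run "ceph daemon osd.N perf dump" on each OSD host instead):

# for id in $(cat osd_ids.txt); do ceph tell osd.$id perf dump | jq '.bluestore.bluestore_allocated'; done | awk '{s+=$1} END {printf "allocated: %.2f TiB\n", s/2^40}'

Repeating the same loop with bluestore_stored gives the post-EC data volume to compare against the 158TiB figure.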
[ceph-users] Re: please help me fix iSCSI Targets not available
Hi David, which Ceph version are you using? From: David Thuong Sent: Wednesday, July 22, 2020 10:45 AM To: ceph-users@ceph.io Subject: [ceph-users] please help me fix iSCSI Targets not available iSCSI Targets not available. Please consult the documentation on how to configure and enable the iSCSI Targets management functionality. Available information: There are no gateways defined. Any idea how to enable it? Thanks so much. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: mimic: much more raw used than reported
Hi Frank, you might be being hit by https://tracker.ceph.com/issues/44213 In short the root causes are significant space overhead due to high bluestore allocation unit (64K) and EC overwrite design. This is fixed for upcoming Pacific release by using 4K alloc unit but it is unlikely to be backported to earlier releases due to its complexity. To say nothing about the need for OSD redeployment. Hence please expect no fix for mimic. And your raw usage reports might still be not that good since mimic lacks per-pool stats collection https://github.com/ceph/ceph/pull/19454. I.e. your actual raw space usage is higher than reported. To estimate proper raw usage one can use bluestore perf counters (namely bluestore_stored and bluestore_allocated). Summing bluestore_allocated over all involved OSDs will give actual RAW usage. Summing bluestore_stored will provide actual data volume after EC processing, i.e. presumably it should be around 158TiB. Thanks, Igor On 7/26/2020 8:43 PM, Frank Schilder wrote: Dear fellow cephers, I observe a wired problem on our mimic-13.2.8 cluster. We have an EC RBD pool backed by HDDs. These disks are not in any other pool. I noticed that the total capacity (=USED+MAX AVAIL) reported by "ceph df detail" has shrunk recently from 300TiB to 200TiB. Part but by no means all of this can be explained by imbalance of the data distribution. When I compare the output of "ceph df detail" and "ceph osd df tree", I find 69TiB raw capacity used but not accounted for; see calculations below. These 69TiB raw are equivalent to 20% usable capacity and I really need it back. Together with the imbalance, we loose about 30% capacity. What is using these extra 69TiB and how can I get it back? Some findings: These are the 5 largest images in the pool, accounting for a total of 97TiB out of 119TiB usage: # rbd du : NAMEPROVISIONED USED one-133 25 TiB 14 TiB NAMEPROVISIONEDUSED one-153@222 40 TiB 14 TiB one-153@228 40 TiB 357 GiB one-153@235 40 TiB 797 GiB one-153@241 40 TiB 509 GiB one-153@242 40 TiB 43 GiB one-153@243 40 TiB 16 MiB one-153@244 40 TiB 16 MiB one-153@245 40 TiB 324 MiB one-153@246 40 TiB 276 MiB one-153@247 40 TiB 96 MiB one-153@248 40 TiB 138 GiB one-153@249 40 TiB 1.8 GiB one-153@250 40 TiB 0 B one-153 40 TiB 204 MiB 40 TiB 16 TiB NAME PROVISIONEDUSED one-391@3 40 TiB 432 MiB one-391@9 40 TiB 26 GiB one-391@15 40 TiB 90 GiB one-391@16 40 TiB 0 B one-391@17 40 TiB 0 B one-391@18 40 TiB 0 B one-391@19 40 TiB 0 B one-391@20 40 TiB 3.5 TiB one-391@21 40 TiB 5.4 TiB one-391@22 40 TiB 5.8 TiB one-391@23 40 TiB 8.4 TiB one-391@24 40 TiB 1.4 TiB one-391 40 TiB 2.2 TiB 40 TiB 27 TiB NAME PROVISIONEDUSED one-394@3 70 TiB 1.4 TiB one-394@9 70 TiB 2.5 TiB one-394@15 70 TiB 20 GiB one-394@16 70 TiB 0 B one-394@17 70 TiB 0 B one-394@18 70 TiB 0 B one-394@19 70 TiB 383 GiB one-394@20 70 TiB 3.3 TiB one-394@21 70 TiB 5.0 TiB one-394@22 70 TiB 5.0 TiB one-394@23 70 TiB 9.0 TiB one-394@24 70 TiB 1.6 TiB one-394 70 TiB 2.5 TiB 70 TiB 31 TiB NAMEPROVISIONEDUSED one-434 25 TiB 9.1 TiB The large 70TiB images one-391 and one-394 are currently copied to with ca. 5TiB per day. Output of "ceph df detail" with some columns removed: NAME ID USED%USED MAX AVAIL OBJECTS RAW USED sr-rbd-data-one-hdd 11 119 TiB 58.4584 TiB 31286554 158 TiB Pool is EC 6+2. USED is correct: 31286554*4MiB=119TiB. RAW USED is correct: 119*8/6=158TiB. Most of this data is freshly copied onto large RBD images. Compression is enabled on this pool (aggressive,snappy). 
However, when looking at "deph osd df tree", I get The combined raw capacity of OSDs backing this pool is 406.8TiB (sum over SIZE). Summing up column USE over all OSDs gives 227.5TiB. This gives a difference of 69TiB (=227-158) that is not accounted for. Here the output of "ceph osd df tree limited" to the drives backing the pool: ID CLASSWEIGHT REWEIGHT SIZEUSE DATAOMAPMETA AVAIL %USE VAR PGS TYPE NAME 84 hdd8.90999 1.0 8.9 TiB 5.0 TiB 5.0 TiB 180 MiB 16 GiB 3.9 TiB 56.43 1.72 103 osd.84 145 hdd8.90999 1.0 8.9 TiB 4.6 TiB 4.6 TiB 144 MiB 14 GiB 4.3 TiB 51.37 1.57 87 osd.145 156 hdd8.90999 1.0 8.9 TiB 5.2 TiB 5.1 TiB 173 MiB 16 GiB 3.8 TiB 57.91 1.77 100 osd.156 168 hdd8.90999 1.0 8.9 TiB 5.0 TiB 5.0 TiB 164 MiB 16 GiB 3.9 TiB 56.31 1.72 98 osd.168 181 hdd8.90999 1.0 8.9 TiB 5.5 TiB 5.4 TiB 121 MiB 17 GiB 3.5 TiB
[ceph-users] cache tier dirty status
Hello all, is there a way to interrogate a cache tier pool about the number of dirty objects/bytes that it contains? Thank you, Laszlo ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Reinitialize rgw garbage collector
Hi all, I have a question about the garbage collector within RGWs. We run Nautilus 14.2.8 and we have 32 garbage objects in the gc pool with a total of 39 GB of garbage that needs to be processed. When we run `radosgw-admin gc process --include-all`, objects are processed but most of them won't be deleted. This can be checked by using --debug-rgw=5 in the command and stat'ing the objects which are reported as processed. Also, the monitoring doesn't show that a huge amount of objects is deleted by the gc. So, I assume that it doesn't actually delete the objects. It might be due to a renewed time stamp? (not sure about this) Is there anybody who has had similar issues with removing a large amount of garbage, and is there a way to let the gc delete the objects? Most of the objects within the gc list are __multipart__ objects. Are they processed differently than single-part objects? E.g. are all the multiparts collected before the deletion actually happens, or how is this implemented? The garbage is still increasing and the gc cannot process it, which scares us a bit. Also, we cannot bypass the gc because the bucket is still in use. I also thought about reinitializing the GC in order to get an up-to-date list of garbage. (some entries shown by `radosgw-admin gc list --include-all` are over a month old) Is there a way to make this happen and how safe is it? I thought about exporting the omap objects from the gc pool (as a backup) and deleting the objects within the pool (or renaming the pool). I appreciate any input and thank you in advance. Regards, Michael ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
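A rough sketch of the kind of checks described above (the "oid" field name and the data pool name are assumptions, not taken from the post):

# radosgw-admin gc list --include-all | grep -c '"oid"'   (rough count of pending objects)
# rados -p default.rgw.buckets.data stat '<one oid taken from the gc list>'

If the rados stat still succeeds long after "gc process" claims to have handled the entry, the object really was not deleted rather than just re-listed with a renewed timestamp.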