[ceph-users] Libvirt and Ceph: libvirtd tries to open random RBD images
Hello Users,

We're using libvirt with KVM and the orchestrator is CloudStack. I already raised the issue with CloudStack at https://github.com/apache/cloudstack/issues/8211 but it appears to be a libvirtd issue. I did the same on the libvirt ML at https://lists.libvirt.org/archives/list/us...@lists.libvirt.org/thread/SA2I4QZGVVEIKPJU7E2KAFYYFZLJZDMV/ but I'm now here looking for answers. Below is our environment & issue description:

Ceph: v17.2.0
Pool: replicated
Number of block images in this pool: more than 1250

# virsh pool-info c15508c7-5c2c-317f-aa2e-29f307771415
Name:           c15508c7-5c2c-317f-aa2e-29f307771415
UUID:           c15508c7-5c2c-317f-aa2e-29f307771415
State:          running
Persistent:     no
Autostart:      no
Capacity:       1.25 PiB
Allocation:     489.52 TiB
Available:      787.36 TiB

# kvm --version
QEMU emulator version 4.2.1 (Debian 1:4.2-3ubuntu6.27)
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers

# libvirtd --version
libvirtd (libvirt) 6.0.0

It appears that one of our CloudStack KVM clusters, consisting of 8 hosts, is having the issue. We have HCI on these 8 hosts and there are around 700+ VMs running. Strangely enough, there are logs like the ones below on the hosts:

Oct 25 13:38:11 hv-01 libvirtd[9464]: failed to open the RBD image '087bb114-448a-41d2-9f5d-6865b62eed15': No such file or directory
Oct 25 20:35:22 hv-01 libvirtd[9464]: failed to open the RBD image 'ccc1168a-5ffa-4b6d-a953-8e0ac788ebc5': No such file or directory
Oct 26 09:48:33 hv-01 libvirtd[9464]: failed to open the RBD image 'a3fe82f8-afc9-4604-b55e-91b676514a18': No such file or directory

We've got DNS servers on which there is an `A` record resolving to all the IPv4 addresses of the 5 monitors, and there have not been any issues with DNS resolution. The issue of "failed to open the RBD image 'ccc1168a-5ffa-4b6d-a953-8e0ac788ebc5': No such file or directory" gets weirder because the VM making use of that RBD image, let's say '087bb114-448a-41d2-9f5d-6865b62eed15', is running on an altogether different host, like "hv-06". On further inspection of that specific virtual machine, it has been running on host "hv-06" for more than 4 months or so. Fortunately, the virtual machine has no issues and has been running since then. There are absolutely no issues with any of the virtual machines because of these warnings.

From the libvirt mailing list, one of the community members helped me understand that libvirt only tries to get the info of the images and doesn't open them for reading or writing. All hosts running libvirtd try doing the same. We manually ran "virsh pool-refresh", which CloudStack itself takes care of at regular intervals, and the warning messages still appear.

Please help me find the cause and let me know if further information is needed.

Thanks,
Jayanth Reddy
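For reference, a quick way to check whether one of the names from those warnings actually exists as an RBD image at the moment the message is logged is to query the pool directly; the pool name "cloudstack" below is only a placeholder for whatever pool backs this libvirt storage pool:

  rbd -p cloudstack ls | grep 087bb114-448a-41d2-9f5d-6865b62eed15
  rbd -p cloudstack info 087bb114-448a-41d2-9f5d-6865b62eed15

That shows whether the image is present in the pool independently of what libvirt's pool refresh sees.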
[ceph-users] Ceph 17.2.7 to 18.2.0 issues
Hi All,

Recently I upgraded my cluster from Quincy to Reef. Everything appeared to go smoothly and without any issues arising. I was forced to power off the cluster, performing the usual procedures beforehand, and everything appears to have come back fine. Every service reports green across the board, except that if I try to copy any files from a CephFS mountpoint, whether kernel or FUSE, the actual copy hangs. ls/stat etc. all work, which indicates the metadata appears fine, but copying always hangs. I can copy objects directly using the rados toolset, which indicates the underlying data exists. The system itself reports no errors and thinks it is healthy.

The entire cluster and all CephFS clients are Rocky 9. Any advice would be much appreciated. I'd find this easier to deal with if the cluster actually gave me an error.
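For reference, the rados-level check mentioned above (reading a file's backing objects directly) can be sketched like this; "cephfs_data" is a placeholder for the data pool name, and CephFS names a file's first object as the file's inode number in hex followed by ".00000000":

  # inode of the file, printed in hex
  printf '%x\n' $(stat -c %i /mnt/cephfs/somefile)
  # stat and fetch the first backing object directly from the data pool
  rados -p cephfs_data stat <inode-hex>.00000000
  rados -p cephfs_data get <inode-hex>.00000000 /tmp/check.bin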
[ceph-users] Compilation failure when building Ceph on Ubuntu
Hi,

I'm trying to build a DEBUG version of Ceph Reef on a virtual Ubuntu LTS 22.04 running on Lima by following the README in Ceph's GitHub repo. The build failed and the last CMake error was "g++-11: error: unrecognized command-line option '-Wimplicit-const-int-float-conversion'". Does anyone know what I can do to fix the compilation error? I could try different gcc versions, but I'd assume Ceph's build scripts would install and verify all the dependencies.

Thanks,

The system configuration is as follows:

> uname -a
Linux lima-ceph-dev 5.15.0-86-generic #96-Ubuntu SMP Wed Sep 20 08:23:49 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

> lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy

I followed the instructions in the README in Ceph's GitHub repo (https://github.com/ceph/ceph), and the command ./do_cmake.sh failed at step [137/2150], which builds the frontend dashboard, with the error message "ninja: build stopped: subcommand failed." The last error logged in the file CMakeError.log has to do with "g++-11: error: unrecognized command-line option '-Wimplicit-const-int-float-conversion'". Below is the last error message in CMakeError.log:

Performing C++ SOURCE FILE Test COMPILER_SUPPORTS_WARN_IMPLICIT_CONST_INT_FLOAT_CONVERSION failed with the following output:
Change Dir: /home/dyuan.linux/ceph/build/CMakeFiles/CMakeTmp

Run Build Command(s):/usr/bin/ninja cmTC_bab6d &&
[1/2] Building CXX object CMakeFiles/cmTC_bab6d.dir/src.cxx.o
FAILED: CMakeFiles/cmTC_bab6d.dir/src.cxx.o
/usr/bin/g++-11 -DCOMPILER_SUPPORTS_WARN_IMPLICIT_CONST_INT_FLOAT_CONVERSION -fPIE -Wimplicit-const-int-float-conversion -std=c++20 -o CMakeFiles/cmTC_bab6d.dir/src.cxx.o -c /home/dyuan.linux/ceph/build/CMakeFiles/CMakeTmp/src.cxx
g++-11: error: unrecognized command-line option '-Wimplicit-const-int-float-conversion'
ninja: build stopped: subcommand failed.
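For what it's worth, that particular flag can be probed by hand to see which compilers accept it; -Wimplicit-const-int-float-conversion is a Clang warning option, so g++ rejecting it in a configure-time probe recorded in CMakeError.log is a normal probe failure rather than necessarily the step that stopped the build (a minimal check, assuming clang++ is installed for comparison):

  echo 'int main(){return 0;}' > /tmp/probe.cxx
  g++-11  -Wimplicit-const-int-float-conversion -c /tmp/probe.cxx -o /tmp/probe.o   # unrecognized option
  clang++ -Wimplicit-const-int-float-conversion -c /tmp/probe.cxx -o /tmp/probe.o   # accepted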
[ceph-users] Re: Stray host/daemon
Found my previous post regarding this issue. Fixed by restarting the mgr daemons.

-jeremy

> On Friday, Dec 01, 2023 at 3:04 AM, Me (mailto:jer...@skidrow.la) wrote:
> I think I ran in to this before but I forget the fix:
>
> HEALTH_WARN 1 stray host(s) with 1 daemon(s) not managed by cephadm
> [WRN] CEPHADM_STRAY_HOST: 1 stray host(s) with 1 daemon(s) not managed by cephadm
>     stray host cn06.ceph.fu.intra has 1 stray daemons: ['mon.cn03']
>
> Pacific 16.2.11
>
> How do I clear this?
>
> Thanks
> -jeremy
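For anyone searching later, a minimal sketch of that mgr restart on a cephadm-managed cluster (either fail over to a standby or restart the whole mgr service, then re-check health):

  ceph mgr fail <active-mgr>    # fail over to a standby mgr ('ceph mgr stat' shows the active one)
  ceph orch restart mgr         # or restart the mgr daemons via the orchestrator
  ceph health detail            # CEPHADM_STRAY_HOST should clear once the mgr refreshes its inventory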
[ceph-users] Re: ceph osd dump_historic_ops
This small (Bash) wrapper around the "ceph daemon" command, especially the auto-completion with the TAB key, is quite helpful, IMHO: https://github.com/test-erik/ceph-daemon-wrapper

On Fri, Dec 1, 2023 at 15:03 Phong Tran Thanh <tranphong...@gmail.com> wrote:
> It works!!!
>
> Thanks Kai Stian Olstad
>
> On Fri, Dec 1, 2023 at 17:06 Kai Stian Olstad <ceph+l...@olstad.com> wrote:
> > On Fri, Dec 01, 2023 at 04:33:20PM +0700, Phong Tran Thanh wrote:
> > > I have a problem with my osd, i want to show dump_historic_ops of osd
> > > I follow the guide:
> > > https://www.ibm.com/docs/en/storage-fusion/2.6?topic=alerts-cephosdslowops
> > > But when i run command
> > >
> > > ceph daemon osd.8 dump_historic_ops show the error, the command run on node
> > > with osd.8
> > > Can't get admin socket path: unable to get conf option admin_socket for
> > > osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid
> > > types are: auth, mon, osd, mds, mgr, client\n"
> > >
> > > I am running ceph cluster reef version by cephadmin install
> > >
> > > What should I do?
> >
> > The easiest is use tell, then you can run it on any node that have access
> > to ceph.
> >
> > ceph tell osd.8 dump_historic_ops
> >
> > ceph tell osd.8 help
> > will give you all you can do with tell.
> >
> > --
> > Kai Stian Olstad
>
> --
> Best regards,
>
> *Tran Thanh Phong*
>
> Email: tranphong...@gmail.com
> Skype: tranphong079
[ceph-users] Re: reef 18.2.1 QE Validation status
Hi Yuri,

Looks like it's not as critical and complicated as originally thought. A user has to change bluefs_shared_alloc_size to be exposed to the issue. So hopefully I'll submit a patch on Monday to close this gap and we'll be able to proceed.

Thanks,
Igor

On 01/12/2023 18:16, Yuri Weinstein wrote:

Venky, pls review the test results for smoke and fs after the PRs were merged.

Radek, Igor, Adam - any updates on https://tracker.ceph.com/issues/63618?

Thx

On Thu, Nov 30, 2023 at 8:08 AM Yuri Weinstein wrote:

The fs PRs:
https://github.com/ceph/ceph/pull/54407
https://github.com/ceph/ceph/pull/54677
were approved/tested and ready for merge.

What is the status/plan for https://tracker.ceph.com/issues/63618?

On Wed, Nov 29, 2023 at 10:51 AM Igor Fedotov wrote:

https://tracker.ceph.com/issues/63618 to be considered as a blocker for the next Reef release.

On 07/11/2023 00:30, Yuri Weinstein wrote:

Details of this release are summarized here:
https://tracker.ceph.com/issues/63443#note-1

Seeking approvals/reviews for:

smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE failures)
rados - Neha, Radek, Travis, Ernesto, Adam King
rgw - Casey
fs - Venky
orch - Adam King
rbd - Ilya
krbd - Ilya
upgrade/quincy-x (reef) - Laura PTL
powercycle - Brad
perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLE failures)

Please reply to this email with approval and/or trackers of known issues/PRs to address them.

TIA
YuriW
[ceph-users] Re: About ceph osd slow ops
Given that this is S3, are the slow ops on index or data OSDs? (You mentioned HDD, but I don't want to assume that means the OSD you mentioned is a data OSD.)

Josh

On Fri, Dec 1, 2023 at 7:05 AM VÔ VI wrote:
>
> Hi Stefan,
>
> I am running replicate x3 with a failure domain as host and setting
> min_size pool is 1. Because my cluster s3 traffic real time and can't stop
> or block IO, the data may be lost but IO alway available. I hope my cluster
> can run with two nodes unavailable.
> After that two nodes is down at the same time, and then nodes up, client IO
> and recover running in the same time, and some disk warning is slowops,
> what is the problem, may be my disk is overload, but the disk utilization
> only 60 -80%
>
> Thanks Stefan
>
> On Fri, Dec 1, 2023 at 16:40 Stefan Kooman wrote:
> > On 01-12-2023 08:45, VÔ VI wrote:
> > > Hi community,
> > >
> > > My cluster running with 10 nodes and 2 nodes goes down, sometimes the log
> > > shows the slow ops, what is the root cause?
> > > My osd is HDD and block.db and wal is 500GB SSD per osd.
> > >
> > > Health check update: 13 slow ops, oldest one blocked for 167 sec, osd.10
> > > has slow ops (SLOW_OPS)
> >
> > Most likely you have a crush rule that spreads objects over hosts as a
> > failure domain. For size=3, min_size=2 (default for replicated pools)
> > you might end up in a situation where two of the nodes that are offline
> > have PGs where min_size=2 requirement is not fulfilled, and will hence
> > by inactive and slow ops will occur.
> >
> > When host is your failure domain, you should not reboot more than one at
> > the same time. If the hosts are somehow organized (different racks,
> > datacenters) you could make a higher level bucket and put your hosts
> > there. And create a crush rule using that bucket type as failure domain,
> > and have your pools use that.
> >
> > Gr. Stefan
[ceph-users] Re: How to identify the index pool real usage?
>> Today we had a big issue with slow ops on the nvme drives which holding
>> the index pool.
>>
>> Why the nvme shows full if on ceph is barely utilized? Which one I should
>> belive?
>>
>> When I check the ceph osd df it shows 10% usage of the osds (1x 2TB nvme
>> drive has 4x osds on it):

Why split each device into 4 very small OSDs? You're losing a lot of capacity to overhead.

>> ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE   VAR   PGS  STATUS
>> 195   nvme  0.43660   1.0      447 GiB   47 GiB  161 MiB  46 GiB  656 MiB  400 GiB  10.47  0.21   64      up
>> 252   nvme  0.43660   1.0      447 GiB   46 GiB  161 MiB  45 GiB  845 MiB  401 GiB  10.35  0.21   64      up
>> 253   nvme  0.43660   1.0      447 GiB   46 GiB  229 MiB  45 GiB  662 MiB  401 GiB  10.26  0.21   66      up
>> 254   nvme  0.43660   1.0      447 GiB   46 GiB  161 MiB  44 GiB  1.3 GiB  401 GiB  10.26  0.21   65      up
>> 255   nvme  0.43660   1.0      447 GiB   47 GiB  161 MiB  46 GiB  1.2 GiB  400 GiB  10.58  0.21   64      up
>> 288   nvme  0.43660   1.0      447 GiB   46 GiB  161 MiB  44 GiB  1.2 GiB  401 GiB  10.25  0.21   64      up
>> 289   nvme  0.43660   1.0      447 GiB   46 GiB  161 MiB  45 GiB  641 MiB  401 GiB  10.33  0.21   64      up
>> 290   nvme  0.43660   1.0      447 GiB   45 GiB  229 MiB  44 GiB  668 MiB  402 GiB  10.14  0.21   65      up
>>
>> However in nvme list it says full:
>> Node          SN            Model         Namespace  Usage              Format       FW Rev
>> ------------  ------------  ------------  ---------  -----------------  -----------  ------
>> /dev/nvme0n1  90D0A00XTXTR  KCD6XLUL1T92  1          1.92 TB / 1.92 TB  512 B + 0 B  GPK6
>> /dev/nvme1n1  60P0A003TXTR  KCD6XLUL1T92  1          1.92 TB / 1.92 TB  512 B + 0 B  GPK6

That command isn't telling you what you think it is. It has no awareness of actual data, it's looking at NVMe namespaces.

>> With some other node the test was like:
>>
>> * if none of the disk full, no slow ops.
>> * If 1x disk full and the other not, has slow ops but not too much
>> * if none of the disk full, no slow ops.
>>
>> The full disks are very highly utilized during recovery and they are
>> holding back the operations from the other nvmes.
>>
>> What's the reason that even if the pgs are the same in the cluster +/-1
>> regarding space they are not equally utilized.
>>
>> Thank you
[ceph-users] Re: How to identify the index pool real usage?
Hi,

It looks like a trim/discard problem. I would try activating discard on one disk to validate. I have no feedback on the reliability of the bdev_*_discard parameters, so maybe dig a little deeper into the subject, or perhaps someone else has feedback...

Regards,

*David CASIER*

On Fri, Dec 1, 2023 at 16:15, Szabo, Istvan (Agoda) wrote:
> Hi,
>
> Today we had a big issue with slow ops on the nvme drives which holding
> the index pool.
>
> Why the nvme shows full if on ceph is barely utilized? Which one I should
> belive?
>
> When I check the ceph osd df it shows 10% usage of the osds (1x 2TB nvme
> drive has 4x osds on it):
>
> ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE   VAR   PGS  STATUS
> 195   nvme  0.43660   1.0      447 GiB   47 GiB  161 MiB  46 GiB  656 MiB  400 GiB  10.47  0.21   64      up
> 252   nvme  0.43660   1.0      447 GiB   46 GiB  161 MiB  45 GiB  845 MiB  401 GiB  10.35  0.21   64      up
> 253   nvme  0.43660   1.0      447 GiB   46 GiB  229 MiB  45 GiB  662 MiB  401 GiB  10.26  0.21   66      up
> 254   nvme  0.43660   1.0      447 GiB   46 GiB  161 MiB  44 GiB  1.3 GiB  401 GiB  10.26  0.21   65      up
> 255   nvme  0.43660   1.0      447 GiB   47 GiB  161 MiB  46 GiB  1.2 GiB  400 GiB  10.58  0.21   64      up
> 288   nvme  0.43660   1.0      447 GiB   46 GiB  161 MiB  44 GiB  1.2 GiB  401 GiB  10.25  0.21   64      up
> 289   nvme  0.43660   1.0      447 GiB   46 GiB  161 MiB  45 GiB  641 MiB  401 GiB  10.33  0.21   64      up
> 290   nvme  0.43660   1.0      447 GiB   45 GiB  229 MiB  44 GiB  668 MiB  402 GiB  10.14  0.21   65      up
>
> However in nvme list it says full:
> Node          SN            Model         Namespace  Usage              Format       FW Rev
> ------------  ------------  ------------  ---------  -----------------  -----------  ------
> /dev/nvme0n1  90D0A00XTXTR  KCD6XLUL1T92  1          1.92 TB / 1.92 TB  512 B + 0 B  GPK6
> /dev/nvme1n1  60P0A003TXTR  KCD6XLUL1T92  1          1.92 TB / 1.92 TB  512 B + 0 B  GPK6
>
> With some other node the test was like:
>
> * if none of the disk full, no slow ops.
> * If 1x disk full and the other not, has slow ops but not too much
> * if none of the disk full, no slow ops.
>
> The full disks are very highly utilized during recovery and they are
> holding back the operations from the other nvmes.
>
> What's the reason that even if the pgs are the same in the cluster +/-1
> regarding space they are not equally utilized.
>
> Thank you
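If someone wants to experiment with that, a minimal sketch of the knobs being referred to (test on a single OSD/host first; behaviour differs between drives and Ceph releases, and the async variant only exists on newer versions):

  ceph config set osd bdev_enable_discard true
  ceph config set osd bdev_async_discard true          # where the option is available
  ceph config show osd.195 bdev_enable_discard         # verify what a running OSD picked up

An OSD restart may be needed for the bdev settings to take effect.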
[ceph-users] Re: reef 18.2.1 QE Validation status
Venky, pls review the test results for smoke and fs after the PRs were merged.

Radek, Igor, Adam - any updates on https://tracker.ceph.com/issues/63618?

Thx

On Thu, Nov 30, 2023 at 8:08 AM Yuri Weinstein wrote:
>
> The fs PRs:
> https://github.com/ceph/ceph/pull/54407
> https://github.com/ceph/ceph/pull/54677
> were approved/tested and ready for merge.
>
> What is the status/plan for https://tracker.ceph.com/issues/63618?
>
> On Wed, Nov 29, 2023 at 10:51 AM Igor Fedotov wrote:
> >
> > https://tracker.ceph.com/issues/63618 to be considered as a blocker for
> > the next Reef release.
> >
> > On 07/11/2023 00:30, Yuri Weinstein wrote:
> > > Details of this release are summarized here:
> > >
> > > https://tracker.ceph.com/issues/63443#note-1
> > >
> > > Seeking approvals/reviews for:
> > >
> > > smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE failures)
> > > rados - Neha, Radek, Travis, Ernesto, Adam King
> > > rgw - Casey
> > > fs - Venky
> > > orch - Adam King
> > > rbd - Ilya
> > > krbd - Ilya
> > > upgrade/quincy-x (reef) - Laura PTL
> > > powercycle - Brad
> > > perf-basic - Laura, Prashant (POOL_APP_NOT_ENABLE failures)
> > >
> > > Please reply to this email with approval and/or trackers of known
> > > issues/PRs to address them.
> > >
> > > TIA
> > > YuriW
[ceph-users] How to identify the index pool real usage?
Hi,

Today we had a big issue with slow ops on the nvme drives which are holding the index pool.

Why does the nvme show full if on the Ceph side it is barely utilized? Which one should I believe?

When I check ceph osd df it shows 10% usage of the osds (1x 2TB nvme drive has 4x osds on it):

ID   CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA     OMAP    META     AVAIL    %USE   VAR   PGS  STATUS
195   nvme  0.43660   1.0      447 GiB   47 GiB  161 MiB  46 GiB  656 MiB  400 GiB  10.47  0.21   64      up
252   nvme  0.43660   1.0      447 GiB   46 GiB  161 MiB  45 GiB  845 MiB  401 GiB  10.35  0.21   64      up
253   nvme  0.43660   1.0      447 GiB   46 GiB  229 MiB  45 GiB  662 MiB  401 GiB  10.26  0.21   66      up
254   nvme  0.43660   1.0      447 GiB   46 GiB  161 MiB  44 GiB  1.3 GiB  401 GiB  10.26  0.21   65      up
255   nvme  0.43660   1.0      447 GiB   47 GiB  161 MiB  46 GiB  1.2 GiB  400 GiB  10.58  0.21   64      up
288   nvme  0.43660   1.0      447 GiB   46 GiB  161 MiB  44 GiB  1.2 GiB  401 GiB  10.25  0.21   64      up
289   nvme  0.43660   1.0      447 GiB   46 GiB  161 MiB  45 GiB  641 MiB  401 GiB  10.33  0.21   64      up
290   nvme  0.43660   1.0      447 GiB   45 GiB  229 MiB  44 GiB  668 MiB  402 GiB  10.14  0.21   65      up

However, nvme list says full:

Node          SN            Model         Namespace  Usage              Format       FW Rev
------------  ------------  ------------  ---------  -----------------  -----------  ------
/dev/nvme0n1  90D0A00XTXTR  KCD6XLUL1T92  1          1.92 TB / 1.92 TB  512 B + 0 B  GPK6
/dev/nvme1n1  60P0A003TXTR  KCD6XLUL1T92  1          1.92 TB / 1.92 TB  512 B + 0 B  GPK6

With some other nodes the test went like this:

* if none of the disks is full, no slow ops.
* if 1x disk is full and the other is not, there are slow ops but not too many
* if none of the disks is full, no slow ops.

The full disks are very highly utilized during recovery and they are holding back the operations from the other nvmes.

What's the reason that, even though the PGs per OSD are the same across the cluster +/-1, they are not equally utilized space-wise?

Thank you
[ceph-users] Duplicated device IDs
Dear Ceph users,

I am replacing some small disks on one of my hosts with bigger ones. I delete the OSD from the web UI, preserving the ID for replacement, then after the rebalancing is finished I change the disk and the cluster automatically re-creates the OSD with the same ID. Then I adjust the CRUSH weight. Everything works fine except for the handling of the device ID of some of the new disks. As you can see below, there are 5 IDs each associated with 2 devices and 2 OSDs, while these are actually different disks, since the OSDs see different and correct sizes.

[ceph: root@bofur /]# ceph device ls-by-host romolo
DEVICE                                     DEV      DAEMONS         EXPECTED FAILURE
AMCC_9650SE-16M_DISK_82723576349B5E000984  sdc      osd.42
AMCC_9650SE-16M_DISK_83214021349B63000A50  sdd      osd.56
AMCC_9650SE-16M_DISK_83450671349B680004B3  sdf      osd.68
AMCC_9650SE-16M_DISK_83471183349B680021DA  sde      osd.65
AMCC_9650SE-16M_DISK_9QG58JCX349B59EE      sdb      osd.13
AMCC_9650SE-16M_DISK_AF248795608D6A16      sdq      osd.62
AMCC_9650SE-16M_DISK_J0210858              sdi sdn  osd.105 osd.20
AMCC_9650SE-16M_DISK_J0210926              sdg sdl  osd.36 osd.5
AMCC_9650SE-16M_DISK_N0ECFHAL              sdj sdo  osd.25 osd.60
AMCC_9650SE-16M_DISK_N0R5P9WT              sdk sdp  osd.51 osd.70
AMCC_9650SE-16M_DISK_PBGDG6EE              sdh sdm  osd.45 osd.9
SanDisk_SSD_PLUS_21089P443002              sda      mon.romolo

I really don't understand what happened, whether I did something wrong, or how to fix this. Any help is greatly appreciated.

Nicola
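For comparison, it can help to look at what the OSDs themselves report, since "ceph device ls" is assembled from the OSD metadata; a minimal check for one of the doubled entries (OSD ids taken from the listing above):

  ceph osd metadata 20  | grep -E '"devices"|"device_ids"'
  ceph osd metadata 105 | grep -E '"devices"|"device_ids"'

If the two OSDs report different "devices" but the same "device_ids" string, that would suggest the controller is exposing identical vendor/model/serial identifiers, which is what the device tracking keys on.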
[ceph-users] Re: About ceph osd slow ops
Hi Stefan,

I am running replica x3 with host as the failure domain, and the pool min_size is set to 1. Because my cluster serves real-time S3 traffic and can't stop or block IO, data may be lost but IO must always be available. I hope my cluster can run with two nodes unavailable.

After that, two nodes went down at the same time; when the nodes came back up, client IO and recovery were running at the same time, and some disks warned with slow ops. What is the problem? Maybe my disks are overloaded, but the disk utilization is only 60-80%.

Thanks Stefan

On Fri, Dec 1, 2023 at 16:40 Stefan Kooman wrote:
> On 01-12-2023 08:45, VÔ VI wrote:
> > Hi community,
> >
> > My cluster running with 10 nodes and 2 nodes goes down, sometimes the log
> > shows the slow ops, what is the root cause?
> > My osd is HDD and block.db and wal is 500GB SSD per osd.
> >
> > Health check update: 13 slow ops, oldest one blocked for 167 sec, osd.10
> > has slow ops (SLOW_OPS)
>
> Most likely you have a crush rule that spreads objects over hosts as a
> failure domain. For size=3, min_size=2 (default for replicated pools)
> you might end up in a situation where two of the nodes that are offline
> have PGs where min_size=2 requirement is not fulfilled, and will hence
> by inactive and slow ops will occur.
>
> When host is your failure domain, you should not reboot more than one at
> the same time. If the hosts are somehow organized (different racks,
> datacenters) you could make a higher level bucket and put your hosts
> there. And create a crush rule using that bucket type as failure domain,
> and have your pools use that.
>
> Gr. Stefan
[ceph-users] Re: ceph osd dump_historic_ops
It works!!!

Thanks Kai Stian Olstad

On Fri, Dec 1, 2023 at 17:06 Kai Stian Olstad <ceph+l...@olstad.com> wrote:
> On Fri, Dec 01, 2023 at 04:33:20PM +0700, Phong Tran Thanh wrote:
> > I have a problem with my osd, i want to show dump_historic_ops of osd
> > I follow the guide:
> > https://www.ibm.com/docs/en/storage-fusion/2.6?topic=alerts-cephosdslowops
> > But when i run command
> >
> > ceph daemon osd.8 dump_historic_ops show the error, the command run on node
> > with osd.8
> > Can't get admin socket path: unable to get conf option admin_socket for
> > osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid
> > types are: auth, mon, osd, mds, mgr, client\n"
> >
> > I am running ceph cluster reef version by cephadmin install
> >
> > What should I do?
>
> The easiest is use tell, then you can run it on any node that have access
> to ceph.
>
> ceph tell osd.8 dump_historic_ops
>
> ceph tell osd.8 help
> will give you all you can do with tell.
>
> --
> Kai Stian Olstad

--
Best regards,

*Tran Thanh Phong*

Email: tranphong...@gmail.com
Skype: tranphong079
[ceph-users] Re: ceph fs (meta) data inconsistent
Hi Xiubo,

I uploaded a test script with session output showing the issue. When I look at your scripts, I can't see the stat-check on the second host anywhere. Hence, I don't really know what you are trying to compare. If you want me to run your test scripts on our system for comparison, please include the part executed on the second host explicitly in an ssh-command. Running your scripts alone in their current form will not reproduce the issue.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Xiubo Li
Sent: Monday, November 27, 2023 3:59 AM
To: Frank Schilder; Gregory Farnum
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: ceph fs (meta) data inconsistent

On 11/24/23 21:37, Frank Schilder wrote:
> Hi Xiubo,
>
> thanks for the update. I will test your scripts in our system next week.
> Something important: running both scripts on a single client will not produce
> a difference. You need 2 clients. The inconsistency is between clients, not
> on the same client. For example:

Frank,

Yeah, I did this with 2 different kclients.

Thanks

> Setup: host1 and host2 with a kclient mount to a cephfs under /mnt/kcephfs
>
> Test 1
> - on host1: execute shutil.copy2
> - execute ls -l /mnt/kcephfs/ on host1 and host2: same result
>
> Test 2
> - on host1: shutil.copy
> - execute ls -l /mnt/kcephfs/ on host1 and host2: file size=0 on host 2 while
> correct on host 1
>
> Your scripts only show output of one host, but the inconsistency requires two
> hosts for observation. The stat information is updated on host1, but not
> synchronized to host2 in the second test. In case you can't reproduce that, I
> will append results from our system to the case.
>
> Also it would be important to know the python and libc versions. We observe
> this only for newer versions of both.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
>
> From: Xiubo Li
> Sent: Thursday, November 23, 2023 3:47 AM
> To: Frank Schilder; Gregory Farnum
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: ceph fs (meta) data inconsistent
>
> I just raised one tracker to follow this:
> https://tracker.ceph.com/issues/63510
>
> Thanks
>
> - Xiubo
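To make the two-host sequence from the quoted Test 1 / Test 2 concrete, a minimal sketch of the reproduction (host names and paths are placeholders; the copy runs on host1, the size check on host2, both with kclient mounts of the same CephFS):

  # on host1
  python3 -c "import shutil; shutil.copy('/mnt/kcephfs/src.bin', '/mnt/kcephfs/dst.bin')"
  ls -l /mnt/kcephfs/dst.bin
  # on host2
  ssh host2 'ls -l /mnt/kcephfs/dst.bin'    # reportedly shows size 0 when the issue triggers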
[ceph-users] Stray host/daemon
I think I ran into this before but I forget the fix:

HEALTH_WARN 1 stray host(s) with 1 daemon(s) not managed by cephadm
[WRN] CEPHADM_STRAY_HOST: 1 stray host(s) with 1 daemon(s) not managed by cephadm
    stray host cn06.ceph.fu.intra has 1 stray daemons: ['mon.cn03']

Pacific 16.2.11

How do I clear this?

Thanks
-jeremy
[ceph-users] Re: ceph osd dump_historic_ops
On Fri, Dec 01, 2023 at 04:33:20PM +0700, Phong Tran Thanh wrote:

I have a problem with my osd, i want to show dump_historic_ops of osd
I follow the guide:
https://www.ibm.com/docs/en/storage-fusion/2.6?topic=alerts-cephosdslowops
But when i run command

ceph daemon osd.8 dump_historic_ops show the error, the command run on node
with osd.8
Can't get admin socket path: unable to get conf option admin_socket for
osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid
types are: auth, mon, osd, mds, mgr, client\n"

I am running ceph cluster reef version by cephadmin install

What should I do?

The easiest is to use tell; then you can run it on any node that has access to Ceph.

ceph tell osd.8 dump_historic_ops

ceph tell osd.8 help
will give you everything you can do with tell.

--
Kai Stian Olstad
[ceph-users] Re: ceph osd dump_historic_ops
On 12/1/23 10:33, Phong Tran Thanh wrote:

ceph daemon osd.8 dump_historic_ops show the error, the command run on node
with osd.8
Can't get admin socket path: unable to get conf option admin_socket for
osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid
types are: auth, mon, osd, mds, mgr, client\n"

I am running ceph cluster reef version by cephadmin install

When the daemons run in containers managed by the cephadm orchestrator, the socket file has a different location and the command-line tool ceph (run outside the container) does not find it automatically.

You can run

# ceph daemon /var/run/ceph/$FSID/ceph-osd.$OSDID.asok dump_historic_ops

to use the socket outside the container. Or you enter the container with

# cephadm enter --name osd.$OSDID

and then execute

# ceph daemon osd.$OSDID dump_historic_ops

inside the container.

$FSID is the UUID of the Ceph cluster, $OSDID is the OSD id.

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
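A concrete run of the above for OSD id 8, with the FSID taken from "ceph fsid":

  FSID=$(ceph fsid)
  ceph daemon /var/run/ceph/$FSID/ceph-osd.8.asok dump_historic_ops

  # or, inside the container:
  cephadm enter --name osd.8
  ceph daemon osd.8 dump_historic_ops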
[ceph-users] Re: About ceph osd slow ops
On 01-12-2023 08:45, VÔ VI wrote:

Hi community,

My cluster running with 10 nodes and 2 nodes goes down, sometimes the log
shows the slow ops, what is the root cause?
My osd is HDD and block.db and wal is 500GB SSD per osd.

Health check update: 13 slow ops, oldest one blocked for 167 sec, osd.10
has slow ops (SLOW_OPS)

Most likely you have a crush rule that spreads objects over hosts as a failure domain. For size=3, min_size=2 (the default for replicated pools) you might end up in a situation where two of the nodes that are offline have PGs whose min_size=2 requirement is not fulfilled; those PGs will hence be inactive and slow ops will occur.

When host is your failure domain, you should not reboot more than one at the same time. If the hosts are somehow organized (different racks, datacenters) you could make a higher-level bucket and put your hosts there. Then create a crush rule using that bucket type as failure domain, and have your pools use that.

Gr. Stefan
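A minimal sketch of what that higher-level bucket and rule could look like, assuming rack as the bucket type; the bucket, host and pool names are placeholders, and moving hosts in the CRUSH map will trigger data movement, so plan accordingly:

  ceph osd crush add-bucket rack1 rack
  ceph osd crush move rack1 root=default
  ceph osd crush move host1 rack=rack1
  ceph osd crush rule create-replicated replicated_rack default rack
  ceph osd pool set mypool crush_rule replicated_rack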
[ceph-users] ceph osd dump_historic_ops
Hi community,

I have a problem with my OSD: I want to show dump_historic_ops of an OSD.
I followed the guide:
https://www.ibm.com/docs/en/storage-fusion/2.6?topic=alerts-cephosdslowops
But when I run the command

ceph daemon osd.8 dump_historic_ops

it shows the error below (the command is run on the node with osd.8):

Can't get admin socket path: unable to get conf option admin_socket for
osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid
types are: auth, mon, osd, mds, mgr, client\n"

I am running a Ceph cluster, Reef version, installed with cephadm.

What should I do?

Thank you.