[ceph-users] Re: Ceph image delete error - NetHandler create_socket couldn't create socket
Hi Konstantin,

Thank you for the reply. I tried setting ulimit to 32768 when I saw 25726 in the lsof output; after deleting two more disks it errored again, and lsof now shows above 35000. I'm not sure how to handle it. I rebooted the monitor node, but the open-file count keeps growing.

root@ceph-mon01 ~# lsof | wc -l
49296
root@ceph-mon01 ~#

Thanks,
Pardh

On Thu, Apr 18, 2024 at 11:36 PM Konstantin Shalygin wrote:
> Hi,
>
> Your shell seems to have reached the default file descriptor limit (1024 mostly)
> and your cluster maybe has more than 1000 OSDs.
>
> Try to set `ulimit -n 10240` before the rbd rm task
>
> k
> Sent from my iPhone
>
> > On 18 Apr 2024, at 23:50, Pardhiv Karri wrote:
> >
> > Hi,
> >
> > Trying to delete images in a Ceph pool is causing errors in one of
> > the clusters. I rebooted all the monitor nodes sequentially to see if the
> > error went away, but it still persists. What is the best way to fix this?
> > The Ceph cluster is in an OK state, with no rebalancing or scrubbing
> > happening (I did set the noscrub and nodeep-scrub flags), and there is
> > almost no load on the cluster, very little IO.
> >
> > root@ceph-mon01 ~# rbd rm 000dca3d-4f2b-4033-b8f5-95458e0c3444_disk_delete -p compute
> > Removing image: 31% complete...2024-04-18 20:42:52.525135 7f6de0c79700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
> > Removing image: 32% complete...2024-04-18 20:42:52.539882 7f6de9c7b700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.541508 7f6de947a700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.546613 7f6de0c79700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.558133 7f6de9c7b700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.573819 7f6de947a700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.589733 7f6de0c79700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
> > Removing image: 33% complete...2024-04-18 20:42:52.643489 7f6de9c7b700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.727262 7f6de0c79700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.737135 7f6de9c7b700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.743292 7f6de947a700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.746167 7f6de0c79700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.757404 7f6de9c7b700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
> > Removing image: 34% complete...2024-04-18 20:42:52.773182 7f6de947a700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.773222 7f6de947a700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.789847 7f6de0c79700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.844201 7f6de9c7b700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
> >
> > ^C
> > root@ceph-mon01 ~#
> >
> > Thanks,
> > Pardh
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io

--
*Pardhiv Karri*
"Rise and Rise again until LAMBS become LIONS"

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
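For anyone hitting the same wall: note that `lsof | wc -l` overcounts on Linux, since lsof emits one row per task/thread sharing a descriptor table, so it is not a reliable way to check against the per-process limit. A minimal sketch, counting via /proc instead (it uses the current shell's PID `$$` as a stand-in; on the affected node you would substitute the PID of the rbd or ceph-mon process, and the 32768 value is just the number from this thread):

```shell
#!/bin/sh
# Count real file descriptors for a process via /proc/<pid>/fd.
# $$ (this shell) is a placeholder; substitute the rbd/ceph-mon PID.
pid=$$
echo "soft fd limit: $(ulimit -Sn)"
echo "hard fd limit: $(ulimit -Hn)"
echo "open fds for pid $pid: $(ls /proc/"$pid"/fd | wc -l)"
# Raise the soft limit for this shell before running 'rbd rm'
# (the new value must not exceed the hard limit):
ulimit -n 32768 2>/dev/null && echo "soft fd limit now: $(ulimit -Sn)"
```

The limit set this way only applies to the current shell and its children, which is why it has to be set in the same session that runs `rbd rm`.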
[ceph-users] Ceph image delete error - NetHandler create_socket couldn't create socket
Hi,

Trying to delete images in a Ceph pool is causing errors in one of the clusters. I rebooted all the monitor nodes sequentially to see if the error went away, but it still persists. What is the best way to fix this? The Ceph cluster is in an OK state, with no rebalancing or scrubbing happening (I did set the noscrub and nodeep-scrub flags), and there is almost no load on the cluster, very little IO.

root@ceph-mon01 ~# rbd rm 000dca3d-4f2b-4033-b8f5-95458e0c3444_disk_delete -p compute
Removing image: 31% complete...2024-04-18 20:42:52.525135 7f6de0c79700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
Removing image: 32% complete...2024-04-18 20:42:52.539882 7f6de9c7b700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
2024-04-18 20:42:52.541508 7f6de947a700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
2024-04-18 20:42:52.546613 7f6de0c79700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
2024-04-18 20:42:52.558133 7f6de9c7b700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
2024-04-18 20:42:52.573819 7f6de947a700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
2024-04-18 20:42:52.589733 7f6de0c79700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
Removing image: 33% complete...2024-04-18 20:42:52.643489 7f6de9c7b700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
2024-04-18 20:42:52.727262 7f6de0c79700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
2024-04-18 20:42:52.737135 7f6de9c7b700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
2024-04-18 20:42:52.743292 7f6de947a700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
2024-04-18 20:42:52.746167 7f6de0c79700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
2024-04-18 20:42:52.757404 7f6de9c7b700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
Removing image: 34% complete...2024-04-18 20:42:52.773182 7f6de947a700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
2024-04-18 20:42:52.773222 7f6de947a700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
2024-04-18 20:42:52.789847 7f6de0c79700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
2024-04-18 20:42:52.844201 7f6de9c7b700 -1 NetHandler create_socket couldn't create socket (24) Too many open files
^C
root@ceph-mon01 ~#

Thanks,
Pardh
[ceph-users] Ceph - Error ERANGE: (34) Numerical result out of range
Hi,

Trying to move a node/host under a new SSD root, I get the error below. Has anyone seen it and know the fix? The pg_num and pgp_num are the same for all pools, so that is not the issue.

[root@hbmon1 ~]# ceph osd crush move hbssdhost1 root=ssd
Error ERANGE: (34) Numerical result out of range
[root@hbmon1 ~]#

Thanks,
Pardhiv
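Not an answer, but a checklist I would run before retrying the move, to confirm the destination root exists and see where the host bucket currently sits. This is a dry-run sketch that only prints the inspection commands (the bucket names are the ones from the message above; drop the `echo` prefixes to actually run them on a cluster):

```shell
#!/bin/sh
# Dry-run sketch: inspection commands before retrying 'ceph osd crush move'.
# Printed rather than executed, since they need a live cluster.
host=hbssdhost1
root=ssd
echo "ceph osd crush tree"                  # does root=$root exist, and where is $host now?
echo "ceph osd crush dump"                  # check for duplicate '$host' bucket entries
echo "ceph osd crush move $host root=$root" # the move itself, retried after the checks
```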
[ceph-users] init unable to update_crush_location: (34) Numerical result out of range
Hi,

Getting an error while adding a new node/OSD with bluestore OSDs to the cluster. The OSD is added without any host and stays down; trying to bring it up didn't work. The same method works in other clusters without any issue. Any idea what the problem is?

Ceph version: ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)
Ceph health: OK

2023-10-25 20:40:40.867878 7f1f478cde40 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1698266440867866, "job": 1, "event": "recovery_started", "log_files": [270]}
2023-10-25 20:40:40.867883 7f1f478cde40 4 rocksdb: [/build/ceph-U0cfoi/ceph-12.2.11/src/rocksdb/db/db_impl_open.cc:482] Recovering log #270 mode 0
2023-10-25 20:40:40.867904 7f1f478cde40 4 rocksdb: [/build/ceph-U0cfoi/ceph-12.2.11/src/rocksdb/db/version_set.cc:2395] Creating manifest 272
2023-10-25 20:40:40.869553 7f1f478cde40 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1698266440869548, "job": 1, "event": "recovery_finished"}
2023-10-25 20:40:40.870924 7f1f478cde40 4 rocksdb: [/build/ceph-U0cfoi/ceph-12.2.11/src/rocksdb/db/db_impl_open.cc:1063] DB pointer 0x55c9061ba000
2023-10-25 20:40:40.870964 7f1f478cde40 1 bluestore(/var/lib/ceph/osd/ceph-721) _open_db opened rocksdb path db options compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152
2023-10-25 20:40:40.871234 7f1f478cde40 1 freelist init
2023-10-25 20:40:40.871293 7f1f478cde40 1 bluestore(/var/lib/ceph/osd/ceph-721) _open_alloc opening allocation metadata
2023-10-25 20:40:40.871314 7f1f478cde40 1 bluestore(/var/lib/ceph/osd/ceph-721) _open_alloc loaded 3.49TiB in 1 extents
2023-10-25 20:40:40.874700 7f1f478cde40 0 /build/ceph-U0cfoi/ceph-12.2.11/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs
2023-10-25 20:40:40.874721 7f1f478cde40 0 _get_class not permitted to load sdk
2023-10-25 20:40:40.874955 7f1f478cde40 0 _get_class not permitted to load kvs
2023-10-25 20:40:40.875638 7f1f478cde40 0 _get_class not permitted to load lua
2023-10-25 20:40:40.875724 7f1f478cde40 0 /build/ceph-U0cfoi/ceph-12.2.11/src/cls/hello/cls_hello.cc:296: loading cls_hello
2023-10-25 20:40:40.875776 7f1f478cde40 0 osd.721 0 crush map has features 288232575208783872, adjusting msgr requires for clients
2023-10-25 20:40:40.875780 7f1f478cde40 0 osd.721 0 crush map has features 288232575208783872 was 8705, adjusting msgr requires for mons
2023-10-25 20:40:40.875784 7f1f478cde40 0 osd.721 0 crush map has features 288232575208783872, adjusting msgr requires for osds
2023-10-25 20:40:40.875837 7f1f478cde40 0 osd.721 0 load_pgs
2023-10-25 20:40:40.875840 7f1f478cde40 0 osd.721 0 load_pgs opened 0 pgs
2023-10-25 20:40:40.875844 7f1f478cde40 0 osd.721 0 using weightedpriority op queue with priority op cut off at 64.
2023-10-25 20:40:40.877401 7f1f478cde40 -1 osd.721 0 log_to_monitors {default=true}
2023-10-25 20:40:40.888408 7f1f478cde40 -1 osd.721 0 mon_cmd_maybe_osd_create fail: '(34) Numerical result out of range': (34) Numerical result out of range
2023-10-25 20:40:40.891367 7f1f478cde40 -1 osd.721 0 mon_cmd_maybe_osd_create fail: '(34) Numerical result out of range': (34) Numerical result out of range
2023-10-25 20:40:40.891409 7f1f478cde40 -1 osd.721 0 init unable to update_crush_location: (34) Numerical result out of range

Thanks,
Pardhiv
[ceph-users] Copying and renaming pools
Hi,

Our Ceph is used as backend storage for OpenStack. We use the "images" pool for Glance and the "compute" pool for instances. We need to migrate our images pool from HDD drives to SSD drives. I copied all the data from the "images" pool (on HDD disks) to an "ssdimages" pool (on SSD disks) and made sure the crush rules are all good. I used "rbd deep copy" to migrate all the objects. Then I renamed the pools: "images" to "hddimages" and "ssdimages" to "images".

Our OpenStack instances are in the "compute" pool. All the instances that were created from an image show the parent as an image in the "images" pool. I thought renaming would make them point to the new pool on SSD disks, now named "images", but interestingly the rbd info of all the instances now points to the parent "hddimages". How can I make sure the parent pointers stay as "images" instead of being changed to "hddimages"?

Before renaming the pools:

lab [root@ctl01 /]# rbd info compute/e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk
rbd image 'e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk':
    size 100GiB in 12800 objects
    order 23 (8MiB objects)
    block_name_prefix: rbd_data.8f51c347398c89
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
    flags:
    create_timestamp: Tue Mar 15 21:36:55 2022
    parent: images/909e6734-6f84-466a-b2fa-487b73a1f50a@snap
    overlap: 10GiB
lab [root@ctl01 /]#

After renaming the pools, the parent value automatically gets modified:

lab [root@ctl01 /]# rbd info compute/e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk
rbd image 'e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk':
    size 100GiB in 12800 objects
    order 23 (8MiB objects)
    block_name_prefix: rbd_data.8f51c347398c89
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
    flags:
    create_timestamp: Tue Mar 15 21:36:55 2022
    parent: hddimages/909e6734-6f84-466a-b2fa-487b73a1f50a@snap
    overlap: 10GiB
lab [root@ctl01 /]#

Thanks,
Pardhiv
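Follow-up note in case it helps others: the clone's parent pointer identifies the parent's pool itself (not the pool name at clone time), so it follows the pool through a rename — which is exactly the behavior observed above. One way out, hedged and untested here, is to flatten each clone with `rbd flatten` so it stops referencing a parent at all, after which the old pool can be retired. A dry-run sketch that only prints the commands it would run; the child image name is the one from the output above, and on a real cluster the list would come from `rbd children`:

```shell
#!/bin/sh
# Dry-run sketch: flatten every clone of a parent snapshot so the clones
# no longer reference the old (renamed) pool. On a real cluster, populate
# 'children' with:  children=$(rbd children hddimages/<image>@snap)
# and remove the 'echo' to actually execute the flattens.
children="compute/e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk"
for child in $children; do
  echo "rbd flatten $child"
done
```

Flattening copies the parent's data into each child, so it costs space and I/O; the alternative is to re-clone the instances from snapshots in the new pool.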
[ceph-users] Re: Luminous to Pacific Upgrade with Filestore OSDs
Ok, thanks!
--Pardhiv

On Fri, Jun 10, 2022 at 2:46 AM Eneko Lacunza wrote:
> Hi Pardhiv,
>
> I don't recall anything unusual, just follow the upgrade procedures outlined
> in each release.
>
> Cheers
>
> El 9/6/22 a las 20:08, Pardhiv Karri escribió:
>
> Awesome, thank you, Eneko!
>
> Would you mind sharing the upgrade run book, if you have one? I want to
> avoid reinventing the wheel, as there will be some caveats while upgrading
> that aren't usually present in the official Ceph upgrade docs.
>
> Thanks,
> Pardhiv
>
> On Thu, Jun 9, 2022 at 12:40 AM Eneko Lacunza wrote:
>> Hi Pardhiv,
>>
>> We have a running production Pacific cluster with some filestore OSDs
>> (and other Bluestore OSDs too). This cluster was installed "some" years ago
>> with Firefly... :)
>>
>> No issues related to filestore so far.
>>
>> Cheers
>>
>> El 8/6/22 a las 21:32, Pardhiv Karri escribió:
>>
>> Hi,
>>
>> We are planning to upgrade our current Ceph from Luminous (12.2.11) to
>> Nautilus and then to Pacific. We are using Filestore for OSDs now. Is it
>> okay to upgrade with filestore OSDs? We plan to migrate from filestore to
>> Bluestore at a later date, as the clusters are pretty large (PBs in size),
>> and we understand that any new or failed OSDs will have to be added as
>> Bluestore OSDs post-upgrade. Will that work?
>>
>> Thanks,
>> Pardhiv
>
> Eneko Lacunza
> Zuzendari teknikoa | Director técnico
> Binovo IT Human Project
> Tel. +34 943 569 206 | https://www.binovo.es
> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
> https://www.youtube.com/user/CANALBINOVO
> https://www.linkedin.com/company/37269706/

--
*Pardhiv Karri*
"Rise and Rise again until LAMBS become LIONS"
[ceph-users] Ceph pool set min_write_recency_for_promote not working
Hi,

I created a new pool called "ssdimages," similar to another, much older pool called "images." But when I try to set min_write_recency_for_promote to 1, it fails with permission denied. Do you know how I can fix it?

ceph-lab # ceph osd dump | grep -E 'images|ssdimages'
pool 3 'images' replicated size 3 min_size 1 crush_rule 0 object_hash rjenkins pg_num 512 pgp_num 512 last_change 74894 flags hashpspool min_write_recency_for_promote 1 stripe_width 0 application rbd
pool 25 'ssdimages' replicated size 3 min_size 1 crush_rule 1 object_hash rjenkins pg_num 512 pgp_num 512 last_change 78217 flags hashpspool stripe_width 0 application rbd
ceph-lab #
ceph-lab # ceph osd pool set ssdimages min_write_recency_for_promote 1
Error EACCES: (13) Permission denied
ceph-lab #

Thanks,
Pardhiv
[ceph-users] Re: Luminous to Pacific Upgrade with Filestore OSDs
Awesome, thank you, Eneko!

Would you mind sharing the upgrade run book, if you have one? I want to avoid reinventing the wheel, as there will be some caveats while upgrading that aren't usually present in the official Ceph upgrade docs.

Thanks,
Pardhiv

On Thu, Jun 9, 2022 at 12:40 AM Eneko Lacunza wrote:
> Hi Pardhiv,
>
> We have a running production Pacific cluster with some filestore OSDs (and
> other Bluestore OSDs too). This cluster was installed "some" years ago with
> Firefly... :)
>
> No issues related to filestore so far.
>
> Cheers
>
> El 8/6/22 a las 21:32, Pardhiv Karri escribió:
>
> Hi,
>
> We are planning to upgrade our current Ceph from Luminous (12.2.11) to
> Nautilus and then to Pacific. We are using Filestore for OSDs now. Is it
> okay to upgrade with filestore OSDs? We plan to migrate from filestore to
> Bluestore at a later date, as the clusters are pretty large (PBs in size),
> and we understand that any new or failed OSDs will have to be added as
> Bluestore OSDs post-upgrade. Will that work?
>
> Thanks,
> Pardhiv
>
> Eneko Lacunza
> Zuzendari teknikoa | Director técnico
> Binovo IT Human Project
> Tel. +34 943 569 206 | https://www.binovo.es
> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun

--
*Pardhiv Karri*
"Rise and Rise again until LAMBS become LIONS"
[ceph-users] Luminous to Pacific Upgrade with Filestore OSDs
Hi,

We are planning to upgrade our current Ceph from Luminous (12.2.11) to Nautilus and then to Pacific. We are using Filestore for OSDs now. Is it okay to upgrade with filestore OSDs? We plan to migrate from filestore to Bluestore at a later date, as the clusters are pretty large (PBs in size), and we understand that any new or failed OSDs will have to be added as Bluestore OSDs post-upgrade. Will that work?

Thanks,
Pardhiv
[ceph-users] Ceph RBD pool copy?
Hi,

We have a Ceph cluster integrated with OpenStack. We are thinking about migrating the glance (images) pool to a new pool with better SSD disks. I see there is a "rados cppool" command. Will that work with snapshots in this rbd pool?

--
*Pardhiv*
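For what it's worth: "rados cppool" does a flat object-by-object copy and, as far as I know, does not carry over RBD's self-managed snapshots, so a per-image "rbd deep cp" (which does copy snapshots, available in Mimic and later if I recall correctly) is usually suggested instead. A dry-run sketch that just prints the per-image commands — the pool names match this thread, but the image IDs are made-up stand-ins for what `rbd ls` would return:

```shell
#!/bin/sh
# Dry-run sketch: deep-copy every image (snapshots included) from the old
# glance pool to the new SSD-backed pool. On a real cluster you would use:
#   images=$(rbd ls "$src")
# and drop the 'echo' to actually run the copies.
src=images
dst=ssdimages
images="909e6734-6f84-466a-b2fa-487b73a1f50a second-image-id"
for img in $images; do
  echo "rbd deep cp $src/$img $dst/$img"
done
```

Glance would still need to be quiesced during the copy so images don't change underneath it, and clone parent relationships (if any) need separate handling.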
[ceph-users] Unable to login to Ceph Pacific Dashboard
Hi,

I installed Ceph Pacific on one monitor node using the cephadm tool. The installation output gave me the credentials. When I go to a browser (on a different machine from the Ceph server), I see the login screen, but when I enter the credentials the browser reloads the same page; for a fraction of a second I can see it asking me to enter a new password. So I went into the CLI and changed the password, but trying to log in with the new password still gets stuck at the login screen. I opened ports 8443 and 8080, and tried creating another user with credentials, but still no luck. What am I missing?

Thanks,
Pardhiv
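In case someone finds this later: the redirect back to the login page is consistent with the forced password change on first login. The password can also be rotated from the CLI; in Pacific, `ceph dashboard ac-user-set-password` reads the new password from a file via `-i` rather than taking it on the command line. A dry-run sketch of the sequence (the username, password, and file path are placeholders, and the `echo` prefixes keep the ceph commands from executing here):

```shell
#!/bin/sh
# Dry-run sketch: reset a dashboard user's password from the CLI.
# In Pacific the password is read from a file (-i), not the command line.
user=admin
pwfile=$(mktemp)
printf 'NewStr0ngPassw0rd\n' > "$pwfile"
echo "ceph dashboard ac-user-set-password $user -i $pwfile"
# Restarting the dashboard module is sometimes suggested for a stuck login page:
echo "ceph mgr module disable dashboard"
echo "ceph mgr module enable dashboard"
rm -f "$pwfile"
```

Also worth checking the browser's developer console for failed requests: a stale session cookie or a proxy stripping the redirect can produce the same silent loop.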
[ceph-users] Re: Unable to track different ceph client version connections
This command is awesome, thank you!
--Pardhiv

On Fri, Jan 24, 2020 at 1:55 AM Konstantin Shalygin wrote:
> We upgraded our Ceph cluster from Hammer to Luminous and it is running
> fine. Post-upgrade we live-migrated all our OpenStack instances (not 100%
> sure). Currently we see 1658 clients still on the Hammer version. To track
> the clients, we increased debugging (debug_mon=10/10, debug_ms=1/5,
> debug_monc=5/20) on all three monitors and looked at all three monitor logs
> at /var/log/ceph/mon..log, grepping for hammer and 0x81dff8eeacfffb, but we
> are not seeing anything in the logs even after hours of waiting.
>
> Earlier, in our other clusters, the logs used to show which OpenStack
> compute node a client was originating from. Am I missing something, or do I
> need to add more logging or check a different log on the three ceph monitor
> nodes?
>
> Look at your clients' mon sessions:
>
> `ceph daemon /var/run/ceph/ceph-mon.ceph-mon0.asok sessions | grep hammer
> | awk '{print $2}'`
>
> k

--
*Pardhiv Karri*
"Rise and Rise again until LAMBS become LIONS"