[ceph-users] Re: Ceph image delete error - NetHandler create_socket couldnt create socket

2024-04-19 Thread Pardhiv Karri
Hi Konstantin,
Thank you for the reply. I set ulimit to 32768 after seeing 25726 open files
in the lsof output, but after deleting two more disks the error came back,
and lsof now shows more than 35000 open files. I'm not sure how to handle
this. I rebooted the monitor node, but the number of open files keeps growing.

root@ceph-mon01 ~# lsof | wc -l
49296
root@ceph-mon01 ~#
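For reference, raising the limit only helps if it is done in the same shell
that then runs rbd rm; a minimal sketch (the 65536 value is just an
illustration, and the soft limit cannot exceed the hard limit):

# check the soft and hard limits of the current shell
ulimit -Sn
ulimit -Hn

# raise the soft limit for this shell only, then run the delete from the same shell
ulimit -n 65536
rbd rm 000dca3d-4f2b-4033-b8f5-95458e0c3444_disk_delete -p compute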

Thanks,
Pardh

On Thu, Apr 18, 2024 at 11:36 PM Konstantin Shalygin  wrote:

> Hi,
>
> Your shell seems to have reached the default file descriptor limit (usually
> 1024), and your cluster may have more than 1000 OSDs
>
> Try running `ulimit -n 10240` before the rbd rm task
>
>
> k
> Sent from my iPhone
>
> > On 18 Apr 2024, at 23:50, Pardhiv Karri  wrote:
> >
> > Hi,
> >
> > Trying to delete images in a Ceph pool is causing errors in one of
> > the clusters. I rebooted all the monitor nodes sequentially to see if the
> > error went away, but it still persists. What is the best way to fix this?
> > The Ceph cluster is in an OK state, with no rebalancing or scrubbing
> > happening (I did set the noscrub and nodeep-scrub flags), and there is
> > almost no load on the cluster, with very little I/O.
> >
> > root@ceph-mon01 ~# rbd rm
> 000dca3d-4f2b-4033-b8f5-95458e0c3444_disk_delete
> > -p compute
> > Removing image: 31% complete...2024-04-18 20:42:52.525135 7f6de0c79700 -1
> > NetHandler create_socket couldn't create socket (24) Too many open files
> > Removing image: 32% complete...2024-04-18 20:42:52.539882 7f6de9c7b700 -1
> > NetHandler create_socket couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.541508 7f6de947a700 -1 NetHandler create_socket
> > couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.546613 7f6de0c79700 -1 NetHandler create_socket
> > couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.558133 7f6de9c7b700 -1 NetHandler create_socket
> > couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.573819 7f6de947a700 -1 NetHandler create_socket
> > couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.589733 7f6de0c79700 -1 NetHandler create_socket
> > couldn't create socket (24) Too many open files
> > Removing image: 33% complete...2024-04-18 20:42:52.643489 7f6de9c7b700 -1
> > NetHandler create_socket couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.727262 7f6de0c79700 -1 NetHandler create_socket
> > couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.737135 7f6de9c7b700 -1 NetHandler create_socket
> > couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.743292 7f6de947a700 -1 NetHandler create_socket
> > couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.746167 7f6de0c79700 -1 NetHandler create_socket
> > couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.757404 7f6de9c7b700 -1 NetHandler create_socket
> > couldn't create socket (24) Too many open files
> > Removing image: 34% complete...2024-04-18 20:42:52.773182 7f6de947a700 -1
> > NetHandler create_socket couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.773222 7f6de947a700 -1 NetHandler create_socket
> > couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.789847 7f6de0c79700 -1 NetHandler create_socket
> > couldn't create socket (24) Too many open files
> > 2024-04-18 20:42:52.844201 7f6de9c7b700 -1 NetHandler create_socket
> > couldn't create socket (24) Too many open files
> >
> > ^C
> > root@ceph-mon01 ~#
> >
> >
> > Thanks,
> > Pardh
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 
*Pardhiv Karri*
"Rise and Rise again until LAMBS become LIONS"
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph image delete error - NetHandler create_socket couldnt create socket

2024-04-18 Thread Pardhiv Karri
Hi,

Trying to delete images in a Ceph pool is causing errors in one of
the clusters. I rebooted all the monitor nodes sequentially to see if the
error went away, but it still persists. What is the best way to fix this?
The Ceph cluster is in an OK state, with no rebalancing or scrubbing
happening (I did set the noscrub and nodeep-scrub flags), and there is
almost no load on the cluster, with very little I/O.

root@ceph-mon01 ~# rbd rm 000dca3d-4f2b-4033-b8f5-95458e0c3444_disk_delete
-p compute
Removing image: 31% complete...2024-04-18 20:42:52.525135 7f6de0c79700 -1
NetHandler create_socket couldn't create socket (24) Too many open files
Removing image: 32% complete...2024-04-18 20:42:52.539882 7f6de9c7b700 -1
NetHandler create_socket couldn't create socket (24) Too many open files
2024-04-18 20:42:52.541508 7f6de947a700 -1 NetHandler create_socket
couldn't create socket (24) Too many open files
2024-04-18 20:42:52.546613 7f6de0c79700 -1 NetHandler create_socket
couldn't create socket (24) Too many open files
2024-04-18 20:42:52.558133 7f6de9c7b700 -1 NetHandler create_socket
couldn't create socket (24) Too many open files
2024-04-18 20:42:52.573819 7f6de947a700 -1 NetHandler create_socket
couldn't create socket (24) Too many open files
2024-04-18 20:42:52.589733 7f6de0c79700 -1 NetHandler create_socket
couldn't create socket (24) Too many open files
Removing image: 33% complete...2024-04-18 20:42:52.643489 7f6de9c7b700 -1
NetHandler create_socket couldn't create socket (24) Too many open files
2024-04-18 20:42:52.727262 7f6de0c79700 -1 NetHandler create_socket
couldn't create socket (24) Too many open files
2024-04-18 20:42:52.737135 7f6de9c7b700 -1 NetHandler create_socket
couldn't create socket (24) Too many open files
2024-04-18 20:42:52.743292 7f6de947a700 -1 NetHandler create_socket
couldn't create socket (24) Too many open files
2024-04-18 20:42:52.746167 7f6de0c79700 -1 NetHandler create_socket
couldn't create socket (24) Too many open files
2024-04-18 20:42:52.757404 7f6de9c7b700 -1 NetHandler create_socket
couldn't create socket (24) Too many open files
Removing image: 34% complete...2024-04-18 20:42:52.773182 7f6de947a700 -1
NetHandler create_socket couldn't create socket (24) Too many open files
2024-04-18 20:42:52.773222 7f6de947a700 -1 NetHandler create_socket
couldn't create socket (24) Too many open files
2024-04-18 20:42:52.789847 7f6de0c79700 -1 NetHandler create_socket
couldn't create socket (24) Too many open files
2024-04-18 20:42:52.844201 7f6de9c7b700 -1 NetHandler create_socket
couldn't create socket (24) Too many open files

^C
root@ceph-mon01 ~#
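A quick way to see which process is actually holding the descriptors (a rough
sketch; the per-PID count points at the biggest consumers, and the limits file
shows what applies to that PID, with <PID> being a placeholder):

# count open file descriptors per process, largest consumers first
lsof | awk '{print $2, $1}' | sort | uniq -c | sort -rn | head

# show the "Max open files" limit that applies to a given PID
grep "open files" /proc/<PID>/limits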


Thanks,
Pardh
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph - Error ERANGE: (34) Numerical result out of range

2023-10-26 Thread Pardhiv Karri
Hi,
Trying to move a node/host under a new SSD root, I am getting the error
below. Has anyone seen it and does anyone know the fix? The pg_num and
pgp_num are the same for all pools, so that is not the issue.

 [root@hbmon1 ~]# ceph osd crush move hbssdhost1 root=ssd
Error ERANGE: (34) Numerical result out of range
 [root@hbmon1 ~]#
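In case it helps to narrow this down, a minimal sketch for inspecting the
CRUSH hierarchy before the move (this assumes the ssd root bucket was
created beforehand, and the /tmp paths are illustrative):

# confirm the destination root bucket exists and check current bucket weights
ceph osd crush tree

# decompile the CRUSH map to inspect bucket ids, weights, and rules by hand
ceph osd getcrushmap -o /tmp/crushmap.bin
crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt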

Thanks,
Pardhiv
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] init unable to update_crush_location: (34) Numerical result out of range

2023-10-25 Thread Pardhiv Karri
Hi,

I am getting an error while adding a new node with Bluestore OSDs to the
cluster. The OSD gets added without any host and stays down; trying to bring
it up didn't work. The same procedure works in other clusters without any
issue. Any idea what the problem is?

Ceph Version: ceph version 12.2.11
(26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable)
Ceph Health: OK

2023-10-25 20:40:40.867878 7f1f478cde40  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1698266440867866, "job": 1, "event": "recovery_started",
"log_files": [270]}
2023-10-25 20:40:40.867883 7f1f478cde40  4 rocksdb:
[/build/ceph-U0cfoi/ceph-12.2.11/src/rocksdb/db/db_impl_open.cc:482]
Recovering log #270 mode 0
2023-10-25 20:40:40.867904 7f1f478cde40  4 rocksdb:
[/build/ceph-U0cfoi/ceph-12.2.11/src/rocksdb/db/version_set.cc:2395]
Creating manifest 272

2023-10-25 20:40:40.869553 7f1f478cde40  4 rocksdb: EVENT_LOG_v1
{"time_micros": 1698266440869548, "job": 1, "event": "recovery_finished"}
2023-10-25 20:40:40.870924 7f1f478cde40  4 rocksdb:
[/build/ceph-U0cfoi/ceph-12.2.11/src/rocksdb/db/db_impl_open.cc:1063] DB
pointer 0x55c9061ba000
2023-10-25 20:40:40.870964 7f1f478cde40  1
bluestore(/var/lib/ceph/osd/ceph-721) _open_db opened rocksdb path db
options
compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152
2023-10-25 20:40:40.871234 7f1f478cde40  1 freelist init
2023-10-25 20:40:40.871293 7f1f478cde40  1
bluestore(/var/lib/ceph/osd/ceph-721) _open_alloc opening allocation
metadata
2023-10-25 20:40:40.871314 7f1f478cde40  1
bluestore(/var/lib/ceph/osd/ceph-721) _open_alloc loaded 3.49TiB in 1
extents
2023-10-25 20:40:40.874700 7f1f478cde40  0 
/build/ceph-U0cfoi/ceph-12.2.11/src/cls/cephfs/cls_cephfs.cc:197: loading
cephfs
2023-10-25 20:40:40.874721 7f1f478cde40  0 _get_class not permitted to load
sdk
2023-10-25 20:40:40.874955 7f1f478cde40  0 _get_class not permitted to load
kvs
2023-10-25 20:40:40.875638 7f1f478cde40  0 _get_class not permitted to load
lua
2023-10-25 20:40:40.875724 7f1f478cde40  0 
/build/ceph-U0cfoi/ceph-12.2.11/src/cls/hello/cls_hello.cc:296: loading
cls_hello
2023-10-25 20:40:40.875776 7f1f478cde40  0 osd.721 0 crush map has features
288232575208783872, adjusting msgr requires for clients
2023-10-25 20:40:40.875780 7f1f478cde40  0 osd.721 0 crush map has features
288232575208783872 was 8705, adjusting msgr requires for mons
2023-10-25 20:40:40.875784 7f1f478cde40  0 osd.721 0 crush map has features
288232575208783872, adjusting msgr requires for osds
2023-10-25 20:40:40.875837 7f1f478cde40  0 osd.721 0 load_pgs
2023-10-25 20:40:40.875840 7f1f478cde40  0 osd.721 0 load_pgs opened 0 pgs
2023-10-25 20:40:40.875844 7f1f478cde40  0 osd.721 0 using weightedpriority
op queue with priority op cut off at 64.
2023-10-25 20:40:40.877401 7f1f478cde40 -1 osd.721 0 log_to_monitors
{default=true}
2023-10-25 20:40:40.888408 7f1f478cde40 -1 osd.721 0
mon_cmd_maybe_osd_create fail: '(34) Numerical result out of range': (34)
Numerical result out of range
2023-10-25 20:40:40.891367 7f1f478cde40 -1 osd.721 0
mon_cmd_maybe_osd_create fail: '(34) Numerical result out of range': (34)
Numerical result out of range
2023-10-25 20:40:40.891409 7f1f478cde40 -1 osd.721 0 init unable to
update_crush_location: (34) Numerical result out of range
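One way to get more detail is to reproduce by hand the CRUSH registration the
OSD attempts at startup; a rough sketch, assuming the default location
behaviour (no custom crush location hook), with the host and root names as
placeholders:

# check whether ceph.conf pins a crush location or a custom location hook
grep -iE 'crush[ _]location' /etc/ceph/ceph.conf

# osd.721 and the 3.49 TiB weight come from the log above; host/root are placeholders
ceph osd crush create-or-move osd.721 3.49 host=<new-host> root=default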

Thanks,
Pardhiv
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Copying and renaming pools

2022-06-13 Thread Pardhiv Karri
Hi,

Our Ceph is used as backend storage for Openstack. We use the "images" pool
for glance and the "compute" pool for instances. We need to migrate our
images pool which is on HDD drives to SSD drives.

I copied all the data from the "images" pool that is on HDD disks to an
"ssdimages" pool that is on SSD disks, made sure the crush rules are all
good. I used "rbd deep copy" to migrate all the objects. Then I renamed the
pools, "images" to "hddimages" and "ssdimages" to "images".

Our OpenStack instances are in the "compute" pool. All the instances that
were created from an image show the parent as an image in the "images"
pool. I expected that after the rename the parents would point to the new
SSD-backed pool now named "images", but interestingly rbd info on all the
instances now shows the parent as "hddimages". How can I make sure the
parent pointers stay on "images" instead of changing to "hddimages"?

Before renaming pools:

lab [root@ctl01 /]# rbd info
compute/e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk
rbd image 'e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk':
size 100GiB in 12800 objects
order 23 (8MiB objects)
block_name_prefix: rbd_data.8f51c347398c89
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Tue Mar 15 21:36:55 2022
parent: images/909e6734-6f84-466a-b2fa-487b73a1f50a@snap
overlap: 10GiB
lab [root@ctl01 /]#



After renaming the pools, the parent value automatically gets modified:
lab [root@ctl01 /]# rbd info
compute/e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk
rbd image 'e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk':
size 100GiB in 12800 objects
order 23 (8MiB objects)
block_name_prefix: rbd_data.8f51c347398c89
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
flags:
create_timestamp: Tue Mar 15 21:36:55 2022
parent: hddimages/909e6734-6f84-466a-b2fa-487b73a1f50a@snap
overlap: 10GiB
lab [root@ctl01 /]#
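As far as I know, the parent reference is stored by pool id rather than pool
name, so it simply follows whatever the old pool is now called. If keeping
the clones attached to the HDD parent is not required, one option is to
flatten them so the reference goes away entirely; a minimal sketch using the
names from above (flattening copies the parent data into each clone and needs
space roughly equal to the overlap):

# list the clones that still reference the old parent snapshot
rbd children hddimages/909e6734-6f84-466a-b2fa-487b73a1f50a@snap

# detach one clone from its parent (repeat per image)
rbd flatten compute/e669fe16-dd2a-4a17-a2c3-c7f5428d781f_disk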


Thanks,
Pardhiv
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Luminous to Pacific Upgrade with Filestore OSDs

2022-06-10 Thread Pardhiv Karri
Ok, thanks!

--Pardhiv


On Fri, Jun 10, 2022 at 2:46 AM Eneko Lacunza  wrote:

> Hi Pardhiv,
>
> I don't recall anything unusual, just follow upgrade procedures outlined
> in each release.
>
> Cheers
>
> El 9/6/22 a las 20:08, Pardhiv Karri escribió:
>
> Awesome, thank you, Eneko!
>
> Would you mind sharing the upgrade run book, if you have one? I want to
> avoid reinventing the wheel, as there will be some caveats while upgrading
> that aren't usually covered in the official Ceph upgrade docs.
>
> Thanks,
> Pardhiv
>
> On Thu, Jun 9, 2022 at 12:40 AM Eneko Lacunza  wrote:
>
>> Hi Pardhiv,
>>
>> We have a running production Pacific cluster with some filestore OSDs
>> (and other Bluestore OSDs too). This cluster was installed "some" years ago
>> with Firefly... :)
>>
>> No issues related to filestore so far.
>>
>> Cheers
>>
>> El 8/6/22 a las 21:32, Pardhiv Karri escribió:
>>
>> Hi,
>>
>> We are planning to upgrade our current Ceph from Luminous (12.2.11) to
>> Nautilus and then to Pacific. We are using Filestore for OSDs now. Is it
>> okay to upgrade with filestore OSDs? We plan to migrate from filestore to
>> Bluestore at a later date as the clusters are pretty large in PBs size and
>> understand that any new or failed OSDs will have to be added as Bluestore
>> OSDs only post-upgrade. Will that work?
>>
>> Thanks,
>> Pardhiv
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
>> Eneko Lacunza
>> Zuzendari teknikoa | Director técnico
>> Binovo IT Human Project
>>
>> Tel. +34 943 569 206 | https://www.binovo.es
>> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
>> https://www.youtube.com/user/CANALBINOVO
>> https://www.linkedin.com/company/37269706/
>>
>> Eneko Lacunza
>
> Director Técnico | Zuzendari teknikoa
>
> Binovo IT Human Project
> 943 569 206
> elacu...@binovo.es
> binovo.es 
> Astigarragako Bidea, 2 - 2 izda. Oficina 10-11, 20180 Oiartzun
> https://www.youtube.com/user/CANALBINOVO/
> https://www.linkedin.com/company/37269706/
>
> --
*Pardhiv Karri*
"Rise and Rise again until LAMBS become LIONS"
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph pool set min_write_recency_for_promote not working

2022-06-10 Thread Pardhiv Karri
Hi,

I created a new pool called "ssdimages," which is similar to another pool
called "images" (a very old one). But when I try to
set min_write_recency_for_promote to 1, it fails with permission denied. Do
you know how I can fix it?

ceph-lab # ceph osd dump | grep -E 'images|ssdimages'
pool 3 'images' replicated size 3 min_size 1 crush_rule 0 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 74894 flags hashpspool
min_write_recency_for_promote 1 stripe_width 0 application rbd
pool 25 'ssdimages' replicated size 3 min_size 1 crush_rule 1 object_hash
rjenkins pg_num 512 pgp_num 512 last_change 78217 flags hashpspool
stripe_width 0 application rbd
ceph-lab #


ceph-lab # ceph osd pool set ssdimages min_write_recency_for_promote 1
Error EACCES: (13) Permission denied
ceph-lab #
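One thing worth ruling out is the capabilities of the client key the CLI is
using, since EACCES on a mon command is often a caps issue rather than
something about the pool itself; a minimal check, assuming the default
client.admin key is in use:

# show the mon/osd caps of the key in use
ceph auth get client.admin

# compare the current value on both pools
ceph osd pool get images min_write_recency_for_promote
ceph osd pool get ssdimages min_write_recency_for_promote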

Thanks,
Pardhiv
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Luminous to Pacific Upgrade with Filestore OSDs

2022-06-09 Thread Pardhiv Karri
Awesome, thank you, Eneko!

Would you mind sharing the upgrade run book, if you have one? I want to avoid
reinventing the wheel, as there will be some caveats while upgrading that
aren't usually covered in the official Ceph upgrade docs.

Thanks,
Pardhiv

On Thu, Jun 9, 2022 at 12:40 AM Eneko Lacunza  wrote:

> Hi Pardhiv,
>
> We have a running production Pacific cluster with some filestore OSDs (and
> other Bluestore OSDs too). This cluster was installed "some" years ago with
> Firefly... :)
>
> No issues related to filestore so far.
>
> Cheers
>
> El 8/6/22 a las 21:32, Pardhiv Karri escribió:
>
> Hi,
>
> We are planning to upgrade our current Ceph from Luminous (12.2.11) to
> Nautilus and then to Pacific. We are using Filestore for OSDs now. Is it
> okay to upgrade with filestore OSDs? We plan to migrate from filestore to
> Bluestore at a later date as the clusters are pretty large in PBs size and
> understand that any new or failed OSDs will have to be added as Bluestore
> OSDs only post-upgrade. Will that work?
>
> Thanks,
> Pardhiv
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> Eneko Lacunza
> Zuzendari teknikoa | Director técnico
> Binovo IT Human Project
>
> Tel. +34 943 569 206 | https://www.binovo.es
> Astigarragako Bidea, 2 - 2º izda. Oficina 10-11, 20180 Oiartzun
> https://www.youtube.com/user/CANALBINOVO
> https://www.linkedin.com/company/37269706/
>
> --
*Pardhiv Karri*
"Rise and Rise again until LAMBS become LIONS"
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Luminous to Pacific Upgrade with Filestore OSDs

2022-06-08 Thread Pardhiv Karri
Hi,

We are planning to upgrade our current Ceph from Luminous (12.2.11) to
Nautilus and then to Pacific. We are using Filestore for OSDs now. Is it
okay to upgrade with filestore OSDs? We plan to migrate from filestore to
Bluestore at a later date as the clusters are pretty large in PBs size and
understand that any new or failed OSDs will have to be added as Bluestore
OSDs only post-upgrade. Will that work?
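For tracking the later migration, a rough sketch of how to see which backend
each OSD is on (this assumes the osd_objectstore field reported by ceph osd
metadata on Luminous and later):

# count OSDs per object store backend across the whole cluster
ceph osd metadata | grep '"osd_objectstore"' | sort | uniq -c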

Thanks,
Pardhiv
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph RBD pool copy?

2022-05-19 Thread Pardhiv Karri
Hi,

We have a Ceph cluster integrated with OpenStack. We are thinking about
migrating the glance (images) pool to a new pool with better SSD disks. I
see there is a "rados cppool" command. Will that work with the snapshots in
this rbd pool?
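As far as I understand, rados cppool copies objects at the RADOS level and is
not snapshot-aware, so for an RBD pool a per-image copy that carries the
snapshots over is usually safer. A minimal sketch, assuming the destination
pool (here called ssdimages) already exists and the client is new enough to
provide rbd deep cp:

# copy every image, including its snapshots, into the new pool
for img in $(rbd ls -p images); do
    rbd deep cp images/$img ssdimages/$img
done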

-- 
*Pardhiv*
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Unable to login to Ceph Pacific Dashboard

2022-01-19 Thread Pardhiv Karri
Hi,

I installed Ceph Pacific on one monitor node using the cephadm tool. The
installation output gave me the dashboard credentials. When I open the
dashboard in a browser (on a machine other than the Ceph server), I see the
login screen, but entering the credentials just reloads the same page; for a
fraction of a second it asks me to set a new password. I changed the password
from the CLI, but logging in with the new password still gets stuck at the
login screen. I have opened ports 8443 and 8080 and also tried creating
another user, with no luck. What am I missing?
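In case it helps, a minimal sketch of confirming the dashboard URL and
resetting the password from the CLI (assuming the default admin user created
at bootstrap; Pacific expects the new password to be read from a file, and
the password value here is just an example):

# confirm which URL the active mgr serves the dashboard on
ceph mgr services

# reset the dashboard password from a file, then remove the file
echo -n 'N3wStr0ngPassw0rd!' > /tmp/dashpass
ceph dashboard ac-user-set-password admin -i /tmp/dashpass
rm /tmp/dashpass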

Thanks,
Pardhiv
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to track different ceph client version connections

2020-01-24 Thread Pardhiv Karri
This command is awesome, thank you!

--Pardhiv


On Fri, Jan 24, 2020 at 1:55 AM Konstantin Shalygin  wrote:

> We upgraded our Ceph cluster from Hammer to Luminous and it is running
> fine. Post upgrade we live migrated all our OpenStack instances (not 100%
> sure). Currently we see 1658 clients still on the Hammer version. To track
> the clients we raised the debug levels to debug_mon=10/10, debug_ms=1/5,
> and debug_monc=5/20 on all three monitors, and we are watching all three
> monitor logs at /var/log/ceph/mon..log, grepping for hammer and
> 0x81dff8eeacfffb, but nothing shows up even after hours of waiting.
>
> In our other clusters the logs used to show which OpenStack compute node a
> connection originated from. Am I missing something, do I need to add more
> logging, or do I need to check a different log on all three Ceph monitor
> nodes?
>
> Look at your clients in the mon sessions:
>
> `ceph daemon /var/run/ceph/ceph-mon.ceph-mon0.asok sessions | grep hammer
> | awk '{print $2}'`
>
>
>
> k
>
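On Luminous and later there is also a cluster-wide summary of connected
client releases and feature bits, which can complement the per-monitor
session listing above (a minimal sketch; the asok path is the one from the
quoted command):

# summary of client/mon/osd feature bits and release names across the cluster
ceph features

# per-monitor list of sessions still reporting the hammer feature set
ceph daemon /var/run/ceph/ceph-mon.ceph-mon0.asok sessions | grep hammer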


-- 
*Pardhiv Karri*
"Rise and Rise again until LAMBS become LIONS"
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io