[ceph-users] RGW is slow after the ops increase
Check the ops for RGW:

[root@node06 ceph]# ceph daemon /var/run/ceph/ceph-client.rgw.os.dsglczutvqsgowpz.a.13.93908447458760.asok objecter_requests | jq ".ops" | jq 'length'
8

List a subdir with s5cmd:

[root@node01 deeproute]# time ./s5cmd --endpoint-url=http://10.x.x.x:80 ls s3://mlp-data-warehouse/ads_prediction/
DIR prediction_scenes/
DIR test_pai/

real    0m1.125s
user    0m0.007s
sys     0m0.016s

After the ops increase:

[root@node06 ceph]# ceph daemon /var/run/ceph/ceph-client.rgw.os.dsglczutvqsgowpz.a.13.93908447458760.asok objecter_requests | jq ".ops" | jq 'length'
264

List the subdir with s5cmd:

[root@node01 deeproute]# time ./s5cmd --endpoint-url=http://10.x.x.x:80 ls s3://mlp-data-warehouse/ads_prediction/
DIR prediction_scenes/
DIR test_pai/

real    0m8.822s
user    0m0.004s
sys     0m0.019s

And if the ops increase to more than 2000, it takes more than 100 s to list the subdir. Why?

___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
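As an aside, the op count above can be extracted with a single jq invocation; a minimal sketch, run here against an inline sample shaped like `objecter_requests` output (the sample JSON is hand-written for illustration, not real daemon output):

```shell
# Count in-flight objecter ops from an admin-socket dump.
# The JSON below stands in for:
#   ceph daemon <rgw asok> objecter_requests
sample='{"ops":[{"tid":1},{"tid":2},{"tid":3}],"linger_ops":[]}'

# One jq call is enough; piping jq into jq is not needed.
echo "$sample" | jq '.ops | length'
```

Against a live daemon you would replace the sample with the real `ceph daemon ... objecter_requests` call.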
[ceph-users] OSD crash, looks like something related to PG recovery.
{
  "archived": "2023-04-13 02:23:50.948191",
  "backtrace": [
    "/lib64/libpthread.so.0(+0x12ce0) [0x7f2ee8198ce0]",
    "pthread_kill()",
    "(ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*, std::chrono::time_point > >)+0x48c) [0x563506e9934c]",
    "(ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, std::chrono::duration >, std::chrono::duration >)+0x23e) [0x563506e9973e]",
    "(PrimaryLogPG::scan_range(int, int, BackfillInterval*, ThreadPool::TPHandle&)+0x15a) [0x56350699a8da]",
    "(PrimaryLogPG::do_scan(boost::intrusive_ptr, ThreadPool::TPHandle&)+0x914) [0x56350699bd34]",
    "(PrimaryLogPG::do_request(boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x776) [0x56350699c826]",
    "(OSD::dequeue_op(boost::intrusive_ptr, boost::intrusive_ptr, ThreadPool::TPHandle&)+0x309) [0x563506823fc9]",
    "(ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x68) [0x563506a82e78]",
    "(OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0xc28) [0x5635068414c8]",
    "(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) [0x563506ebe2a4]",
    "(ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x563506ec1184]",
    "/lib64/libpthread.so.0(+0x81ca) [0x7f2ee818e1ca]",
    "clone()"
  ],
  "ceph_version": "16.2.10",
  "crash_id": "2023-04-12T16:53:45.988696Z_7e73aedd-3518-41f4-ae48-e5dbfe5750ec",
  "entity_name": "osd.74",
  "os_id": "centos",
  "os_name": "CentOS Stream",
  "os_version": "8",
  "os_version_id": "8",
  "process_name": "ceph-osd",
  "stack_sig": "1a2700ce6c68288739eb14ca1b2b5f49449c59a5baafbd1e71df3a4316e3bffe",
  "timestamp": "2023-04-12T16:53:45.988696Z",
  "utsname_hostname": "node03",
  "utsname_machine": "x86_64",
  "utsname_release": "3.10.0-1160.45.1.el7.x86_64",
  "utsname_sysname": "Linux",
  "utsname_version": "#1 SMP Wed Oct 13 17:20:51 UTC 2021"
}

Related issue under Rook: https://github.com/rook/rook/issues/11565
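The backtrace shows a heartbeat suicide timeout inside `PrimaryLogPG::scan_range` during backfill. Not a fix for the crash itself, but while investigating, backfill pressure can be throttled; a hedged ceph.conf sketch (the values are illustrative, not tuned recommendations for this cluster):

```
[osd]
# Fewer concurrent backfills per OSD (1 is the default in recent releases)
osd_max_backfills = 1
# Fewer concurrent recovery ops per OSD
osd_recovery_max_active = 1
# Insert a short sleep between recovery ops to reduce load
osd_recovery_sleep = 0.1
```

The same options can be set at runtime with `ceph config set osd <option> <value>`.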
[ceph-users] radosgw crash
{
  "archived": "2023-04-09 01:22:40.755345",
  "backtrace": [
    "/lib64/libpthread.so.0(+0x12ce0) [0x7f06dc1edce0]",
    "(boost::asio::detail::reactive_socket_service_base::start_op(boost::asio::detail::reactive_socket_service_base::base_implementation_type&, int, boost::asio::detail::reactor_op*, bool, bool, bool)+0x126) [0x7f06e6cf84d6]",
    "(void boost::asio::detail::reactive_socket_service_base::async_receive, boost::beast::flat_static_buffer<65536ul>, boost::beast::http::detail::read_header_condition, spawn::detail::coro_handler >, unsigned long> >, boost::asio::detail::io_object_executor >(boost::asio::detail::reactive_socket_service_base::base_implementation_type&, boost::asio::mutable_buffer const&, int, boost::beast::detail::dynamic_read_ops::read_op, boost::beast::flat_static_buffer<65536ul>, boost::beast::http::detail::read_header_condition, spawn::detail::coro_handler >, unsigned long> >&, boost::asio::detail::io_object_executor const&)+0x1fa) [0x7f06e6d0bd1a]",
    "(boost::beast::detail::dynamic_read_ops::read_op, boost::beast::flat_static_buffer<65536ul>, boost::beast::http::detail::read_header_condition, spawn::detail::coro_handler >, unsigned long> >::operator()(boost::system::error_code, unsigned long, bool)+0x17f) [0x7f06e6d39f8f]",
    "(boost::asio::detail::executor_op, boost::beast::flat_static_buffer<65536ul>, boost::beast::http::detail::read_header_condition, spawn::detail::coro_handler >, unsigned long> >, boost::system::error_code, unsigned long>, std::allocator, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long)+0x189) [0x7f06e6d3a7d9]",
    "(boost::asio::detail::strand_executor_service::invoker::operator()()+0x8d) [0x7f06e6d0793d]",
    "(void boost::asio::io_context::executor_type::dispatch, std::allocator >(boost::asio::detail::strand_executor_service::invoker&&, std::allocator const&) const+0x9c) [0x7f06e6d07b3c]",
    "(void boost::asio::detail::strand_executor_service::dispatch, boost::beast::flat_static_buffer<65536ul>, boost::beast::http::detail::read_header_condition, spawn::detail::coro_handler >, unsigned long> >, boost::system::error_code, unsigned long>, std::allocator >(std::shared_ptr const&, boost::asio::io_context::executor_type const&, boost::asio::detail::binder2, boost::beast::flat_static_buffer<65536ul>, boost::beast::http::detail::read_header_condition, spawn::detail::coro_handler >, unsigned long> >, boost::system::error_code, unsigned long>&&, std::allocator const&)+0x2b6) [0x7f06e6d3a306]",
    "(boost::asio::detail::reactive_socket_recv_op, boost::beast::flat_static_buffer<65536ul>, boost::beast::http::detail::read_header_condition, spawn::detail::coro_handler >, unsigned long> >, boost::asio::detail::io_object_executor >::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long)+0x1a1) [0x7f06e6d3a571]",
    "(boost::asio::detail::scheduler::run(boost::system::error_code&)+0x4f2) [0x7f06e6cfbad2]",
    "/lib64/libradosgw.so.2(+0x430376) [0x7f06e6cde376]",
    "/lib64/libstdc++.so.6(+0xc2ba3) [0x7f06db212ba3]",
    "/lib64/libpthread.so.0(+0x81ca) [0x7f06dc1e31ca]",
    "clone()"
  ],
  "ceph_version": "16.2.10",
  "crash_id": "2023-04-08T22:37:20.389262Z_88939939-522f-4b2c-a5fb-d4e49e9922a7",
  "entity_name": "client.rgw.os.dsglczutvqsgowpz.a",
  "os_id": "centos",
  "os_name": "CentOS Stream",
  "os_version": "8",
  "os_version_id": "8",
  "process_name": "radosgw",
  "stack_sig": "2535cd0a26a2ffcc7ca223d416ebf3d4ea172eeec60026bb8b36b2c97ea787da",
  "timestamp": "2023-04-08T22:37:20.389262Z",
  "utsname_hostname": "node01",
  "utsname_machine": "x86_64",
  "utsname_release": "3.10.0-1160.45.1.el7.x86_64",
  "utsname_sysname": "Linux",
  "utsname_version": "#1 SMP Wed Oct 13 17:20:51 UTC 2021"
}
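When triaging crash reports like the two in this digest, grouping by stack signature shows whether a crash repeats; a minimal sketch of the jq step, run here against an inline sample shaped like `ceph crash ls --format json` output (the sample is hand-written for illustration):

```shell
# Group crash reports by stack_sig to spot repeating crashes.
# Sample standing in for: ceph crash ls --format json
sample='[
  {"crash_id":"a","entity_name":"client.rgw.x","stack_sig":"2535cd0a"},
  {"crash_id":"b","entity_name":"client.rgw.y","stack_sig":"2535cd0a"},
  {"crash_id":"c","entity_name":"osd.74","stack_sig":"1a2700ce"}
]'

# Print each signature with its occurrence count.
echo "$sample" | jq -r 'group_by(.stack_sig)[] | "\(.[0].stack_sig) \(length)"'
```

On a live cluster, `ceph crash info <crash_id>` then shows the full report for a given entry.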
[ceph-users] rookcmd: failed to configure devices: failed to generate osd keyring: failed to get or create auth key for client.bootstrap-osd:
I am deploying Ceph via Rook on a K8s cluster with the following version matrix:

ceph-version=17.2.5-0
Ubuntu 20.04
Kernel 5.4.0-135-generic

But I am getting the following error. Has ceph-version=17.2.5-0 been tested with Ubuntu 20.04 running kernel 5.4.0-135-generic? Where can I find a compatibility matrix?

2023-04-12 21:34:41.958118 I | cephclient: getting or creating ceph auth key "client.bootstrap-osd"
2023-04-12 21:34:41.958184 D | exec: Running command: ceph auth get-or-create-key client.bootstrap-osd mon allow profile bootstrap-osd --connect-timeout=15 --cluster=rook-ceph --conf=/var/lib/rook/rook-ceph/rook-ceph.config --name=client.admin --keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json
2023-04-12 21:34:57.291895 C | rookcmd: failed to configure devices: failed to generate osd keyring: failed to get or create auth key for client.bootstrap-osd: failed get-or-create-key client.bootstrap-osd: exit status 1'
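The `--connect-timeout=15` expiring usually points at the mons being unreachable from the prepare pod rather than a kernel-compatibility problem. A minimal reachability sketch using bash's built-in `/dev/tcp` (the host and ports below are placeholders, not taken from the post):

```shell
# Probe a mon endpoint with a short timeout.
# HOST is hypothetical; substitute your mon service/pod address.
probe_mon() {
  local host=$1 port=$2
  # bash-specific /dev/tcp redirection; returns non-zero on refusal/timeout
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

probe_mon 10.0.0.10 3300   # msgr v2 mon port (placeholder host)
probe_mon 10.0.0.10 6789   # msgr v1 mon port (placeholder host)
```

If both ports show "closed" from the node running the OSD-prepare job, the keyring failure is a symptom of a networking or NetworkPolicy issue, not a Ceph version one.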
[ceph-users] OSDs remain not in after update to v17
Dear Ceph Users,

I have a small Ceph cluster for VMs on my local machine. It used to be installed with the system packages and I migrated it to Docker following the documentation. It worked OK until I migrated from v16 to v17 a few months ago. Now the OSDs remain "not in" as shown in the status:

# ceph -s
  cluster:
    id:     abef2e91-cd07-4359-b457-f0f8dc753dfa
    health: HEALTH_WARN
            6 stray daemon(s) not managed by cephadm
            1 stray host(s) with 6 daemon(s) not managed by cephadm
            2 devices (4 osds) down
            4 osds down
            1 host (4 osds) down
            1 root (4 osds) down
            Reduced data availability: 129 pgs inactive
  services:
    mon: 1 daemons, quorum bjorn (age 8m)
    mgr: bjorn(active, since 8m)
    osd: 4 osds: 0 up (since 4w), 4 in (since 4w)
  data:
    pools:   2 pools, 129 pgs
    objects: 0 objects, 0 B
    usage:   1.8 TiB used, 1.8 TiB / 3.6 TiB avail
    pgs:     100.000% pgs unknown
             129 unknown

I can see some network communication between the OSDs and the monitor, and the OSDs are running:

# docker ps -a
CONTAINER ID  IMAGE                  COMMAND                 CREATED        STATUS        PORTS  NAMES
f8fbe8177a63  quay.io/ceph/ceph:v17  "/usr/bin/ceph-osd -…"  9 minutes ago  Up 9 minutes         ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-2
6768ec871404  quay.io/ceph/ceph:v17  "/usr/bin/ceph-osd -…"  9 minutes ago  Up 9 minutes         ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-1
ff82f84504d5  quay.io/ceph/ceph:v17  "/usr/bin/ceph-osd -…"  9 minutes ago  Up 9 minutes         ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-0
4c89e50ce974  quay.io/ceph/ceph:v17  "/usr/bin/ceph-osd -…"  9 minutes ago  Up 9 minutes         ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-3
fe0b6089edda  quay.io/ceph/ceph:v17  "/usr/bin/ceph-mon -…"  9 minutes ago  Up 9 minutes         ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-mon-bjorn
f76ac9dcdd6d  quay.io/ceph/ceph:v17  "/usr/bin/ceph-mgr -…"  9 minutes ago  Up 9 minutes         ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-mgr-bjorn

However, when I try to use any `ceph orch` commands, they hang.
I can also see some blocklist entries on the OSDs:

# ceph osd blocklist ls
10.99.0.13:6833/3770763474 2023-04-13T08:17:38.885128+
10.99.0.13:6832/3770763474 2023-04-13T08:17:38.885128+
10.99.0.13:0/2634718754 2023-04-13T08:17:38.885128+
10.99.0.13:0/1103315748 2023-04-13T08:17:38.885128+
listed 4 entries

The first two entries correspond to the manager process. `ceph osd blocked-by` does not show anything.

I think I might have forgotten to run `ceph osd require-osd-release ...` because 14 is written in `/var/lib/ceph//osd.?/require_osd_release`. If I try to do it now, the monitor hits an abort:

debug 0> 2023-04-12T08:43:27.788+ 7f0fcf2aa700 -1 *** Caught signal (Aborted) ** in thread 7f0fcf2aa700 thread_name:ms_dispatch
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
 1: /lib64/libpthread.so.0(+0x12cf0) [0x7f0fd94bbcf0]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x18f) [0x7f0fdb5124e3]
 5: /usr/lib64/ceph/libceph-common.so.2(+0x26a64f) [0x7f0fdb51264f]
 6: (OSDMonitor::prepare_command_impl(boost::intrusive_ptr, std::map, std::allocator >, boost::variant)+0x38d) [0x562719cb127d]
 8: (OSDMonitor::prepare_update(boost::intrusive_ptr)+0x17b) [0x562719cb18cb]
 9: (PaxosService::dispatch(boost::intrusive_ptr)+0x2ce) [0x562719c20ade]
 10: (Monitor::handle_command(boost::intrusive_ptr)+0x1ebb) [0x562719ab9f6b]
 11: (Monitor::dispatch_op(boost::intrusive_ptr)+0x9f2) [0x562719abe152]
 12: (Monitor::_ms_dispatch(Message*)+0x406) [0x562719abf066]
 13: (Dispatcher::ms_dispatch2(boost::intrusive_ptr const&)+0x5d) [0x562719aef13d]
 14: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr const&)+0x478) [0x7f0fdb78e0e8]
 15: (DispatchQueue::entry()+0x50f) [0x7f0fdb78b52f]
 16: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f0fdb8543b1]
 17: /lib64/libpthread.so.0(+0x81ca) [0x7f0fd94b11ca]
 18: clone()

Any ideas on what is going on?
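Before re-running `require-osd-release`, it can help to see which release each OSD actually reports; a minimal sketch of the jq step, run here against an inline sample shaped like `ceph osd metadata` output (the field name `ceph_release` and all values are illustrative assumptions, not taken from this cluster):

```shell
# List the release each OSD reports in its metadata.
# Sample standing in for: ceph osd metadata
sample='[
  {"id":0,"ceph_release":"quincy"},
  {"id":1,"ceph_release":"quincy"},
  {"id":2,"ceph_release":"nautilus"}
]'

# An OSD still reporting an old release stands out immediately.
echo "$sample" | jq -r '.[] | "osd.\(.id) \(.ceph_release)"'
```

`ceph osd dump | grep require_osd_release` would show the cluster-wide flag the monitor currently holds.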
Many thanks,
Alexandre
[ceph-users] v16.2.12 Pacific (hot-fix) released
We're happy to announce the 12th hot-fix release in the Pacific series.

https://ceph.io/en/news/blog/2023/v16-2-12-pacific-released/

Notable Changes
---
This is a hotfix release that resolves several performance flaws in ceph-volume, particularly during osd activation (https://tracker.ceph.com/issues/57627)

Getting Ceph
---
* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-16.2.12.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: 5a2d516ce4b134bfafc80c4274532ac0d56fc1e2
[ceph-users] Re: Nothing provides libthrift-0.14.0.so()(64bit)
Oops, forgot to mention that I'm installing Ceph 17.2.6, ahead of an upgrade of our cluster from 15.2.17 to 17.2.6.
[ceph-users] Nothing provides libthrift-0.14.0.so()(64bit)
Hello! I'm trying to install the ceph-common package on a Rocky Linux 9 box so that I can connect to our Ceph cluster and mount user directories. I've added the Ceph repo to yum.repos.d, but when I run `dnf install ceph-common`, I get the following error:

```
[root@jet yum.repos.d]# dnf install ceph-common
Last metadata expiration check: 0:33:06 ago on Fri 14 Apr 2023 11:29:26 AM EDT.
Error:
 Problem: conflicting requests
  - nothing provides libthrift-0.14.0.so()(64bit) needed by ceph-common-2:17.2.6-0.el9.x86_64
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)
```

I've looked far and wide for this libthrift package, but I can't figure out what repo to add to get it. Has anyone had success installing Ceph on Rocky 9 and would be willing to share some guidance?
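On EL9, Ceph's dependencies (including libthrift) typically come from EPEL, some of whose own dependencies live in the CRB repo — that is an assumption to verify for your exact release, not something stated in this thread. A sketch of the usual commands, written to a scratch file here so the block is self-contained and does not actually modify the system:

```shell
# Candidate fix (assumption: libthrift is provided via EPEL on EL9).
# Written to a scratch file rather than executed, for illustration only.
cat > /tmp/ceph-el9-repo-fix.sh <<'EOF'
dnf install -y epel-release
dnf config-manager --set-enabled crb
dnf install -y ceph-common
EOF

cat /tmp/ceph-el9-repo-fix.sh
```

`dnf repoquery --whatprovides 'libthrift*'` (after enabling the repos) would confirm where the library actually comes from.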
[ceph-users] Re: 17.2.6 Dashboard/RGW Signature Mismatch
I've finally solved this. There has been a change in behaviour in 17.2.6.

For cluster 2 (the one that failed):

* When they were built, the hosts were configured with a hostname without a domain (so hostname returned a short name)
* The hosts as reported by Ceph all had short hostnames
* In ceph.conf each of the RGWs has a section like:

[client.rgw.host1]
host = host1
rgw frontends = "beast port=80"
rgw dns name = host1.my.domain
rgw_crypt_require_ssl = false

* The dashboard connections to the RGW servers all had a Host header of the FQDN as specified in ceph.conf (observed using tcpdump)
* The RGW processes allowed the connections based on knowledge of their own FQDN

But after the upgrade:

* The dashboard connections to the RGW all have a Host header of the short host name (observed using tcpdump)
* The RGW processes are disallowing it as it doesn't match their FQDN
* By adding the short names to the zonegroup "hostnames" it now works

Cluster 1 (which didn't fail) had been built with FQDN hostnames, so was still supplying an FQDN in the Host headers.

So my hypothesis is that in 17.2.6 the dashboard no longer honours the "rgw dns name" field in ceph.conf. There may be some other subtleties, but that's my best guess. If you were running TLS to the RGWs, that may well be sufficient to cause certificate name mismatches too, unless the certificate SANs contained the short names. I guess you would hit that first, masking the other problem.

Although cluster 2 should probably have been configured with FQDN hostnames, I do still think this is a regression. The "rgw dns name" field should be honoured.

Thanks, Chris

On 13/04/2023 17:20, Chris Palmer wrote:

Hi

I have 3 Ceph clusters, all configured similarly, which have been happy for some months on 17.2.5:

1. A test cluster
2. A small production cluster
3. A larger production cluster

All are Debian 11 built from packages - no cephadm. I upgraded (1) to 17.2.6 without any problems at all.
In particular the Object Gateway sections of the dashboard work as usual.

I then upgraded (2). Nothing seemed amiss, and everything seems to work except... when I try to access the Object Gateway sections of the dashboard I always get:

*The Object Gateway Service is not configured*
Error connecting to Object Gateway: RGW REST API failed request with status code 403
(b'{"Code":"SignatureDoesNotMatch","RequestId":"tx022ba920e82ac4a9c-0064381'
b'934-10e73385-default","HostId":"10e73385-default-default"}')

(Just the RequestId changes each time). Before the upgrade it worked just fine.

Other info:

* RGW requests using awscli and rclone all work with normal RGW accounts. It just seems to be the dashboard that's died.
* Just the one zonegroup, no multisite/replication
* "radosgw-admin user info --uid=rgwadmin" gives the correct output with the right access_key & secret_key. The other fields are as in (1).
* "ceph dashboard get-rgw-api-access-key/get-rgw-api-secret-key" both give the right values.
The RGW logs from (2) which fails show:

2023-04-13T16:36:28.720+0100 7fcc7966a700 1 == starting new request req=0x7fcd88c10720 =
2023-04-13T16:36:28.720+0100 7fcc80e79700 1 req 8090309398268968541 0.0s op->ERRORHANDLER: err_no=-2027 new_err_no=-2027
2023-04-13T16:36:28.724+0100 7fcc80e79700 1 == req done req=0x7fcd88c10720 op status=0 http_status=403 latency=0.00380s ==
2023-04-13T16:36:28.724+0100 7fcc80e79700 1 beast: 0x7fcd88c10720: 192.168.xx.xx - - [13/Apr/2023:16:36:28.720 +0100] "GET /admin/metadata/user?myself HTTP/1.1" 403 134 - "python-requests/2.25.1" - latency=0.00380s

(Note this does not have rgwadmin as the user, and is always the same URL.)

Whereas the RGW logs from (1) which works show things like:

2023-04-13T15:44:19.396+ 7f8478da1700 1 == starting new request req=0x7f86284f5720 =
2023-04-13T15:44:19.412+ 7f8478da1700 1 == req done req=0x7f86284f5720 op status=0 http_status=200 latency=0.01660s ==
2023-04-13T15:44:19.412+ 7f8478da1700 1 beast: 0x7f86284f5720: 10.xx.xx.xx - rgwadmin [13/Apr/2023:15:44:19.396 +] "GET /admin/realm?list HTTP/1.1" 200 31 - "python-requests/2.25.1" - latency=0.01660s

(Note this has rgwadmin as the user, and various URLs.)

The only thing I can see in the release notes that looks even vaguely related is https://github.com/ceph/ceph/pull/47547, but it doesn't seem likely.

I am really stumped on this, with no idea what has gone wrong on (2), and what the difference is between (1) and (2). I'm not going to touch (3) until I have resolved this.

Grateful for any help... And thanks for all the good work.

Regards, Chris
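The zonegroup-hostnames workaround described at the top of this thread amounts to a small JSON edit; a sketch of just that edit, run here against an inline sample shaped like `radosgw-admin zonegroup get` output (the sample and the hostname are illustrative; a real change would also need `radosgw-admin zonegroup set` and `radosgw-admin period update --commit`):

```shell
# Append a short hostname to the zonegroup "hostnames" list.
# Sample standing in for: radosgw-admin zonegroup get
sample='{"name":"default","hostnames":["host1.my.domain"]}'

# Add the short name and de-duplicate.
echo "$sample" | jq '.hostnames |= (. + ["host1"] | unique)'
```

On a live cluster the pattern would be: `zonegroup get` > file, edit the `hostnames` array, `zonegroup set` < file, then commit the period.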
[ceph-users] Re: Restrict user to an RBD image in a pool
Hi,

this is a common question, you should be able to find plenty of examples; here's one [1].

Regards, Eugen

[1] https://www.spinics.net/lists/ceph-users/msg76020.html

Zitat von Work Ceph:

Hello guys! Is it possible to restrict user access to a single image in an RBD pool? I know that I can use namespaces, so users can only see images with a given namespace. However, these users will still be able to create new RBD images. Is it possible to somehow block users from creating RBD images and only work with the already existing ones?
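One commonly cited approach (along the lines of the linked post) is to restrict the client's OSD caps to the RADOS object prefixes of a single image. A sketch that only builds the cap string — the pool, image name and data prefix below are placeholders; the real prefix comes from `block_name_prefix` in `rbd info POOL/IMAGE`:

```shell
# Build an OSD cap string limiting a client to one RBD image's objects.
# Arguments: pool, image name, data prefix (e.g. rbd_data.<image-id>).
image_caps() {
  local pool=$1 image=$2 prefix=$3
  printf 'allow rwx pool=%s object_prefix %s; ' "$pool" "$prefix"
  printf 'allow rwx pool=%s object_prefix rbd_header.%s; ' "$pool" "${prefix#rbd_data.}"
  printf 'allow rx pool=%s object_prefix rbd_id.%s\n' "$pool" "$image"
}

caps=$(image_caps rbd vm-disk rbd_data.abcdef123)   # placeholder values
echo "$caps"
# The string would then go into something like:
#   ceph auth get-or-create client.vm-disk mon 'profile rbd' osd "$caps"
```

This stops the client creating new images (no blanket write access to the pool), at the cost of having to regenerate caps if the image is recreated.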
[ceph-users] Re: Cephadm only scheduling, not orchestrating daemons
Hi,

I would probably stop the upgrade to continue, this might be blocking cephadm. Then try again to redeploy a daemon; if it still fails, check the cephadm.log(s) on the respective servers as well as the active mgr log.

Regards, Eugen

Zitat von Thomas Widhalm:

Hi,

As you might know, I have a problem with MDS not starting. During the investigation with your help I found another issue that might be related. I can plan to restart, redeploy, or reconfigure services via cephadm or the dashboard just as I want, but services won't react. I only see the action being scheduled, but that's all.

2023-04-13T17:11:15.690698+ mgr.ceph04.qaexpv (mgr.74906907) 37184 : cephadm [INF] Schedule redeploy daemon mds.mds01.ceph05.pqxmvt
2023-04-13T17:11:20.746743+ mgr.ceph04.qaexpv (mgr.74906907) 37190 : cephadm [INF] Schedule redeploy daemon mds.mds01.ceph06.rrxmks
2023-04-13T17:11:24.971226+ mgr.ceph04.qaexpv (mgr.74906907) 37195 : cephadm [INF] Schedule redeploy daemon mds.mds01.ceph07.omdisd

It's the same for other daemons/services. I changed placement rules, scheduled changes, failed the mgr, even rebooted hosts. I was even desperate enough to delete files for services from hosts before rebooting, hoping I could trigger a manual redeploy.

All I see are the same MDS stuck in "error" state. I removed them via "ceph orch rm" but they are still there. When I reissue the command it fails, saying that the service isn't there. "ceph orch ps" still lists them:

mds.mds01.ceph03.xqwdjy  ceph03  error  2d ago  2M   -  -
mds.mds01.ceph04.hcmvae  ceph04  error  2d ago  2d   -  -
mds.mds01.ceph05.pqxmvt  ceph05  error  2d ago  10M  -  -
mds.mds01.ceph06.rrxmks  ceph06  error  2d ago  10w  -  -
mds.mds01.ceph07.omdisd  ceph07  error  2d ago  3M   -  -

Any idea how I can get rid of them? Or redeploy them? Additionally, I'm just in the middle of an upgrade.
{
  "target_image": "quay.io/ceph/ceph@sha256:1161e35e4e02cf377c93b913ce78773f8413f5a8d7c5eaee4b4773a4f9dd6635",
  "in_progress": true,
  "which": "Upgrading all daemon types on all hosts",
  "services_complete": [ "crash", "mgr", "mon", "osd" ],
  "progress": "18/40 daemons upgraded",
  "message": "Upgrade paused",
  "is_paused": true
}

I paused it on purpose to allow for manipulation of daemons.

Cheers, Thomas
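Picking the stuck daemons out of `ceph orch ps` is easier from its JSON form; a minimal sketch of the jq step, run here against an inline sample (the field names `daemon_name`/`status_desc` are assumed from recent cephadm output and the sample itself is hand-written):

```shell
# List daemons whose status is "error".
# Sample standing in for: ceph orch ps --format json
sample='[
  {"daemon_name":"mds.mds01.ceph03.xqwdjy","hostname":"ceph03","status_desc":"error"},
  {"daemon_name":"mds.mds01.ceph04.hcmvae","hostname":"ceph04","status_desc":"error"},
  {"daemon_name":"mgr.ceph04.qaexpv","hostname":"ceph04","status_desc":"running"}
]'

echo "$sample" | jq -r '.[] | select(.status_desc=="error") | "\(.daemon_name) on \(.hostname)"'
```

With the real list in hand, each stuck daemon can then be targeted individually (e.g. `ceph orch daemon rm <name> --force`) once the paused upgrade is out of the way.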
[ceph-users] Re: deploying Ceph using FQDN for MON / MDS Services
Hi Team,

There is one additional observation: mounting as the client works fine from one of the Ceph nodes.

Command:

sudo mount -t ceph :/ /mnt/imgs -o name=foo,secret=AQABDzRkTaJCEhAAC7rC6E68ofwdfULnx6qX/VDA==

We are not passing the monitor address; instead, DNS SRV is configured as per:
https://docs.ceph.com/en/quincy/rados/configuration/mon-lookup-dns/

The mount works fine in this case. But if we try to mount from another location, i.e. from another VM/client (non-Ceph node), we get the error:

mount -t ceph :/ /mnt/imgs -o name=foo,secret=AQABDzRkTaJCEhAAC7rC6E68ofwULnx6qX/VDA== -v
mount: /mnt/image: mount point does not exist.

The document says that if we do not pass the monitor address, it tries discovering the monitor address from DNS servers, but in practice this is not happening.

On Tue, Apr 11, 2023 at 6:48 PM Lokendra Rathour wrote:
> Ceph version Quincy.
> But now I am able to resolve the issue.
> During mount I will not pass any monitor details; it will be auto-discovered via SRV.

On Tue, Apr 11, 2023 at 6:09 PM Eugen Block wrote:
> What ceph version is this? Could it be this bug [1]? Although the error message is different, not sure if it could be the same issue, and I don't have anything to test ipv6 with.
>
> [1] https://tracker.ceph.com/issues/47300
>
> Zitat von Lokendra Rathour:
>
>> Hi All,
>> Requesting any inputs around the issue raised.
>>
>> Best Regards,
>> Lokendra
>>
>> On Tue, 24 Jan, 2023, 7:32 pm Lokendra Rathour wrote:
>>
>>> Hi Team,
>>>
>>> We have a ceph cluster with 3 storage nodes:
>>> 1. storagenode1 - abcd:abcd:abcd::21
>>> 2. storagenode2 - abcd:abcd:abcd::22
>>> 3. storagenode3 - abcd:abcd:abcd::23
>>>
>>> The requirement is to mount ceph using the domain name of MON node:
>>> Note: we resolved the domain name via DNS server.
>>>
>>> For this we are using the command:
>>> ```
>>> mount -t ceph [storagenode.storage.com]:6789:/ /backup -o name=admin,secret=AQCM+8hjqzuZEhAAcuQc+onNKReq7MV+ykFirg==
>>> ```
>>>
>>> We are getting the following logs in /var/log/messages:
>>> ```
>>> Jan 24 17:23:17 localhost kernel: libceph: resolve 'storagenode.storage.com' (ret=-3): failed
>>> Jan 24 17:23:17 localhost kernel: libceph: parse_ips bad ip 'storagenode.storage.com:6789'
>>> ```
>>>
>>> We also tried mounting ceph storage using IP of MON which is working fine.
>>>
>>> Query:
>>> Could you please help us out with how we can mount ceph using FQDN.
>>>
>>> My /etc/ceph/ceph.conf is as follows:
>>>
>>> [global]
>>> ms bind ipv6 = true
>>> ms bind ipv4 = false
>>> mon initial members = storagenode1,storagenode2,storagenode3
>>> osd pool default crush rule = -1
>>> fsid = 7969b8a3-1df7-4eae-8ccf-2e5794de87fe
>>> mon host = [v2:[abcd:abcd:abcd::21]:3300,v1:[abcd:abcd:abcd::21]:6789],[v2:[abcd:abcd:abcd::22]:3300,v1:[abcd:abcd:abcd::22]:6789],[v2:[abcd:abcd:abcd::23]:3300,v1:[abcd:abcd:abcd::23]:6789]
>>> public network = abcd:abcd:abcd::/64
>>> cluster network = eff0:eff0:eff0::/64
>>>
>>> [osd]
>>> osd memory target = 4294967296
>>>
>>> [client.rgw.storagenode1.rgw0]
>>> host = storagenode1
>>> keyring = /var/lib/ceph/radosgw/ceph-rgw.storagenode1.rgw0/keyring
>>> log file = /var/log/ceph/ceph-rgw-storagenode1.rgw0.log
>>> rgw frontends = beast endpoint=[abcd:abcd:abcd::21]:8080
>>> rgw thread pool size = 512
>>>
>>> --
>>> ~ Lokendra
>>> skype: lokendrarathour

--
~ Lokendra
skype: lokendrarathour
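The mon-lookup-dns document referenced above describes SRV records of the following shape; a sketch of a BIND-style zone fragment, assuming the default `mon_dns_srv_name` of `ceph-mon` (the names reuse the thread's placeholder addresses and `storage.com` domain, so treat every value as illustrative). Note that the client VM that fails must use a DNS server that actually carries these records:

```
; Hypothetical zone fragment for mon discovery (mon-lookup-dns)
storagenode1.storage.com.  AAAA  abcd:abcd:abcd::21
storagenode2.storage.com.  AAAA  abcd:abcd:abcd::22
storagenode3.storage.com.  AAAA  abcd:abcd:abcd::23

; msgr v1 port; analogous records for port 3300 cover msgr v2
_ceph-mon._tcp.storage.com. 60 IN SRV 10 60 6789 storagenode1.storage.com.
_ceph-mon._tcp.storage.com. 60 IN SRV 10 60 6789 storagenode2.storage.com.
_ceph-mon._tcp.storage.com. 60 IN SRV 10 60 6789 storagenode3.storage.com.
```

`dig SRV _ceph-mon._tcp.storage.com` from the failing client is a quick way to check whether the records are visible there at all.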
[ceph-users] Restrict user to an RBD image in a pool
Hello guys!

Is it possible to restrict user access to a single image in an RBD pool? I know that I can use namespaces, so users can only see images with a given namespace. However, these users will still be able to create new RBD images. Is it possible to somehow block users from creating RBD images and only work with the already existing ones?
[ceph-users] Re: ceph pg stuck - missing on 1 osd how to proceed
Hi,

your cluster is in backfilling state, maybe just wait for the backfill to finish? What is 'ceph -s' reporting? The PG could be backfilling to a different OSD as well. You could query the PG to see more details ('ceph pg 8.2a6 query'). By the way, the PGs you show are huge (around 174 GB with more than 200k objects); depending on the disks you use, a PG split could help gain more performance (if that is an issue for you).

Regards, Eugen

Zitat von xadhoo...@gmail.com:

Hi to all,

Using ceph 17.2.5 I have 3 PGs in a stuck state.

ceph pg map 8.2a6
osdmap e32862 pg 8.2a6 (8.2a6) -> up [88,100,59] acting [59,100]

Looking at OSDs 88, 100 and 59 I got:

ceph pg ls-by-osd osd.100 | grep 8.2a6
8.2a6 211004209089 00 174797925205 0 0 7075 active+undersized+degraded+remapped+backfilling 21m 32862'1540291 32862:3387785 [88,100,59]p88 [59,100]p59 2023-03-12T08:08:00.903727+ 2023-03-12T08:08:00.903727+ 6839 queued for deep scrub

ceph pg ls-by-osd osd.59 | grep 8.2a6
8.2a6 211005209084 00 174798941087 0 0 7076 active+undersized+degraded+remapped+backfilling 22m 32862'1540292 32862:3387798 [88,100,59]p88 [59,100]p59 2023-03-12T08:08:00.903727+ 2023-03-12T08:08:00.903727+ 6839 queued for deep scrub

BUT:

ceph pg ls-by-osd osd.88 | grep 8.2a6  ---> NONE

It is missing. How should I proceed?

Best regards
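The 'ceph pg 8.2a6 query' suggestion can be narrowed down with jq; a minimal sketch, run here against an inline sample shaped like pg query output (the field names `state`/`up`/`acting` match the real command, but the sample values are made up to mirror this thread):

```shell
# Pull the interesting bits out of a pg query.
# Sample standing in for: ceph pg 8.2a6 query
sample='{
  "state": "active+undersized+degraded+remapped+backfilling",
  "up": [88, 100, 59],
  "acting": [59, 100],
  "recovery_state": [{"name": "Started/Primary/Active"}]
}'

echo "$sample" | jq -r '"state: \(.state)\nup: \(.up)\nacting: \(.acting)"'
```

The `up` vs `acting` difference is exactly the situation above: osd.88 is the backfill target, so the PG not appearing in `ceph pg ls-by-osd osd.88` until backfill completes is expected.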