[ceph-users] RGW becomes slow after the ops increase

2023-04-14 Thread Louis Koo
Check the ops for RGW:

[root@node06 ceph]# ceph daemon /var/run/ceph/ceph-client.rgw.os.dsglczutvqsgowpz.a.13.93908447458760.asok objecter_requests | jq ".ops" | jq 'length'
8

List a subdirectory with s5cmd:

[root@node01 deeproute]# time ./s5cmd --endpoint-url=http://10.x.x.x:80 ls s3://mlp-data-warehouse/ads_prediction/
  DIR prediction_scenes/
  DIR test_pai/

real    0m1.125s
user    0m0.007s
sys     0m0.016s

After the ops increase:

[root@node06 ceph]# ceph daemon /var/run/ceph/ceph-client.rgw.os.dsglczutvqsgowpz.a.13.93908447458760.asok objecter_requests | jq ".ops" | jq 'length'
264

List the same subdirectory with s5cmd:

[root@node01 deeproute]# time ./s5cmd --endpoint-url=http://10.x.x.x:80 ls s3://mlp-data-warehouse/ads_prediction/
  DIR prediction_scenes/
  DIR test_pai/

real    0m8.822s
user    0m0.004s
sys     0m0.019s


And if the ops increase to more than 2000, it takes more than 100s to list the
same subdirectory. Why is that?
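
In case anyone wants to reproduce the measurement, here is a minimal sketch
(assuming the same admin-socket path, endpoint and bucket as above) that
samples the in-flight objecter ops next to the listing latency:

#!/bin/bash
# Sample RGW in-flight objecter ops and the s5cmd listing latency side by side.
ASOK=/var/run/ceph/ceph-client.rgw.os.dsglczutvqsgowpz.a.13.93908447458760.asok
BUCKET=s3://mlp-data-warehouse/ads_prediction/
while true; do
    ops=$(ceph daemon "$ASOK" objecter_requests | jq '.ops | length')
    t=$({ time ./s5cmd --endpoint-url=http://10.x.x.x:80 ls "$BUCKET" >/dev/null; } 2>&1 | awk '/^real/{print $2}')
    echo "$(date +%T)  in-flight ops=$ops  list time=$t"
    sleep 10
done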
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] OSD crash, looks like something related to PG recovery.

2023-04-14 Thread Louis Koo
{
"archived": "2023-04-13 02:23:50.948191",
"backtrace": [
"/lib64/libpthread.so.0(+0x12ce0) [0x7f2ee8198ce0]",
"pthread_kill()",
"(ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char 
const*, std::chrono::time_point > >)+0x48c) 
[0x563506e9934c]",
"(ceph::HeartbeatMap::reset_timeout(ceph::heartbeat_handle_d*, 
std::chrono::duration >, 
std::chrono::duration >)+0x23e) 
[0x563506e9973e]",
"(PrimaryLogPG::scan_range(int, int, BackfillInterval*, 
ThreadPool::TPHandle&)+0x15a) [0x56350699a8da]",
"(PrimaryLogPG::do_scan(boost::intrusive_ptr, 
ThreadPool::TPHandle&)+0x914) [0x56350699bd34]",
"(PrimaryLogPG::do_request(boost::intrusive_ptr&, 
ThreadPool::TPHandle&)+0x776) [0x56350699c826]",
"(OSD::dequeue_op(boost::intrusive_ptr, 
boost::intrusive_ptr, ThreadPool::TPHandle&)+0x309) 
[0x563506823fc9]",
"(ceph::osd::scheduler::PGOpItem::run(OSD*, OSDShard*, 
boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x68) [0x563506a82e78]",
"(OSD::ShardedOpWQ::_process(unsigned int, 
ceph::heartbeat_handle_d*)+0xc28) [0x5635068414c8]",
"(ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4) 
[0x563506ebe2a4]",
"(ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x563506ec1184]",
"/lib64/libpthread.so.0(+0x81ca) [0x7f2ee818e1ca]",
"clone()"
],
"ceph_version": "16.2.10",
"crash_id": 
"2023-04-12T16:53:45.988696Z_7e73aedd-3518-41f4-ae48-e5dbfe5750ec",
"entity_name": "osd.74",
"os_id": "centos",
"os_name": "CentOS Stream",
"os_version": "8",
"os_version_id": "8",
"process_name": "ceph-osd",
"stack_sig": 
"1a2700ce6c68288739eb14ca1b2b5f49449c59a5baafbd1e71df3a4316e3bffe",
"timestamp": "2023-04-12T16:53:45.988696Z",
"utsname_hostname": "node03",
"utsname_machine": "x86_64",
"utsname_release": "3.10.0-1160.45.1.el7.x86_64",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Wed Oct 13 17:20:51 UTC 2021"
}

Related issue under Rook: https://github.com/rook/rook/issues/11565
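
For reference, a report like the one above can be pulled straight from the
crash module (a sketch; the crash module is enabled by default on recent
releases):

ceph crash ls
ceph crash info 2023-04-12T16:53:45.988696Z_7e73aedd-3518-41f4-ae48-e5dbfe5750ec
ceph crash archive 2023-04-12T16:53:45.988696Z_7e73aedd-3518-41f4-ae48-e5dbfe5750ec   # marks it as seen (already done for the report above)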
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] radosgw crash

2023-04-14 Thread Louis Koo
{
"archived": "2023-04-09 01:22:40.755345",
"backtrace": [
"/lib64/libpthread.so.0(+0x12ce0) [0x7f06dc1edce0]",

"(boost::asio::detail::reactive_socket_service_base::start_op(boost::asio::detail::reactive_socket_service_base::base_implementation_type&,
 int, boost::asio::detail::reactor_op*, bool, bool, bool)+0x126) 
[0x7f06e6cf84d6]",
"(void 
boost::asio::detail::reactive_socket_service_base::async_receive, 
boost::beast::flat_static_buffer<65536ul>, 
boost::beast::http::detail::read_header_condition, 
spawn::detail::coro_handler >, unsigned long> 
>, 
boost::asio::detail::io_object_executor 
>(boost::asio::detail::reactive_socket_service_base::base_implementation_type&, 
boost::asio::mutable_buffer const&, int, 
boost::beast::detail::dynamic_read_ops::read_op, 
boost::beast::flat_static_buffer<65536ul>, 
boost::beast::http::detail::read_header_condition, 
spawn::detail::coro_handler 
>, unsigned long> >&, 
boost::asio::detail::io_object_executor 
const&)+0x1fa) [0x7f06e6d0bd1a]",

"(boost::beast::detail::dynamic_read_ops::read_op, 
boost::beast::flat_static_buffer<65536ul>, 
boost::beast::http::detail::read_header_condition, 
spawn::detail::coro_handler >, unsigned long> 
>::operator()(boost::system::error_code, unsigned long, bool)+0x17f) 
[0x7f06e6d39f8f]",

"(boost::asio::detail::executor_op, 
boost::beast::flat_static_buffer<65536ul>, 
boost::beast::http::detail::read_header_condition, 
spawn::detail::coro_handler >, unsigned long> 
>, boost::system::error_code, unsigned long>, std::allocator, 
boost::asio::detail::scheduler_operation>::do_complete(void*, 
boost::asio::detail::scheduler_operation*, boost::system::error_code const&, 
unsigned long)+0x189) [0x7f06e6d3a7d9]",

"(boost::asio::detail::strand_executor_service::invoker::operator()()+0x8d) [0x7f06e6d0793d]",
"(void 
boost::asio::io_context::executor_type::dispatch, std::allocator 
>(boost::asio::detail::strand_executor_service::invoker&&, std::allocator const&) const+0x9c) [0x7f06e6d07b3c]",
"(void 
boost::asio::detail::strand_executor_service::dispatch, 
boost::beast::flat_static_buffer<65536ul>, 
boost::beast::http::detail::read_header_condition, 
spawn::detail::coro_handler >, unsigned long> 
>, boost::system::error_code, unsigned long>, std::allocator 
>(std::shared_ptr 
const&, boost::asio::io_context::executor_type const&, 
boost::asio::detail::binder2, 
boost::beast::flat_static_buffer<65536ul>, 
boost::beast::http::detail::read_header_conditi
 on, spawn::detail::coro_handler >, unsigned long> 
>, boost::system::error_code, unsigned long>&&, std::allocator 
const&)+0x2b6) [0x7f06e6d3a306]",

"(boost::asio::detail::reactive_socket_recv_op, 
boost::beast::flat_static_buffer<65536ul>, 
boost::beast::http::detail::read_header_condition, 
spawn::detail::coro_handler >, unsigned long> 
>, 
boost::asio::detail::io_object_executor 
>::do_complete(void*, boost::asio::detail::scheduler_operation*, 
boost::system::error_code const&, unsigned long)+0x1a1) [0x7f06e6d3a571]",

"(boost::asio::detail::scheduler::run(boost::system::error_code&)+0x4f2) 
[0x7f06e6cfbad2]",
"/lib64/libradosgw.so.2(+0x430376) [0x7f06e6cde376]",
"/lib64/libstdc++.so.6(+0xc2ba3) [0x7f06db212ba3]",
"/lib64/libpthread.so.0(+0x81ca) [0x7f06dc1e31ca]",
"clone()"
],
"ceph_version": "16.2.10",
"crash_id": 
"2023-04-08T22:37:20.389262Z_88939939-522f-4b2c-a5fb-d4e49e9922a7",
"entity_name": "client.rgw.os.dsglczutvqsgowpz.a",
"os_id": "centos",
"os_name": "CentOS Stream",
"os_version": "8",
"os_version_id": "8",
"process_name": "radosgw",
"stack_sig": 
"2535cd0a26a2ffcc7ca223d416ebf3d4ea172eeec60026bb8b36b2c97ea787da",
"timestamp": "2023-04-08T22:37:20.389262Z",
"utsname_hostname": "node01",
"utsname_machine": "x86_64",
"utsname_release": "3.10.0-1160.45.1.el7.x86_64",
"utsname_sysname": "Linux",
"utsname_version": "#1 SMP Wed Oct 13 17:20:51 UTC 2021"
}
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] rookcmd: failed to configure devices: failed to generate osd keyring: failed to get or create auth key for client.bootstrap-osd:

2023-04-14 Thread knawaz
I am deploying Ceph via Rook on a K8s cluster with the following version matrix:
ceph-version=17.2.5-0
Ubuntu 20.04
Kernel 5.4.0-135-generic

But I am getting the following error. Has ceph-version=17.2.5-0 been tested
with Ubuntu 20.04 running kernel 5.4.0-135-generic? Where can I find a
compatibility matrix?
2023-04-12 21:34:41.958118 I | cephclient: getting or creating ceph auth key 
"client.bootstrap-osd"
2023-04-12 21:34:41.958184 D | exec: Running command: ceph auth 
get-or-create-key client.bootstrap-osd mon allow profile bootstrap-osd 
--connect-timeout=15 --cluster=rook-ceph 
--conf=/var/lib/rook/rook-ceph/rook-ceph.config --name=client.admin 
--keyring=/var/lib/rook/rook-ceph/client.admin.keyring --format json
2023-04-12 21:34:57.291895 C | rookcmd: failed to configure devices: failed to 
generate osd keyring: failed to get or create auth key for 
client.bootstrap-osd: failed get-or-create-key client.bootstrap-osd: exit 
status 1'
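
For reference, running the same auth call by hand from the Rook toolbox (a
sketch, assuming the standard rook-ceph-tools deployment is installed) usually
surfaces the underlying mon error instead of the generic exit status:

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- \
    ceph auth get-or-create-key client.bootstrap-osd mon 'allow profile bootstrap-osd'

If even the plain status call times out, the operator most likely cannot reach
the mons at all (network policy, mon quorum, or stale /var/lib/rook config)
rather than this being a kernel/OS compatibility problem.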
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] OSDs remain not in after update to v17

2023-04-14 Thread Alexandre Becholey
Dear Ceph Users,

I have a small ceph cluster for VMs on my local machine. It used to be 
installed with the system packages and I migrated it to docker following the 
documentation. It worked OK until I migrated from v16 to v17 a few months ago. 
Now the OSDs remain "not in" as shown in the status:

# ceph -s
  cluster:
id: abef2e91-cd07-4359-b457-f0f8dc753dfa
health: HEALTH_WARN
6 stray daemon(s) not managed by cephadm
1 stray host(s) with 6 daemon(s) not managed by cephadm
2 devices (4 osds) down
4 osds down
1 host (4 osds) down
1 root (4 osds) down
Reduced data availability: 129 pgs inactive
 
  services:
mon: 1 daemons, quorum bjorn (age 8m)
mgr: bjorn(active, since 8m)
osd: 4 osds: 0 up (since 4w), 4 in (since 4w)
 
  data:
pools:   2 pools, 129 pgs
objects: 0 objects, 0 B
usage:   1.8 TiB used, 1.8 TiB / 3.6 TiB avail
pgs: 100.000% pgs unknown
 129 unknown

I can see some network communication between the OSDs and the monitor, and the
OSDs are running:

# docker ps -a
CONTAINER ID   IMAGE   COMMAND  CREATED 
 STATUS  PORTS NAMES
f8fbe8177a63   quay.io/ceph/ceph:v17   "/usr/bin/ceph-osd -…"   9 minutes ago   
 Up 9 minutes  ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-2
6768ec871404   quay.io/ceph/ceph:v17   "/usr/bin/ceph-osd -…"   9 minutes ago   
 Up 9 minutes  ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-1
ff82f84504d5   quay.io/ceph/ceph:v17   "/usr/bin/ceph-osd -…"   9 minutes ago   
 Up 9 minutes  ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-0
4c89e50ce974   quay.io/ceph/ceph:v17   "/usr/bin/ceph-osd -…"   9 minutes ago   
 Up 9 minutes  ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-3
fe0b6089edda   quay.io/ceph/ceph:v17   "/usr/bin/ceph-mon -…"   9 minutes ago   
 Up 9 minutes  ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-mon-bjorn
f76ac9dcdd6d   quay.io/ceph/ceph:v17   "/usr/bin/ceph-mgr -…"   9 minutes ago   
 Up 9 minutes  ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-mgr-bjorn

However, when I try to use any `ceph orch` commands, they hang. I can also see
some blocklist entries for the OSDs:

# ceph osd blocklist ls
10.99.0.13:6833/3770763474 2023-04-13T08:17:38.885128+
10.99.0.13:6832/3770763474 2023-04-13T08:17:38.885128+
10.99.0.13:0/2634718754 2023-04-13T08:17:38.885128+
10.99.0.13:0/1103315748 2023-04-13T08:17:38.885128+
listed 4 entries

The first two entries correspond to the manager process. `ceph osd blocked-by` 
does not show anything.

I think I might have forgotten to run `ceph osd require-osd-release ...`, 
because 14 is written in `/var/lib/ceph//osd.?/require_osd_release`. If I 
try to do it now, the monitor hits an abort:

debug  0> 2023-04-12T08:43:27.788+ 7f0fcf2aa700 -1 *** Caught signal 
(Aborted) **
 in thread 7f0fcf2aa700 thread_name:ms_dispatch
 ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
 1: /lib64/libpthread.so.0(+0x12cf0) [0x7f0fd94bbcf0]
 2: gsignal()
 3: abort()
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x18f) [0x7f0fdb5124e3]
 5: /usr/lib64/ceph/libceph-common.so.2(+0x26a64f) [0x7f0fdb51264f]
 6: (OSDMonitor::prepare_command_impl(boost::intrusive_ptr, 
std::map, 
std::allocator >, boost::variant)+0x38d) 
[0x562719cb127d]
 8: (OSDMonitor::prepare_update(boost::intrusive_ptr)+0x17b) 
[0x562719cb18cb]
 9: (PaxosService::dispatch(boost::intrusive_ptr)+0x2ce) 
[0x562719c20ade]
 10: (Monitor::handle_command(boost::intrusive_ptr)+0x1ebb) 
[0x562719ab9f6b]
 11: (Monitor::dispatch_op(boost::intrusive_ptr)+0x9f2) 
[0x562719abe152]
 12: (Monitor::_ms_dispatch(Message*)+0x406) [0x562719abf066]
 13: (Dispatcher::ms_dispatch2(boost::intrusive_ptr const&)+0x5d) 
[0x562719aef13d]
 14: (Messenger::ms_deliver_dispatch(boost::intrusive_ptr 
const&)+0x478) [0x7f0fdb78e0e8]
 15: (DispatchQueue::entry()+0x50f) [0x7f0fdb78b52f]
 16: (DispatchQueue::DispatchThread::entry()+0x11) [0x7f0fdb8543b1]
 17: /lib64/libpthread.so.0(+0x81ca) [0x7f0fd94b11ca]
 18: clone()
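
For completeness, a read-only check to compare what the cluster map records
with the running daemons (assuming the mon keeps answering such commands):

ceph osd dump | grep require_osd_release
ceph versions    # every daemon should already report quincy before the release gate is raised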

Any ideas on what is going on?

Many thanks,
Alexandre
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] v16.2.12 Pacific (hot-fix) released

2023-04-14 Thread Yuri Weinstein
We're happy to announce the 12th hot-fix release in the Pacific series.

https://ceph.io/en/news/blog/2023/v16-2-12-pacific-released/

Notable Changes
---------------
This is a hotfix release that resolves several performance flaws in ceph-volume,
particularly during OSD activation (https://tracker.ceph.com/issues/57627).

Getting Ceph
------------
* Git at git://github.com/ceph/ceph.git
* Tarball at https://download.ceph.com/tarballs/ceph-16.2.12.tar.gz
* Containers at https://quay.io/repository/ceph/ceph
* For packages, see https://docs.ceph.com/en/latest/install/get-packages/
* Release git sha1: 5a2d516ce4b134bfafc80c4274532ac0d56fc1e2
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nothing provides libthrift-0.14.0.so()(64bit)

2023-04-14 Thread Will Nilges
Oops, forgot to mention that I'm installing Ceph 17.2.6, ahead of an
upgrade of our cluster from 15.2.17 to 17.2.6.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Nothing provides libthrift-0.14.0.so()(64bit)

2023-04-14 Thread Will Nilges
Hello!
I'm trying to install the ceph-common package on a Rocky Linux 9 box so
that I can connect to our Ceph cluster and mount user directories. I've
added the Ceph repo to yum.repos.d, but when I run `dnf install
ceph-common`, I get the following error:
```
[root@jet yum.repos.d]# dnf install ceph-common
Last metadata expiration check: 0:33:06 ago on Fri 14 Apr 2023 11:29:26 AM
EDT.
Error:
 Problem: conflicting requests
  - nothing provides libthrift-0.14.0.so()(64bit) needed by
ceph-common-2:17.2.6-0.el9.x86_64
(try to add '--skip-broken' to skip uninstallable packages or '--nobest' to
use not only best candidate packages)
```

I've looked far and wide for this libthrift package, but I can't figure out
what repo to add to get it. Has anyone had success installing ceph on Rocky
9 and would be willing to share some guidance?
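
In case it helps, one way to check whether any known repo can satisfy that
soname (EPEL and CRB are the usual suspects for extra libraries on EL9,
though I'm not certain that's where this one lives):

dnf install -y epel-release
dnf config-manager --set-enabled crb    # config-manager comes from dnf-plugins-core
dnf repoquery --whatprovides 'libthrift-0.14.0.so()(64bit)'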
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 17.2.6 Dashboard/RGW Signature Mismatch

2023-04-14 Thread Chris Palmer

I've finally solved this. There has been a change in behaviour in 17.2.6.

For cluster 2 (the one that failed):

 * When they were built the hosts were configured with a hostname
   without a domain (so hostname returned a short name)
 * The hosts as reported by ceph all had short hostnames
 * In ceph.conf each of the RGWs has a section like:

[client.rgw.host1]
    host = host1
    rgw frontends = "beast port=80"
    rgw dns name = host1.my.domain
    rgw_crypt_require_ssl = false

 * The dashboard connections to the RGW servers all had a Host header
   of the FQDN as specified in ceph.conf (observed using tcpdump)
 * The RGW processes allowed the connections based on knowledge of
   their own FQDN

But after the upgrade:

 * The dashboard connections to the RGW all have a Host header of the
   short host name (observed using tcpdump)
 * The RGW processes are disallowing it as it doesn't match their FQDN
 * By adding the short names to the zonegroup "hostnames" it now works

Cluster 1 (which didn't fail) had been built with FQDN hostnames, so its 
dashboard connections were still supplying an FQDN in the Host header.
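
For anyone hitting the same thing, this is roughly the change that fixed it
for me (a sketch; the host names are of course placeholders for your own):

radosgw-admin zonegroup get > zonegroup.json
# edit "hostnames" so it lists both the short and the fully-qualified names, e.g.
#   "hostnames": ["host1", "host1.my.domain", "host2", "host2.my.domain"],
radosgw-admin zonegroup set < zonegroup.json
# with realms/multisite you would also need: radosgw-admin period update --commit
# then restart the RGW daemons so they reload the zonegroup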


So my hypothesis is that in 17.2.6 the dashboard no longer honours the 
"rgw dns name" field in ceph.conf. There may be some other subtleties 
but that's my best guess.


If you were running TLS to the RGWs, that may well be sufficient to 
cause certificate name mismatches too unless the certificate SANs 
contained the short names. I guess you would hit that first, masking the 
other problem.


Although cluster 2 should probably have been configured with FQDN 
hostnames I do still think this is a regression. The "rgw dns name" 
field should be honoured.


Thanks, Chris


On 13/04/2023 17:20, Chris Palmer wrote:

Hi

I have 3 Ceph clusters, all configured similarly, which have been 
happy for some months on 17.2.5:


1. A test cluster
2. A small production cluster
3. A larger production cluster

All are debian 11 built from packages - no cephadm.

I upgraded (1) to 17.2.6 without any problems at all. In particular 
the Object Gateway sections of the dashboard work as usual.


I then upgraded (2). Nothing seemed amiss, and everything seems to 
work except... when I try to access the Object Gateway sections of the 
dashboard I always get:



 *The Object Gateway Service is not configured*


   Error connecting to Object Gateway: RGW REST API failed request
   with status code 403
(b'{"Code":"SignatureDoesNotMatch","RequestId":"tx022ba920e82ac4a9c-0064381'
b'934-10e73385-default","HostId":"10e73385-default-default"}')

(Just the RequestId changes each time). Before the upgrade it worked 
just fine.


Other info:

 * RGW requests using awscli and rclone all work with normal RGW
   accounts. It just seems to be the dashboard that's died.
 * Just the one zonegroup, no multisite/replication
 * "radosgw-admin user info --uid=rgwadmin" gives the correct output
   with the right access_key & secret_key. The other fields are as in 
(1).

 * "ceph dashboard get-rgw-api-access-key/get-rgw-api-secret-key" both
   give the right values.

The rgw logs from (2) which fails show:

2023-04-13T16:36:28.720+0100 7fcc7966a700  1 == starting new 
request req=0x7fcd88c10720 =
2023-04-13T16:36:28.720+0100 7fcc80e79700  1 req 8090309398268968541 
0.0s op->ERRORHANDLER: err_no=-2027 new_err_no=-2027
2023-04-13T16:36:28.724+0100 7fcc80e79700  1 == req done 
req=0x7fcd88c10720 op status=0 http_status=403 latency=0.00380s 
==
2023-04-13T16:36:28.724+0100 7fcc80e79700  1 beast: 0x7fcd88c10720: 
192.168.xx.xx - - [13/Apr/2023:16:36:28.720 +0100] "GET 
/admin/metadata/user?myself HTTP/1.1" 403 134 - 
"python-requests/2.25.1" - latency=0.00380s


(Note this does not have rgwadmin as the user, and is always the same 
URL)



Whereas the rgw logs from (1) which works show things like:

2023-04-13T15:44:19.396+ 7f8478da1700  1 == starting new 
request req=0x7f86284f5720 =
2023-04-13T15:44:19.412+ 7f8478da1700  1 == req done 
req=0x7f86284f5720 op status=0 http_status=200 latency=0.01660s 
==
2023-04-13T15:44:19.412+ 7f8478da1700  1 beast: 0x7f86284f5720: 
10.xx.xx.xx - rgwadmin [13/Apr/2023:15:44:19.396 +] "GET 
/admin/realm?list HTTP/1.1" 200 31 - "python-requests/2.25.1" - 
latency=0.01660s


(Note this has rgwadmin as the user, and various URLs)

The only thing I can see in the release notes that looks even vaguely 
related is https://github.com/ceph/ceph/pull/47547, but it doesn't 
seem likely.


I am really stumped on this, with no idea what has gone wrong on (2), 
and what the difference is between (1) and (2). I'm not going to touch 
(3) until I have resolved this.


Grateful for any help...

And thanks for all the good work.

Regards, Chris




[ceph-users] Re: Restrict user to an RBD image in a pool

2023-04-14 Thread Eugen Block

Hi,

this is a common question, you should be able to find plenty of
examples; here's one [1].
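
As a minimal sketch of the namespace approach (names below are placeholders;
note that this scopes what the user can see and touch, but, as you say, it
does not by itself stop them from creating new images inside their own
namespace):

rbd namespace create --pool rbd --namespace project1
ceph auth get-or-create client.project1 \
    mon 'profile rbd' \
    osd 'profile rbd pool=rbd namespace=project1'
rbd --id project1 ls --pool rbd --namespace project1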


Regards,
Eugen

[1] https://www.spinics.net/lists/ceph-users/msg76020.html


Zitat von Work Ceph :


Hello guys!

Is it possible to restrict user access to a single image in an RBD pool? I
know that I can use namespaces, so users can only see images with a given
namespace. However, these users will still be able to create new RBD
images.

Is it possible to somehow block users from creating RBD images and only
work with the already existing ones?



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm only scheduling, not orchestrating daemons

2023-04-14 Thread Eugen Block

Hi,
I would probably stop the upgrade first, as it might be blocking
cephadm. Then try again to redeploy a daemon; if it still fails, check
the cephadm.log(s) on the respective servers as well as the active mgr
log.
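
A sketch of what that could look like, using one of the daemon names from
the output quoted below:

ceph orch upgrade stop
ceph orch daemon redeploy mds.mds01.ceph05.pqxmvt
ceph log last 100 info cephadm    # recent cephadm events from the active mgr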


Regards,
Eugen

Zitat von Thomas Widhalm :


Hi,

As you might know, I have a problem with MDS not starting. During  
the investigation with your help I found another issue that might be  
related.


I can schedule restarts, redeploys and reconfigures of services via cephadm
or the dashboard as much as I want, but the services won't react. I only see
the action being scheduled, and that's all.


2023-04-13T17:11:15.690698+ mgr.ceph04.qaexpv (mgr.74906907)  
37184 : cephadm [INF] Schedule redeploy daemon mds.mds01.ceph05.pqxmvt
2023-04-13T17:11:20.746743+ mgr.ceph04.qaexpv (mgr.74906907)  
37190 : cephadm [INF] Schedule redeploy daemon mds.mds01.ceph06.rrxmks
2023-04-13T17:11:24.971226+ mgr.ceph04.qaexpv (mgr.74906907)  
37195 : cephadm [INF] Schedule redeploy daemon mds.mds01.ceph07.omdisd



It's the same for other daemons/services. I changed placement rules,
scheduled changes, failed the mgr, even rebooted hosts. I was even
desperate enough to delete the services' files from the hosts before
rebooting, hoping I could trigger a manual redeploy.


All I see are the same MDS daemons stuck in "error" state. I removed them
via "ceph orch rm" but they are still there. When I reissue the
command, it fails saying that the service isn't there.


"ceph orch ps" still lists them.

mds.mds01.ceph03.xqwdjy  ceph03   error2d  
ago 2M-- 
mds.mds01.ceph04.hcmvae  ceph04   error2d  
ago 2d-- 
mds.mds01.ceph05.pqxmvt  ceph05   error2d  
ago 10M-- 
mds.mds01.ceph06.rrxmks  ceph06   error2d  
ago 10w-- 
mds.mds01.ceph07.omdisd  ceph07   error2d  
ago 3M-- 


Any idea how I can get rid of them? Or redeploy them?

Additionally I'm just in the middle of an upgrade.

{
"target_image":  
"quay.io/ceph/ceph@sha256:1161e35e4e02cf377c93b913ce78773f8413f5a8d7c5eaee4b4773a4f9dd6635",

"in_progress": true,
"which": "Upgrading all daemon types on all hosts",
"services_complete": [
"crash",
"mgr",
"mon",
"osd"
],
"progress": "18/40 daemons upgraded",
"message": "Upgrade paused",
"is_paused": true
}


I paused it on purpose to allow for manipulation of daemons.

Cheers,
Thomas



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: deploying Ceph using FQDN for MON / MDS Services

2023-04-14 Thread Lokendra Rathour
Hi Team,
There is one additional observation: mounting as a client works fine from one
of the Ceph nodes.

Command: sudo mount -t ceph :/ /mnt/imgs -o
name=foo,secret=AQABDzRkTaJCEhAAC7rC6E68ofwdfULnx6qX/VDA==

We are not passing the monitor address; instead, DNS SRV is configured as per:
https://docs.ceph.com/en/quincy/rados/configuration/mon-lookup-dns/

The mount works fine in this case.



But if we try to mount from another location, i.e. from another
VM/client (a non-Ceph node), we get this error:

mount -t ceph :/ /mnt/imgs -o
name=foo,secret=AQABDzRkTaJCEhAAC7rC6E68ofwULnx6qX/VDA== -v
mount: /mnt/image: mount point does not exist.

The documentation says that if we do not pass the monitor address, the client
tries to discover the monitors from the DNS servers, but in practice this is
not happening.
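
For comparison, this is roughly what the client's resolver needs to be able
to return for the monitor lookup (zone and host names below are placeholders
along the lines of the linked documentation; the service name defaults to
ceph-mon, see mon_dns_srv_name):

; example BIND zone snippet (placeholders)
_ceph-mon._tcp.storage.com. 3600 IN SRV 10 60 6789 storagenode1.storage.com.
_ceph-mon._tcp.storage.com. 3600 IN SRV 10 60 6789 storagenode2.storage.com.
_ceph-mon._tcp.storage.com. 3600 IN SRV 10 60 6789 storagenode3.storage.com.
; add matching records for port 3300 if clients should use msgr2

A quick way to confirm the failing client can actually see those records:
dig +short SRV _ceph-mon._tcp.storage.com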



On Tue, Apr 11, 2023 at 6:48 PM Lokendra Rathour 
wrote:

> Ceph version Quincy.
>
> But now I am able to resolve the issue.
>
> During mount i will not pass any monitor details, it will be
> auto-discovered via SRV.
>
> On Tue, Apr 11, 2023 at 6:09 PM Eugen Block  wrote:
>
>> What ceph version is this? Could it be this bug [1]? Although the
>> error message is different, not sure if it could be the same issue,
>> and I don't have anything to test ipv6 with.
>>
>> [1] https://tracker.ceph.com/issues/47300
>>
>> Zitat von Lokendra Rathour :
>>
>> > Hi All,
>> > Requesting any inputs around the issue raised.
>> >
>> > Best Regards,
>> > Lokendra
>> >
>> > On Tue, 24 Jan, 2023, 7:32 pm Lokendra Rathour, <
>> lokendrarath...@gmail.com>
>> > wrote:
>> >
>> >> Hi Team,
>> >>
>> >>
>> >>
>> >> We have a ceph cluster with 3 storage nodes:
>> >>
>> >> 1. storagenode1 - abcd:abcd:abcd::21
>> >>
>> >> 2. storagenode2 - abcd:abcd:abcd::22
>> >>
>> >> 3. storagenode3 - abcd:abcd:abcd::23
>> >>
>> >>
>> >>
>> >> The requirement is to mount ceph using the domain name of MON node:
>> >>
>> >> Note: we resolved the domain name via DNS server.
>> >>
>> >>
>> >> For this we are using the command:
>> >>
>> >> ```
>> >>
>> >> mount -t ceph [storagenode.storage.com]:6789:/  /backup -o
>> >> name=admin,secret=AQCM+8hjqzuZEhAAcuQc+onNKReq7MV+ykFirg==
>> >>
>> >> ```
>> >>
>> >>
>> >>
>> >> We are getting the following logs in /var/log/messages:
>> >>
>> >> ```
>> >>
>> >> Jan 24 17:23:17 localhost kernel: libceph: resolve '
>> >> storagenode.storage.com' (ret=-3): failed
>> >>
>> >> Jan 24 17:23:17 localhost kernel: libceph: parse_ips bad ip '
>> >> storagenode.storage.com:6789'
>> >>
>> >> ```
>> >>
>> >>
>> >>
>> >> We also tried mounting ceph storage using IP of MON which is working
>> fine.
>> >>
>> >>
>> >>
>> >> Query:
>> >>
>> >>
>> >> Could you please help us out with how we can mount ceph using FQDN.
>> >>
>> >>
>> >>
>> >> My /etc/ceph/ceph.conf is as follows:
>> >>
>> >> [global]
>> >>
>> >> ms bind ipv6 = true
>> >>
>> >> ms bind ipv4 = false
>> >>
>> >> mon initial members = storagenode1,storagenode2,storagenode3
>> >>
>> >> osd pool default crush rule = -1
>> >>
>> >> fsid = 7969b8a3-1df7-4eae-8ccf-2e5794de87fe
>> >>
>> >> mon host =
>> >>
>> [v2:[abcd:abcd:abcd::21]:3300,v1:[abcd:abcd:abcd::21]:6789],[v2:[abcd:abcd:abcd::22]:3300,v1:[abcd:abcd:abcd::22]:6789],[v2:[abcd:abcd:abcd::23]:3300,v1:[abcd:abcd:abcd::23]:6789]
>> >>
>> >> public network = abcd:abcd:abcd::/64
>> >>
>> >> cluster network = eff0:eff0:eff0::/64
>> >>
>> >>
>> >>
>> >> [osd]
>> >>
>> >> osd memory target = 4294967296
>> >>
>> >>
>> >>
>> >> [client.rgw.storagenode1.rgw0]
>> >>
>> >> host = storagenode1
>> >>
>> >> keyring = /var/lib/ceph/radosgw/ceph-rgw.storagenode1.rgw0/keyring
>> >>
>> >> log file = /var/log/ceph/ceph-rgw-storagenode1.rgw0.log
>> >>
>> >> rgw frontends = beast endpoint=[abcd:abcd:abcd::21]:8080
>> >>
>> >> rgw thread pool size = 512
>> >>
>> >> --
>> >> ~ Lokendra
>> >> skype: lokendrarathour
>> >>
>> >>
>> >>
>>
>
>
> --
> ~ Lokendra
> skype: lokendrarathour
>
>
>

-- 
~ Lokendra
skype: lokendrarathour
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Restrict user to an RBD image in a pool

2023-04-14 Thread Work Ceph
Hello guys!

Is it possible to restrict user access to a single image in an RBD pool? I
know that I can use namespaces, so users can only see images with a given
namespace. However, these users will still be able to create new RBD
images.

Is it possible to somehow block users from creating RBD images and only
work with the already existing ones?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph pg stuck - missing on 1 osd how to proceed

2023-04-14 Thread Eugen Block

Hi,
your cluster is in a backfilling state, so maybe just wait for the backfill
to finish? What is 'ceph -s' reporting? The PG could be backfilling to
a different OSD as well. You could query the PG to see more details
('ceph pg 8.2a6 query').
By the way, the PGs you show are huge (around 174 GB with more than
200k objects); depending on the disks you use, a PG split could help gain
more performance (if that is an issue for you).
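
For example, a couple of quick ways to pull the interesting parts out of the
query output (assuming jq is available):

ceph pg 8.2a6 query | jq '.recovery_state'
ceph pg 8.2a6 query | jq '{state: .state, up: .up, acting: .acting}'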


Regards,
Eugen

Zitat von xadhoo...@gmail.com:


Hi to all,
Using Ceph 17.2.5 I have 3 PGs in a stuck state.

ceph pg map 8.2a6
osdmap e32862 pg 8.2a6 (8.2a6) -> up [88,100,59] acting [59,100]

Looking at OSDs 88, 100 and 59 I got this:


ceph pg ls-by-osd osd.100 | grep 8.2a6
8.2a6   211004209089  00  174797925205
 0   0   7075   
active+undersized+degraded+remapped+backfilling21m
32862'1540291   32862:3387785   [88,100,59]p88  [59,100]p59   
2023-03-12T08:08:00.903727+  2023-03-12T08:08:00.903727+  
6839  queued for deep scrub


ceph pg ls-by-osd osd.59 | grep 8.2a6
8.2a6   211005209084  00  174798941087
 0   0   7076   
active+undersized+degraded+remapped+backfilling  22m
32862'1540292   32862:3387798   [88,100,59]p88  [59,100]p59   
2023-03-12T08:08:00.903727+  2023-03-12T08:08:00.903727+  
6839  queued for deep scrub


BUT:
ceph pg ls-by-osd osd.88 | grep 8.2a6  ---> returns nothing

The PG is missing there; how should I proceed?
Best regards



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io