[ceph-users] Can I delete rgw log entries?

2023-04-16 Thread Richard Bade
Hi Everyone,
I've been having trouble finding an answer to this question: is the content
of the .log pool actively used for anything, or is it just log output that
can be deleted?
In particular I was wondering about sync logs.
In my particular situation I had some tests of zone sync set up, but I have
now removed the secondary zone and its pools. My primary zone's log pool is
filled with thousands of objects like these:
data_log.71
data.full-sync.index.e2cf2c3e-7870-4fc4-8ab9-d78a17263b4f.47
meta.full-sync.index.7
datalog.sync-status.shard.e2cf2c3e-7870-4fc4-8ab9-d78a17263b4f.13
bucket.sync-status.f3113d30-ecd3-4873-8537-aa006e54b884:{bucketname}:default.623958784.455

I assume that because I'm no longer doing any sync I can delete all the
sync-related logs? Is anyone able to confirm this?
And what about while sync is running? Are these being actively written to
and read from, and therefore must be left alone?
It seems like these are more of a status than just a log and that
deleting them might confuse the sync process. If so, does that mean
that the log pool is not just output that can be removed as needed?
Are there perhaps other things in there that need to stay?
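For reference, this is roughly how I've been looking at these objects (a
sketch; the pool name assumes the default zone, so it may differ elsewhere):

  rados -p default.rgw.log ls | head    # sample the objects in the log pool
  radosgw-admin sync status             # is any multisite sync still configured?
  radosgw-admin datalog status          # per-shard data log markers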

Regards,
Richard
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: deploying Ceph using FQDN for MON / MDS Services

2023-04-16 Thread Lokendra Rathour
Hi Team,
The mount on the client side should not depend on the Ceph packages being
installed, but in this case of a DNS SRV-based mount we see that the
ceph-common utility is needed.
What could be the reason for this? Any input in this direction would be
helpful.
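
For reference, the DNS side looks roughly like this (a sketch; the domain and
host names follow the messages below, and the records are set up as per
https://docs.ceph.com/en/quincy/rados/configuration/mon-lookup-dns/):

  _ceph-mon._tcp.storage.com. 60 IN SRV 10 60 6789 storagenode1.storage.com.
  _ceph-mon._tcp.storage.com. 60 IN SRV 10 60 6789 storagenode2.storage.com.
  _ceph-mon._tcp.storage.com. 60 IN SRV 10 60 6789 storagenode3.storage.com.

Resolution from the client can be checked with:

  dig +short SRV _ceph-mon._tcp.storage.com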

Best Regards,
Lokendra


On Sun, Apr 16, 2023 at 10:11 AM Lokendra Rathour 
wrote:

> Hi .
> Any input will be of great help.
> Thanks once again.
> Lokendra
>
> On Fri, 14 Apr, 2023, 3:47 pm Lokendra Rathour, 
> wrote:
>
>> Hi Team,
>> there is one additional observation.
>> Mounting as a client works fine from one of the Ceph nodes.
>> Command *: sudo mount -t ceph :/ /mnt/imgs  -o
>> name=foo,secret=AQABDzRkTaJCEhAAC7rC6E68ofwdfULnx6qX/VDA== *
>>
>> *we are not passing the monitor address; instead, DNS SRV is configured
>> as per:*
>> https://docs.ceph.com/en/quincy/rados/configuration/mon-lookup-dns/
>>
>> mount works fine in this case.
>>
>> 
>>
>> But if we try to mount from another location, i.e. from another
>> VM/client (non-Ceph node),
>> we get the following error:
>>   mount -t  ceph :/ /mnt/imgs  -o
>> name=foo,secret=AQABDzRkTaJCEhAAC7rC6E68ofwULnx6qX/VDA== -v
>> *mount: /mnt/image: mount point does not exist.*
>>
>> the documentation says that if we do not pass the monitor address, the
>> client tries to discover the monitor address from the DNS servers, but in
>> practice that is not happening.
>>
>>
>>
>> On Tue, Apr 11, 2023 at 6:48 PM Lokendra Rathour <
>> lokendrarath...@gmail.com> wrote:
>>
>>> Ceph version Quincy.
>>>
>>> But now I am able to resolve the issue.
>>>
>>> During mount I do not pass any monitor details; they are
>>> auto-discovered via SRV.
>>>
>>> On Tue, Apr 11, 2023 at 6:09 PM Eugen Block  wrote:
>>>
 What ceph version is this? Could it be this bug [1]? Although the
 error message is different, not sure if it could be the same issue,
 and I don't have anything to test ipv6 with.

 [1] https://tracker.ceph.com/issues/47300

 Zitat von Lokendra Rathour :

 > Hi All,
 > Requesting any inputs around the issue raised.
 >
 > Best Regards,
 > Lokendra
 >
 > On Tue, 24 Jan, 2023, 7:32 pm Lokendra Rathour, <
 lokendrarath...@gmail.com>
 > wrote:
 >
 >> Hi Team,
 >>
 >>
 >>
 >> We have a ceph cluster with 3 storage nodes:
 >>
 >> 1. storagenode1 - abcd:abcd:abcd::21
 >>
 >> 2. storagenode2 - abcd:abcd:abcd::22
 >>
 >> 3. storagenode3 - abcd:abcd:abcd::23
 >>
 >>
 >>
 >> The requirement is to mount ceph using the domain name of MON node:
 >>
 >> Note: we resolved the domain name via DNS server.
 >>
 >>
 >> For this we are using the command:
 >>
 >> ```
 >>
 >> mount -t ceph [storagenode.storage.com]:6789:/  /backup -o
 >> name=admin,secret=AQCM+8hjqzuZEhAAcuQc+onNKReq7MV+ykFirg==
 >>
 >> ```
 >>
 >>
 >>
 >> We are getting the following logs in /var/log/messages:
 >>
 >> ```
 >>
 >> Jan 24 17:23:17 localhost kernel: libceph: resolve '
 >> storagenode.storage.com' (ret=-3): failed
 >>
 >> Jan 24 17:23:17 localhost kernel: libceph: parse_ips bad ip '
 >> storagenode.storage.com:6789'
 >>
 >> ```
 >>
 >>
 >>
 >> We also tried mounting ceph storage using IP of MON which is working
 fine.
 >>
 >>
 >>
 >> Query:
 >>
 >>
 >> Could you please help us out with how we can mount ceph using FQDN.
 >>
 >>
 >>
 >> My /etc/ceph/ceph.conf is as follows:
 >>
 >> [global]
 >>
 >> ms bind ipv6 = true
 >>
 >> ms bind ipv4 = false
 >>
 >> mon initial members = storagenode1,storagenode2,storagenode3
 >>
 >> osd pool default crush rule = -1
 >>
 >> fsid = 7969b8a3-1df7-4eae-8ccf-2e5794de87fe
 >>
 >> mon host =
 >>
 [v2:[abcd:abcd:abcd::21]:3300,v1:[abcd:abcd:abcd::21]:6789],[v2:[abcd:abcd:abcd::22]:3300,v1:[abcd:abcd:abcd::22]:6789],[v2:[abcd:abcd:abcd::23]:3300,v1:[abcd:abcd:abcd::23]:6789]
 >>
 >> public network = abcd:abcd:abcd::/64
 >>
 >> cluster network = eff0:eff0:eff0::/64
 >>
 >>
 >>
 >> [osd]
 >>
 >> osd memory target = 4294967296
 >>
 >>
 >>
 >> [client.rgw.storagenode1.rgw0]
 >>
 >> host = storagenode1
 >>
 >> keyring = /var/lib/ceph/radosgw/ceph-rgw.storagenode1.rgw0/keyring
 >>
 >> log file = /var/log/ceph/ceph-rgw-storagenode1.rgw0.log
 >>
 >> rgw frontends = beast endpoint=[abcd:abcd:abcd::21]:8080
 >>
 >> rgw thread pool size = 512
 >>
 >> --
 >> ~ Lokendra
 >> skype: lokendrarathour
 >>
 >>
 >>
 > ___
 > ceph-users mailing list -- ceph-users@ceph.io
 > To unsubscribe send an email to ceph-users-le...@ceph.io

 ___
 ceph-users mailing list -- ceph-users@ceph.io
 

[ceph-users] Ceph mon status stuck at "probing"

2023-04-16 Thread York Huang
Hi Ceph community,

I had a small lab cluster running Nautilus (ceph-ansible / containerized)
which functioned quite well. As part of an upgrade experiment, I replaced one
of the mons with an Octopus one (containerized as well; the OS was purged
before the mon deployment). The daemon seems to be working, but its status is
always "probing", and "ceph -s" shows that mon as down and out of quorum. I
checked and compared the output of "ceph daemon XXX mon_status" on all
monitors to make sure there is no network issue; the new mon seems to have the
correct monmap but stays stuck in the "probing" state.

I am not sure whether Ceph mons of different versions can co-exist. Any
suggestions for further steps? Thanks in advance.
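
For reference, these are the kinds of commands I have been comparing (a
sketch; the mon name is a placeholder, and with containerized mons the daemon
socket command is run inside the container):

  ceph mon dump                         # monmap as seen by the quorum
  ceph versions                         # which release each daemon runs
  ceph daemon mon.<name> mon_status     # state and monmap as seen by that mon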


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] unable to deploy ceph -- failed to read label for XXX No such file or directory

2023-04-16 Thread Radoslav Bodó

hello,

during basic experimentation I'm running into a weird situation when
adding OSDs to a test cluster. The test cluster is created as 3x Xen DomU
Debian Bookworm (test1-3), 4x CPU, 8 GB RAM, xvda root, xvdb swap, 4x
xvdj,k,l,m 20 GB (LVM volumes in Dom0, propagated via the Xen phy device) and
cleaned with `wipefs -a`


```
apt-get install cephadm ceph-common
cephadm bootstrap --mon-ip 10.0.0.101
ceph orch host add test2
ceph orch host add test3
```

when adding OSDs, the first host gets its OSDs created as expected, but
while creating OSDs on the second host the output gets weird: even when
adding each device separately, the output shows that `ceph orch` tries to
create multiple OSDs at once


```
root@test1:~# for xxx in j k l m; do ceph orch daemon add osd 
test2:/dev/xvd$xxx; done

Created osd(s) 0,1,2,3 on host 'test2'
Created osd(s) 0,1 on host 'test2'
Created osd(s) 2,3 on host 'test2'
Created osd(s) 1 on host 'test2'
```

the syslog on the test2 node shows errors


```
2023-04-16T20:57:02.528456+00:00 test2 bash[10426]: cephadm 
2023-04-16T20:57:01.389951+ mgr.test1.ucudzp (mgr.14206) 1691 : 
cephadm [INF] Found duplicate OSDs: osd.0 in status running on test1, 
osd.0 in status error on test2


2023-04-16T20:57:02.528748+00:00 test2 bash[10426]: cephadm
2023-04-16T20:57:01.391346+ mgr.test1.ucudzp (mgr.14206) 1692 : 
cephadm [INF] Removing daemon osd.0 from test2 -- ports []

2023-04-16T20:57:02.528943+00:00 test2 bash[10426]: cluster
2023-04-16T20:57:02.350564+ mon.test1 (mon.0) 743 : cluster [WRN] 
Health check failed: 2 failed cephadm daemon(s) (CEPHADM_FAILED_DAEMON)


2023-04-16T20:57:17.972962+00:00 test2 bash[20098]:  stderr: failed to 
read label for 
/dev/ceph-48f3646c-7070-4a37-b9a4-ed0a4a983965/osd-block-11a0dc2b-f8e1-4694-813f-2309ab6a5c1d: 
(2) No such file or directory
2023-04-16T20:57:17.973064+00:00 test2 bash[20098]:  stderr: 
2023-04-16T20:57:17.962+ 7fad2451c540 -1 
bluestore(/dev/ceph-48f3646c-7070-4a37-b9a4-ed0a4a983965/osd-block-11a0dc2b-f8e1-4694-813f-2309ab6a5c1d) 
_read_bdev_label failed to open 
/dev/ceph-48f3646c-7070-4a37-b9a4-ed0a4a983965/osd-block-11a0dc2b-f8e1-4694-813f-2309ab6a5c1d: 
(2) No such file or directory
2023-04-16T20:57:17.973181+00:00 test2 bash[20098]: --> Failed to 
activate via lvm: command returned non-zero exit status: 1
2023-04-16T20:57:17.973278+00:00 test2 bash[20098]: --> Failed to 
activate via simple: 'Namespace' object has no attribute 'json_config'
2023-04-16T20:57:17.973368+00:00 test2 bash[20098]: --> Failed to 
activate any OSD(s)

```

the ceph and cephadm binaries are installed from debian bookworm

```
ii  ceph-common16.2.11+ds-2 amd64common utilities to mount 
and interact with a ceph storage cluster
ii  cephadm16.2.11+ds-2 amd64utility to bootstrap ceph 
daemons with systemd and containers

```

management session script can be found at https://pastebin.com/raw/FiX7DMHS


none of the symptoms I found by googling helped me understand why this
situation is happening, nor how to troubleshoot or debug it. I could
understand it if the nodes were too low on RAM to run this experiment, but
the behaviour does not really look like an OOM issue.
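
for completeness, this is roughly how I inspect and re-wipe the devices
between attempts (a sketch; device names are from my setup and `zap` is
destructive)

```
ceph orch device ls test2                       # what the orchestrator sees on test2
cephadm ceph-volume -- lvm list                 # run on test2 itself, lists LVs created for OSDs
ceph orch device zap test2 /dev/xvdj --force    # wipe the device before retrying (destructive)
```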


any idea would be appreciated

thanks
bodik
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Dead node (watcher) won't timeout on RBD

2023-04-16 Thread Ilya Dryomov
On Sat, Apr 15, 2023 at 4:58 PM Max Boone  wrote:
>
>
> After a critical node failure on my lab cluster, which won't come
> back up and is still down, the RBD objects are still being watched
> / mounted according to ceph. I can't shell to the node to rbd unbind
> them as the node is down. I am absolutely certain that nothing is
> using these images and they don't have snapshots either (and this IP
> is not even remotely close to those of the monitors in the
> cluster). I blocked the IP using ceph osd blocklist add but after 30
> minutes, they are still being watched. Them being watched (they are
> RWO ceph-csi volumes) prevents me from re-using them in the cluster.
> As far as I'm aware, ceph should remove the watchers after 30 minutes
> and they've been blocklisted for hours now.

Hi Max,

A couple of general points:

- watch timeout is 30 seconds, not 30 minutes
- watcher IP doesn't have to match that of any of the monitors

> root@node0:~# rbd status 
> kubernetes/csi-vol-e6a07ccd-93f6-4c47-a948-201501440fff
> Watchers:
> watcher=10.0.0.103:0/992994811 client.1634081 cookie=139772597209280
> root@node0:~# rbd snap list 
> kubernetes/csi-vol-e6a07ccd-93f6-4c47-a948-201501440fff
> root@node0:~# rbd info kubernetes/csi-vol-e6a07ccd-93f6-4c47-a948-201501440fff
> rbd image 'csi-vol-e6a07ccd-93f6-4c47-a948-201501440fff':
> size 10 GiB in 2560 objects
> order 22 (4 MiB objects)
> snapshot_count: 0
> id: 4ff5353b865e1
> block_name_prefix: rbd_data.4ff5353b865e1
> format: 2
> features: layering
> op_features:
> flags:
> create_timestamp: Fri Mar 31 14:46:51 2023
> access_timestamp: Fri Mar 31 14:46:51 2023
> modify_timestamp: Fri Mar 31 14:46:51 2023
> root@node0:~# rados -p kubernetes listwatchers rbd_header.4ff5353b865e1
> watcher=10.0.0.103:0/992994811 client.1634081 cookie=139772597209280
> root@node0:~# ceph osd blocklist ls
> 10.0.0.103:0/0 2023-04-16T13:58:34.854232+0200
> listed 1 entries
> root@node0:~# ceph daemon osd.0 config get osd_client_watch_timeout
> {
> "osd_client_watch_timeout": "30"
> }
>
> Is it possible to kick a watcher out manually, or is there not much
> I can do here besides shutting down the entire cluster (or OSDs) and
> getting them back up? If it is a bug, I'm happy to help figure out
> its root cause and see if I can help write a fix. Cheers, Max.

You may have hit https://tracker.ceph.com/issues/58120.

Try restarting the OSD that is holding the header object.  To determine
the OSD, run "ceph osd map kubernetes rbd_header.4ff5353b865e1".  The
output should end with something like "acting ([X, Y, Z], pX)", where X,
Y and Z are numbers.  X is the OSD you want to restart.
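
For example (a sketch, with made-up epoch/PG/OSD numbers):

  $ ceph osd map kubernetes rbd_header.4ff5353b865e1
  osdmap e1234 pool 'kubernetes' (5) object 'rbd_header.4ff5353b865e1' -> pg 5.1a2b3c4d (5.4d) -> up ([3,7,1], p3) acting ([3,7,1], p3)

Here the primary is osd.3, so that is the one to restart, e.g. with
"ceph orch daemon restart osd.3" (cephadm) or "systemctl restart ceph-osd@3"
(package-based installs).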

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs remain not in after update to v17

2023-04-16 Thread Konstantin Shalygin
Hi,

This PR is for the main branch and has not currently been backported to any
other branch.


k
Sent from my iPhone

> On 15 Apr 2023, at 21:00, Alexandre Becholey  wrote:
> 
> Hi,
> 
> Thank you for your answer, yes this seems to be exactly my issue. The pull 
> request related to the issue is this one: 
> https://github.com/ceph/ceph/pull/49199 and it is not (yet?) merged into the 
> Quincy release. Hopefully this will happen before the next major release, 
> because I cannot run any `ceph orch` command as they hang.
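> 
> For reference, a quick way I check whether a given commit has reached a
> Quincy release, against a ceph.git checkout (a sketch; the SHA is a
> placeholder for one of the PR's commits):
> 
> git tag --contains <commit-sha> | grep ^v17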
> 
> Kind regards,
> Alexandre
> 
> 
> --- Original Message ---
>> On Saturday, April 15th, 2023 at 6:26 PM, Ramin Najjarbashi 
>>  wrote:
>> 
>> 
>> Hi
>> I think the issue you are experiencing may be related to a bug that has been 
>> reported in the Ceph project. Specifically, the issue is documented in 
>> https://tracker.ceph.com/issues/58156, and a pull request has been submitted 
>> and merged in https://github.com/ceph/ceph/pull/44090.
>> 
>>> On Fri, Apr 14, 2023 at 8:17 PM Alexandre Becholey  wrote:
>>> 
>>> Dear Ceph Users,
>>> 
>>> I have a small ceph cluster for VMs on my local machine. It used to be 
>>> installed with the system packages and I migrated it to docker following 
>>> the documentation. It worked OK until I migrated from v16 to v17 a few 
>>> months ago. Now the OSDs remain "not in" as shown in the status:
>>> 
>>> # ceph -s
>>> cluster:
>>> id: abef2e91-cd07-4359-b457-f0f8dc753dfa
>>> health: HEALTH_WARN
>>> 6 stray daemon(s) not managed by cephadm
>>> 1 stray host(s) with 6 daemon(s) not managed by cephadm
>>> 2 devices (4 osds) down
>>> 4 osds down
>>> 1 host (4 osds) down
>>> 1 root (4 osds) down
>>> Reduced data availability: 129 pgs inactive
>>> 
>>> services:
>>> mon: 1 daemons, quorum bjorn (age 8m)
>>> mgr: bjorn(active, since 8m)
>>> osd: 4 osds: 0 up (since 4w), 4 in (since 4w)
>>> 
>>> data:
>>> pools: 2 pools, 129 pgs
>>> objects: 0 objects, 0 B
>>> usage: 1.8 TiB used, 1.8 TiB / 3.6 TiB avail
>>> pgs: 100.000% pgs unknown
>>> 129 unknown
>>> 
>>> I can see some network communication between the OSDs and the monitor and 
>>> the OSDs are running:
>>> 
>>> # docker ps -a
>>> CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
>>> f8fbe8177a63 quay.io/ceph/ceph:v17 "/usr/bin/ceph-osd -…" 9 minutes ago Up 
>>> 9 minutes ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-2
>>> 6768ec871404 quay.io/ceph/ceph:v17 "/usr/bin/ceph-osd -…" 9 minutes ago Up 
>>> 9 minutes ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-1
>>> ff82f84504d5 quay.io/ceph/ceph:v17 "/usr/bin/ceph-osd -…" 9 minutes ago Up 
>>> 9 minutes ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-0
>>> 4c89e50ce974 quay.io/ceph/ceph:v17 "/usr/bin/ceph-osd -…" 9 minutes ago Up 
>>> 9 minutes ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-osd-3
>>> fe0b6089edda quay.io/ceph/ceph:v17 "/usr/bin/ceph-mon -…" 9 minutes ago Up 
>>> 9 minutes ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-mon-bjorn
>>> f76ac9dcdd6d quay.io/ceph/ceph:v17 "/usr/bin/ceph-mgr -…" 9 minutes ago Up 
>>> 9 minutes ceph-abef2e91-cd07-4359-b457-f0f8dc753dfa-mgr-bjorn
>>> 
>>> However, when I try to use any `ceph orch` commands, they hang. I can also 
>>> see some entries in the OSD blocklist:
>>> 
>>> # ceph osd blocklist ls
>>> 10.99.0.13:6833/3770763474 2023-04-13T08:17:38.885128+
>>> 10.99.0.13:6832/3770763474 2023-04-13T08:17:38.885128+
>>> 10.99.0.13:0/2634718754 2023-04-13T08:17:38.885128+
>>> 10.99.0.13:0/1103315748 2023-04-13T08:17:38.885128+
>>> listed 4 entries
>>> 
>>> The first two entries correspond to the manager process. `ceph osd 
>>> blocked-by` does not show anything.
>>> 
>>> I think I might have forgotten to set the `ceph osd require-osd-release 
>>> ...` because 14 is written in 
>>> `/var/lib/ceph//osd.?/require_osd_release`. If I try to do it now, the 
>>> monitor hits an abort:
>>> 
>>> debug 0> 2023-04-12T08:43:27.788+ 7f0fcf2aa700 -1 *** Caught signal 
>>> (Aborted) **
>>> in thread 7f0fcf2aa700 thread_name:ms_dispatch
>>> ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy 
>>> (stable)
>>> 1: /lib64/libpthread.so.0(+0x12cf0) [0x7f0fd94bbcf0]
>>> 2: gsignal()
>>> 3: abort()
>>> 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char 
>>> const*)+0x18f) [0x7f0fdb5124e3]
>>> 5: /usr/lib64/ceph/libceph-common.so.2(+0x26a64f) [0x7f0fdb51264f]
>>> 6: (OSDMonitor::prepare_command_impl(boost::intrusive_ptr, 
>>> std::map, 
>>> std::allocator >, boost::variant>> 7: (OSDMonitor::prepare_command(boost::intrusive_ptr)+0x38d) 
>>> [0x562719cb127d]
>>> 8: (OSDMonitor::prepare_update(boost::intrusive_ptr)+0x17b) 
>>> [0x562719cb18cb]
>>> 9: (PaxosService::dispatch(boost::intrusive_ptr)+0x2ce) 
>>> [0x562719c20ade]
>>> 10: (Monitor::handle_command(boost::intrusive_ptr)+0x1ebb) 
>>> [0x562719ab9f6b]
>>> 11: (Monitor::dispatch_op(boost::intrusive_ptr)+0x9f2) 
>>> [0x562719abe152]
>>> 12: (Monitor::_ms_dispatch(Message*)+0x406) [0x562719abf066]
>>> 13: