[ceph-users] Re: deploying Ceph using FQDN for MON / MDS Services

2023-02-02 Thread Lokendra Rathour
Hi Robert and Team,



Thank you for the help. We had previously referred to the link:
https://docs.ceph.com/en/quincy/rados/configuration/mon-lookup-dns/
But we were not able to configure mon_dns_srv_name correctly.



We found the following link:
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/configuration_guide/ceph-monitor-configuration



It gives a little more information about the DNS lookup.



After following the link, we updated the ceph.conf as follows:
```
[root@storagenode3 ~]# cat /etc/ceph/ceph.conf
[global]
ms bind ipv6 = true
ms bind ipv4 = false
mon initial members = storagenode1,storagenode2,storagenode3
osd pool default crush rule = -1
mon dns srv name = ceph-mon
fsid = ce479912-a277-45b6-87b1-203d3e43d776
public network = abcd:abcd:abcd::/64
cluster network = eff0:eff0:eff0::/64



[osd]
osd memory target = 4294967296



[client.rgw.storagenode3.rgw0]
host = storagenode3
keyring = /var/lib/ceph/radosgw/ceph-rgw.storagenode3.rgw0/keyring
log file = /var/log/ceph/ceph-rgw-storagenode3.rgw0.log
rgw frontends = beast endpoint=[abcd:abcd:abcd::23]:8080
rgw thread pool size = 512



[root@storagenode3 ~]#
```

We also updated the DNS server as follows:

```
storagenode1.storage.com  IN  AAAA  abcd:abcd:abcd::21
storagenode2.storage.com  IN  AAAA  abcd:abcd:abcd::22
storagenode3.storage.com  IN  AAAA  abcd:abcd:abcd::23



_ceph-mon._tcp.storage.com 60 IN SRV 10 60 6789 storagenode1.storage.com
_ceph-mon._tcp.storage.com 60 IN SRV 10 60 6789 storagenode2.storage.com
_ceph-mon._tcp.storage.com 60 IN SRV 10 60 6789 storagenode3.storage.com
_ceph-mon._tcp.storage.com 60 IN SRV 10 60 3300 storagenode1.storage.com
_ceph-mon._tcp.storage.com 60 IN SRV 10 60 3300 storagenode2.storage.com
_ceph-mon._tcp.storage.com 60 IN SRV 10 60 3300 storagenode3.storage.com


```

But when we run the command ceph -s, we get the following error:

```
[root@storagenode3 ~]# ceph -s
unable to get monitor info from DNS SRV with service name: ceph-mon
2023-02-02T15:18:14.098+0530 7f1313a58700 -1 failed for service
_ceph-mon._tcp
2023-02-02T15:18:14.098+0530 7f1313a58700 -1 monclient:
get_monmap_and_config cannot identify monitors to contact
[errno 2] RADOS object not found (error connecting to the cluster)
[root@storagenode3 ~]#
```

Could you please help us configure the cluster to use mon_dns_srv_name
correctly?
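
As a sanity check (a rough sketch, assuming the zone is storage.com as in the
records above), the SRV lookup can be verified from the storage node:

```
# the fully-qualified record should resolve
dig +short SRV _ceph-mon._tcp.storage.com

# ceph only queries "_ceph-mon._tcp" (see the error above), so the resolver
# has to supply the zone via its search list, e.g. in /etc/resolv.conf:
#   search storage.com
dig +short SRV _ceph-mon._tcp +search
```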



On Wed, Jan 25, 2023 at 9:06 PM John Mulligan wrote:

> On Tuesday, January 24, 2023 9:02:41 AM EST Lokendra Rathour wrote:
> > Hi Team,
> >
> >
> >
> > We have a ceph cluster with 3 storage nodes:
> >
> > 1. storagenode1 - abcd:abcd:abcd::21
> >
> > 2. storagenode2 - abcd:abcd:abcd::22
> >
> > 3. storagenode3 - abcd:abcd:abcd::23
> >
> >
> >
> > The requirement is to mount ceph using the domain name of MON node:
> >
> > Note: we resolved the domain name via DNS server.
> >
> >
> > For this we are using the command:
> >
> > ```
> >
> > mount -t ceph [storagenode.storage.com]:6789:/  /backup -o
> > name=admin,secret=AQCM+8hjqzuZEhAAcuQc+onNKReq7MV+ykFirg==
> >
> > ```
> >
> >
> >
> > We are getting the following logs in /var/log/messages:
> >
> > ```
> >
> > Jan 24 17:23:17 localhost kernel: libceph: resolve 'storagenode.storage.com'
> > (ret=-3): failed
> >
> > Jan 24 17:23:17 localhost kernel: libceph: parse_ips bad ip
> > 'storagenode.storage.com:6789'
> >
> > ```
> >
>
>
> I saw a similar log message recently when I had forgotten to install the
> ceph mount helper.
> Check to see if you have a binary 'mount.ceph' on the system. If you don't,
> try to install it from packages. On Fedora I needed to install 'ceph-common'.
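>
> A quick way to check (a minimal sketch, not from the original thread; the
> package name assumes an RPM-based distro, and on Debian/Ubuntu the helper
> also ships in ceph-common):
>
> ```
> # is the cephfs mount helper present?
> command -v mount.ceph || echo "mount.ceph not found"
> # install it (Fedora / EL family)
> dnf install ceph-common
> ```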
>
>
> >
> >
> > We also tried mounting ceph storage using IP of MON which is working
> fine.
> >
> >
> >
> > Query:
> >
> >
> > Could you please help us out with how we can mount ceph using FQDN.
> >
> >
> >
> > My /etc/ceph/ceph.conf is as follows:
> >
> > [global]
> >
> > ms bind ipv6 = true
> >
> > ms bind ipv4 = false
> >
> > mon initial members = storagenode1,storagenode2,storagenode3
> >
> > osd pool default crush rule = -1
> >
> > fsid = 7969b8a3-1df7-4eae-8ccf-2e5794de87fe
> >
> > mon host =
> > [v2:[abcd:abcd:abcd::21]:3300,v1:[abcd:abcd:abcd::21]:6789],[v2:[abcd:abcd:abcd::22]:3300,v1:[abcd:abcd:abcd::22]:6789],[v2:[abcd:abcd:abcd::23]:3300,v1:[abcd:abcd:abcd::23]:6789]
> >
> > public network = abcd:abcd:abcd::/64
> >
> > cluster network = eff0:eff0:eff0::/64
> >
> >
> >
> > [osd]
> >
> > osd memory target = 4294967296
> >
> >
> >
> > [client.rgw.storagenode1.rgw0]
> >
> > host = storagenode1
> >
> > keyring = /var/lib/ceph/radosgw/ceph-rgw.storagenode1.rgw0/keyring
> >
> > log file = /var/log/ceph/ceph-rgw-storagenode1.rgw0.log
> >
> > rgw frontends = beast endpoint=[abcd:abcd:abcd::21]:8080
> >
> > rgw thread pool size = 512
>
>
>
>
>

-- 
~ Lokendra
skype: lokendrarathour
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph

[ceph-users] Re: Telemetry service is temporarily down

2023-02-02 Thread Yaarit Hatuka
Hi everyone,

Our telemetry endpoints are temporarily unavailable due to network issues.
We apologize for the inconvenience. We will update once the service is
restored.

Yaarit


On Fri, Jan 13, 2023 at 12:05 PM Yaarit Hatuka  wrote:

> Hi everyone,
>
> Our telemetry service is up and running again.
> Thanks Adam Kraitman and Dan Mick for restoring the service.
>
> We thank you for your patience and appreciate your contribution to the
> project!
>
> Thanks,
> Yaarit
>
> On Tue, Jan 3, 2023 at 3:14 PM Yaarit Hatuka  wrote:
>
>> Hi everyone,
>>
>> We are having some infrastructure issues with our telemetry backend, and
>> we are working on fixing it.
>> Thanks Jan Horacek for opening this issue
>>  [1]. We will update once the
>> service is back up.
>> We are sorry for any inconvenience you may be experiencing, and
>> appreciate your patience.
>>
>> Thanks,
>> Yaarit
>>
>> [1] https://tracker.ceph.com/issues/58371
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] Re: Ceph Upgrade path

2023-02-02 Thread Beaman, Joshua (Contractor)
We had a similar challenge getting from Nautilus (ceph-deploy) / Xenial (ubuntu 
16.04) to Pacific (cephadm) / Focal (ubuntu 20.04), as the pacific packages 
were not available for Xenial and Nautilus was not available for Focal.  Our 
method was to upgrade and cephadm adopt mons/mgrs/rgws all before making any 
changes to OSD nodes as follows:


  1.  Rebuild (one at a time) all 5 mon/mgr/rgw hosts as Bionic (ubuntu 18.04) 
and re-add to cluster as Nautilus with ceph-deploy
  2.  Upgrade mon/mgr/rgw services from Nautilus to Pacific with apt (standard 
package manager) method
  3.  Follow cephadm adopt procedures for mon/mgr/rgw hosts (see the sketch 
below this list) - https://docs.ceph.com/en/pacific/cephadm/adoption/
  4.  Rebuild (one at a time) all 5 mon/mgr/rgw hosts as Focal and re-add to 
cluster as Pacific with orchestrator / cephadm
  5.  Drain and then rebuild all OSD hosts 1-4 at a time as Focal and re-add to 
cluster as Pacific
 *   
https://docs.ceph.com/en/pacific/rados/operations/bluestore-migration/#migration-process
 except using orchestrator / cephadm to build the “$NEWHOST”
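
For step 3, the per-daemon adoption is essentially the following (a rough 
sketch of the documented adoption commands, not our exact transcript; it 
assumes legacy daemon names based on the short hostname and that cephadm plus 
a container runtime are already installed on each host):

  cephadm adopt --style legacy --name mon.$(hostname -s)
  cephadm adopt --style legacy --name mgr.$(hostname -s)
  # OSDs (step 5) follow the same pattern, e.g. cephadm adopt --style legacy --name osd.12
  # (rgw daemons are generally redeployed through the orchestrator per the adoption doc)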

Thank you,
Josh Beaman

From: Fox, Kevin M 
Date: Wednesday, February 1, 2023 at 11:11 AM
To: Iztok Gregori , ceph-users@ceph.io 

Subject: [EXTERNAL] [ceph-users] Re: Ceph Upgrade path
We successfully did ceph-deploy+octopus+centos7 -> (ceph-deploy 
unsupported)+octopus+centos8stream (using leap) -> (ceph-deploy 
unsupported)+pacific+centos8stream  -> cephadm+pacific+centos8stream

Everything was done in place. Leap was tested repeatedly until the 
procedure/side effects were very well known.

We also did s/centos8stream/rocky8/ successfully.

Thanks,
Kevin


From: Iztok Gregori 
Sent: Wednesday, February 1, 2023 3:51 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Ceph Upgrade path



Hi to all!

We are running a Ceph cluster (Octopus) on (99%) CentOS 7 (deployed at
the time with ceph-deploy) and we would like to upgrade it. As far as I
know for Pacific (and later releases) there aren't packages for CentOS 7
distribution (at least not on download.ceph.com), so we need to upgrade
(change) not only Ceph but also the distribution.

What is the recommended path to do so?

We could upgrade (reinstall) all the nodes to Rocky 8 and then upgrade
Ceph to Quincy, but we would be "stuck" with "not the latest" distribution
and would probably have to upgrade (reinstall) again in the near future.

Our second idea is to leverage cephadm (which we would like to
implement) and switch from rpms to containers, but I don't have a clear
vision of how to do it. I was thinking to:

1. install a new monitor/manager with Rocky 9.
2. prepare the node for cephadm.
3. start the manager/monitor containers on that node.
4. repeat for the other monitors.
5. repeat for the OSD servers.

I'm not sure how to execute points 2 and 3. The documentation says
how to bootstrap a NEW cluster and how to ADOPT an existing one, but our
situation is a hybrid (or in my mind it is...).

I also cannot adopt my current cluster with cephadm because 30% of our
OSDs are still on Filestore. My intention was to drain them, reinstall
them and then adopt them, but I would like to avoid multiple
reinstallations if not necessary. (In my mind all the OSD servers will be
drained before being reinstalled, just to be sure to have a "fresh" start.)

Have you any ideas and/or advice to give us?


Thanks a lot!
Iztok

P.S. I saw that the cephadm script doesn't support Rocky. I can modify
it to do so and it should work, but is there a plan to officially
support it?



--
Iztok Gregori
ICT Systems and Services
Elettra - Sincrotrone Trieste S.C.p.A.
Telephone: +39 040 3758948
http://www.elettra.eu/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] 'ceph orch upgrade...' causes an rbd outage on a proxmox cluster

2023-02-02 Thread Pierre BELLEMAIN
Hi everyone, 
(sorry for the spam, apparently I was not subscribed to the ml) 

I have a ceph test cluster and a proxmox test cluster (to try the upgrade in 
test before prod). 
My ceph cluster is made up of three servers running debian 11, with two 
separate networks (cluster_network and public_network, in VLANs), 
on ceph version 16.2.10 (cephadm with docker). 
Each server has one MGR, one MON and 8 OSDs. 
cluster: 
id: xxx 
health: HEALTH_OK 

services: 
mon: 3 daemons, quorum ceph01,ceph03,ceph02 (age 2h) 
mgr: ceph03(active, since 77m), standbys: ceph01, ceph02 
osd: 24 osds: 24 up (since 7w), 24 in (since 6M) 

data: 
pools: 3 pools, 65 pgs 
objects: 29.13k objects, 113 GiB 
usage: 344 GiB used, 52 TiB / 52 TiB avail 
pgs: 65 active+clean 

io: 
client: 1.3 KiB/s wr, 0 op/s rd, 0 op/s wr 

The proxmox cluster is also made up of 3 servers running proxmox 7.2-7 (with 
proxmox ceph pacific which is on 16.2.9 version). The ceph storage used is RBD 
(on the ceph public_network). I added the RBD datastores simply via the GUI. 

So far so good. I have several VMs, on each of the proxmox. 

When I update ceph to 16.2.11, that's where things go wrong. 
I don't like when the update does everything for me without control, so I did a 
"staggered upgrade", following the official procedure 
(https://docs.ceph.com/en/pacific/cephadm/upgrade/#staggered-upgrade). As the 
version I'm starting from doesn't support staggered upgrade, I followed the 
procedure 
(https://docs.ceph.com/en/pacific/cephadm/upgrade/#upgrading-to-a-version-that-supports-staggered-upgrade-from-one-that-doesn-t).
 
When I do the "ceph orch redeploy" of the two standby MGRs, everything is fine. 
I do the "sudo ceph mgr fail", everything is fine (it switches well to an mgr 
which was standby, so I get an MGR 16.2.11). 
However, when I do the "sudo ceph orch upgrade start --image 
quay.io/ceph/ceph:v16.2.11 --daemon-types mgr", it updates me the last MGR 
which was not updated (so far everything is still fine), but it does a last 
restart of all the MGRs to finish, and there, the proxmox visibly loses the RBD 
and turns off all my VMs. 
Here is the message in the proxmox syslog: 
Feb 2 16:20:52 pmox01 QEMU[436706]: terminate called after throwing an instance 
of 'std::system_error' 
Feb 2 16:20:52 pmox01 QEMU[436706]: what(): Resource deadlock avoided 
Feb 2 16:20:52 pmox01 kernel: [17038607.686686] vmbr0: port 2(tap102i0) entered 
disabled state 
Feb 2 16:20:52 pmox01 kernel: [17038607.779049] vmbr0: port 2(tap102i0) entered 
disabled state 
Feb 2 16:20:52 pmox01 systemd[1]: 102.scope: Succeeded. 
Feb 2 16:20:52 pmox01 systemd[1]: 102.scope: Consumed 43.136s CPU time. 
Feb 2 16:20:53 pmox01 qmeventd[446872]: Starting cleanup for 102 
Feb 2 16:20:53 pmox01 qmeventd[446872]: Finished cleanup for 102 

For ceph, everything is fine, it does the update, and tells me everything is OK 
in the end. 
Ceph is now on 16.2.11 and the health is OK. 

When I downgrade the MGRs again, I hit the problem again, and when I run the 
procedure again, I still have the problem. It's very reproducible. 
According to my tests, the "sudo ceph orch upgrade" command always gives me 
trouble, even when trying a real staggered upgrade from and to version 16.2.11 
with the command: 
sudo ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.11 --daemon-types 
mgr --hosts ceph01 --limit 1 
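
As a side note (not part of the documented procedure), the per-daemon versions 
at each step can be checked with something like: 

sudo ceph versions 
sudo ceph orch ps --daemon-type mgr 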

Does anyone have an idea? 

Thank you everyone ! 
Pierre. 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Inconsistency in rados ls

2023-02-02 Thread Eugen Block

Hi,


I have a cluster with approximately one billion objects and when I run a PG
query, it shows that I have 27,000 objects per PG.


Which query is that? Can you provide more details about that cluster and pool?

However, when I run the same command per pg, the results are much
less, with only 20 million objects being reported. For example,
"rados -p <pool> --pgid 1.xx ls | wc -l" shows only three objects in
the specified PG.


It seems like your PG and object distribution might not be balanced
very well. Did you check each PG of that pool? The individual PGs'
numbers should add up to the total number of objects. Here's a quick
example from an almost empty pool with 8 PGs:


storage01:~ # ceph pg ls-by-pool volumes | awk '{print $1" - "$2}'
PG - OBJECTS
3.0 - 2
3.1 - 0
3.2 - 2
3.3 - 1
3.4 - 3
3.5 - 0
3.6 - 2
3.7 - 0

and it sums up to 10 objects which matches the total stats:

storage01:~ # ceph df | grep -E "OBJECTS|volumes"
POOL     ID  PGS  STORED  OBJECTS  USED     %USED  MAX AVAIL
volumes   3    8   379 B       10  960 KiB      0    7.5 GiB
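
For a pool with many PGs, the per-PG counts can be summed directly and
compared to the pool total (a rough sketch; it assumes the second column of
'ceph pg ls-by-pool' is OBJECTS as in the header above, and the regex just
selects the PG-id lines):

storage01:~ # ceph pg ls-by-pool volumes | awk '$1 ~ /^[0-9]+\./ {sum+=$2} END {print sum}'
10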

Regards,
Eugen

Quoting Ramin Najjarbashi:


Hi
I hope this email finds you well. I am reaching out to you because I have
encountered an issue with my CEPH Bluestore cluster and I am seeking your
assistance.
I have a cluster with approximately one billion objects and when I run a PG
query, it shows that I have 27,000 objects per PG.
I have run the following command: "rados -p <pool> ls | wc -l" which
returns the correct number of one billion objects. However, when I run the
same command per pg, the results are much less, with only 20 million
objects being reported. For example, "rados -p <pool> --pgid 1.xx ls | wc
-l" shows only three objects in the specified PG.
This is a significant discrepancy and I am concerned about the integrity of
my data.
Do you have any idea about this discrepancy?

p.s:
I have a total of 30 million objects in a single bucket and versioning has
not been enabled for this particular bucket.

Thank you for your time and I look forward to your response.

Best regards,
Ramin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: January Ceph Science Virtual User Group

2023-02-02 Thread Konstantin Shalygin
Hi Marc,

Why do you need to upgrade every year? 
Yes, it is scalable - you can add new racks without a new version or a new 
distro - that is how big clusters live.


k

Sent from my iPhone

> On 2 Feb 2023, at 19:09, Marc  wrote:
> 
> 
>> 
>> 
>> https://www.youtube.com/playlist?list=PLrBUGiINAakM3d4bw6Rb7EZUcLd98iaWG
>> 
> 
> Interesting to hear about the container environment not being able to scale 
> too well. 
> 
> And listening to this I started to wonder also about the current release 
> cycle. I can remember the discussion about this, but to me it looks like 
> almost all big clusters can't keep up with upgrading every year. I am not 
> even looking forward to upgrading my tiny cluster.
> 
> Yet even on the ceph.com home page, one of the key features advertised is 
> scalability.
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: January Ceph Science Virtual User Group

2023-02-02 Thread Marc
> 
> https://www.youtube.com/playlist?list=PLrBUGiINAakM3d4bw6Rb7EZUcLd98iaWG
> 

Interesting to hear about the container environment not being able to scale too well. 

And listening to this I started to wonder also about the current release cycle. 
I can remember the discussion about this, but to me it looks like almost all 
big clusters can't keep up with upgrading every year. I am not even looking 
forward to upgrading my tiny cluster.

Yet even on the ceph.com home page, one of the key features advertised is 
scalability.


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io