[ceph-users] Re: Help needed please ! Filesystem became read-only !

2024-06-04 Thread nbarbier
First, thanks Xiubo for your feedback !

To go further on the points raised by Sake:
- How does this happen? -> There were no preliminary signs before the incident.

- Is this avoidable? -> Good question, I'd also like to know how!

- How to fix the issue? -> So far, no fix or workaround from what I have read. I
am very interested in finding a way to get the storage running again; for now
our cluster is out of order, as it's no longer possible to write to it (good
to know that the data is still readable, by the way!). I'm not a Ceph guru, so I
don't want to play with the settings/parameters, as the result could be even
worse, but help would be greatly appreciated to get the system available again!

- Should you use cephfs with reef? -> Well, from my experience, not for
production

Thanks to everyone who helped me or will help me find a solution!
Nicolas
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Adding new OSDs - also adding PGs?

2024-06-04 Thread Wesley Dillingham
It depends on the cluster. In general I would say if your PG count is
already good in terms of PG-per-OSD (say between 100 and 200 each) add
capacity and then re-evaluate your PG count after.

If you have a lot of time before the gear will be racked and could undergo
some PG splits before the new gear is integrated you may want to get that
work done now.
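
A quick way to sanity-check that before and after (the pool name below is just
a placeholder, and autoscale-status needs the pg_autoscaler module enabled):

ceph osd df tree                      # PGs per OSD is the PGS column
ceph osd pool autoscale-status        # what the autoscaler thinks each pool needs
ceph osd pool set <pool> pg_num 2048  # example split, if you do it ahead of time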

Respectfully,

*Wes Dillingham*
LinkedIn 
w...@wesdillingham.com




On Tue, Jun 4, 2024 at 4:27 PM Erich Weiler  wrote:

> Hi All,
>
> I'm going to be adding a bunch of OSDs to our cephfs cluster shortly
> (increasing the total size by 50%).  We're on reef, and will be
> deploying using the cephadm method, and the OSDs are exactly the same
> size and disk type as the current ones.
>
> So, after adding the new OSDs, my understanding is that ceph will begin
> rebalancing the data.  I will also probably want to increase my PGs to
> accommodate the new OSDs being added.  My question is basically: should
> I wait for the rebalance to finish before increasing my PG count?  Which
> would kick off another rebalance action for the new PGs?  Or, should I
> increase the PG count as soon as the rebalance action starts after
> adding the new OSDs, and it would then create new PGs and rebalance on
> the new OSDs at the same time?
>
> Thanks for any guidance!
>
> -erich
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: reef 18.2.3 QE validation status

2024-06-04 Thread Laura Flores
Rados results were approved, and we successfully upgraded the gibba cluster.

Now waiting on @Dan Mick  to upgrade the LRC.

On Thu, May 30, 2024 at 8:32 PM Yuri Weinstein  wrote:

> I reran rados on the fix https://github.com/ceph/ceph/pull/57794/commits
> and am seeking approvals from Radek and Laura
>
> https://tracker.ceph.com/issues/65393#note-1
>
> On Tue, May 28, 2024 at 2:12 PM Yuri Weinstein 
> wrote:
> >
> > We have discovered some issues (#1 and #2) during the final stages of
> > testing that require considering a delay in this point release until
> > all options and risks are assessed and resolved.
> >
> > We will keep you all updated on the progress.
> >
> > Thank you for your patience!
> >
> > #1 https://tracker.ceph.com/issues/66260
> > #2 https://tracker.ceph.com/issues/61948#note-21
> >
> > On Wed, May 1, 2024 at 3:41 PM Yuri Weinstein 
> wrote:
> > >
> > > We've run into a problem during the last verification steps before
> > > publishing this release after upgrading the LRC to it  =>
> > > https://tracker.ceph.com/issues/65733
> > >
> > > After this issue is resolved, we will continue testing and publishing
> > > this point release.
> > >
> > > Thanks for your patience!
> > >
> > > On Thu, Apr 18, 2024 at 11:29 PM Christian Rohmann
> > >  wrote:
> > > >
> > > > On 18.04.24 8:13 PM, Laura Flores wrote:
> > > > > Thanks for bringing this to our attention. The leads have decided
> that
> > > > > since this PR hasn't been merged to main yet and isn't approved, it
> > > > > will not go in v18.2.3, but it will be prioritized for v18.2.4.
> > > > > I've already added the PR to the v18.2.4 milestone so it's sure to
> be
> > > > > picked up.
> > > >
> > > > Thanks a bunch. If you miss the train, you miss the train - fair
> enough.
> > > > Nice to know there is another one going soon and that bug is going
> to be
> > > > on it !
> > > >
> > > >
> > > > Regards
> > > >
> > > > Christian
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Adding new OSDs - also adding PGs?

2024-06-04 Thread Erich Weiler

Hi All,

I'm going to be adding a bunch of OSDs to our cephfs cluster shortly 
(increasing the total size by 50%).  We're on reef, and will be 
deploying using the cephadm method, and the OSDs are exactly the same 
size and disk type as the current ones.


So, after adding the new OSDs, my understanding is that ceph will begin 
rebalancing the data.  I will also probably want to increase my PGs to 
accommodate the new OSDs being added.  My question is basically: should 
I wait for the rebalance to finish before increasing my PG count?  Which 
would kick off another rebalance action for the new PGs?  Or, should I 
increase the PG count as soon as the rebalance action starts after 
adding the new OSDs, and it would then create new PGs and rebalance on 
the new OSDs at the same time?


Thanks for any guidance!

-erich
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: tuning for backup target cluster

2024-06-04 Thread Lukasz Borek
>
> You could check if your devices support NVMe namespaces and create more
> than one namespace on the device.

Wow, tricky. Will give it a try.

Thanks!


Łukasz Borek
luk...@borek.org.pl


On Tue, 4 Jun 2024 at 16:26, Robert Sander 
wrote:

> Hi,
>
> On 6/4/24 16:15, Anthony D'Atri wrote:
>
> > I've wondered for years what the practical differences are between using
> a namespace and a conventional partition.
>
> Namespaces show up as separate block devices in the kernel.
>
> The orchestrator will not touch any devices that contain a partition
> table or logical volume signatures.
>
> Regards
> --
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
>
> https://www.heinlein-support.de
>
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
>
> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> Geschäftsführer: Peer Heinlein - Sitz: Berlin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Excessively Chatty Daemons RHCS v5

2024-06-04 Thread Joshua Arulsamy
Hi,

I recently upgraded my RHCS cluster from v4 to v5 and moved to containerized
daemons (podman) along the way. I noticed that there are a huge number of logs
going to journald on each of my hosts. I am unsure why there are so many.

I tried changing the logging level at runtime with commands like these (from the
ceph docs):

ceph tell osd.\* config set debug_osd 0/5

I tried adjusting several different subsystems (also with 0/0), but the logs
keep coming at the same rate and with the same content. I'm not sure what to
try next. Is there a way to trace where the logs are coming from?
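
For what it's worth, this is roughly what I was planning to try next (standard
config options; whether they actually silence these particular messages is
exactly my question):

# check what one daemon is really running with
ceph config show osd.0 | grep debug_osd
# persist lower debug levels in the config database instead of one-off tells
ceph config set osd debug_osd 0/0
ceph config set osd debug_ms 0/0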

Some of the sample log entries are events like this on the OSD nodes:

Jun 04 10:34:02 pf-osd1 ceph-osd-0[182875]: 2024-06-04T10:34:02.470-0600
7fc049c03700 -1 osd.0 pg_epoch: 703151 pg[35.39s0( v 703141'789389
(701266'780746,703141'789389] local-lis/les=702935/702936 n=48162 ec=63726/27988
lis/c=702935/702935 les/c/f=702936/702936/0 sis=702935)
[0,194,132,3,177,159,83,18,149,14,145]p0(0) r=0 lpr=702935 crt=703141'789389
lcod 703141'789388 mlcod 703141'789388 active+clean planned DEEP_SCRUB_ON_ERROR]
scrubber : handle_scrub_reserve_grant: received unsolicited
reservation grant from osd 177(4) (0x55fdea6c4000)

These are very verbose messages and occur roughly every 0.5 second per daemon.
On a cluster with 200 daemons this is getting unmanageable and is flooding my
syslog servers.

Any advice on how to tame all the logs would be greatly appreciated!

Best,

Josh

Joshua Arulsamy
HPC Systems Architect
Advanced Research Computing Center
University of Wyoming
jarul...@uwyo.edu
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Setting hostnames for zonegroups via cephadm / rgw mgr module?

2024-06-04 Thread Matthew Vernon

Hi,

I'm using reef (18.2.2); the docs talk about setting up a multi-site 
setup with a spec file e.g.


rgw_realm: apus
rgw_zonegroup: apus_zg
rgw_zone: eqiad
placement:
  label: "rgw"

but I don't think it's possible to configure the "hostnames" parameter 
of the zonegroup (and thus control what hostname(s) the rgws are 
expecting to serve)? Have I missed something, or do I need to set up the 
realm/zonegroup/zone, extract the zonegroup json and edit hostnames by hand?
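
For reference, the by-hand route I mean would be roughly this (zonegroup name
taken from the spec above; treat it as a sketch):

radosgw-admin zonegroup get --rgw-zonegroup=apus_zg > zonegroup.json
# edit the "hostnames": [] list in zonegroup.json
radosgw-admin zonegroup set --rgw-zonegroup=apus_zg < zonegroup.json
radosgw-admin period update --commit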


Thanks,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: tuning for backup target cluster

2024-06-04 Thread Robert Sander

Hi,

On 6/4/24 16:15, Anthony D'Atri wrote:


I've wondered for years what the practical differences are between using a 
namespace and a conventional partition.


Namespaces show up as separate block devices in the kernel.

The orchestrator will not touch any devices that contain a partition 
table or logical volume signatures.


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: tuning for backup target cluster

2024-06-04 Thread Anthony D'Atri
Or partition, or use LVM.

I've wondered for years what the practical differences are between using a 
namespace and a conventional partition.


> On Jun 4, 2024, at 07:59, Robert Sander  wrote:
> 
> On 6/4/24 12:47, Lukasz Borek wrote:
> 
>> Using cephadm, is it possible to cut part of the NVME drive for OSD and
>> leave rest space for RocksDB/WALL?
> 
> Not out of the box.
> 
> You could check if your devices support NVMe namespaces and create more than 
> one namespace on the device. The kernel then sees multiple block devices and 
> for the orchestrator they are completely separate.
> 
> Regards
> -- 
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
> 
> https://www.heinlein-support.de
> 
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
> 
> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> Geschäftsführer: Peer Heinlein - Sitz: Berlin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] problem with mgr prometheus module

2024-06-04 Thread Dario Graña
Hi all!

I'm running Ceph Quincy 17.2.7 in a cluster. On Monday I updated the OS from
AlmaLinux 9.3 to 9.4; since then Grafana shows a "No Data" message in all
Ceph-related fields, while, for example, the node information is still fine
(Host Detail Dashboard).
I have redeployed the mgr service with cephadm and disabled and re-enabled the
mgr prometheus module, but nothing changed. Digging into the problem, I
accessed the Prometheus interface and found one of the targets reported as down
(screenshot: Screen Shot 2024-06-04 at 15.22.37.png).
When I access the node shown as down, it reports
503 Service Unavailable

No cached data available yet

Traceback (most recent call last):
  File "/lib/python3.6/site-packages/cherrypy/_cprequest.py", line
638, in respond
self._do_respond(path_info)
  File "/lib/python3.6/site-packages/cherrypy/_cprequest.py", line
697, in _do_respond
response.body = self.handler()
  File "/lib/python3.6/site-packages/cherrypy/lib/encoding.py", line
219, in __call__
self.body = self.oldhandler(*args, **kwargs)
  File "/lib/python3.6/site-packages/cherrypy/_cpdispatch.py", line
54, in __call__
return self.callable(*self.args, **self.kwargs)
  File "/usr/share/ceph/mgr/prometheus/module.py", line 1751, in metrics
return self._metrics(_global_instance)
  File "/usr/share/ceph/mgr/prometheus/module.py", line 1762, in _metrics
raise cherrypy.HTTPError(503, 'No cached data available yet')
cherrypy._cperror.HTTPError: (503, 'No cached data available yet')

I checked the mgr prometheus address and port
[ceph: root@ceph-admin01 /]# ceph config get mgr mgr/prometheus/server_addr
::
[ceph: root@ceph-admin01 /]# ceph config get mgr mgr/prometheus/server_port
9283

It seems to be ok.

When I check the master manager node for the port, I found
[root@ceph-hn01 ~]# netstat -natup | grep 9283
tcp6   0  0 :::9283 :::*LISTEN
 2453/ceph-mgr
tcp6   0  0 192.168.97.51:9283  192.168.97.60:36130
ESTABLISHED 2453/ceph-mgr

I don't understand why it is listening on IPv6; the node doesn't have a dual
stack.
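
What I plan to try next is forcing an explicit IPv4 bind and restarting the
module (standard mgr commands; I don't know yet whether the bind is really the
problem):

ceph config set mgr mgr/prometheus/server_addr 0.0.0.0
ceph mgr module disable prometheus
ceph mgr module enable prometheus
ceph mgr fail   # let a standby take over so the module re-binds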

I also tried to use a newer version of the prometheus container image, the
1.6.0, but it keeps reporting the same, so I rolled it back to the original
one.

Has anyone experienced an issue like this?
Where can I look for more information about it?

Thanks in advance.

Regards.
-- 
Dario Graña
PIC (Port d'Informació Científica)
Campus UAB, Edificio D
E-08193 Bellaterra, Barcelona
http://www.pic.es
Avis - Aviso - Legal Notice: http://legal.ifae.es
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Update OS with clean install

2024-06-04 Thread Sake Ceph
Hi Robert, 

I tried, but that doesn't work :( 

Using exit maintenance mode results in the error: "missing 2 required 
positional arguments: 'hostname' and 'addr'". 
Running the command a second time appears to work, but then the containers fail 
to start: the startup fails because the container image can't be pulled, as 
authentication is required (our instance is offline and we're using a local 
image registry with authentication). 

Kind regards, 
Sake 
> On 04-06-2024 14:40 CEST, Robert Sander wrote:
> 
>  
> Hi,
> 
> On 6/4/24 14:35, Sake Ceph wrote:
> 
> > * Store host labels (we use labels to deploy the services)
> > * Fail-over MDS and MGR services if running on the host
> > * Remove host from cluster
> > * Add host to cluster again with correct labels
> 
> AFAIK the steps above are not necessary. It should be sufficient to do these:
> 
> * Set host in maintenance mode
> * Reinstall host with newer OS
> * Configure host with correct settings (for example cephadm user SSH key etc.)
> * Unset maintenance mode for the host
> * For OSD hosts run ceph cephadm osd activate
> 
> Regards
> -- 
> Robert Sander
> Heinlein Consulting GmbH
> Schwedter Str. 8/9b, 10119 Berlin
> 
> https://www.heinlein-support.de
> 
> Tel: 030 / 405051-43
> Fax: 030 / 405051-19
> 
> Amtsgericht Berlin-Charlottenburg - HRB 220009 B
> Geschäftsführer: Peer Heinlein - Sitz: Berlin
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: RBD Mirror - Failed to unlink peer

2024-06-04 Thread Eugen Block

Hi,

I don't have much to contribute, but according to the source code [1]  
this seems to be a non-fatal message:


void CreatePrimaryRequest::handle_unlink_peer(int r) {
  CephContext *cct = m_image_ctx->cct;
  ldout(cct, 15) << "r=" << r << dendl;

  if (r < 0) {
lderr(cct) << "failed to unlink peer: " << cpp_strerror(r) << dendl;
finish(0); // not fatal
return;
  }

I guess if you increased debug level to 15, you might see where  
exactly that message comes from. But I don't know how to get rid of  
them, so maybe one of the devs can comment on that.
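
Something along these lines should raise the librbd debug level on the side
that creates the snapshots (which client section to target is an assumption,
adjust it to your setup), just remember to revert it afterwards:

ceph config set client debug_rbd 15/15
# reproduce one snapshot, collect the log, then
ceph config set client debug_rbd 0/5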


Regards,
Eugen

[1]  
https://github.com/ceph/ceph/blob/v17.2.7/src/librbd/mirror/snapshot/CreatePrimaryRequest.cc#L260


Quoting Scott Cairns:


Hi,

Following the introduction of an additional node to our Ceph  
cluster, we've started to see unlink errors when taking a rbd mirror  
snapshot.


We've had RBD mirroring configured for over a year now and it's been  
working flawlessly, however after we created OSD's on a new node  
we've receiving the following error:


librbd::mirror::snapshot::CreatePrimaryRequest: 0x7f60c80056f0  
handle_unlink_peer: failed to unlink peer: (2) No such file or  
directory


This seemed to appear on around 3 of 150 snapshots on the first  
night and over the weeks has progressed to almost every snapshot.


What's odd, is that the snapshot appears to be taken without any  
issues and does mirror to the DR site - we can see the snapshot ID  
taken on the source side is mirrored to the destination side when  
checking the rbd snap ls, and we've tested promoting an image on the  
DR site to ensure the snapshot does include up to date data, which  
it does.


I can't see any other errors generated when the snapshot is taken to  
identify what file/directory isn't found - everything appears to be  
working okay it's just generating an error during the snapshot.



I've also tried disabling mirroring on the disk and re-enabling  
however it doesn't appear to make any difference - there's no error  
on the initial mirror image, or the first snapshot taken after that,  
but every subsequent snapshot shows the error again.


Any ideas?

Thanks,
Scott



The content of this e-mail and any attachment is confidential and  
intended solely for the use of the individual to whom it is addressed.
Any views or opinions presented are solely those of the author and  
do not necessarily represent those of Tecnica Limited.

If you have received this e-mail in error please notify the sender.
Any use, dissemination, forwarding, printing, or copying of this  
e-mail or any attachments thereto, in whole or part, without  
permission is strictly prohibited.


Tecnica Limited Registered office: 5 Castle Court, Carnegie Campus,  
Dunfermline, Fife, KY11 8PB.

Registered in Scotland No. SC250307.
VAT No. 827 5110 42.

This footnote also confirms that this email message has been swept  
for the presence of computer viruses.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS crashes to damaged metadata

2024-06-04 Thread Stolte, Felix
Hi Patrick,

it has been a year now and we have not had a single crash since upgrading to 
16.2.13. We still have the 19 corrupted files which are reported by 'damage 
ls'. Is it now possible to delete the corrupted files without taking the 
filesystem offline?

On 22.05.2023 at 20:23, Patrick Donnelly wrote:

Hi Felix,

On Sat, May 13, 2023 at 9:18 AM Stolte, Felix  wrote:

Hi Patrick,

we have been running one daily snapshot since December, and our cephfs crashed 3 
times because of this: https://tracker.ceph.com/issues/38452

We currentliy have 19 files with corrupt metadata found by your first-damage.py 
script. We isolated the these files from access by users and are waiting for a 
fix before we remove them with your script (or maybe a new way?)

No other fix is anticipated at this time. Probably one will be
developed after the cause is understood.

Today we upgraded our cluster from 16.2.11 to 16.2.13. After upgrading the mds 
servers, cluster health went to ERROR MDS_DAMAGE. 'ceph tell mds.0 damage ls' 
is showing me the same files as your script (initially only a part, after a 
cephfs scrub all of them).

This is expected. Once the dentries are marked damaged, the MDS won't
allow operations on those files (like those triggering tracker
#38452).

I noticed "mds: catch damage to CDentry’s first member before persisting 
(issue#58482, pr#50781, Patrick Donnelly)“ in the change logs for 16.2.13  and 
like to ask you the following questions:

a) can we repair the damaged files online now instead of bringing down the 
whole fs and using the python script?

Not yet.

b) should we set one of the new mds options in our specific case to avoid our 
fileserver crashing because of the wrong snap ids?

Have your MDS crashed or just marked the dentries damaged? If you can
reproduce a crash with detailed logs (debug_mds=20), that would be
incredibly helpful.
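
For reference, roughly how to capture that (option names are standard; revert
afterwards, the log volume is large):

ceph config set mds debug_mds 20
ceph config set mds debug_ms 1
# reproduce the crash, collect the MDS log, then
ceph config rm mds debug_mds
ceph config rm mds debug_ms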

c) will your patch prevent wrong snap ids in the future?

It will prevent persisting the damage.


--
Patrick Donnelly, Ph.D.
He / Him / His
Red Hat Partner Engineer
IBM, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D


Kind regards
Felix Stolte

IT-Services
mailto: f.sto...@fz-juelich.de
Tel: 02461-619243

-
-
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Stefan Müller
Geschäftsführung: Prof. Dr. Astrid Lambrecht (Vorsitzende),
Karsten Beneke (stellv. Vorsitzender), Dr. Ir. Pieter Jansens
-
-

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Update OS with clean install

2024-06-04 Thread Robert Sander

Hi,

On 6/4/24 14:35, Sake Ceph wrote:


* Store host labels (we use labels to deploy the services)
* Fail-over MDS and MGR services if running on the host
* Remove host from cluster
* Add host to cluster again with correct labels


AFAIK the steps above are not necessary. It should be sufficient to do these:

* Set host in maintenance mode
* Reinstall host with newer OS
* Configure host with correct settings (for example cephadm user SSH key etc.)
* Unset maintenance mode for the host
* For OSD hosts run ceph cephadm osd activate
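
As orchestrator commands that would be roughly the following (host name is a
placeholder):

ceph orch host maintenance enter host1
# reinstall the OS, restore the cephadm user / SSH key
ceph orch host maintenance exit host1
ceph cephadm osd activate host1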

Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Update OS with clean install

2024-06-04 Thread Sake Ceph
Hi all
I'm working on a way to automate the OS upgrade of our hosts. This happens with 
a complete reinstall of the OS.

What is the correct way to do this? At the moment I'm using the following:
* Store host labels (we use labels to deploy the services)
* Fail-over MDS and MGR services if running on the host
* Set host in maintenance mode
* Reinstall host with newer OS
* Remove host from cluster
* Configure host with correct settings (for example cephadm user SSH key etc.)
* Add host to cluster again with correct labels
* For OSD hosts run ceph cephadm osd activate

If somebody has some advice I would gladly hear about it! 

Kind regards,
Sake
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: tuning for backup target cluster

2024-06-04 Thread Robert Sander

On 6/4/24 12:47, Lukasz Borek wrote:


Using cephadm, is it possible to cut part of the NVME drive for OSD and
leave rest space for RocksDB/WALL?


Not out of the box.

You could check if your devices support NVMe namespaces and create more 
than one namespace on the device. The kernel then sees multiple block 
devices and for the orchestrator they are completely separate.
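
For reference, creating a second namespace with nvme-cli looks roughly like
this (the sizes are in blocks and only an example, and the flags differ
slightly between nvme-cli versions):

nvme id-ctrl /dev/nvme0 | grep -E '^(nn|tnvmcap)'   # max namespaces / capacity
nvme create-ns /dev/nvme0 --nsze=97656250 --ncap=97656250 --flbas=0
nvme attach-ns /dev/nvme0 --namespace-id=2 --controllers=0
nvme ns-rescan /dev/nvme0
nvme list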


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

https://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Amtsgericht Berlin-Charlottenburg - HRB 220009 B
Geschäftsführer: Peer Heinlein - Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Help needed please ! Filesystem became read-only !

2024-06-04 Thread Sake Ceph
Hi Xiubo

Thank you for the explanation! This won't be an issue for us, but it made me 
think twice :) 

Kind regards, 
Sake 

> On 04-06-2024 12:30 CEST, Xiubo Li wrote:
> 
>  
> On 6/4/24 15:20, Sake Ceph wrote:
> > Hi,
> >
> > A little break into this thread, but I have some questions:
> > * How does it happen that the filesystem gets into read-only mode?
> 
> For a detailed explanation you can refer to the ceph PR: 
> https://github.com/ceph/ceph/pull/55421.
> 
> > * Is this avoidable?
> > * How to fix the issue? I didn't see a workaround in the mentioned tracker (or I missed it).
> Possibly avoid changing data pools or disable multiple data pools?
> > * With this bug around, should you use cephfs with reef?
> 
> This will happen in all the releases, so that doesn't matter.
> 
> - Xiubo
> 
> >
> > Kind regards,
> > Sake
> >
> >> On 04-06-2024 04:04 CEST, Xiubo Li wrote:
> >>
> >>   
> >> Hi Nicolas,
> >>
> >> This is a known issue and Venky is working on it, please see
> >> https://tracker.ceph.com/issues/63259.
> >>
> >> Thanks
> >> - Xiubo
> >>
> >> On 6/3/24 20:04, nbarb...@deltaonline.net wrote:
> >>> Hello,
> >>>
> >>> First of all, thanks for reading my message. I set up a Ceph version 
> >>> 18.2.2 cluster with 4 nodes, everything went fine for a while, but after 
> >>> copying some files, the storage showed a warning status and the following 
> >>> message : "HEALTH_WARN: 1 MDSs are read only mds.PVE-CZ235007SH(mds.0): 
> >>> MDS in read-only mode".
> >>>
> >>> The logs are showing :
> >>>
> >>> Jun 03 08:20:41 PVE-CZ235007SH ceph-mds[1329868]:  -> 
> >>> 2024-06-03T07:57:17.589+0200 77250fc006c0 -1 log_channel(cluster) log 
> >>> [ERR] : failed to store backtrace on ino 0x100039c object, pool 5, 
> >>> errno -2
> >>> Jun 03 08:20:41 PVE-CZ235007SH ceph-mds[1329868]:  -9998> 
> >>> 2024-06-03T07:57:17.589+0200 77250fc006c0 -1 mds.0.189541 unhandled write 
> >>> error (2) No such file or directory, force readonly...
> >>>
> >>> After googling for a while, I did not find a hint to understand more 
> >>> precisely the root cause. Any help would we greatly appreciated, or even 
> >>> a link to post this request elsewhere if this is not the place to.
> >>>
> >>> Please find below additional details if needed. Thanks a lot !
> >>>
> >>> Nicolas
> >>>
> >>> ---
> >>>
> >>> # ceph osd dump
> >>> [...]
> >>> pool 5 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 
> >>> object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 
> >>> 292 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 
> >>> recovery_priority 5 application cephfs read_balance_score 4.51
> >>> [...]
> >>>
> >>> # ceph osd lspools
> >>> 1 .mgr
> >>> 4 cephfs_data
> >>> 5 cephfs_metadata
> >>> 18 ec-pool-001-data
> >>> 19 ec-pool-001-metadata
> >>>
> >>>
> >>> # ceph df
> >>> --- RAW STORAGE ---
> >>> CLASS SIZEAVAILUSED  RAW USED  %RAW USED
> >>> hdd633 TiB  633 TiB  61 GiB61 GiB  0
> >>> TOTAL  633 TiB  633 TiB  61 GiB61 GiB  0
> >>>
> >>> --- POOLS ---
> >>> POOL  ID  PGS   STORED  OBJECTS USED  %USED  MAX AVAIL
> >>> .mgr   11  119 MiB   31  357 MiB  0200 TiB
> >>> cephfs_data4   32   71 KiB8.38k  240 KiB  0200 TiB
> >>> cephfs_metadata5   32  329 MiB6.56k  987 MiB  0200 TiB
> >>> ec-pool-001-data  18   32   42 GiB   15.99k   56 GiB  0451 TiB
> >>> ec-pool-001-metadata  19   32  0 B0  0 B  0200 TiB
> >>>
> >>>
> >>>
> >>> # ceph status
> >>> cluster:
> >>>   id: f16f53e1-7028-440f-bf48-f99912619c33
> >>>   health: HEALTH_WARN
> >>>   1 MDSs are read only
> >>>
> >>> services:
> >>>   mon: 4 daemons, quorum 
> >>> PVE-CZ235007SG,PVE-CZ2341016V,PVE-CZ235007SH,PVE-CZ2341016T (age 35h)
> >>>   mgr: PVE-CZ235007SG(active, since 2d), standbys: PVE-CZ235007SH, 
> >>> PVE-CZ2341016T, PVE-CZ2341016V
> >>>   mds: 1/1 daemons up, 3 standby
> >>>   osd: 48 osds: 48 up (since 2d), 48 in (since 3d)
> >>>
> >>> data:
> >>>   volumes: 1/1 healthy
> >>>   pools:   5 pools, 129 pgs
> >>>   objects: 30.97k objects, 42 GiB
> >>>   usage:   61 GiB used, 633 TiB / 633 TiB avail
> >>>   pgs: 129 active+clean
> >>> ___
> >>> ceph-users mailing list -- ceph-users@ceph.io
> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: tuning for backup target cluster

2024-06-04 Thread Lukasz Borek
>
> I have certainly seen cases where the OMAPS have not stayed within the
> RocksDB/WAL NVME space and have been going down to disk.

How do I monitor the OMAP size and check that it does not spill out of the NVMe?
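
Would something like this be the right way to watch it? (the bluefs counter
names are my assumption):

ceph health detail | grep -i spillover      # BLUEFS_SPILLOVER shows up here
ceph osd df                                 # OMAP / META columns per OSD
ceph tell osd.0 perf dump bluefs | grep -E 'db_used_bytes|slow_used_bytes'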

The OP's numbers suggest IIRC like 120GB-ish for WAL+DB, though depending on
> workload spillover could of course still be a thing.

Correct. But for production deployment the plan is to use 3.2TB for 10
HDDs. In case of performance problems we will move non-ec pool to SSD (by
replacing few HDD by SSDs)

Using cephadm, is it possible to cut part of the NVME drive for OSD and
leave rest space for RocksDB/WALL? Now my deployment is as simple as :

# ceph orch  ls osd osd.dashboard-admin-1710711254620 --export
service_type: osd
service_id: dashboard-admin-1710711254620
service_name: osd.dashboard-admin-1710711254620
placement:
  host_pattern: cephbackup-osd3
spec:
  data_devices:
rotational: true
  db_devices:
rotational: false
  filter_logic: AND
  objectstore: bluestore
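
If the namespace route works out, I guess the spec would end up pointing at
explicit devices, something like this (the paths are hypothetical):

service_type: osd
service_id: hdd-with-nvme-db
placement:
  host_pattern: cephbackup-osd3
spec:
  data_devices:
    rotational: true
  db_devices:
    paths:
    - /dev/nvme0n2   # hypothetical second namespace reserved for DB/WAL
  filter_logic: AND
  objectstore: bluestore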

Thanks

On Mon, 3 Jun 2024 at 17:28, Anthony D'Atri  wrote:

>
> The OP's numbers suggest IIRC like 120GB-ish for WAL+DB, though depending
> on workload spillover could of course still be a thing.
>
> >
> > I have certainly seen cases where the OMAPS have not stayed within the
> RocksDB/WAL NVME space and have been going down to disk.
> >
> > This was on a large cluster with a lot of objects but the disks that
> where being used for the non-ec pool where seeing a lot more actual disk
> activity than the other disks in the system.
> >
> > Moving the non-ec pool onto NVME helped with a lot of operations that
> needed to be done to cleanup a lot of orphaned objects.
> >
> > Yes this was a large cluster with a lot of ingress data, admittedly.
> >
> > Darren Soothill
> >
> > Want a meeting with me: https://calendar.app.google/MUdgrLEa7jSba3du9
> >
> > Looking for help with your Ceph cluster? Contact us at https://croit.io/
> >
> > croit GmbH, Freseniusstr. 31h, 81247 Munich
> > CEO: Martin Verges - VAT-ID: DE310638492
> > Com. register: Amtsgericht Munich HRB 231263
> > Web: https://croit.io/ | YouTube: https://goo.gl/PGE1Bx
> >
> >
> >
> >
> >> On 29 May 2024, at 21:24, Anthony D'Atri  wrote:
> >>
> >>
> >>
> >>> You also have the metadata pools used by RGW that ideally need to be
> on NVME.
> >>
> >> The OP seems to intend shared NVMe for WAL+DB, so that the omaps are on
> NVMe that way.
> >>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Łukasz Borek
luk...@borek.org.pl
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Error EINVAL: check-host failed - Failed to add host

2024-06-04 Thread Eugen Block

Hi,

I think there's something else wrong with your setup, I could  
bootstrap a cluster without an issue with ed keys:


ceph:~ # ssh-keygen -t ed25519
Generating public/private ed25519 key pair.

ceph:~ # cephadm --image quay.io/ceph/ceph:v18.2.2 bootstrap --mon-ip  
[IP] [some more options] --ssh-private-key .ssh/id_ed25519  
--ssh-public-key .ssh/id_ed25519.pub

...
Using provided ssh keys...
Adding key to root@localhost authorized_keys...
Adding host ceph...
...
Bootstrap complete.

ceph:~ # ceph cephadm get-pub-key
ssh-ed25519 C... root@ceph

Regards,
Eugen

Quoting isnraj...@yahoo.com:


Hello,

Please help:  CEPH cluster using docker.

I am using the below command for the bootstrap with the provided private and public keys:

cephadm -v bootstrap --mon-ip  --allow-overwrite  
--ssh-private-key id_ed25519 --ssh-public-key id_ed25519.pub


I am able to ssh directly with the id_ed25519 key.


RuntimeError: Failed command: /usr/bin/docker run --rm --ipc=host  
--stop-signal=SIGTERM --ulimit nofile=1048576 --net=host  
--entrypoint /usr/bin/ceph --init -e  
CONTAINER_IMAGE=quay.io/ceph/ceph:v18 -e NODE_NAME= -e  
CEPH_USE_RANDOM_NONCE=1 -v  
/var/log/ceph/85c044a8-1d82-11ef-9ea1-a73759ab75e5:/var/log/ceph:z  
-v /tmp/ceph-tmp1sw6a5s0:/etc/ceph/ceph.client.admin.keyring:z -v  
/tmp/ceph-tmpethxlwxr:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v18  
orch host add  : Error EINVAL: check-host failed:
Unable to write  
:/var/lib/ceph/85c044a8-1d82-11ef-9ea1-a73759ab75e5/cephadm.2b9d7d139a9cb40289f2358faf49a109fc297c0a258bde893227c262c30bca8d:

 Session request failed

I also validated the below and it was successful:


ceph cephadm get-ssh-config > ssh_config
ceph config-key get mgr/cephadm/ssh_identity_key > key
ssh -F ssh_config -i key root@


---
root@:~/.ssh# ceph orch host add  
Error EINVAL: check-host failed:
Unable to write  
:/var/lib/ceph/fe8ecd30-1da2-11ef-9ea1-a73759ab75e5/cephadm.2b9d7d139a9cb40289f2358faf49a109fc297c0a258bde893227c262c30bca8d:

 Session request failed
root@:~/.ssh#

Thanks,
Surya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Help needed please ! Filesystem became read-only !

2024-06-04 Thread Xiubo Li



On 6/4/24 15:20, Sake Ceph wrote:

Hi,

A little break into this thread, but I have some questions:
* How does it happen that the filesystem gets into read-only mode?


For a detailed explanation you can refer to the ceph PR: 
https://github.com/ceph/ceph/pull/55421.



* Is this avoidable?
* How to fix the issue? I didn't see a workaround in the mentioned tracker (or I missed it).

Possibly avoid changing data pools or disable multiple data pools?

* With this bug around, should you use cephfs with reef?


This will happen in all the releases, so that doesn't matter.

- Xiubo



Kind regards,
Sake


On 04-06-2024 04:04 CEST, Xiubo Li wrote:

  
Hi Nicolas,


This is a known issue and Venky is working on it, please see
https://tracker.ceph.com/issues/63259.

Thanks
- Xiubo

On 6/3/24 20:04, nbarb...@deltaonline.net wrote:

Hello,

First of all, thanks for reading my message. I set up a Ceph version 18.2.2 cluster with 
4 nodes, everything went fine for a while, but after copying some files, the storage 
showed a warning status and the following message : "HEALTH_WARN: 1 MDSs are read 
only mds.PVE-CZ235007SH(mds.0): MDS in read-only mode".

The logs are showing :

Jun 03 08:20:41 PVE-CZ235007SH ceph-mds[1329868]:  -> 
2024-06-03T07:57:17.589+0200 77250fc006c0 -1 log_channel(cluster) log [ERR] : 
failed to store backtrace on ino 0x100039c object, pool 5, errno -2
Jun 03 08:20:41 PVE-CZ235007SH ceph-mds[1329868]:  -9998> 
2024-06-03T07:57:17.589+0200 77250fc006c0 -1 mds.0.189541 unhandled write error 
(2) No such file or directory, force readonly...

After googling for a while, I did not find a hint to understand more precisely 
the root cause. Any help would we greatly appreciated, or even a link to post 
this request elsewhere if this is not the place to.

Please find below additional details if needed. Thanks a lot !

Nicolas

---

# ceph osd dump
[...]
pool 5 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 object_hash 
rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 292 flags 
hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 
application cephfs read_balance_score 4.51
[...]

# ceph osd lspools
1 .mgr
4 cephfs_data
5 cephfs_metadata
18 ec-pool-001-data
19 ec-pool-001-metadata


# ceph df
--- RAW STORAGE ---
CLASS SIZEAVAILUSED  RAW USED  %RAW USED
hdd633 TiB  633 TiB  61 GiB61 GiB  0
TOTAL  633 TiB  633 TiB  61 GiB61 GiB  0

--- POOLS ---
POOL  ID  PGS   STORED  OBJECTS USED  %USED  MAX AVAIL
.mgr   11  119 MiB   31  357 MiB  0200 TiB
cephfs_data4   32   71 KiB8.38k  240 KiB  0200 TiB
cephfs_metadata5   32  329 MiB6.56k  987 MiB  0200 TiB
ec-pool-001-data  18   32   42 GiB   15.99k   56 GiB  0451 TiB
ec-pool-001-metadata  19   32  0 B0  0 B  0200 TiB



# ceph status
cluster:
  id: f16f53e1-7028-440f-bf48-f99912619c33
  health: HEALTH_WARN
  1 MDSs are read only

services:
  mon: 4 daemons, quorum 
PVE-CZ235007SG,PVE-CZ2341016V,PVE-CZ235007SH,PVE-CZ2341016T (age 35h)
  mgr: PVE-CZ235007SG(active, since 2d), standbys: PVE-CZ235007SH, 
PVE-CZ2341016T, PVE-CZ2341016V
  mds: 1/1 daemons up, 3 standby
  osd: 48 osds: 48 up (since 2d), 48 in (since 3d)

data:
  volumes: 1/1 healthy
  pools:   5 pools, 129 pgs
  objects: 30.97k objects, 42 GiB
  usage:   61 GiB used, 633 TiB / 633 TiB avail
  pgs: 129 active+clean
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Testing CEPH scrubbing / self-healing capabilities

2024-06-04 Thread Petr Bena

Hello,

I wanted to try out (in a lab ceph setup) what exactly happens when part of 
the data on an OSD disk gets corrupted. I created a simple test where I went 
through the block device data until I found something that resembled user 
data (using dd and hexdump); /dev/sdd is a block device that is used by an 
OSD:


INFRA [root@ceph-vm-lab5 ~]# dd if=/dev/sdd bs=32 count=1 skip=33920 | 
hexdump -C
  6e 20 69 64 3d 30 20 65  78 65 3d 22 2f 75 73 72  |n id=0 
exe="/usr|
0010  2f 73 62 69 6e 2f 73 73  68 64 22 20 68 6f 73 74 |/sbin/sshd" 
host|


Then I deliberately overwrote 32 bytes using random data:

INFRA [root@ceph-vm-lab5 ~]# dd if=/dev/urandom of=/dev/sdd bs=32 
count=1 seek=33920


INFRA [root@ceph-vm-lab5 ~]# dd if=/dev/sdd bs=32 count=1 skip=33920 | 
hexdump -C
  25 75 af 3e 87 b0 3b 04  78 ba 79 e3 64 fc 76 d2 
|%u.>..;.x.y.d.v.|
0010  9e 94 00 c2 45 a5 e1 d2  a8 86 f1 25 fc 18 07 5a 
|E..%...Z|


At this point I would expect some sort of data corruption. I restarted 
the OSD daemon on this host to make sure it flushes any potentially 
buffered data. It restarted OK without noticing anything, which was 
expected.


Then I ran

ceph osd scrub 5

ceph osd deep-scrub 5

And waiting for all scheduled scrub operations for all PGs to finish.

No inconsistency was found. No errors reported, scrubs just finished OK, 
data are still visibly corrupt via hexdump.


Did I just hit a block of data that WAS used by the OSD but was marked 
deleted and is therefore no longer referenced, or am I missing something? I 
would expect Ceph to detect the disk corruption and automatically replace the 
invalid data with a valid copy.


I use only replica pools in this lab setup, for RBD and CephFS.
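
A more targeted test (which I may try next) would be to corrupt a known object
instead of a random disk offset; the commands are standard rados/ceph tooling,
and the pool and object names are just examples:

rados -p cephfs_data put test-obj /etc/hosts     # write a known object
ceph osd map cephfs_data test-obj                # find its PG and acting OSDs
# with the primary OSD stopped, damage that object's data
# (e.g. via ceph-objectstore-tool set-bytes), start the OSD again, then:
ceph pg deep-scrub <pgid>
rados list-inconsistent-obj <pgid> --format=json-pretty
ceph pg repair <pgid>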

Thanks

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: stretched cluster new pool and second pool with nvme

2024-06-04 Thread Eugen Block
How exactly does your crush rule look right now? I assume it's  
supposed to distribute data across two sites, and since one site is  
missing, the PGs stay in degraded state until the site comes back up.  
You would need to either change the crush rule or assign a different  
one to that pool, which would allow recovery on the remaining site.
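
For reference, this is the kind of thing I mean (rule and bucket names are
taken from the usual stretch examples, adjust them to your map):

# inspect what the pool currently uses
ceph osd pool get <pool> crush_rule
ceph osd crush rule dump stretch_rule

# a rule like this wants 2 copies in each of 2 datacenters; with one DC down
# it can only place 2 of the 4 replicas, hence the undersized/degraded PGs
rule stretch_rule {
    id 2
    type replicated
    step take default
    step choose firstn 2 type datacenter
    step chooseleaf firstn 2 type host
    step emit
}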


Zitat von "ronny.lippold" :


Hi Stefan ... I did the next step and need your help.

My idea was to stretch the cluster without stretch mode, so we decided to 
reserve a size of 4 on each side.

The setup is the same as in stretch mode, including the crush rule, location, 
election_strategy and tie breaker.

Only "ceph mon enable_stretch_mode e stretch_rule datacenter" wasn't run.

Now in my test I caused a split brain and expected that, on the remaining 
side, the cluster would rebuild the 4 replicas.

But that did not happen.
Actually, the cluster is doing the same thing as with stretch mode enabled: 
writeable with 2 replicas.

Can you explain why? I'm going around in circles.

This is the status during the split brain:

##
pve-test02-01:~# ceph -s
  cluster:
id: 376fcdef-bba0-4e58-b63e-c9754dc948fa
health: HEALTH_WARN
6/13 mons down, quorum  
pve-test01-01,pve-test01-03,pve-test01-05,pve-test02-01,pve-test02-03,pve-test02-05,tie-breaker

1 datacenter (8 osds) down
8 osds down
6 hosts (8 osds) down
Degraded data redundancy: 2116/4232 objects degraded  
(50.000%), 95 pgs degraded, 113 pgs undersized


  services:
mon: 13 daemons, quorum  
pve-test01-01,pve-test01-03,pve-test01-05,pve-test02-01,pve-test02-03,pve-test02-05,tie-breaker (age 54m), out of quorum: pve-test01-02, pve-test01-04, pve-test01-06, pve-test02-02, pve-test02-04,  
pve-test02-06
mgr: pve-test02-05(active, since 53m), standbys: pve-test01-05,  
pve-test01-01, pve-test01-03, pve-test02-01, pve-test02-03

mds: 1/1 daemons up, 1 standby
osd: 16 osds: 8 up (since 54m), 16 in (since 77m)

  data:
volumes: 1/1 healthy
pools:   5 pools, 113 pgs
objects: 1.06k objects, 3.9 GiB
usage:   9.7 GiB used, 580 GiB / 590 GiB avail
pgs: 2116/4232 objects degraded (50.000%)
 95 active+undersized+degraded
 18 active+undersized

  io:
client:   17 KiB/s wr, 0 op/s rd, 10 op/s wr
##

thanks a lot,
ronny

On 2024-04-30 11:42, Stefan Kooman wrote:

On 30-04-2024 11:22, ronny.lippold wrote:

hi stefan ... you are the hero of the month ;)


:p.



I don't know why I did not find your bug report.

I have the exact same problem and resolved the HEALTH warning only with 
"ceph osd force_healthy_stretch_mode --yes-i-really-mean-it".

I will comment on the report soon.

Actually, we are thinking about a 4/2 size without stretch mode enabled.

What was your solution?


This specific setup (on which I did the testing) is going to be  
full flash (SSD). So the HDDs are going to be phased out. And only  
the default non-device-class crush rule will be used. While that  
will work for this (small) cluster, it is not a solution. This  
issue should be fixed, as I figure there are quite a few clusters  
that want to use device-classes and use stretch mode at the same  
time.


Gr. Stefan

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Missing ceph data

2024-06-04 Thread Eugen Block

Hi,

if you can verify which data has been removed, and that client is  
still connected, you might find out who was responsible for that.
Do you know which files in which directories are missing? Does that  
maybe already reveal one or several users/clients?
You can query the mds daemons and inspect the session output, it shows  
which directories are mounted (if you use kernel client):


quincy-1:~ # ceph tell mds.quincy-1.yrgpqm session ls


I doubt that you'll find much in the logs if you don't have debug  
enabled, but it might be worth checking anyway.
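
To condense the session output a bit, something like this works for kernel
clients (the field names under client_metadata are an assumption, adjust for
fuse clients):

ceph tell mds.quincy-1.yrgpqm session ls | \
  jq '.[] | {id, hostname: .client_metadata.hostname, root: .client_metadata.root}'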


Quoting dhivagar selvam:


Hi,

We are not using cephfs snapshots. Is there any other way to find this out?

On Thu, May 30, 2024 at 5:20 PM Eugen Block  wrote:


Hi,

I've never heard of automatic data deletion. Maybe just some snapshots
were removed? Or someone deleted data on purpose because of the
nearfull state of some OSDs? And there's no trash function for cephfs
(for rbd there is). Do you use cephfs snapshots?


Quoting Prabu GJ:

> Hi Team,
>
>
> We are using Ceph Octopus version with a total disk size of 136 TB,
> configured with two replicas. Currently, our usage is 57 TB, and the
> available size is 5.3 TB. An incident occurred yesterday where
> around 3 TB of data was deleted automatically. Upon analysis, we
> couldn't find the reason for the deletion. All OSDs are functioning
> properly and actively running.
>
> We have 3 MDS daemons and we tried restarting all MDS services. Is there any
> way to recover that data? Can anyone please help us find the
> issue?
>
>
>
>
>
> cluster:
>
> id: 0d605d58-5caf-4f76-b6bd-e12402a22296
>
> health: HEALTH_WARN
>
> insufficient standby MDS daemons available
>
> 5 nearfull osd(s)
>
> 3 pool(s) nearfull
>
> 1 pool(s) have non-power-of-two pg_num
>
>
>
>   services:
>
> mon: 4 daemons, quorum
> download-mon3,download-mon4,download-mon1,download-mon2 (age 14h)
>
> mgr: download-mon2(active, since 14h), standbys: download-mon1,
> download-mon3
>
> mds: integdownload:2
> {0=download-mds3=up:active,1=download-mds1=up:active}
>
> osd: 39 osds: 39 up (since 16h), 39 in (since 4d)
>
>
>
>   data:
>
> pools:   3 pools, 1087 pgs
>
> objects: 71.76M objects, 51 TiB
>
> usage:   105 TiB used, 31 TiB / 136 TiB avail
>
> pgs: 1087 active+clean
>
>
>
>   io:
>
> client:   414 MiB/s rd, 219 MiB/s wr, 513 op/s rd, 1.22k op/s wr
>
> 
>
> ID  HOST USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA
> STATE
>
> 0  download-osd1   2995G   581G 14 4785k  6 6626k
> exists,up
>
> 1  download-osd2   2578G   998G 84 3644k 18 10.1M
> exists,up
>
> 2  download-osd3   3093G   483G 17 5114k  5 4152k
> exists,nearfull,up
>
> 3  download-osd4   2757G   819G 12  996k  2 4107k
> exists,up
>
> 4  download-osd5   2889G   687G 28 3355k 20 8660k
> exists,up
>
> 5  download-osd6   2448G  1128G183 3312k 10 9435k
> exists,up
>
> 6  download-osd7   2814G   762G  7 1667k  4 6354k
> exists,up
>
> 7  download-osd8   2872G   703G 14 1672k 15 10.5M
> exists,up
>
> 8  download-osd9   2577G   999G 10 6615k  3 6960k
> exists,up
>
> 9  download-osd10  2651G   924G 16 4736k  3 7378k
> exists,up
>
> 10  download-osd11  2889G   687G 15 4810k  6 8980k
> exists,up
>
> 11  download-osd12  2912G   664G 11 2516k  2 4106k
> exists,up
>
> 12  download-osd13  2785G   791G 74 4643k 11 3717k
> exists,up
>
> 13  download-osd14  3150G   426G214 6133k  4 7389k
> exists,nearfull,up
>
> 14  download-osd15  2728G   848G 11 4959k  4 6603k
> exists,up
>
> 15  download-osd16  2682G   894G 13 3170k  3 2503k
> exists,up
>
> 16  download-osd17  2555G  1021G 53 2183k  7 5058k
> exists,up
>
> 17  download-osd18  3013G   563G 18 3497k  3 4427k
> exists,up
>
> 18  download-osd19  2924G   651G 24 3534k 12 10.4M
> exists,up
>
> 19  download-osd20  3003G   573G 19 5149k  3 2531k
> exists,up
>
> 20  download-osd21  2757G   819G 16 3707k  9 9816k
> exists,up
>
> 21  download-osd22  2576G   999G 15 2526k  8 7739k
> exists,up
>
> 22  download-osd23  2758G   818G 13 4412k 16 7125k
> exists,up
>
> 23  download-osd24  2862G   714G 18 4424k  6 5787k
> exists,up
>
> 24  download-osd25  2792G   783G 16 1972k  9 9749k
> exists,up
>
> 25  download-osd26  2397G  1179G 14 4296k  9 12.0M
> exists,up
>
> 26  download-osd27  2308G  1267G  8 3149k 22 6280k
> exists,up
>
> 27  download-osd29  2732G   844G 12 3357k  3 7372k
> exists,up
>
> 28  download-osd28  2814G   761G 11  476k  

[ceph-users] Re: Help needed please ! Filesystem became read-only !

2024-06-04 Thread Sake Ceph
Hi, 

A little break into this thread, but I have some questions:
* How does it happen that the filesystem gets into read-only mode?
* Is this avoidable? 
* How to fix the issue? I didn't see a workaround in the mentioned tracker (or I missed it). 
* With this bug around, should you use cephfs with reef? 

Kind regards, 
Sake 

> On 04-06-2024 04:04 CEST, Xiubo Li wrote:
> 
>  
> Hi Nicolas,
> 
> This is a known issue and Venky is working on it, please see 
> https://tracker.ceph.com/issues/63259.
> 
> Thanks
> - Xiubo
> 
> On 6/3/24 20:04, nbarb...@deltaonline.net wrote:
> > Hello,
> >
> > First of all, thanks for reading my message. I set up a Ceph version 18.2.2 
> > cluster with 4 nodes, everything went fine for a while, but after copying 
> > some files, the storage showed a warning status and the following message : 
> > "HEALTH_WARN: 1 MDSs are read only mds.PVE-CZ235007SH(mds.0): MDS in 
> > read-only mode".
> >
> > The logs are showing :
> >
> > Jun 03 08:20:41 PVE-CZ235007SH ceph-mds[1329868]:  -> 
> > 2024-06-03T07:57:17.589+0200 77250fc006c0 -1 log_channel(cluster) log [ERR] 
> > : failed to store backtrace on ino 0x100039c object, pool 5, errno -2
> > Jun 03 08:20:41 PVE-CZ235007SH ceph-mds[1329868]:  -9998> 
> > 2024-06-03T07:57:17.589+0200 77250fc006c0 -1 mds.0.189541 unhandled write 
> > error (2) No such file or directory, force readonly...
> >
> > After googling for a while, I did not find a hint to understand more 
> > precisely the root cause. Any help would we greatly appreciated, or even a 
> > link to post this request elsewhere if this is not the place to.
> >
> > Please find below additional details if needed. Thanks a lot !
> >
> > Nicolas
> >
> > ---
> >
> > # ceph osd dump
> > [...]
> > pool 5 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 0 
> > object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 292 
> > flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 
> > recovery_priority 5 application cephfs read_balance_score 4.51
> > [...]
> >
> > # ceph osd lspools
> > 1 .mgr
> > 4 cephfs_data
> > 5 cephfs_metadata
> > 18 ec-pool-001-data
> > 19 ec-pool-001-metadata
> >
> >
> > # ceph df
> > --- RAW STORAGE ---
> > CLASS SIZEAVAILUSED  RAW USED  %RAW USED
> > hdd633 TiB  633 TiB  61 GiB61 GiB  0
> > TOTAL  633 TiB  633 TiB  61 GiB61 GiB  0
> >
> > --- POOLS ---
> > POOL  ID  PGS   STORED  OBJECTS USED  %USED  MAX AVAIL
> > .mgr   11  119 MiB   31  357 MiB  0200 TiB
> > cephfs_data4   32   71 KiB8.38k  240 KiB  0200 TiB
> > cephfs_metadata5   32  329 MiB6.56k  987 MiB  0200 TiB
> > ec-pool-001-data  18   32   42 GiB   15.99k   56 GiB  0451 TiB
> > ec-pool-001-metadata  19   32  0 B0  0 B  0200 TiB
> >
> >
> >
> > # ceph status
> >cluster:
> >  id: f16f53e1-7028-440f-bf48-f99912619c33
> >  health: HEALTH_WARN
> >  1 MDSs are read only
> >
> >services:
> >  mon: 4 daemons, quorum 
> > PVE-CZ235007SG,PVE-CZ2341016V,PVE-CZ235007SH,PVE-CZ2341016T (age 35h)
> >  mgr: PVE-CZ235007SG(active, since 2d), standbys: PVE-CZ235007SH, 
> > PVE-CZ2341016T, PVE-CZ2341016V
> >  mds: 1/1 daemons up, 3 standby
> >  osd: 48 osds: 48 up (since 2d), 48 in (since 3d)
> >
> >data:
> >  volumes: 1/1 healthy
> >  pools:   5 pools, 129 pgs
> >  objects: 30.97k objects, 42 GiB
> >  usage:   61 GiB used, 633 TiB / 633 TiB avail
> >  pgs: 129 active+clean
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io