[ceph-users] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9

2023-08-04 Thread Marc
But Rocky Linux 9 is the continuation of what CentOS would have been on el9.
AFAIK Ceph is developed on the elX distributions and not on the 'trial' Stream
versions, isn't it?


> 
> In most cases the 'alternative' distros like Alma or Rocky have outdated
> versions of packages compared with CentOS Stream 8 or CentOS Stream 9.
> For example, the golang package is at version 1.20 on c8s but still 1.19
> on Alma.
> 
> You can try to use c8s/c9s or try to contribute to your distro to
> resolve dependency issues
> 
> 
> >
> > I've been digging and I can't see that this has come up anywhere.
> >
> > I'm trying to update a client from Quincy 17.2.3-2 to 17.2.6-4 and
> > I'm getting the error
> >
> > Error:
> > Problem: cannot install the best update candidate for package ceph-
> base-2:17.2.3-2.el9s.x86_64
> >  - nothing provides liburing.so.2()(64bit) needed by ceph-base-
> 2:17.2.6-4.el9s.x86_64
> >  - nothing provides liburing.so.2(LIBURING_2.0)(64bit) needed by ceph-
> base-2:17.2.6-4.el9s.x86_64
> > (try to add '--skip-broken' to skip uninstallable packages or '--
> nobest' to use not only best candidate packages)
> >
> > Did Ceph Quincy switch to requiring liburing 2? Rocky 9 only provides
> > 0.7-7. CentOS Stream seems to have 1.0.7-3 (at least back to when I set
> > up that repo on Foreman; I don't remember if I'm keeping it up-to-date).
> >
> > Can I/should I just do --nobest when updating? I could probably build
> it from a source RPM from another RH-based distro, but I'd rather keep
> it clean with the same distro.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9

2023-08-04 Thread Jens Galsgaard
You are right.

CentOS Stream is alpha
Fedora is beta
RHEL is stable

Alma/Rocky/Oracle are based on RHEL

Venlig hilsen - Mit freundlichen Grüßen - Kind Regards,
Jens Galsgaard

Gitservice.dk 
Mob: +45 28864340


-Original Message-
From: Marc  
Sent: Friday, 4 August 2023 09.04
To: Konstantin Shalygin ; dobr...@gmu.edu
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9

But Rocky Linux 9 is the continuation of what CentOS would have been on el9.
AFAIK Ceph is developed on the elX distributions and not on the 'trial' Stream
versions, isn't it?


> 
> In most cases the 'alternative' distros like Alma or Rocky have outdated
> versions of packages compared with CentOS Stream 8 or CentOS Stream 9.
> For example, the golang package is at version 1.20 on c8s but still 1.19
> on Alma.
> 
> You can try to use c8s/c9s or try to contribute to your distro to 
> resolve dependency issues
> 
> 
> >
> > I've been digging and I can't see that this has come up anywhere.
> >
> > I'm trying to update a client from Quincy 17.2.3-2 to 17.2.6-4 and
> > I'm getting the error
> >
> > Error:
> > Problem: cannot install the best update candidate for package ceph-
> base-2:17.2.3-2.el9s.x86_64
> >  - nothing provides liburing.so.2()(64bit) needed by ceph-base-
> 2:17.2.6-4.el9s.x86_64
> >  - nothing provides liburing.so.2(LIBURING_2.0)(64bit) needed by 
> > ceph-
> base-2:17.2.6-4.el9s.x86_64
> > (try to add '--skip-broken' to skip uninstallable packages or '--
> nobest' to use not only best candidate packages)
> >
> > Did Ceph Quincy switch to requiring liburing 2? Rocky 9 only provides
> > 0.7-7. CentOS Stream seems to have 1.0.7-3 (at least back to when I set
> > up that repo on Foreman; I don't remember if I'm keeping it up-to-date).
> >
> > Can I/should I just do --nobest when updating? I could probably 
> > build
> it from a source RPM from another RH-based distro, but I'd rather keep 
> it clean with the same distro.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERN] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9

2023-08-04 Thread Dietmar Rieder
I thought so too, but now I'm a bit confused. We are planning to set up a
new Ceph cluster and initially opted for an el9 system, which is supposed
to be stable. Should we rather use a Stream 'trial' version?


Dietmar

On 8/4/23 09:04, Marc wrote:

But Rocky Linux 9 is the continuation of what CentOS would have been on el9.
AFAIK Ceph is developed on the elX distributions and not on the 'trial' Stream
versions, isn't it?




In most cases the 'alternative' distros like Alma or Rocky have outdated
versions of packages compared with CentOS Stream 8 or CentOS Stream 9.
For example, the golang package is at version 1.20 on c8s but still 1.19
on Alma.

You can try to use c8s/c9s or try to contribute to your distro to
resolve dependency issues




I've been digging and I can't see that this has come up anywhere.

I'm trying to update a client from Quincy 17.2.3-2 to 17.2.6-4 and I'm
getting the error

Error:
 Problem: cannot install the best update candidate for package ceph-base-2:17.2.3-2.el9s.x86_64
  - nothing provides liburing.so.2()(64bit) needed by ceph-base-2:17.2.6-4.el9s.x86_64
  - nothing provides liburing.so.2(LIBURING_2.0)(64bit) needed by ceph-base-2:17.2.6-4.el9s.x86_64
 (try to add '--skip-broken' to skip uninstallable packages or '--nobest' to use not only best candidate packages)

Did Ceph Quincy switch to requiring liburing 2? Rocky 9 only provides
0.7-7. CentOS Stream seems to have 1.0.7-3 (at least back to when I set
up that repo on Foreman; I don't remember if I'm keeping it up-to-date).

Can I/should I just do --nobest when updating? I could probably build
it from a source RPM from another RH-based distro, but I'd rather keep
it clean with the same distro.




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9

2023-08-04 Thread dobrie2
Konstantin Shalygin wrote:
> Hi,
> 
> In most cases the 'alternative' distros like Alma or Rocky have outdated
> versions of packages compared with CentOS Stream 8 or CentOS Stream 9.
> For example, the golang package is at version 1.20 on c8s but still 1.19
> on Alma.
> 
> You can try to use c8s/c9s or try to contribute to your distro to
> resolve dependency issues
> 
> 
> k

By definition, the stable version of anything is going to have "outdated 
versions of packages," so that's not really what's going on here.

You did, unintentionally, give me the clue I needed, though. I accessed the 
Ceph repos from Rocky's Extras repo, which includes centos-release-ceph-quincy

centos-release-ceph-pacific.noarch  1.0-2.el9  CEC_Rocky_Linux_9_Rocky_92_extras
centos-release-ceph-quincy.noarch   1.0-2.el9  CEC_Rocky_Linux_9_Rocky_92_extras
centos-release-cloud.noarch         1-1.el9    CEC_Rocky_Linux_9_Rocky_92_extras

Which is pointing to 9-stream. (I do remember seeing "9s" in the repo names, 
but I didn't connect it with Stream, since I don't do Stream in production and, 
honestly, I don't have enough time at work to do Stream in test, so...)

From /etc/yum.repos.d/CentOS-Ceph-Quincy.repo:
metalink=https://mirrors.centos.org/metalink?repo=centos-storage-sig-ceph-quincy-9-stream&arch=$basearch

Which is why I'm getting different dependencies. THAT I can take to the Rocky 
folks to get sorted. I can see where that would cause confusion, as it did in 
my case. When I originally installed Ceph, I was using RHEL, not Rocky, and I
didn't use (or have?) the Extras repo. I copied the repo over and edited it to 
point to Ceph Reef EL9, which installed fine -- and confused me further, but 
makes sense now since it wasn't for Stream.
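
For anyone double-checking the same thing, a quick sketch of seeing which
enabled repo can satisfy the dependency, using stock dnf:

   # list enabled repos, then ask what provides the missing soname
   dnf repolist enabled
   dnf provides 'liburing.so.2()(64bit)'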

I'll roll my own repo files and not use the centos-release-ceph-* from Extras. 
Hopefully, this saves someone else a bit of grief later!
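
For reference, a minimal hand-rolled repo file might look something like
this (a sketch only -- the release path under download.ceph.com is an
assumption, so check the site for the exact directory you want):

   # /etc/yum.repos.d/ceph.repo
   [ceph]
   name=Ceph $basearch
   baseurl=https://download.ceph.com/rpm-quincy/el9/$basearch
   enabled=1
   gpgcheck=1
   gpgkey=https://download.ceph.com/keys/release.asc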

Thanks!
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: What's the max of snap ID?

2023-08-04 Thread Eugen Block
I'm no programmer, but if I understand [1] correctly it's an unsigned
long long:

   int ImageCtx::snap_set(uint64_t in_snap_id) {

which means the max snap_id should be:

2^64 - 1 = 18446744073709551615

Not sure if you can get your cluster to reach that limit, but I also
don't know what would happen if you actually reached it. I also might
be misunderstanding, so maybe someone with more knowledge can confirm
or correct me.


[1] https://github.com/ceph/ceph/blob/main/src/librbd/ImageCtx.cc#L328
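
A quick way to watch the ids in practice (a sketch; pool/image names are
placeholders):

   rbd snap ls mypool/myimage    # snap ids are in the SNAPID column
   rados -p mypool lssnap        # pool-level snaps and their ids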

Quoting Tony Liu :


Hi,

There is a snap ID for each snapshot. How is this ID allocated, sequentially?
I did some tests, and it seems this ID is per pool, starting from 4 and
always going up.

Is that correct?
What's the max of this ID?
What's going to happen when the ID reaches the max? Does it go back and
start from 4 again?



Thanks!
Tony



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: What's the max of snap ID?

2023-08-04 Thread Nathan Fish
2^64 bytes = 18446.744073709551616 petabytes (roughly 18.4 exabytes)

Assuming that a snapshot requires storing any data at all, which it
must, nobody has a Ceph cluster that could store that much snapshot
metadata even for empty snapshots.

On Fri, Aug 4, 2023 at 7:05 AM Eugen Block  wrote:
>
> I'm no programmer, but if I understand [1] correctly it's an unsigned
> long long:
>
> >  int ImageCtx::snap_set(uint64_t in_snap_id) {
>
> which means the max snap_id should be:
>
> 2^64 - 1 = 18446744073709551615
>
> Not sure if you can get your cluster to reach that limit, but I also
> don't know what would happen if you actually reached it. I also might
> be misunderstanding, so maybe someone with more knowledge can confirm
> or correct me.
>
> [1] https://github.com/ceph/ceph/blob/main/src/librbd/ImageCtx.cc#L328
>
> Quoting Tony Liu :
>
> > Hi,
> >
> > There is a snap ID for each snapshot. How is this ID allocated, 
> > sequentially?
> > Did some tests, it seems this ID is per pool, starting from 4 and
> > always going up.
> > Is that correct?
> > What's the max of this ID?
> > What's going to happen when ID reaches the max, going back to start
> > from 4 again?
> >
> >
> > Thanks!
> > Tony
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: snapshot timestamp

2023-08-04 Thread Ilya Dryomov
On Fri, Aug 4, 2023 at 7:49 AM Tony Liu  wrote:
>
> Hi,
>
> We know a snapshot is of a point in time. Is this point in time tracked
> internally by some sort of sequence number, by the timestamp shown by
> "snap ls", or by something else?

Hi Tony,

The timestamp in "rbd snap ls" output is the snapshot creation
timestamp.

>
> I noticed that with "deep cp", the timestamps of all snapshots are changed
> to the copy time.

Correct -- exactly the same as the image creation timestamp (visible in
"rbd info" output).

> Say I create a snapshot at 1PM and make a copy at 3PM; the timestamp of
> the snapshot in the copy is 3PM. If I roll back the copy to this snapshot,
> I'd assume it will actually bring me back to the state of 1PM. Is that
> correct?

Correct.

>
> If the above is true, I won't be able to rely on timestamps to track
> snapshots.
>
> Say I create a snapshot every hour and make a backup by copy at the end
> of the day. Then the original image is damaged and the backup is used to
> restore the work. On this backup image, how do I know which snapshot was
> from 1PM, which was from 2PM, etc.?
> Any advice on tracking snapshots properly in such a case?

I would suggest embedding that info along with any additional metadata
needed in the snapshot name.
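
For example, something like this (a sketch; the naming scheme is
arbitrary):

   # encode the point-in-time in the snapshot name itself
   rbd snap create mypool/myimage@hourly-$(date -u +%Y%m%dT%H%MZ)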

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9

2023-08-04 Thread Tobias Urdin
That’s a major misinterpretation of how it actually is in reality.

Sorry, just had to state that; obviously this is not the proper mailing list
to discuss it on.

Best regards
Tobias

> On 4 Aug 2023, at 09:25, Jens Galsgaard  wrote:
> 
> You are right.
> 
> CentOS Stream is alpha
> Fedora is beta
> RHEL is stable
> 
> Alma/Rocky/Oracle are based on RHEL
> 
> Venlig hilsen - Mit freundlichen Grüßen - Kind Regards,
> Jens Galsgaard
> 
> Gitservice.dk 
> Mob: +45 28864340
> 
> 
> -Original Message-
> From: Marc  
> Sent: Friday, 4 August 2023 09.04
> To: Konstantin Shalygin ; dobr...@gmu.edu
> Cc: ceph-users@ceph.io
> Subject: [ceph-users] Re: Ceph Quincy and liburing.so.2 on Rocky Linux 9
> 
> But Rocky Linux 9 is the continuation of what CentOS would have been on el9.
> AFAIK Ceph is developed on the elX distributions and not on the 'trial' Stream
> versions, isn't it?
> 
> 
>> 
>> In most cases the 'alternative' distros like Alma or Rocky have outdated
>> versions of packages compared with CentOS Stream 8 or CentOS Stream 9.
>> For example, the golang package is at version 1.20 on c8s but still 1.19
>> on Alma.
>> 
>> You can try to use c8s/c9s or try to contribute to your distro to 
>> resolve dependency issues
>> 
>> 
>>> 
>>> I've been digging and I can't see that this has come up anywhere.
>>> 
>>> I'm trying to update a client from Quincy 17.2.3-2 to 17.2.6-4 and
>>> I'm getting the error
>>> 
>>> Error:
>>> Problem: cannot install the best update candidate for package ceph-
>> base-2:17.2.3-2.el9s.x86_64
>>> - nothing provides liburing.so.2()(64bit) needed by ceph-base-
>> 2:17.2.6-4.el9s.x86_64
>>> - nothing provides liburing.so.2(LIBURING_2.0)(64bit) needed by 
>>> ceph-
>> base-2:17.2.6-4.el9s.x86_64
>>> (try to add '--skip-broken' to skip uninstallable packages or '--
>> nobest' to use not only best candidate packages)
>>> 
>>> Did Ceph Quincy switch to requiring liburing 2? Rocky 9 only provides
>>> 0.7-7. CentOS Stream seems to have 1.0.7-3 (at least back to when I set
>>> up that repo on Foreman; I don't remember if I'm keeping it up-to-date).
>>> 
>>> Can I/should I just do --nobest when updating? I could probably 
>>> build
>> it from a source RPM from another RH-based distro, but I'd rather keep 
>> it clean with the same distro.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: question about OSD onode hits ratio

2023-08-04 Thread Mark Nelson
Check to see what your osd_memory_target is set to.  The default 4GB is 
generally a decent starting point, but if you have a large active data 
set you might benefit from increasing the amount of memory available to 
the OSDs.  They'll generally prefer giving it to the onode cache first 
if it's hot.
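
A quick sketch of checking and raising it (the 8 GiB value is just an
example -- size it to your hardware):

   ceph config get osd osd_memory_target
   # set a new default for all OSDs; the value is in bytes
   ceph config set osd osd_memory_target 8589934592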


*Note: In some container-based deployments the osd_memory_target might be
set automatically based on the container limit (and possibly based on the
memory available in the node).



Mark


On 8/2/23 11:25 PM, Ben wrote:

Hi,
We have had a cluster running for a while. From the Grafana Ceph dashboard,
I saw an OSD onode hits ratio of 92% when the cluster was just up and
running. After a couple of months, it now says 70%. This is not a good
trend, I think. Just wondering what should be done to stop this trend.

Many thanks,
Ben


--
Best Regards,
Mark Nelson
Head of R&D (USA)

Clyso GmbH
p: +49 89 21552391 12
a: Loristraße 8 | 80335 München | Germany
w: https://clyso.com | e: mark.nel...@clyso.com

We are hiring: https://www.clyso.com/jobs/
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Natuilus: Taking out OSDs that are 'Failure Pending'

2023-08-04 Thread Dave Hall
Hello.  It's been a while.  I have a Nautilus cluster with 72 x 12TB HDD
OSDs (BlueStore) and mostly EC 8+2 pools/PGs.  It's been working great -
some nodes went nearly 900 days without a reboot.

As of yesterday I found that I have 3 OSDs with a SMART status of 'Pending
Failure'.  New drives are ordered and will be here next week.  There is a
procedure in the documentation for replacing an OSD, but I can't do that
directly until I receive the drives.

My inclination is to mark these 3 OSDs 'OUT' before they crash completely,
but I want to confirm my understanding of Ceph's response to this.  Mainly,
given my EC pools (or replicated pools for that matter), if I mark all 3
OSD out all at once will I risk data loss?

If I have it right, marking an OSD out will simply cause Ceph to move all
of the PG shards from that OSD to other OSDs, so no major risk of data
loss.  However, if it would be better to do them one per day or something,
I'd rather be safe.

I also assume that I should wait for the rebalance to complete before I
initiate the replacement procedure.

Your thoughts?

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdh...@binghamton.edu
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] Upgrading nautilus / centos7 to octopus / ubuntu 20.04. - Suggestions and hints? - Thanks

2023-08-04 Thread Götz Reinicke
Hi, thanks for all the suggestions.

Right now, a step-by-step approach is what works: going to bionic/nautilus
first, and from there proceeding as Josh noted.

We encountered a problem which I'll post about separately.

Best . Götz

> On 03.08.2023 at 15:44, Beaman, Joshua wrote:
> 
> We went through this exercise, though our starting point was ubuntu 16.04 / 
> nautilus.  We reduced our double builds as follows:
> 
> Rebuild each monitor host on 18.04/bionic and rejoin still on nautilus
> Upgrade all mons, mgrs., (and rgws optionally) to pacific
> Convert each mon, mgr, rgw to cephadm and enable orchestrator
> Rebuild each mon, mgr, rgw on 20.04/focal and rejoin pacfic cluster
> Drain and rebuild each osd host on focal and pacific
>  
> This has the advantage of only having to drain and rebuild the OSD hosts 
> once.  Double building the control cluster hosts isn’t so bad, and 
> orchestrator makes all of the ceph parts easy once it’s enabled.
>  
> The biggest challenge we ran into was: https://tracker.ceph.com/issues/51652 
> because we still had a lot of filestore osds.  It’s frustrating, but we 
> managed to get through it without much client interruption on a dozen prod 
> clusters, most of which were 38 osd hosts and 912 total osds each.  One thing
> which helped was, before beginning the osd host builds, setting all of the old
> osds' primary-affinity to something <1.  This way when the new pacific (or
> octopus) osds join the cluster they will automatically be favored for primary 
> on their pgs.  If a heartbeat timeout storm starts to get out of control, 
> start by setting nodown and noout.  The flapping osds are the worst.  Then 
> figure out which osds are the culprit and restart them.
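>
> As a sketch of those two steps (the 0.5 value is arbitrary; anything <1
> works):
>
>    # before the rebuilds start, every existing osd is an 'old' one
>    for i in $(ceph osd ls); do ceph osd primary-affinity osd.$i 0.5; done
>    # if a heartbeat timeout storm gets out of control:
>    ceph osd set nodown
>    ceph osd set noout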
>  
> Hopefully your nautilus osds are all bluestore and you won’t have this 
> problem.  We put up with it, because the filestore to bluestore conversion 
> was one of the most important parts of this upgrade for us.
>  
> Best of luck, whatever route you take.
>  
> Regards, 
> Josh Beaman



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] Natuilus: Taking out OSDs that are 'Failure Pending'

2023-08-04 Thread Beaman, Joshua
Marking them OUT first is the way to go.  As long as the osds stay UP, they can 
and will participate in the recovery.  How many you can mark out at one time 
will depend on how sensitive your client i/o is to background recovery, and all 
of the related tunings.  If you have the hours/days to spare, it is definitely 
easier on the cluster to do them one at a time.
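
A sketch of the one-at-a-time flow (osd.12 is a placeholder id):

   ceph osd out osd.12
   # watch recovery; wait until all PGs are active+clean again
   ceph -s
   ceph osd df tree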

Thank you,
Josh Beaman

From: Dave Hall 
Date: Friday, August 4, 2023 at 8:45 AM
To: ceph-users 
Cc: anthony.datri 
Subject: [EXTERNAL] [ceph-users] Natuilus: Taking out OSDs that are 'Failure 
Pending'
Hello.  It's been a while.  I have a Nautilus cluster with 72 x 12TB HDD
OSDs (BlueStore) and mostly EC 8+2 pools/PGs.  It's been working great -
some nodes went nearly 900 days without a reboot.

As of yesterday I found that I have 3 OSDs with a SMART status of 'Pending
Failure'.  New drives are ordered and will be here next week.  There is a
procedure in the documentation for replacing an OSD, but I can't do that
directly until I receive the drives.

My inclination is to mark these 3 OSDs 'OUT' before they crash completely,
but I want to confirm my understanding of Ceph's response to this.  Mainly,
given my EC pools (or replicated pools for that matter), if I mark all 3
OSD out all at once will I risk data loss?

If I have it right, marking an OSD out will simply cause Ceph to move all
of the PG shards from that OSD to other OSDs, so no major risk of data
loss.  However, if it would be better to do them one per day or something,
I'd rather be safe.

I also assume that I should wait for the rebalance to complete before I
initiate the replacement procedure.

Your thoughts?

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdh...@binghamton.edu
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] cephfs mount problem - client session lacks required features

2023-08-04 Thread Götz Reinicke
Hi,

During the upgrade from centos7/nautilus to ubuntu 18/nautilus (still updating
the MONs) I ran into a CephFS client that refuses, or is refused, to mount the
Ceph fs again.

The client says: mount error 13 = Permission denied

The ceph-mds log says: lacks required features 0x1000 client supports
0x00ff

The MDS/MONs are still centos7/nautilus, the clients centos7 as well.

Any ideas? Thanks for suggestions and hints. Best, Götz



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Natuilus: Taking out OSDs that are 'Failure Pending' [EXT]

2023-08-04 Thread Dave Holland
On Fri, Aug 04, 2023 at 09:44:57AM -0400, Dave Hall wrote:
> My inclination is to mark these 3 OSDs 'OUT' before they crash completely,
> but I want to confirm my understanding of Ceph's response to this.  Mainly,
> given my EC pools (or replicated pools for that matter), if I mark all 3
> OSD out all at once will I risk data loss?

It depends on your crush map and failure domain layout. In the
unlikeliest and unluckiest case, all those 3 OSDs are in different
failure domains, and some data has 1 replica on each of those OSDs. In
that situation, if you take them out simultaneously, you would lose
data. If you're unsure, then do them one at a time and wait for the
rebalance/backfill to complete before doing the next.

We arrange our OSDs so that the failure domain is the rack; losing an
entire rack is safe (and we've had that happen) so we know it's safe
to pull any number of OSDs in the same rack and we won't lose data.
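
If you're unsure of your layout, a quick sketch of checking it:

   ceph osd tree               # where do the three osds sit in the hierarchy?
   ceph osd crush rule dump    # the chooseleaf type is the failure domain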

Dave
-- 
**   Dave Holland   ** Systems Support -- Informatics Systems Group **
** d...@sanger.ac.uk **Wellcome Sanger Institute, Hinxton, UK**


-- 
 The Wellcome Sanger Institute is operated by Genome Research 
 Limited, a charity registered in England with number 1021457 and a 
 company registered in England with number 2742969, whose registered 
 office is Wellcome Sanger Institute, Wellcome Genome Campus, 
 Hinxton, CB10 1SA.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] cephfs mount problem - client session lacks required features

2023-08-04 Thread Beaman, Joshua
We did not have any cephfs or mds involved.  But since you haven’t even started 
a ceph upgrade in earnest, I have to wonder about your nautilus versions.  
Maybe you have a mismatch there?

I would definitely share the output of `ceph versions` and `ceph features`.  If 
you’re not 14.2.22 across the board, I would at least upgrade your mon, mgr, 
and mds services.  Then check release notes to see if there are any clues there.
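
For example (a sketch; substitute your MDS name):

   ceph versions                       # per-daemon version breakdown
   ceph features                       # feature bits of connected clients
   ceph tell mds.<name> session ls     # per-client session details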

Thank you,
Josh Beaman

From: Götz Reinicke 
Date: Friday, August 4, 2023 at 9:02 AM
To: ceph-users@ceph.io 
Subject: [EXTERNAL] [ceph-users] cephfs mount problem - client session lacks 
required features
Hi,

During the upgrade from centos7/nautilus to ubuntu 18/nautilus (still updating
the MONs) I ran into a CephFS client that refuses, or is refused, to mount the
Ceph fs again.

The client says: mount error 13 = Permission denied

The ceph-mds log says: lacks required features 0x1000 client supports
0x00ff

The MDS/MONs are still centos7/nautilus, the clients centos7 as well.

Any ideas? Thanks for suggestions and hints. Best, Götz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] cephfs mount problem - client session lacks required features - solved

2023-08-04 Thread Götz Reinicke
Hi Josh,

Thanks for your feedback. We restarted the active MDS and the error/problem
is gone.

Best . Götz


> On 04.08.2023 at 16:19, Beaman, Joshua wrote:
> 
> We did not have any cephfs or mds involved.  But since you haven’t even 
> started a ceph upgrade in earnest, I have to wonder about your nautilus 
> versions.  Maybe you have a mismatch there?
>  
> I would definitely share the output of `ceph versions` and `ceph features`.  
> If you’re not 14.2.22 across the board, I would at least upgrade your mon, 
> mgr, and mds services.  Then check release notes to see if there are any
> clues there.
>  
> Thank you,
> Josh Beaman
>  
> From: Götz Reinicke 
> Date: Friday, August 4, 2023 at 9:02 AM
> To: ceph-users@ceph.io 
> Subject: [EXTERNAL] [ceph-users] cephfs mount problem - client session lacks 
> required features
> 
> Hi,
>  
> During the upgrade from centos7/nautilus to ubuntu 18/nautilus (still
> updating the MONs) I ran into a CephFS client that refuses, or is refused,
> to mount the Ceph fs again.
>  
> The client says: mount error 13 = Permission denied
>  
> The ceph-mds log says: lacks required features 0x1000 client supports
> 0x00ff
>  
> The MDS/MONs are still centos7/nautilus, the clients centos7 as well.
>  
> Any ideas? Thanks for suggestions and hints. Best, Götz



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [External Email] Re: Natuilus: Taking out OSDs that are 'Failure Pending' [EXT]

2023-08-04 Thread Dave Hall
Dave,

Actually, my failure domain is OSD since I so far only have 9 OSD nodes but
EC 8+2.  However, the drives are still functioning, except that one has
failed multiple times in the last few days, requiring a node power-cycle to
recover.  I will certainly mark that one out immediately.

The other two pending failures are behaving more politely, so I am assuming
that the cluster could copy the data elsewhere as part of the rebalance.  I
think I'm also concerned about the rebalance process moving data to these
drives with pending failures.

Since I'm EC 8+2, perhaps it is safe to mark two out simultaneously?

Thanks.

-Dave

--
Dave Hall
Binghamton University
kdh...@binghamton.edu

On Fri, Aug 4, 2023 at 10:16 AM Dave Holland  wrote:

> On Fri, Aug 04, 2023 at 09:44:57AM -0400, Dave Hall wrote:
> > My inclination is to mark these 3 OSDs 'OUT' before they crash
> completely,
> > but I want to confirm my understanding of Ceph's response to this.
> Mainly,
> > given my EC pools (or replicated pools for that matter), if I mark all 3
> > OSD out all at once will I risk data loss?
>
> It depends on your crush map and failure domain layout. In the
> unlikeliest and unluckiest case, all those 3 OSDs are in different
> failure domains, and some data has 1 replica on each of those OSDs. In
> that situation, if you take them out simultaneously, you would lose
> data. If you're unsure, then do them one at a time and wait for the
> rebalance/backfill to complete before doing the next.
>
> We arrange our OSDs so that the failure domain is the rack; losing an
> entire rack is safe (and we've had that happen) so we know it's safe
> to pull any number of OSDs in the same rack and we won't lose data.
>
> Dave
> --
> **   Dave Holland   ** Systems Support -- Informatics Systems Group **
> ** d...@sanger.ac.uk **Wellcome Sanger Institute, Hinxton, UK**
>
>
> --
>  The Wellcome Sanger Institute is operated by Genome Research
>  Limited, a charity registered in England with number 1021457 and a
>  company registered in England with number 2742969, whose registered
>  office is Wellcome Sanger Institute, Wellcome Genome Campus,
>  Hinxton, CB10 1SA.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: What's the max of snap ID?

2023-08-04 Thread Tony Liu
Thank you Eugen and Nathan!
uint64 is big enough, no concerns anymore.

Tony

From: Nathan Fish 
Sent: August 4, 2023 04:19 AM
To: Eugen Block
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: What's the max of snap ID?

2^64 bytes = 18446.744073709551616 petabytes (roughly 18.4 exabytes)

Assuming that a snapshot requires storing any data at all, which it
must, nobody has a Ceph cluster that could store that much snapshot
metadata even for empty snapshots.

On Fri, Aug 4, 2023 at 7:05 AM Eugen Block  wrote:
>
> I'm no programmer, but if I understand [1] correctly it's an unsigned
> long long:
>
> >  int ImageCtx::snap_set(uint64_t in_snap_id) {
>
> which means the max snap_id should be:
>
> 2^64 - 1 = 18446744073709551615
>
> Not sure if you can get your cluster to reach that limit, but I also
> don't know what would happen if you actually reached it. I also might
> be misunderstanding, so maybe someone with more knowledge can confirm
> or correct me.
>
> [1] https://github.com/ceph/ceph/blob/main/src/librbd/ImageCtx.cc#L328
>
> Quoting Tony Liu :
>
> > Hi,
> >
> > There is a snap ID for each snapshot. How is this ID allocated, 
> > sequentially?
> > Did some tests, it seems this ID is per pool, starting from 4 and
> > always going up.
> > Is that correct?
> > What's the max of this ID?
> > What's going to happen when ID reaches the max, going back to start
> > from 4 again?
> >
> >
> > Thanks!
> > Tony
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: snapshot timestamp

2023-08-04 Thread Tony Liu
Thank you Ilya for confirmation!

Tony

From: Ilya Dryomov 
Sent: August 4, 2023 04:51 AM
To: Tony Liu
Cc: d...@ceph.io; ceph-users@ceph.io
Subject: Re: [ceph-users] snapshot timestamp

On Fri, Aug 4, 2023 at 7:49 AM Tony Liu  wrote:
>
> Hi,
>
> We know a snapshot is of a point in time. Is this point in time tracked
> internally by some sort of sequence number, by the timestamp shown by
> "snap ls", or by something else?

Hi Tony,

The timestamp in "rbd snap ls" output is the snapshot creation
timestamp.

>
> I noticed that with "deep cp", the timestamps of all snapshots are changed
> to the copy time.

Correct -- exactly the same as the image creation timestamp (visible in
"rbd info" output).

> Say I create a snapshot at 1PM and make a copy at 3PM; the timestamp of
> the snapshot in the copy is 3PM. If I roll back the copy to this snapshot,
> I'd assume it will actually bring me back to the state of 1PM. Is that
> correct?

Correct.

>
> If the above is true, I won't be able to rely on timestamps to track
> snapshots.
>
> Say I create a snapshot every hour and make a backup by copy at the end
> of the day. Then the original image is damaged and the backup is used to
> restore the work. On this backup image, how do I know which snapshot was
> from 1PM, which was from 2PM, etc.?
> Any advice on tracking snapshots properly in such a case?

I would suggest embedding that info along with any additional metadata
needed in the snapshot name.

Thanks,

Ilya
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [External Email] Re: Natuilus: Taking out OSDs that are 'Failure Pending' [EXT]

2023-08-04 Thread Tyler Stachecki
On Fri, Aug 4, 2023 at 11:33 AM Dave Hall  wrote:
>
> Dave,
>
> Actually, my failure domain is OSD since I so far only have 9 OSD nodes but
> EC 8+2.  However, the drives are still functioning, except that one has
> failed multiple times in the last few days, requiring a node power-cycle to
> recover.  I will certainly mark that one out immediately.
>
> The other two pending failures are behaving more politely, so I am assuming
> that the cluster could copy the data elsewhere as part of the rebalance.  I
> think I'm also concerned about the rebalance process moving data to these
> drives with pending failures.
>
> Since I'm EC 8+2, perhaps it is safe to mark two out simultaneously?

Dave,

You should be able to mark out two OSDs simultaneously without worry
as long as you have enough space, etc. When you mark an OSD out, it
still participates in the cluster as long as the OSD remains up and is
able to aid in the backfilling process. Thus, you'll also want to
avoid stopping/downing the OSDs until backfilling completes. Following
that logic: if you stop both OSDs before backfilling completes, you
will put yourself in a bad spot.

If all PGs are active+clean, you may both a) out the two OSDs and b)
stop/down *only the one* imminently failing OSD (leaving the second
OSD being drained still up) and things should also be fine... but you
will be vulnerable to blocked ops/unavailable data if _subsequent_
OSDs fail unexpectedly, including the second OSD being out'd,
depending upon your CRUSH map and cluster status.

Note that if your intent is to purge the OSD after it is drained, I
believe you should do a `ceph osd crush reweight osd.X 0` and not a
`ceph osd out osd.X` or `ceph osd reweight osd.X 0`, as it should
result in slightly less net data movement.
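
A sketch of that flow (osd.X as a placeholder):

   ceph osd crush reweight osd.X 0
   # after backfill finishes, confirm and remove it:
   ceph osd safe-to-destroy osd.X
   ceph osd purge osd.X --yes-i-really-mean-it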

Cheers,
Tyler
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] snaptrim number of objects

2023-08-04 Thread Angelo Höngens
Hey guys,

I'm trying to figure out what's happening to my backup cluster that
often grinds to a halt when cephfs automatically removes snapshots.
Almost all OSD's go to 100% CPU, ceph complains about slow ops, and
CephFS stops doing client i/o.

I'm graphing the cumulative snaptrimq_len value, and that slowly
decreases over time. One night it takes an hour, but on other days, like
today, my cluster has been down for almost 20 hours, and I think we're
halfway. Funny thing is that in both cases, the
snaptrimq_len value initially goes to the same value, around 3000, and
then slowly decreases, but my guess is that the number of objects that
need to be trimmed varies hugely every day.

Is there a way to show the size of cephfs snapshots, or get the number
of objects or bytes that need snaptrimming? Perhaps I can graph that
and see where the differences are.

That won't explain why my cluster bogs down, but at least it gives
some visibility. Running 17.2.6 everywhere by the way.
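
For reference, the per-PG queue can be summed like this (a sketch -- the
JSON field name is my assumption based on the SNAPTRIMQ_LEN column of
`ceph pg dump pgs`):

   # total objects queued for snap trimming across all PGs
   ceph pg dump pgs -f json 2>/dev/null | jq '[.pg_stats[].snaptrimq_len] | add'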

Angelo.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io