[ceph-users] Re: PG Sizing Question

2023-02-28 Thread Anthony D'Atri
This can be subtle and is easy to mix up.

The “PG ratio” is intended to be the number of PGs hosted on each OSD, plus or 
minus a few.

Note how I phrased that: it's not simply the total number of PGs divided by the number of
OSDs.  Remember that PGs are replicated, so each replica or shard counts against the OSD it lands on.

While each PG belongs to exactly one pool, for purposes of estimating pg_num
we calculate the desired aggregate number of PGs from this ratio, then divide
that up among pools relative to the amount of data in each pool, ideally
rounding each pool's pg_num to a power of 2.

You can run `ceph osd df` and see the number of PGs on each OSD.  There will be 
some variance, but consider the average.
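
As a quick sketch of computing that ratio yourself (assuming jq is installed and that your release's JSON output carries a per-OSD "pgs" field, which recent releases do):

  ceph osd df -f json | jq '[.nodes[].pgs] | add / length'    # mean PGs hosted per OSD, replicas included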

This venerable calculator:

https://old.ceph.com/pgcalc/
PGCalc
old.ceph.com

can help get a feel for how this works.

100 is the official party line; it used to be 200.  More PGs mean more memory
use; too few has various other drawbacks.

PGs can in part be thought of as parallelism domains; more PGs means more 
parallelism.  So on HDDs, a ratio in the 100-200 range is IMHO reasonable.  
SAS/SATA OSDs 200-300, NVMe OSDs perhaps higher, though perhaps not if each 
device hosts more than one OSD (which should only ever be done on NVMe devices).

Your numbers below are probably OK for HDDs; you might bump the pool with the
most data up to the next power of 2 if these are SSDs.

The pgcalc above includes parameters for what fraction of the cluster’s data 
each pool contains.  A pool with 5% of the data needs fewer PGs than a pool 
with 50% of the cluster’s data.
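
To make that concrete with a rough, made-up example (the 45/45/10 data split here is assumed, purely for illustration): 80 OSDs at a target ratio of 100 gives a budget of 80 * 100 = 8000 PG replicas.  A 3x pool holding ~45% of the data gets 8000 * 0.45 / 3 = 1200 => 1024; a second 3x pool with ~45% likewise gets 1024; an EC 6+4 pool (10 shards) with ~10% gets 8000 * 0.10 / 10 = 80 => 64 or 128, keeping the aggregate near the target ratio.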

Others may well have different perspectives; this is something where opinions
vary.  The pg_autoscaler in bulk mode can automate this, if one is prescient
when feeding it parameters.
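
For reference, pointing the autoscaler at a pool amounts to roughly this (pool name is a placeholder; the bulk flag only exists on newer releases, so check your version's docs):

  ceph osd pool set <pool> bulk true
  ceph osd pool set <pool> pg_autoscale_mode on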



> On Feb 28, 2023, at 9:23 PM, Deep Dish  wrote:
> 
> Hello
> 
> 
> 
> Looking to get some official guidance on PG and PGP sizing.
> 
> 
> 
> Is the goal to maintain approximately 100 PGs per OSD per pool or for the
> cluster in general?
> 
> 
> 
> Assume the following scenario:
> 
> 
> 
> Cluster with 80 OSD across 8 nodes;
> 
> 3 Pools:
> 
> -   Pool1 = Replicated 3x
> 
> -   Pool2 = Replicated 3x
> 
> -   Pool3 = Erasure Coded 6-4
> 
> 
> 
> 
> 
> Assuming the well published formula:
> 
> 
> 
> Let (Target PGs / OSD) = 100
> 
> 
> 
> [ (Target PGs / OSD) * (# of OSDs) ] / (Replica Size)
> 
> 
> 
> -   Pool1 = (100*80)/3 = 2666.67 => 4096
> 
> -   Pool2 = (100*80)/3 = 2666.67 => 4096
> 
> -   Pool3 = (100*80)/10 = 800 => 1024
> 
> 
> 
> Total cluster would have 9216 PGs and PGPs.
> 
> 
> Are there any implications (performance / monitor / MDS / RGW sizing) with
> how many PGs are created on the cluster?
> 
> 
> 
> Looking for validation and / or clarification of the above.
> 
> 
> 
> Thank you.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: s3 compatible interface

2023-02-28 Thread Konstantin Shalygin
Hi,

You can try [1] geesefs project, the presentation for this code is here [2]


[1] https://github.com/yandex-cloud/geesefs
[2] 
https://yourcmc-ru.translate.goog/geesefs-2022/highload.html?_x_tr_sl=ru&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp

> On 28 Feb 2023, at 22:31, Marc  wrote:
> 
> Anyone know of a s3 compatible interface that I can just run, and 
> reads/writes files from a local file system and not from object storage?

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] PG Sizing Question

2023-02-28 Thread Deep Dish
Hello



Looking to get some official guidance on PG and PGP sizing.



Is the goal to maintain approximately 100 PGs per OSD per pool or for the
cluster in general?



Assume the following scenario:



Cluster with 80 OSD across 8 nodes;

3 Pools:

-   Pool1 = Replicated 3x

-   Pool2 = Replicated 3x

-   Pool3 = Erasure Coded 6-4





Assuming the well published formula:



Let (Target PGs / OSD) = 100



[ (Target PGs / OSD) * (# of OSDs) ] / (Replica Size)



-   Pool1 = (100*80)/3 = 2666.67 => 4096

-   Pool2 = (100*80)/3 = 2666.67 => 4096

-   Pool3 = (100*80)/10 = 800 => 1024



Total cluster would have 9216 PGs and PGPs.


Are there any implications (performance / monitor / MDS / RGW sizing) with
how many PGs are created on the cluster?



Looking for validation and / or clarification of the above.



Thank you.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-02-28 Thread Mark Nelson
One thing to watch out for with bluefs_buffered_io is that disabling it 
can greatly impact certain rocksdb workloads.  From what I remember it 
was a huge problem during certain iteration workloads for things like 
collection listing.  I think the block cache was being invalidated or 
simply never cached the data properly, but the underlying code has 
changed quite a bit so it was tough to track down exactly how it worked 
in different versions of RocksDB.


We got bitten by this a couple of years ago when we switched to direct IO and
it caused a lot of people trouble.  We ended up having to turn it back
on after lots of frustration and digging.  Basically the Linux page cache
is saving the day even though it really shouldn't be necessary.  It's 
irritating because bluestore is otherwise faster in many scenarios (at 
least with NVMe drives) when bluefs uses direct IO.
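
For anyone who wants to experiment with it, toggling the option is roughly this (a sketch assuming a non-containerized systemd deployment; adjust the restart step for cephadm, and note the setting may only take effect when the OSD starts):

  ceph config set osd bluefs_buffered_io false
  systemctl restart ceph-osd@<id>     # one failure domain at a time
  # and to revert:
  ceph config set osd bluefs_buffered_io true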


Mark

On 2/28/23 15:46, Boris Behrens wrote:

Hi Josh,
thanks a lot for the breakdown and the links.
I disabled the write cache but it didn't change anything. Tomorrow I will
try to disable bluefs_buffered_io.

It doesn't sound that I can mitigate the problem with more SSDs.


Am Di., 28. Feb. 2023 um 15:42 Uhr schrieb Josh Baergen <
jbaer...@digitalocean.com>:


Hi Boris,

OK, what I'm wondering is whether
https://tracker.ceph.com/issues/58530 is involved. There are two
aspects to that ticket:
* A measurable increase in the number of bytes written to disk in
Pacific as compared to Nautilus
* The same, but for IOPS

Per the current theory, both are due to the loss of rocksdb log
recycling when using default recovery options in rocksdb 6.8; Octopus
uses version 6.1.2, Pacific uses 6.8.1.

16.2.11 largely addressed the bytes-written amplification, but the
IOPS amplification remains. In practice, whether this results in a
write performance degradation depends on the speed of the underlying
media and the workload, and thus the things I mention in the next
paragraph may or may not be applicable to you.

There's no known workaround or solution for this at this time. In some
cases I've seen that disabling bluefs_buffered_io (which itself can
cause IOPS amplification in some cases) can help; I think most folks
do this by setting it in local conf and then restarting OSDs in order
to gain the config change. Something else to consider is

https://docs.ceph.com/en/quincy/start/hardware-recommendations/#write-caches
,
as sometimes disabling these write caches can improve the IOPS
performance of SSDs.

Josh

On Tue, Feb 28, 2023 at 7:19 AM Boris Behrens  wrote:


Hi Josh,
we upgraded 15.2.17 -> 16.2.11 and we only use rbd workload.



Am Di., 28. Feb. 2023 um 15:00 Uhr schrieb Josh Baergen <

jbaer...@digitalocean.com>:


Hi Boris,

Which version did you upgrade from and to, specifically? And what
workload are you running (RBD, etc.)?

Josh

On Tue, Feb 28, 2023 at 6:51 AM Boris Behrens  wrote:


Hi,
today I did the first update from octopus to pacific, and it looks

like the

avg apply latency went up from 1ms to 2ms.

All 36 OSDs are 4TB SSDs and nothing else changed.
Someone knows if this is an issue, or am I just missing a config

value?


Cheers
  Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




--
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im

groüen Saal.





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph OSD imbalance and performance

2023-02-28 Thread Reed Dier
> Yeah, there seems to be a fear that attempting to repair those will 
> negatively impact performance even more. I disagree and think we should do 
> them immediately.

There shouldn’t really be too much of a noticeable performance hit.
There is some good documentation out there on this.

> The general feeling is that we're stuck on luminous and that it's destructive 
> to upgrade to anything else.
> I refuse to believe that is true.
> At least if we upgraded everything to 12.2.3 we'd have the 'balancer' stuff 
> that came with I think 12.2.2...

Upgrades are definitely not destructive, however, they also aren’t trivial.
You can upgrade 2 releases at a time, but the distros those packages are built for
may vary from release to release.

For example, if you were to want to get to Quincy from Luminous, you should be 
able to step from Luminous (12) to Nautilus (14), then to Pacific (16), and on 
to Quincy (17) if you wanted.
However, your Luminous install may be on Ubuntu 14.04 or 16.04, which you can 
immediately move to Nautilus with.
To get to Pacific, you’re going to then need to move to Ubuntu 18.04 (Nautilus 
compatible), and then on to Pacific.
If you then wanted to move to Quincy, you then need to upgrade to Ubuntu 20.04, 
before moving on to Quincy with 20.04.

This probably sounds daunting, and it is certainly non-trivial, but definitely 
doable if you take things in small steps, and should be possible with no 
downtime if planned out.
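
To give a feel for the shape of each Ceph step itself, one release hop on a package-based (non-cephadm) cluster looks roughly like the sketch below; the release notes for each version remain the authority on the exact sequence and any extra steps:

  ceph osd set noout
  # upgrade packages, then restart daemons in order: mons, mgrs, OSDs, then MDS/RGW
  systemctl restart ceph-mon@<host>
  systemctl restart ceph-mgr@<host>
  systemctl restart ceph-osd@<id>            # host by host, waiting for HEALTH_OK in between
  ceph osd require-osd-release nautilus      # example for the Luminous -> Nautilus hop
  ceph osd unset noout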

> Also, there seems to be a belief that bluestore is an 'all-or-nothing' 
> proposition
> Yet I see that you can have a mixture of both in your deployments 

You can mix filestore and bluestore OSDs in your cluster, however — 

> […] and that it's impossible to migrate from filestore to bluestore.

> […] and it's indeed possible to migrate from filestore to bluestore.

If you have filestore OSDs, the only way to migrate them to bluestore is by 
destroying the OSD and recreating it as bluestore (see the BlueStore migration guide in the Ceph docs).
This can be a time consuming process if you drain an OSD, let it backfill off, 
blow it away, recreate, and then bring data back.
This can also prove to be IO expensive as well if your ceph cluster is already 
IO saturated, due to all of the backfill IO on top of the client IO.
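
The per-OSD recreation itself is short; a sketch of the documented flow, with the device and OSD id as placeholders and assuming a ceph-volume deployment:

  ceph osd out <id>
  # wait for backfill to finish, then:
  systemctl stop ceph-osd@<id>
  ceph-volume lvm zap /dev/<device> --destroy
  ceph osd destroy <id> --yes-i-really-mean-it
  ceph-volume lvm create --bluestore --data /dev/<device> --osd-id <id>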

> TL;DR -- there is a *lot* of fear of touching this thing because nobody is 
> truly an 'expert' in it atm.
> But not touching it is why we've gotten ourselves into a situation with 
> broken stuff and horrendous performance.

Given how critical (and brittle) this infrastructure is sounding to your org, 
it might be best to pull in some experts, and I
think most on the mailing list would likely recommend Croit as a good place to 
start outside of any existing support contracts.

Hope that's helpful,
Reed

> On Feb 28, 2023, at 1:11 PM, Dave Ingram  wrote:
> 
> 
> On Tue, Feb 28, 2023 at 12:56 PM Reed Dier wrote:
> I think a few other things that could help would be `ceph osd df tree` which 
> will show the hierarchy across different crush domains.
> 
> Good idea: https://pastebin.com/y07TKt52 
>  
> And if you’re doing something like erasure coded pools, or something other 
> than replication 3, maybe `ceph osd crush rule dump` may provide some further 
> context with the tree output.
> 
> No erasure coded pools - all replication.
>  
> 
> Also, the cluster is running Luminous (12) which went EOL 3 years ago 
> tomorrow.
> So there are also likely a good bit of improvements all around under the hood 
> to be gained by moving forward from Luminous.
> 
> Yes, nobody here wants to touch upgrading this at all - too terrified of 
> breaking things. This ceph deployment is serving several hundred VMs.
> 
> The general feeling is that we're stuck on luminous and that it's destructive 
> to upgrade to anything else. I refuse to believe that is true. At least if we 
> upgraded everything to 12.2.3 we'd have the 'balancer' stuff that came with I 
> think 12.2.2...
> 
> What would you recommend upgrading luminous to?
>  
> Though, I would say take care of the scrub errors prior to doing any major 
> upgrades, as well as checking your upgrade path (can only upgrade two 
> releases at a time, if you have filestore OSDs, etc).
> 
> Yeah, there seems to be a fear that attempting to repair those will 
> negatively impact performance even more. I disagree and think we should do 
> them immediately.
> 
> Also, there seems to be a belief that bluestore is an 'all-or-nothing' 
> proposition and that it's impossible to migrate from filestore to bluestore. 
> Yet I see that you can have a mixture of both in your deployments and it's 
> indeed possible to migrate from filestore to bluestore.
> 
> 

[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-02-28 Thread Boris Behrens
Hi Josh,
thanks a lot for the breakdown and the links.
I disabled the write cache but it didn't change anything. Tomorrow I will
try to disable bluefs_buffered_io.

It doesn't sound that I can mitigate the problem with more SSDs.


Am Di., 28. Feb. 2023 um 15:42 Uhr schrieb Josh Baergen <
jbaer...@digitalocean.com>:

> Hi Boris,
>
> OK, what I'm wondering is whether
> https://tracker.ceph.com/issues/58530 is involved. There are two
> aspects to that ticket:
> * A measurable increase in the number of bytes written to disk in
> Pacific as compared to Nautilus
> * The same, but for IOPS
>
> Per the current theory, both are due to the loss of rocksdb log
> recycling when using default recovery options in rocksdb 6.8; Octopus
> uses version 6.1.2, Pacific uses 6.8.1.
>
> 16.2.11 largely addressed the bytes-written amplification, but the
> IOPS amplification remains. In practice, whether this results in a
> write performance degradation depends on the speed of the underlying
> media and the workload, and thus the things I mention in the next
> paragraph may or may not be applicable to you.
>
> There's no known workaround or solution for this at this time. In some
> cases I've seen that disabling bluefs_buffered_io (which itself can
> cause IOPS amplification in some cases) can help; I think most folks
> do this by setting it in local conf and then restarting OSDs in order
> to gain the config change. Something else to consider is
>
> https://docs.ceph.com/en/quincy/start/hardware-recommendations/#write-caches
> ,
> as sometimes disabling these write caches can improve the IOPS
> performance of SSDs.
>
> Josh
>
> On Tue, Feb 28, 2023 at 7:19 AM Boris Behrens  wrote:
> >
> > Hi Josh,
> > we upgraded 15.2.17 -> 16.2.11 and we only use rbd workload.
> >
> >
> >
> > Am Di., 28. Feb. 2023 um 15:00 Uhr schrieb Josh Baergen <
> jbaer...@digitalocean.com>:
> >>
> >> Hi Boris,
> >>
> >> Which version did you upgrade from and to, specifically? And what
> >> workload are you running (RBD, etc.)?
> >>
> >> Josh
> >>
> >> On Tue, Feb 28, 2023 at 6:51 AM Boris Behrens  wrote:
> >> >
> >> > Hi,
> >> > today I did the first update from octopus to pacific, and it looks
> like the
> >> > avg apply latency went up from 1ms to 2ms.
> >> >
> >> > All 36 OSDs are 4TB SSDs and nothing else changed.
> >> > Someone knows if this is an issue, or am I just missing a config
> value?
> >> >
> >> > Cheers
> >> >  Boris
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> >
> > --
> > Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
> groüen Saal.
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
groüen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph quincy nvme drives displayed in device list sata ssd not displayed

2023-02-28 Thread Chris Brown
Hello,

 

Setting up first ceph cluster in lab.

Rocky 8.6

Ceph quincy

Using curl install method

Following cephadm deployment steps

 

Everything works as expected except 

ceph orch device ls --refresh

Only displays nvme devices and not the sata ssds on the ceph host.

 

Tried

sgdisk --zap-all /dev/sda

 

wipefs -a /dev/sda

 

Adding sata osd manually I get:

ceph orch daemon add osd ceph-a:data_devices=/dev/sda

Created no osd(s) on host ceph-a; already created?

 

nvme osd gets added without issue.

 

I have looked in the volume log on the node and monitor log on the admin
server and have not seen anything that seems like an obvious clue.

I can see commands running successfully against /dev/sda in the logs.

 

Ideas?

 

Thanks,

 

cb

 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph OSD imbalance and performance

2023-02-28 Thread Mark Nelson



On 2/28/23 13:11, Dave Ingram wrote:

On Tue, Feb 28, 2023 at 12:56 PM Reed Dier  wrote:


I think a few other things that could help would be `ceph osd df tree`
which will show the hierarchy across different crush domains.



Good idea: https://pastebin.com/y07TKt52


Yeah, it looks like OSD.147 has over 3x the amount of data on it vs some 
of the smaller HDD OSDs.  I bet it's getting hammered.  Are the drives 
different rotational speeds?  That's going to hurt too, especially if 
the bigger drives are slower and you aren't using flash for Journals/WALs.


You might want to look at the device queue wait times and see which 
drives are slow to service IOs.  I suspect it will be 147 leading the 
pack with the other 16TB drives following.  You never know though, 
sometimes you see an odd one that's slow but not showing smartctl errors 
yet.
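
A simple way to watch that from the OS side is iostat from sysstat (column names vary a bit between versions); the slow device usually stands out in the await/%util columns:

  iostat -x 5     # run on each OSD host, compare r_await/w_await and %util per device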


Mark






And if you’re doing something like erasure coded pools, or something other
than replication 3, maybe `ceph osd crush rule dump` may provide some
further context with the tree output.



No erasure coded pools - all replication.




Also, the cluster is running Luminous (12) which went EOL 3 years ago
tomorrow.
So there are also likely a good bit of improvements all around under the
hood to be gained by moving forward from Luminous.



Yes, nobody here wants to touch upgrading this at all - too terrified of
breaking things. This ceph deployment is serving several hundred VMs.

The general feeling is that we're stuck on luminous and that it's
destructive to upgrade to anything else. I refuse to believe that is true.
At least if we upgraded everything to 12.2.3 we'd have the 'balancer' stuff
that came with I think 12.2.2...

What would you recommend upgrading luminous to?



Though, I would say take care of the scrub errors prior to doing any major
upgrades, as well as checking your upgrade path (can only upgrade two
releases at a time, if you have filestore OSDs, etc).



Yeah, there seems to be a fear that attempting to repair those will
negatively impact performance even more. I disagree and think we should do
them immediately.

Also, there seems to be a belief that bluestore is an 'all-or-nothing'
proposition and that it's impossible to migrate from filestore to
bluestore. Yet I see that you can have a mixture of both in your
deployments and it's indeed possible to migrate from filestore to bluestore.

TL;DR -- there is a *lot* of fear of touching this thing because nobody is
truly an 'expert' in it atm. But not touching it is why we've gotten
ourselves into a situation with broken stuff and horrendous performance.

Thanks Reed!
-Dave




-Reed

On Feb 28, 2023, at 11:12 AM, Dave Ingram  wrote:

There is a
lot of variability in drive sizes - two different sets of admins added
disks sized between 6TB and 16TB and I suspect this and imbalanced
weighting is to blame.

CEPH OSD DF:

(not going to paste that all in here): https://pastebin.com/CNW5RKWx

What else am I missing in terms of what to share with you all?

Thanks all,
-Dave
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: s3 compatible interface

2023-02-28 Thread Fox, Kevin M
Minio no longer lets you read / write from the posix side. Only through minio 
itself. :(

Haven't found a replacement yet. If you do, please let me know.

Thanks,
Kevin


From: Robert Sander 
Sent: Tuesday, February 28, 2023 9:37 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: s3 compatible interface



On 28.02.23 16:31, Marc wrote:
>
> Anyone know of a s3 compatible interface that I can just run, and 
> reads/writes files from a local file system and not from object storage?

Have a look at Minio:

https://min.io/product/overview#architecture

Regards
--
Robert Sander
Heinlein Support GmbH
Linux: Akademie - Support - Hosting
http://www.heinlein-support.de/

Tel: 030-405051-43
Fax: 030-405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein  -- Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph OSD imbalance and performance

2023-02-28 Thread Dave Ingram
On Tue, Feb 28, 2023 at 12:56 PM Reed Dier  wrote:

> I think a few other things that could help would be `ceph osd df tree`
> which will show the hierarchy across different crush domains.
>

Good idea: https://pastebin.com/y07TKt52


> And if you’re doing something like erasure coded pools, or something other
> than replication 3, maybe `ceph osd crush rule dump` may provide some
> further context with the tree output.
>

No erasure coded pools - all replication.


>
> Also, the cluster is running Luminous (12) which went EOL 3 years ago
> tomorrow.
> So there are also likely a good bit of improvements all around under the
> hood to be gained by moving forward from Luminous.
>

Yes, nobody here wants to touch upgrading this at all - too terrified of
breaking things. This ceph deployment is serving several hundred VMs.

The general feeling is that we're stuck on luminous and that it's
destructive to upgrade to anything else. I refuse to believe that is true.
At least if we upgraded everything to 12.2.3 we'd have the 'balancer' stuff
that came with I think 12.2.2...

What would you recommend upgrading luminous to?


> Though, I would say take care of the scrub errors prior to doing any major
> upgrades, as well as checking your upgrade path (can only upgrade two
> releases at a time, if you have filestore OSDs, etc).
>

Yeah, there seems to be a fear that attempting to repair those will
negatively impact performance even more. I disagree and think we should do
them immediately.

Also, there seems to be a belief that bluestore is an 'all-or-nothing'
proposition and that it's impossible to migrate from filestore to
bluestore. Yet I see that you can have a mixture of both in your
deployments and it's indeed possible to migrate from filestore to bluestore.

TL;DR -- there is a *lot* of fear of touching this thing because nobody is
truly an 'expert' in it atm. But not touching it is why we've gotten
ourselves into a situation with broken stuff and horrendous performance.

Thanks Reed!
-Dave


>
> -Reed
>
> On Feb 28, 2023, at 11:12 AM, Dave Ingram  wrote:
>
> There is a
> lot of variability in drive sizes - two different sets of admins added
> disks sized between 6TB and 16TB and I suspect this and imbalanced
> weighting is to blame.
>
> CEPH OSD DF:
>
> (not going to paste that all in here): https://pastebin.com/CNW5RKWx
>
> What else am I missing in terms of what to share with you all?
>
> Thanks all,
> -Dave
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph OSD imbalance and performance

2023-02-28 Thread Dave Ingram
When I suggested this to the senior admin here I was told that was a bad
idea because it would negatively impact performance.

Is that true? I thought all that would do was accept the information from
the other 2 OSDs and the one with the errors would rebuild the record.

The underlying disks don't appear to have actual catastrophic errors based
on smartctl and other tools.
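
For what it's worth, you can inspect exactly what is inconsistent before repairing, which makes the impact easier to judge (this needs a reasonably recent scrub of the PG):

  rados list-inconsistent-obj 2.8a --format=json-pretty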

On Tue, Feb 28, 2023 at 12:21 PM Janne Johansson 
wrote:

> Den tis 28 feb. 2023 kl 18:13 skrev Dave Ingram :
> > There are also several
> > scrub errors. In short, it's a complete wreck.
> >
> > health: HEALTH_ERR
> > 3 scrub errors
> > Possible data damage: 3 pgs inconsistent
>
>
> > [root@ceph-admin davei]# ceph health detail
> > HEALTH_ERR 3 scrub errors; Possible data damage: 3 pgs inconsistent
> > OSD_SCRUB_ERRORS 3 scrub errors
> > PG_DAMAGED Possible data damage: 3 pgs inconsistent
> > pg 2.8a is active+clean+inconsistent, acting [13,152,127]
> > pg 2.ce is active+clean+inconsistent, acting [145,13,152]
> > pg 2.e8 is active+clean+inconsistent, acting [150,162,42]
>
> You can ask the cluster to repair those three,
> "ceph pg repair 2.8a"
> "ceph pg repair 2.ce"
> "ceph pg repair 2.e8"
>
> and they should start fixing themselves.
>
> --
> May the most significant bit of your life be positive.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph OSD imbalance and performance

2023-02-28 Thread Reed Dier
I think a few other things that could help would be `ceph osd df tree` which 
will show the hierarchy across different crush domains.
And if you’re doing something like erasure coded pools, or something other than 
replication 3, maybe `ceph osd crush rule dump` may provide some further 
context with the tree output.

Also, the cluster is running Luminous (12) which went EOL 3 years ago tomorrow.
So there are also likely a good bit of improvements all around under the hood 
to be gained by moving forward from Luminous.
Though, I would say take care of the scrub errors prior to doing any major 
upgrades, as well as checking your upgrade path (can only upgrade two releases 
at a time, if you have filestore OSDs, etc).

-Reed

> On Feb 28, 2023, at 11:12 AM, Dave Ingram  wrote:
> 
> There is a
> lot of variability in drive sizes - two different sets of admins added
> disks sized between 6TB and 16TB and I suspect this and imbalanced
> weighting is to blame.
> 
> CEPH OSD DF:
> 
> (not going to paste that all in here): https://pastebin.com/CNW5RKWx
> 
> What else am I missing in terms of what to share with you all?
> 
> Thanks all,
> -Dave
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph OSD imbalance and performance

2023-02-28 Thread Janne Johansson
Den tis 28 feb. 2023 kl 18:13 skrev Dave Ingram :
> There are also several
> scrub errors. In short, it's a complete wreck.
>
> health: HEALTH_ERR
> 3 scrub errors
> Possible data damage: 3 pgs inconsistent


> [root@ceph-admin davei]# ceph health detail
> HEALTH_ERR 3 scrub errors; Possible data damage: 3 pgs inconsistent
> OSD_SCRUB_ERRORS 3 scrub errors
> PG_DAMAGED Possible data damage: 3 pgs inconsistent
> pg 2.8a is active+clean+inconsistent, acting [13,152,127]
> pg 2.ce is active+clean+inconsistent, acting [145,13,152]
> pg 2.e8 is active+clean+inconsistent, acting [150,162,42]

You can ask the cluster to repair those three,
"ceph pg repair 2.8a"
"ceph pg repair 2.ce"
"ceph pg repair 2.e8"

and they should start fixing themselves.

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: s3 compatible interface

2023-02-28 Thread Jens Galsgaard
Also look at truenas core which has minio built in.

Venlig hilsen - Mit freundlichen Grüßen - Kind Regards,
Jens Galsgaard

Gitservice.dk 
+45 28864340

-Oprindelig meddelelse-
Fra: Robert Sander  
Sendt: 28. februar 2023 18:38
Til: ceph-users@ceph.io
Emne: [ceph-users] Re: s3 compatible interface

On 28.02.23 16:31, Marc wrote:
> 
> Anyone know of a s3 compatible interface that I can just run, and 
> reads/writes files from a local file system and not from object storage?

Have a look at Minio:

https://min.io/product/overview#architecture

Regards
--
Robert Sander
Heinlein Support GmbH
Linux: Akademie - Support - Hosting
http://www.heinlein-support.de

Tel: 030-405051-43
Fax: 030-405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein  -- Sitz: Berlin 
___
ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to 
ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Do not use SSDs with (small) SLC cache

2023-02-28 Thread Matthew Stroud
A bit late to the game, but I'm not sure if it is your drives. I had a very 
similar issue to yours on enterprise drives (not that means much outside of 
support).

What I was seeing is that a rebuild would kick off, PGs would instantly start
to become laggy, and then our clients (OpenStack RBD) would start getting hit by
slow requests, since the OSDs would be read-locked because of the
expiring lease. This was felt by the clients and was a production issue that
cost the company. I spent weeks trying to tune the mClock profile, including 
the overall io cap, rebuild cap, and recovery cap (clients were unlimited the 
whole time). None of it really worked, so I switched to wpq. With that one 
configuration switch, all the problems went away with no real impact to rebuild 
time.
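
For reference, the switch itself is tiny; a sketch (osd_op_queue is only read at OSD start, so the OSDs need a restart, ideally one failure domain at a time):

  ceph config set osd osd_op_queue wpq
  # then restart the OSDs, e.g. systemctl restart ceph-osd@<id> or the cephadm equivalent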

To clarify, I don't really care about fast rebuilds as long as the rebuild time
is reasonable, but the client impact was just killing us.

I also could never get this to trip unless it was during a recovery or rebuild. 
I could slam our cluster with 100k iops (random or sequential) from a bunch of 
different clients, which is about 50x our normal load (yeah, I know this 
cluster is massively over built in terms of performance), and there were zero 
issues.

In our use case we have flagged mClock as unstable. Since we want this to work,
because the concept is awesome, we will retest on 18.2.


From: Michael Wodniok 
Sent: Tuesday, February 21, 2023 12:53 AM
To: ceph-users@ceph.io 
Subject: [ceph-users] Do not use SSDs with (small) SLC cache

Hi all,

while digging around debugging why our (small: 10 hosts/~60 OSDs) cluster is so slow
even while recovering, I found out that one of our key issues is some SSDs with SLC
cache (in our case Samsung SSD 870 EVO) - which we had just recycled from other use
cases in the hope of speeding up our mainly HDD-based cluster. We know it's a
little bit random which objects get accelerated when the drives are not used as a cache.

However, the opposite was the case. These types of SSDs are only fast when
operating in their SLC cache, which is only several gigabytes in a multi-TB SSD
[1]. When doing a big write or a backfill onto these SSDs we got really low
IO rates (around 10 MB/s even with 4M objects).

But it got even worse. Disclaimer: this is my view as a user; maybe a more
technically involved person is able to correct me. The cause seems to be the
mclock scheduler, which measures the IOPS an OSD is able to do. As measured in
the blog [2], this is usually a good thing, as some profiling is done and
queuing is handled differently. But in our case the osd_mclock_max_capacity_iops_ssd
for most of the corresponding OSDs was very low - though not for all of them. I
assume it depends on when the mclock scheduler measured the IOPS capacity. That
led to broken scheduling where backfills ran at low speed while the SSD itself
showed nearly no disk usage, because it was operating in its cache again and
could work faster. The issue could be solved by switching back to the wpq
scheduler for the affected SSDs. That scheduler seems to just queue up IOs
without throttling at a measured maximum IOPS. Now we still see a bad IO
situation because of the slow SSDs, but at least they are operating at their
maximum (with typical settings like osd_recovery_max_active and
osd_recovery_sleep* tuned).
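
In case it helps others hitting this: the values mclock measured are persisted as per-OSD overrides in the config database, so they can be inspected and, if bogus, removed so they get re-measured on the next OSD start (a sketch, as of Quincy):

  ceph config dump | grep osd_mclock_max_capacity_iops
  ceph config rm osd.<id> osd_mclock_max_capacity_iops_ssd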

We are going to replace the SSDs with hopefully more consistently performing ones
(even if their peak performance is not as good).

I hope this may help somebody in the future who is stuck with low-performance
recoveries.

Refs:

[1] 
https://www.tomshardware.com/reviews/samsung-870-evo-sata-ssd-review-the-best-just-got-better
[2] 
https://ceph.io/en/news/blog/2022/mclock-vs-wpq-testing-with-background-ops-part1/

Happy Storing!
Michael Wodniok

--

Michael Wodniok M.Sc.
WorNet AG
Bürgermeister-Graf-Ring 28
82538 Geretsried

Simply42 und SecuMail sind Marken der WorNet AG.
http://www.wor.net/

Handelsregister Amtsgericht München (HRB 129882)
Vorstand: Christian Eich
Aufsichtsratsvorsitzender: Dirk Steinkopf




CONFIDENTIALITY NOTICE: This message is intended only for the use and review of 
the individual or entity to which it is addressed and may contain information 
that is privileged and confidential. If the reader of this message is not the 
intended recipient, or the employee or agent responsible for delivering the 
message solely to the intended recipient, you are hereby notified that any 
dissemination, distribution or copying of this communication is strictly 
prohibited. If you have received this communication in error, please notify 
sender immediately by telephone or return email. Thank you.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CephFS Kernel Mount Options Without Mount Helper

2023-02-28 Thread Shawn Weeks
Even the documentation at 
https://www.kernel.org/doc/html/v5.14/filesystems/ceph.html#mount-options is 
incomplete and doesn’t list options like “secret” and “mds_namespace”
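
For reference, a mount line that relies only on options the kernel module itself parses looks roughly like this (monitor address, client name, key and fs name are placeholders; secretfile= is the one that needs the mount.ceph helper, secret= is handled in-kernel):

  mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs \
    -o name=myclient,secret=AQD...==,mds_namespace=myfs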

Thanks
Shawn

> On Feb 28, 2023, at 11:03 AM, Shawn Weeks  wrote:
> 
> I’m trying to find documentation for which mount options are supported 
> directly by the kernel module. For example in the kernel module included in 
> Rocky Linux 8 and 9 the secretfile option isn’t supported even though the 
> documentation seems to imply it is. It seems like the documentation assumes 
> you’ll always be using the mount.ceph helper and I’m trying to find out what 
> options are supported if you don’t have mount.ceph helper.
> 
> Thanks
> Shawn
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: s3 compatible interface

2023-02-28 Thread Robert Sander

On 28.02.23 16:31, Marc wrote:


Anyone know of a s3 compatible interface that I can just run, and reads/writes 
files from a local file system and not from object storage?


Have a look at Minio:

https://min.io/product/overview#architecture

Regards
--
Robert Sander
Heinlein Support GmbH
Linux: Akademie - Support - Hosting
http://www.heinlein-support.de

Tel: 030-405051-43
Fax: 030-405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein  -- Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph OSD imbalance and performance

2023-02-28 Thread Dave Ingram
Hello,

Our ceph cluster performance has become horrifically slow over the past few
months.

Nobody here is terribly familiar with ceph and we're inheriting this
cluster without much direction.

Architecture: 40Gbps QDR IB fabric between all ceph nodes and our ovirt VM
hosts. 11 OSD nodes with a total of 163 OSDs. 14 pools, 3616 PGs, 1.19PB
total capacity.

Ceph versions:

{
  "mon": {
"ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee)
luminous (stable)": 3
  },
  "mgr": {
"ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee)
luminous (stable)": 3
  },
  "osd": {
"ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee)
luminous (stable)": 118,
"ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777)
luminous (stable)": 22,
"ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e)
luminous (stable)": 19
  },
  "mds": {},
  "overall": {
"ceph version 12.2.11 (26dc3775efc7bb286a1d6d66faee0ba30ea23eee)
luminous (stable)": 124,
"ceph version 12.2.12 (1436006594665279fe734b4c15d7e08c13ebd777)
luminous (stable)": 22,
"ceph version 12.2.13 (584a20eb0237c657dc0567da126be145106aa47e)
luminous (stable)": 19
  }
}

The majority of disks are spindles but there are also NVMe SSDs. There is a
lot of variability in drive sizes - two different sets of admins added
disks sized between 6TB and 16TB and I suspect this and imbalanced
weighting is to blame.

Performance on the ovirt VMs can dip as low as several *kilobytes*
per-second (!) on reads and a few MB/sec on writes. There are also several
scrub errors. In short, it's a complete wreck.

STATUS:

[root@ceph-admin davei]# ceph -s
  cluster:
id: 1b8d958c-e50b-40ef-a681-16cfeb9390b8
health: HEALTH_ERR
3 scrub errors
Possible data damage: 3 pgs inconsistent

  services:
mon: 3 daemons, quorum ceph1,ceph2,ceph3
mgr: ceph3(active), standbys: ceph2, ceph1
osd: 163 osds: 159 up, 158 in

  data:
pools:   14 pools, 3616 pgs
objects: 46.28M objects, 174TiB
usage:   527TiB used, 694TiB / 1.19PiB avail
pgs: 3609 active+clean
          4    active+clean+scrubbing+deep
          3    active+clean+inconsistent

  io:
client:   74.3MiB/s rd, 96.0MiB/s wr, 3.85kop/s rd, 3.68kop/s wr

---
HEALTH:

[root@ceph-admin davei]# ceph health detail
HEALTH_ERR 3 scrub errors; Possible data damage: 3 pgs inconsistent
OSD_SCRUB_ERRORS 3 scrub errors
PG_DAMAGED Possible data damage: 3 pgs inconsistent
pg 2.8a is active+clean+inconsistent, acting [13,152,127]
pg 2.ce is active+clean+inconsistent, acting [145,13,152]
pg 2.e8 is active+clean+inconsistent, acting [150,162,42]
---
CEPH OSD DF:

(not going to paste that all in here): https://pastebin.com/CNW5RKWx

What else am I missing in terms of what to share with you all?

Any advice on how we should 'reweight' these to get the performance to
improve?

Thanks all,
-Dave
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CephFS Kernel Mount Options Without Mount Helper

2023-02-28 Thread Shawn Weeks
I’m trying to find documentation for which mount options are supported directly 
by the kernel module. For example in the kernel module included in Rocky Linux 
8 and 9 the secretfile option isn’t supported even though the documentation 
seems to imply it is. It seems like the documentation assumes you’ll always be 
using the mount.ceph helper and I’m trying to find out what options are 
supported if you don’t have mount.ceph helper.

Thanks
Shawn
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] [RGW] Rebuilding a non master zone

2023-02-28 Thread Gilles Mocellin

Hi Cephers,

I have large OMAP objects on one of my clusters (certainly due to a big
bucket deletion, and things not completely purged).
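
(For context, those show up as the LARGE_OMAP_OBJECTS health warning, and the cluster log names each offending object, so something like the following lists them; the log path depends on the deployment:)

  zgrep -h 'Large omap object' /var/log/ceph/ceph.log*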


Since there is no tool to either reconstruct the index from data or purge
unused index entries, I thought I could use multisite replication.


As I am in a multisite configuration, and the cluster is not in the
master zone,
will all the data be recovered from the master zone if I stop radosgw,
delete the RGW index and data pools, and restart radosgw?

Or will it definitely not be so simple?

The data pool reports 8TB used.
Even if it works, it will take ages...

If someone has another idea to remove those large OMAP objects...
I've seen that question several times on the mailing list, but never saw
a response that worked or was adapted to my use case.


The rgw-orphan-list script could be a solution, but it takes too long to run on
my cluster.
And I would still have to know whether I can really delete the objects, and I have no
clue...

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mds readonly, mds all down

2023-02-28 Thread Eugen Block
It doesn't really help to create multiple threads for the same issue.
I don't see a reason why the MDS went read-only in your log output
from [1]. Could you please add the startup log from the MDS in debug
mode so we can actually see why it's going read-only?


[1]  
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/SPIEG6YVZ2KSZUY7SLAN2VHIWMPFVI73/
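
In case it's useful, a minimal sketch for capturing that startup log (revert the settings afterwards):

  ceph config set mds debug_mds 20
  ceph config set mds debug_ms 1
  # restart the MDS, collect its log, then:
  ceph config rm mds debug_mds
  ceph config rm mds debug_ms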


Zitat von kreept.s...@gmail.com:

Hello. We are trying to resolve an issue with Ceph. Our OpenShift
cluster is blocked and we have tried almost everything.

Actual state is:
MDS_ALL_DOWN: 1 filesystem is offline
MDS_DAMAGE: 1 mds daemon damaged
FS_DEGRADED: 1 filesystem is degraded
MON_DISK_LOW: mon be is low on available space
RECENT_CRASH: 1 daemons have recently crashed
We try to perform
cephfs-journal-tool --rank=gml-okd-cephfs:all event recover_dentries summary
cephfs-journal-tool --rank=gml-okd-cephfs:all journal reset
cephfs-table-tool gml-okd-cephfs:all reset session
ceph mds repaired 0
ceph config rm mds mds_verify_scatter
ceph config rm mds mds_debug_scatterstat
ceph tell gml-okd-cephfs scrub start / recursive repair force

After these commands, mds rises but an error appears:
MDS_READ_ONLY: 1 MDSs are read only

We also tried to create new fs with new metadata pool, delete and  
recreate old fs with same name with old\new metadatapool.
We got rid of the errors, but the Openshift cluster did not want to  
work with the old persistence volumes. The pods wrote an error that  
they could not find it, while it was present and moreover, this  
volume was associated with pvc.


Now we have rolled back the cluster and are trying to remove the mds  
error. Any ideas what to try?

Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] s3 compatible interface

2023-02-28 Thread Marc


Anyone know of a s3 compatible interface that I can just run, and reads/writes 
files from a local file system and not from object storage?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrade cephadm cluster

2023-02-28 Thread Nicola Mori
So I decided to proceed and everything went very well, with the cluster 
remaining up and running during the whole process.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph 16.2.10 - misplaced object after changing crush map only setting hdd class

2023-02-28 Thread xadhoom76
Hi to all, and thanks for sharing your experience on Ceph!
We have a simple setup with 9 OSDs, all HDD, across 3 nodes (3 OSDs per node).
We started the cluster with a default, easy bootstrap to test how it works
with HDDs. Then we decided to add SSDs and create a pool that uses only SSDs.
In order to have HDD-only pools and SSD-only pools, we edited the crush map to
add the hdd class.
We have not entered anything about the SSDs yet - no disks or rules - we only
added the device class to the existing rules.
So here are the rules before introducing class hdd:
# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default
step chooseleaf firstn 0 type host
step emit
}
rule erasure-code {
id 1
type erasure
min_size 3
max_size 4
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}
rule erasure2_1 {
id 2
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}
rule erasure-pool.meta {
id 3
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}
rule erasure-pool.data {
id 4
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default
step chooseleaf indep 0 type host
step emit
}

And here are the rules after:

# rules
rule replicated_rule {
id 0
type replicated
min_size 1
max_size 10
step take default class hdd
step chooseleaf firstn 0 type host
step emit
}
rule erasure-code {
id 1
type erasure
min_size 3
max_size 4
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
rule erasure2_1 {
id 2
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
rule erasure-pool.meta {
id 3
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
rule erasure-pool.data {
id 4
type erasure
min_size 3
max_size 3
step set_chooseleaf_tries 5
step set_choose_tries 100
step take default class hdd
step chooseleaf indep 0 type host
step emit
}
Just doing this triggered misplacement of all the PGs bound to the EC pools.

Is that correct? And why?
Best regards 
Alessandro Bolgia
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] mds readonly, mds all down

2023-02-28 Thread kreept . sama
Hello. We are trying to resolve an issue with Ceph. Our OpenShift cluster is
blocked and we have tried almost everything.
Actual state is:
MDS_ALL_DOWN: 1 filesystem is offline
MDS_DAMAGE: 1 mds daemon damaged
FS_DEGRADED: 1 filesystem is degraded
MON_DISK_LOW: mon be is low on available space
RECENT_CRASH: 1 daemons have recently crashed
We try to perform 
cephfs-journal-tool --rank=gml-okd-cephfs:all event recover_dentries summary
cephfs-journal-tool --rank=gml-okd-cephfs:all journal reset
cephfs-table-tool gml-okd-cephfs:all reset session
ceph mds repaired 0
ceph config rm mds mds_verify_scatter
ceph config rm mds mds_debug_scatterstat
ceph tell gml-okd-cephfs scrub start / recursive repair force

After these commands, mds rises but an error appears:
MDS_READ_ONLY: 1 MDSs are read only

We also tried to create a new fs with a new metadata pool, and to delete and
recreate the old fs with the same name using the old/new metadata pool.
We got rid of the errors, but the OpenShift cluster did not want to work with
the old persistent volumes. The pods reported an error that the volume could not
be found, although it was present and, moreover, still associated with its PVC.

Now we have rolled back the cluster and are trying to remove the mds error. Any 
ideas what to try?
Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CompleteMultipartUploadResult has empty ETag response

2023-02-28 Thread Lars Dunemark
Hi,

When I was looking at a CompleteMultipartUpload request I found that the
response returned an empty ETag entry in the XML.
If I query the key's metadata after the complete is done, it returns the
expected ETag, so it looks like it is calculated correctly.


<CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Location>testsystem/test/upload-file</Location>
  <Bucket>test</Bucket>
  <Key>upload-file</Key>
  <ETag></ETag>
</CompleteMultipartUploadResult>

I only have a cluster running on v17.2.3, so I haven't verified if this still
exists on the latest version.
I found an really old issue that was closed ~9 years ago with the same issue.
https://tracker.ceph.com/issues/6830

The problem is that my account on the tracker doesn't seem to work as it
should, so I can't log in and comment on it or create a new ticket.

It also looks like there is an inconsistency:
https://docs.ceph.com/en/latest/radosgw/s3/objectops/#complete-multipart-upload
says that ETag is required in the request, but e.g. the AWS documentation
doesn't list it as a possible argument for CompleteMultipartUpload, so it is
not possible to send it using common third-party libraries.

Best regards,
Lars Dunemark
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to see bucket usage when user is suspended ?

2023-02-28 Thread Arnaud M
Hello to everyone

When I use this command to see bucket usage

radosgw-admin bucket stats --bucket=

It works only when the owner of the bucket is active.

How can I see the usage even when the owner is suspended?

Here are 2 examples, one with the owner active and the other one with the owner
suspended:

radosgw-admin bucket stats --bucket=bonjour
{
"bucket": "bonjour",
"num_shards": 11,
"tenant": "",
"zonegroup": "46d4ba06-76ff-44b4-a441-54197517ded2",
"placement_rule": "default-placement",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
},
"id": "f8c2e3e2-da22-4c80-b330-466db13bbf6a.204114.85",
"marker": "f8c2e3e2-da22-4c80-b330-466db13bbf6a.204114.85",
"index_type": "Normal",
"owner": "identifiant_leviia_GB6mSIAmTt48cY5O",
"ver":
"0#148,1#124,2#134,3#155,4#199,5#123,6#165,7#141,8#133,9#154,10#137",
"master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0",
"mtime": "0.00",
"creation_time": "2023-02-24T16:16:14.196314Z",
"max_marker": "0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#",
"usage": {
"rgw.main": {
"size": 532572233,
"size_actual": 535318528,
"size_utilized": 532572233,
"size_kb": 520091,
"size_kb_actual": 522772,
"size_kb_utilized": 520091,
"num_objects": 1486
},
"rgw.multimeta": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 0
}
},
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
}
}

radosgw-admin bucket stats --bucket=locking4
{
"bucket": "locking4",
"num_shards": 11,
"tenant": "",
"zonegroup": "46d4ba06-76ff-44b4-a441-54197517ded2",
"placement_rule": "default-placement",
"explicit_placement": {
"data_pool": "",
"data_extra_pool": "",
"index_pool": ""
},
"id": "f8c2e3e2-da22-4c80-b330-466db13bbf6a.204114.80",
"marker": "f8c2e3e2-da22-4c80-b330-466db13bbf6a.204114.80",
"index_type": "Normal",
"owner": "identifiant_leviia_xf4q139fq1",
"ver": "0#1,1#1,2#1,3#1,4#1,5#1,6#1,7#1,8#1,9#1,10#1",
"master_ver": "0#0,1#0,2#0,3#0,4#0,5#0,6#0,7#0,8#0,9#0,10#0",
"mtime": "0.00",
"creation_time": "2023-02-23T12:49:24.089538Z",
"max_marker": "0#,1#,2#,3#,4#,5#,6#,7#,8#,9#,10#",
"usage": {},
"bucket_quota": {
"enabled": false,
"check_on_raw": false,
"max_size": -1,
"max_size_kb": 0,
"max_objects": -1
}
}

As you can see, the bucket whose owner is suspended (locking4)
lacks this part:

"usage": {
"rgw.main": {
"size": 532572233,
"size_actual": 535318528,
"size_utilized": 532572233,
"size_kb": 520091,
"size_kb_actual": 522772,
"size_kb_utilized": 520091,
"num_objects": 1486
},
"rgw.multimeta": {
"size": 0,
"size_actual": 0,
"size_utilized": 0,
"size_kb": 0,
"size_kb_actual": 0,
"size_kb_utilized": 0,
"num_objects": 0
}
},

How can I get this part even when the owner is suspended? Is it possible via the
API?

All the best
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Any experience dealing with CephMgrPrometheusModuleInactive?

2023-02-28 Thread Joshua Katz
Hey all!

I'm a first time ceph user trying to learn how to set up a cluster. I've
gotten a basic cluster created using the following:

```
cephadm bootstrap --mon-ip 
ceph orch host add server-2  _admin
```
I've created and mounted an fs on a host, everything is going well, but I
have noticed that I have an alert
triggered: CephMgrPrometheusModuleInactive.

It seems this alert is trying to `curl server-2:9283`. To debug if this was
a network issue I did `ceph mgr fail` to move the mgr to server-2. After
some time I get the same alert with the instance being server-1:9283.
Running `ss -l -n -p | grep 9283` shows the port is bound on server-2 and
not server-1. If I run `ceph mgr fail` again the port becomes bound on
server-1 and not server-2.
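
For reference, the active mgr reports where it thinks the prometheus module is listening, which is handy to compare against what the alert rule is scraping:

  ceph mgr services
  ceph config get mgr mgr/prometheus/server_addr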

Is this alert important? Is there a way to remediate this issue? Let me
know if I am missing something here.

Thanks,
- Josh
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Daily failed capability releases, slow ops, fully stuck IO

2023-02-28 Thread Kuhring, Mathias
Dear Ceph community,

for about two or three weeks now, we have had CephFS clients regularly failing
to respond to capability releases, accompanied by OSD slow ops. By now, this
happens daily, every time clients get more active (e.g. during nightly
backups).

We mostly observe it with a handful of highly active clients, so it
correlates with IO volume. But we have over 250 clients which mount the
CephFS, and we plan to get them all more active soon. What worries me
further is that it doesn't seem to affect only the clients which fail to
respond to the capability release: other clients also just get stuck
accessing data on the CephFS.

So far I've been tracking down the corresponding OSDs via the client
(`cat /sys/kernel/debug/ceph/*/osdc`) and restarting them one by one. But
since this is now a regular/systemic issue, that is obviously not a
sustainable solution. It is usually a handful of OSDs per client,
and I couldn't observe any particular pattern among the involved OSDs yet.
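
(For anyone following along, the OSD-side counterpart of that client osdc check is the admin socket, e.g.:)

  ceph daemon osd.<id> dump_ops_in_flight
  ceph daemon osd.<id> dump_historic_slow_ops
  # on recent releases `ceph tell osd.<id> dump_ops_in_flight` works remotely as well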

Our cluster still runs on CentOS 7 with kernel 
3.10.0-1160.42.2.el7.x86_64 using cephadm with ceph version 17.2.1 
(ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy (stable).

Most active clients are currently on kernel versions such as:
4.18.0-348.el8.0.2.x86_64, 4.18.0-348.2.1.el8_5.x86_64, 
4.18.0-348.7.1.el8_5.x86_64

I picked up some logging ideas from an older issue with similar symptoms:
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/CKTIM6LF274RVHSSCSDNCQR35PYSTLEK/
But this has been already fixed in the kernel client and I don't have 
similar things in the logs.

But I'm also not sure whether the things I'm digging up in the logs are
actually useful, or whether I'm looking in the right places.

So, I enabled "debug_ms 1" for the OSDs as suggested in the other thread.
But this filled up our host disks pretty fast, leading to e.g. monitors 
crashing.
I disabled the debug messages again and trimmed logs to free up space.
But I made copies of two OSD logs files which were involved in a 
capability release / slow requests issue.
They are quite big now (~3GB) and even if I remove things like ping stuff,
I have more than 1 million lines just for the morning until the disk 
space was full (around 7 hours).
So now I'm wondering how to filter/look for the right things here.

When I grep for "error", I get a few of these messages:
{"log":"debug 2023-02-22T06:18:08.113+ 7f15c5fff700  1 -- 
[v2:192.168.1.13:6881/4149819408,v1:192.168.1.13:6884/4149819408] 
\u003c== osd.161 v2:192.168.1.31:6835/1012436344 182573  
pg_update_log_missing(3.1a6s2 epoch 646235/644895 rep_tid 1014320 
entries 646235'7672108 (0'0) error 
3:65836dde:::10016e9b7c8.:head by mds.0.1221974:8515830 0.00 
-2 ObjectCleanRegions clean_offsets: [0~18446744073709551615], 
clean_omap: 1, new_object: 0 trim_to 646178'7662340 roll_forward_to 
646192'7672106) v3  261+0+0 (crc 0 0 0) 0x562d55e52380 con 
0x562d8a2de400\n","stream":"stderr","time":"2023-02-22T06:18:08.115002765Z"}

And if I grep for "failed", I get a couple of those:
{"log":"debug 2023-02-22T06:15:25.242+ 7f58bbf7c700  1 -- 
[v2:172.16.62.11:6829/3509070161,v1:172.16.62.11:6832/3509070161] 
\u003e\u003e 172.16.62.10:0/3127362489 conn(0x55ba06bf3c00 
msgr2=0x55b9ce07e580 crc :-1 s=STATE_CONNECTION_ESTABLISHED 
l=1).read_until read 
failed\n","stream":"stderr","time":"2023-02-22T06:15:25.243808392Z"}
{"log":"debug 2023-02-22T06:15:25.242+ 7f58bbf7c700  1 --2- 
[v2:172.16.62.11:6829/3509070161,v1:172.16.62.11:6832/3509070161] 
\u003e\u003e 172.16.62.10:0/3127362489 conn(0x55ba06bf3c00 
0x55b9ce07e580 crc :-1 s=READY pgs=2096664 cs=0 l=1 rev1=1 crypto rx=0 
tx=0 comp rx=0 tx=0).handle_read_frame_preamble_main read frame preamble 
failed r=-1 ((1) Operation not 
permitted)\n","stream":"stderr","time":"2023-02-22T06:15:25.243813528Z"}

Not sure if they are related to the issue.

In the kernel logs of the client (dmesg, journalctl or /var/log/messages),
there seem to be no errors or any stack traces in the relevant time periods.
The only thing I can see is me restarting the relevant OSDs:
[Mi Feb 22 07:29:59 2023] libceph: osd90 down
[Mi Feb 22 07:30:34 2023] libceph: osd90 up
[Mi Feb 22 07:31:55 2023] libceph: osd93 down
[Mi Feb 22 08:37:50 2023] libceph: osd93 up

I noticed a socket closed for another client, but I assume that's more 
related to monitors failing due to full disks:
[Mi Feb 22 05:59:52 2023] libceph: mon2 (1)172.16.62.12:6789 socket 
closed (con state OPEN)
[Mi Feb 22 05:59:52 2023] libceph: mon2 (1)172.16.62.12:6789 session 
lost, hunting for new mon
[Mi Feb 22 05:59:52 2023] libceph: mon3 (1)172.16.62.13:6789 session 
established

I would appreciate it if anybody has a suggestion on where I should look next.
Thank you for your help.

Best Wishes,
Mathias
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [ext] Re: Re: kernel client osdc ops stuck and mds slow reqs

2023-02-28 Thread Kuhring, Mathias
Hey Ilya,

I'm not sure whether the things I find in the logs are actually related or
useful, and I'm not really sure whether I'm looking in the right places.

I enabled "debug_ms 1" for the OSDs as suggested above.
But this filled up our host disks pretty fast, leading to e.g. monitors crashing.
I disabled the debug messages again and trimmed logs to free up space.
I did keep copies of the log files of two OSDs which were involved in another
capability release / slow requests issue.
They are quite big now (~3GB), and even after removing things like the ping
traffic I still have more than 1 million lines just for the morning until the
disk space was full (around 7 hours).
So now I'm wondering how to filter for / look at the right things here.
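
Or would it make more sense to query the implicated OSDs directly instead of
digging through the raw logs? I.e. something along these lines (the OSD ID is
just an example):

  ceph tell osd.90 dump_ops_in_flight       # ops currently queued/blocked on that OSD
  ceph tell osd.90 dump_historic_slow_ops   # recent ops that exceeded the slow-op threshold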

When I grep for "error", I get a few of these messages:
{"log":"debug 2023-02-22T06:18:08.113+ 7f15c5fff700  1 -- 
[v2:192.168.1.13:6881/4149819408,v1:192.168.1.13:6884/4149819408] \u003c== 
osd.161 v2:192.168.1.31:6835/1012436344 182573  
pg_update_log_missing(3.1a6s2 epoch 646235/644895 rep_tid 1014320 entries 
646235'7672108 (0'0) error3:65836dde:::10016e9b7c8.:head by 
mds.0.1221974:8515830 0.00 -2 ObjectCleanRegions clean_offsets: 
[0~18446744073709551615], clean_omap: 1, new_object: 0 trim_to 646178'7662340 
roll_forward_to 646192'7672106) v3  261+0+0 (crc 0 0 0) 0x562d55e52380 con 
0x562d8a2de400\n","stream":"stderr","time":"2023-02-22T06:18:08.115002765Z"}

And if I grep for "failed", I get a couple of those:
{"log":"debug 2023-02-22T06:15:25.242+ 7f58bbf7c700  1 -- 
[v2:172.16.62.11:6829/3509070161,v1:172.16.62.11:6832/3509070161] \u003e\u003e 
172.16.62.10:0/3127362489 conn(0x55ba06bf3c00 msgr2=0x55b9ce07e580 crc :-1 
s=STATE_CONNECTION_ESTABLISHED l=1).read_until read 
failed\n","stream":"stderr","time":"2023-02-22T06:15:25.243808392Z"}
{"log":"debug 2023-02-22T06:15:25.242+ 7f58bbf7c700  1 --2- 
[v2:172.16.62.11:6829/3509070161,v1:172.16.62.11:6832/3509070161] \u003e\u003e 
172.16.62.10:0/3127362489 conn(0x55ba06bf3c00 0x55b9ce07e580 crc :-1 s=READY 
pgs=2096664 cs=0 l=1 rev1=1 crypto rx=0 tx=0 comp rx=0 
tx=0).handle_read_frame_preamble_main read frame preamble failed r=-1 ((1) 
Operation not 
permitted)\n","stream":"stderr","time":"2023-02-22T06:15:25.243813528Z"}

Not sure if they are related to the issue.

In the kernel logs of the client (dmesg, journalctl or /var/log/messages),
there seem to be no errors or any stack traces in the relevant time periods.
The only thing I can see is our restart of the relevant OSDs:
[Mi Feb 22 07:29:59 2023] libceph: osd90 down
[Mi Feb 22 07:30:34 2023] libceph: osd90 up
[Mi Feb 22 07:31:55 2023] libceph: osd93 down
[Mi Feb 22 08:37:50 2023] libceph: osd93 up

I noticed a socket closed for another client, but I assume that's more related 
to monitors failing due to full disks:
[Mi Feb 22 05:59:52 2023] libceph: mon2 (1)172.16.62.12:6789 socket closed (con 
state OPEN)
[Mi Feb 22 05:59:52 2023] libceph: mon2 (1)172.16.62.12:6789 session lost, 
hunting for new mon
[Mi Feb 22 05:59:52 2023] libceph: mon3 (1)172.16.62.13:6789 session established

Best, Mathias

On 2/21/2023 11:42 AM, Ilya Dryomov wrote:

On Tue, Feb 21, 2023 at 1:01 AM Xiubo Li 
 wrote:




On 20/02/2023 22:28, Kuhring, Mathias wrote:


Hey Dan, hey Ilya

I know this issue is two years old already, but we are having similar
issues.

Do you know, if the fixes got ever backported to RHEL kernels?



It's already backported to RHEL 8 long time ago since kernel-4.18.0-154.el8.



Not looking for el7 but rather el8 fixes.
Wondering if the patches were backported and we shouldn't actually see
these issues.
Or if you could maybe resolve them with a kernel upgrade.

Most active clients are currently on kernel versions such as:
4.18.0-348.el8.0.2.x86_64, 4.18.0-348.2.1.el8_5.x86_64,
4.18.0-348.7.1.el8_5.x86_64

While the cluster runs with kernel 3.10.0-1160.42.2.el7.x86_64 and
cephadm with
ceph version 17.2.1 (ec95624474b1871a821a912b8c3af68f8f8e7aa1) quincy
(stable).



It seems not backported to el7 yet.



"Yet" might be misleading here -- I don't think there is/was ever
a plan to backport these fixes to RHEL 7.



Not sure, if the cluster kernel is actually relevant here for OSD <>
kernel client connection.



If you are seeing page allocation failures only on the kernel client
nodes, then it's not relevant.

Unless the stack trace is the same as in the original tracker [1] or
Dan's paste [2] (note ceph_osdmap_decode() -> osdmap_set_max_osd() ->
krealloc() sequence), you are hitting a different issue.  Pasting the
entire splat(s) from the kernel log would be a good start.
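
For example, something along these lines should capture the whole thing
(adjust the amount of context as needed):

  dmesg -T | grep -B 2 -A 40 'page allocation failure'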

[1] https://tracker.ceph.com/issues/40481
[2] https://pastebin.com/neyah54k

Thanks,

Ilya


--
Mathias Kuhring

Dr. rer. nat.
Bioinformatician
HPC & Core Unit Bioinformatics
Berlin Institute of Health at Charité (BIH)

E-Mail: mathias.kuhr...@bih-charite.de
Mobile: +49 172 3475576

[ceph-users] slow replication of large buckets

2023-02-28 Thread Glaza
Hi Cephers,

We have two Octopus 15.2.17 clusters in a multisite configuration. Every once
in a while we have to perform a bucket reshard (most recently to 613 shards),
and this practically kills our replication for a few days.

Does anyone know of any priority mechanics within sync that would let us give
priority to certain buckets and/or lower the priority of others?

Are there any improvements to this in higher versions of Ceph that we could
take advantage of if we upgrade the cluster? (I haven't found any.)

How do we safely increase rgw_data_log_num_shards? The documentation only says:
"The values of rgw_data_log_num_shards and rgw_md_log_max_shards should not be
changed after sync has started." Does this mean that I should block access to
the cluster, wait until sync has caught up with the source/master, change this
value, restart the RGWs and unblock access?

Kind Regards,
Tom
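
P.S. For reference, the operations I am talking about are roughly these (the
bucket name is just an example, the shard count is the one mentioned above):

  radosgw-admin bucket reshard --bucket=big-bucket --num-shards=613   # the manual reshard
  radosgw-admin sync status                                           # overall multisite sync lag
  radosgw-admin bucket sync status --bucket=big-bucket                # per-bucket replication progress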

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-02-28 Thread Josh Baergen
Hi Boris,

OK, what I'm wondering is whether
https://tracker.ceph.com/issues/58530 is involved. There are two
aspects to that ticket:
* A measurable increase in the number of bytes written to disk in
Pacific as compared to Nautilus
* The same, but for IOPS

Per the current theory, both are due to the loss of rocksdb log
recycling when using default recovery options in rocksdb 6.8; Octopus
uses version 6.1.2, Pacific uses 6.8.1.

16.2.11 largely addressed the bytes-written amplification, but the
IOPS amplification remains. In practice, whether this results in a
write performance degradation depends on the speed of the underlying
media and the workload, and thus the things I mention in the next
paragraph may or may not be applicable to you.

There's no known workaround or solution for this at this time. In some
cases I've seen that disabling bluefs_buffered_io (which itself can
cause IOPS amplification in some cases) can help; I think most folks
do this by setting it in local conf and then restarting OSDs in order
to gain the config change. Something else to consider is
https://docs.ceph.com/en/quincy/start/hardware-recommendations/#write-caches,
as sometimes disabling these write caches can improve the IOPS
performance of SSDs.
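
To be clear about the first suggestion, a minimal sketch of the "local conf"
approach (the option name is the real one, everything else depends on your
environment):

  # ceph.conf on the OSD hosts, then restart the OSDs to pick it up
  [osd]
  bluefs_buffered_io = false

For the write caches, that usually means something along the lines of
`hdparm -W 0 /dev/sdX` for SATA or `sdparm --set WCE=0 /dev/sdX` for SAS, as
described on the page above.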

Josh

On Tue, Feb 28, 2023 at 7:19 AM Boris Behrens  wrote:
>
> Hi Josh,
> we upgraded 15.2.17 -> 16.2.11 and we only use rbd workload.
>
>
>
> Am Di., 28. Feb. 2023 um 15:00 Uhr schrieb Josh Baergen 
> :
>>
>> Hi Boris,
>>
>> Which version did you upgrade from and to, specifically? And what
>> workload are you running (RBD, etc.)?
>>
>> Josh
>>
>> On Tue, Feb 28, 2023 at 6:51 AM Boris Behrens  wrote:
>> >
>> > Hi,
>> > today I did the first update from octopus to pacific, and it looks like the
>> > avg apply latency went up from 1ms to 2ms.
>> >
>> > All 36 OSDs are 4TB SSDs and nothing else changed.
>> > Someone knows if this is an issue, or am I just missing a config value?
>> >
>> > Cheers
>> >  Boris
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> --
> Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im 
> großen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-02-28 Thread Boris Behrens
Hi Josh,
we upgraded 15.2.17 -> 16.2.11 and we only use rbd workload.



Am Di., 28. Feb. 2023 um 15:00 Uhr schrieb Josh Baergen <
jbaer...@digitalocean.com>:

> Hi Boris,
>
> Which version did you upgrade from and to, specifically? And what
> workload are you running (RBD, etc.)?
>
> Josh
>
> On Tue, Feb 28, 2023 at 6:51 AM Boris Behrens  wrote:
> >
> > Hi,
> > today I did the first update from octopus to pacific, and it looks like
> the
> > avg apply latency went up from 1ms to 2ms.
> >
> > All 36 OSDs are 4TB SSDs and nothing else changed.
> > Someone knows if this is an issue, or am I just missing a config value?
> >
> > Cheers
> >  Boris
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Die Selbsthilfegruppe "UTF-8-Probleme" trifft sich diesmal abweichend im
großen Saal.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-02-28 Thread Josh Baergen
Hi Boris,

Which version did you upgrade from and to, specifically? And what
workload are you running (RBD, etc.)?

Josh

On Tue, Feb 28, 2023 at 6:51 AM Boris Behrens  wrote:
>
> Hi,
> today I did the first update from octopus to pacific, and it looks like the
> avg apply latency went up from 1ms to 2ms.
>
> All 36 OSDs are 4TB SSDs and nothing else changed.
> Someone knows if this is an issue, or am I just missing a config value?
>
> Cheers
>  Boris
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD upgrade problem nautilus->octopus - snap_mapper upgrade stuck

2023-02-28 Thread Jan Pekař - Imatic

Hi,

the same on my side - destroyed and replaced by bluestore.

JP

On 28/02/2023 14.17, Mark Schouten wrote:

Hi,

I just destroyed the filestore osd and added it as a bluestore osd. Worked fine.

—
Mark Schouten, CTO
Tuxis B.V.
m...@tuxis.nl / +31 318 200208


-- Original Message --
From "Jan Pekař - Imatic" 
To m...@tuxis.nl; ceph-users@ceph.io
Date 2/25/2023 4:14:54 PM
Subject Re: [ceph-users] OSD upgrade problem nautilus->octopus - snap_mapper 
upgrade stuck


Hi,

I tried upgrade to Pacific now. The same result. OSD is not starting, stuck at 
1500 keys.

JP

On 23/02/2023 00.16, Jan Pekař - Imatic wrote:

Hi,

I enabled debug and the same - 1500 keys is where it ends.. I also enabled 
debug_filestore and ...

2023-02-23T00:02:34.876+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7920 already registered
2023-02-23T00:02:34.876+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7920 seq 148188829 osr(meta) 
4859 bytes   (queue has 49 ops and 238167 bytes)

2023-02-23T00:02:34.876+0100 7f8efc23ee00 10 snap_mapper.convert_legacy 
converted 1470 keys
2023-02-23T00:02:34.880+0100 7f8efc23ee00  5 
filestore(/var/lib/ceph/osd/ceph-0) queue_transactions(2303): osr 
0x55fb27780540 osr(meta)
2023-02-23T00:02:34.880+0100 7f8efc23ee00  5 filestore(/var/lib/ceph/osd/ceph-0) queue_transactions(2345): (writeahead) 148188845 
[Transaction(0x55fb299870e0)]
2023-02-23T00:02:34.880+0100 7f8efc23ee00 20 filestore.osr(0x55fb27780540) _register_apply 0x55fb29c52d20 #-1:c0371625:::snapmapper:0# 
(0x55fb29f9c9e0)

2023-02-23T00:02:34.880+0100 7f8efc23ee00 10 snap_mapper.convert_legacy 
converted 1500 keys
2023-02-23T00:02:34.880+0100 7f8efc23ee00  5 
filestore(/var/lib/ceph/osd/ceph-0) queue_transactions(2303): osr 
0x55fb27780540 osr(meta)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7aa0 seq 148188830 
osr(meta) [Transaction(0x55fb29986000)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7aa0 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7aa0 seq 148188830 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7c20 seq 148188831 
osr(meta) [Transaction(0x55fb29986120)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7c20 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7c20 seq 148188831 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7d40 seq 148188832 
osr(meta) [Transaction(0x55fb29986240)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7d40 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7d40 seq 148188832 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7ec0 seq 148188833 
osr(meta) [Transaction(0x55fb29986360)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7ec0 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7ec0 seq 148188833 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb2921db60 seq 148188834 
osr(meta) [Transaction(0x55fb29986480)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb2921db60 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb2921db60 seq 148188834 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb2921c8a0 seq 148188835 
osr(meta) [Transaction(0x55fb299865a0)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb2921c8a0 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb2921c8a0 seq 148188835 osr(meta) 
4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb29c52000 seq 148188836 
osr(meta) [Transaction(0x55fb299866c0)]

2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 

[ceph-users] avg apply latency went up after update from octopus to pacific

2023-02-28 Thread Boris Behrens
Hi,
today I did the first update from octopus to pacific, and it looks like the
avg apply latency went up from 1ms to 2ms.

All 36 OSDs are 4TB SSDs and nothing else changed.
Someone knows if this is an issue, or am I just missing a config value?
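
For reference, the numbers I am looking at are the per-OSD values along the
lines of:

  ceph osd perf    # prints commit_latency(ms) / apply_latency(ms) per OSD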

Cheers
 Boris
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: CompleteMultipartUploadResult has empty ETag response

2023-02-28 Thread Casey Bodley
On Tue, Feb 28, 2023 at 8:19 AM Lars Dunemark  wrote:
>
> Hi,
>
> I notice that CompleteMultipartUploadResult does return an empty ETag
> field when completing an multipart upload in v17.2.3.
>
> I haven't had the possibility to verify from which version this changed
> and can't find in the changelog that it should be fixed in newer version.
>
> The response looks like:
>
> <CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
>  <Location>s3.myceph.com/test-bucket/test.file</Location>
>  <Bucket>test-bucket</Bucket>
>  <Key>test.file</Key>
>  <ETag></ETag>
> </CompleteMultipartUploadResult>
>
> I have found a old issue that is closed around 9 years ago with the same
> issue so I guess that this has been fixed before.
> https://tracker.ceph.com/issues/6830 
>
> It looks like my account to the tracker is still not activated so I
> can't create or comment on the issue.

thanks Lars, i've opened https://tracker.ceph.com/issues/58879 to
track the regression

>
> Best regards,
> Lars Dunemark
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD upgrade problem nautilus->octopus - snap_mapper upgrade stuck

2023-02-28 Thread Mark Schouten

Hi,

I just destroyed the filestore osd and added it as a bluestore osd. 
Worked fine.


—
Mark Schouten, CTO
Tuxis B.V.
m...@tuxis.nl / +31 318 200208


-- Original Message --
From "Jan Pekař - Imatic" 
To m...@tuxis.nl; ceph-users@ceph.io
Date 2/25/2023 4:14:54 PM
Subject Re: [ceph-users] OSD upgrade problem nautilus->octopus - 
snap_mapper upgrade stuck



Hi,

I tried upgrade to Pacific now. The same result. OSD is not starting, stuck at 
1500 keys.

JP

On 23/02/2023 00.16, Jan Pekař - Imatic wrote:

Hi,

I enabled debug and the same - 1500 keys is where it ends.. I also enabled 
debug_filestore and ...

2023-02-23T00:02:34.876+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7920 already registered
2023-02-23T00:02:34.876+0100 7f8ef26d1700  5 
filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7920 seq 
148188829 osr(meta) 4859 bytes   (queue has 49 ops and 238167 bytes)
2023-02-23T00:02:34.876+0100 7f8efc23ee00 10 snap_mapper.convert_legacy 
converted 1470 keys
2023-02-23T00:02:34.880+0100 7f8efc23ee00  5 
filestore(/var/lib/ceph/osd/ceph-0) queue_transactions(2303): osr 
0x55fb27780540 osr(meta)
2023-02-23T00:02:34.880+0100 7f8efc23ee00  5 
filestore(/var/lib/ceph/osd/ceph-0) queue_transactions(2345): (writeahead) 
148188845 [Transaction(0x55fb299870e0)]
2023-02-23T00:02:34.880+0100 7f8efc23ee00 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb29c52d20 #-1:c0371625:::snapmapper:0# (0x55fb29f9c9e0)
2023-02-23T00:02:34.880+0100 7f8efc23ee00 10 snap_mapper.convert_legacy 
converted 1500 keys
2023-02-23T00:02:34.880+0100 7f8efc23ee00  5 
filestore(/var/lib/ceph/osd/ceph-0) queue_transactions(2303): osr 
0x55fb27780540 osr(meta)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 
filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7aa0 seq 
148188830 osr(meta) [Transaction(0x55fb29986000)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7aa0 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 
filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7aa0 seq 
148188830 osr(meta) 4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 
filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7c20 seq 
148188831 osr(meta) [Transaction(0x55fb29986120)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7c20 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 
filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7c20 seq 
148188831 osr(meta) 4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 
filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7d40 seq 
148188832 osr(meta) [Transaction(0x55fb29986240)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7d40 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 
filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7d40 seq 
148188832 osr(meta) 4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 
filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb297e7ec0 seq 
148188833 osr(meta) [Transaction(0x55fb29986360)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb297e7ec0 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 
filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb297e7ec0 seq 
148188833 osr(meta) 4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 
filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb2921db60 seq 
148188834 osr(meta) [Transaction(0x55fb29986480)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb2921db60 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 
filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb2921db60 seq 
148188834 osr(meta) 4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 
filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb2921c8a0 seq 
148188835 osr(meta) [Transaction(0x55fb299865a0)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb2921c8a0 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 
filestore(/var/lib/ceph/osd/ceph-0) queue_op(2181): 0x55fb2921c8a0 seq 
148188835 osr(meta) 4859 bytes   (queue has 50 ops and 243026 bytes)
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 
filestore(/var/lib/ceph/osd/ceph-0) _journaled_ahead(2440): 0x55fb29c52000 seq 
148188836 osr(meta) [Transaction(0x55fb299866c0)]
2023-02-23T00:02:34.888+0100 7f8ef26d1700 20 filestore.osr(0x55fb27780540) 
_register_apply 0x55fb29c52000 already registered
2023-02-23T00:02:34.888+0100 7f8ef26d1700  5 

[ceph-users] CompleteMultipartUploadResult has empty ETag response

2023-02-28 Thread Lars Dunemark

Hi,

I noticed that CompleteMultipartUploadResult returns an empty ETag field when
completing a multipart upload in v17.2.3.
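
For context, any plain multipart upload seems to do it; a stripped-down
reproduction would be something like this (endpoint, bucket and key match the
example below, upload ID and part ETag shortened):

  aws --endpoint-url https://s3.myceph.com s3api create-multipart-upload --bucket test-bucket --key test.file
  aws --endpoint-url https://s3.myceph.com s3api upload-part --bucket test-bucket --key test.file \
      --part-number 1 --body part1.bin --upload-id <UploadId>
  aws --endpoint-url https://s3.myceph.com s3api complete-multipart-upload --bucket test-bucket --key test.file \
      --upload-id <UploadId> --multipart-upload '{"Parts":[{"PartNumber":1,"ETag":"<part-etag>"}]}'

The ETag of the completed object is what comes back empty in that final response.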


I haven't had the chance to verify in which version this changed, and I can't
find anything in the changelog suggesting it has been fixed in a newer version.


The response looks like:


<CompleteMultipartUploadResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Location>s3.myceph.com/test-bucket/test.file</Location>
  <Bucket>test-bucket</Bucket>
  <Key>test.file</Key>
  <ETag></ETag>
</CompleteMultipartUploadResult>


I found an old issue that was closed around 9 years ago with the same problem,
so I guess this has been fixed before:
https://tracker.ceph.com/issues/6830


It looks like my account on the tracker is still not activated, so I can't
create or comment on the issue.


Best regards,
Lars Dunemark
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io