[ceph-users] Re: Some questions about cephadm

2024-02-21 Thread Kai Stian Olstad

On 21.02.2024 17:07, wodel youchi wrote:
   - The documentation of ceph does not indicate what versions of grafana,
   prometheus, etc. should be used with a certain Ceph version.
      - I am trying to deploy Quincy, I did a bootstrap to see what
      containers were downloaded and their versions.
      - I am asking because I need to use a local registry to deploy those
      images.


You need to check the cephadm source for the Ceph version you would like to use:

https://github.com/ceph/ceph/blob/v17.2.7/src/cephadm/cephadm#L46
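
If it helps, the pinned defaults can be listed straight from that file, e.g.
(a rough sketch; the DEFAULT_*_IMAGE constants are what that section of the
script defines, but check the file for your exact release):

```
# List the default container images pinned in the cephadm script for v17.2.7
curl -s https://raw.githubusercontent.com/ceph/ceph/v17.2.7/src/cephadm/cephadm \
  | grep -E '^DEFAULT_.*IMAGE'
```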

--
Kai Stian Olstad
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Some questions about cephadm

2024-02-21 Thread Adam King
Cephadm does not have a variable that explicitly says it's an HCI
deployment. However, the HCI variable in ceph-ansible I believe only
controlled the osd_memory_target attribute, which would automatically set
it to 20% or 70% (for HCI or not, respectively) of the memory on the node
divided by the number of OSDs on the node. Cephadm doesn't have that
exactly, but it has a similar feature for osd memory autotuning, which has
some docs here
https://docs.ceph.com/en/latest/cephadm/services/osd/#automatically-tuning-osd-memory.
The warning there indicates that it isn't ideal to use this for HCI, but I
think if you set mgr/cephadm/autotune_memory_target_ratio to a value closer
to 0.2 rather than the default 0.7, it might end up working out close to
how ceph-ansible worked with the is_hci option set to true. Otherwise, you
can set the option yourself after doing a similar calculation to what
ceph-ansible did, with something like `ceph config set osd/host:<hostname>
osd_memory_target <amount>` where that amount is the per-OSD memory target
you want for OSDs on that host.
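
To make that concrete, a minimal sketch of both routes (the 0.2 ratio, the
host name ceph01 and the 4 GiB figure are illustrative placeholders, not
recommendations):

```
# Keep cephadm's autotuning, but target ~20% of host memory as in HCI setups:
ceph config set mgr mgr/cephadm/autotune_memory_target_ratio 0.2

# Or turn autotuning off for one host's OSDs and set the per-OSD target
# yourself, e.g. 4 GiB per OSD on host ceph01:
ceph config set osd/host:ceph01 osd_memory_target_autotune false
ceph config set osd/host:ceph01 osd_memory_target 4294967296
```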

I don't think we have documentation of what version to use for the
monitoring stack daemons, but the assumption is that the default version
defined in cephadm should be okay to use unless you have some specific use
case that requires a different one. There are docs on how to change it to
use a different image if you'd like to do so:
https://docs.ceph.com/en/latest/cephadm/services/monitoring/#using-custom-images
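
Since the question mentioned a local registry, a hedged sketch of pointing
cephadm at mirrored monitoring images (the registry host and image tags are
placeholders; the option names come from the custom-images doc above):

```
ceph config set mgr mgr/cephadm/container_image_prometheus \
  registry.local:5000/prometheus/prometheus:v2.x
ceph config set mgr mgr/cephadm/container_image_grafana \
  registry.local:5000/ceph/ceph-grafana:8.x
ceph config set mgr mgr/cephadm/container_image_alertmanager \
  registry.local:5000/prometheus/alertmanager:v0.x
ceph config set mgr mgr/cephadm/container_image_node_exporter \
  registry.local:5000/prometheus/node-exporter:v1.x
# redeploy so the new images are picked up, e.g.:
ceph orch redeploy prometheus
```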

Can't speak much to the setup wizard in the dashboard and whether it's
possible to get it going again after closing out of it, or the telemetry
related dashboard issue.

On Wed, Feb 21, 2024 at 11:08 AM wodel youchi 
wrote:

> Hi,
>
> I have some questions about ceph using cephadm.
>
> I used to deploy ceph using ceph-ansible, now I have to move to cephadm, I
> am in my learning journey.
>
>
>- How can I tell my cluster that it's a part of an HCI deployment? With
>ceph-ansible it was easy using is_hci : yes
>- The documentation of ceph does not indicate what versions of grafana,
>prometheus, ...etc should be used with a certain version.
>   - I am trying to deploy Quincy, I did a bootstrap to see what
>   containers were downloaded and their version.
>   - I am asking because I need to use a local registry to deploy those
>   images.
>- After the bootstrap, the Web interface was accessible :
>   - How can I access the wizard page again? If I don't use it the first
>   time I could not find another way to get it.
>   - I had a problem with telemetry, I did not configure telemetry, then
>   when I clicked the button, the web gui became inaccessible!
>
>
>
> Regards.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Reef 18.2.1 unable to join multi-site when rgw_dns_name is configured

2024-02-21 Thread Ansgar Jazdzewski
For the record, I tried both ways to configure it:

```
radosgw-admin zonegroup get --rgw-zonegroup="dev" | \
jq '.hostnames |= ["dev.s3.localhost"]' | \
radosgw-admin zonegroup set --rgw-zonegroup="dev" -i -
```

```
ceph config set global rgw_dns_name dev.s3.localhost
```
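
For completeness, a minimal sketch of verifying the hostnames and running the
period commit that then fails on the new site (same zonegroup name as above):

```
# verify the hostnames are set on the zonegroup
radosgw-admin zonegroup get --rgw-zonegroup="dev" | jq '.hostnames'

# commit the period; with hostnames / rgw_dns_name set this is the step
# that does not succeed on the newly added zone
radosgw-admin period update --commit
```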

On Wed, 21 Feb 2024 at 17:34, Ansgar Jazdzewski wrote:
>
> Hi folks,
>
> I am trying to set up a new Ceph S3 multisite setup, and it looks to me
> like DNS-style S3 is broken in multisite: when rgw_dns_name is
> configured, `radosgw-admin period update --commit` from the newly added
> member does not succeed!
>
> It looks like whenever hostnames are configured, it breaks on the newly
> added cluster:
> https://docs.ceph.com/en/reef/radosgw/multisite/#setting-a-zonegroup
>
> Thanks for any advice!
> Ansgar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: first_virtual_router_id not allowed in ingress manifest

2024-02-21 Thread Ramon Orrù
Hi Adam, 
thanks, you saved me from more time spent looking in the dark.
I’ll plan an update. 


R
-- 
Ramon Orrù
Servizio di Calcolo
Laboratori Nazionali di Frascati
Istituto Nazionale di Fisica Nucleare
Via E. Fermi, 54 - 00044 Frascati (RM) Italy
Tel. +39 06 9403 2345

> On 21 Feb 2024, at 16:02, Adam King  wrote:
> 
> It seems the quincy backport for that feature 
> (https://github.com/ceph/ceph/pull/53098) was merged Oct 1st 2023. According 
> to the quincy part of 
> https://docs.ceph.com/en/latest/releases/#release-timeline it looks like that 
> would mean it would only be present in 17.2.7, but not 17.2.6.
> 
> On Wed, Feb 21, 2024 at 8:52 AM Ramon Orrù  wrote:
>> Hello, 
>> I deployed RGW and NFSGW services over a ceph (version 17.2.6) cluster. Both 
>> services are being accessed using 2 (separated) ingresses, actually working 
>> as expected when contacted by clients.
>> Besides, I’m experiencing some problem while letting the ingresses work on 
>> the same cluster.
>> 
>> keepalived logs are full of  "(VI_0) received an invalid passwd!”  lines, 
>> because both ingresses are using the same virtualrouter id, so I’m trying to 
>> introduce some additional parameter in service definition manifests to 
>> workaround the problem (first_virtual_router_id, default value is 50),  
>> below are the manifest content: 
>> 
>> service_type: ingress
>> service_id: ingress.rgw
>> service_name: ingress.rgw
>> placement:
>>   hosts:
>>   - c00.domain.org 
>>   - c01.domain.org 
>>   - c02.domain.org 
>> spec:
>>   backend_service: rgw.rgw
>>   frontend_port: 8080
>>   monitor_port: 1967
>>   virtual_ips_list:
>> - X.X.X.200/24
>>   first_virtual_router_id: 60
>> 
>> service_type: ingress
>> service_id: nfs.nfsgw
>> service_name: ingress.nfs.nfsgw
>> placement:
>>   count: 2
>> spec:
>>   backend_service: nfs.nfsgw
>>   frontend_port: 2049
>>   monitor_port: 9049
>>   virtual_ip: X.X.X.222/24
>>   first_virtual_router_id: 70
>> 
>> 
>> When I apply the manifests I’m getting the error, for both ingress 
>> definitions:
>> 
>> Error EINVAL: ServiceSpec: __init__() got an unexpected keyword argument 
>> ‘first_virtual_router_id'
>> 
>> even the documentation for quincy version describes the option and includes 
>> some similar example at: https://docs.ceph.com/en/quincy/cephadm/services/rgw
>> 
>> Both manifests are working smoothly if I remove the first_virtual_router_id 
>> line.
>> 
>> Any ideas on how I can troubleshoot the issue?
>> 
>> Thanks in advance
>> 
>> Ramon 
>> 
>> -- 
>> Ramon Orrù
>> Servizio di Calcolo
>> Laboratori Nazionali di Frascati
>> Istituto Nazionale di Fisica Nucleare
>> Via E. Fermi, 54 - 00044 Frascati (RM) Italy
>> Tel. +39 06 9403 2345
>> 
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io 
>> To unsubscribe send an email to ceph-users-le...@ceph.io 
>> 



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Reef 18.2.1 unable to join multi-site when rgw_dns_name is configured

2024-02-21 Thread Ansgar Jazdzewski
Hi folks,

I am trying to set up a new Ceph S3 multisite setup, and it looks to me
like DNS-style S3 is broken in multisite: when rgw_dns_name is
configured, `radosgw-admin period update --commit` from the newly added
member does not succeed!

It looks like whenever hostnames are configured, it breaks on the newly
added cluster:
https://docs.ceph.com/en/reef/radosgw/multisite/#setting-a-zonegroup

Thanks for any advice!
Ansgar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-21 Thread Yuri Weinstein
Still seeking approvals:

rados - Radek, Junior, Travis, Adam King

All other product areas have been approved and are ready for the release step.

Pls also review the Release Notes: https://github.com/ceph/ceph/pull/55694


On Tue, Feb 20, 2024 at 7:58 AM Yuri Weinstein  wrote:
>
> We have restarted QE validation after fixing issues and merging several PRs.
> The new Build 3 (rebase of pacific) tests are summarized in the same
> note (see Build 3 runs) https://tracker.ceph.com/issues/64151#note-1
>
> Seeking approvals:
>
> rados - Radek, Junior, Travis, Ernesto, Adam King
> rgw - Casey
> fs - Venky
> rbd - Ilya
> krbd - Ilya
>
> upgrade/octopus-x (pacific) - Adam King, Casey PTL
>
> upgrade/pacific-p2p - Casey PTL
>
> ceph-volume - Guillaume, fixed by
> https://github.com/ceph/ceph/pull/55658 retesting
>
> On Thu, Feb 8, 2024 at 8:43 AM Casey Bodley  wrote:
> >
> > thanks, i've created https://tracker.ceph.com/issues/64360 to track
> > these backports to pacific/quincy/reef
> >
> > On Thu, Feb 8, 2024 at 7:50 AM Stefan Kooman  wrote:
> > >
> > > Hi,
> > >
> > > Is this PR: https://github.com/ceph/ceph/pull/54918 included as well?
> > >
> > > You definitely want to build the Ubuntu / debian packages with the
> > > proper CMAKE_CXX_FLAGS. The performance impact on RocksDB is _HUGE_.
> > >
> > > Thanks,
> > >
> > > Gr. Stefan
> > >
> > > P.s. Kudos to Mark Nelson for figuring it out / testing.
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> >
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Some questions about cephadm

2024-02-21 Thread wodel youchi
Hi,

I have some questions about ceph using cephadm.

I used to deploy ceph using ceph-ansible, now I have to move to cephadm, I
am in my learning journey.


   - How can I tell my cluster that it's part of an HCI deployment? With
   ceph-ansible it was easy using is_hci: yes
   - The documentation of ceph does not indicate what versions of grafana,
   prometheus, etc. should be used with a certain Ceph version.
      - I am trying to deploy Quincy, I did a bootstrap to see what
      containers were downloaded and their versions.
      - I am asking because I need to use a local registry to deploy those
      images.
   - After the bootstrap, the Web interface was accessible:
      - How can I access the wizard page again? If I don't use it the first
      time, I can't find another way to get to it.
      - I had a problem with telemetry: I did not configure telemetry, then
      when I clicked the button, the web gui became inaccessible!



Regards.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Leadership Team Meeting: 2024-2-21 Minutes

2024-02-21 Thread Casey Bodley
Estimate on release timeline for 17.2.8?
- after pacific 16.2.15 and reef 18.2.2 hotfix
(https://tracker.ceph.com/issues/64339,
https://tracker.ceph.com/issues/64406)

Estimate on release timeline for 19.2.0?
- target April, depending on testing and RCs
- Testing plan for Squid beyond dev freeze (regression and upgrade
tests, performance tests, RCs)

Can we fix old.ceph.com?
- continued discussion about the need to revive the pg calc tool

T release name?
- please add and vote for suggestions in https://pad.ceph.com/p/t
- need name before we can open "t kickoff" pr
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Debian 12 (bookworm) / Reef 18.2.1 problems

2024-02-21 Thread Matthew Vernon

[mgr modules failing because pyO3 can't be imported more than once]

On 29/01/2024 12:27, Chris Palmer wrote:

I have logged this as https://tracker.ceph.com/issues/64213


I've noted there that it's related to 
https://tracker.ceph.com/issues/63529 (an earlier report relating to the 
dashboard); there is an MR to fix just the dashboard issue which got 
merged into main. I've opened an MR to backport that change to Reef:

https://github.com/ceph/ceph/pull/55689

I don't know what the devs' plans are for dealing with the broader pyO3 
issue, but I'll ask on the dev list...


Regards,

Matthew
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: first_virtual_router_id not allowed in ingress manifest

2024-02-21 Thread Adam King
It seems the quincy backport for that feature (
https://github.com/ceph/ceph/pull/53098) was merged Oct 1st 2023. According
to the quincy part of
https://docs.ceph.com/en/latest/releases/#release-timeline it looks like
that would mean it would only be present in 17.2.7, but not 17.2.6.
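
So moving to 17.2.7 should make the field available; a rough sketch of
checking and upgrading with cephadm (the image tag is only an example):

```
ceph versions                     # confirm what the daemons are running
ceph orch upgrade start --image quay.io/ceph/ceph:v17.2.7
ceph orch upgrade status          # follow progress
```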

On Wed, Feb 21, 2024 at 8:52 AM Ramon Orrù  wrote:

> Hello,
> I deployed RGW and NFSGW services on a Ceph (version 17.2.6) cluster.
> Both services are accessed through two separate ingresses, which work as
> expected when contacted by clients.
> However, I’m running into a problem when both ingresses run on the same
> cluster.
>
> keepalived logs are full of "(VI_0) received an invalid passwd!" lines,
> because both ingresses are using the same virtual router ID, so I’m trying
> to add a parameter to the service definition manifests to work around the
> problem (first_virtual_router_id, default value 50). Below are the manifest
> contents:
>
> service_type: ingress
> service_id: ingress.rgw
> service_name: ingress.rgw
> placement:
>   hosts:
>   - c00.domain.org
>   - c01.domain.org
>   - c02.domain.org
> spec:
>   backend_service: rgw.rgw
>   frontend_port: 8080
>   monitor_port: 1967
>   virtual_ips_list:
> - X.X.X.200/24
>   first_virtual_router_id: 60
>
> service_type: ingress
> service_id: nfs.nfsgw
> service_name: ingress.nfs.nfsgw
> placement:
>   count: 2
> spec:
>   backend_service: nfs.nfsgw
>   frontend_port: 2049
>   monitor_port: 9049
>   virtual_ip: X.X.X.222/24
>   first_virtual_router_id: 70
>
>
> When I apply the manifests I get the following error for both ingress
> definitions:
>
> Error EINVAL: ServiceSpec: __init__() got an unexpected keyword argument
> ‘first_virtual_router_id'
>
> even though the documentation for the Quincy version describes the option
> and includes a similar example at:
> https://docs.ceph.com/en/quincy/cephadm/services/rgw
>
> Both manifests are working smoothly if I remove the
> first_virtual_router_id line.
>
> Any ideas on how I can troubleshoot the issue?
>
> Thanks in advance
>
> Ramon
>
> --
> Ramon Orrù
> Servizio di Calcolo
> Laboratori Nazionali di Frascati
> Istituto Nazionale di Fisica Nucleare
> Via E. Fermi, 54 - 00044 Frascati (RM) Italy
> Tel. +39 06 9403 2345
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: pacific 16.2.15 QE validation status

2024-02-21 Thread Casey Bodley
On Tue, Feb 20, 2024 at 10:58 AM Yuri Weinstein  wrote:
>
> We have restarted QE validation after fixing issues and merging several PRs.
> The new Build 3 (rebase of pacific) tests are summarized in the same
> note (see Build 3 runs) https://tracker.ceph.com/issues/64151#note-1
>
> Seeking approvals:
>
> rados - Radek, Junior, Travis, Ernesto, Adam King
> rgw - Casey

rgw approved

> fs - Venky
> rbd - Ilya
> krbd - Ilya
>
> upgrade/octopus-x (pacific) - Adam King, Casey PTL
>
> upgrade/pacific-p2p - Casey PTL

Yuri and i managed to get a green run here, approved

>
> ceph-volume - Guillaume, fixed by
> https://github.com/ceph/ceph/pull/55658 retesting
>
> On Thu, Feb 8, 2024 at 8:43 AM Casey Bodley  wrote:
> >
> > thanks, i've created https://tracker.ceph.com/issues/64360 to track
> > these backports to pacific/quincy/reef
> >
> > On Thu, Feb 8, 2024 at 7:50 AM Stefan Kooman  wrote:
> > >
> > > Hi,
> > >
> > > Is this PR: https://github.com/ceph/ceph/pull/54918 included as well?
> > >
> > > You definitely want to build the Ubuntu / debian packages with the
> > > proper CMAKE_CXX_FLAGS. The performance impact on RocksDB is _HUGE_.
> > >
> > > Thanks,
> > >
> > > Gr. Stefan
> > >
> > > P.s. Kudos to Mark Nelson for figuring it out / testing.
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> >
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] first_virtual_router_id not allowed in ingress manifest

2024-02-21 Thread Ramon Orrù
Hello, 
I deployed RGW and NFSGW services on a Ceph (version 17.2.6) cluster. Both 
services are accessed through two separate ingresses, which work as expected 
when contacted by clients.
However, I’m running into a problem when both ingresses run on the same 
cluster.

keepalived logs are full of "(VI_0) received an invalid passwd!" lines, 
because both ingresses are using the same virtual router ID, so I’m trying to 
add a parameter to the service definition manifests to work around the problem 
(first_virtual_router_id, default value 50). Below are the manifest contents:

service_type: ingress
service_id: ingress.rgw
service_name: ingress.rgw
placement:
  hosts:
  - c00.domain.org
  - c01.domain.org
  - c02.domain.org
spec:
  backend_service: rgw.rgw
  frontend_port: 8080
  monitor_port: 1967
  virtual_ips_list:
- X.X.X.200/24
  first_virtual_router_id: 60

service_type: ingress
service_id: nfs.nfsgw
service_name: ingress.nfs.nfsgw
placement:
  count: 2
spec:
  backend_service: nfs.nfsgw
  frontend_port: 2049
  monitor_port: 9049
  virtual_ip: X.X.X.222/24
  first_virtual_router_id: 70


When I apply the manifests I get the following error for both ingress definitions:

Error EINVAL: ServiceSpec: __init__() got an unexpected keyword argument 
‘first_virtual_router_id'

even though the documentation for the Quincy version describes the option and 
includes a similar example at: https://docs.ceph.com/en/quincy/cephadm/services/rgw

Both manifests are working smoothly if I remove the first_virtual_router_id 
line.

Any ideas on how I can troubleshoot the issue?

Thanks in advance

Ramon 

-- 
Ramon Orrù
Servizio di Calcolo
Laboratori Nazionali di Frascati
Istituto Nazionale di Fisica Nucleare
Via E. Fermi, 54 - 00044 Frascati (RM) Italy
Tel. +39 06 9403 2345




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Scrub stuck and 'pg has invalid (post-split) stat'

2024-02-21 Thread Cedric
Update: we have run fsck and re-sharding on all BlueStore volumes; it seems 
sharding had not been applied.
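
For reference, roughly the per-OSD offline steps (a sketch; the OSD id and
paths are examples, and the sharding string should be checked against the
documented BlueStore default for your release):

```
systemctl stop ceph-osd@42
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-42
ceph-bluestore-tool show-sharding --path /var/lib/ceph/osd/ceph-42
ceph-bluestore-tool reshard --path /var/lib/ceph/osd/ceph-42 \
  --sharding "m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P"
systemctl start ceph-osd@42
```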

Unfortunately scrubs and deep-scrubs are still stuck on PGs of the pool that 
is suffering the issue, but other PGs scrub fine.

The next step will be to remove the cache tier as suggested, but that is not 
possible yet, as the PGs need to be scrubbed before the cache tier can be 
activated.

As we are struggling to make this cluster work again, any help would be 
greatly appreciated.

Cédric

> On 20 Feb 2024, at 20:22, Cedric  wrote:
> 
> Thanks Eugen, sorry about the missed reply to all.
> 
> The reason we still have the cache tier is because we were not able to flush 
> all dirty entries to remove it (as per the procedure), so the cluster has been 
> migrated from HDD/SSD to NVMe a while ago but tiering remains, unfortunately.
> 
> So actually we are trying to understand the root cause
> 
> On Tue, Feb 20, 2024 at 1:43 PM Eugen Block  wrote:
>> 
>> Please don't drop the list from your response.
>> 
>> The first question coming to mind is, why do you have a cache-tier if 
>> all your pools are on nvme decices anyway? I don't see any benefit here.
>> Did you try the suggested workaround and disable the cache-tier?
>> 
>> Zitat von Cedric :
>> 
>>> Thanks Eugen, see attached infos.
>>> 
>>> Some more details:
>>> 
>>> - commands that actually hang: ceph balancer status ; rbd -p vms ls ;
>>> rados -p vms_cache cache-flush-evict-all
>>> - all scrubs running on vms_cache PGs stall / start in a loop
>>> without actually doing anything
>>> - all I/O is 0, both from ceph status and iostat on the nodes
>>> 
>>> On Tue, Feb 20, 2024 at 10:00 AM Eugen Block  wrote:
>>>>
>>>> Hi,
>>>>
>>>> some more details would be helpful, for example what's the pool size
>>>> of the cache pool? Did you issue a PG split before or during the
>>>> upgrade? This thread [1] deals with the same problem, the described
>>>> workaround was to set hit_set_count to 0 and disable the cache layer
>>>> until that is resolved. Afterwards you could enable the cache layer
>>>> again. But keep in mind that the code for cache tier is entirely
>>>> removed in Reef (IIRC).
>>>>
>>>> Regards,
>>>> Eugen
>>>>
>>>> [1]
>>>> https://ceph-users.ceph.narkive.com/zChyOq5D/ceph-strange-issue-after-adding-a-cache-osd
>>>>
>>>> Quoting Cedric :
>>>>
>>>>> Hello,
>>>>>
>>>>> Following an upgrade from Nautilus (14.2.22) to Pacific (16.2.13), we
>>>>> encounter an issue with a cache pool becoming completely stuck,
>>>>> relevant messages below:
>>>>>
>>>>> pg xx.x has invalid (post-split) stats; must scrub before tier agent
>>>>> can activate
>>>>>
>>>>> In OSD logs, scrubs are starting in a loop without succeeding for all
>>>>> pg of this pool.
>>>>>
>>>>> What we already tried without luck so far:
>>>>>
>>>>> - shutdown / restart OSD
>>>>> - rebalance pg between OSD
>>>>> - raise the memory on OSD
>>>>> - repeer PG
>>>>>
>>>>> Any idea what is causing this? any help will be greatly appreciated
>>>>>
>>>>> Thanks
>>>>>
>>>>> Cédric
>>>>> ___
>>>>> ceph-users mailing list -- ceph-users@ceph.io
>>>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>>
>>>>
>>>> ___
>>>> ceph-users mailing list -- ceph-users@ceph.io
>>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 
>> 
>> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] pg repair doesn't fix "got incorrect hash on read" / "candidate had an ec hash mismatch"

2024-02-21 Thread Kai Stian Olstad

Hi,

Short summary

PG 404.bc is an EC 4+2 where s0 and s2 report a hash mismatch for 698 
objects.
Ceph pg repair doesn't fix it: if you run a deep-scrub on the PG after 
the repair has finished, it still reports scrub errors.


Why can't ceph pg repair fix this? With 4 out of 6 shards intact it 
should be able to reconstruct the corrupted ones.
Is there a way to fix this? Like deleting shards s0 and s2 so they are 
forced to be recreated?
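
For reference, the inconsistencies recorded by the last deep-scrub can be
listed with rados (a sketch; the exact output fields vary by release):

```
# PGs with inconsistencies in the pool
rados list-inconsistent-pg default.rgw.buckets.data
# objects and per-shard errors recorded for the PG
rados list-inconsistent-obj 404.bc --format=json-pretty
```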



Long detailed summary

A short backstory.
* This is the aftermath of problems with mclock, see the post "17.2.7: 
Backfilling deadlock / stall / stuck / standstill" [1].

  - 4 OSDs had a few bad sectors; all 4 were set out and the cluster stopped.
  - The solution was to swap from mclock to wpq and restart all OSDs.
  - When all backfilling was finished, all 4 OSDs were replaced.
  - osd.223 and osd.269 were 2 of the 4 OSDs that were replaced.


PG / pool 404 is EC 4+2 default.rgw.buckets.data

9 days after osd.223 and osd.269 were replaced, a deep-scrub was run and 
reported errors

ceph status
---
HEALTH_ERR 1396 scrub errors; Possible data damage: 1 pg 
inconsistent

[ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
[ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
pg 404.bc is active+clean+inconsistent, acting 
[223,297,269,276,136,197]


I then run repair
ceph pg repair 404.bc

And ceph status showed this
ceph status
---
HEALTH_WARN Too many repaired reads on 2 OSDs
[WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
osd.223 had 698 reads repaired
osd.269 had 698 reads repaired

But osd.223 and osd.269 are new disks, and the disks have no SMART errors or 
any I/O errors in the OS logs.

So I tried to run deep-scrub again on the PG.
ceph pg deep-scrub 404.bc

And got this result.

ceph status
---
HEALTH_ERR 1396 scrub errors; Too many repaired reads on 2 OSDs; 
Possible data damage: 1 pg inconsistent

[ERR] OSD_SCRUB_ERRORS: 1396 scrub errors
[WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
osd.223 had 698 reads repaired
osd.269 had 698 reads repaired
[ERR] PG_DAMAGED: Possible data damage: 1 pg inconsistent
pg 404.bc is active+clean+scrubbing+deep+inconsistent+repair, 
acting [223,297,269,276,136,197]


698 + 698 = 1396, so the same number of errors.

Run repair again on 404.bc and ceph status is

HEALTH_WARN Too many repaired reads on 2 OSDs
[WRN] OSD_TOO_MANY_REPAIRS: Too many repaired reads on 2 OSDs
osd.223 had 1396 reads repaired
osd.269 had 1396 reads repaired

So even when the repair finishes it doesn't fix the problem, since the errors 
reappear after a deep-scrub.


The logs for osd.223 and osd.269 contain "got incorrect hash on read" and 
"candidate had an ec hash mismatch" for 698 unique objects.
But I only show the log for 1 of the 698 objects; the log is the same 
for the other 697 objects.


osd.223 log (only showing 1 of the 698 objects, named 
2021-11-08T19%3a43%3a50,145489260+00%3a00)

---
Feb 20 10:31:00 ceph-hd-003 ceph-osd[3665432]: osd.223 pg_epoch: 
231235 pg[404.bcs0( v 231235'1636919 (231078'1632435,231235'1636919] 
local-lis/les=226263/226264 n=296580 ec=36041/27862 lis/c=226263/226263 
les/c/f=226264/230954/0 sis=226263) [223,297,269,276,136,197]p223(0) r=0 
lpr=226263 crt=231235'1636919 lcod 231235'1636918 mlcod 231235'1636918 
active+clean+scrubbing+deep+inconsistent+repair [ 404.bcs0:  REQ_SCRUB ] 
 MUST_REPAIR MUST_DEEP_SCRUB MUST_SCRUB planned REQ_SCRUB] _scan_list  
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head 
got incorrect hash on read 0xc5d1dd1b !=  expected 0x7c2f86d7
Feb 20 10:31:01 ceph-hd-003 ceph-osd[3665432]: log_channel(cluster) 
log [ERR] : 404.bc shard 223(0) soid 
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head 
: candidate had an ec hash mismatch
Feb 20 10:31:01 ceph-hd-003 ceph-osd[3665432]: log_channel(cluster) 
log [ERR] : 404.bc shard 269(2) soid 
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head 
: candidate had an ec hash mismatch
Feb 20 10:31:01 ceph-hd-003 
ceph-b321e76e-da3a-11eb-b75c-4f948441dcd0-osd-223[3665427]: 
2024-02-20T10:31:01.117+ 7f128a88d700 -1 log_channel(cluster) log 
[ERR] : 404.bc shard 223(0) soid 
404:3d001f95:::1f244892-a2e7-406b-aa62-1b13511333a2.625411.3__multipart_2021-11-08T19%3a43%3a50,145489260+00%3a00.2~OoetD5vkh8fyh-2eeR7GF5rZK7d5EVa.1:head 
: candidate had an ec hash mismatch
Feb 20 10:31:01 ceph-hd-003 
ceph-b321e76e-da3a-11eb-b75c-4f948441dcd0-osd-223[3665427]: 
2024-02-20T10:31:01.117+ 7f128a88d700 -1 log_channel(cluster) log 
[ERR] : 404.bc shard 269(2) soid 
404:3d001f95:::1f244892-a2e7-406b-aa62-1b1351

[ceph-users] Re: Performance improvement suggestion

2024-02-21 Thread Peter Grandi
> 1. Write object A from client.
> 2. Fsync to primary device completes.
> 3. Ack to client.
> 4. Writes sent to replicas.
[...]

As mentioned in the discussion, this proposal is the opposite of the
current policy, which is to wait for all replicas to be written before
writes are acknowledged to the client:

https://github.com/ceph/ceph/blob/main/doc/architecture.rst

   "After identifying the target placement group, the client
   writes the object to the identified placement group's primary
   OSD. The primary OSD then [...] confirms that the object was
   stored successfully in the secondary and tertiary OSDs, and
   reports to the client that the object was stored
   successfully."

A more revolutionary option would be for 'librados' to write in
parallel to all the "active set" OSDs and report this to the
primary, but that would greatly increase client-Ceph traffic,
while the current logic increases traffic only among OSDs.

> So I think that to maintain any semblance of reliability,
> you'd need to at least wait for a commit ack from the first
> replica (i.e. min_size=2).

Perhaps it could be similar to 'k'+'m' for EC, that is 'k'
synchronous (the write completes to the client only when at least 'k'
replicas, including the primary, have been committed) and 'm'
asynchronous, instead of 'k' being just 1 or 2.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io