[ceph-users] Re: Ceph reef and (slow) backfilling - how to speed it up

2024-05-02 Thread Wesley Dillingham
In our case it was with an EC pool as well. I believe the PG state was
degraded+recovering / recovery_wait, and IIRC the PGs just sat in the
recovering state without any progress (the degraded PG object count did not
decline). A repeer of the PG was attempted, but no success there. A restart
of all the OSDs for the given PGs was attempted under mclock; that didn't
work. Switching to wpq for all OSDs in the given PG did resolve the issue.
This was on a 17.2.7 cluster.
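
For anyone hitting the same thing, a rough sketch of the workaround (assumes a
non-cephadm systemd deployment; the unit names and osd.X/osd.Y IDs are
placeholders for the OSDs in the affected PG):

ceph config set osd.X osd_op_queue wpq
ceph config set osd.Y osd_op_queue wpq
# osd_op_queue is only picked up at startup, so restart those OSDs one at a time
systemctl restart ceph-osd@X
# wait for the PG to settle before restarting the next OSD
systemctl restart ceph-osd@Y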

Respectfully,

*Wes Dillingham*
LinkedIn 
w...@wesdillingham.com




On Thu, May 2, 2024 at 9:54 AM Sridhar Seshasayee 
wrote:

> >
> > Multiple people -- including me -- have also observed backfill/recovery
> > stop completely for no apparent reason.
> >
> > In some cases poking the lead OSD for a PG with `ceph osd down` restores progress,
> > in other cases it doesn't.
> >
> > Anecdotally this *may* only happen for EC pools on HDDs but that sample
> > size is small.
> >
> >
> Thanks for the information. We will try and reproduce this locally with EC
> pools and investigate this further.
> I will revert with a tracker for this.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Remove an OSD with hardware issue caused rgw 503

2024-04-26 Thread Wesley Dillingham
What you want to do is take the OSD (and all the copies of data it
contains) out of service by stopping the OSD daemon immediately. The downside of
this approach is that it causes the PGs on that OSD to be degraded, but the
upside is that the OSD with the bad hardware is immediately no longer
participating in any client IO (the source of your RGW 503s). In this situation
the PGs go into degraded+backfilling.

The alternative method is to keep the failing OSD up and in the cluster but
slowly migrate the data off of it. This would be a long, drawn-out period of
time in which the failing disk would continue to serve client reads and
also facilitate backfill, but you wouldn't take a copy of the data out of the
cluster and cause degraded PGs. In this scenario the PGs would be
remapped+backfilling.

I tried to find a way to have your cake and eat it too in relation to this
"predicament" in this tracker issue: https://tracker.ceph.com/issues/44400
but it was deemed "won't fix".
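
For completeness, the "stop it now" method is roughly the following (the OSD id
is a placeholder; on a cephadm cluster the systemd unit name differs, or use
"ceph orch daemon stop osd.<id>" instead):

systemctl stop ceph-osd@<id>
ceph osd out <id>
# PGs go degraded+backfilling; once the cluster is healthy again, purge the OSD
ceph osd purge <id> --yes-i-really-mean-it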

Respectfully,

*Wes Dillingham*
LinkedIn 
w...@wesdillingham.com




On Fri, Apr 26, 2024 at 11:25 AM Mary Zhang  wrote:

> Thank you Eugen for your warm help!
>
> I'm trying to understand the difference between 2 methods.
> For method 1, or "ceph orch osd rm osd_id", OSD Service — Ceph
> Documentation
>  says
> it involves 2 steps:
>
> 1. evacuating all placement groups (PGs) from the OSD
> 2. removing the PG-free OSD from the cluster
>
> For method 2, or the procedure you recommended, Adding/Removing OSDs — Ceph
> Documentation
> <
> https://docs.ceph.com/en/latest/rados/operations/add-or-rm-osds/#removing-osds-manual
> >
> says
> "After the OSD has been taken out of the cluster, Ceph begins rebalancing
> the cluster by migrating placement groups out of the OSD that was removed.
> "
>
> What's the difference between "evacuating PGs" in method 1 and "migrating
> PGs" in method 2? I think method 1 must read the OSD to be removed.
> Otherwise, we would not see slow ops warning. Does method 2 not involve
> reading this OSD?
>
> Thanks,
> Mary
>
> On Fri, Apr 26, 2024 at 5:15 AM Eugen Block  wrote:
>
> > Hi,
> >
> > if you remove the OSD this way, it will be drained. Which means that
> > it will try to recover PGs from this OSD, and in case of hardware
> > failure it might lead to slow requests. It might make sense to
> > forcefully remove the OSD without draining:
> >
> > - stop the osd daemon
> > - mark it as out
> > - osd purge  [--force] [--yes-i-really-mean-it]
> >
> > Regards,
> > Eugen
> >
> > Zitat von Mary Zhang :
> >
> > > Hi,
> > >
> > > We recently removed an OSD from our Ceph cluster. Its underlying disk
> > has
> > > a hardware issue.
> > >
> > > We use command: ceph orch osd rm osd_id --zap
> > >
> > > During the process, sometimes ceph cluster enters warning state with
> slow
> > > ops on this osd. Our rgw also failed to respond to requests and
> returned
> > > 503.
> > >
> > > We restarted the rgw daemon to make it work again. But the same failure
> > > occurred from time to time. Eventually we noticed that the rgw 503 error is
> > > a result of osd slow ops.
> > >
> > > Our cluster has 18 hosts and 210 OSDs. We expect that removing an OSD with
> > > a hardware issue won't impact cluster performance & rgw availability. Is our
> > > expectation reasonable? What's the best way to handle OSDs with hardware
> > > failures?
> > >
> > > Thank you in advance for any comments or suggestions.
> > >
> > > Best Regards,
> > > Mary Zhang
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Setting S3 bucket policies with multi-tenants

2024-04-12 Thread Wesley Dillingham
Did you actually get this working? I am trying to replicate your steps but
have not been successful doing this with multi-tenant.

Respectfully,

*Wes Dillingham*
LinkedIn 
w...@wesdillingham.com




On Wed, Nov 1, 2023 at 12:52 PM Thomas Bennett  wrote:

> To update my own question, it would seem that Principal should be
> defined like this:
>
>- "Principal": {"AWS": ["arn:aws:iam::Tenant1:user/readwrite"]}
>
> And Resource should be:
> "Resource": [ "arn:aws:s3:::backups"]
>
> Is it worth having the docs updated -
> https://docs.ceph.com/en/quincy/radosgw/bucketpolicy/
> to indicate that usfolks in the example is the tenant name?
>
>
> On Wed, 1 Nov 2023 at 18:27, Thomas Bennett  wrote:
>
> > Hi,
> >
> > I'm running Ceph Quincy (17.2.6) with a rados-gateway. I have multiple
> > tenants, for example:
> >
> >- Tenant1$manager
> >- Tenant1$readwrite
> >
> > I would like to set a policy on a bucket (backups for example) owned by
> > *Tenant1$manager* to allow *Tenant1$readwrite* access to that bucket. I
> > can't find any documentation that discusses this scenario.
> >
> > Does anyone know how to specify the Principal and Resource sections of a
> > policy.json file? Or any other configuration that I might be missing?
> >
> > I've tried some variations on Principal and Resource including and
> > excluding tenant information, but not no luck yet.
> >
> >
> > For example:
> > {
> >   "Version": "2012-10-17",
> >   "Statement": [{
> > "Effect": "Allow",
> > "Principal": {"AWS": ["arn:aws:iam:::user/*Tenant1$readwrite*"]},
> > "Action": ["s3:ListBucket","s3:GetObject", ,"s3:PutObject"],
> > "Resource": [
> >   "arn:aws:s3:::*Tenant1/backups*"
> > ]
> >   }]
> > }
> >
> > I'm using s3cmd for testing, so:
> > s3cmd --config s3cfg.manager setpolicy policy.json s3://backups/
> > Returns:
> > s3://backups/: Policy updated
> >
> > And then testing:
> > s3cmd --config s3cfg.readwrite ls s3://backups/
> > ERROR: Access to bucket 'backups' was denied
> > ERROR: S3 error: 403 (AccessDenied)
> >
> > Thanks,
> > Tom
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PG inconsistent

2024-04-12 Thread Wesley Dillingham
Check your ceph.log on the mons for "stat mismatch" and grep for the PG in
question for potentially more information.

Additionally, "rados list-inconsistent-obj {pgid}" will often show which OSDs
and objects are implicated in the inconsistency. If the acting set has
changed since the scrub in which the inconsistency was found (for example an
OSD was removed or failed), this data won't be there any longer and you
would need to deep-scrub the PG again to get that information.
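
Roughly, something along these lines (pgid is a placeholder; the ceph.log path
varies by deployment):

grep 'stat mismatch' /var/log/ceph/ceph.log | grep <pgid>
rados list-inconsistent-obj <pgid> --format=json-pretty
ceph pg deep-scrub <pgid>    # only if the acting set changed and the info is gone
ceph pg repair <pgid>        # once you understand what is inconsistent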

Respectfully,

*Wes Dillingham*
LinkedIn 
w...@wesdillingham.com




On Fri, Apr 12, 2024 at 6:56 AM Frédéric Nass <
frederic.n...@univ-lorraine.fr> wrote:

>
> Hello Albert,
>
> Have you checked the hardware status of the involved drives other than with
> smartctl? Like with the manufacturer's tools / WebUI (iDRAC / perccli for
> DELL hardware, for example).
> If these tools don't report any media error (that is, bad blocks on disks)
> then you might just be facing the bit rot phenomenon. But this is very rare
> and should happen in a sysadmin's lifetime about as often as a Royal Flush hand
> in a professional poker player's lifetime. ;-)
>
> If no media error is reported, then you might want to check and update the
> firmware of all drives.
>
> Once you have figured it out, you may enable osd_scrub_auto_repair=true to have
> these inconsistencies repaired automatically on deep-scrubbing, but make
> sure you're using the alert module [1] so as to at least be informed about
> the scrub errors.
>
> Regards,
> Frédéric.
>
> [1] https://docs.ceph.com/en/latest/mgr/alerts/
>
> - On 12 Apr 24, at 11:59, Albert Shih albert.s...@obspm.fr wrote:
>
> > Hi everyone.
> >
> > I got a warning with
> >
> > root@cthulhu1:/etc/ceph# ceph -s
> >  cluster:
> >id: 9c5bb196-c212-11ee-84f3-c3f2beae892d
> >health: HEALTH_ERR
> >1 scrub errors
> >Possible data damage: 1 pg inconsistent
> >
> > So I found the PG with the issue and launched a pg repair (still waiting).
> >
> > But I tried to find «why», so I checked all the OSDs related to this PG and
> > didn't find anything: no errors from the OSD daemons, no errors from smartctl,
> > no errors in the kernel messages.
> >
> > So I would just like to know if that's «normal» or whether I should dig deeper.
> >
> > JAS
> > --
> > Albert SHIH 嶺 
> > France
> > Heure locale/Local time:
> > ven. 12 avril 2024 11:51:37 CEST
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Slow ops during recovery for RGW index pool only when degraded OSD is primary

2024-04-04 Thread Wesley Dillingham
Initial indication shows "osd_async_recovery_min_cost = 0" to be a huge
win here. Some initial thoughts: were it not for the fact that the index
(and other OMAP pools) are isolated to their own OSDs in this cluster, this
tunable would seemingly cause data/blob objects from data pools to async
recover when synchronous recovery might be better for those pools / that
data. I can play around with how this affects the RGW data pools. There was
a Ceph code walk-through video on this topic:
https://www.youtube.com/watch?v=waOtatCpnYs - it seems that perhaps
osd_async_recovery_min_cost may have previously been referred to as
osd_async_recover_min_pg_log_entries (both default to 100). For a pool with
OMAP data where some or all of the OMAP objects are very large, this may not
be a dynamic enough factor to base the decision on. Thanks for the feedback,
everybody!
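
For anyone following along, I'm applying/reverting it for testing roughly like
this (cluster-wide here; it could also be scoped to individual OSDs):

ceph config set osd osd_async_recovery_min_cost 0
ceph config get osd osd_async_recovery_min_cost
# revert to the default when done testing
ceph config rm osd osd_async_recovery_min_cost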


Respectfully,

*Wes Dillingham*
LinkedIn <http://www.linkedin.com/in/wesleydillingham>
w...@wesdillingham.com




On Wed, Apr 3, 2024 at 1:38 PM Joshua Baergen 
wrote:

> We've had success using osd_async_recovery_min_cost=0 to drastically
> reduce slow ops during index recovery.
>
> Josh
>
> On Wed, Apr 3, 2024 at 11:29 AM Wesley Dillingham 
> wrote:
> >
> > I am fighting an issue on an 18.2.0 cluster where a restart of an OSD
> which
> > supports the RGW index pool causes crippling slow ops. If the OSD is
> marked
> > with primary-affinity of 0 prior to the OSD restart no slow ops are
> > observed. If the OSD has a primary affinity of 1 slow ops occur. The slow
> > ops only occur during the recovery period of the OMAP data and further
> only
> > occur when client activity is allowed to pass to the cluster. Luckily I
> am
> > able to test this during periods when I can disable all client activity
> at
> > the upstream proxy.
> >
> > Given the behavior of the primary affinity changes preventing the slow
> ops
> > I think this may be a case of recovery being more detrimental than
> > backfill. I am thinking that causing an pg_temp acting set by forcing
> > backfill may be the right method to mitigate the issue. [1]
> >
> > I believe that reducing the PG log entries for these OSDs would
> accomplish
> > that but I am also thinking a tuning of osd_async_recovery_min_cost [2]
> may
> > also accomplish something similar. Not sure the appropriate tuning for
> that
> > config at this point or if there may be a better approach. Seeking any
> > input here.
> >
> > Further if this issue sounds familiar or sounds like another condition
> > within the OSD may be at hand I would be interested in hearing your input
> > or thoughts. Thanks!
> >
> > [1] https://docs.ceph.com/en/latest/dev/peering/#concepts
> > [2]
> >
> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/#confval-osd_async_recovery_min_cost
> >
> > Respectfully,
> >
> > *Wes Dillingham*
> > LinkedIn <http://www.linkedin.com/in/wesleydillingham>
> > w...@wesdillingham.com
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Slow ops during recovery for RGW index pool only when degraded OSD is primary

2024-04-03 Thread Wesley Dillingham
I am fighting an issue on an 18.2.0 cluster where a restart of an OSD which
supports the RGW index pool causes crippling slow ops. If the OSD is marked
with primary-affinity of 0 prior to the OSD restart no slow ops are
observed. If the OSD has a primary affinity of 1 slow ops occur. The slow
ops only occur during the recovery period of the OMAP data and further only
occur when client activity is allowed to pass to the cluster. Luckily I am
able to test this during periods when I can disable all client activity at
the upstream proxy.
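
For reference, the primary-affinity workaround around restarts looks roughly
like this (OSD id is a placeholder):

ceph osd primary-affinity <osd-id> 0
# restart / do maintenance on the OSD, wait for recovery to finish
ceph osd primary-affinity <osd-id> 1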

Given the behavior of the primary affinity changes preventing the slow ops,
I think this may be a case of recovery being more detrimental than
backfill. I am thinking that causing a pg_temp acting set by forcing
backfill may be the right method to mitigate the issue. [1]

I believe that reducing the PG log entries for these OSDs would accomplish
that, but I am also thinking a tuning of osd_async_recovery_min_cost [2] may
accomplish something similar. I'm not sure of the appropriate tuning for that
config at this point, or whether there may be a better approach. Seeking any
input here.

Further if this issue sounds familiar or sounds like another condition
within the OSD may be at hand I would be interested in hearing your input
or thoughts. Thanks!

[1] https://docs.ceph.com/en/latest/dev/peering/#concepts
[2]
https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/#confval-osd_async_recovery_min_cost

Respectfully,

*Wes Dillingham*
LinkedIn 
w...@wesdillingham.com
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mounting A RBD Via Kernal Modules

2024-03-24 Thread Wesley Dillingham
I suspect this may be a network / firewall issue between the client and one
OSD-server. Perhaps the 100MB RBD didn't have an object mapped to a PG with
the primary on this problematic OSD host but the 2TB RBD does. Just a
theory.
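
One way to test that theory from the client: find which OSD/host a problematic
object maps to and check basic reachability (the object name below is just an
example of the RBD data-object naming; take the real prefix from "rbd info"):

ceph osd map my_pool.data rbd_data.<image-prefix>.<object-number>
ceph osd find <osd-id>        # shows the address/host of the primary OSD
dmesg | grep libceph          # kernel RBD client logs connection errors here
nc -zv <osd-host> 6800        # repeat for the ports the OSDs on that host listen on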

Respectfully,

*Wes Dillingham*
LinkedIn 
w...@wesdillingham.com




On Mon, Mar 25, 2024 at 12:34 AM duluxoz  wrote:

> Hi Alexander,
>
> Already set (and confirmed by running the command again) - no good, I'm
> afraid.
>
> So I just restarted with a brand new image and ran the following commands
> on the ceph cluster and the host respectively. Results are below:
>
> On the ceph cluster:
>
> [code]
>
> rbd create --size 4T my_pool.meta/my_image --data-pool my_pool.data
> --image-feature exclusive-lock --image-feature deep-flatten
> --image-feature fast-diff --image-feature layering --image-feature
> object-map --image-feature data-pool
>
> [/code]
>
> On the host:
>
> [code]
>
> rbd device map my_pool.meta/my_image --id ceph_rbd_user --keyring
> /etc/ceph/ceph.client.ceph_rbd_user.keyring
>
> mkfs.xfs /dev/rbd0
>
> [/code]
>
> Results:
>
> [code]
>
> meta-data=/dev/rbd0  isize=512agcount=32,
> agsize=33554432 blks
>   =   sectsz=512   attr=2, projid32bit=1
>   =   crc=1finobt=1, sparse=1, rmapbt=0
>   =   reflink=1bigtime=1 inobtcount=1
> nrext64=0
> data =   bsize=4096   blocks=1073741824, imaxpct=5
>   =   sunit=16 swidth=16 blks
> naming   =version 2  bsize=4096   ascii-ci=0, ftype=1
> log  =internal log   bsize=4096   blocks=521728, version=2
>   =   sectsz=512   sunit=16 blks, lazy-count=1
> realtime =none   extsz=4096   blocks=0, rtextents=0
> Discarding blocks...Done.
> mkfs.xfs: pwrite failed: Input/output error
> libxfs_bwrite: write failed on (unknown) bno 0x1ff00/0x100, err=5
> mkfs.xfs: Releasing dirty buffer to free list!
> found dirty buffer (bulk) on free list!
> mkfs.xfs: pwrite failed: Input/output error
> libxfs_bwrite: write failed on (unknown) bno 0x0/0x100, err=5
> mkfs.xfs: Releasing dirty buffer to free list!
> found dirty buffer (bulk) on free list!
> mkfs.xfs: pwrite failed: Input/output error
> libxfs_bwrite: write failed on xfs_sb bno 0x0/0x1, err=5
> mkfs.xfs: Releasing dirty buffer to free list!
> found dirty buffer (bulk) on free list!
> mkfs.xfs: pwrite failed: Input/output error
> libxfs_bwrite: write failed on (unknown) bno 0x10080/0x80, err=5
> mkfs.xfs: Releasing dirty buffer to free list!
> found dirty buffer (bulk) on free list!
> mkfs.xfs: read failed: Input/output error
> mkfs.xfs: data size check failed
> mkfs.xfs: filesystem failed to initialize
> [/code]
>
> On 25/03/2024 15:17, Alexander E. Patrakov wrote:
> > Hello Matthew,
> >
> > Is the overwrite enabled in the erasure-coded pool? If not, here is
> > how to fix it:
> >
> > ceph osd pool set my_pool.data allow_ec_overwrites true
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: activating+undersized+degraded+remapped

2024-03-17 Thread Wesley Dillingham
You may be suffering from the "crush gives up too soon" situation:

https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-pg/#crush-gives-up-too-soon

You have a 5+3 EC profile with only 8 hosts, so you may need to increase your
CRUSH tries. See the link for how to fix it.
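
The procedure from that page is roughly (rule id and the tries value are
examples; test the map before injecting it):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# edit the EC rule and raise the retry limit, e.g. add:  step set_choose_tries 100
crushtool -c crushmap.txt -o crushmap-new.bin
crushtool -i crushmap-new.bin --test --show-bad-mappings --rule <rule-id> --num-rep 8 --min-x 1 --max-x 1024
ceph osd setcrushmap -i crushmap-new.bin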

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Sun, Mar 17, 2024 at 8:18 AM Joachim Kraftmayer - ceph ambassador <
joachim.kraftma...@clyso.com> wrote:

> also helpful is the output of:
>
> ceph pg {poolnum}.{pg-id} query
>
> ___
> ceph ambassador DACH
> ceph consultant since 2012
>
> Clyso GmbH - Premier Ceph Foundation Member
>
> https://www.clyso.com/
>
> On 16.03.24 at 13:52, Eugen Block wrote:
> > Yeah, the whole story would help to give better advice. With EC the
> > default min_size is k+1, you could reduce the min_size to 5
> > temporarily, this might bring the PGs back online. But the long term
> > fix is to have all required OSDs up and have enough OSDs to sustain an
> > outage.
> >
> > Zitat von Wesley Dillingham :
> >
> >> Please share "ceph osd tree" and "ceph osd df tree" I suspect you
> >> have not
> >> enough hosts to satisfy the EC
> >>
> >> On Sat, Mar 16, 2024, 8:04 AM Deep Dish  wrote:
> >>
> >>> Hello
> >>>
> >>> I found myself in the following situation:
> >>>
> >>> [WRN] PG_AVAILABILITY: Reduced data availability: 3 pgs inactive
> >>>
> >>> pg 4.3d is stuck inactive for 8d, current state
> >>> activating+undersized+degraded+remapped, last acting
> >>> [4,NONE,46,NONE,10,13,NONE,74]
> >>>
> >>> pg 4.6e is stuck inactive for 9d, current state
> >>> activating+undersized+degraded+remapped, last acting
> >>> [NONE,27,77,79,55,48,50,NONE]
> >>>
> >>> pg 4.cb is stuck inactive for 8d, current state
> >>> activating+undersized+degraded+remapped, last acting
> >>> [6,NONE,42,8,60,22,35,45]
> >>>
> >>>
> >>> I have one cephfs with two backing pools -- one for replicated data,
> >>> the
> >>> other for erasure data.  Each pool is mapped to REPLICATED/ vs.
> >>> ERASURE/
> >>> directories on the filesystem.
> >>>
> >>>
> >>> The above pgs. are affecting the ERASURE pool (5+3) backing the
> >>> FS.   How
> >>> can I get ceph to recover these three PGs?
> >>>
> >>>
> >>>
> >>> Thank you.
> >>> ___
> >>> ceph-users mailing list -- ceph-users@ceph.io
> >>> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: activating+undersized+degraded+remapped

2024-03-16 Thread Wesley Dillingham
Please share "ceph osd tree" and "ceph osd df tree" I suspect you have not
enough hosts to satisfy the EC

On Sat, Mar 16, 2024, 8:04 AM Deep Dish  wrote:

> Hello
>
> I found myself in the following situation:
>
> [WRN] PG_AVAILABILITY: Reduced data availability: 3 pgs inactive
>
> pg 4.3d is stuck inactive for 8d, current state
> activating+undersized+degraded+remapped, last acting
> [4,NONE,46,NONE,10,13,NONE,74]
>
> pg 4.6e is stuck inactive for 9d, current state
> activating+undersized+degraded+remapped, last acting
> [NONE,27,77,79,55,48,50,NONE]
>
> pg 4.cb is stuck inactive for 8d, current state
> activating+undersized+degraded+remapped, last acting
> [6,NONE,42,8,60,22,35,45]
>
>
> I have one cephfs with two backing pools -- one for replicated data, the
> other for erasure data.  Each pool is mapped to REPLICATED/ vs. ERASURE/
> directories on the filesystem.
>
>
> The above pgs. are affecting the ERASURE pool (5+3) backing the FS.   How
> can I get ceph to recover these three PGs?
>
>
>
> Thank you.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph-storage slack access

2024-03-06 Thread Wesley Dillingham
At the very bottom of this page is a link
https://ceph.io/en/community/connect/

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Wed, Mar 6, 2024 at 11:45 AM Matthew Vernon 
wrote:

> Hi,
>
> How does one get an invite to the ceph-storage slack, please?
>
> Thanks,
>
> Matthew
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 6 pgs not deep-scrubbed in time

2024-02-01 Thread Wesley Dillingham
I would just set noout for the duration of the reboot; no other flags are really
needed. There is a better option to limit that flag to just the host being
rebooted, which is "set-group noout <hostname>", where <hostname> is the server's
name in CRUSH. Just the global noout will suffice though.
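
i.e., something like (hostname as it appears in "ceph osd tree"):

ceph osd set-group noout <hostname>
# reboot the host, wait for its OSDs to come back up and PGs to go active+clean
ceph osd unset-group noout <hostname>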

Anyway, your "not scrubbed in time" warnings aren't going away
in the short term until you finish the pg split. In fact, they will get more
numerous until the pg split finishes (did you start that?). If you want to
get rid of the "cosmetic" issue of the warning you can adjust the interval
at which the warning comes up, but I would suggest you leave it, since you
are trying to address the root of the situation and want to see the
resolution.



Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Thu, Feb 1, 2024 at 9:16 AM Michel Niyoyita  wrote:

> And as said before, it is still in a warning state with PGs not deep-scrubbed
> in time. Hope this can be ignored and I can set those two flags ("noout" and
> "nobackfill") then reboot.
>
> Thank you again Sir
>
> On Thu, 1 Feb 2024, 16:11 Michel Niyoyita,  wrote:
>
>> Thank you very much Janne.
>>
>> On Thu, 1 Feb 2024, 15:21 Janne Johansson,  wrote:
>>
>>> pause and nodown is not a good option to set, that will certainly make
>>> clients stop IO. Pause will stop it immediately, and nodown will stop
>>> IO when the OSD processes stop running on this host.
>>>
>>> When we do service on a host, we set "noout" and "nobackfill", that is
>>> enough for reboots, OS upgrades and simple disk exchanges.
>>> The PGs on this one host will be degraded during the down period, but
>>> IO continues.
>>> Of course this is when the cluster was healthy to begin with (not
>>> counting "not scrubbed in time" warnings, they don't matter in this
>>> case.)
>>>
>>>
>>>
>>> Den tors 1 feb. 2024 kl 12:21 skrev Michel Niyoyita :
>>> >
>>> > Thanks Very much Wesley,
>>> >
>>> > We have decided to restart one host among three osds hosts. before
>>> doing
>>> > that I need the advices of the team . these are flags I want to set
>>> before
>>> > restart.
>>> >
>>> >  'ceph osd set noout'
>>> >  'ceph osd set nobackfill'
>>> >  'ceph osd set norecover'
>>> >  'ceph osd set norebalance'
>>> > 'ceph osd set nodown'
>>> >  'ceph osd set pause'
>>> > 'ceph osd set nodeep-scrub'
>>> > 'ceph osd set noscrub'
>>> >
>>> >
>>> > Would like to ask if this can be enough to set and restart the host
>>> safely
>>> > . the cluster has 3 as replicas.
>>> >
>>> > will the cluster still be accessible while restart the hosts? after
>>> > restarting I will unset the flags.
>>> >
>>> > Kindly advise.
>>> >
>>> > Michel
>>> >
>>> >
>>> > On Tue, 30 Jan 2024, 17:44 Wesley Dillingham, 
>>> wrote:
>>> >
>>> > > actually it seems the issue I had in mind was fixed in 16.2.11 so you
>>> > > should be fine.
>>> > >
>>> > > Respectfully,
>>> > >
>>> > > *Wes Dillingham*
>>> > > w...@wesdillingham.com
>>> > > LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>> > >
>>> > >
>>> > > On Tue, Jan 30, 2024 at 10:34 AM Wesley Dillingham <
>>> w...@wesdillingham.com>
>>> > > wrote:
>>> > >
>>> > >> You may want to consider upgrading to 16.2.14 before you do the pg
>>> split.
>>> > >>
>>> > >> Respectfully,
>>> > >>
>>> > >> *Wes Dillingham*
>>> > >> w...@wesdillingham.com
>>> > >> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>> > >>
>>> > >>
>>> > >> On Tue, Jan 30, 2024 at 10:18 AM Michel Niyoyita >> >
>>> > >> wrote:
>>> > >>
>>> > >>> I tried that on one of my pool (pool id 3) but the number of pgs
>>> not
>>> > >>> deep-scrubbed in time increased also from 55 to 100 but the number
>>> of PGs
>>> > >>> was increased. I set also autoscale to off mode. before continue
>>> to other
>>> > >>

[ceph-users] Re: 6 pgs not deep-scrubbed in time

2024-01-30 Thread Wesley Dillingham
Actually, it seems the issue I had in mind was fixed in 16.2.11, so you
should be fine.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Tue, Jan 30, 2024 at 10:34 AM Wesley Dillingham 
wrote:

> You may want to consider upgrading to 16.2.14 before you do the pg split.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Tue, Jan 30, 2024 at 10:18 AM Michel Niyoyita 
> wrote:
>
>> I tried that on one of my pool (pool id 3) but the number of pgs not
>> deep-scrubbed in time increased also from 55 to 100 but the number of PGs
>> was increased. I set also autoscale to off mode. before continue to other
>> pools would like to ask if so far there is no negative impact.
>>
>> ceph -s
>>   cluster:
>> id: cb0caedc-eb5b-42d1-a34f-96facfda8c27
>> health: HEALTH_WARN
>> 100 pgs not deep-scrubbed in time
>>
>>   services:
>> mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 11M)
>> mgr: ceph-mon2(active, since 11M), standbys: ceph-mon3, ceph-mon1
>> osd: 48 osds: 48 up (since 11M), 48 in (since 12M)
>> rgw: 6 daemons active (6 hosts, 1 zones)
>>
>>   data:
>> pools:   10 pools, 609 pgs
>> objects: 6.03M objects, 23 TiB
>> usage:   151 TiB used, 282 TiB / 433 TiB avail
>> pgs: 603 active+clean
>>  4   active+clean+scrubbing+deep
>>  2   active+clean+scrubbing
>>
>>   io:
>> client:   96 MiB/s rd, 573 MiB/s wr, 576 op/s rd, 648 op/s wr
>>
>> root@ceph-osd3:/var/log# ceph df
>> --- RAW STORAGE ---
>> CLASS SIZEAVAIL USED  RAW USED  %RAW USED
>> hdd433 TiB  282 TiB  151 TiB   151 TiB  34.93
>> TOTAL  433 TiB  282 TiB  151 TiB   151 TiB  34.93
>>
>> --- POOLS ---
>> POOL   ID  PGS   STORED  OBJECTS USED  %USED  MAX
>> AVAIL
>> device_health_metrics   11  1.1 MiB3  3.2 MiB  0 72
>> TiB
>> .rgw.root   2   32  3.7 KiB8   96 KiB  0 72
>> TiB
>> default.rgw.log 3  256  3.6 KiB  204  408 KiB  0 72
>> TiB
>> default.rgw.control 4   32  0 B8  0 B  0 72
>> TiB
>> default.rgw.meta5   32382 B2   24 KiB  0 72
>> TiB
>> volumes 6  128   21 TiB5.74M   62 TiB  22.30 72
>> TiB
>> images      7   32  878 GiB  112.50k  2.6 TiB   1.17 72
>> TiB
>> backups 8   32  0 B0  0 B  0 72
>> TiB
>> vms 9   32  870 GiB  170.73k  2.5 TiB   1.13 72
>> TiB
>> testbench  10   32  0 B0  0 B  0 72
>> TiB
>>
>> On Tue, Jan 30, 2024 at 5:05 PM Wesley Dillingham 
>> wrote:
>>
>>> It will take a couple weeks to a couple months to complete is my best
>>> guess on 10TB spinners at ~40% full. The cluster should be usable
>>> throughout the process.
>>>
>>> Keep in mind, you should disable the pg autoscaler on any pool which you
>>> are manually adjusting the pg_num for. Increasing the pg_num is called "pg
>>> splitting" you can google around for this to see how it will work etc.
>>>
>>> There are a few knobs to increase or decrease the aggressiveness of the
>>> pg split, primarily these are osd_max_backfills and
>>> target_max_misplaced_ratio.
>>>
>>> You can monitor the progress of the split by looking at "ceph osd pool
>>> ls detail" for the pool you are splitting, for this pool pgp_num will
>>> slowly increase up until it reaches the pg_num / pg_num_target.
>>>
>>> IMO this blog post best covers the issue which you are looking to
>>> undertake:
>>> https://ceph.io/en/news/blog/2019/new-in-nautilus-pg-merging-and-autotuning/
>>>
>>> Respectfully,
>>>
>>> *Wes Dillingham*
>>> w...@wesdillingham.com
>>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>>
>>>
>>> On Tue, Jan 30, 2024 at 9:38 AM Michel Niyoyita 
>>> wrote:
>>>
>>>> Thanks for your advices Wes, below is what ceph osd df tree shows , the
>>>> increase of pg_num of the production cluster will not affect the
>>>> performance or crush ? how long it can takes to finish?
>>>>
>>>> ceph osd df tree

[ceph-users] Re: 6 pgs not deep-scrubbed in time

2024-01-30 Thread Wesley Dillingham
You may want to consider upgrading to 16.2.14 before you do the pg split.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Tue, Jan 30, 2024 at 10:18 AM Michel Niyoyita  wrote:

> I tried that on one of my pools (pool id 3); the number of PGs not
> deep-scrubbed in time also increased from 55 to 100, but the number of PGs
> was increased. I also set autoscale to off mode. Before continuing to other
> pools I would like to ask whether there has been any negative impact so far.
>
> ceph -s
>   cluster:
> id: cb0caedc-eb5b-42d1-a34f-96facfda8c27
> health: HEALTH_WARN
> 100 pgs not deep-scrubbed in time
>
>   services:
> mon: 3 daemons, quorum ceph-mon1,ceph-mon2,ceph-mon3 (age 11M)
> mgr: ceph-mon2(active, since 11M), standbys: ceph-mon3, ceph-mon1
> osd: 48 osds: 48 up (since 11M), 48 in (since 12M)
> rgw: 6 daemons active (6 hosts, 1 zones)
>
>   data:
> pools:   10 pools, 609 pgs
> objects: 6.03M objects, 23 TiB
> usage:   151 TiB used, 282 TiB / 433 TiB avail
> pgs: 603 active+clean
>  4   active+clean+scrubbing+deep
>  2   active+clean+scrubbing
>
>   io:
> client:   96 MiB/s rd, 573 MiB/s wr, 576 op/s rd, 648 op/s wr
>
> root@ceph-osd3:/var/log# ceph df
> --- RAW STORAGE ---
> CLASS SIZEAVAIL USED  RAW USED  %RAW USED
> hdd433 TiB  282 TiB  151 TiB   151 TiB  34.93
> TOTAL  433 TiB  282 TiB  151 TiB   151 TiB  34.93
>
> --- POOLS ---
> POOL   ID  PGS   STORED  OBJECTS USED  %USED  MAX AVAIL
> device_health_metrics   11  1.1 MiB3  3.2 MiB  0 72 TiB
> .rgw.root   2   32  3.7 KiB8   96 KiB  0 72 TiB
> default.rgw.log 3  256  3.6 KiB  204  408 KiB  0 72 TiB
> default.rgw.control 4   32  0 B8  0 B  0 72 TiB
> default.rgw.meta5   32382 B2   24 KiB  0 72 TiB
> volumes 6  128   21 TiB5.74M   62 TiB  22.30 72 TiB
> images  7   32  878 GiB  112.50k  2.6 TiB   1.17 72 TiB
> backups 8   32  0 B0  0 B  0 72 TiB
> vms 9   32  870 GiB  170.73k  2.5 TiB   1.13     72 TiB
> testbench  10   32  0 B0  0 B  0 72 TiB
>
> On Tue, Jan 30, 2024 at 5:05 PM Wesley Dillingham 
> wrote:
>
>> It will take a couple weeks to a couple months to complete is my best
>> guess on 10TB spinners at ~40% full. The cluster should be usable
>> throughout the process.
>>
>> Keep in mind, you should disable the pg autoscaler on any pool which you
>> are manually adjusting the pg_num for. Increasing the pg_num is called "pg
>> splitting" you can google around for this to see how it will work etc.
>>
>> There are a few knobs to increase or decrease the aggressiveness of the
>> pg split, primarily these are osd_max_backfills and
>> target_max_misplaced_ratio.
>>
>> You can monitor the progress of the split by looking at "ceph osd pool ls
>> detail" for the pool you are splitting, for this pool pgp_num will slowly
>> increase up until it reaches the pg_num / pg_num_target.
>>
>> IMO this blog post best covers the issue which you are looking to
>> undertake:
>> https://ceph.io/en/news/blog/2019/new-in-nautilus-pg-merging-and-autotuning/
>>
>> Respectfully,
>>
>> *Wes Dillingham*
>> w...@wesdillingham.com
>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>
>>
>> On Tue, Jan 30, 2024 at 9:38 AM Michel Niyoyita 
>> wrote:
>>
>>> Thanks for your advices Wes, below is what ceph osd df tree shows , the
>>> increase of pg_num of the production cluster will not affect the
>>> performance or crush ? how long it can takes to finish?
>>>
>>> ceph osd df tree
>>> ID  CLASS  WEIGHT REWEIGHT  SIZE RAW USE  DATA  OMAP
>>>  META AVAIL%USE   VAR   PGS  STATUS  TYPE NAME
>>> -1 433.11841 -  433 TiB  151 TiB67 TiB  364 MiB  210
>>> GiB  282 TiB  34.86  1.00-  root default
>>> -3 144.37280 -  144 TiB   50 TiB22 TiB  121 MiB   70
>>> GiB   94 TiB  34.86  1.00-  host ceph-osd1
>>>  2hdd9.02330   1.0  9.0 TiB  2.7 TiB  1021 GiB  5.4 MiB  3.7
>>> GiB  6.3 TiB  30.40  0.87   19  up  osd.2
>>>  3hdd9.02330   1.0  9.0 TiB  2.7 TiB   931 GiB  4.1 MiB  3.5
>>> GiB  6.4 TiB  29.43  0.84   29  up  osd.3

[ceph-users] Re: 6 pgs not deep-scrubbed in time

2024-01-30 Thread Wesley Dillingham
.1 TiB  32.77  0.94   21  up  osd.19
> 21hdd9.02330   1.0  9.0 TiB  2.8 TiB   1.1 TiB  5.5 MiB  3.7
> GiB  6.2 TiB  31.58  0.91   26  up  osd.21
> 24hdd9.02330   1.0  9.0 TiB  2.6 TiB   855 GiB  4.7 MiB  3.3
> GiB  6.4 TiB  28.61  0.82   19  up  osd.24
> 27hdd9.02330   1.0  9.0 TiB  3.7 TiB   1.9 TiB   10 MiB  5.2
> GiB  5.3 TiB  40.84  1.17   24  up  osd.27
> 30hdd9.02330   1.0  9.0 TiB  3.2 TiB   1.4 TiB  7.5 MiB  4.5
> GiB  5.9 TiB  35.16  1.01   22  up  osd.30
> 33hdd9.02330   1.0  9.0 TiB  3.1 TiB   1.4 TiB  8.6 MiB  4.3
> GiB  5.9 TiB  34.59  0.99   23  up  osd.33
> 36hdd9.02330   1.0  9.0 TiB  3.4 TiB   1.7 TiB   10 MiB  5.0
> GiB  5.6 TiB  38.17  1.09   25  up  osd.36
> 39hdd9.02330   1.0  9.0 TiB  3.4 TiB   1.7 TiB  8.5 MiB  5.1
> GiB  5.6 TiB  37.79  1.08   31  up  osd.39
> 42hdd9.02330   1.0  9.0 TiB  3.6 TiB   1.8 TiB   10 MiB  5.2
> GiB  5.4 TiB  39.68  1.14   23  up  osd.42
> 45hdd9.02330   1.0  9.0 TiB  2.7 TiB   964 GiB  5.1 MiB  3.5
> GiB  6.3 TiB  29.78  0.85   21  up  osd.45
> -5 144.37280 -  144 TiB   50 TiB22 TiB  121 MiB   70
> GiB   94 TiB  34.86  1.00-  host ceph-osd3
>  0hdd9.02330   1.0  9.0 TiB  2.7 TiB   934 GiB  4.9 MiB  3.4
> GiB  6.4 TiB  29.47  0.85   21  up  osd.0
>  4hdd9.02330   1.0  9.0 TiB  3.0 TiB   1.2 TiB  6.5 MiB  4.1
> GiB  6.1 TiB  32.73  0.94   22  up  osd.4
>  7hdd9.02330   1.0  9.0 TiB  3.5 TiB   1.8 TiB  9.2 MiB  5.1
> GiB  5.5 TiB  39.02  1.12   30  up  osd.7
> 11hdd9.02330   1.0  9.0 TiB  3.6 TiB   1.9 TiB   10 MiB  5.1
> GiB  5.4 TiB  39.97  1.15   27  up  osd.11
> 14hdd9.02330   1.0  9.0 TiB  3.5 TiB   1.7 TiB   10 MiB  5.1
> GiB  5.6 TiB  38.24  1.10   27  up  osd.14
> 17hdd9.02330   1.0  9.0 TiB  3.0 TiB   1.2 TiB  6.4 MiB  4.1
> GiB  6.0 TiB  33.09  0.95   23  up  osd.17
> 20hdd9.02330   1.0  9.0 TiB  2.8 TiB   1.1 TiB  5.6 MiB  3.8
> GiB  6.2 TiB  31.55  0.90   20  up  osd.20
> 23hdd9.02330   1.0  9.0 TiB  2.6 TiB   828 GiB  4.0 MiB  3.3
> GiB  6.5 TiB  28.32  0.81   23  up  osd.23
> 26hdd9.02330   1.0  9.0 TiB  2.9 TiB   1.2 TiB  5.8 MiB  3.8
> GiB  6.1 TiB  32.12  0.92   26  up  osd.26
> 29hdd9.02330   1.0  9.0 TiB  3.6 TiB   1.8 TiB   10 MiB  5.1
> GiB  5.4 TiB  39.73  1.14   24  up  osd.29
> 31hdd9.02330   1.0  9.0 TiB  2.8 TiB   1.1 TiB  5.8 MiB  3.7
> GiB  6.2 TiB  31.56  0.91   22  up  osd.31
> 34hdd9.02330   1.0  9.0 TiB  3.3 TiB   1.5 TiB  8.2 MiB  4.6
> GiB  5.7 TiB  36.29  1.04   23  up  osd.34
> 37hdd9.02330   1.0  9.0 TiB  3.2 TiB   1.5 TiB  8.2 MiB  4.5
> GiB  5.8 TiB  35.51  1.02   20  up  osd.37
> 40hdd9.02330   1.0  9.0 TiB  3.4 TiB   1.7 TiB  9.3 MiB  4.9
> GiB  5.6 TiB  38.16  1.09   25  up  osd.40
> 43hdd9.02330   1.0  9.0 TiB  3.4 TiB   1.6 TiB  8.5 MiB  4.8
> GiB  5.7 TiB  37.19  1.07   29  up  osd.43
> 46hdd9.02330   1.0  9.0 TiB  3.1 TiB   1.4 TiB  8.4 MiB  4.4
> GiB  5.9 TiB  34.85  1.00   23  up  osd.46
>  TOTAL  433 TiB  151 TiB67 TiB  364 MiB  210
> GiB  282 TiB  34.86
> MIN/MAX VAR: 0.81/1.28  STDDEV: 3.95
>
>
> Michel
>
>
> On Tue, Jan 30, 2024 at 4:18 PM Wesley Dillingham 
> wrote:
>
>> I now concur you should increase the pg_num as a first step for this
>> cluster. Disable the pg autoscaler for and increase the volumes pool to
>> pg_num 256. Then likely re-asses and make the next power of 2 jump to 512
>> and probably beyond.
>>
>> Keep in mind this is not going to fix your short term deep-scrub issue in
>> fact it will increase the number of not scrubbed in time PGs until the
>> pg_num change is complete.  This is because OSDs dont scrub when they are
>> backfilling.
>>
>> I would sit on 256 for a couple weeks and let scrubs happen then continue
>> past 256.
>>
>> with the ultimate target of around 100-200 PGs per OSD which "ceph osd df
>> tree" will show you in the PGs column.
>>
>> Respectfully,
>>
>> *Wes Dillingham*
>> w...@wesdillingham.com
>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>
>>
>> On Tue, Jan 30, 2024 at 3:16 AM Michel Niyoyita 
>> wrote:
>>
>>> Dear te

[ceph-users] Re: 6 pgs not deep-scrubbed in time

2024-01-30 Thread Wesley Dillingham
I now concur you should increase the pg_num as a first step for this
cluster. Disable the pg autoscaler on the volumes pool and increase that pool to
pg_num 256. Then likely re-assess and make the next power-of-2 jump to 512,
and probably beyond.
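
That would be something like:

ceph osd pool set volumes pg_autoscale_mode off
ceph osd pool set volumes pg_num 256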

Keep in mind this is not going to fix your short-term deep-scrub issue; in
fact it will increase the number of not-scrubbed-in-time PGs until the
pg_num change is complete. This is because OSDs don't scrub when they are
backfilling.

I would sit on 256 for a couple weeks and let scrubs happen then continue
past 256.

The ultimate target is around 100-200 PGs per OSD, which "ceph osd df
tree" will show you in the PGS column.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Tue, Jan 30, 2024 at 3:16 AM Michel Niyoyita  wrote:

> Dear team,
>
> below is the output of ceph df command and the ceph version I am running
>
>  ceph df
> --- RAW STORAGE ---
> CLASS SIZEAVAIL USED  RAW USED  %RAW USED
> hdd433 TiB  282 TiB  151 TiB   151 TiB  34.82
> TOTAL  433 TiB  282 TiB  151 TiB   151 TiB  34.82
>
> --- POOLS ---
> POOL   ID  PGS   STORED  OBJECTS USED  %USED  MAX AVAIL
> device_health_metrics   11  1.1 MiB3  3.2 MiB  0 73 TiB
> .rgw.root   2   32  3.7 KiB8   96 KiB  0 73 TiB
> default.rgw.log 3   32  3.6 KiB  209  408 KiB  0 73 TiB
> default.rgw.control 4   32  0 B8  0 B  0 73 TiB
> default.rgw.meta5   32382 B2   24 KiB  0 73 TiB
> volumes 6  128   21 TiB5.68M   62 TiB  22.09 73 TiB
> images  7   32  878 GiB  112.50k  2.6 TiB   1.17 73 TiB
> backups 8   32  0 B0  0 B  0 73 TiB
> vms 9   32  881 GiB  174.30k  2.5 TiB   1.13 73 TiB
> testbench  10   32  0 B0  0 B  0 73 TiB
> root@ceph-mon1:~# ceph --version
> ceph version 16.2.11 (3cf40e2dca667f68c6ce3ff5cd94f01e711af894) pacific
> (stable)
> root@ceph-mon1:~#
>
> please advise accordingly
>
> Michel
>
> On Mon, Jan 29, 2024 at 9:48 PM Frank Schilder  wrote:
>
> > You will have to look at the output of "ceph df" and make a decision to
> > balance "objects per PG" and "GB per PG". Increase he PG count for the
> > pools with the worst of these two numbers most such that it balances out
> as
> > much as possible. If you have pools that see significantly more user-IO
> > than others, prioritise these.
> >
> > You will have to find out for your specific cluster, we can only give
> > general guidelines. Make changes, run benchmarks, re-evaluate. Take the
> > time for it. The better you know your cluster and your users, the better
> > the end result will be.
> >
> > Best regards,
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > 
> > From: Michel Niyoyita 
> > Sent: Monday, January 29, 2024 2:04 PM
> > To: Janne Johansson
> > Cc: Frank Schilder; E Taka; ceph-users
> > Subject: Re: [ceph-users] Re: 6 pgs not deep-scrubbed in time
> >
> > This is how it is set , if you suggest to make some changes please
> advises.
> >
> > Thank you.
> >
> >
> > ceph osd pool ls detail
> > pool 1 'device_health_metrics' replicated size 3 min_size 2 crush_rule 0
> > object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change
> 1407
> > flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application
> > mgr_devicehealth
> > pool 2 '.rgw.root' replicated size 3 min_size 2 crush_rule 0 object_hash
> > rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 1393 flags
> > hashpspool stripe_width 0 application rgw
> > pool 3 'default.rgw.log' replicated size 3 min_size 2 crush_rule 0
> > object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
> > 1394 flags hashpspool stripe_width 0 application rgw
> > pool 4 'default.rgw.control' replicated size 3 min_size 2 crush_rule 0
> > object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
> > 1395 flags hashpspool stripe_width 0 application rgw
> > pool 5 'default.rgw.meta' replicated size 3 min_size 2 crush_rule 0
> > object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
> > 1396 flags hashpspool stripe_width 0 pg_autoscale_bias 4 application rgw
> > pool 6 'volumes' replicated size 3 min_size 2 crush_rule 0 object_hash
> > rjenkins pg_num 128 pgp_num 128 autoscale_mode on last_change 108802 lfor
> > 0/0/14812 flags hashpspool,selfmanaged_snaps stripe_width 0 application
> rbd
> > removed_snaps_queue
> >
> [22d7~3,11561~2,11571~1,11573~1c,11594~6,1159b~f,115b0~1,115b3~1,115c3~1,115f3~1,115f5~e,11613~6,1161f~c,11637~1b,11660~1,11663~2,11673~1,116d1~c,116f5~10,11721~c]
> > pool 7 'images' replicated size 3 min_size 2 crush_rule 0 object_hash
> > rjenkins pg_num 32 pgp_num 32 

[ceph-users] Re: 6 pgs not deep-scrubbed in time

2024-01-29 Thread Wesley Dillingham
Respond back with "ceph versions" output

If your sole goal is to eliminate the not scrubbed in time errors you can
increase the aggressiveness of scrubbing by setting:
osd_max_scrubs = 2

The default in pacific is 1.
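
Via the config database that would be something like:

ceph config set osd osd_max_scrubs 2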

If you are going to start tinkering manually with the pg_num you will want
to turn off the pg autoscaler on the pools you are touching.
Reducing the size of your PGs may make sense and help with scrubbing, but if
the pool has a lot of data it will take a long, long time to finish.





Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Jan 29, 2024 at 10:08 AM Michel Niyoyita  wrote:

> I am running ceph pacific , version 16 , ubuntu 20 OS , deployed using
> ceph-ansible.
>
> Michel
>
> On Mon, Jan 29, 2024 at 4:47 PM Josh Baergen 
> wrote:
>
> > Make sure you're on a fairly recent version of Ceph before doing this,
> > though.
> >
> > Josh
> >
> > On Mon, Jan 29, 2024 at 5:05 AM Janne Johansson 
> > wrote:
> > >
> > > On Mon, 29 Jan 2024 at 12:58, Michel Niyoyita wrote:
> > > >
> > > > Thank you Frank ,
> > > >
> > > > All disks are HDDs . Would like to know if I can increase the number
> > of PGs
> > > > live in production without a negative impact to the cluster. if yes
> > which
> > > > commands to use .
> > >
> > > Yes. "ceph osd pool set  pg_num "
> > > where the number usually should be a power of two that leads to a
> > > number of PGs per OSD between 100-200.
> > >
> > > --
> > > May the most significant bit of your life be positive.
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 17.2.7: Backfilling deadlock / stall / stuck / standstill

2024-01-26 Thread Wesley Dillingham
I faced a similar issue. The PG just would never finish recovery. Changing
all OSDs in the PG to "osd_op_queue wpq" and then restarting them serially
ultimately allowed the PG to recover. Seemed to be some issue with mclock.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Fri, Jan 26, 2024 at 7:57 AM Kai Stian Olstad 
wrote:

> Hi,
>
> This is a cluster running 17.2.7 upgraded from 16.2.6 on the 15 January
> 2024.
>
> On Monday 22 January we had 4 HDDs, all on different servers, with I/O errors
> because of some damaged sectors. The OSDs are hybrid, so the DB is on SSD; 5
> HDDs share 1 SSD.
> I set the OSDs out, ceph osd out 223 269 290 318, and all hell broke
> loose.
>
> It took only minutes before the users complained about Ceph not working.
> Ceph status reported slow OPS on the OSDs that were set to out, and “ceph
> tell osd. dump_ops_in_flight” against the out OSDs just hung;
> after 30 minutes I stopped the dump command.
> Long story short, I ended up running “ceph osd set nobackfill” until the slow
> ops were gone and then unset it when the slow ops message disappeared.
> I needed to run that all the time so the cluster didn’t come to a halt,
> so this one-liner loop was used:
> “while true; do ceph -s | grep -qE "oldest one blocked for [0-9]{2,}" &&
> (date; ceph osd set nobackfill; sleep 15; ceph osd unset nobackfill);
> sleep 10; done”
>
>
> But now, 4 days later, the backfilling has stopped progressing completely
> and the number of misplaced objects is increasing.
> Some PGs have 0 misplaced objects but are still in the backfilling state, and
> have been in this state for over 24 hours now.
>
> I have a hunch that it’s because PG 404.6e7 is in state
> “active+recovering+degraded+remapped”; it’s been in this state for over
> 48 hours.
> It has possibly 2 missing objects, but since they are not unfound I
> can’t delete them with “ceph pg 404.6e7 mark_unfound_lost delete”
>
> Could someone please help to solve this?
> Down below is some output of ceph commands, I’ll also attache them.
>
>
> ceph status (only removed information about no running scrub and
> deep_scrub)
> ---
>cluster:
>  id: b321e76e-da3a-11eb-b75c-4f948441dcd0
>  health: HEALTH_WARN
>  Degraded data redundancy: 2/6294904971 objects degraded
> (0.000%), 1 pg degraded
>
>services:
>  mon: 3 daemons, quorum ceph-mon-1,ceph-mon-2,ceph-mon-3 (age 11d)
>  mgr: ceph-mon-1.ptrsea(active, since 11d), standbys:
> ceph-mon-2.mfdanx
>  mds: 1/1 daemons up, 1 standby
>  osd: 355 osds: 355 up (since 22h), 351 in (since 4d); 18 remapped
> pgs
>  rgw: 7 daemons active (7 hosts, 1 zones)
>
>data:
>  volumes: 1/1 healthy
>  pools:   14 pools, 3945 pgs
>  objects: 1.14G objects, 1.1 PiB
>  usage:   1.8 PiB used, 1.2 PiB / 3.0 PiB avail
>  pgs: 2/6294904971 objects degraded (0.000%)
>   2980455/6294904971 objects misplaced (0.047%)
>   3901 active+clean
>   22   active+clean+scrubbing+deep
>   17   active+remapped+backfilling
>   4active+clean+scrubbing
>   1active+recovering+degraded+remapped
>
>io:
>  client:   167 MiB/s rd, 13 MiB/s wr, 6.02k op/s rd, 2.35k op/s wr
>
>
> ceph health detail (only removed information about no running scrub and
> deep_scrub)
> ---
> HEALTH_WARN Degraded data redundancy: 2/6294902067 objects degraded
> (0.000%), 1 pg degraded
> [WRN] PG_DEGRADED: Degraded data redundancy: 2/6294902067 objects
> degraded (0.000%), 1 pg degraded
>  pg 404.6e7 is active+recovering+degraded+remapped, acting
> [223,274,243,290,286,283]
>
>
> ceph pg 202.6e7 list_unfound
> ---
> {
>  "num_missing": 2,
>  "num_unfound": 0,
>  "objects": [],
>  "state": "Active",
>  "available_might_have_unfound": true,
>  "might_have_unfound": [],
>  "more": false
> }
>
> ceph pg 404.6e7 query | jq .recovery_state
> ---
> [
>{
>  "name": "Started/Primary/Active",
>  "enter_time": "2024-01-26T09:08:41.918637+",
>  "might_have_unfound": [
>{
>  "osd": "243(2)",
>  "status": "already probed"
>},
>{
>  "osd": "274(1)",
>  "status": "already probed"
>},
>{
>  "osd": "275(0)",
>  "status": "already probed"
>},
>{
>  "osd": "283(5)",
>  "status": "already probed"
>},
>{
>  "osd": "286(4)",
>  "status": "already probed"
>},
>{
>  "osd": "290(3)",
>  "status": "already probed"
>},
>{
>  "osd": "335(3)",
>  "status": "already probed"
>}
>  ],
>  "recovery_progress": {
>"backfill_targets": [
>  "275(0)",
>  "335(3)"
>],
>"waiting_on_backfill": [],
>"last_backfill_started":
>
> 

[ceph-users] Re: Is there a way to find out which client uses which version of ceph?

2023-12-21 Thread Wesley Dillingham
You can ask the monitor to dump its sessions (which should expose the IPs
and the release / features); you can then track down by IP those with the
undesirable features/release.

ceph daemon mon.`hostname -s` sessions

This assumes your mon is named after the short hostname; you may need to do
this for every mon. Alternatively, use `ceph tell mon.* sessions` to
hit every mon at once.
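
To narrow it down, something along these lines works as a rough filter (the
exact session-dump fields vary a bit between releases):

ceph tell mon.* sessions > sessions.txt
grep -i jewel sessions.txt    # then match the client IPs back to your hosts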

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Thu, Dec 21, 2023 at 10:46 AM Anthony D'Atri 
wrote:

> [rook@rook-ceph-tools-5ff8d58445-gkl5w .aws]$ ceph features
> {
> "mon": [
> {
> "features": "0x3f01cfbf7ffd",
> "release": "luminous",
> "num": 3
> }
> ],
> "osd": [
> {
> "features": "0x3f01cfbf7ffd",
> "release": "luminous",
> "num": 600
> }
> ],
> "client": [
> {
> "features": "0x2f018fb87aa4aafe",
> "release": "luminous",
> "num": 41
> },
> {
> "features": "0x3f01cfbf7ffd",
> "release": "luminous",
> "num": 147
> }
> ],
> "mgr": [
> {
> "features": "0x3f01cfbf7ffd",
> "release": "luminous",
> "num": 2
> }
> ]
> }
> [rook@rook-ceph-tools-5ff8d58445-gkl5w .aws]$
>
> IIRC there are nuances; there are cases where a client can *look* like
> Jewel but actually be okay.
>
>
> > On Dec 21, 2023, at 10:41, Simon Oosthoek 
> wrote:
> >
> > Hi,
> >
> > Our cluster is currently running quincy, and I want to set the minimal
> > client version to luminous, to enable upmap balancer, but when I tried
> to,
> > I got this:
> >
> > # ceph osd set-require-min-compat-client luminous Error EPERM: cannot set
> > require_min_compat_client to luminous: 2 connected client(s) look like
> > jewel (missing 0x800); add --yes-i-really-mean-it to do it
> > anyway
> >
> > I think I know the most likely candidate (and I've asked them), but is
> > there a way to find out, the way ceph seems to know?
> >
> > tnx
> >
> > /Simon
> > --
> > I'm using my gmail.com address, because the gmail.com dmarc policy is
> > "none", some mail servers will reject this (microsoft?) others will
> instead
> > allow this when I send mail to a mailling list which has not yet been
> > configured to send mail "on behalf of" the sender, but rather do a kind
> of
> > "forward". The latter situation causes dkim/dmarc failures and the dmarc
> > policy will be applied. see https://wiki.list.org/DEV/DMARC for more
> details
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Logging control

2023-12-19 Thread Wesley Dillingham
"ceph daemon" commands need to be run local to the machine where the daemon
is running. So in this case if you arent on the node where osd.1 lives it
wouldnt work. "ceph tell" should work anywhere there is a client.admin key.
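
Since the goal is turning the debug logging back down, a rough sketch using
"ceph tell" and the config database (debug_osd is just an example subsystem, and
1/5 is its usual default):

ceph tell osd.1 config show | grep debug
ceph tell osd.1 config set debug_osd 1/5
# if the debug levels were persisted centrally, clear them there too
ceph config rm osd debug_osd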


Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Tue, Dec 19, 2023 at 4:02 PM Tim Holloway  wrote:

> Ceph version is Pacific (16.2.14), upgraded from a sloppy Octopus.
>
> I ran afoul of all the best bugs in Octopus, and in the process
> switched on a lot of stuff better left alone, including some detailed
> debug logging. Now I can't turn it off.
>
> I am confidently informed by the documentation that the first step
> would be the command:
>
> ceph daemon osd.1 config show | less
>
> But instead of config information I get back:
>
> Can't get admin socket path: unable to get conf option admin_socket for
> osd: b"error parsing 'osd': expected string of the form TYPE.ID, valid
> types are: auth, mon, osd, mds, mgr, client\n"
>
> Which seems to be kind of insane.
>
> Attempting to get daemon config info on a monitor on that machine
> gives:
>
> admin_socket: exception getting command descriptions: [Errno 2] No such
> file or directory
>
> Which doesn't help either.
>
> Anyone got an idea?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best Practice for OSD Balancing

2023-11-28 Thread Wesley Dillingham
It's a complicated topic and there is no one answer; it varies for each
cluster. You have a good lay of the land.

I just wanted to mention that the correct "foundation" for equally utilized
OSDs within a cluster relies on two important factors:

- Symmetry of disk/osd quantity and capacity (weight) between hosts.
- Achieving the correct amount of PGs-per-osd (typically between 100 and
200).

Without reasonable settings/configurations for these two factors, the
various higher-level balancing techniques won't work too well, or at all.
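
A quick way to sanity-check both factors:

ceph osd df tree       # compare per-host weights and the PGS column per OSD
ceph osd crush tree    # shows the per-host / failure-domain weights
ceph balancer status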

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Tue, Nov 28, 2023 at 3:27 PM Rich Freeman  wrote:

> I'm fairly new to Ceph and running Rook on a fairly small cluster
> (half a dozen nodes, about 15 OSDs).  I notice that OSD space use can
> vary quite a bit - upwards of 10-20%.
>
> In the documentation I see multiple ways of managing this, but no
> guidance on what the "correct" or best way to go about this is.  As
> far as I can tell there is the balancer, manual manipulation of upmaps
> via the command line tools, and OSD reweight.  The last two can be
> optimized with tools to calculate appropriate corrections.  There is
> also the new read/active upmap (at least for non-EC pools), which is
> manually triggered.
>
> The balancer alone is leaving fairly wide deviations in space use, and
> at times during recovery this can become more significant.  I've seen
> OSDs hit the 80% threshold and start impacting IO when the entire
> cluster is only 50-60% full during recovery.
>
> I've started using ceph osd reweight-by-utilization and that seems
> much more effective at balancing things, but this seems redundant with
> the balancer which I have turned on.
>
> What is generally considered the best practice for OSD balancing?
>
> --
> Rich
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs failing to start due to crc32 and osdmap error

2023-11-27 Thread Wesley Dillingham
So those options are not consistent with the error in the video I linked.

I am not entirely sure how to proceed with your OSDs (how many are
impacted?), but you may want to try injecting an older osdmap epoch fetched
from the mon:

try rewinding one epoch at a time from the current epoch and see if that
gets them to start.

Proceed with caution, I would test this as well.
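
Roughly, as a sketch (osd.888 and the epoch below are taken from your log;
which epoch to rewind to is an assumption, so treat this as a starting point
only):

  # grab an older map from the mons (the epoch that failed to load was 927580)
  ceph osd getmap 927579 -o osdmap.927579

  # with the OSD stopped, inject it into the offline store
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-888 \
      --op set-osdmap --file osdmap.927579

  # then try starting the OSD; if it still asserts, step back another epoch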

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Mon, Nov 27, 2023 at 2:36 PM Denis Polom  wrote:

> it's:
>
> "bluestore_compression_algorithm": "snappy"
>
> "bluestore_compression_mode": "none"
>
>
> On 11/27/23 20:13, Wesley Dillingham wrote:
>
> How about these two options:
>
> bluestore_compression_algorithm
> bluestore_compression_mode
>
> Thanks.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Mon, Nov 27, 2023 at 2:01 PM Denis Polom  wrote:
>
>> Hi,
>>
>> no we don't:
>>
>> "bluestore_rocksdb_options":
>> "compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2,max_total_wal_size=1073741824",
>> thx
>>
>> On 11/27/23 19:17, Wesley Dillingham wrote:
>>
>> Curious if you are using bluestore compression?
>>
>> Respectfully,
>>
>> *Wes Dillingham*
>> w...@wesdillingham.com
>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>
>>
>> On Mon, Nov 27, 2023 at 10:09 AM Denis Polom 
>> wrote:
>>
>>> Hi
>>>
>>> we have issue to start some OSDs on one node on our Ceph Quincy 17.2.7
>>> cluster. Some OSDs on that node are running fine, but some failing to
>>> start.
>>>
>>> Looks like crc32 checksum error, and failing to get OSD map. I found a
>>> some discussions on that but nothing helped.
>>>
>>> I've also tried to insert current OSD map but that ends with error:
>>>
>>> # CEPH_ARGS="--bluestore-ignore-data-csum" ceph-objectstore-tool
>>> --data-path /var/lib/ceph/osd/ceph-888/ --op set-osdmap --file osdmap
>>> osdmap (#-1:20684533:::osdmap.931991:0#) does not exist.
>>>
>>> Log is bellow
>>>
>>> Any ideas please?
>>>
>>> Thank you
>>>
>>>
>>>  From log file:
>>>
>>> 2023-11-27T16:01:47.691+0100 7f3f17aa13c0 -1 Falling back to public
>>> interface
>>>
>>> 2023-11-27T16:01:51.439+0100 7f3f17aa13c0 -1
>>> bluestore(/var/lib/ceph/osd/ceph-888) _verify_csum bad crc32c/0x1000
>>> checksum at blob offset 0x0, got 0xb1701b42, expected 0x9ee5ece2, device
>>> location [0x1~1000], logical extent 0x0~1000, object
>>> #-1:7b3f43c4:::osd_superblock:0#
>>>
>>> 2023-11-27T16:01:51.439+0100 7f3f17aa13c0 -1 osd.888 0 failed to load
>>> OSD map for epoch 927580, got 0 bytes
>>>
>>> /build/ceph-17.2.7/src/osd/OSD.h: In function 'OSDMapRef
>>> OSDService::get_map(epoch_t)' thread 7f3f17aa13c0 time
>>> 2023-11-27T16:01:51.443522+0100
>>> /build/ceph-17.2.7/src/osd/OSD.h: 696: FAILED ceph_assert(ret)
>>>   ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
>>> (stable)
>>>   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x14f) [0x561ad07d2624]
>>>   2: ceph-osd(+0xc2e836) [0x561ad07d2836]
>>>   3: (OSD::init()+0x4026) [0x561ad08e5a86]
>>>   4: main()
>>>   5: __libc_start_main()
>>>   6: _start()
>>> *** Caught signal (Aborted) **
>>>   in thread 7f3f17aa13c0 thread_name:ceph-osd
>>> 2023-11-27T16:01:51.443+0100 7f3f17aa13c0 -1
>>> /build/ceph-17.2.7/src/osd/OSD.h: In function 'OSDMapRef
>>> OSDService::get_map(epoch_t)' thread 7f3f17aa13c0 time
>>> 2023-11-27T16:01:51.443522+0100
>>> /build/ceph-17.2.7/src/osd/OSD.h: 696: FAILED ceph_assert(ret)
>>>
>>>   ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
>>> (stable)
>>>   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x14f) [0x561ad07d2624]
>>>   2: ceph-osd(+0xc2e836) [0x561ad07d2836]
>>>   3: (OSD::init()+0x4026) [0x561ad08e5a86]
>>>   4: main()
>>>   5: __libc_start_main()
>>>   

[ceph-users] Re: OSDs failing to start due to crc32 and osdmap error

2023-11-27 Thread Wesley Dillingham
What I was getting at was to see if this was somehow related to the bug
described here https://www.youtube.com/watch?v=_4HUR00oCGo

It should not be, given the version of Ceph you are using, but the CRC
error you are seeing is similar.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Mon, Nov 27, 2023 at 2:19 PM Anthony D'Atri  wrote:

> The options Wes listed are for data, not RocksDB.
>
> > On Nov 27, 2023, at 1:59 PM, Denis Polom  wrote:
> >
> > Hi,
> >
> > no we don't:
> >
> > "bluestore_rocksdb_options":
> "compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2,max_total_wal_size=1073741824",
> >
> > thx
> >
> > On 11/27/23 19:17, Wesley Dillingham wrote:
> >> Curious if you are using bluestore compression?
> >>
> >> Respectfully,
> >>
> >> *Wes Dillingham*
> >> w...@wesdillingham.com
> >> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
> >>
> >>
> >> On Mon, Nov 27, 2023 at 10:09 AM Denis Polom 
> wrote:
> >>
> >>Hi
> >>
> >>we have issue to start some OSDs on one node on our Ceph Quincy
> >>17.2.7
> >>cluster. Some OSDs on that node are running fine, but some failing
> >>to start.
> >>
> >>Looks like crc32 checksum error, and failing to get OSD map. I
> >>found a
> >>some discussions on that but nothing helped.
> >>
> >>I've also tried to insert current OSD map but that ends with error:
> >>
> >># CEPH_ARGS="--bluestore-ignore-data-csum" ceph-objectstore-tool
> >>--data-path /var/lib/ceph/osd/ceph-888/ --op set-osdmap --file osdmap
> >>osdmap (#-1:20684533:::osdmap.931991:0#) does not exist.
> >>
> >>Log is bellow
> >>
> >>Any ideas please?
> >>
> >>Thank you
> >>
> >>
> >> From log file:
> >>
> >>2023-11-27T16:01:47.691+0100 7f3f17aa13c0 -1 Falling back to public
> >>interface
> >>
> >>2023-11-27T16:01:51.439+0100 7f3f17aa13c0 -1
> >>bluestore(/var/lib/ceph/osd/ceph-888) _verify_csum bad crc32c/0x1000
> >>checksum at blob offset 0x0, got 0xb1701b42, expected 0x9ee5ece2,
> >>device
> >>location [0x1~1000], logical extent 0x0~1000, object
> >>#-1:7b3f43c4:::osd_superblock:0#
> >>
> >>2023-11-27T16:01:51.439+0100 7f3f17aa13c0 -1 osd.888 0 failed to load
> >>OSD map for epoch 927580, got 0 bytes
> >>
> >>/build/ceph-17.2.7/src/osd/OSD.h: In function 'OSDMapRef
> >>OSDService::get_map(epoch_t)' thread 7f3f17aa13c0 time
> >>2023-11-27T16:01:51.443522+0100
> >>/build/ceph-17.2.7/src/osd/OSD.h: 696: FAILED ceph_assert(ret)
> >>  ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2)
> >>quincy
> >>(stable)
> >>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>const*)+0x14f) [0x561ad07d2624]
> >>  2: ceph-osd(+0xc2e836) [0x561ad07d2836]
> >>  3: (OSD::init()+0x4026) [0x561ad08e5a86]
> >>  4: main()
> >>  5: __libc_start_main()
> >>  6: _start()
> >>*** Caught signal (Aborted) **
> >>  in thread 7f3f17aa13c0 thread_name:ceph-osd
> >>2023-11-27T16:01:51.443+0100 7f3f17aa13c0 -1
> >>/build/ceph-17.2.7/src/osd/OSD.h: In function 'OSDMapRef
> >>OSDService::get_map(epoch_t)' thread 7f3f17aa13c0 time
> >>2023-11-27T16:01:51.443522+0100
> >>/build/ceph-17.2.7/src/osd/OSD.h: 696: FAILED ceph_assert(ret)
> >>
> >>  ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2)
> >>quincy
> >>(stable)
> >>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >>const*)+0x14f) [0x561ad07d2624]
> >>  2: ceph-osd(+0xc2e836) [0x561ad07d2836]
> >>  3: (OSD::init()+0x4026) [0x561ad08e5a86]
> >>  4: main()
> >>  5: __libc_start_main()
> >>  6: _start()
> >>
> >>
> >>  ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2)
> >>quincy
> >>(stable)
> >>  1: /lib/x86_64-linux-gnu/libpthr

[ceph-users] Re: OSDs failing to start due to crc32 and osdmap error

2023-11-27 Thread Wesley Dillingham
How about these two options:

bluestore_compression_algorithm
bluestore_compression_mode

Thanks.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Mon, Nov 27, 2023 at 2:01 PM Denis Polom  wrote:

> Hi,
>
> no we don't:
>
> "bluestore_rocksdb_options":
> "compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2,max_total_wal_size=1073741824",
> thx
>
> On 11/27/23 19:17, Wesley Dillingham wrote:
>
> Curious if you are using bluestore compression?
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Mon, Nov 27, 2023 at 10:09 AM Denis Polom  wrote:
>
>> Hi
>>
>> we have issue to start some OSDs on one node on our Ceph Quincy 17.2.7
>> cluster. Some OSDs on that node are running fine, but some failing to
>> start.
>>
>> Looks like crc32 checksum error, and failing to get OSD map. I found a
>> some discussions on that but nothing helped.
>>
>> I've also tried to insert current OSD map but that ends with error:
>>
>> # CEPH_ARGS="--bluestore-ignore-data-csum" ceph-objectstore-tool
>> --data-path /var/lib/ceph/osd/ceph-888/ --op set-osdmap --file osdmap
>> osdmap (#-1:20684533:::osdmap.931991:0#) does not exist.
>>
>> Log is bellow
>>
>> Any ideas please?
>>
>> Thank you
>>
>>
>>  From log file:
>>
>> 2023-11-27T16:01:47.691+0100 7f3f17aa13c0 -1 Falling back to public
>> interface
>>
>> 2023-11-27T16:01:51.439+0100 7f3f17aa13c0 -1
>> bluestore(/var/lib/ceph/osd/ceph-888) _verify_csum bad crc32c/0x1000
>> checksum at blob offset 0x0, got 0xb1701b42, expected 0x9ee5ece2, device
>> location [0x1~1000], logical extent 0x0~1000, object
>> #-1:7b3f43c4:::osd_superblock:0#
>>
>> 2023-11-27T16:01:51.439+0100 7f3f17aa13c0 -1 osd.888 0 failed to load
>> OSD map for epoch 927580, got 0 bytes
>>
>> /build/ceph-17.2.7/src/osd/OSD.h: In function 'OSDMapRef
>> OSDService::get_map(epoch_t)' thread 7f3f17aa13c0 time
>> 2023-11-27T16:01:51.443522+0100
>> /build/ceph-17.2.7/src/osd/OSD.h: 696: FAILED ceph_assert(ret)
>>   ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
>> (stable)
>>   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x14f) [0x561ad07d2624]
>>   2: ceph-osd(+0xc2e836) [0x561ad07d2836]
>>   3: (OSD::init()+0x4026) [0x561ad08e5a86]
>>   4: main()
>>   5: __libc_start_main()
>>   6: _start()
>> *** Caught signal (Aborted) **
>>   in thread 7f3f17aa13c0 thread_name:ceph-osd
>> 2023-11-27T16:01:51.443+0100 7f3f17aa13c0 -1
>> /build/ceph-17.2.7/src/osd/OSD.h: In function 'OSDMapRef
>> OSDService::get_map(epoch_t)' thread 7f3f17aa13c0 time
>> 2023-11-27T16:01:51.443522+0100
>> /build/ceph-17.2.7/src/osd/OSD.h: 696: FAILED ceph_assert(ret)
>>
>>   ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
>> (stable)
>>   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x14f) [0x561ad07d2624]
>>   2: ceph-osd(+0xc2e836) [0x561ad07d2836]
>>   3: (OSD::init()+0x4026) [0x561ad08e5a86]
>>   4: main()
>>   5: __libc_start_main()
>>   6: _start()
>>
>>
>>   ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
>> (stable)
>>   1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7f3f1814b420]
>>   2: gsignal()
>>   3: abort()
>>   4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x1b7) [0x561ad07d268c]
>>   5: ceph-osd(+0xc2e836) [0x561ad07d2836]
>>   6: (OSD::init()+0x4026) [0x561ad08e5a86]
>>   7: main()
>>   8: __libc_start_main()
>>   9: _start()
>> 2023-11-27T16:01:51.447+0100 7f3f17aa13c0 -1 *** Caught signal (Aborted)
>> **
>>   in thread 7f3f17aa13c0 thread_name:ceph-osd
>>
>>   ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
>> (stable)
>>   1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7f3f1814b420]
>>   2: gsignal()
>>   3: abort()
>>   4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>> const*)+0x1b7) [0x561ad07d268c]
>>   5: ceph-osd(+0xc2e836) [0x561ad07d2836]
>>   6: (OSD::init()+0x4026) [0x561ad08e5a86]
>>   7: main()
>>   8: __libc_start_main()
&g

[ceph-users] Re: About number of osd node can be failed with erasure code 3+2

2023-11-27 Thread Wesley Dillingham
With a k+m of 3+2, each RADOS object is broken into 5 shards. By default
the pool will have a min_size of k+1 (4 in this case), which means you can
lose 1 shard and still be >= min_size. If one host goes down and you use a
host-based failure domain (the default) you will lose 1 shard out of all PGs
on that host. You will now be at min_size and so still readable/writeable.
If you lose another host, the PGs common to the 2 hosts will be below
min_size with only 3 healthy shards; those PGs will be inactive and
therefore not readable/writeable. As you can see, the higher your m the more
disks/hosts you can lose before dropping below min_size.
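
If it helps, you can confirm what k, m and min_size actually are for a given
pool like this (pool and profile names are placeholders):

  ceph osd pool get <pool> min_size
  ceph osd pool get <pool> erasure_code_profile
  ceph osd erasure-code-profile get <profile>   # shows k, m, crush-failure-domain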

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Nov 27, 2023 at 1:36 PM  wrote:

> Hi Groups,
>
> Recently I was setting up a ceph cluster with 10 nodes 144 osd, and I use
> S3 for it with pool erasure code EC3+2 on it.
>
> I have a question, how many osd nodes can fail with erasure code 3+2 with
> cluster working normal (read, write)? and can i choose better erasure code
> ec7+3, 8+2 etc..?
>
> With the erasure code algorithm, it only ensures no data loss, but does
> not guarantee that the cluster operates normally and does not block IO when
> osd nodes down. Is that right?
>
> Thanks to the community.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs failing to start due to crc32 and osdmap error

2023-11-27 Thread Wesley Dillingham
Curious if you are using bluestore compression?

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Nov 27, 2023 at 10:09 AM Denis Polom  wrote:

> Hi
>
> we have issue to start some OSDs on one node on our Ceph Quincy 17.2.7
> cluster. Some OSDs on that node are running fine, but some failing to
> start.
>
> Looks like crc32 checksum error, and failing to get OSD map. I found a
> some discussions on that but nothing helped.
>
> I've also tried to insert current OSD map but that ends with error:
>
> # CEPH_ARGS="--bluestore-ignore-data-csum" ceph-objectstore-tool
> --data-path /var/lib/ceph/osd/ceph-888/ --op set-osdmap --file osdmap
> osdmap (#-1:20684533:::osdmap.931991:0#) does not exist.
>
> Log is bellow
>
> Any ideas please?
>
> Thank you
>
>
>  From log file:
>
> 2023-11-27T16:01:47.691+0100 7f3f17aa13c0 -1 Falling back to public
> interface
>
> 2023-11-27T16:01:51.439+0100 7f3f17aa13c0 -1
> bluestore(/var/lib/ceph/osd/ceph-888) _verify_csum bad crc32c/0x1000
> checksum at blob offset 0x0, got 0xb1701b42, expected 0x9ee5ece2, device
> location [0x1~1000], logical extent 0x0~1000, object
> #-1:7b3f43c4:::osd_superblock:0#
>
> 2023-11-27T16:01:51.439+0100 7f3f17aa13c0 -1 osd.888 0 failed to load
> OSD map for epoch 927580, got 0 bytes
>
> /build/ceph-17.2.7/src/osd/OSD.h: In function 'OSDMapRef
> OSDService::get_map(epoch_t)' thread 7f3f17aa13c0 time
> 2023-11-27T16:01:51.443522+0100
> /build/ceph-17.2.7/src/osd/OSD.h: 696: FAILED ceph_assert(ret)
>   ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
> (stable)
>   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x14f) [0x561ad07d2624]
>   2: ceph-osd(+0xc2e836) [0x561ad07d2836]
>   3: (OSD::init()+0x4026) [0x561ad08e5a86]
>   4: main()
>   5: __libc_start_main()
>   6: _start()
> *** Caught signal (Aborted) **
>   in thread 7f3f17aa13c0 thread_name:ceph-osd
> 2023-11-27T16:01:51.443+0100 7f3f17aa13c0 -1
> /build/ceph-17.2.7/src/osd/OSD.h: In function 'OSDMapRef
> OSDService::get_map(epoch_t)' thread 7f3f17aa13c0 time
> 2023-11-27T16:01:51.443522+0100
> /build/ceph-17.2.7/src/osd/OSD.h: 696: FAILED ceph_assert(ret)
>
>   ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
> (stable)
>   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x14f) [0x561ad07d2624]
>   2: ceph-osd(+0xc2e836) [0x561ad07d2836]
>   3: (OSD::init()+0x4026) [0x561ad08e5a86]
>   4: main()
>   5: __libc_start_main()
>   6: _start()
>
>
>   ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
> (stable)
>   1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7f3f1814b420]
>   2: gsignal()
>   3: abort()
>   4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x1b7) [0x561ad07d268c]
>   5: ceph-osd(+0xc2e836) [0x561ad07d2836]
>   6: (OSD::init()+0x4026) [0x561ad08e5a86]
>   7: main()
>   8: __libc_start_main()
>   9: _start()
> 2023-11-27T16:01:51.447+0100 7f3f17aa13c0 -1 *** Caught signal (Aborted) **
>   in thread 7f3f17aa13c0 thread_name:ceph-osd
>
>   ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
> (stable)
>   1: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14420) [0x7f3f1814b420]
>   2: gsignal()
>   3: abort()
>   4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x1b7) [0x561ad07d268c]
>   5: ceph-osd(+0xc2e836) [0x561ad07d2836]
>   6: (OSD::init()+0x4026) [0x561ad08e5a86]
>   7: main()
>   8: __libc_start_main()
>   9: _start()
>   NOTE: a copy of the executable, or `objdump -rdS ` is
> needed to interpret this.
>
>
>-558> 2023-11-27T16:01:47.691+0100 7f3f17aa13c0 -1 Falling back to
> public interface
>
>  -5> 2023-11-27T16:01:51.439+0100 7f3f17aa13c0 -1
> bluestore(/var/lib/ceph/osd/ceph-888) _verify_csum bad crc32c/0x1000
> checksum at blob offset 0x0, got 0xb1701b42, expected 0x9ee5ece2, device
> location [0x1~1000], logical extent 0x0~1000, object
> #-1:7b3f43c4:::osd_superblock:0#
>
>  -2> 2023-11-27T16:01:51.439+0100 7f3f17aa13c0 -1 osd.888 0 failed
> to load OSD map for epoch 927580, got 0 bytes
>
>  -1> 2023-11-27T16:01:51.443+0100 7f3f17aa13c0 -1
> /build/ceph-17.2.7/src/osd/OSD.h: In function 'OSDMapRef
> OSDService::get_map(epoch_t)' thread 7f3f17aa13c0 time
> 2023-11-27T16:01:51.443522+0100
> /build/ceph-17.2.7/src/osd/OSD.h: 696: FAILED ceph_assert(ret)
>
>   ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy
> (stable)
>   1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x14f) [0x561ad07d2624]
>   2: ceph-osd(+0xc2e836) [0x561ad07d2836]
>   3: (OSD::init()+0x4026) [0x561ad08e5a86]
>   4: main()
>   5: __libc_start_main()
>   6: _start()
>
>
>   0> 2023-11-27T16:01:51.447+0100 7f3f17aa13c0 -1 *** Caught signal
> (Aborted) **
>   in thread 7f3f17aa13c0 thread_name:ceph-osd
>
>   ceph version 17.2.7 (b12291d110049b2f35e32e0de30d70e9a4c060d2) quincy

[ceph-users] Re: Why is min_size of erasure pools set to k+1

2023-11-20 Thread Wesley Dillingham
" if min_size is k and you lose an OSD during recovery after a failure of m
OSDs, data will become unavailable"

In that situation data wouldn't become unavailable; it would be lost.

Having a min_size of k+1 provides a buffer between data being
active+writeable and data being lost. That in-between state is called
inactive.

That buffer prevents data from being written to a PG when you are only one
disk/shard away from data loss.

Imagine the scenario of 4+2 with a min_size of 4. The cluster is 6 servers
filled with OSDs.

You have brought 2 servers down for maintenance (not a good idea but this
is an example). Your PGs are all degraded with only 4 shards of clean data
but active because k=min_size. Data is being written to the pool.

As you are booting your 2 servers up out of maintenance an OSD/disk on
another server fails and fails hard. Because that OSD was part of the
acting set the cluster only wrote four shards and now one is lost.

You only have 3 shards of data in a 4+2 and now some subset of data is lost.

Now imagine a 4+2 with min_size = 5.

You wouldn't bring down more than 1 host, because "ceph osd ok-to-stop"
would return false if you tried to take down more than 1 host for
maintenance.

Let's say you did bring down two hosts against the advice of the ok-to-stop
command: your PGs would become inactive and so they wouldn't accept writes.
Once you boot your 2 servers back up, the cluster heals.

Let's say you heed the advice of ok-to-stop and only bring 1 host down for
maintenance at a time. Your data is degraded with 5/6 shards healthy. New
data is being written with 5 shards able to be written out.

As you are booting your server out of maintenance, an OSD on another host
dies and those shards are lost forever. The PGs from that lost OSD now have
4 healthy shards. That is enough shards to recover the data from (though
you would have some PGs inactive for a bit until recovery finished).

Hope this helps to answer the min_size question a bit.
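
As a minimal sketch of the pre-maintenance check mentioned above (hostname
and OSD ids are placeholders):

  # list the OSDs hosted on the node you want to take down
  ceph osd ls-tree <hostname>

  # ask whether stopping them would leave any PG without enough shards
  ceph osd ok-to-stop <osd-id> [<osd-id> ...]

Only proceed with the maintenance if that last command reports it is OK.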

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Nov 20, 2023 at 2:03 PM Vladimir Brik <
vladimir.b...@icecube.wisc.edu> wrote:

> Could someone help me understand why it's a bad idea to set min_size of
> erasure-coded pools to k?
>
> From what I've read, the argument for k+1 is that if min_size is k and you
> lose an OSD during recovery after a failure of m OSDs, data will become
> unavailable. But how does setting min_size to k+1 help? If m=2, if you
> experience a double failure followed by another failure during recovery you
> still lost 3 OSDs and therefore your data because the pool wasn't set up to
> handle 3 concurrent failures, and the value of min_size is irrelevant.
>
> https://github.com/ceph/ceph/pull/8008 mentions inability to peer if
> min_size = k, but I don't understand why. Does that mean that if min_size=k
> and I lose m OSDs, and then an OSD is restarted during recovery, PGs will
> not peer even after the restarted OSD comes back online?
>
>
> Vlad
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: blustore osd nearfull but no pgs on it

2023-11-20 Thread Wesley Dillingham
The large number of osdmaps is what I was suspecting. "ceph tell osd.158
status" (or any other OSD) would show us how many osdmaps the OSDs are
currently holding on to.
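
For reference, something like the following is what I mean (osd.158 is just
the example from this thread; the fields to look at should be oldest_map and
newest_map in the JSON output):

  ceph tell osd.158 status

  # newest_map minus oldest_map is roughly how many osdmap epochs that OSD
  # is still keeping around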

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Nov 20, 2023 at 6:15 AM Debian  wrote:

> Hi,
>
> yes all of my small osds are affected
>
> i found the issue, my cluster is healthy and my rebalance finished - i
> have only to wait that my old osdmaps get cleaned up.
>
> like in the thread "Disks are filling up even if there is not a single
> placement group on them"
>
> thx!
>
> On 20.11.23 11:36, Eugen Block wrote:
> > You provide only a few details at a time, it would help to get a full
> > picture if you provided the output Wesley asked for (ceph df detail,
> > ceph tell osd.158 status, ceph osd df tree). Is osd.149 now the
> > problematic one or did you just add output from a different osd?
> > It's not really clear what you're doing without the necessary context.
> > You can just add the 'ceph daemon osd.{OSD} perf dump' output here or
> > in some pastebin.
> >
> > Zitat von Debian :
> >
> >> Hi,
> >>
> >> the block.db size ist default and not custom configured:
> >>
> >> current:
> >>
> >> bluefs.db_used_bytes: 9602859008
> >> bluefs.db_used_bytes: 469434368
> >>
> >> ceph daemon osd.149 config show
> >>
> >> "bluestore_bitmapallocator_span_size": "1024",
> >> "bluestore_block_db_size": "0",
> >> "bluestore_block_size": "107374182400",
> >> "bluestore_block_wal_size": "100663296",
> >> "bluestore_cache_size": "0",
> >> "bluestore_cache_size_hdd": "1073741824",
> >> "bluestore_cache_size_ssd": "3221225472",
> >> "bluestore_compression_max_blob_size": "0",
> >> "bluestore_compression_max_blob_size_hdd": "524288",
> >> "bluestore_compression_max_blob_size_ssd": "65536",
> >> "bluestore_compression_min_blob_size": "0",
> >> "bluestore_compression_min_blob_size_hdd": "131072",
> >> "bluestore_compression_min_blob_size_ssd": "8192",
> >> "bluestore_extent_map_inline_shard_prealloc_size": "256",
> >> "bluestore_extent_map_shard_max_size": "1200",
> >> "bluestore_extent_map_shard_min_size": "150",
> >> "bluestore_extent_map_shard_target_size": "500",
> >> "bluestore_extent_map_shard_target_size_slop": "0.20",
> >> "bluestore_max_alloc_size": "0",
> >> "bluestore_max_blob_size": "0",
> >> "bluestore_max_blob_size_hdd": "524288",
> >> "bluestore_max_blob_size_ssd": "65536",
> >> "bluestore_min_alloc_size": "0",
> >> "bluestore_min_alloc_size_hdd": "65536",
> >> "bluestore_min_alloc_size_ssd": "4096",
> >> "bluestore_prefer_deferred_size": "0",
> >> "bluestore_prefer_deferred_size_hdd": "32768",
> >> "bluestore_prefer_deferred_size_ssd": "0",
> >> "bluestore_rocksdb_options":
> >>
> "compression=kNoCompression,max_write_buffer_number=4,min_write_buffer_number_to_merge=1,recycle_log_file_num=4,write_buffer_size=268435456,writable_file_max_buffer_size=0,compaction_readahead_size=2097152,max_background_compactions=2",
> >>
> >> "bluefs_alloc_size": "1048576",
> >> "bluefs_allocator": "hybrid",
> >> "bluefs_buffered_io": "false",
> >> "bluefs_check_for_zeros": "false",
> >> "bluefs_compact_log_sync": "false",
> >> "bluefs_log_compact_min_ratio": "5.00",
> >> "bluefs_log_compact_min_size": "16777216",
> >> "bluefs_max_log_runway": "4194304",
> >> "bluefs_max_prefetch": "1048576",
> >> "bluefs_min_flush_size": "524288",
> >> "bluefs_min_log_runway": "1048576",
> >> "bluefs_preextend_wal_files": "false",
> >> "bluefs_replay_recovery": "false",
> >> "bluefs_replay_recovery_disable_compact": "false",
> >> "bluefs_shared_alloc_size": "65536",
> >> "bluefs_sync_write": "false",
> >>
> >> which the osd performance counter i cannot determine who is using the
> >> memory,...
> >>
> >> thx & best regards
> >>
> >>
> >> On 18.11.23 09:05, Eugen Block wrote:
> >>> Do you have a large block.db size defined in the ceph.conf (or
> >>> config store)?
> >>>
> >>> Zitat von Debian :
> >>>
>  thx for your reply, it shows nothing,... there are no pgs on the
>  osd,...
> 
>  best regards
> 
>  On 17.11.23 23:09, Eugen Block wrote:
> > After you create the OSD, run ‚ceph pg ls-by-osd {OSD}‘, it should
> > show you which PGs are created there and then you’ll know which
> > pool they belong to, then check again the crush rule for that
> > pool. You can paste the outputs here.
> >
> > Zitat von Debian :
> >
> >> Hi,
> >>
> >> after a massive rebalance(tunables) my small SSD-OSDs are getting
> >> full, i changed my crush rules so there are actual no pgs/pools
> >> on it, but the disks stay full:
> >>
> >> ceph version 14.2.21 (5ef401921d7a88aea18ec7558f7f9374ebd8f5a6)
> >> nautilus (stable)
> >>
> >> ID CLASS WEIGHT 

[ceph-users] Re: blustore osd nearfull but no pgs on it

2023-11-17 Thread Wesley Dillingham
Please send along a pastebin of "ceph status", "ceph osd df tree",
"ceph df detail", and "ceph tell osd.158 status".

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Fri, Nov 17, 2023 at 6:20 PM Debian  wrote:

> thx for your reply, it shows nothing,... there are no pgs on the osd,...
>
> best regards
>
> On 17.11.23 23:09, Eugen Block wrote:
> > After you create the OSD, run ‚ceph pg ls-by-osd {OSD}‘, it should
> > show you which PGs are created there and then you’ll know which pool
> > they belong to, then check again the crush rule for that pool. You can
> > paste the outputs here.
> >
> > Zitat von Debian :
> >
> >> Hi,
> >>
> >> after a massive rebalance(tunables) my small SSD-OSDs are getting
> >> full, i changed my crush rules so there are actual no pgs/pools on
> >> it, but the disks stay full:
> >>
> >> ceph version 14.2.21 (5ef401921d7a88aea18ec7558f7f9374ebd8f5a6)
> >> nautilus (stable)
> >>
> >> ID CLASS WEIGHT REWEIGHT SIZERAW USE DATAOMAP
> >> META AVAIL%USE  VAR  PGS STATUS TYPE NAME
> >> 158   ssd0.21999  1.0 224 GiB 194 GiB 193 GiB  22 MiB 1002
> >> MiB   30 GiB 86.68 1.49   0 up osd.158
> >>
> >> inferring bluefs devices from bluestore path
> >> 1 : device size 0x37e440 : own 0x[1ad3f0~23c60] =
> >> 0x23c60 : using 0x3963(918 MiB) : bluestore has
> >> 0x46e2d(18 GiB) available
> >>
> >> when i recreate the osd the osd gets full again
> >>
> >> any suggestion?
> >>
> >> thx & best regards
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: per-rbd snapshot limitation

2023-11-15 Thread Wesley Dillingham
Are you aware of any config item that can be set (perhaps in the ceph.conf
or config db) so the limit is enforced immediately at creation time without
needing to set it for each rbd?

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Wed, Nov 15, 2023 at 1:14 PM David C.  wrote:

> rbd create testpool/test3 --size=100M
> rbd snap limit set testpool/test3 --limit 3
>
>
> Le mer. 15 nov. 2023 à 17:58, Wesley Dillingham  a
> écrit :
>
>> looking into how to limit snapshots at the ceph level for RBD snapshots.
>> Ideally ceph would enforce an arbitrary number of snapshots allowable per
>> rbd.
>>
>> Reading the man page for rbd command I see this option:
>> https://docs.ceph.com/en/quincy/man/8/rbd/#cmdoption-rbd-limit
>>
>> --limit
>>
>> Specifies the limit for the number of snapshots permitted.
>>
>> Seems perfect. But on attempting to use it as such I get an error:
>>
>> admin@rbdtest:~$ rbd create testpool/test3 --size=100M --limit=3
>> rbd: unrecognised option '--limit=3'
>>
>> Where am I going wrong here? Is there another way to enforce a limit of
>> snapshots for RBD? Thanks.
>>
>> Respectfully,
>>
>> *Wes Dillingham*
>> w...@wesdillingham.com
>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: per-rbd snapshot limitation

2023-11-15 Thread Wesley Dillingham
Perfect, thank you.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Wed, Nov 15, 2023 at 1:00 PM Ilya Dryomov  wrote:

> On Wed, Nov 15, 2023 at 5:57 PM Wesley Dillingham 
> wrote:
> >
> > looking into how to limit snapshots at the ceph level for RBD snapshots.
> > Ideally ceph would enforce an arbitrary number of snapshots allowable per
> > rbd.
> >
> > Reading the man page for rbd command I see this option:
> > https://docs.ceph.com/en/quincy/man/8/rbd/#cmdoption-rbd-limit
> >
> > --limit
> >
> > Specifies the limit for the number of snapshots permitted.
> >
> > Seems perfect. But on attempting to use it as such I get an error:
> >
> > admin@rbdtest:~$ rbd create testpool/test3 --size=100M --limit=3
> > rbd: unrecognised option '--limit=3'
> >
> > Where am I going wrong here? Is there another way to enforce a limit of
> > snapshots for RBD? Thanks.
>
> Hi Wes,
>
> I think you want "rbd snap limit set --limit 3 testpool/test3".
>
> Thanks,
>
> Ilya
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] per-rbd snapshot limitation

2023-11-15 Thread Wesley Dillingham
Looking into how to limit snapshots at the Ceph level for RBD. Ideally,
Ceph would enforce an arbitrary number of snapshots allowable per RBD
image.

Reading the man page for rbd command I see this option:
https://docs.ceph.com/en/quincy/man/8/rbd/#cmdoption-rbd-limit

--limit

Specifies the limit for the number of snapshots permitted.

Seems perfect. But on attempting to use it as such I get an error:

admin@rbdtest:~$ rbd create testpool/test3 --size=100M --limit=3
rbd: unrecognised option '--limit=3'

Where am I going wrong here? Is there another way to enforce a limit of
snapshots for RBD? Thanks.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: owner locked out of bucket via bucket policy

2023-11-08 Thread Wesley Dillingham
Jayanth:

Just to be clear: with the "--admin" user's keys, you have attempted to
delete the bucket policy using the following method:
https://docs.aws.amazon.com/cli/latest/reference/s3api/delete-bucket-policy.html

This is what worked for me (on a 16.2.14 cluster). I didn't attempt to
interact with the affected bucket in any way other than "aws s3api
delete-bucket-policy".

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Wed, Nov 8, 2023 at 8:30 AM Jayanth Reddy 
wrote:

> Hello Casey,
>
> We're totally stuck at this point and none of the options seem to work.
> Please let us know if there is something in metadata or index to remove
> those applied bucket policies. We downgraded to v17.2.6 and encountering
> the same.
>
> Regards,
> Jayanth
>
> On Wed, Nov 8, 2023 at 7:14 AM Jayanth Reddy 
> wrote:
>
>> Hello Casey,
>>
>> And on further inspection, we identified that there were bucket policies
>> set from the initial days; we were in v16.2.12.
>> We upgraded the cluster to v17.2.7 two days ago and it seems obvious that
>> the IAM error logs are generated the next minute rgw daemon upgraded from
>> v16.2.12 to v17.2.7. Looks like there is some issue with parsing.
>>
>> I'm thinking to downgrade back to v17.2.6 and earlier, please let me know
>> if this is a good option for now.
>>
>> Thanks,
>> Jayanth
>> --
>> *From:* Jayanth Reddy 
>> *Sent:* Tuesday, November 7, 2023 11:59:38 PM
>> *To:* Casey Bodley 
>> *Cc:* Wesley Dillingham ; ceph-users <
>> ceph-users@ceph.io>; Adam Emerson 
>> *Subject:* Re: [ceph-users] Re: owner locked out of bucket via bucket
>> policy
>>
>> Hello Casey,
>>
>> Thank you for the quick response. I see
>> `rgw_policy_reject_invalid_principals` is not present in v17.2.7. Please
>> let me know.
>>
>> Regards
>> Jayanth
>>
>> On Tue, Nov 7, 2023 at 11:50 PM Casey Bodley  wrote:
>>
>> On Tue, Nov 7, 2023 at 12:41 PM Jayanth Reddy
>>  wrote:
>> >
>> > Hello Wesley and Casey,
>> >
>> > We've ended up with the same issue and here it appears that even the
>> user with "--admin" isn't able to do anything. We're now unable to figure
>> out if it is due to bucket policies, ACLs or IAM of some sort. I'm seeing
>> these IAM errors in the logs
>> >
>> > ```
>> >
>> > Nov  7 00:02:00 ceph-05 radosgw[4054570]: req 8786689665323103851
>> 0.00368s s3:get_obj Error reading IAM Policy: Terminate parsing due to
>> Handler error.
>> >
>> > Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583
>> 0.0s s3:list_bucket Error reading IAM Policy: Terminate parsing due
>> to Handler error.
>>
>> it's failing to parse the bucket policy document, but the error
>> message doesn't say what's wrong with it
>>
>> disabling rgw_policy_reject_invalid_principals might help if it's
>> failing on the Principal
>>
>> > Nov  7 22:51:40 ceph-05 radosgw[4054570]: req 13293029267332025583
>> 0.0s s3:list_bucket init_permissions on
>> :window-dev[1d0fa0b4-04eb-48f9-889b-a60de865ccd8.24143.10]) failed, ret=-13
>> > Nov  7 22:51:40 ceph-feed-05 radosgw[4054570]: req 13293029267332025583
>> 0.0s op->ERRORHANDLER: err_no=-13 new_err_no=-13
>> >
>> > ```
>> >
>> > Please help what's wrong here. We're in Ceph v17.2.7.
>> >
>> > Regards,
>> > Jayanth
>> >
>> > On Thu, Oct 26, 2023 at 7:14 PM Wesley Dillingham <
>> w...@wesdillingham.com> wrote:
>> >>
>> >> Thank you, this has worked to remove the policy.
>> >>
>> >> Respectfully,
>> >>
>> >> *Wes Dillingham*
>> >> w...@wesdillingham.com
>> >> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>> >>
>> >>
>> >> On Wed, Oct 25, 2023 at 5:10 PM Casey Bodley 
>> wrote:
>> >>
>> >> > On Wed, Oct 25, 2023 at 4:59 PM Wesley Dillingham <
>> w...@wesdillingham.com>
>> >> > wrote:
>> >> > >
>> >> > > Thank you, I am not sure (inherited cluster). I presume such an
>> admin
>> >> > user created after-the-fact would work?
>> >> >
>> >> > yes
>> >> >
>> >> > > Is there a good way to discover an admin user other than iterate
>> over
>> >> >

[ceph-users] Re: owner locked out of bucket via bucket policy

2023-10-26 Thread Wesley Dillingham
Thank you, this has worked to remove the policy.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Wed, Oct 25, 2023 at 5:10 PM Casey Bodley  wrote:

> On Wed, Oct 25, 2023 at 4:59 PM Wesley Dillingham 
> wrote:
> >
> > Thank you, I am not sure (inherited cluster). I presume such an admin
> user created after-the-fact would work?
>
> yes
>
> > Is there a good way to discover an admin user other than iterate over
> all users and retrieve user information? (I presume radosgw-admin user info
> --uid=" would illustrate such administrative access?
>
> not sure there's an easy way to search existing users, but you could
> create a temporary admin user for this repair
>
> >
> > Respectfully,
> >
> > Wes Dillingham
> > w...@wesdillingham.com
> > LinkedIn
> >
> >
> > On Wed, Oct 25, 2023 at 4:41 PM Casey Bodley  wrote:
> >>
> >> if you have an administrative user (created with --admin), you should
> >> be able to use its credentials with awscli to delete or overwrite this
> >> bucket policy
> >>
> >> On Wed, Oct 25, 2023 at 4:11 PM Wesley Dillingham <
> w...@wesdillingham.com> wrote:
> >> >
> >> > I have a bucket which got injected with bucket policy which locks the
> >> > bucket even to the bucket owner. The bucket now cannot be accessed
> (even
> >> > get its info or delete bucket policy does not work) I have looked in
> the
> >> > radosgw-admin command for a way to delete a bucket policy but do not
> see
> >> > anything. I presume I will need to somehow remove the bucket policy
> from
> >> > however it is stored in the bucket metadata / omap etc. If anyone can
> point
> >> > me in the right direction on that I would appreciate it. Thanks
> >> >
> >> > Respectfully,
> >> >
> >> > *Wes Dillingham*
> >> > w...@wesdillingham.com
> >> > LinkedIn <http://www.linkedin.com/in/wesleydillingham>
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >> >
> >>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: owner locked out of bucket via bucket policy

2023-10-25 Thread Wesley Dillingham
Thank you, I am not sure (inherited cluster). I presume such an admin user
created after the fact would work? Is there a good way to discover an admin
user other than iterating over all users and retrieving user information? (I
presume "radosgw-admin user info --uid=<uid>" would illustrate such
administrative access?)

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Wed, Oct 25, 2023 at 4:41 PM Casey Bodley  wrote:

> if you have an administrative user (created with --admin), you should
> be able to use its credentials with awscli to delete or overwrite this
> bucket policy
>
> On Wed, Oct 25, 2023 at 4:11 PM Wesley Dillingham 
> wrote:
> >
> > I have a bucket which got injected with bucket policy which locks the
> > bucket even to the bucket owner. The bucket now cannot be accessed (even
> > get its info or delete bucket policy does not work) I have looked in the
> > radosgw-admin command for a way to delete a bucket policy but do not see
> > anything. I presume I will need to somehow remove the bucket policy from
> > however it is stored in the bucket metadata / omap etc. If anyone can
> point
> > me in the right direction on that I would appreciate it. Thanks
> >
> > Respectfully,
> >
> > *Wes Dillingham*
> > w...@wesdillingham.com
> > LinkedIn <http://www.linkedin.com/in/wesleydillingham>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] owner locked out of bucket via bucket policy

2023-10-25 Thread Wesley Dillingham
I have a bucket which got injected with a bucket policy that locks out even
the bucket owner. The bucket now cannot be accessed (even getting its info
or deleting the bucket policy does not work). I have looked in the
radosgw-admin command for a way to delete a bucket policy but do not see
anything. I presume I will need to somehow remove the bucket policy from
wherever it is stored in the bucket metadata / omap etc. If anyone can point
me in the right direction on that I would appreciate it. Thanks

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How do you handle large Ceph object storage cluster?

2023-10-17 Thread Wesley Dillingham
Well you are probably in the top 1% of cluster size. I would guess that
trying to cut your existing cluster in half while not encountering any
downtime as you shuffle existing buckets between old cluster and new
cluster would be harder than redirecting all new buckets (or users) to a
second cluster. Obviously you will need to account for each cluster having
a single bucket namespace when attempting to redirect requests to a cluster
of clusters. Lots of ways to skin this cat and it would be a large and
complicated architectural undertaking.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Oct 16, 2023 at 10:53 AM  wrote:

> Hi Everyone,
>
> My company is dealing with quite large Ceph cluster (>10k OSDs, >60 PB of
> data). It is entirely dedicated to object storage with S3 interface.
> Maintenance and its extension are getting more and more problematic and
> time consuming. We consider to split it to two or more completely separate
> clusters (without replication of data among them) and create S3 layer of
> abstraction with some additional metadata that will allow us to use these
> 2+ physically independent instances as a one logical cluster. Additionally,
> newest data is the most demanded data, so we have to spread it equally
> among clusters to avoid skews in cluster load.
>
> Do you have any similar experience? How did you handle it? Maybe you have
> some advice? I'm not a Ceph expert. I'm just a Ceph's user and software
> developer who does not like to duplicate someone's job.
>
> Best,
> Paweł
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to fix 1 Inconsistent PG

2023-10-11 Thread Wesley Dillingham
Just to be clear, you should remove the OSD by stopping the daemon and
marking it out before you repair the PG. The PG may not be able to be
repaired until you remove the bad disk.

1 - identify the bad disk (via scrubs or SMART/dmesg inspection)
2 - stop daemon and mark it out
3 - wait for PG to finish backfill
4 - issue the pg repair
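
Roughly, as a shell sketch (osd.238 and pg 15.f4f are from this thread; the
systemctl unit name is an assumption for a non-cephadm deployment):

  systemctl stop ceph-osd@238       # step 2: stop the daemon
  ceph osd out 238                  #         and mark it out

  ceph pg ls backfilling            # step 3: watch until backfill finishes

  ceph pg repair 15.f4f             # step 4: issue the repair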

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Wed, Oct 11, 2023 at 4:38 PM Wesley Dillingham 
wrote:

> If I recall correctly When the acting or up_set of an PG changes the scrub
> information is lost. This was likely lost when you stopped osd.238 and
> changed the sets.
>
> I do not believe based on your initial post you need to be using the
> objectstore tool currently. Inconsistent PGs are a common occurrence and
> can be repaired.
>
> After your most recent post I would get osd.238 back in the cluster unless
> you have reason to believe it is the failing hardware. But it could be any
> of the osds in the following set (from your initial post)
> [238,106,402,266,374,498,590,627,684,73,66]
>
> You should inspect the SMART data and dmesg on the drives and servers
> supporting the above OSDs to determine which one is failing.
>
> After you get the PG back to active+clean+inconsistent (get osd.238 back
> in and it finishes its backfill) you can re-issue a manual deep-scrub of it
> and once that deep-scrub finishes the rados list-inconsistent-obj 15.f4f
> should return and implicate a single osd with errors.
>
> Finally you should issue the PG repair again.
>
> In order to get your manually issued scrubs and repairs to start sooner
> you may want to set the noscrub and nodeep-scrub flags until you can get
> your PG repaired.
>
> As an aside osd_max_scrubs of 9 is too aggressive IMO I would drop that
> back to 3, max
>
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Wed, Oct 11, 2023 at 10:51 AM Siddhit Renake 
> wrote:
>
>> Hello Wes,
>>
>> Thank you for your response.
>>
>> brc1admin:~ # rados list-inconsistent-obj 15.f4f
>> No scrub information available for pg 15.f4f
>>
>> brc1admin:~ # ceph osd ok-to-stop osd.238
>> OSD(s) 238 are ok to stop without reducing availability or risking data,
>> provided there are no other concurrent failures or interventions.
>> 341 PGs are likely to be degraded (but remain available) as a result.
>>
>> Before I proceed with your suggested action plan, needed clarification on
>> below.
>> In order to list all objects residing on the inconsistent PG, we had
>> stopped the primary osd (osd.238) and extracted the list of all objects
>> residing on this osd using ceph-objectstore tool. We notice that that when
>> we stop the osd (osd.238) using systemctl, RGW gateways continuously
>> restarts which is impacting our S3 service availability. This was observed
>> twice when we stopped osd.238 for general maintenance activity w.r.t
>> ceph-objectstore tool. How can we ensure that stopping and marking out
>> osd.238 ( primary osd of inconsistent pg) does not impact RGW service
>> availability ?
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to fix 1 Inconsistent PG

2023-10-11 Thread Wesley Dillingham
If I recall correctly, when the acting set or up set of a PG changes, the
scrub information is lost. This was likely lost when you stopped osd.238 and
changed the sets.

Based on your initial post, I do not believe you need to be using the
objectstore tool currently. Inconsistent PGs are a common occurrence and
can be repaired.

After your most recent post, I would get osd.238 back in the cluster unless
you have reason to believe it is the failing hardware. But it could be any
of the OSDs in the following set (from your initial post):
[238,106,402,266,374,498,590,627,684,73,66]

You should inspect the SMART data and dmesg on the drives and servers
supporting the above OSDs to determine which one is failing.
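
For example, on each host backing those OSDs (device names are placeholders;
this is just a sketch of what to look for):

  # kernel-level I/O or medium errors
  dmesg -T | grep -iE 'i/o error|medium error|blk_update_request'

  # drive health counters
  smartctl -a /dev/sdX | grep -iE 'overall-health|reallocated|pending|uncorrect'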

After you get the PG back to active+clean+inconsistent (get osd.238 back in
and let it finish its backfill), you can re-issue a manual deep-scrub of it,
and once that deep-scrub finishes, "rados list-inconsistent-obj 15.f4f"
should return results and implicate a single OSD with errors.

Finally you should issue the PG repair again.

In order to get your manually issued scrubs and repairs to start sooner you
may want to set the noscrub and nodeep-scrub flags until you can get your
PG repaired.

As an aside, osd_max_scrubs of 9 is too aggressive IMO; I would drop that
back to 3, max.


Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Wed, Oct 11, 2023 at 10:51 AM Siddhit Renake 
wrote:

> Hello Wes,
>
> Thank you for your response.
>
> brc1admin:~ # rados list-inconsistent-obj 15.f4f
> No scrub information available for pg 15.f4f
>
> brc1admin:~ # ceph osd ok-to-stop osd.238
> OSD(s) 238 are ok to stop without reducing availability or risking data,
> provided there are no other concurrent failures or interventions.
> 341 PGs are likely to be degraded (but remain available) as a result.
>
> Before I proceed with your suggested action plan, needed clarification on
> below.
> In order to list all objects residing on the inconsistent PG, we had
> stopped the primary osd (osd.238) and extracted the list of all objects
> residing on this osd using ceph-objectstore tool. We notice that that when
> we stop the osd (osd.238) using systemctl, RGW gateways continuously
> restarts which is impacting our S3 service availability. This was observed
> twice when we stopped osd.238 for general maintenance activity w.r.t
> ceph-objectstore tool. How can we ensure that stopping and marking out
> osd.238 ( primary osd of inconsistent pg) does not impact RGW service
> availability ?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unable to fix 1 Inconsistent PG

2023-10-10 Thread Wesley Dillingham
In case it's not obvious, I forgot a space: "rados list-inconsistent-obj
15.f4f"

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Tue, Oct 10, 2023 at 4:55 PM Wesley Dillingham 
wrote:

> You likely have a failing disk, what does "rados
> list-inconsistent-obj15.f4f" return?
>
> It should identify the failing osd. Assuming "ceph osd ok-to-stop "
> returns in the affirmative for that osd, you likely need to stop the
> associated osd daemon, then mark it out "ceph osd out  wait for it
> to backfill the inconsistent PG and then re-issue the repair. Then turn to
> replacing the disk.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Tue, Oct 10, 2023 at 4:46 PM  wrote:
>
>> Hello All,
>> Greetings. We've a Ceph Cluster with the version
>> *ceph version 14.2.16-402-g7d47dbaf4d
>> (7d47dbaf4d0960a2e910628360ae36def84ed913) nautilus (stable)
>>
>>
>> ===
>>
>> Issues: 1 pg in inconsistent state and does not recover.
>>
>> # ceph -s
>>   cluster:
>> id: 30d6f7ee-fa02-4ab3-8a09-9321c8002794
>> health: HEALTH_ERR
>> 2 large omap objects
>> 1 pools have many more objects per pg than average
>> 159224 scrub errors
>> Possible data damage: 1 pg inconsistent
>> 2 pgs not deep-scrubbed in time
>> 2 pgs not scrubbed in time
>>
>> # ceph health detail
>>
>> HEALTH_ERR 2 large omap objects; 1 pools have many more objects per pg
>> than average; 159224 scrub errors; Possible data damage: 1 pg inconsistent;
>> 2 pgs not deep-scrubbed in time; 2 pgs not scrubbed in time
>> LARGE_OMAP_OBJECTS 2 large omap objects
>> 2 large objects found in pool 'default.rgw.log'
>> Search the cluster log for 'Large omap object found' for more details.
>> MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
>> pool iscsi-images objects per pg (541376) is more than 14.9829 times
>> cluster average (36133)
>> OSD_SCRUB_ERRORS 159224 scrub errors
>> PG_DAMAGED Possible data damage: 1 pg inconsistent
>> pg 15.f4f is active+clean+inconsistent, acting
>> [238,106,402,266,374,498,590,627,684,73,66]
>> PG_NOT_DEEP_SCRUBBED 2 pgs not deep-scrubbed in time
>> pg 1.5c not deep-scrubbed since 2021-04-05 23:20:13.714446
>> pg 1.55 not deep-scrubbed since 2021-04-11 07:12:37.185074
>> PG_NOT_SCRUBBED 2 pgs not scrubbed in time
>> pg 1.5c not scrubbed since 2023-07-10 21:15:50.352848
>> pg 1.55 not scrubbed since 2023-06-24 10:02:10.038311
>>
>> ==
>>
>>
>> We have implemented below command to resolve it
>>
>> 1. We have ran pg repair command "ceph pg repair 15.f4f
>> 2. We have restarted associated  OSDs that is mapped to pg 15.f4f
>> 3. We tuned osd_max_scrubs value and set it to 9.
>> 4. We have done scrub and deep scrub by ceph pg scrub 15.4f4 & ceph pg
>> deep-scrub 15.f4f
>> 5. We also tried to ceph-objectstore-tool command to fix it
>> ==
>>
>> We have checked the logs of the primary OSD of the respective
>> inconsistent PG and found the below errors.
>> [ERR] : 15.f4fs0 shard 402(2)
>> 15:f2f3fff4:::94a51ddb-a94f-47bc-9068-509e8c09af9a.7862003.20_c%2f4%2fd61%2f885%2f49627697%2f192_1.ts:head
>> : missing
>> /var/log/ceph/ceph-osd.238.log:339:2023-10-06 00:37:06.410 7f65024cb700
>> -1 log_channel(cluster) log [ERR] : 15.f4fs0 shard 266(3)
>> 15:f2f2:::94a51ddb-a94f-47bc-9068-509e8c09af9a.11432468.3_TN8QHE_04.20.2020_08.41%2fCV_MAGNETIC%2fV_274396%2fCHUNK_2440801%2fSFILE_CONTAINER_031.FOLDER%2f3:head
>> : missing
>> /var/log/ceph/ceph-osd.238.log:340:2023-10-06 00:37:06.410 7f65024cb700
>> -1 log_channel(cluster) log [ERR] : 15.f4fs0 shard 402(2)
>> 15:f2f2:::94a51ddb-a94f-47bc-9068-509e8c09af9a.11432468.3_TN8QHE_04.20.2020_08.41%2fCV_MAGNETIC%2fV_274396%2fCHUNK_2440801%2fSFILE_CONTAINER_031.FOLDER%2f3:head
>> : missing
>> /var/log/ceph/ceph-osd.238.log:341:2023-10-06 00:37:06.410 7f65024cb700
>> -1 log_channel(cluster) log [ERR] : 15.f4fs0 shard 590(6)
>> 15:f2f2:::94a51ddb-a94f-47bc-9068-509e8c09af9a.11432468.3_TN8QHE_04.20.2020_08.41%2fCV_MAGNETIC%2fV_274396%2fCHUNK_2440801%2fSFILE_CONTAINER_031.FOLDER%2f3:head
>> : missing
>> ===
>> and also we noticed that the no. of

[ceph-users] Re: Unable to fix 1 Inconsistent PG

2023-10-10 Thread Wesley Dillingham
You likely have a failing disk, what does "rados
list-inconsistent-obj15.f4f" return?

It should identify the failing OSD. Assuming "ceph osd ok-to-stop <osd-id>"
returns in the affirmative for that OSD, you likely need to stop the
associated OSD daemon, then mark it out with "ceph osd out <osd-id>", wait
for it to backfill the inconsistent PG and then re-issue the repair. Then
turn to replacing the disk.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Tue, Oct 10, 2023 at 4:46 PM  wrote:

> Hello All,
> Greetings. We've a Ceph Cluster with the version
> *ceph version 14.2.16-402-g7d47dbaf4d
> (7d47dbaf4d0960a2e910628360ae36def84ed913) nautilus (stable)
>
>
> ===
>
> Issues: 1 pg in inconsistent state and does not recover.
>
> # ceph -s
>   cluster:
> id: 30d6f7ee-fa02-4ab3-8a09-9321c8002794
> health: HEALTH_ERR
> 2 large omap objects
> 1 pools have many more objects per pg than average
> 159224 scrub errors
> Possible data damage: 1 pg inconsistent
> 2 pgs not deep-scrubbed in time
> 2 pgs not scrubbed in time
>
> # ceph health detail
>
> HEALTH_ERR 2 large omap objects; 1 pools have many more objects per pg
> than average; 159224 scrub errors; Possible data damage: 1 pg inconsistent;
> 2 pgs not deep-scrubbed in time; 2 pgs not scrubbed in time
> LARGE_OMAP_OBJECTS 2 large omap objects
> 2 large objects found in pool 'default.rgw.log'
> Search the cluster log for 'Large omap object found' for more details.
> MANY_OBJECTS_PER_PG 1 pools have many more objects per pg than average
> pool iscsi-images objects per pg (541376) is more than 14.9829 times
> cluster average (36133)
> OSD_SCRUB_ERRORS 159224 scrub errors
> PG_DAMAGED Possible data damage: 1 pg inconsistent
> pg 15.f4f is active+clean+inconsistent, acting
> [238,106,402,266,374,498,590,627,684,73,66]
> PG_NOT_DEEP_SCRUBBED 2 pgs not deep-scrubbed in time
> pg 1.5c not deep-scrubbed since 2021-04-05 23:20:13.714446
> pg 1.55 not deep-scrubbed since 2021-04-11 07:12:37.185074
> PG_NOT_SCRUBBED 2 pgs not scrubbed in time
> pg 1.5c not scrubbed since 2023-07-10 21:15:50.352848
> pg 1.55 not scrubbed since 2023-06-24 10:02:10.038311
>
> ==
>
>
> We have implemented below command to resolve it
>
> 1. We have ran pg repair command "ceph pg repair 15.f4f
> 2. We have restarted associated  OSDs that is mapped to pg 15.f4f
> 3. We tuned osd_max_scrubs value and set it to 9.
> 4. We have done scrub and deep scrub by ceph pg scrub 15.4f4 & ceph pg
> deep-scrub 15.f4f
> 5. We also tried to ceph-objectstore-tool command to fix it
> ==
>
> We have checked the logs of the primary OSD of the respective inconsistent
> PG and found the below errors.
> [ERR] : 15.f4fs0 shard 402(2)
> 15:f2f3fff4:::94a51ddb-a94f-47bc-9068-509e8c09af9a.7862003.20_c%2f4%2fd61%2f885%2f49627697%2f192_1.ts:head
> : missing
> /var/log/ceph/ceph-osd.238.log:339:2023-10-06 00:37:06.410 7f65024cb700 -1
> log_channel(cluster) log [ERR] : 15.f4fs0 shard 266(3)
> 15:f2f2:::94a51ddb-a94f-47bc-9068-509e8c09af9a.11432468.3_TN8QHE_04.20.2020_08.41%2fCV_MAGNETIC%2fV_274396%2fCHUNK_2440801%2fSFILE_CONTAINER_031.FOLDER%2f3:head
> : missing
> /var/log/ceph/ceph-osd.238.log:340:2023-10-06 00:37:06.410 7f65024cb700 -1
> log_channel(cluster) log [ERR] : 15.f4fs0 shard 402(2)
> 15:f2f2:::94a51ddb-a94f-47bc-9068-509e8c09af9a.11432468.3_TN8QHE_04.20.2020_08.41%2fCV_MAGNETIC%2fV_274396%2fCHUNK_2440801%2fSFILE_CONTAINER_031.FOLDER%2f3:head
> : missing
> /var/log/ceph/ceph-osd.238.log:341:2023-10-06 00:37:06.410 7f65024cb700 -1
> log_channel(cluster) log [ERR] : 15.f4fs0 shard 590(6)
> 15:f2f2:::94a51ddb-a94f-47bc-9068-509e8c09af9a.11432468.3_TN8QHE_04.20.2020_08.41%2fCV_MAGNETIC%2fV_274396%2fCHUNK_2440801%2fSFILE_CONTAINER_031.FOLDER%2f3:head
> : missing
> ===
> and also we noticed that the no. of scrub errors in ceph health status are
> matching with the ERR log entries in the primary OSD logs of the
> inconsistent PG as below
> grep -Hn 'ERR' /var/log/ceph/ceph-osd.238.log|wc -l
> 159226
> 
> Ceph is cleaning the scrub errors but rate of scrub repair is very slow
> (avg of 200 scrub errors per day) ,we want to increase the rate of scrub
> error repair to finish the cleanup of pending 159224 scrub errors.
>
> #ceph pg 15.f4f query
>
>
> {
> "state": "active+clean+inconsistent",
> "snap_trimq": "[]",
> "snap_trimq_len": 0,
> "epoch": 409009,
> "up": [
> 238,
> 106,
> 402,
> 266,
> 374,
> 498,
> 590,
> 627,
> 684,
> 73,
> 66
> ],
> "acting": [
> 238,
> 106,
> 402,
> 266,
> 374,
> 498,
> 590,
> 627,

[ceph-users] Re: cannot repair a handful of damaged pg's

2023-10-06 Thread Wesley Dillingham
A repair is just a type of scrub, and it is also limited by osd_max_scrubs,
which in Pacific is 1.

If another scrub is occurring on any OSD in the PG, it won't start.

do "ceph osd set noscrub" and "ceph osd set nodeep-scrub" wait for all
scrubs to stop (a few seconds probably)

Then issue the pg repair command again. It may start.

You also have PGs in backfilling state. Note that by default OSDs in
backfill or backfill_wait also won't perform scrubs.

You can modify this behavior with `ceph config set osd
osd_scrub_during_recovery true`

I would suggest only setting that after the noscrub flags are set, so that
the only scrub that gets processed is your manual repair.

Then remove the osd_scrub_during_recovery config item before unsetting the
noscrub flags.
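
Put together, a rough sketch of that sequence (using pg 26.337 from your
output as the example):

ceph osd set noscrub
ceph osd set nodeep-scrub

# allow the manual repair to run even while OSDs are backfilling
ceph config set osd osd_scrub_during_recovery true

# re-issue the repair (a repair is a form of deep scrub)
ceph pg repair 26.337

# after the repair completes, revert the temporary settings
ceph config rm osd osd_scrub_during_recovery
ceph osd unset noscrub
ceph osd unset nodeep-scrub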



Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Fri, Oct 6, 2023 at 11:02 AM Simon Oosthoek 
wrote:

> On 06/10/2023 16:09, Simon Oosthoek wrote:
> > Hi
> >
> > we're still in HEALTH_ERR state with our cluster, this is the top of the
> > output of `ceph health detail`
> >
> > HEALTH_ERR 1/846829349 objects unfound (0.000%); 248 scrub errors;
> > Possible data damage: 1 pg recovery_unfound, 2 pgs inconsistent;
> > Degraded data redundancy: 6/7118781559 objects degraded (0.000%), 1 pg
> > degraded, 1 pg undersized; 63 pgs not deep-scrubbed in time; 657 pgs not
> > scrubbed in time
> > [WRN] OBJECT_UNFOUND: 1/846829349 objects unfound (0.000%)
> >  pg 26.323 has 1 unfound objects
> > [ERR] OSD_SCRUB_ERRORS: 248 scrub errors
> > [ERR] PG_DAMAGED: Possible data damage: 1 pg recovery_unfound, 2 pgs
> > inconsistent
> >  pg 26.323 is active+recovery_unfound+degraded+remapped, acting
> > [92,109,116,70,158,128,243,189,256], 1 unfound
> >  pg 26.337 is active+clean+inconsistent, acting
> > [139,137,48,126,165,89,237,199,189]
> >  pg 26.3e2 is active+clean+inconsistent, acting
> > [12,27,24,234,195,173,98,32,35]
> > [WRN] PG_DEGRADED: Degraded data redundancy: 6/7118781559 objects
> > degraded (0.000%), 1 pg degraded, 1 pg undersized
> >  pg 13.3a5 is stuck undersized for 4m, current state
> > active+undersized+remapped+backfilling, last acting
> > [2,45,32,62,2147483647,55,116,25,225,202,240]
> >  pg 26.323 is active+recovery_unfound+degraded+remapped, acting
> > [92,109,116,70,158,128,243,189,256], 1 unfound
> >
> >
> > For the PG_DAMAGED pgs I try the usual `ceph pg repair 26.323` etc.,
> > however it fails to get resolved.
> >
> > The osd.116 is already marked out and is beginning to get empty. I've
> > tried restarting the osd processes of the first osd listed for each PG,
> > but that doesn't get it resolved either.
> >
> > I guess we should have enough redundancy to get the correct data back,
> > but how can I tell ceph to fix it in order to get back to a healthy
> state?
>
> I guess this could be related to the number of scrubs going on, I read
> somewhere that this may interfere with the repair request. I would
> expect the repair would have priority over scrubs...
>
> BTW, we're running pacific for now, we want to update when the cluster
> is healthy again.
>
> Cheers
>
> /Simon
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: librbd hangs during large backfill

2023-07-18 Thread Wesley Dillingham
Did your automation / process allow for stalls in between changes to allow
peering to complete? My hunch is you caused a very large peering storm
(during peering a PG is inactive) which in turn caused your VMs to panic.
If the RBDs are unmapped and re-mapped, do they still struggle?

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Tue, Jul 18, 2023 at 11:52 AM <
fb2cd0fc-933c-4cfe-b534-93d67045a...@simplelogin.com> wrote:

> Starting on Friday, as part of adding a new pod of 12 servers, we
> initiated a reweight on roughly 384 drives; from 0.1 to 0.25. Something
> about the resulting large backfill is causing librbd to hang, requiring
> server restarts. The volumes are showing buffer i/o errors when this
> happens.We are currently using hybrid OSDs with both SSD and traditional
> spinning disks. The current status of the cluster is:
> ceph --version
> ceph version 14.2.22
> Cluster Kernel 5.4.49-200
> {
> "mon": {
> "ceph version 14.2.22 nautilus (stable)": 3
> },
> "mgr": {
> "ceph version 14.2.22 nautilus (stable)": 3
> },
> "osd": {
> "ceph version 14.2.21 nautilus (stable)": 368,
> "ceph version 14.2.22 (stable)": 2055
> },
> "mds": {},
> "rgw": {
> "ceph version 14.2.22 (stable)": 7
> },
> "overall": {
> "ceph version 14.2.21 (stable)": 368,
> "ceph version 14.2.22 (stable)": 2068
> }
> }
>
> HEALTH_WARN, noscrub,nodeep-scrub flag(s) set.
> pgs: 6815703/11016906121 objects degraded (0.062%) 2814059622/11016906121
>  objects misplaced (25.543%).
>
> The client servers are on 3.10.0-1062.1.2.el7.x86_6
>
> We have found a couple of issues that look relevant:
> https://tracker.ceph.com/issues/19385
> https://tracker.ceph.com/issues/18807
> Has anyone experienced anything like this before? Does anyone have any
> recommendations as to settings that can help alleviate this while the
> backfill completes?
> An example of the buffer ii/o errors:
>
> Jul 17 06:36:08 host8098 kernel: buffer_io_error: 22 callbacks suppressed
> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical
> block 0, async page read
> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical
> block 0, async page read
> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical
> block 0, async page read
> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical
> block 0, async page read
> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical
> block 0, async page read
> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical
> block 0, async page read
> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-4, logical
> block 3, async page read
> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-5, logical
> block 511984, async page read
> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-6, logical
> block 3487657728, async page read
> Jul 17 06:36:08 host8098 kernel: Buffer I/O error on dev dm-6, logical
> block 3487657729, async page read
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: mon log file grows huge

2023-07-10 Thread Wesley Dillingham
At what level do you have logging set for your mons? That is a high
volume of logs for the mons to generate.

You can ask all the mons to print their debug logging level with:

"ceph tell mon.* config get debug_mon"

The default is 1/5

What is the overall status of your cluster? Is it healthy?

"ceph status"

Consider implementing more aggressive log rotation

This link may prove useful:
https://docs.ceph.com/en/latest/rados/troubleshooting/log-and-debug/
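
If the levels do turn out to be elevated, something along these lines
dials the noisiest subsystems back down (a sketch only; pick the
subsystems your logs actually show, debug_mon, debug_paxos and
debug_rocksdb being the usual suspects on mons):

# check the current level
ceph tell mon.* config get debug_mon

# persistently set levels back to (or below) the defaults
ceph config set mon debug_mon 1/5
ceph config set mon debug_paxos 1/5
ceph config set mon debug_rocksdb 1/5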



Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Jul 10, 2023 at 9:44 AM Ben  wrote:

> Hi,
>
> In our cluster monitors' log grows to couple GBs in days. There are quite
> many debug message from rocksdb, osd, mgr and mds. These should not be
> necessary with a well-run cluster. How could I close these logging?
>
> Thanks,
> Ben
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph.conf and two different ceph clusters

2023-06-26 Thread Wesley Dillingham
You need to use the --id and --cluster options of the rbd command and
maintain a .conf file for each cluster.

/etc/ceph/clusterA.conf
/etc/ceph/clusterB.conf

/etc/ceph/clusterA.client.userA.keyring
/etc/ceph/clusterB.client.userB.keyring

now use the rbd commands as such:

rbd --id userA --cluster clusterA

This will cause the client to read the appropriate files
(/etc/ceph/clusterA.client.userA.keyring and /etc/ceph/clusterA.conf).
The --id and --cluster options determine which keyring and config file
are read.
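
For example (a sketch; the pool and image names here are placeholders,
not anything from your environment):

# list images in a pool on cluster A as client.userA
rbd --cluster clusterA --id userA ls poolA

# map an image from cluster B as client.userB
rbd --cluster clusterB --id userB map poolB/rbd-vol-blue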


Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Jun 26, 2023 at 9:15 AM garcetto  wrote:

> good afternoon,
>   how can i config ceph.conf file on a generic rbd client to say to use two
> different ceph clusters to access different volumes on them?
>
> ceph-cluster-left --> rbd-vol-green
> ceph-cluster-right --> rbd-vol-blue
>
> thank you.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: The pg_num from 1024 reduce to 32 spend much time, is there way to shorten the time?

2023-06-06 Thread Wesley Dillingham
Can you send along the output of "ceph df detail" and "ceph osd pool ls
detail"?

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Tue, Jun 6, 2023 at 1:03 PM Eugen Block  wrote:

> I suspect the target_max_misplaced_ratio (default 0.05). You could try
> setting it to 1 and see if it helps. This has been discussed multiple
> times on this list, check out the archives for more details.
>
> Zitat von Louis Koo :
>
> > Thanks for your responses, I want to know why it spend much time to
> > reduce the pg num?
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PGs stuck undersized and not scrubbed

2023-06-05 Thread Wesley Dillingham
When PGs are degraded they won't scrub. Further, an OSD involved in the
recovery of another PG won't accept scrubs either, so that is the likely
explanation of your not-scrubbed-in-time issue. It's of low concern.

Are you sure that recovery is not progressing? I see "7349/147534197
objects degraded"; can you check that again (maybe wait an hour) and see
if 7,349 has been reduced?

Another thing I'm noticing is that OSDs 57 and 79 are the primary for many
of the degraded PGs. They could use a service restart.
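
A sketch of both checks (assuming non-cephadm, systemd-managed OSDs;
adjust the unit names to your deployment):

# watch whether the degraded object count is actually dropping
ceph -s | grep degraded

# restart the suspect primaries one at a time, letting peering settle
# in between
systemctl restart ceph-osd@57
ceph pg stat
systemctl restart ceph-osd@79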

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Jun 5, 2023 at 12:01 PM Nicola Mori  wrote:

> Dear Ceph users,
>
> after an outage and recovery of one machine I have several PGs stuck in
> active+recovering+undersized+degraded+remapped. Furthermore, many PGs
> have not been (deep-)scrubbed in time. See below for status and health
> details.
> It's been like this for two days, with no recovery I/O being reported,
> so I guess something is stuck in a bad state. I'd need some help in
> understanding what's going on here and how to fix it.
> Thanks,
>
> Nicola
>
> -
>
> # ceph -s
>cluster:
>  id: b1029256-7bb3-11ec-a8ce-ac1f6b627b45
>  health: HEALTH_WARN
>  2 OSD(s) have spurious read errors
>  Degraded data redundancy: 7349/147534197 objects degraded
> (0.005%), 22 pgs degraded, 22 pgs undersized
>  332 pgs not deep-scrubbed in time
>  503 pgs not scrubbed in time
>  (muted: OSD_SLOW_PING_TIME_BACK OSD_SLOW_PING_TIME_FRONT)
>
>services:
>  mon: 5 daemons, quorum bofur,balin,aka,romolo,dwalin (age 2d)
>  mgr: bofur.tklnrn(active, since 32h), standbys: balin.hvunfe,
> aka.wzystq
>  mds: 2/2 daemons up, 1 standby
>  osd: 104 osds: 104 up (since 37h), 104 in (since 37h); 22 remapped pgs
>
>data:
>  volumes: 1/1 healthy
>  pools:   3 pools, 529 pgs
>  objects: 18.53M objects, 40 TiB
>  usage:   54 TiB used, 142 TiB / 196 TiB avail
>  pgs: 7349/147534197 objects degraded (0.005%)
>   2715/147534197 objects misplaced (0.002%)
>   507 active+clean
>   20  active+recovering+undersized+degraded+remapped
>   2   active+recovery_wait+undersized+degraded+remapped
>
> # ceph health detail
> [WRN] PG_DEGRADED: Degraded data redundancy: 7349/147534197 objects
> degraded (0.005%), 22 pgs degraded, 22 pgs undersized
>  pg 3.2c is stuck undersized for 37h, current state
> active+recovery_wait+undersized+degraded+remapped, last acting
> [79,83,34,37,65,NONE,18,95]
>  pg 3.57 is stuck undersized for 2d, current state
> active+recovering+undersized+degraded+remapped, last acting
> [57,99,37,NONE,15,104,55,40]
>  pg 3.76 is stuck undersized for 2d, current state
> active+recovering+undersized+degraded+remapped, last acting
> [57,5,37,15,100,33,85,NONE]
>  pg 3.9c is stuck undersized for 2d, current state
> active+recovering+undersized+degraded+remapped, last acting
> [57,86,88,NONE,11,69,20,10]
>  pg 3.106 is stuck undersized for 2d, current state
> active+recovering+undersized+degraded+remapped, last acting
> [79,15,89,NONE,36,32,23,64]
>  pg 3.107 is stuck undersized for 2d, current state
> active+recovering+undersized+degraded+remapped, last acting
> [79,NONE,64,20,61,92,104,43]
>  pg 3.10c is stuck undersized for 37h, current state
> active+recovery_wait+undersized+degraded+remapped, last acting
> [79,34,NONE,95,104,16,69,18]
>  pg 3.11e is stuck undersized for 2d, current state
> active+recovering+undersized+degraded+remapped, last acting
> [79,89,64,46,32,NONE,40,15]
>  pg 3.14e is stuck undersized for 37h, current state
> active+recovering+undersized+degraded+remapped, last acting
> [57,34,69,97,85,NONE,46,62]
>  pg 3.160 is stuck undersized for 2d, current state
> active+recovering+undersized+degraded+remapped, last acting
> [57,1,101,84,18,33,NONE,69]
>  pg 3.16a is stuck undersized for 37h, current state
> active+recovering+undersized+degraded+remapped, last acting
> [57,16,59,103,13,38,49,NONE]
>  pg 3.16e is stuck undersized for 2d, current state
> active+recovering+undersized+degraded+remapped, last acting
> [57,0,27,96,55,10,81,NONE]
>  pg 3.170 is stuck undersized for 2d, current state
> active+recovering+undersized+degraded+remapped, last acting
> [NONE,57,14,46,55,99,15,40]
>  pg 3.19b is stuck undersized for 2d, current state
> active+recovering+undersized+degraded+remapped, last acting
> [NONE,79,59,8,32,17,7,90]
>  pg 3.1a0 is stuck undersized for 2d, current state
> active+recovering+undersized+degraded+remapped, last acting
> [NONE,79,26,50,104,24,97,40]
>  pg 3.1a5 is stuck undersized for 37h, current state
> active+recovering+undersized+degraded+remapped, last acting
> [57,100,61,27,20,NONE,24,85]
>  pg 3.1a8 is stuck undersized for 2d, current state
> 

[ceph-users] Re: `ceph features` on Nautilus still reports "luminous"

2023-05-25 Thread Wesley Dillingham
Fairly confident this is normal. I just checked a Pacific cluster and its
daemons all report luminous as well. Part of the backstory is that Luminous
is the release where upmaps were introduced, and there hasn't been a reason
to increment the feature release of subsequent daemons.

To be honest I am not confident that "ceph osd
set-require-min-compat-client nautilus" is a necessary step for you. What
prompted you to run that command?

That step is not listed here:
https://docs.ceph.com/en/latest/releases/nautilus/#upgrading-from-mimic-or-luminous

but it's been a while since I've operated a pre-Nautilus release.
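
If you want to compare what the cluster currently requires with what your
connected clients actually advertise, these read-only commands should do
it:

# what the cluster requires of clients
ceph osd dump | grep min_compat_client

# what connected clients and daemons report
ceph features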

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Thu, May 25, 2023 at 3:14 PM Oliver Schmidt  wrote:

> Hi Marc,
>
> >
> > I think for an upgrade the rocksdb is necessary. Check this for your
> monitors
> >
> > cat /var/lib/ceph/mon/ceph-a/kv_backend
>
> Thanks, but I already had migrated all mons to use rocksdb when upgrading
> to Luminous.
>
> ~ # cat /srv/ceph/mon/ceph-host1/kv_backend
> rocksdb
>
> Is this what you expect here?
>
> Best regards
> Oliver
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph Pacific - MDS activity freezes when one the MDSs is restarted

2023-05-24 Thread Wesley Dillingham
There was a memory issue with standby-replay that may have been resolved
since; the fix may be in 16.2.10 (not sure). The suggestion at the time was
to avoid standby-replay.

Perhaps a dev can chime in on that status. Your MDSs look pretty inactive.
I would consider scaling them down (potentially to single active if your
workload allows).

The MDSs have an intricate upgrade process when you use multiple active
daemons; make sure to read the docs on that if you aren't using cephadm and
want to attempt an upgrade.

A standby-replay daemon can only take over a single rank (it tracks a
single active MDS), whereas a regular standby can take over any rank. More
here:
https://docs.ceph.com/en/latest/cephfs/standby/#configuring-standby-replay
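
If you do decide to drop standby-replay and/or scale down the active
count, a sketch (the filesystem name "cephfs" is a placeholder for yours):

# stop using standby-replay daemons for this filesystem
ceph fs set cephfs allow_standby_replay false

# reduce the number of active ranks; the extra ranks are stopped
# automatically
ceph fs set cephfs max_mds 3

# watch the ranks wind down
ceph fs status cephfs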

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Wed, May 24, 2023 at 10:33 AM Eugen Block  wrote:

> Hi,
>
> using standby-replay daemons is something to test as it can have a
> negative impact, it really depends on the actual workload. We stopped
> using standby-replay in all clusters we (help) maintain, in one
> specific case with many active MDSs and a high load the failover time
> decreased and was "cleaner" for the client application.
> Also, do you know why you use a multi-active MDS setup? Was that a
> requirement for subtree pinning (otherwise multiple active daemons
> would balance the hell out of each other) or maybe just an experiment?
> Depending on the workload pinning might have been necessary, maybe you
> would impact performance if you removed 3 MDS daemons? As an
> alternative you can also deploy multiple MDS daemons per host
> (count_per_host) which can utilize the server better, not sure which
> Pacific version that is, I just tried successfully on 16.2.13. That
> way you could still maintain the required number of MDS daemons (if
> it's still 7 ) and also have enough standby daemons. But that of
> course means in case one MDS host goes down all it's daemons will also
> be unavailable. But we used this feature in an older version
> (customized Nautilus) quite successfully in a customer cluster.
> There are many things to consider here, just wanted to share a couple
> of thoughts.
>
> Regards,
> Eugen
>
> Zitat von Hector Martin :
>
> > Hi,
> >
> > On 24/05/2023 22.02, Emmanuel Jaep wrote:
> >> Hi Hector,
> >>
> >> thank you very much for the detailed explanation and link to the
> >> documentation.
> >>
> >> Given our current situation (7 active MDSs and 1 standby MDS):
> >> RANK  STATE  MDS ACTIVITY DNSINOS   DIRS   CAPS
> >>  0active  icadmin012  Reqs:   82 /s  2345k  2288k  97.2k   307k
> >>  1active  icadmin008  Reqs:  194 /s  3789k  3789k  17.1k   641k
> >>  2active  icadmin007  Reqs:   94 /s  5823k  5369k   150k   257k
> >>  3active  icadmin014  Reqs:  103 /s   813k   796k  47.4k   163k
> >>  4active  icadmin013  Reqs:   81 /s  3815k  3798k  12.9k   186k
> >>  5active  icadmin011  Reqs:   84 /s   493k   489k  9145176k
> >>  6active  icadmin015  Reqs:  374 /s  1741k  1669k  28.1k   246k
> >>   POOL TYPE USED  AVAIL
> >> cephfs_metadata  metadata  8547G  25.2T
> >>   cephfs_data  data 223T  25.2T
> >> STANDBY MDS
> >>  icadmin006
> >>
> >> I would probably be better off having:
> >>
> >>1. having only 3 active MDSs (rank 0 to 2)
> >>2. configure 3 standby-replay to mirror the ranks 0 to 2
> >>3. have 2 'regular' standby MDSs
> >>
> >> Of course, this raises the question of storage and performance.
> >>
> >> Since I would be moving from 7 active MDSs to 3:
> >>
> >>1. each new active MDS will have to store more than twice the data
> >>2. the load will be more than twice as high
> >>
> >> Am I correct?
> >
> > Yes, that is correct. The MDSes don't store data locally but do
> > cache/maintain it in memory, so you will either have higher memory load
> > for the same effective cache size, or a lower cache size for the same
> > memory load.
> >
> > If you have 8 total MDSes, I'd go for 4+4. You don't need non-replay
> > standbys if you have a standby replay for each active MDS. As far as I
> > know, if you end up with an active and its standby both failing, some
> > other standby-replay MDS will still be stolen to take care of that rank,
> > so the cluster will eventually become healthy again after the replay
> time.
> >
> > With 4 active MDSes down from the current 7, the load per MDS will be a
> > bit less than double.
> >
> >>
> >> Emmanuel
> >>
> >> On Wed, May 24, 2023 at 2:31 PM Hector Martin  wrote:
> >>
> >>> On 24/05/2023 21.15, Emmanuel Jaep wrote:
>  Hi,
> 
>  we are currently running a ceph fs cluster at the following version:
>  MDS version: ceph version 16.2.10
>  (45fa1a083152e41a408d15505f594ec5f1b4fe17) pacific (stable)
> 
>  The cluster is composed of 7 active MDSs and 1 standby MDS:
>  RANK  STATE  MDS ACTIVITY DNSINOS   DIRS   CAPS
>   0active  

[ceph-users] Re: Upgrade Ceph cluster + radosgw from 14.2.18 to latest 15

2023-05-15 Thread Wesley Dillingham
I have upgraded dozens of clusters 14 -> 16 using the methods described in
the docs, and when they are followed precisely no issues have arisen. I
would suggest moving to a release that is still receiving backports
(Pacific or Quincy). The important aspect is doing only one system at a
time: for monitors, ensure each one rejoins quorum after restarting on the
new version before proceeding to the next mon; for OSDs, wait for all PGs
to be active+clean before proceeding to the next host.
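
A sketch of the checks I use at each step (package-based, non-cephadm
deployment assumed):

# after restarting each mon on the new version, confirm it rejoined quorum
ceph mon stat
ceph versions

# after each OSD host, wait for full recovery before moving on
ceph -s
ceph pg stat    # all PGs back to active+clean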

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, May 15, 2023 at 3:46 AM Marc  wrote:

> why are you still not on 14.2.22?
>
> >
> > Yes, the documents show an example of upgrading from Nautilus to
> > Pacific. But I'm not really 100% trusting the Ceph documents, and I'm
> > also afraid of what if Nautilus is not compatible with Pacific in some
> > operations of monitor or osd =)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Upgrade Ceph cluster + radosgw from 14.2.18 to latest 15

2023-05-09 Thread Wesley Dillingham
Curious, why not go to Pacific? You can upgrade up to two major releases in
one go.

The upgrade process to pacific is here:
https://docs.ceph.com/en/latest/releases/pacific/#upgrading-non-cephadm-clusters
The upgrade to Octopus is here:
https://docs.ceph.com/en/latest/releases/octopus/#upgrading-from-mimic-or-nautilus

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Tue, May 9, 2023 at 3:25 AM Marc  wrote:

> >
> > Hi, I want to upgrade my old Ceph cluster + Radosgw from v14 to v15. But
> > I'm not using cephadm and I'm not sure how to limit errors as much as
> > possible during the upgrade process?
>
> Maybe check the changelog, check upgrading notes, and continuosly monitor
> the mailing list?
> I have to do the same upgrade and eg. I need to recreate one monitor so it
> has the rocksdb before upgrading.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph recovery

2023-05-01 Thread Wesley Dillingham
Assuming size=3 and min_size=2, it will run degraded (read/write capable)
until a third host becomes available, at which point it will backfill the
third copy onto that host. It cannot create the third copy of the data if
no third host exists. If an additional host is lost, the data will become
inactive+degraded (below min_size) and will be unavailable for use. Data
will not be lost, though, assuming no further failures occur beyond the two
full hosts, and if the second and third hosts come back the data will
recover. It is always best to have an additional host beyond the size
setting for this reason.
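
To see what a given pool is actually configured with (replace "mypool"
with your pool name):

ceph osd pool get mypool size
ceph osd pool get mypool min_size

# the failure domain comes from the pool's CRUSH rule
ceph osd pool get mypool crush_rule
ceph osd crush rule dump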

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, May 1, 2023 at 11:34 AM wodel youchi  wrote:

> Hi,
>
> When creating a ceph cluster, a failover domain is created, and by default
> it uses host as a minimal domain, that domain can be modified to chassis,
> or rack, ...etc.
>
> My question is :
> Suppose I have three osd nodes, my replication is 3 and my failover domain
> is host, which means that each copy of data is stored on a different node.
>
> What happens when one node crashes, does Ceph use the remaining free space
> on the other two to create the third copy, or the ceph cluster will run in
> degraded mode, like a RAID5
>  which lost a disk.
>
> Regards.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: For suggestions and best practices on expanding Ceph cluster and removing old nodes

2023-04-25 Thread Wesley Dillingham
Get on Nautilus first (and perhaps even go to Pacific) before expansion,
primarily because starting in Nautilus degraded data recovery is
prioritized over remapped data recovery. As you phase out old hardware and
phase in new hardware you will have a very large amount of backfill
happening, and if you get into a degraded state in the middle of that
backfill it will take much longer for the degraded data to become clean
again.

Additionally, you will want to follow the best practice of updating your
cluster in order. In short: monitors, then managers, then OSDs, then MDS
and RGW, then other clients. More details here:
https://docs.ceph.com/en/latest/releases/nautilus/#upgrading-from-mimic-or-luminous

You don't want to run a mixed-version cluster for longer than a
well-coordinated upgrade takes.
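
Two related checks/steps (the second is the standard post-upgrade step
from the Nautilus upgrade notes):

# confirm no mixed versions remain once the upgrade is finished
ceph versions

# only after every OSD has been restarted on Nautilus
ceph osd require-osd-release nautilus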

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Tue, Apr 25, 2023 at 12:31 PM huxia...@horebdata.cn <
huxia...@horebdata.cn> wrote:

> Dear Ceph folks,
>
> I would like to listen to your advice on the following topic: We have a
> 6-node Ceph cluster (for RGW usage only ) running on Luminous 12.2.12, and
> now will add 10 new nodes. Our plan is to phase out the old 6 nodes, and
> run RGW Ceph cluster with the new 10 nodes on Nautilus version。
>
> I can think of two ways to achieve the above goal. The first method would
> be:   1) Upgrade the current 6-node cluster from Luminous 12.2.12 to
> Nautilus 14.2.22;  2) Expand the cluster with the 10 new nodes, and then
> re-balance;  3) After rebalance completes, remove the 6 old nodes from the
> cluster
>
> The second method would get rid of the procedure to upgrade the old 6-node
> from Luminous to Nautilus, because those 6 nodes will be phased out anyway,
> but then we have to deal with a hybrid cluster with 6-node on Luminous
> 12.2.12, and 10-node on Nautilus, and after re-balancing, we can remove the
> 6 old nodes from the cluster.
>
> Any suggestions, advice, or best practice would be highly appreciated.
>
> best regards,
>
>
> Samuel
>
>
>
> huxia...@horebdata.cn
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: v16.2.12 Pacific (hot-fix) released

2023-04-24 Thread Wesley Dillingham
A few questions:

- Will the 16.2.12 packages be "corrected" and reuploaded to the ceph.com
mirror? or will 16.2.13 become what 16.2.12 was supposed to be?

- Was the osd activation regression introduced in 16.2.11 (or does 16.2.10
have it as well)?

- Were the hotfixes in 16.2.12 just related to perf / time-to-activation,
or was there a total failure to activate / other breaking issue?

- Which version of Pacific is recommended at this time?

Thank you very much.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Apr 24, 2023 at 3:16 AM Simon Oosthoek 
wrote:

> Dear List
>
> we upgraded to 16.2.12 on April 17th, since then we've seen some
> unexplained downed osd services in our cluster (264 osds), is there any
> risk of data loss, if so, would it be possible to downgrade or is a fix
> expected soon? if so, when? ;-)
>
> FYI, we are running a cluster without cephadm, installed from packages.
>
> Cheers
>
> /Simon
>
> On 23/04/2023 03:03, Yuri Weinstein wrote:
> > We are writing to inform you that Pacific v16.2.12, released on April
> > 14th, has many unintended commits in the changelog than listed in the
> > release notes [1].
> >
> > As these extra commits are not fully tested, we request that all users
> > please refrain from upgrading to v16.2.12 at this time. The current
> > v16.2.12 will be QE validated and released as soon as possible.
> >
> > v16.2.12 was a hotfix release meant to resolve several performance
> > flaws in ceph-volume, particularly during osd activation. The extra
> > commits target v16.2.13.
> >
> > We apologize for the inconvenience. Please reach out to the mailing
> > list with any questions.
> >
> > [1]
> https://urldefense.com/v3/__https://ceph.io/en/news/blog/2023/v16-2-12-pacific-released/__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fNaPJ0M8$
> >
> > On Fri, Apr 14, 2023 at 9:42 AM Yuri Weinstein 
> wrote:
> >>
> >> We're happy to announce the 12th hot-fix release in the Pacific series.
> >>
> >>
> https://urldefense.com/v3/__https://ceph.io/en/news/blog/2023/v16-2-12-pacific-released/__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fNaPJ0M8$
> >>
> >> Notable Changes
> >> ---
> >> This is a hotfix release that resolves several performance flaws in
> ceph-volume,
> >> particularly during osd activation (
> https://urldefense.com/v3/__https://tracker.ceph.com/issues/57627__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fg0yeu7U$
> )
> >> Getting Ceph
> >>
> >> 
> >> * Git at git://github.com/ceph/ceph.git
> >> * Tarball at
> https://urldefense.com/v3/__https://download.ceph.com/tarballs/ceph-16.2.12.tar.gz__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fBEJl5p4$
> >> * Containers at
> https://urldefense.com/v3/__https://quay.io/repository/ceph/ceph__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fc7HeSms$
> >> * For packages, see
> https://urldefense.com/v3/__https://docs.ceph.com/en/latest/install/get-packages/__;!!HJOPV4FYYWzcc1jazlU!-OuIFoOFfOQDsz4abuBV7neIEO7j0XkOM1YBEIhz_IYTdUAIMuO9upMHj_R8bAFFrWQ8OBHwS6x4I5-fAKdWZK4$
> >> * Release git sha1: 5a2d516ce4b134bfafc80c4274532ac0d56fc1e2
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: deep scrub and long backfilling

2023-03-05 Thread Wesley Dillingham
In general it is safe, and during long-running remapping and backfill
situations I enable it. You can enable it with:

 "ceph config set osd osd_scrub_during_recovery true"

If you have any problems you think are caused by the change, undo it:

Stop scrubs asap:
"ceph osd set nodeep-scrub"
"ceph osd set noscrub"

reinstate the previous value:
 "ceph config set osd osd_scrub_during_recovery false"

Once things stabilize unset the no scrub flags to resume normal scrub
operations:

"ceph osd unset nodeep-scrub"
"ceph osd unset noscrub"



Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Sat, Mar 4, 2023 at 3:07 AM Janne Johansson  wrote:

> Den lör 4 mars 2023 kl 08:08 skrev :
> > ceph 16.2.11,
> > is safe to enable scrub and deep scrub during backfilling ?
> > I have log recovery-backfilling due to a new crushmap , backfilling is
> going slow and deep scrub interval as expired so I have many pgs  not
> deep-scrubbed in time.
>
> It is safe to have it enabled, scrubs will skip the PGs currently
> being backfilled.
> It will put some extra load on the cluster, but for most clusters,
> scrubs are always on by default.
>
> --
> May the most significant bit of your life be positive.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Very slow snaptrim operations blocking client I/O

2023-01-27 Thread Wesley Dillingham
I hit this issue once on a Nautilus cluster and changed the OSD parameter
bluefs_buffered_io to true (it was set to false). I believe the default of
this parameter was switched from false to true in release 14.2.20; still,
perhaps you could check what your OSDs are configured with in regard to
this item.
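
A sketch of how to check and, if needed, change it (the OSDs may need a
restart to pick up the new value):

# what a given OSD is currently running with
ceph config show osd.0 bluefs_buffered_io

# set it for all OSDs
ceph config set osd bluefs_buffered_io true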

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Fri, Jan 27, 2023 at 8:52 AM Victor Rodriguez 
wrote:

> Hello,
>
> Asking for help with an issue. Maybe someone has a clue about what's
> going on.
>
> Using ceph 15.2.17 on Proxmox 7.3. A big VM had a snapshot and I removed
> it. A bit later, nearly half of the PGs of the pool entered snaptrim and
> snaptrim_wait state, as expected. The problem is that such operations
> ran extremely slow and client I/O was nearly nothing, so all VMs in the
> cluster got stuck as they could not I/O to the storage. Taking and
> removing big snapshots is a normal operation that we do often and this
> is the first time I see this issue in any of my clusters.
>
> Disks are all Samsung PM1733 and network is 25G. It gives us plenty of
> performance for the use case and never had an issue with the hardware.
>
> Both disk I/O and network I/O was very low. Still, client I/O seemed to
> get queued forever. Disabling snaptrim (ceph osd set nosnaptrim) stops
> any active snaptrim operation and client I/O resumes back to normal.
> Enabling snaptrim again makes client I/O to almost halt again.
>
> I've been playing with some settings:
>
> ceph tell 'osd.*' injectargs '--osd-max-trimming-pgs 1'
> ceph tell 'osd.*' injectargs '--osd-snap-trim-sleep 30'
> ceph tell 'osd.*' injectargs '--osd-snap-trim-sleep-ssd 30'
> ceph tell 'osd.*' injectargs '--osd-pg-max-concurrent-snap-trims 1'
>
> None really seemed to help. Also tried restarting OSD services.
>
> This cluster was upgraded from 14.2.x to 15.2.17 a couple of months. Is
> there any setting that must be changed which may cause this problem?
>
> I have scheduled a maintenance window, what should I look for to
> diagnose this problem?
>
> Any help is very appreciated. Thanks in advance.
>
> Victor
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Odd 10-minute delay before recovery IO begins

2022-12-05 Thread Wesley Dillingham
I think you are experiencing the mon_osd_down_out_interval

https://docs.ceph.com/en/latest/rados/configuration/mon-osd-interaction/#confval-mon_osd_down_out_interval

Ceph waits 10 minutes before marking a down OSD as out, for the reasons you
mention, but this would have been the case in Nautilus as well.
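
You can confirm (and, if you really want, change) it like so; the value is
in seconds, and this is just a sketch, not a recommendation to shorten it:

# current value (default 600, i.e. 10 minutes)
ceph config get mon mon_osd_down_out_interval

# example: mark down OSDs out after 5 minutes instead
ceph config set mon mon_osd_down_out_interval 300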

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Dec 5, 2022 at 5:20 PM Sean Matheny 
wrote:

> Hi all,
>
> New Quincy cluster here that I'm just running through some benchmarks
> against:
>
> ceph version 17.2.3 (dff484dfc9e19a9819f375586300b3b79d80034d) quincy
> (stable)
> 11 nodes of 24x 18TB HDD OSDs, 2x 2.9TB SSD OSDs
>
> I'm seeing a delay of almost exactly 10 minutes when I remove an OSD/node
> from the cluster until actual recovery IO begins. This is much different
> behaviour that what I'm used to in Nautilus previously, where recovery IO
> would commence within seconds. Downed OSDs are reflected in ceph health
> within a few seconds (as expected), and affected PGs show as undersized a
> few seconds later (as expected). I guess this 10-minute delay may even be a
> feature-- accidentally rebooting a node before setting recovery flags would
> prevent rebalancing, for example. Just thought it was worth asking in case
> it's a bug or something to look deeper into.
>
> I've read through the OSD config and all of my recovery tuneables look ok,
> for example:
> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/
>
> [ceph: root@ /]# ceph config get osd osd_recovery_delay_start
> 0.00
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep
> 0.00
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hdd
> 0.10
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_ssd
> 0.00
> [ceph: root@ /]# ceph config get osd osd_recovery_sleep_hybrid
> 0.025000
>
> Thanks in advance.
>
> Ngā mihi,
>
> Sean Matheny
> HPC Cloud Platform DevOps Lead
> New Zealand eScience Infrastructure (NeSI)
>
> e: sean.math...@nesi.org.nz
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] subdirectory pinning and reducing ranks / max_mds

2022-10-21 Thread Wesley Dillingham
In a situation where you have say 3 active MDS (and 3 standbys).
You have 3 ranks, 0,1,2
In your filesystem you have three directories at the root level [/a, /b, /c]

you pin:
/a to rank 0
/b to rank 1
/c to rank 2

and you need to upgrade your Ceph version. When it becomes time to reduce
max_mds to 1, and thereby reduce the number of ranks to just rank 0, what
happens to directories /b and /c? Do they become unavailable between the
time when max_mds is reduced to 1 and when, after the upgrade, max_mds is
restored to 3? Alternatively, if a rank disappears, does the CephFS client
understand this and begin to ignore the pinned rank and make use of the
remaining ranks? Thanks.
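
For context, the pinning in the scenario above would be done with extended
attributes along these lines (paths assume the filesystem is mounted at
/mnt/cephfs):

setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/a
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/b
setfattr -n ceph.dir.pin -v 2 /mnt/cephfs/c

# a value of -1 removes the pin and returns the subtree to the balancer
# setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/b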

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to determine if a filesystem is allow_standby_replay = true

2022-10-20 Thread Wesley Dillingham
Thanks Dhairya, what version are you using? I am on 16.2.10.

[root@alma3-4 ~]# ceph fs dump | grep -i replay
dumped fsmap epoch 90
[mds.alma3-6{0:10340349} state up:standby-replay seq 1 addr [v2:
10.0.24.6:6803/937383171,v1:10.0.24.6:6818/937383171] compat
{c=[1],r=[1],i=[7ff]}]

As you can see, I have an MDS in standby-replay state and standby-replay is
enabled, but my output is different from yours.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Thu, Oct 20, 2022 at 2:43 PM Dhairya Parmar  wrote:

> Hi Wesley,
>
> You can find if the `allow_standby_replay` is turned on or off by looking
> at the fs dump,
> run `ceph fs dump | grep allow_standby_replay` and if it is turned on you
> will find something like:
>
> $ ./bin/ceph fs dump | grep allow_standby_replay
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> 2022-10-21T00:06:14.656+0530 7fbed4fc3640 -1 WARNING: all dangerous and
> experimental features are enabled.
> 2022-10-21T00:06:14.663+0530 7fbed4fc3640 -1 WARNING: all dangerous and
> experimental features are enabled.
> dumped fsmap epoch 8
> flags 32 joinable allow_snaps allow_multimds_snaps *allow_standby_replay*
>
> turn it to false and it will be gone:
>
> $ ./bin/ceph fs set a allow_standby_replay false
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> 2022-10-21T00:10:38.668+0530 7f68b66f0640 -1 WARNING: all dangerous and
> experimental features are enabled.
> 2022-10-21T00:10:38.675+0530 7f68b66f0640 -1 WARNING: all dangerous and
> experimental features are enabled.
> $ ./bin/ceph fs dump | grep allow_standby_replay
> *** DEVELOPER MODE: setting PATH, PYTHONPATH and LD_LIBRARY_PATH ***
> 2022-10-21T00:10:43.938+0530 7fe6b3e7a640 -1 WARNING: all dangerous and
> experimental features are enabled.
> 2022-10-21T00:10:43.945+0530 7fe6b3e7a640 -1 WARNING: all dangerous and
> experimental features are enabled.
> dumped fsmap epoch 15
>
> Hope it helps.
>
>
> On Thu, Oct 20, 2022 at 11:09 PM Wesley Dillingham 
> wrote:
>
>> I am building some automation for version upgrades of MDS and part of the
>> process I would like to determine if a filesystem has allow_standby_replay
>> set to true and if so then disable it. Granted I could just issue: "ceph
>> fs
>> set MyFS allow_standby_replay false" and be done with it but Its got me
>> curious that there is not the equivalent command: "ceph fs get MyFS
>> allow_standby_replay" to check this information. So where can an operator
>> determine this?
>>
>> I tried a diff of "ceph fs get MyFS" with this configurable in both true
>> and false and found:
>>
>> diff /tmp/true /tmp/false
>> 3,4c3,4
>> < epoch 66
>> < flags 32
>> ---
>> > epoch 67
>> > flags 12
>>
>> and Im guessing this information is encoded  in the "flags" field. I am
>> working with 16.2.10. Thanks.
>>
>> Respectfully,
>>
>> *Wes Dillingham*
>> w...@wesdillingham.com
>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
>
> --
> *Dhairya Parmar*
>
> He/Him/His
>
> Associate Software Engineer, CephFS
>
> Red Hat Inc. <https://www.redhat.com/>
>
> dpar...@redhat.com
> <https://www.redhat.com/>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] How to determine if a filesystem is allow_standby_replay = true

2022-10-20 Thread Wesley Dillingham
I am building some automation for version upgrades of MDS, and as part of
the process I would like to determine whether a filesystem has
allow_standby_replay set to true and, if so, disable it. Granted, I could
just issue "ceph fs set MyFS allow_standby_replay false" and be done with
it, but it's got me curious that there is no equivalent command "ceph fs
get MyFS allow_standby_replay" to check this information. So where can an
operator determine this?

I tried a diff of "ceph fs get MyFS" with this configurable in both true
and false and found:

diff /tmp/true /tmp/false
3,4c3,4
< epoch 66
< flags 32
---
> epoch 67
> flags 12

and I'm guessing this information is encoded in the "flags" field. I am
working with 16.2.10. Thanks.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can't delete or unprotect snapshot with rbd

2022-10-06 Thread Wesley Dillingham
Anything in the trash?

"rbd trash ls images"

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Thu, Oct 6, 2022 at 3:29 PM Niklas Jakobsson <
niklas.jakobs...@kindredgroup.com> wrote:

> A yes, sorry about that. I actually have the issue on two images and I
> seem to have mixed them up when I was putting together the example, here is
> a correct one:
>
> # rbd info images/f3f4c73f-2eec-4af1-9bdf-4974a747607b
> rbd image 'f3f4c73f-2eec-4af1-9bdf-4974a747607b':
> size 8 GiB in 1024 objects
> order 23 (8 MiB objects)
> snapshot_count: 1
> id: 1ae08970c6321e
> block_name_prefix: rbd_data.1ae08970c6321e
> format: 2
> features: layering, exclusive-lock, object-map, fast-diff,
> deep-flatten
> op_features:
> flags:
> create_timestamp: Mon Apr  1 14:01:57 2019
> access_timestamp: Thu Jun 23 11:23:11 2022
> # rbd snap ls images/f3f4c73f-2eec-4af1-9bdf-4974a747607b
> SNAPID  NAME  SIZE   PROTECTED  TIMESTAMP
> 40  snap  8 GiB  yesMon Apr  1 14:03:55 2019
> # rbd snap rm images/f3f4c73f-2eec-4af1-9bdf-4974a747607b@snap
> Removing snap: 0% complete...failed.
> rbd: snapshot 'snap' is protected from removal.
> # rbd snap unprotect images/f3f4c73f-2eec-4af1-9bdf-4974a747607b@snap
> rbd: unprotecting snap failed: (16) Device or resource busy
> # rbd children images/f3f4c73f-2eec-4af1-9bdf-4974a747607b@snap
> rbd: listing children failed: (2) No such file or directory
>
>  /Niklas
>
> 
> From: Wesley Dillingham 
> Sent: Thursday, October 6, 2022 20:11
> To: Niklas Jakobsson 
> Cc: ceph-users@ceph.io 
> Subject: Re: [ceph-users] Can't delete or unprotect snapshot with rbd
>
>
> [EXTERNAL]
>
> You are demo'ing two RBDs here:
> images/f3f4c73f-2eec-4af1-9bdf-4974a747607b seems to have 1 snapshot yet
> later when you try to interact with the snapshot you are doing so with a
> different rbd/image altogether: images/1fcfaa6b-eba0-4c75-b77d-d5b3ab4538a9
>
>
> Respectfully,
>
> Wes Dillingham
> w...@wesdillingham.com<mailto:w...@wesdillingham.com>
> LinkedIn<
> https://eur02.safelinks.protection.outlook.com/?url=http%3A%2F%2Fwww.linkedin.com%2Fin%2Fwesleydillingham=05%7C01%7CNiklas.Jakobsson%40kindredgroup.com%7C81206dd028554c3579cb08daa7c64921%7C82ff090d4ac0439f834a0c3f3d5f33ce%7C1%7C1%7C638006767323871099%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000%7C%7C%7C=rw0%2BWJSIXMJw9HH85L0X9OcrUxARmWiWZvUrnIzLy24%3D=0
> >
>
>
> On Thu, Oct 6, 2022 at 9:13 AM Niklas Jakobsson <
> niklas.jakobs...@kindredgroup.com<mailto:niklas.jakobs...@kindredgroup.com>>
> wrote:
> Hi,
>
> I have an issue with a rbd image that I can't delete.
>
> I have tried this:
> # rbd info images/f3f4c73f-2eec-4af1-9bdf-4974a747607b@snap
> rbd image 'f3f4c73f-2eec-4af1-9bdf-4974a747607b':
> size 8 GiB in 1024 objects
> order 23 (8 MiB objects)
> snapshot_count: 1
> id: 1ae08970c6321e
> block_name_prefix: rbd_data.1ae08970c6321e
> format: 2
> features: layering, exclusive-lock, object-map, fast-diff,
> deep-flatten
> op_features:
> flags:
> create_timestamp: Mon Apr  1 14:01:57 2019
> access_timestamp: Thu Jun 23 11:23:11 2022
> protected: True
> # rbd snap ls images/f3f4c73f-2eec-4af1-9bdf-4974a747607b
> SNAPID  NAME  SIZE   PROTECTED  TIMESTAMP
> 40  snap  8 GiB  yesMon Apr  1 14:03:55 2019
> # rbd children images/1fcfaa6b-eba0-4c75-b77d-d5b3ab4538a9@snap
> rbd: listing children failed: (2) No such file or directory
> # rbd snap unprotect images/1fcfaa6b-eba0-4c75-b77d-d5b3ab4538a9@snap
> rbd: unprotecting snap failed: (16) Device or resource busy
> ---
>
> So, it looks like something is still referencing that snapshot but the
> reference seems broken.
>
> Any advice here would be helpful, Thanks!
>
>  /Niklas
>
>
> The content of this email is confidential and intended for the recipient
> specified in message only. If you have received it in error, please notify
> us immediately by replying to this e-mail and then follow with its
> deletion. Please do not copy it or use it or disclose its contents to any
> third parties. Thank you for your cooperation.
>
>
> Classified as General
> ___
> ceph-users mailing list -- ceph-users@ceph.io<mailto:ceph-users@ceph.io>
> To unsubscribe send an email to ceph-users-le...@ceph.io ceph-users-le...@ceph.io>
>
>
> Classified as General
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Can't delete or unprotect snapshot with rbd

2022-10-06 Thread Wesley Dillingham
You are demoing two RBDs here: images/f3f4c73f-2eec-4af1-9bdf-4974a747607b
seems to have 1 snapshot, yet later, when you try to interact with the
snapshot, you are doing so with a different RBD image altogether:
images/1fcfaa6b-eba0-4c75-b77d-d5b3ab4538a9


Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Thu, Oct 6, 2022 at 9:13 AM Niklas Jakobsson <
niklas.jakobs...@kindredgroup.com> wrote:

> Hi,
>
> I have an issue with a rbd image that I can't delete.
>
> I have tried this:
> # rbd info images/f3f4c73f-2eec-4af1-9bdf-4974a747607b@snap
> rbd image 'f3f4c73f-2eec-4af1-9bdf-4974a747607b':
> size 8 GiB in 1024 objects
> order 23 (8 MiB objects)
> snapshot_count: 1
> id: 1ae08970c6321e
> block_name_prefix: rbd_data.1ae08970c6321e
> format: 2
> features: layering, exclusive-lock, object-map, fast-diff,
> deep-flatten
> op_features:
> flags:
> create_timestamp: Mon Apr  1 14:01:57 2019
> access_timestamp: Thu Jun 23 11:23:11 2022
> protected: True
> # rbd snap ls images/f3f4c73f-2eec-4af1-9bdf-4974a747607b
> SNAPID  NAME  SIZE   PROTECTED  TIMESTAMP
> 40  snap  8 GiB  yesMon Apr  1 14:03:55 2019
> # rbd children images/1fcfaa6b-eba0-4c75-b77d-d5b3ab4538a9@snap
> rbd: listing children failed: (2) No such file or directory
> # rbd snap unprotect images/1fcfaa6b-eba0-4c75-b77d-d5b3ab4538a9@snap
> rbd: unprotecting snap failed: (16) Device or resource busy
> ---
>
> So, it looks like something is still referencing that snapshot but the
> reference seems broken.
>
> Any advice here would be helpful, Thanks!
>
>  /Niklas
>
>
> The content of this email is confidential and intended for the recipient
> specified in message only. If you have received it in error, please notify
> us immediately by replying to this e-mail and then follow with its
> deletion. Please do not copy it or use it or disclose its contents to any
> third parties. Thank you for your cooperation.
>
>
> Classified as General
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Fstab entry for mounting specific ceph fs?

2022-09-23 Thread Wesley Dillingham
Try adding the mds_namespace option, like so:

192.168.1.11,192.168.1.12,192.168.1.13:/ /media/ceph_fs/
name=james_user,secretfile=/etc/ceph/secret.key,mds_namespace=myfs
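
A complete fstab line in that shape might look like the following (the
filesystem name "myfs" and the extra mount options are illustrative; on
newer kernels the same thing can be spelled fs= instead of mds_namespace=):

192.168.1.11,192.168.1.12,192.168.1.13:/  /media/ceph_fs  ceph  name=james_user,secretfile=/etc/ceph/secret.key,mds_namespace=myfs,noatime,_netdev  0  0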

On Fri, Sep 23, 2022 at 6:41 PM Sagittarius-A Black Hole <
nigrat...@gmail.com> wrote:

> Hi,
>
> The below fstab entry works, so that is a given.
> But how do I specify which Ceph filesystem I want to mount in this fstab
> format?
>
> 192.168.1.11,192.168.1.12,192.168.1.13:/ /media/ceph_fs/
> name=james_user, secretfile=/etc/ceph/secret.key
>
> I have tried different ways, but always get the error "source mount
> path was not specified"
> I can't find many examples of fstab ceph mounts unfortunately.
>
> Thanks,
>
> Daniel
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
-- 

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Power outage recovery

2022-09-15 Thread Wesley Dillingham
Having the quorum / monitors back up may change the MDS and RGW's ability
to start and stay running. Have you tried just restarting the MDS / RGW
daemons again?
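
A sketch, assuming package-based systemd units (adjust the instance names
to whatever "systemctl list-units 'ceph-*'" shows on those hosts):

systemctl list-units 'ceph-*'
systemctl restart ceph-mds@<hostname>
systemctl restart ceph-radosgw@rgw.<hostname>

# then confirm they register with the cluster
ceph -s
ceph fs status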

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Thu, Sep 15, 2022 at 5:54 PM Jorge Garcia  wrote:

> OK, I'll try to give more details as I remember them.
>
> 1. There was a power outage and then power came back up.
>
> 2. When the systems came back up, I did a "ceph -s" and it never
> returned. Further investigation revealed that the ceph-mon processes had
> not started in any of the 3 monitors. I looked at the log files and it
> said something about:
>
> ceph_abort_msg("Bad table magic number: expected 9863518390377041911,
> found 30790637387776 in
> /var/lib/ceph/mon/ceph-gi-cprv-adm-01/store.db/2886524.sst")
>
> Looking at the internet, I found some suggestions about troubleshooting
> monitors in:
>
> https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/
>
> I quickly determined that the monitors weren't running, so I found the
> section where it said "RECOVERY USING OSDS". The description made sense:
>
> "But what if all monitors fail at the same time? Since users are
> encouraged to deploy at least three (and preferably five) monitors in a
> Ceph cluster, the chance of simultaneous failure is rare. But unplanned
> power-downs in a data center with improperly configured disk/fs settings
> could fail the underlying file system, and hence kill all the monitors.
> In this case, we can recover the monitor store with the information
> stored in OSDs."
>
> So, I did the procedure described in that section, and then made sure
> the correct keys were in the keyring and restarted the processes.
>
> WELL, I WAS REDOING ALL THESE STEPS WHILE WRITING THIS MAIL MESSAGE, AND
> NOW THE MONITORS ARE BACK! I must have missed some step in the middle of
> my panic.
>
> # ceph -s
>
>cluster:
>  id: ----
>  health: HEALTH_WARN
>  mons are allowing insecure global_id reclaim
>
>services:
>  mon: 3 daemons, quorum host-a, host-b, host-c (age 19m)
>  mgr: host-b(active, since 19m), standbys: host-a, host-c
>  osd: 164 osds: 164 up (since 16m), 164 in (since 8h)
>
>data:
>  pools:   14 pools, 2992 pgs
>  objects: 91.58M objects, 290 TiB
>  usage:   437 TiB used, 1.2 PiB / 1.7 PiB avail
>  pgs: 2985 active+clean
>   7active+clean+scrubbing+deep
>
> Couple of missing or strange things:
>
> 1. Missing mds
> 2. Missing rgw
> 3. New warning showing up
>
> But overall, better than a couple hours ago. If anybody is still reading
> and has any suggestions about how to solve the 3 items above, that would
> be great! Otherwise, back to scanning the internet for ideas...
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Power outage recovery

2022-09-15 Thread Wesley Dillingham
What do "ceph status", "ceph health detail", etc. show currently?

Based on what you have said here, my thought is that you have created a
new monitor quorum, and as such all auth details from the old cluster are
lost, including any and all mgr cephx auth keys. So what does the mgr log
say? How many monitors did you have before? Do you have a backup of the old
monitor store?
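
If the mgr log shows auth or permission errors, the mgr key in the rebuilt
monitor store likely no longer matches the keyring on disk. A sketch of
re-creating it (the caps are the standard ones from the manual ceph-mgr
deployment docs; replace <host> with the mgr's name):

ceph auth get-or-create mgr.<host> mon 'allow profile mgr' osd 'allow *' mds 'allow *' \
    -o /var/lib/ceph/mgr/ceph-<host>/keyring
systemctl restart ceph-mgr@<host>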

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Thu, Sep 15, 2022 at 2:18 PM Marc  wrote:

> > (particularly the "Recovery using OSDs" section). I got it so the mon
> > processes would start, but then the ceph-mgr process died, and would not
> > restart. Not sure how to recover so both ceph-mgr and ceph-mon processes
> > run. In the meantime, all the data is gone. Any suggestions?
>
> All the data is gone? the osd's are running all. Your networking is fine?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

2022-09-13 Thread Wesley Dillingham
I haven't read through this entire thread so forgive me if already
mentioned:

What is the parameter "bluefs_buffered_io" set to on your OSDs? We once saw
a terrible slowdown on our OSDs during snaptrim events and setting
bluefs_buffered_io to true alleviated that issue. That was on a nautilus
cluster.
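
For reference, checking and flipping that setting looks roughly like this (a
sketch; depending on the release you may need to restart the OSDs for it to
take effect):

# what a running OSD currently uses
ceph tell osd.0 config get bluefs_buffered_io

# persist it for all OSDs
ceph config set osd bluefs_buffered_io true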

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Tue, Sep 13, 2022 at 10:48 AM Boris Behrens  wrote:

> The cluster is SSD only with 2TB,4TB and 8TB disks. I would expect that
> this should be done fairly fast.
> For now I will recreate every OSD in the cluster and check if this helps.
>
> Do you experience slow OPS (so the cluster shows a message like "cluster
> [WRN] Health check update: 679 slow ops, oldest one blocked for 95 sec,
> daemons
>
> [osd.0,osd.106,osd.107,osd.108,osd.113,osd.116,osd.123,osd.124,osd.125,osd.134]...
> have slow ops. (SLOW_OPS)")?
>
> I can also see a huge spike in the load of all hosts in our cluster for a
> couple of minutes.
>
>
> On Tue, 13 Sept 2022 at 13:14, Frank Schilder  wrote:
>
> > Hi Boris.
> >
> > > 3. wait some time (took around 5-20 minutes)
> >
> > Sounds short. Might just have been the compaction that the OSDs do any
> > ways on startup after upgrade. I don't know how to check for completed
> > format conversion. What I see in your MON log is exactly what I have seen
> > with default snap trim settings until all OSDs were converted. Once an
> OSD
> > falls behind and slow ops start piling up, everything comes to a halt.
> Your
> > logs clearly show a sudden drop of IOP/s on snap trim start and I would
> > guess this is the cause of the slowly growing OPS back log of the OSDs.
> >
> > If its not that, I don't know what else to look for.
> >
> > Best regards,
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > 
> > From: Boris Behrens 
> > Sent: 13 September 2022 12:58:19
> > To: Frank Schilder
> > Cc: ceph-users@ceph.io
> > Subject: Re: [ceph-users] laggy OSDs and staling krbd IO after upgrade
> > from nautilus to octopus
> >
> > Hi Frank,
> > we converted the OSDs directly on the upgrade.
> >
> > 1. installing new ceph versions
> > 2. restart all OSD daemons
> > 3. wait some time (took around 5-20 minutes)
> > 4. all OSDs were online again.
> >
> > So I would expect, that the OSDs are all upgraded correctly.
> > I also checked when the trimming happens, and it does not seem to be an
> > issue on it's own, as the trim happens all the time in various sizes.
> >
> > On Tue, 13 Sept 2022 at 12:45, Frank Schilder  wrote:
> > Are you observing this here:
> >
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/LAN6PTZ2NHF2ZHAYXZIQPHZ4CMJKMI5K/
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > 
> > From: Boris Behrens <b...@kervyn.de>
> > Sent: 13 September 2022 11:43:20
> > To: ceph-users@ceph.io
> > Subject: [ceph-users] laggy OSDs and staling krbd IO after upgrade from
> > nautilus to octopus
> >
> > Hi, I need you help really bad.
> >
> > we are currently experiencing a very bad cluster hangups that happen
> > sporadic. (once on 2022-09-08 mid day (48 hrs after the upgrade) and once
> > 2022-09-12 in the evening)
> > We use krbd without cephx for the qemu clients and when the OSDs are
> > getting laggy, the krbd connection comes to a grinding halt, to a point
> > that all IO is staling and we can't even unmap the rbd device.
> >
> > From the logs, it looks like that the cluster starts to snaptrim a lot a
> > PGs, then PGs become laggy and then the cluster snowballs into laggy
> OSDs.
> > I have attached the monitor log and the osd log (from one OSD) around the
> > time where it happened.
> >
> > - is this a known issue?
> > - what can I do to debug it further?
> > - can I downgrade back to nautilus?
> > - should I upgrade the PGs for the pool to 4096 or 8192?
> >
> > The cluster contains a mixture of 2,4 and 8TB SSDs (no rotating disks)
> > where the 8TB disks got ~120PGs and the 2TB disks got ~30PGs. All hosts
> > have a minimum of 128GB RAM and the kernel logs of all ceph hosts do not
> > show anything for the timeframe.
> >
> > Cluster stats:
> >   cluster:
> > id: 74313356-3b3d-43f3-bce6-9fb0e4591097
> > health: HEALTH_OK
> >
> >   services:
> > mon: 3 daemons, quorum ceph-rbd-mon4,ceph-rbd-mon5,ceph-rbd-mon6 (age
> > 25h)
> > mgr: ceph-rbd-mon5(active, since 4d), standbys: ceph-rbd-mon4,
> > ceph-rbd-mon6
> > osd: 149 osds: 149 up (since 6d), 149 in (since 7w)
> >
> >   data:
> > pools:   4 pools, 2241 pgs
> > objects: 25.43M objects, 82 TiB
> > usage:   231 TiB used, 187 TiB / 417 TiB avail
> > pgs: 2241 active+clean
> >
> >   io:
> > client:   211 MiB/s rd, 273 MiB/s wr, 1.43k op/s rd, 8.80k op/s 

[ceph-users] Re: Increasing number of unscrubbed PGs

2022-09-13 Thread Wesley Dillingham
what does "ceph pg ls scrubbing" show? Do you have PGs that have been stuck
in a scrubbing state for a long period of time (many hours,days,weeks etc).
This will show in the "SINCE" column.
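
For reference, the sort of thing I would run (a sketch; the PG id is a
placeholder):

# scrubbing PGs and how long they have been at it (SINCE column)
ceph pg ls scrubbing

# if one has been stuck for days or weeks, repeering it (or restarting its
# primary OSD) usually gets it moving again
ceph pg repeer 16.1a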

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Tue, Sep 13, 2022 at 7:32 AM Burkhard Linke <
burkhard.li...@computational.bio.uni-giessen.de> wrote:

> Hi Josh,
>
>
> thx for the link. I'm not sure whether this is the root cause, since we
> did not use the noscrub and nodeepscrub flags in the past. I've set them
> for a short period to test whether removing the flag triggers more
> backfilling. During that time no OSD were restarted etc.
>
>
> But the ticket mentioned repeering as a method for resolving the stuck
> OSDs. I've repeered some of the PGs, and the number of affected PG did
> not increase significantly anymore. On the other hand the number of
> running deep-scrubs also did not increase significantly. I'll keep an
> eye on the development and hope for 16.2.11 being released soon...
>
>
> Best regards,
>
> Burkhard
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Full cluster, new OSDS not being used

2022-08-23 Thread Wesley Dillingham
https://docs.ceph.com/en/pacific/rados/operations/upmap/
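
Roughly, the offline optimizer can generate the upmaps for you; a sketch
(review the generated commands before applying them, and note that upmap
needs set-require-min-compat-client luminous or newer):

ceph osd getmap -o osdmap.bin
osdmaptool osdmap.bin --upmap upmaps.sh --upmap-deviation 1 --upmap-max 100
# upmaps.sh contains "ceph osd pg-upmap-items ..." lines; inspect, then:
bash upmaps.sh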

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Tue, Aug 23, 2022 at 1:45 PM Wyll Ingersoll <
wyllys.ingers...@keepertech.com> wrote:

> Thank you - we have increased backfill settings, but can you elaborate on
> "injecting upmaps" ?
> ----------
> *From:* Wesley Dillingham 
> *Sent:* Tuesday, August 23, 2022 1:44 PM
> *To:* Wyll Ingersoll 
> *Cc:* ceph-users@ceph.io 
> *Subject:* Re: [ceph-users] Full cluster, new OSDS not being used
>
> In that case I would say your options are to make use of injecting upmaps
> to move data off the full osds or to increase the backfill throttle
> settings to make things move faster.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Tue, Aug 23, 2022 at 1:28 PM Wyll Ingersoll <
> wyllys.ingers...@keepertech.com> wrote:
>
> Unfortunately, I cannot. The system in question is in a secure location
> and I don't have direct access to it.  The person on site runs the commands
> I send them and the osd tree is correct as far as we can tell. The new
> hosts and osds are in the right place in the tree and have proper weights.
> One small difference is that the new osds have a class ("hdd"), whereas
> MOST of the pre-existing osds do not have a class designation, this is a
> cluster that has grown and been upgraded over several releases of ceph.
> Currently it is running pacific 16.2.9.  However, removing the class
> designation on one of the new osds did not make any difference so I dont
> think that is the issue.
>
> The cluster is slowly recovering, but our new OSDs are very lightly used
> at this point, only a few PGs have been assigned to them, though more than
> zero and the number does appear to be slowly (very slowly) growing so
> recovery is happening but very very slowly.
>
>
>
>
> --
> *From:* Wesley Dillingham 
> *Sent:* Tuesday, August 23, 2022 1:18 PM
> *To:* Wyll Ingersoll 
> *Cc:* ceph-users@ceph.io 
> *Subject:* Re: [ceph-users] Full cluster, new OSDS not being used
>
> Can you please send the output of "ceph osd tree"
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Tue, Aug 23, 2022 at 10:53 AM Wyll Ingersoll <
> wyllys.ingers...@keepertech.com> wrote:
>
>
> We have a large cluster with a many osds that are at their nearfull or
> full ratio limit and are thus having problems rebalancing.
> We added 2 more storage nodes, each with 20 additional drives  to give the
> cluster room to rebalance.  However, for the past few days, the new OSDs
> are NOT being used and the cluster remains stuck and is not improving.
>
> The crush map is correct, the new hosts and osds are at the correct
> location, but dont seem to be getting used.
>
> Any idea how we can force the full or backfillfull OSDs to start unloading
> their pgs to the newly added ones?
>
> thanks!
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Full cluster, new OSDS not being used

2022-08-23 Thread Wesley Dillingham
In that case I would say your options are to make use of injecting upmaps
to move data off the full osds or to increase the backfill throttle
settings to make things move faster.
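
For the throttle side, the usual knobs look roughly like this on a
wpq-scheduler cluster (a sketch; how far you push them depends on how much
client impact you can tolerate):

ceph config set osd osd_max_backfills 3
ceph config set osd osd_recovery_max_active 5
ceph config set osd osd_recovery_sleep_hdd 0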

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Tue, Aug 23, 2022 at 1:28 PM Wyll Ingersoll <
wyllys.ingers...@keepertech.com> wrote:

> Unfortunately, I cannot. The system in question is in a secure location
> and I don't have direct access to it.  The person on site runs the commands
> I send them and the osd tree is correct as far as we can tell. The new
> hosts and osds are in the right place in the tree and have proper weights.
> One small difference is that the new osds have a class ("hdd"), whereas
> MOST of the pre-existing osds do not have a class designation, this is a
> cluster that has grown and been upgraded over several releases of ceph.
> Currently it is running pacific 16.2.9.  However, removing the class
> designation on one of the new osds did not make any difference so I dont
> think that is the issue.
>
> The cluster is slowly recovering, but our new OSDs are very lightly used
> at this point, only a few PGs have been assigned to them, though more than
> zero and the number does appear to be slowly (very slowly) growing so
> recovery is happening but very very slowly.
>
>
>
>
> --
> *From:* Wesley Dillingham 
> *Sent:* Tuesday, August 23, 2022 1:18 PM
> *To:* Wyll Ingersoll 
> *Cc:* ceph-users@ceph.io 
> *Subject:* Re: [ceph-users] Full cluster, new OSDS not being used
>
> Can you please send the output of "ceph osd tree"
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Tue, Aug 23, 2022 at 10:53 AM Wyll Ingersoll <
> wyllys.ingers...@keepertech.com> wrote:
>
>
> We have a large cluster with a many osds that are at their nearfull or
> full ratio limit and are thus having problems rebalancing.
> We added 2 more storage nodes, each with 20 additional drives  to give the
> cluster room to rebalance.  However, for the past few days, the new OSDs
> are NOT being used and the cluster remains stuck and is not improving.
>
> The crush map is correct, the new hosts and osds are at the correct
> location, but dont seem to be getting used.
>
> Any idea how we can force the full or backfillfull OSDs to start unloading
> their pgs to the newly added ones?
>
> thanks!
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Full cluster, new OSDS not being used

2022-08-23 Thread Wesley Dillingham
Can you please send the output of "ceph osd tree"

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Tue, Aug 23, 2022 at 10:53 AM Wyll Ingersoll <
wyllys.ingers...@keepertech.com> wrote:

>
> We have a large cluster with a many osds that are at their nearfull or
> full ratio limit and are thus having problems rebalancing.
> We added 2 more storage nodes, each with 20 additional drives  to give the
> cluster room to rebalance.  However, for the past few days, the new OSDs
> are NOT being used and the cluster remains stuck and is not improving.
>
> The crush map is correct, the new hosts and osds are at the correct
> location, but dont seem to be getting used.
>
> Any idea how we can force the full or backfillfull OSDs to start unloading
> their pgs to the newly added ones?
>
> thanks!
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Map RBD to multiple nodes (like NFS)

2022-07-25 Thread Wesley Dillingham
You probably want CephFS instead of RBD. Overview here:
https://docs.ceph.com/en/quincy/cephfs/
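
As a rough sketch, every DB node would mount the same filesystem; the mon
address, client name and secret file below are placeholders:

# kernel CephFS mount on each node that needs the shared backup directory
mount -t ceph 192.168.1.10:6789:/ /mnt/db-backup \
  -o name=dbbackup,secretfile=/etc/ceph/dbbackup.secret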

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Jul 25, 2022 at 11:00 AM Thomas Schneider <74cmo...@gmail.com>
wrote:

> Hi,
>
> I have this use case:
> Multi-node DB must write backup to a device that is accessible by any node.
>
> The backup is currently provided as RBD, and this RBD is mapped on any
> node belonging to the multi-node DB.
>
> Is it possible that any node has access to the same files, independent
> of which node has written the file to RBD, like NFS?
> If yes, how must the RBD be configured here?
> If no, is there any possibility in Ceph to provide such a shared storage?
>
>
> Regards
> Thomas
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Quincy full osd(s)

2022-07-24 Thread Wesley Dillingham
Can you send along the return of "ceph osd pool ls detail" and "ceph health
detail"

On Sun, Jul 24, 2022, 1:00 AM Nigel Williams 
wrote:

> With current 17.2.1 (cephadm) I am seeing an unusual HEALTH_ERR
> Adding files to a new empty cluster, replica 3 (crush is by host), OSDs
> became 95% full and reweighting them to any value does not cause backfill
> to start.
>
> If I reweight the three too full OSDs to 0.0 I get a large number of
> misplaced objects but no subsequent data movement, cluster remains at
> HEALTH_WARN "Low space hindering backfill". Cluster has 1200 OSDs (all
> except three are close to empty).
>
> Balancer is on, autoscale is on for pool.
>
> I feel I am overlooking something obvious; if anyone can suggest what it
> might be, that would be appreciated. Thanks.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Using cloudbase windows RBD / wnbd with pre-pacific clusters

2022-07-20 Thread Wesley Dillingham
I understand that the client-side code available from Cloudbase started
being distributed with the Pacific and now Quincy client releases, but is
there any particular reason it shouldn't work in conjunction with, for
instance, a Nautilus cluster?

We have seen errors when trying to do IO with mapped RBDs, of the form:

The semaphore timeout period has expired.

Just trying to rule out the cluster version theory. Thanks.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rh8 krbd mapping causes no match of type 1 in addrvec problem decoding monmap, -2

2022-07-19 Thread Wesley Dillingham
Thanks.

Interestingly the older kernel did not have a problem with it but the newer
kernel does.
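
For the archive: with the monmap as it stands below, forcing the kernel
client onto msgr2 is the workaround (a sketch, assuming the running kernel
supports the ms_mode map option). The cleaner fix is restoring a v1 address
on mon.a2tlomon004 so the monmap is consistent again.

rbd --id profilerbd device map -o ms_mode=prefer-crc win-rbd-test/originalrbdfromsnap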


Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Tue, Jul 19, 2022 at 3:35 PM Ilya Dryomov  wrote:

> On Tue, Jul 19, 2022 at 9:12 PM Wesley Dillingham 
> wrote:
> >
> >
> > from ceph.conf:
> >
> > mon_host = 10.26.42.172,10.26.42.173,10.26.42.174
> >
> > map command:
> > rbd --id profilerbd device map win-rbd-test/originalrbdfromsnap
> >
> > [root@a2tlomon002 ~]# ceph mon dump
> > dumped monmap epoch 44
> > epoch 44
> > fsid 227623f8-b67e-4168-8a15-2ff2a4a68567
> > last_changed 2022-05-18 15:35:39.385763
> > created 2016-08-09 10:02:28.325333
> > min_mon_release 14 (nautilus)
> > 0: [v2:10.26.42.173:3300/0,v1:10.26.42.173:6789/0] mon.a2tlomon003
> > 1: v2:10.26.42.174:3300/0 mon.a2tlomon004
> > 2: [v2:10.26.42.172:3300/0,v1:10.26.42.172:6789/0] mon.a2tlomon002
> >
> > Looks like something is up with mon:1 only listening on v2 addr not sure
> if thats the root cause but seems likely. Though would think the map should
> still be able to have success.
>
> Yes, this is the root cause.  Theoretically the kernel client could
> ignore it and attempt to proceed but it doesn't, on purpose.  This is
> a clear configuration/user error which is better fixed than worked
> around.
>
> You need to either amend mon1 addresses or tell the kernel client to
> use v2 addresses with e.g. "rbd device map -o ms_mode=prefer-crc ...".
>
> Thanks,
>
> Ilya
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rh8 krbd mapping causes no match of type 1 in addrvec problem decoding monmap, -2

2022-07-19 Thread Wesley Dillingham
from ceph.conf:

mon_host = 10.26.42.172,10.26.42.173,10.26.42.174

map command:
rbd --id profilerbd device map win-rbd-test/originalrbdfromsnap

[root@a2tlomon002 ~]# ceph mon dump
dumped monmap epoch 44
epoch 44
fsid 227623f8-b67e-4168-8a15-2ff2a4a68567
last_changed 2022-05-18 15:35:39.385763
created 2016-08-09 10:02:28.325333
min_mon_release 14 (nautilus)
0: [v2:10.26.42.173:3300/0,v1:10.26.42.173:6789/0] mon.a2tlomon003
1: v2:10.26.42.174:3300/0 mon.a2tlomon004
2: [v2:10.26.42.172:3300/0,v1:10.26.42.172:6789/0] mon.a2tlomon002

Looks like something is up with mon:1 only listening on a v2 addr; I'm not
sure if that's the root cause, but it seems likely. Though I would think the
map should still be able to succeed.

As a note, I tried with the 16.2.9 client as well and it also failed in the
same manner.






Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Tue, Jul 19, 2022 at 12:51 PM Ilya Dryomov  wrote:

> On Tue, Jul 19, 2022 at 5:01 PM Wesley Dillingham 
> wrote:
> >
> > I have a strange error when trying to map via krdb on a RH (alma8)
> release
> > / kernel 4.18.0-372.13.1.el8_6.x86_64 using ceph client version 14.2.22
> > (cluster is 14.2.16)
> >
> > the rbd map causes the following error in dmesg:
> >
> > [Tue Jul 19 07:45:00 2022] libceph: no match of type 1 in addrvec
> > [Tue Jul 19 07:45:00 2022] libceph: problem decoding monmap, -2
> >
> > I am able to map this rbd to a cent7 / 3.10.0-1160.71.1.el7.x86_64
> machine
> > using the same client and commands.
> >
> > Of note, on the RH8 node I can fetch info about the rbd and list rbds in
> > the pool check ceph status etc. It seems purely limited to the mapping of
> > the RBD:
> >
> > Info about the RBD:
> >
> > [root@alma8rbdtest ~]# rbd --id profilerbd info
> > win-rbd-test/originalrbdfromsnap
> > rbd image 'originalrbdfromsnap':
> > size 5 GiB in 1280 objects
> > order 22 (4 MiB objects)
> > snapshot_count: 0
> > id: 2c5f465fa134c0
> > block_name_prefix: rbd_data.2c5f465fa134c0
> > format: 2
> > features: layering, exclusive-lock
> > op_features:
> > flags:
> > create_timestamp: Mon Jul 18 13:58:39 2022
> > access_timestamp: Mon Jul 18 13:58:39 2022
> > modify_timestamp: Mon Jul 18 13:58:39 2022
> >
> > anybody seen something like this
>
> Hi Wesley,
>
> Could you please provide:
>
> - full "rbd map" ("rbd device map") command
>
> - "mon host = XYZ" line from ceph.conf file
>
> - "ceph mon dump" output
>
> Thanks,
>
> Ilya
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rh8 krbd mapping causes no match of type 1 in addrvec problem decoding monmap, -2

2022-07-19 Thread Wesley Dillingham
Tried with the rh8/14.2.16 package version and hit the same issue.
dmesg shows the error in the email subject; stdout shows: rbd: map failed:
(110) Connection timed out

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Tue, Jul 19, 2022 at 11:00 AM Wesley Dillingham 
wrote:

> I have a strange error when trying to map via krdb on a RH (alma8) release
> / kernel 4.18.0-372.13.1.el8_6.x86_64 using ceph client version 14.2.22
> (cluster is 14.2.16)
>
> the rbd map causes the following error in dmesg:
>
> [Tue Jul 19 07:45:00 2022] libceph: no match of type 1 in addrvec
> [Tue Jul 19 07:45:00 2022] libceph: problem decoding monmap, -2
>
> I am able to map this rbd to a cent7 / 3.10.0-1160.71.1.el7.x86_64 machine
> using the same client and commands.
>
> Of note, on the RH8 node I can fetch info about the rbd and list rbds in
> the pool check ceph status etc. It seems purely limited to the mapping of
> the RBD:
>
> Info about the RBD:
>
> [root@alma8rbdtest ~]# rbd --id profilerbd info
> win-rbd-test/originalrbdfromsnap
> rbd image 'originalrbdfromsnap':
> size 5 GiB in 1280 objects
> order 22 (4 MiB objects)
> snapshot_count: 0
> id: 2c5f465fa134c0
> block_name_prefix: rbd_data.2c5f465fa134c0
> format: 2
> features: layering, exclusive-lock
> op_features:
> flags:
> create_timestamp: Mon Jul 18 13:58:39 2022
> access_timestamp: Mon Jul 18 13:58:39 2022
> modify_timestamp: Mon Jul 18 13:58:39 2022
>
> anybody seen something like this
>
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] rh8 krbd mapping causes no match of type 1 in addrvec problem decoding monmap, -2

2022-07-19 Thread Wesley Dillingham
I have a strange error when trying to map via krdb on a RH (alma8) release
/ kernel 4.18.0-372.13.1.el8_6.x86_64 using ceph client version 14.2.22
(cluster is 14.2.16)

the rbd map causes the following error in dmesg:

[Tue Jul 19 07:45:00 2022] libceph: no match of type 1 in addrvec
[Tue Jul 19 07:45:00 2022] libceph: problem decoding monmap, -2

I am able to map this rbd to a cent7 / 3.10.0-1160.71.1.el7.x86_64 machine
using the same client and commands.

Of note, on the RH8 node I can fetch info about the rbd and list rbds in
the pool check ceph status etc. It seems purely limited to the mapping of
the RBD:

Info about the RBD:

[root@alma8rbdtest ~]# rbd --id profilerbd info
win-rbd-test/originalrbdfromsnap
rbd image 'originalrbdfromsnap':
size 5 GiB in 1280 objects
order 22 (4 MiB objects)
snapshot_count: 0
id: 2c5f465fa134c0
block_name_prefix: rbd_data.2c5f465fa134c0
format: 2
features: layering, exclusive-lock
op_features:
flags:
create_timestamp: Mon Jul 18 13:58:39 2022
access_timestamp: Mon Jul 18 13:58:39 2022
modify_timestamp: Mon Jul 18 13:58:39 2022

anybody seen something like this


Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: PGs stuck deep-scrubbing for weeks - 16.2.9

2022-07-18 Thread Wesley Dillingham
Yes, this seems consistent with what we are experiencing. We have
definitely toggled the noscrub flags in various scenarios in the recent
past. Thanks for tracking it down and fixing it.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Fri, Jul 15, 2022 at 10:16 PM David Orman  wrote:

> Apologies, backport link should be:
> https://github.com/ceph/ceph/pull/46845
>
> On Fri, Jul 15, 2022 at 9:14 PM David Orman  wrote:
>
>> I think you may have hit the same bug we encountered. Cory submitted a
>> fix, see if it fits what you've encountered:
>>
>> https://github.com/ceph/ceph/pull/46727 (backport to Pacific here:
>> https://github.com/ceph/ceph/pull/46877 )
>> https://tracker.ceph.com/issues/54172
>>
>> On Fri, Jul 15, 2022 at 8:52 AM Wesley Dillingham 
>> wrote:
>>
>>> We have two clusters one 14.2.22 -> 16.2.7 -> 16.2.9
>>>
>>> Another 16.2.7 -> 16.2.9
>>>
>>> Both with a multi disk (spinner block / ssd block.db) and both CephFS
>>> around 600 OSDs each with combo of rep-3 and 8+3 EC data pools. Examples
>>> of
>>> stuck scrubbing PGs from all of the pools.
>>>
>>> They have generally been behind on scrubbing which we attributed to
>>> simply
>>> being large disks (10TB) with a heavy write load and the OSDs just having
>>> trouble keeping up. On closer inspection it appears we have many PGs that
>>> have been lodged in a deep scrubbing state on one cluster for 2 weeks and
>>> another for 7 weeks. Wondering if others have been experiencing anything
>>> similar. The only example of PGs being stuck scrubbing I have seen in the
>>> past has been related to snaptrim PG state but we arent doing anything
>>> with
>>> snapshots in these new clusters.
>>>
>>> Granted my cluster has been warning me with "pgs not deep-scrubbed in
>>> time"
>>> and its on me for not looking more closely into why. Perhaps a separate
>>> warning of "PG Stuck Scrubbing for greater than 24 hours" or similar
>>> might
>>> be helpful to an operator.
>>>
>>> In any case I was able to get scrubs proceeding again by restarting the
>>> primary OSD daemon in the PGs which were stuck. Will monitor closely for
>>> additional stuck scrubs.
>>>
>>>
>>> Respectfully,
>>>
>>> *Wes Dillingham*
>>> w...@wesdillingham.com
>>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] PGs stuck deep-scrubbing for weeks - 16.2.9

2022-07-15 Thread Wesley Dillingham
We have two clusters one 14.2.22 -> 16.2.7 -> 16.2.9

Another 16.2.7 -> 16.2.9

Both with a multi-disk layout (spinner block / SSD block.db), both CephFS,
around 600 OSDs each, with a combo of rep-3 and 8+3 EC data pools. There are
examples of stuck scrubbing PGs from all of the pools.

They have generally been behind on scrubbing which we attributed to simply
being large disks (10TB) with a heavy write load and the OSDs just having
trouble keeping up. On closer inspection it appears we have many PGs that
have been lodged in a deep scrubbing state on one cluster for 2 weeks and
another for 7 weeks. Wondering if others have been experiencing anything
similar. The only example of PGs being stuck scrubbing I have seen in the
past has been related to the snaptrim PG state, but we aren't doing anything
with snapshots in these new clusters.

Granted my cluster has been warning me with "pgs not deep-scrubbed in time"
and it's on me for not looking more closely into why. Perhaps a separate
warning of "PG Stuck Scrubbing for greater than 24 hours" or similar might
be helpful to an operator.

In any case I was able to get scrubs proceeding again by restarting the
primary OSD daemon in the PGs which were stuck. Will monitor closely for
additional stuck scrubs.
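
For anyone else hitting this, the rough sequence (a sketch; the PG and OSD
ids are placeholders and this assumes non-containerized OSDs):

# find PGs that have been scrubbing for a long time (SINCE column)
ceph pg ls scrubbing

# for a stuck PG, the first OSD in its acting set is the primary
ceph pg map 20.3f

# restart that OSD daemon to kick the scrub loose
systemctl restart ceph-osd@123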


Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Changes to Crush Weight Causing Degraded PGs instead of Remapped

2022-06-15 Thread Wesley Dillingham
I have found that I can only reproduce it on clusters built initially on
pacific. My cluster which went nautilus to pacific does not reproduce the
issue. My working theory is it is related to rocksdb sharding:

https://docs.ceph.com/en/quincy/rados/configuration/bluestore-config-ref/#rocksdb-sharding
OSDs deployed in Pacific or later use RocksDB sharding by default. If Ceph is
upgraded to Pacific from a previous version, sharding is off.
To enable sharding and apply the Pacific defaults, stop an OSD and run

ceph-bluestore-tool \
  --path <data path> \
  --sharding="m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P" \
  reshard


Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Tue, Jun 14, 2022 at 11:31 AM Wesley Dillingham 
wrote:

> I have made https://tracker.ceph.com/issues/56046 regarding the issue I
> am observing.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Tue, Jun 14, 2022 at 5:32 AM Eugen Block  wrote:
>
>> I found the thread I was referring to [1]. The report was very similar
>> to yours, apparently the balancer seems to cause the "degraded"
>> messages, but the thread was not concluded. Maybe a tracker ticket
>> should be created if it doesn't already exist, I didn't find a ticket
>> related to that in a quick search.
>>
>> [1]
>>
>> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/H4L5VNQJKIDXXNY2TINEGUGOYLUTT5UL/
>>
>> Zitat von Wesley Dillingham :
>>
>> > Thanks for the reply. I believe regarding "0" vs "0.0" its the same
>> > difference. I will note its not just changing crush weights which
>> induces
>> > this situation. Introducing upmaps manually or via the balancer also
>> causes
>> > the PGs to be degraded instead of the expected remapped PG state.
>> >
>> > Respectfully,
>> >
>> > *Wes Dillingham*
>> > w...@wesdillingham.com
>> > LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>> >
>> >
>> > On Mon, Jun 13, 2022 at 9:27 PM Szabo, Istvan (Agoda) <
>> > istvan.sz...@agoda.com> wrote:
>> >
>> >> Isn’t it the correct syntax like this?
>> >>
>> >> ceph osd crush reweight osd.1 0.0 ?
>> >>
>> >> Istvan Szabo
>> >> Senior Infrastructure Engineer
>> >> ---
>> >> Agoda Services Co., Ltd.
>> >> e: istvan.sz...@agoda.com
>> >> ---
>> >>
>> >> On 2022. Jun 14., at 0:38, Wesley Dillingham 
>> >> wrote:
>> >>
>> >> ceph osd crush reweight osd.1 0
>> >>
>> >>
>> >> --
>> >> This message is confidential and is for the sole use of the intended
>> >> recipient(s). It may also be privileged or otherwise protected by
>> copyright
>> >> or other legal rules. If you have received it by mistake please let us
>> know
>> >> by reply email and delete it from your system. It is prohibited to copy
>> >> this message or disclose its content to anyone. Any confidentiality or
>> >> privilege is not waived or lost by any mistaken delivery or
>> unauthorized
>> >> disclosure of the message. All messages sent to and from Agoda may be
>> >> monitored to ensure compliance with company policies, to protect the
>> >> company's interests and to remove potential malware. Electronic
>> messages
>> >> may be intercepted, amended, lost or deleted, or contain viruses.
>> >>
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Changes to Crush Weight Causing Degraded PGs instead of Remapped

2022-06-14 Thread Wesley Dillingham
I have made https://tracker.ceph.com/issues/56046 regarding the issue I am
observing.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Tue, Jun 14, 2022 at 5:32 AM Eugen Block  wrote:

> I found the thread I was referring to [1]. The report was very similar
> to yours, apparently the balancer seems to cause the "degraded"
> messages, but the thread was not concluded. Maybe a tracker ticket
> should be created if it doesn't already exist, I didn't find a ticket
> related to that in a quick search.
>
> [1]
>
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/H4L5VNQJKIDXXNY2TINEGUGOYLUTT5UL/
>
> Zitat von Wesley Dillingham :
>
> > Thanks for the reply. I believe regarding "0" vs "0.0" its the same
> > difference. I will note its not just changing crush weights which induces
> > this situation. Introducing upmaps manually or via the balancer also
> causes
> > the PGs to be degraded instead of the expected remapped PG state.
> >
> > Respectfully,
> >
> > *Wes Dillingham*
> > w...@wesdillingham.com
> > LinkedIn <http://www.linkedin.com/in/wesleydillingham>
> >
> >
> > On Mon, Jun 13, 2022 at 9:27 PM Szabo, Istvan (Agoda) <
> > istvan.sz...@agoda.com> wrote:
> >
> >> Isn’t it the correct syntax like this?
> >>
> >> ceph osd crush reweight osd.1 0.0 ?
> >>
> >> Istvan Szabo
> >> Senior Infrastructure Engineer
> >> ---
> >> Agoda Services Co., Ltd.
> >> e: istvan.sz...@agoda.com
> >> ---
> >>
> >> On 2022. Jun 14., at 0:38, Wesley Dillingham 
> >> wrote:
> >>
> >> ceph osd crush reweight osd.1 0
> >>
> >>
> >> --
> >> This message is confidential and is for the sole use of the intended
> >> recipient(s). It may also be privileged or otherwise protected by
> copyright
> >> or other legal rules. If you have received it by mistake please let us
> know
> >> by reply email and delete it from your system. It is prohibited to copy
> >> this message or disclose its content to anyone. Any confidentiality or
> >> privilege is not waived or lost by any mistaken delivery or unauthorized
> >> disclosure of the message. All messages sent to and from Agoda may be
> >> monitored to ensure compliance with company policies, to protect the
> >> company's interests and to remove potential malware. Electronic messages
> >> may be intercepted, amended, lost or deleted, or contain viruses.
> >>
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Changes to Crush Weight Causing Degraded PGs instead of Remapped

2022-06-13 Thread Wesley Dillingham
Thanks for the reply. I believe that regarding "0" vs "0.0" it's the same
difference. I will note it's not just changing crush weights that induces
this situation. Introducing upmaps manually or via the balancer also causes
the PGs to be degraded instead of the expected remapped PG state.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Mon, Jun 13, 2022 at 9:27 PM Szabo, Istvan (Agoda) <
istvan.sz...@agoda.com> wrote:

> Isn’t it the correct syntax like this?
>
> ceph osd crush reweight osd.1 0.0 ?
>
> Istvan Szabo
> Senior Infrastructure Engineer
> ---
> Agoda Services Co., Ltd.
> e: istvan.sz...@agoda.com
> -------
>
> On 2022. Jun 14., at 0:38, Wesley Dillingham 
> wrote:
>
> ceph osd crush reweight osd.1 0
>
>
> --
> This message is confidential and is for the sole use of the intended
> recipient(s). It may also be privileged or otherwise protected by copyright
> or other legal rules. If you have received it by mistake please let us know
> by reply email and delete it from your system. It is prohibited to copy
> this message or disclose its content to anyone. Any confidentiality or
> privilege is not waived or lost by any mistaken delivery or unauthorized
> disclosure of the message. All messages sent to and from Agoda may be
> monitored to ensure compliance with company policies, to protect the
> company's interests and to remove potential malware. Electronic messages
> may be intercepted, amended, lost or deleted, or contain viruses.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Changes to Crush Weight Causing Degraded PGs instead of Remapped

2022-06-13 Thread Wesley Dillingham
I have a brand new 16.2.9 cluster running BlueStore with zero client
activity. I am modifying some crush weights to move PGs off of a host for
testing purposes, but the result is that the PGs go into a degraded+remapped
state instead of simply a remapped state. This is a strange result to me, as
in previous releases (Nautilus) this would cause only remapped PGs. Are there
any known issues around this? Are others running Pacific seeing similar
behavior? Thanks.

"ceph osd crush reweight osd.1 0"

^ Causes degraded PGs which then go into recovery. Expect only remapped PGs
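
A rough way to watch the difference after the reweight (a sketch):

ceph pg dump pgs_brief | grep -c remapped
ceph pg dump pgs_brief | grep -c degraded   # expected to stay 0, but climbs here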

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Slow delete speed through the s3 API

2022-06-02 Thread Wesley Dillingham
Is it just your deletes that are slow, or writes and reads as well?

On Thu, Jun 2, 2022, 4:09 PM J-P Methot  wrote:

> I'm following up on this as we upgraded to Pacific 16.2.9 and deletes
> are still incredibly slow. The pool rgw is using is a fairly small
> erasure coding pool set at 8 + 3. Is there anyone who's having the same
> issue?
>
> On 5/16/22 15:23, J-P Methot wrote:
> > Hi,
> >
> > First of all, a quick google search shows me that questions about the
> > s3 API slow object deletion speed have been asked before and are well
> > documented. My issue is slightly different, because I am getting
> > abysmal speeds of 11 objects/second on a full SSD ceph running Octopus
> > with about a hundred OSDs. This is much lower than the Redhat reported
> > limit of 1000 objects/second.
> >
> > I've seen elsewhere that it was a Rocksdb limitation and that it would
> > be fixed in Pacific, but the Pacific release logs do not show me
> > anything that suggest that. Furthermore, I have limited control over
> > the s3client deleting the files as it's a 3rd-party open source
> > automatic backup program.
> >
> > Could updating to Pacific fix this issue? Is there any configuration
> > change I could do to speed up object deletion?
> >
> --
> Jean-Philippe Méthot
> Senior Openstack system administrator
> Administrateur système Openstack sénior
> PlanetHoster inc.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-05-26 Thread Wesley Dillingham
pool 13 'mathfs_metadata' replicated size 2 min_size 2 crush_rule 0
object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change

The problem is you have size=2 and min_size=2 on this pool. I would
increase the size of this pool to 3 (and I would also do that for all of
your pools which are size=2). The ok-to-stop command is failing because you
would drop below min_size by stopping any OSD serving this PG, and those PGs
would then become inactive.
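
A sketch of the change for the pool above (the other size=2 pools deserve the
same treatment; expect backfill while the third copies populate):

ceph osd pool set mathfs_metadata size 3
# min_size 2 then lets a single OSD be stopped without PGs going inactive
ceph osd pool set mathfs_metadata min_size 2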

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Thu, May 26, 2022 at 2:22 PM Sarunas Burdulis 
wrote:

> On 5/26/22 14:09, Wesley Dillingham wrote:
> > What does "ceph osd pool ls detail" say?
>
> $ ceph osd pool ls detail
> pool 0 'rbd' replicated size 2 min_size 1 crush_rule 0 object_hash
> rjenkins pg_num 64 pgp_num 64 autoscale_mode on last_change 44740 flags
> hashpspool,selfmanaged_snaps stripe_width 0 application rbd
> pool 1 '.rgw.root' replicated size 2 min_size 1 crush_rule 0 object_hash
> rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 44740 lfor
> 0/0/31483 owner 18446744073709551615 flags hashpspool stripe_width 0
> application rgw
> pool 2 'default.rgw.control' replicated size 2 min_size 1 crush_rule 0
> object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
> 44740 lfor 0/0/31469 owner 18446744073709551615 flags hashpspool
> stripe_width 0 application rgw
> pool 3 'default.rgw.data.root' replicated size 2 min_size 1 crush_rule 0
> object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
> 44740 lfor 0/0/31471 owner 18446744073709551615 flags hashpspool
> stripe_width 0 application rgw
> pool 4 'default.rgw.gc' replicated size 2 min_size 1 crush_rule 0
> object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
> 44740 lfor 0/0/31471 owner 18446744073709551615 flags hashpspool
> stripe_width 0 application rgw
> pool 5 'default.rgw.log' replicated size 2 min_size 1 crush_rule 0
> object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
> 44740 lfor 0/0/31387 owner 18446744073709551615 flags hashpspool
> stripe_width 0 application rgw
> pool 6 'default.rgw.users.uid' replicated size 2 min_size 1 crush_rule 0
> object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
> 44740 lfor 0/0/31387 flags hashpspool stripe_width 0 application rgw
> pool 12 'mathfs_data' replicated size 2 min_size 1 crush_rule 0
> object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
> 44740 lfor 0/31370/31368 flags hashpspool stripe_width 0 application cephfs
> pool 13 'mathfs_metadata' replicated size 2 min_size 2 crush_rule 0
> object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
> 44740 lfor 0/27164/27162 flags hashpspool stripe_width 0 application cephfs
> pool 15 'default.rgw.lc' replicated size 2 min_size 1 crush_rule 0
> object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
> 44740 lfor 0/0/31374 flags hashpspool stripe_width 0 application rgw
> pool 21 'libvirt' replicated size 3 min_size 1 crush_rule 0 object_hash
> rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 56244 lfor
> 0/33144/33142 flags hashpspool,selfmanaged_snaps stripe_width 0
> application rbd
> pool 36 'monthly_archive_metadata' replicated size 2 min_size 1
> crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on
> last_change 45338 lfor 0/27845/27843 flags hashpspool stripe_width 0
> application cephfs
> pool 37 'monthly_archive_data' replicated size 2 min_size 1 crush_rule 0
> object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
> 45334 lfor 0/44535/44533 flags hashpspool stripe_width 0 application cephfs
> pool 38 'device_health_metrics' replicated size 2 min_size 1 crush_rule
> 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change
> 56507 flags hashpspool stripe_width 0 pg_num_min 1 application
> mgr_devicehealth
> pool 41 'lensfun_metadata' replicated size 2 min_size 1 crush_rule 0
> object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
> 54066 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16
> recovery_priority 5 application cephfs
> pool 42 'lensfun_data' replicated size 2 min_size 1 crush_rule 0
> object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change
> 54066 flags hashpspool stripe_width 0 application cephfs
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cluster healthy, but 16.2.7 osd daemon upgrade says its unsafe to stop them?

2022-05-26 Thread Wesley Dillingham
What does "ceph osd pool ls detail" say?

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Thu, May 26, 2022 at 11:24 AM Sarunas Burdulis <
saru...@math.dartmouth.edu> wrote:

> Running
>
> `ceph osd ok-to-stop 0`
>
> shows:
>
> {"ok_to_stop":false,"osds":[1],
> "num_ok_pgs":25,"num_not_ok_pgs":2,
> "bad_become_inactive":["13.a","13.11"],
>
> "ok_become_degraded":["0.4","0.b","0.11","0.1a","0.1e","0.3c","2.5","2.10","3.19","3.1a","4.7","4.19","4.1e","6.10","12.1","12.6","15.9","21.17","21.18","36.8","36.13","41.7","41.1b","42.6","42.1a"]}
> Error EBUSY: unsafe to stop osd(s) at this time (2 PGs are or would
> become offline)
>
> What are “bad_become_inactive” PGs?
> What can be done to make OSD into “ok-to-stop” (or override it)?
>
> `ceph -s` still reports HEALT_OK and all PGs active+clean.
>
> Upgrade to 16.2.8 still complains about non-stoppable OSDs and won't
> proceed.
>
> --
> Sarunas Burdulis
> Dartmouth Mathematics
> math.dartmouth.edu/~sarunas
>
> · https://useplaintext.email ·
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Drained OSDs are still ACTIVE_PRIMARY - casuing high IO latency on clients

2022-05-20 Thread Wesley Dillingham
This sounds similar to an inquiry I submitted a couple of years ago [1]
whereby I discovered that the choose_acting function does not consider
primary affinity when choosing the primary OSD. I had made the assumption it
would when developing my procedure for replacing failing disks. After that
discovery I changed my process to stop the failing OSD daemon (accepting
degraded PGs) to ensure it's not participating in the PG anymore. Not sure
if any of the relevant code regarding this has changed since that initial
submission, but what you describe here seems similar.

 [1] https://tracker.ceph.com/issues/44400
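
In practice that looks roughly like the following, using osd.70 from the
example below (a sketch; the PGs it held go degraded while the daemon is
down, which is the trade-off discussed in the tracker):

systemctl stop ceph-osd@70

# confirm it is no longer primary (or acting) for anything
ceph pg ls-by-primary 70
ceph pg ls-by-osd 70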

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Fri, May 20, 2022 at 7:53 AM Denis Polom  wrote:

> Hi
>
> I observed high latencies and mount points hanging since Octopus release
> and it's still observed on Pacific latest while draining OSD.
>
> Cluster setup:
>
> Ceph Pacific 16.2.7
>
> Cephfs with EC data pool
>
> EC profile setup:
>
> crush-device-class=
> crush-failure-domain=host
> crush-root=default
> jerasure-per-chunk-alignment=false
> k=10
> m=2
> plugin=jerasure
> technique=reed_sol_van
> w=8
>
> Description:
>
> If we have broken drive, we are removing it from Ceph cluster by
> draining it first. That means changing its crush weight to 0
>
> ceph osd crush reweight osd.1 0
>
> Normally on Nautilus it didn't affected clients. But after upgrade to
> Octopus (and since Octopus till current Pacific release) I can observe
> very high IO latencies on clients while OSD being drained (10sec and
> higher).
>
> By debugging I found out that drained OSD is still listed as
> ACTIVE_PRIMARY and that happens only on EC pools and only since Octopus.
> I tested it back on Nautilus, to be sure, where behavior is correct and
> drained OSD is not listed under UP and ACTIVE OSDs for PGs.
>
> Even if setting up primary-affinity for given OSD to 0 this doesn't have
> any effect on EC pool.
>
> Bellow are my debugs:
>
> Buggy behavior on Octopus and Pacific:
>
> Before draining osd.70:
>
> PG_STAT  OBJECTS  MISSING_ON_PRIMARY  DEGRADED  MISPLACED UNFOUND
> BYTES   OMAP_BYTES*  OMAP_KEYS*  LOG   DISK_LOG
> STATE  STATE_STAMP VERSION
> REPORTED   UP UP_PRIMARY  ACTING
> ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB
> DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
> 16.1fff 2269   0 0  0 0
> 89552977270   0  2449 2449
> active+clean 2022-05-19T08:41:55.241734+020019403690'275685
> 19407588:19607199[70,206,216,375,307,57]  70
> [70,206,216,375,307,57]  7019384365'275621
> 2022-05-19T08:41:55.241493+020019384365'275621
> 2022-05-19T08:41:55.241493+0200  0
> dumped pgs
>
>
> after setting osd.70 crush weight to 0 (osd.70 is still acting primary):
>
>   UP UP_PRIMARY ACTING
> ACTING_PRIMARY  LAST_SCRUB SCRUB_STAMP
> LAST_DEEP_SCRUB DEEP_SCRUB_STAMP SNAPTRIMQ_LEN
> 16.1fff 2269   0 0   2269 0
> 89552977270   0  2449  2449
> active+remapped+backfill_wait  2022-05-20T08:51:54.249071+0200
> 19403690'275685  19407668:19607289 [71,206,216,375,307,57]  71
> [70,206,216,375,307,57]  7019384365'275621
> 2022-05-19T08:41:55.241493+020019384365'275621
> 2022-05-19T08:41:55.241493+0200  0
> dumped pgs
>
>
> Correct behavior on Nautilus:
>
> Before draining osd.10:
>
> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES
> OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE STATE_STAMP
> VERSION REPORTED UP UP_PRIMARY ACTING ACTING_PRIMARY
> LAST_SCRUB SCRUB_STAMP LAST_DEEP_SCRUB DEEP_SCRUB_STAMP
> SNAPTRIMQ_LEN
> 2.4e  2  00 0   0
> 8388608   0  0   22 active+clean 2022-05-20
> 02:13:47.43210461'275:40   [10,0,7] 10   [10,0,7]
> 100'0 2022-05-20 01:44:36.217286 0'0 2022-05-20
> 01:44:36.217286 0
>
> after setting osd.10 crush weight to 0 (behavior is correct, osd.10 is
> not listed, not used):
>
>
> root@nautilus1:~# ceph pg dump pgs | head -2
> PG_STAT OBJECTS MISSING_ON_PRIMARY DEGRADED MISPLACED UNFOUND BYTES
> OMAP_BYTES* OMAP_KEYS* LOG DISK_LOG STATE
> STATE_STAMPVERSION REPORTED UP UP_PRIMARY
> ACTING ACTING_PRIMARY LAST_SCRUB SCRUB_STAMP
> LAST_DEEP_SCRUB DEEP_SCRUB_STAMP   SNAPTRIMQ_LEN
> 2.4e 14  00 0   0
> 58720256   0  0  18   18 active+clean 2022-05-20
> 02:18:59.414812   75'1880:43 [22,0,7] 22
> [22,0,7] 220'0 2022-05-20
> 01:44:36.217286 0'0 2022-05-20 01:44:36.217286 0
>
>
> Now question is if is it some implemented feature?
>
> Or is it a bug?
>
> Thank you!
>
> ___

[ceph-users] Re: Trouble getting cephadm to deploy iSCSI gateway

2022-05-17 Thread Wesley Dillingham
Well, I don't use either the dashboard or the cephadm/containerized
deployment, but I do use ceph-iscsi. The fact that your two gateways are not
"up" might indicate that they haven't been added to the target IQN yet. Once
you can get into gwcli, create an IQN, and associate your gateways with it, I
would guess they will report as "up".

On the filesystem, ceph-iscsi uses /etc/ceph/iscsi-gateway.cfg to store
things like the api password / trusted_ip_list etc. Maybe check this in the
container and see if it is being configured correctly.

Are you running gwcli on the server where the iscsi gateway is running? It
wants to connect to the rbd-target-api on port 5000 via localhost.

The error you reported, "Unable to access the configuration object", also
sounds like a cephx inability to access the configuration RADOS object at
rbd/gateway.conf. By default it would want to use the admin keyring unless
otherwise specified in the .cfg with the gateway_keyring option.
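
For reference, once rbd-target-api is reachable, the gwcli steps to create
the target and register both gateways look roughly like this (a sketch based
on the upstream ceph-iscsi docs; the IQN is a placeholder and skipchecks=true
is only for getting past environment checks in a lab):

# inside an interactive gwcli session on one of the gateway nodes
cd /iscsi-targets
create iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw
cd iqn.2003-01.com.redhat.iscsi-gw:iscsi-igw/gateways
create ceph1 192.168.122.3 skipchecks=true
create ceph3 192.168.122.5 skipchecks=true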

Hope that helps in some way.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Tue, May 17, 2022 at 1:45 PM Erik Andersen  wrote:

> I am attempting to set up a 3-node Ceph cluster using Ubuntu Server
> 22.04 LTS and the cephadm deployment tool.
>
> 3 times I've succeeded in setting up ceph itself, getting the cluster
> healthy, and OSDs all set up. The nodes (all monitors) are at
> 192.168.122.3, 192.168.122.4, and 192.168.122.5. All nodes have a second
> "backend" network on separate interface in the 10.0.0.3-10.0.0.5 range.
>
> I then create a RBD pool called "rbd".
>
> All is healthy with the cluster per the dashboard up to this point.
>
> I then try to set up iSCSI gateways on 192.168.122.3 and 192.168.122.5,
> following these directions:
> https://docs.ceph.com/en/pacific/cephadm/services/iscsi/
>
> That means doing `cephadm shell`, getting the `iscsi.yaml` file into the
> docker container (with echo since there seems to be no text editors
> available) and then running their recommended deployment command of `ceph
> orch apply -i iscsi.yaml`. The yaml file has in it:
>
>  service_type: iscsi
>  service_id: iscsi
>  placement:
>hosts:
>  - ceph1
>  - ceph3
>  spec:
>pool: rbd  # RADOS pool where ceph-iscsi config data is stored.
>trusted_ip_list:
> "192.168.122.3,192.168.122.5,10.0.0.3,10.0.0.5,192.168.122.4,10.0.0.4"
>
> I then get in the dashboard status page there there are 2 iSCSI gateways
> configured, but down.
> https://i.stack.imgur.com/wr619.png
>
> In services, it shows that the services are running:
>
> https://i.stack.imgur.com/PwSik.png
>
>
>   On the iSCSI gateways page it shows this:
>
> (Ceph dashboard iSCSI gateway page showing both gateways down)
> https://i.stack.imgur.com/Se2Mv.png
>
>
> Looking on one of the node's containers, it does look like cephadm
> started/deployed containers for this (apologies in advance for the horrible
> email formatting of a console table - look at the first two):
>
>  root@ceph1:~# docker ps
>  CONTAINER ID   IMAGE COMMAND
> CREATED STATUS PORTS NAMES
>  cefaf78b98ee   quay.ceph.io/ceph-ci/ceph
>  "/usr/bin/rbd-target…"   About an hour ago   Up About an hour
>  ceph-9f724dc4-d2de-11ec-b7be-8f11f39bf88a-iscsi-iscsi-ceph1-alnale
>  b405b321bd6a   quay.ceph.io/ceph-ci/ceph
>  "/usr/bin/tcmu-runner"   About an hour ago   Up About an hour
>  ceph-9f724dc4-d2de-11ec-b7be-8f11f39bf88a-iscsi-iscsi-ceph1-alnale-tcmu
>  a05af7ac9609   quay.io/prometheus/prometheus:v2.33.4
>  "/bin/prometheus --c…"   About an hour ago   Up About an hour
>  ceph-9f724dc4-d2de-11ec-b7be-8f11f39bf88a-prometheus-ceph1
>  4699606a7878   quay.io/prometheus/alertmanager:v0.23.0
>  "/bin/alertmanager -…"   About an hour ago   Up About an hour
>  ceph-9f724dc4-d2de-11ec-b7be-8f11f39bf88a-alertmanager-ceph1
>  103abafd0c19   quay.ceph.io/ceph-ci/ceph
>  "/usr/bin/ceph-osd -…"   About an hour ago   Up About an hour
>  ceph-9f724dc4-d2de-11ec-b7be-8f11f39bf88a-osd-2
>  adcad13a1dcb   quay.ceph.io/ceph-ci/ceph
>  "/usr/bin/ceph-osd -…"   About an hour ago   Up About an hour
>  ceph-9f724dc4-d2de-11ec-b7be-8f11f39bf88a-osd-0
>  9626b0794794   quay.io/ceph/ceph-grafana:8.3.5   "/bin/sh -c
> 'grafana…"   About an hour ago   Up About an hour
>  ceph-9f724dc4-d2de-11ec-b7be-8f11f39bf88a-grafana-ceph1
>  9a717edbf83f   quay.io/prometheus/node-exporter:v1.3.1
>  "/bin/node_exporter …"   About an hour ago   Up About an hour
>  ceph-9f724dc4-d2de-11ec-b7be-8f11f39bf88a-node-exporter-ceph1
>  c1c52d37baf1   quay.ceph.io/ceph-ci/ceph
>  "/usr/bin/ceph-crash…"   About an hour ago   Up About an hour
>  ceph-9f724dc4-d2de-11ec-b7be-8f11f39bf88a-crash-ceph1
>  f6b2c9fef7e9   quay.ceph.io/ceph-ci/ceph:master
> "/usr/bin/ceph-mgr -…"   About an hour ago   Up About an hour
>  

[ceph-users] Re: Migration Nautilus to Pacifi : Very high latencies (EC profile)

2022-05-17 Thread Wesley Dillingham
What was the largest cluster that you upgraded that didn't exhibit the new
issue in 16.2.8? Thanks.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Tue, May 17, 2022 at 10:24 AM David Orman  wrote:

> We had an issue with our original fix in 45963 which was resolved in
> https://github.com/ceph/ceph/pull/46096. It includes the fix as well as
> handling for upgraded clusters. This is in the 16.2.8 release. I'm not sure
> if it will resolve your problem (or help mitigate it) but it would be worth
> trying.
>
> Head's up on 16.2.8 though, see the release thread, we ran into an issue
> with it on our larger clusters: https://tracker.ceph.com/issues/55687
>
> On Tue, May 17, 2022 at 3:44 AM BEAUDICHON Hubert (Acoss) <
> hubert.beaudic...@acoss.fr> wrote:
>
> > Hi Josh,
> >
> > I'm working with Stéphane and I'm the "ceph admin" (big words ^^) in our
> > team.
> > So, yes, as part of the upgrade we've done the offline repair to split
> the
> > omap by pool.
> > The quick fix is, as far as I know, still disabled in the default
> > properties.
> >
> > On the I/O and CPU load, between Nautilus and Pacific, we haven't seen a
> > really big change, just an increase in disk latency and in the end, the
> > "ceph read operation" metric drop from 20K to 5K or less.
> >
> > But yes, a lot of slow IOPs were emerging as time passed.
> >
> > At this time, we have completely out one of our data node, and recreate
> > from scratch 5 of 8 OSD deamons (DB on SSD, data on spinning drive).
> > The result seems very good at this moment (we're seeing better metrics
> > than under Nautilus).
> >
> > Since recreating them, I have changed 3 parameters:
> > bdev_async_discard => osd : true
> > bdev_enable_discard => osd : true
> > bdev_aio_max_queue_depth => osd: 8192
> >
> > The first two have been extremely helpful for our SSD pool; even with
> > enterprise-grade SSDs, the "trim" seems to have rejuvenated our pool.
> > The last one was set in response to messages from the newly created OSDs:
> > "bdev(0x55588e220400 ) aio_submit retries XX"
> > After changing it and restarting the OSD process, the messages were gone,
> > and it seems to have had a beneficial effect on our data node.
> >
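For anyone following along, a rough sketch of applying the three settings
mentioned above cluster-wide (assuming a release with the central config
database; the values simply mirror the ones quoted, and the OSDs still need
a restart afterwards, as described):

    ceph config set osd bdev_enable_discard true
    ceph config set osd bdev_async_discard true
    ceph config set osd bdev_aio_max_queue_depth 8192
    systemctl restart ceph-osd.target   # package-based install; adjust for cephadm
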
> > I've seen that 16.2.8 came out yesterday, but I'm a little confused
> > about:
> > [Revert] bluestore: set upper and lower bounds on rocksdb omap iterators
> > (pr#46092, Neha Ojha)
> > bluestore: set upper and lower bounds on rocksdb omap iterators
> (pr#45963,
> > Cory Snyder)
> >
> > (theses two lines seems related to https://tracker.ceph.com/issues/55324
> ).
> >
> > One step forward, one step backward ?
> >
> > Hubert Beaudichon
> >
> >
> > -Message d'origine-
> > De : Josh Baergen 
> > Envoyé : lundi 16 mai 2022 16:56
> > À : stéphane chalansonnet 
> > Cc : ceph-users@ceph.io
> > Objet : [ceph-users] Re: Migration Nautilus to Pacifi : Very high
> > latencies (EC profile)
> >
> > Hi Stéphane,
> >
> > On Sat, May 14, 2022 at 4:27 AM stéphane chalansonnet <
> schal...@gmail.com>
> > wrote:
> > > After a successful update from Nautilus to Pacific on Centos8.5, we
> > > observed some high latencies on our cluster.
> >
> > As a part of this upgrade, did you also migrate the OSDs to sharded
> > rocksdb column families? This would have been done by setting bluestore's
> > "quick fix on mount" setting to true or by issuing a "ceph-bluestore-tool
> > repair" offline, perhaps in response to a BLUESTORE_NO_PER_POOL_OMAP
> > warning post-upgrade.
> >
> > I ask because I'm wondering if you're hitting
> > https://tracker.ceph.com/issues/55324, for which there is a fix coming
> in
> > 16.2.8. If you inspect the nodes and disks involved in your EC pool, are
> > you seeing high read or write I/O? High CPU usage?
> >
> > Josh
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an
> > email to ceph-users-le...@ceph.io
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Migration Nautilus to Pacifi : Very high latencies (EC profile)

2022-05-16 Thread Wesley Dillingham
In our case it appears that file deletes have a very high impact on osd
operations. Not a significant delete either ~20T on a 1PB utilized
filesystem (large files as well).

We are trying to tune down cephfs delayed deletes via:
"mds_max_purge_ops": "512",
"mds_max_purge_ops_per_pg": "0.10",

with some success but still experimenting with how we can reduce the
throughput impact from osd slow ops.
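
For reference, a minimal sketch of how such throttles can be applied (the
values are just the ones we are experimenting with, and a release with the
central config database is assumed):

    ceph config set mds mds_max_purge_ops 512
    ceph config set mds mds_max_purge_ops_per_pg 0.10
    ceph config dump | grep mds_max_purge   # confirm what is actually set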

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Mon, May 16, 2022 at 9:49 AM Wesley Dillingham 
wrote:

> We have a newly-built pacific (16.2.7) cluster running 8+3 EC jerasure
> ~250 OSDS across 21 hosts which has significantly lower than expected IOPS.
> Only doing about 30 IOPS per spinning disk (with appropriately sized SSD
> bluestore db) around ~100 PGs per OSD. Have around 100 CephFS (ceph fuse
> 16.2.7) clients using the cluster. Cluster regularly reports slow ops from
> the OSDs but the vast majority, 90% plus of the OSDs, are only <50% IOPS
> utilized. Plenty of cpu/ram/network left on all cluster nodes. We have
> looked for hardware (disk/bond/network/mce) issues across the cluster with
> no findings / checked send-qs and received-q's across the cluster to try
> and narrow in on an individual failing component but nothing found there.
> Slow ops are also spread equally across the servers in the cluster. Does
> your cluster report any health warnings (slow ops etc) alongside your
> reduced performance?
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Mon, May 16, 2022 at 2:00 AM Martin Verges 
> wrote:
>
>> Hello,
>>
>> depending on your workload, drives and OSD allocation size, using the 3+2
>> can be way slower than the 4+2. Maybe give it a small benchmark and try if
>> you see a huge difference. We had some benchmarks with such and they
>> showed
>> quite ugly results in some tests. Best way to deploy EC in our findings is
>> in power of 2, like 2+x, 4+x, 8+x, 16+x. Especially when you deploy OSDs
>> before the Ceph allocation change patch, you might end up consuming way
>> more space if you don't use power of 2. With the 4k allocation size at
>> least this has been greatly improved for newer deployed OSDs.
>>
>> --
>> Martin Verges
>> Managing director
>>
>> Mobile: +49 174 9335695  | Chat: https://t.me/MartinVerges
>>
>> croit GmbH, Freseniusstr. 31h, 81247 Munich
>> CEO: Martin Verges - VAT-ID: DE310638492
>> Com. register: Amtsgericht Munich HRB 231263
>> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
>>
>>
>> On Sun, 15 May 2022 at 20:30, stéphane chalansonnet 
>> wrote:
>>
>> > Hi,
>> >
>> > Thank you for your answer.
>> > This is not good news if you also notice a performance decrease on
>> > your side.
>> > No, as far as we know, you cannot downgrade to Octopus.
>> > Going forward seems to be the only way, so Quincy.
>> > We have a qualification cluster so we can try on it (but a fully virtual
>> > configuration)
>> >
>> >
>> > We are using 4+2 and 3+2 profile
>> > Are you also on the same profile on your Cluster ?
>> > Maybe replicated profile are not be impacted ?
>> >
>> > Actually, we are trying to recreate the OSDs one by one;
>> > some parameters can only be set this way.
>> > The first storage node is almost rebuilt; we will see if the latencies
>> > on it are below the others ...
>> >
>> > Wait and see.
>> >
>> > Le dim. 15 mai 2022 à 10:16, Martin Verges  a
>> > écrit :
>> >
>> >> Hello,
>> >>
>> >> what exact EC level do you use?
>> >>
>> >> I can confirm, that our internal data shows a performance drop when
>> using
>> >> pacific. So far Octopus is faster and better than pacific but I doubt
>> you
>> >> can roll back to it. We haven't rerun our benchmarks on Quincy yet, but
>> >> according to some presentation it should be faster than pacific. Maybe
>> try
>> >> to jump away from the pacific release into the unknown!
>> >>
>> >> --
>> >> Martin Verges
>> >> Managing director
>> >>
>> >> Mobile: +49 174 9335695  | Chat: https://t.me/MartinVerges
>> >>
>> >> croit GmbH, Freseniusstr. 31h, 81247 Munich
>> >> CEO: Martin Verges - VAT-ID: DE310638492

[ceph-users] Re: Migration Nautilus to Pacifi : Very high latencies (EC profile)

2022-05-16 Thread Wesley Dillingham
We have a newly-built pacific (16.2.7) cluster running 8+3 EC jerasure ~250
OSDS across 21 hosts which has significantly lower than expected IOPS. Only
doing about 30 IOPS per spinning disk (with appropriately sized SSD
bluestore db) around ~100 PGs per OSD. Have around 100 CephFS (ceph fuse
16.2.7) clients using the cluster. Cluster regularly reports slow ops from
the OSDs but the vast majority, 90% plus of the OSDs, are only <50% IOPS
utilized. Plenty of cpu/ram/network left on all cluster nodes. We have
looked for hardware (disk/bond/network/mce) issues across the cluster with
no findings / checked send-qs and received-q's across the cluster to try
and narrow in on an individual failing component but nothing found there.
Slow ops are also spread equally across the servers in the cluster. Does
your cluster report any health warnings (slow ops etc) alongside your
reduced performance?
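
For comparison, the sort of checks we have been running (osd.12 is only an
example id; the daemon commands must be run on the host carrying that OSD):

    ceph health detail | grep -i slow       # which OSDs are reporting slow ops
    ceph osd perf                           # per-OSD commit/apply latency; look for outliers
    ceph daemon osd.12 dump_ops_in_flight   # ops currently stuck in one OSD
    ceph daemon osd.12 dump_historic_ops    # recent slowest ops on that OSD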

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, May 16, 2022 at 2:00 AM Martin Verges 
wrote:

> Hello,
>
> depending on your workload, drives and OSD allocation size, using the 3+2
> can be way slower than the 4+2. Maybe give it a small benchmark and try if
> you see a huge difference. We had some benchmarks with such and they showed
> quite ugly results in some tests. Best way to deploy EC in our findings is
> in power of 2, like 2+x, 4+x, 8+x, 16+x. Especially when you deploy OSDs
> before the Ceph allocation change patch, you might end up consuming way
> more space if you don't use power of 2. With the 4k allocation size at
> least this has been greatly improved for newer deployed OSDs.
>
> --
> Martin Verges
> Managing director
>
> Mobile: +49 174 9335695  | Chat: https://t.me/MartinVerges
>
> croit GmbH, Freseniusstr. 31h, 81247 Munich
> CEO: Martin Verges - VAT-ID: DE310638492
> Com. register: Amtsgericht Munich HRB 231263
> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
>
>
> On Sun, 15 May 2022 at 20:30, stéphane chalansonnet 
> wrote:
>
> > Hi,
> >
> > Thank you for your answer.
> > This is not good news if you also notice a performance decrease on your
> > side.
> > No, as far as we know, you cannot downgrade to Octopus.
> > Going forward seems to be the only way, so Quincy.
> > We have a qualification cluster so we can try on it (but a fully virtual
> > configuration)
> >
> >
> > We are using 4+2 and 3+2 profile
> > Are you also on the same profile on your Cluster ?
> > Maybe replicated profile are not be impacted ?
> >
> > Actually, we are trying to recreate the OSDs one by one;
> > some parameters can only be set this way.
> > The first storage node is almost rebuilt; we will see if the latencies on
> > it are below the others ...
> >
> > Wait and see.
> >
> > Le dim. 15 mai 2022 à 10:16, Martin Verges  a
> > écrit :
> >
> >> Hello,
> >>
> >> what exact EC level do you use?
> >>
> >> I can confirm, that our internal data shows a performance drop when
> using
> >> pacific. So far Octopus is faster and better than pacific but I doubt
> you
> >> can roll back to it. We haven't rerun our benchmarks on Quincy yet, but
> >> according to some presentation it should be faster than pacific. Maybe
> try
> >> to jump away from the pacific release into the unknown!
> >>
> >> --
> >> Martin Verges
> >> Managing director
> >>
> >> Mobile: +49 174 9335695  | Chat: https://t.me/MartinVerges
> >>
> >> croit GmbH, Freseniusstr. 31h, 81247 Munich
> >> CEO: Martin Verges - VAT-ID: DE310638492
> >> Com. register: Amtsgericht Munich HRB 231263
> >> Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
> >>
> >>
> >> On Sat, 14 May 2022 at 12:27, stéphane chalansonnet  >
> >> wrote:
> >>
> >>> Hello,
> >>>
> >>> After a successful update from Nautilus to Pacific on Centos8.5, we
> >>> observed some high latencies on our cluster.
> >>>
> >>> We did not find much in the community related to latencies
> >>> post-migration.
> >>>
> >>> Our setup is
> >>> 6x storage Node (256GRAM, 2SSD OSD + 5*6To SATA HDD)
> >>> Erasure coding profile
> >>> We have two EC pool :
> >>> -> Pool1 : Full HDD SAS Drive 6To
> >>> -> Pool2 : Full SSD Drive
> >>>
> >>> Object S3 and RBD block workload
> >>>
> >>> Our performance on Nautilus, before the upgrade, was acceptable.
> >>> However, the next day, performance had dropped by a factor of 3 or 4.
> >>> Benchmarks showed 15K IOPS on the flash drives; before the upgrade we
> >>> had almost 80K IOPS.
> >>> Also, the HDD pool is almost down (far too much latency).
> >>>
> >>> We suspect, maybe, an impact of the erasure coding configuration on
> >>> Pacific.
> >>> Has anyone observed the same behaviour? Any tuning?
> >>>
> >>> Thank you for your help.
> >>>
> >>> ceph osd tree
> >>> ID   CLASS  WEIGHT TYPE NAME STATUS  REWEIGHT
> >>> PRI-AFF
> >>>  -1 347.61304  root default
> >>>  -3  56.71570  host cnp31tcephosd01
> >>>   0hdd5.63399  osd.0 up   1.0
> >>> 1.0
> >>>   1

[ceph-users] Re: Erasure-coded PG stuck in the failed_repair state

2022-05-10 Thread Wesley Dillingham
In my experience:

"No scrub information available for pg 11.2b5
error 2: (2) No such file or directory"

is the output you get from the command when the up or acting osd set has
changed since the last deep-scrub. Have you tried to run a deep scrub (ceph
pg deep-scrub 11.2b5) on the pg and then try "rados list-inconsistent-obj
11.2b5" again. I do recognize that part of the pg repair also performs a
deep-scrub but perhaps the deep-scrub alone will help with your attempt to
run rados list-inconsistent-obj.
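
Roughly the sequence I have in mind, using your pg id:

    ceph pg deep-scrub 11.2b5
    # wait for the scrub to actually run, then confirm the stamp moved:
    ceph pg 11.2b5 query | grep last_deep_scrub_stamp
    rados list-inconsistent-obj 11.2b5 --format=json-pretty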

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Tue, May 10, 2022 at 8:52 AM Robert Appleyard - STFC UKRI <
rob.appley...@stfc.ac.uk> wrote:

> Hi,
>
> We've got an outstanding issue with one of our Ceph clusters here at RAL.
> The cluster is 'Echo', our 40PB cluster. We found an object from an 8+3EC
> RGW pool in the failed_repair state. We aren't sure how the object got into
> this state, but it doesn't appear to be a case of correlated drive failure
> (the rest of the PG is fine). However, the detail of how we got into this
> state isn't our focus, it's how to get the PG back to a clean state.
>
> The object (for our purposes, named OBJNAME) in question is from a RadosGW
> data pool. It presented initially as a PG in the failed_repair state.
> Repeated attempts to get the PG to repair failed. At this point we
> contacted the user who owns the data, and determined that the data in
> question was also stored elsewhere and so we could safely delete the
> object. We did that using radosgw-admin object rm OBJNAME, and confirmed
> that the object is gone with various approaches (radosgw-admin object stat,
> rados ls --pgid PGID | grep OBJNAME).
>
> So far, so good. Except, even after the object was deleted and in spite of
> many instructions to repair, the placement group is still in the state
> active+clean+inconsistent+failed_repair, and the cluster won't go to
> HEALTH_OK. Here's what the log from one of these repair attempts looks like
> (from the log on the primary OSD).
>
> 2022-05-08 16:23:43.898 7f79d3872700  0 log_channel(cluster) log [DBG] :
> 11.2b5 repair starts
> 2022-05-08 16:51:38.807 7f79d3872700 -1 log_channel(cluster) log [ERR] :
> 11.2b5 shard 1899(8) soid 11:ad45a433:::OBJNAME:head : candidate had an ec
> size mismatch
> 2022-05-08 16:51:38.807 7f79d3872700 -1 log_channel(cluster) log [ERR] :
> 11.2b5 shard 1911(7) soid 11:ad45a433:::OBJNAME:head : candidate had an ec
> size mismatch
> 2022-05-08 16:51:38.807 7f79d3872700 -1 log_channel(cluster) log [ERR] :
> 11.2b5 shard 2842(10) soid 11:ad45a433:::OBJNAME:head : candidate had an ec
> size mismatch
> 2022-05-08 16:51:38.807 7f79d3872700 -1 log_channel(cluster) log [ERR] :
> 11.2b5 shard 3256(6) soid 11:ad45a433:::OBJNAME:head : candidate had an ec
> size mismatch
> 2022-05-08 16:51:38.807 7f79d3872700 -1 log_channel(cluster) log [ERR] :
> 11.2b5 shard 3399(5) soid 11:ad45a433:::OBJNAME:head : candidate had an ec
> size mismatch
> 2022-05-08 16:51:38.807 7f79d3872700 -1 log_channel(cluster) log [ERR] :
> 11.2b5 shard 3770(9) soid 11:ad45a433:::OBJNAME:head : candidate had an ec
> size mismatch
> 2022-05-08 16:51:38.807 7f79d3872700 -1 log_channel(cluster) log [ERR] :
> 11.2b5 shard 5206(3) soid 11:ad45a433:::OBJNAME:head : candidate had an ec
> size mismatch
> 2022-05-08 16:51:38.807 7f79d3872700 -1 log_channel(cluster) log [ERR] :
> 11.2b5 shard 6047(4) soid 11:ad45a433:::OBJNAME:head : candidate had an ec
> size mismatch
> 2022-05-08 16:51:38.807 7f79d3872700 -1 log_channel(cluster) log [ERR] :
> 11.2b5 soid 11:ad45a433:::OBJNAME:head : failed to pick suitable object info
> 2022-05-08 19:03:12.690 7f79d3872700 -1 log_channel(cluster) log [ERR] :
> 11.2b5 repair 11 errors, 0 fixed
>
> Looking for inconsistent objects in the PG doesn't report anything odd
> about this object (right now we get this rather odd output, but aren't sure
> that this isn't a red herring).
>
> [root@ceph-adm1 ~]# rados list-inconsistent-obj 11.2b5
> No scrub information available for pg 11.2b5
> error 2: (2) No such file or directory
>
> We don't get this output from this command on any other PG that we've
> tried.
>
> So what next? To reiterate, this isn't about data recovery, it's about
> getting the cluster back to a healthy state. I should also note that this
> issue doesn't seem to be impacting the cluster beyond making that PG show
> as being in a bad state.
>
> Rob Appleyard
>
>
> This email and any attachments are intended solely for the use of the
> named recipients. If you are not the intended recipient you must not use,
> disclose, copy or distribute this email or any of its attachments and
> should notify the sender immediately and delete this email from your
> system. UK Research and Innovation (UKRI) has taken every reasonable
> precaution to minimise risk of this email or any attachments containing
> viruses or malware but the recipient should carry out its own virus and
> malware 

[ceph-users] Aggressive Bluestore Compression Mode for client data only?

2022-04-18 Thread Wesley Dillingham
I would like to use bluestore compression (probably zstd level 3) to
compress my clients' data unless the incompressible hint is set (aggressive
mode), but I do not want to expose myself to the bug described in this Cern
talk ("Ceph bug of the year", https://www.youtube.com/watch?v=_4HUR00oCGo),
where the osd maps themselves were compressed, even though I realize the
bug is fixed in lz4 (and I'm probably not going to use lz4).

So my question is: if the compression mode is aggressive but is set on a
per-pool basis (compression_mode vs bluestore_compression_mode), does Ceph
only attempt to compress the client data, and not "bluestore/cluster
internal data" like osd maps etc.? Thanks.


Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Is it normal Ceph reports "Degraded data redundancy" in normal use?

2022-04-18 Thread Wesley Dillingham
If you mark an OSD "out" but not down (i.e. you don't stop the daemon), do
the PGs go remapped, or do they go degraded then as well?
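
In other words, the difference between something like this (osd.12 purely
as an example, package-based systemd unit shown):

    ceph osd out 12               # marked out, daemon still up and serving
    # versus
    systemctl stop ceph-osd@12    # daemon stopped as well
    ceph pg stat                  # watch whether PGs show remapped or degraded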

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Thu, Apr 14, 2022 at 5:15 AM Kai Stian Olstad 
wrote:

> On 29.03.2022 14:56, Sandor Zeestraten wrote:
> > I was wondering if you ever found out anything more about this issue.
>
> Unfortunately no, so I turned it off.
>
>
> > I am running into similar degradation issues while running rados bench
> > on a
> > new 16.2.6 cluster.
> > In our case it's with a replicated pool, but the degradation problems
> > also
> > go away when we turn off the balancer.
>
> So this goes a long way of confirming there are something wrong with the
> balancer since we now see it on two different installation.
>
>
> --
> Kai Stian Olstad
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Experience reducing size 3 to 2 on production cluster?

2021-12-10 Thread Wesley Dillingham
I would avoid doing this. Size 2 is not where you want to be. Maybe you can
give more details about your cluster size and shape and what you are trying
to accomplish, and another solution could be proposed. The contents of
"ceph osd tree" and "ceph df" would help.

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Fri, Dec 10, 2021 at 12:05 PM Marco Pizzolo 
wrote:

> Hello,
>
> As part of a migration process where we will be swinging Ceph hosts from
> one cluster to another we need to reduce the size from 3 to 2 in order to
> shrink the footprint sufficiently to allow safe removal of an OSD/Mon node.
>
> The cluster has about 500M objects as per dashboard, and is about 1.5PB in
> size comprised solely of small files served through CephFS to Samba.
>
> Has anyone encountered a similar situation?  What (if any) problems did you
> face?
>
> Ceph 14.2.22 bare metal deployment on Centos.
>
> Thanks in advance.
>
> Marco
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: erasure coded pool PG stuck inconsistent on ceph Pacific 15.2.13

2021-11-19 Thread Wesley Dillingham
You may also be able to use an upmap (or the upmap balancer) to help make
room for you on the osd which is too full.
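
A rough sketch of both approaches (the OSD ids are placeholders, and upmap
needs require-min-compat-client luminous or newer):

    ceph balancer mode upmap
    ceph balancer on
    ceph balancer status
    # or move a single PG off the too-full OSD by hand:
    ceph osd pg-upmap-items 6.180 121 87   # remap 6.180's shard from osd.121 to osd.87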

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Fri, Nov 19, 2021 at 1:14 PM Wesley Dillingham 
wrote:

> Okay, now I see your attachment, the pg is in state:
>
> "state":
> "active+undersized+degraded+remapped+inconsistent+backfill_toofull",
>
> The reason it can't scrub or repair is that it's degraded, and further it
> seems that the cluster doesn't have the space to make that recovery happen
> (the "backfill_toofull" state). This may clear on its own as other pgs recover
> and this pg is ultimately able to recover. Other options are to remove data
> or add capacity. How full is your cluster? Is your cluster currently
> backfilling actively?
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Fri, Nov 19, 2021 at 10:57 AM J-P Methot 
> wrote:
>
>> We have stopped deepscrubbing a while ago. However, forcing a deepscrub
>> by doing "ceph pg deep-scrub 6.180" doesn't do anything. The deepscrub
>> doesn't run at all. Could the deepscrubbing process be stuck elsewhere?
>> On 11/18/21 3:29 PM, Wesley Dillingham wrote:
>>
>> That response is typically indicative of a pg whose OSD sets has changed
>> since it was last scrubbed (typically from a disk failing).
>>
>> Are you sure its actually getting scrubbed when you issue the scrub? For
>> example you can issue: "ceph pg  query"  and look for
>> "last_deep_scrub_stamp" which will tell you when it was last deep
>> scrubbed.
>>
>> Further, in sufficiently recent versions of Ceph (introduced in
>> 14.2.something iirc) setting the flag "nodeep-scrub" will cause all in
>> flight deep-scrubs to stop immediately. You may have a scheduling issue
>> where you deep-scrub or repairs arent getting scheduled.
>>
>> Set the nodeep-scrub flag: "ceph osd set nodeep-scrub" and wait for all
>> current deep-scrubs to complete then try and manually re-issue the deep
>> scrub "ceph pg deep-scrub " at this point your scrub should start
>> near immediately and "rados
>> list-inconsistent-obj 6.180 --format=json-pretty" should return with
>> something of value.
>>
>> Respectfully,
>>
>> *Wes Dillingham*
>> w...@wesdillingham.com
>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>
>>
>> On Thu, Nov 18, 2021 at 2:38 PM J-P Methot 
>> wrote:
>>
>>> Hi,
>>>
>>> We currently have a PG stuck in an inconsistent state on an erasure
>>> coded pool. The pool's K and M values are 33 and 3.  The command rados
>>> list-inconsistent-obj 6.180 --format=json-pretty results in the
>>> following error:
>>>
>>> No scrub information available for pg 6.180 error 2: (2) No such file or
>>> directory
>>>
>>> Forcing a deep scrub of the pg does not fix this. Doing a ceph pg repair
>>> 6.180 doesn't seem to do anything. Is there a known bug explaining this
>>> behavior? I am attaching informations regarding the PG in question.
>>>
>>> --
>>> Jean-Philippe Méthot
>>> Senior Openstack system administrator
>>> Administrateur système Openstack sénior
>>> PlanetHoster inc.
>>>
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>> --
>> Jean-Philippe Méthot
>> Senior Openstack system administrator
>> Administrateur système Openstack sénior
>> PlanetHoster inc.
>>
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: erasure coded pool PG stuck inconsistent on ceph Pacific 15.2.13

2021-11-19 Thread Wesley Dillingham
Okay, now I see your attachment, the pg is in state:

"state":
"active+undersized+degraded+remapped+inconsistent+backfill_toofull",

The reason it can't scrub or repair is that it's degraded, and further it
seems that the cluster doesn't have the space to make that recovery happen
(the "backfill_toofull" state). This may clear on its own as other pgs recover
and this pg is ultimately able to recover. Other options are to remove data
or add capacity. How full is your cluster? Is your cluster currently
backfilling actively?
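
To gauge that, something along these lines:

    ceph df                         # overall and per-pool usage
    ceph osd df tree                # per-OSD %USE; look for OSDs near backfillfull
    ceph osd dump | grep -i ratio   # current nearfull/backfillfull/full ratios

If it is only slightly over the threshold, the backfillfull ratio can
sometimes be raised a little, with care, to let the recovery finish, e.g.
"ceph osd set-backfillfull-ratio 0.91".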

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Fri, Nov 19, 2021 at 10:57 AM J-P Methot 
wrote:

> We have stopped deepscrubbing a while ago. However, forcing a deepscrub by
> doing "ceph pg deep-scrub 6.180" doesn't do anything. The deepscrub doesn't
> run at all. Could the deepscrubbing process be stuck elsewhere?
> On 11/18/21 3:29 PM, Wesley Dillingham wrote:
>
> That response is typically indicative of a pg whose OSD sets has changed
> since it was last scrubbed (typically from a disk failing).
>
> Are you sure its actually getting scrubbed when you issue the scrub? For
> example you can issue: "ceph pg  query"  and look for
> "last_deep_scrub_stamp" which will tell you when it was last deep
> scrubbed.
>
> Further, in sufficiently recent versions of Ceph (introduced in
> 14.2.something iirc) setting the flag "nodeep-scrub" will cause all in
> flight deep-scrubs to stop immediately. You may have a scheduling issue
> where you deep-scrub or repairs arent getting scheduled.
>
> Set the nodeep-scrub flag: "ceph osd set nodeep-scrub" and wait for all
> current deep-scrubs to complete then try and manually re-issue the deep
> scrub "ceph pg deep-scrub " at this point your scrub should start
> near immediately and "rados
> list-inconsistent-obj 6.180 --format=json-pretty" should return with
> something of value.
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Thu, Nov 18, 2021 at 2:38 PM J-P Methot 
> wrote:
>
>> Hi,
>>
>> We currently have a PG stuck in an inconsistent state on an erasure
>> coded pool. The pool's K and M values are 33 and 3.  The command rados
>> list-inconsistent-obj 6.180 --format=json-pretty results in the
>> following error:
>>
>> No scrub information available for pg 6.180 error 2: (2) No such file or
>> directory
>>
>> Forcing a deep scrub of the pg does not fix this. Doing a ceph pg repair
>> 6.180 doesn't seem to do anything. Is there a known bug explaining this
>> behavior? I am attaching informations regarding the PG in question.
>>
>> --
>> Jean-Philippe Méthot
>> Senior Openstack system administrator
>> Administrateur système Openstack sénior
>> PlanetHoster inc.
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
> --
> Jean-Philippe Méthot
> Senior Openstack system administrator
> Administrateur système Openstack sénior
> PlanetHoster inc.
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: erasure coded pool PG stuck inconsistent on ceph Pacific 15.2.13

2021-11-18 Thread Wesley Dillingham
That response is typically indicative of a pg whose OSD set has changed
since it was last scrubbed (typically from a disk failing).

Are you sure its actually getting scrubbed when you issue the scrub? For
example you can issue: "ceph pg  query"  and look for
"last_deep_scrub_stamp" which will tell you when it was last deep scrubbed.

Further, in sufficiently recent versions of Ceph (introduced in
14.2.something iirc) setting the flag "nodeep-scrub" will cause all in
flight deep-scrubs to stop immediately. You may have a scheduling issue
where you deep-scrub or repairs arent getting scheduled.

Set the nodeep-scrub flag: "ceph osd set nodeep-scrub" and wait for all
current deep-scrubs to complete then try and manually re-issue the deep
scrub "ceph pg deep-scrub " at this point your scrub should start
near immediately and "rados
list-inconsistent-obj 6.180 --format=json-pretty" should return with
something of value.
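
Putting that together for your pg, roughly:

    ceph osd set nodeep-scrub
    ceph -s                      # wait until the scrubbing+deep PG counts drain
    ceph pg deep-scrub 6.180
    ceph pg 6.180 query | grep last_deep_scrub_stamp
    rados list-inconsistent-obj 6.180 --format=json-pretty
    ceph osd unset nodeep-scrub  # re-enable normal deep-scrubbing afterwards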

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Thu, Nov 18, 2021 at 2:38 PM J-P Methot 
wrote:

> Hi,
>
> We currently have a PG stuck in an inconsistent state on an erasure
> coded pool. The pool's K and M values are 33 and 3.  The command rados
> list-inconsistent-obj 6.180 --format=json-pretty results in the
> following error:
>
> No scrub information available for pg 6.180 error 2: (2) No such file or
> directory
>
> Forcing a deep scrub of the pg does not fix this. Doing a ceph pg repair
> 6.180 doesn't seem to do anything. Is there a known bug explaining this
> behavior? I am attaching informations regarding the PG in question.
>
> --
> Jean-Philippe Méthot
> Senior Openstack system administrator
> Administrateur système Openstack sénior
> PlanetHoster inc.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph osd continously fails

2021-08-12 Thread Wesley Dillingham
Can you send the results of "ceph daemon osd.0 status" and maybe do that
for a couple of osd ids? You may need to target ones which are currently
running.
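
For example, on each OSD host (the ids below are placeholders; the admin
socket command only works on the host running that OSD):

    for id in 0 1 12; do ceph daemon osd.$id status; done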

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Wed, Aug 11, 2021 at 9:51 AM Amudhan P  wrote:

> Hi,
>
> Below are the logs in one of the failed OSD.
>
> Aug 11 16:55:48 bash[27152]: debug-20> 2021-08-11T11:25:47.433+
> 7fbf3b819700  3 osd.12 6697 handle_osd_map epochs [6696,6697], i have 6697,
> src has [
> Aug 11 16:55:48 bash[27152]: debug-19> 2021-08-11T11:25:47.433+
> 7fbf32006700  5 osd.12 pg_epoch: 6697 pg[2.14b( v 6312'183564
> (4460'174466,6312'18356
> Aug 11 16:55:48 bash[27152]: debug-18> 2021-08-11T11:25:47.433+
> 7fbf32006700  5 osd.12 pg_epoch: 6697 pg[2.14b( v 6312'183564
> (4460'174466,6312'18356
> Aug 11 16:55:48 bash[27152]: debug-17> 2021-08-11T11:25:47.433+
> 7fbf32006700  5 osd.12 pg_epoch: 6697 pg[2.14b( v 6312'183564
> (4460'174466,6312'18356
> Aug 11 16:55:48 bash[27152]: debug-16> 2021-08-11T11:25:47.433+
> 7fbf32006700  5 osd.12 pg_epoch: 6697 pg[2.14b( v 6312'183564
> (4460'174466,6312'18356
> Aug 11 16:55:48 bash[27152]: debug-15> 2021-08-11T11:25:47.441+
> 7fbf3b819700  3 osd.12 6697 handle_osd_map epochs [6696,6697], i have 6697,
> src has [
> Aug 11 16:55:48 bash[27152]: debug-14> 2021-08-11T11:25:47.561+
> 7fbf3a817700  2 osd.12 6697 ms_handle_refused con 0x563b53a3cc00 session
> 0x563b51aecb
> Aug 11 16:55:48 bash[27152]: debug-13> 2021-08-11T11:25:47.561+
> 7fbf3a817700 10 monclient: _send_mon_message to mon.strg-node2 at v2:
> 10.0.103.2:3300/
> Aug 11 16:55:48 bash[27152]: debug-12> 2021-08-11T11:25:47.565+
> 7fbf3b819700  2 osd.12 6697 ms_handle_refused con 0x563b66226000 session 0
> Aug 11 16:55:48 bash[27152]: debug-11> 2021-08-11T11:25:47.581+
> 7fbf3b819700  2 osd.12 6697 ms_handle_refused con 0x563b66227c00 session 0
> Aug 11 16:55:48 bash[27152]: debug-10> 2021-08-11T11:25:47.581+
> 7fbf4e0ae700 10 monclient: get_auth_request con 0x563b53a4f400 auth_method
> 0
> Aug 11 16:55:48 bash[27152]: debug -9> 2021-08-11T11:25:47.581+
> 7fbf39815700  2 osd.12 6697 ms_handle_refused con 0x563b53a3c800 session
> 0x563b679120
> Aug 11 16:55:48 bash[27152]: debug -8> 2021-08-11T11:25:47.581+
> 7fbf39815700 10 monclient: _send_mon_message to mon.strg-node2 at v2:
> 10.0.103.2:3300/
> Aug 11 16:55:48 bash[27152]: debug -7> 2021-08-11T11:25:47.581+
> 7fbf4f0b0700 10 monclient: get_auth_request con 0x563b6331d000 auth_method
> 0
> Aug 11 16:55:48 bash[27152]: debug -6> 2021-08-11T11:25:47.581+
> 7fbf4e8af700 10 monclient: get_auth_request con 0x563b53a4f000 auth_method
> 0
> Aug 11 16:55:48 bash[27152]: debug -5> 2021-08-11T11:25:47.717+
> 7fbf4f0b0700 10 monclient: get_auth_request con 0x563b66226c00 auth_method
> 0
> Aug 11 16:55:48 bash[27152]: debug -4> 2021-08-11T11:25:47.789+
> 7fbf43623700  5 prioritycache tune_memory target: 1073741824 mapped:
> 388874240 unmap
> Aug 11 16:55:48 bash[27152]: debug -3> 2021-08-11T11:25:47.925+
> 7fbf32807700 -1
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_
> Aug 11 16:55:48 bash[27152]:
>
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos8/DIST/centos8/MACHINE_SIZ
> Aug 11 16:55:48 bash[27152]:  ceph version 15.2.7
> (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable)
> Aug 11 16:55:48 bash[27152]:  1: (ceph::__ceph_assert_fail(char const*,
> char const*, int, char const*)+0x158) [0x563b46835dbe]
> Aug 11 16:55:48 bash[27152]:  2: (()+0x504fd8) [0x563b46835fd8]
> Aug 11 16:55:48 bash[27152]:  3: (OSD::do_recovery(PG*, unsigned int,
> unsigned long, ThreadPool::TPHandle&)+0x5f5) [0x563b46918c25]
> Aug 11 16:55:48 bash[27152]:  4:
> (ceph::osd::scheduler::PGRecovery::run(OSD*, OSDShard*,
> boost::intrusive_ptr&, ThreadPool::TPHandle&)+0x1d) [0x563b46b74
> Aug 11 16:55:48 bash[27152]:  5: (OSD::ShardedOpWQ::_process(unsigned int,
> ceph::heartbeat_handle_d*)+0x12ef) [0x563b469364df]
> Aug 11 16:55:48 bash[27152]:  6:
> (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x5c4)
> [0x563b46f6f224]
> Aug 11 16:55:48 bash[27152]:  7:
> (ShardedThreadPool::WorkThreadSharded::entry()+0x14) [0x563b46f71e84]
> Aug 11 16:55:48 bash[27152]:  8: (()+0x82de) [0x7fbf528952de]
> Aug 11 16:55:48 bash[27152]:  9: (clone()+0x43) [0x7fbf515cce83]
> Aug 11 16:55:48 bash[27152]: debug -2> 2021-08-11T11:25:47.929+
> 7fbf32807700 -1 *** Caught signal (Aborted) **
> Aug 11 16:55:48 bash[27152]:  in thread 7fbf32807700 thread_name:tp_osd_tp
> Aug 11 16:55:48 bash[27152]:  ceph version 15.2.7
> (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable)
> Aug 11 16:55:48 bash[27152]:  1: (()+0x12dd0) [0x7fbf5289fdd0]
> Aug 11 16:55:48 bash[27152]:  2: (gsignal()+0x10f) [0x7fbf5150870f]
> Aug 11 16:55:48 

[ceph-users] Re: bug ceph auth

2021-07-14 Thread Wesley Dillingham
Do you get the same error if you just do "ceph auth get
client.bootstrap-osd" i.e. does client.bootstrap exist as a user?

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn <http://www.linkedin.com/in/wesleydillingham>


On Wed, Jul 14, 2021 at 1:56 PM Wesley Dillingham 
wrote:

> is /var/lib/ceph/bootstrap-osd/ in existence and writeable?
>
> Respectfully,
>
> *Wes Dillingham*
> w...@wesdillingham.com
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>
>
> On Wed, Jul 14, 2021 at 8:35 AM Marc  wrote:
>
>>
>>
>>
>> [@t01 ~]# ceph auth get client.bootstrap-osd -o
>> ​/var/lib/ceph/bootstrap-osd/ceph.keyring
>> Traceback (most recent call last):
>>   File "/usr/bin/ceph", line 1272, in 
>> retval = main()
>>   File "/usr/bin/ceph", line 1120, in main
>> print('Can\'t open output file {0}:
>> {1}'.format(parsed_args.output_file, e), file=sys.stderr)
>>   File "/usr/lib64/python2.7/codecs.py", line 351, in write
>> data, consumed = self.encode(object, self.errors)
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 23:
>> ordinal not in range(128)
>>
>>
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: bug ceph auth

2021-07-14 Thread Wesley Dillingham
is /var/lib/ceph/bootstrap-osd/ in existence and writeable?
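
For example:

    ls -ld /var/lib/ceph/bootstrap-osd/
    touch /var/lib/ceph/bootstrap-osd/.writetest && \
        rm /var/lib/ceph/bootstrap-osd/.writetest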

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Wed, Jul 14, 2021 at 8:35 AM Marc  wrote:

>
>
>
> [@t01 ~]# ceph auth get client.bootstrap-osd -o
> ​/var/lib/ceph/bootstrap-osd/ceph.keyring
> Traceback (most recent call last):
>   File "/usr/bin/ceph", line 1272, in 
> retval = main()
>   File "/usr/bin/ceph", line 1120, in main
> print('Can\'t open output file {0}:
> {1}'.format(parsed_args.output_file, e), file=sys.stderr)
>   File "/usr/lib64/python2.7/codecs.py", line 351, in write
> data, consumed = self.encode(object, self.errors)
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 23:
> ordinal not in range(128)
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

