[ceph-users] Re: Ceph recovery network speed

2022-06-24 Thread Curt
On Sat, Jun 25, 2022 at 3:27 AM Anthony D'Atri 
wrote:

> The pg_autoscaler aims IMHO way too low and I advise turning it off.
>
>
>
> > On Jun 24, 2022, at 11:11 AM, Curt  wrote:
> >
> >> You wrote 2TB before, are they 2TB or 18TB?  Is that 273 PGs total or
> per
> > osd?
> > Sorry, 18TB of data and 273 PGs total.
> >
> >> `ceph osd df` will show you toward the right how many PGs are on each
> > OSD.  If you have multiple pools, some PGs will have more data than
> others.
> >> So take an average # of PGs per OSD and divide the actual HDD capacity
> > by that.
> > 20 pg on avg / 2TB(technically 1.8 I guess) which would be 10.
>
> I’m confused.  Is 20 what `ceph osd df` is reporting?  Send me the output
> of

Yes, 20 would be the avg pg count.
ID  CLASS  WEIGHT   REWEIGHT  SIZE     RAW USE  DATA      OMAP     META     AVAIL    %USE   VAR   PGS  STATUS
 1  hdd  1.81940  1.0  1.8 TiB  748 GiB  746 GiB   207 KiB  1.7 GiB  1.1 TiB  40.16  1.68  21  up
 3  hdd  1.81940  1.0  1.8 TiB  459 GiB  457 GiB     3 KiB  1.2 GiB  1.4 TiB  24.61  1.03  20  up
 5  hdd  1.81940  1.0  1.8 TiB  153 GiB  152 GiB    32 KiB  472 MiB  1.7 TiB   8.20  0.34  15  up
 7  hdd  1.81940  1.0  1.8 TiB  471 GiB  470 GiB    83 KiB  1.0 GiB  1.4 TiB  25.27  1.06  24  up
 9  hdd  1.81940  1.0  1.8 TiB  1.0 TiB  1022 GiB  136 KiB  2.4 GiB  838 GiB  54.99  2.30  19  up
11  hdd  1.81940  1.0  1.8 TiB  443 GiB  441 GiB     4 KiB  1.1 GiB  1.4 TiB  23.76  0.99  20  up
13  hdd  1.81940  1.0  1.8 TiB  438 GiB  437 GiB   310 KiB  1.0 GiB  1.4 TiB  23.50  0.98  18  up
15  hdd  1.81940  1.0  1.8 TiB  334 GiB  333 GiB   621 KiB  929 MiB  1.5 TiB  17.92  0.75  15  up
17  hdd  1.81940  1.0  1.8 TiB  310 GiB  309 GiB     2 KiB  807 MiB  1.5 TiB  16.64  0.70  20  up
19  hdd  1.81940  1.0  1.8 TiB  433 GiB  432 GiB     7 KiB  974 MiB  1.4 TiB  23.23  0.97  25  up
45  hdd  1.81940  1.0  1.8 TiB  169 GiB  169 GiB     2 KiB  615 MiB  1.7 TiB   9.09  0.38  18  up
 0  hdd  1.81940  1.0  1.8 TiB  582 GiB  580 GiB   295 KiB  1.7 GiB  1.3 TiB  31.24  1.31  21  up
 2  hdd  1.81940  1.0  1.8 TiB  870 MiB   21 MiB   112 KiB  849 MiB  1.8 TiB   0.05  0.00  14  up
 4  hdd  1.81940  1.0  1.8 TiB  326 GiB  325 GiB    14 KiB  947 MiB  1.5 TiB  17.48  0.73  24  up
 6  hdd  1.81940  1.0  1.8 TiB  450 GiB  448 GiB     1 KiB  1.4 GiB  1.4 TiB  24.13  1.01  17  up
 8  hdd  1.81940  1.0  1.8 TiB  152 GiB  152 GiB   618 KiB  900 MiB  1.7 TiB   8.18  0.34  20  up
10  hdd  1.81940  1.0  1.8 TiB  609 GiB  607 GiB     4 KiB  1.7 GiB  1.2 TiB  32.67  1.37  25  up
12  hdd  1.81940  1.0  1.8 TiB  333 GiB  332 GiB   175 KiB  1.5 GiB  1.5 TiB  17.89  0.75  24  up
14  hdd  1.81940  1.0  1.8 TiB  1.0 TiB  1.0 TiB     1 KiB  2.2 GiB  834 GiB  55.24  2.31  17  up
16  hdd  1.81940  1.0  1.8 TiB  168 GiB  167 GiB     4 KiB  1.2 GiB  1.7 TiB   9.03  0.38  15  up
18  hdd  1.81940  1.0  1.8 TiB  299 GiB  298 GiB   261 KiB  1.6 GiB  1.5 TiB  16.07  0.67  15  up
32  hdd  1.81940  1.0  1.8 TiB  873 GiB  871 GiB    45 KiB  2.3 GiB  990 GiB  46.88  1.96  18  up
22  hdd  1.81940  1.0  1.8 TiB  449 GiB  447 GiB   139 KiB  1.6 GiB  1.4 TiB  24.10  1.01  22  up
23  hdd  1.81940  1.0  1.8 TiB  299 GiB  298 GiB     5 KiB  1.6 GiB  1.5 TiB  16.06  0.67  20  up
24  hdd  1.81940  1.0  1.8 TiB  887 GiB  885 GiB     8 KiB  2.4 GiB  976 GiB  47.62  1.99  23  up
25  hdd  1.81940  1.0  1.8 TiB  451 GiB  449 GiB     4 KiB  1.6 GiB  1.4 TiB  24.20  1.01  17  up
26  hdd  1.81940  1.0  1.8 TiB  602 GiB  600 GiB   373 KiB  2.0 GiB  1.2 TiB  32.29  1.35  21  up
27  hdd  1.81940  1.0  1.8 TiB  152 GiB  151 GiB   1.5 MiB  564 MiB  1.7 TiB   8.14  0.34  14  up
28  hdd  1.81940  1.0  1.8 TiB  330 GiB  328 GiB     7 KiB  1.6 GiB  1.5 TiB  17.70  0.74  12  up
29  hdd  1.81940  1.0  1.8 TiB  726 GiB  723 GiB     7 KiB  2.1 GiB  1.1 TiB  38.94  1.63  16  up
30  hdd  1.81940  1.0  1.8 TiB  596 GiB  594 GiB   173 KiB  2.0 GiB  1.2 TiB  32.01  1.34  19  up
31  hdd  1.81940  1.0  1.8 TiB  304 GiB  303 GiB     4 KiB  1.6 GiB  1.5 TiB  16.34  0.68  20  up
44  hdd  1.81940  1.0  1.8 TiB  150 GiB  149 GiB       0 B  599 MiB  1.7 TiB   8.03  0.34  12  up
33  hdd  1.81940  1.0  1.8 TiB  451 GiB  449 GiB   462 KiB  1.8 GiB  1.4 TiB  24.22  1.01  19  up
34  hdd  1.81940  1.0  1.8 TiB  449 GiB  448 GiB     2 KiB  966 MiB  1.4 TiB  24.12  1.01  21  up
35  hdd  1.81940  1.0  1.8 TiB  458 GiB  457 GiB     2 KiB  1.5 GiB  1.4 TiB  24.60  1.03  23  up
36  hdd  1.81940  1.0  1.8 TiB  872 GiB  870 GiB     3 KiB  2.4 GiB  991 GiB  46.81  1.96  22  up
37  hdd  1.81940  1.0  1.8 TiB  443 GiB  441 GiB   136 KiB
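
Side note for the archive, following up on Anthony's autoscaler comment above: turning it off and sizing PGs by hand boils down to something like the following per pool (pool name and pg_num are placeholders, not values from this cluster):

ceph osd pool set <pool> pg_autoscale_mode off   # stop the autoscaler from resizing this pool
ceph osd pool set <pool> pg_num 256              # then raise pg_num manually to a suitable power of two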

[ceph-users] Re: Ceph recovery network speed

2022-06-24 Thread Curt
Nope, the majority of reads/writes happen at night, so it's doing less than 1
MiB/s of client I/O right now, sometimes 0.
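
For reference, a quick way to watch that per pool is `ceph osd pool stats`, which prints client and recovery I/O side by side, roughly like this (figures illustrative, not from this cluster):

pool <poolname> id 12
  client io 0 B/s rd, 0 op/s rd, 0 op/s wr
  recovery io 3.4 MiB/s, 1 objects/s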

On Fri, Jun 24, 2022, 22:23 Stefan Kooman  wrote:

> On 6/24/22 20:09, Curt wrote:
> >
> >
> > On Fri, Jun 24, 2022 at 10:00 PM Stefan Kooman  > > wrote:
> >
> > On 6/24/22 19:49, Curt wrote:
> >  > Pool 12 is my erasure coding pool, 2+2.  How can I tell if it's
> >  > objects or keys recovering?
> >
> > ceph -s will tell you what type of recovery is going on.
> >
> > Is it a cephfs metadata pool? Or a rgw index pool?
> >
> > Gr. Stefan
> >
> >
> > Object recovery. I guess I'm used to it always showing objects, so I didn't
> > know it could be keys.
> >
> > rbd pool.
>
> recovery has lower priority than client IO. Is the cluster busy?
>
> Gr. Stefan
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph recovery network speed

2022-06-24 Thread Curt
> You wrote 2TB before, are they 2TB or 18TB?  Is that 273 PGs total or per
osd?
Sorry, 18TB of data and 273 PGs total.

> `ceph osd df` will show you toward the right how many PGs are on each
OSD.  If you have multiple pools, some PGs will have more data than others.
>  So take an average # of PGs per OSD and divide the actual HDD capacity
by that.
20 PGs on average / 2 TB (technically 1.8, I guess), which would be 10.  Shouldn't
that be based on used space though, not capacity?  My usage is only 23% of
capacity.  I thought Ceph's PG autoscaling changed the PG size dynamically
according to usage?  I'm guessing I'm misunderstanding that part?

Thanks,
Curt
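
P.S. A rough back-of-the-envelope with the numbers above, in case it helps (approximate, and ignoring EC/replication overhead): ~20 PGs per 1.8 TiB OSD is about 1.8 TiB / 20, i.e. roughly 90 GiB of raw capacity per PG, and at ~23% utilisation on the order of 20 GiB of data per PG on each OSD. The autoscaler's own view can be dumped with:

ceph osd pool autoscale-status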

On Fri, Jun 24, 2022 at 9:48 PM Anthony D'Atri 
wrote:

>
> > Yes, SATA, I think my benchmark put it around 125, but that was a year
> ago, so could be misremembering
>
> A FIO benchmark, especially a sequential one on an empty drive, can
> mislead as to the real-world performance one sees on a fragmented drive.
>
> >  273 pg at 18TB so each PG would be 60G.
>
> You wrote 2TB before, are they 2TB or 18TB?  Is that 273 PGs total or per
> osd?
>
> >  Mainly used for RBD, using erasure coding.  cephadm bootstrap with
> docker images.
>
> Ack.  Have to account for replication.
>
> `ceph osd df` will show you toward the right how many PGs are on each
> OSD.  If you have multiple pools, some PGs will have more data than others.
>
> So take an average # of PGs per OSD and divide the actual HDD capacity by
> that.
>
>
>
>
> >
> > On Fri, Jun 24, 2022 at 9:21 PM Anthony D'Atri 
> wrote:
> >
> >
> > >
> > > 2 PG's shouldn't take hours to backfill in my opinion.  Just 2TB
> enterprise HD's.
> >
> > SATA? Figure they can write at 70 MB/s
> >
> > How big are your PGs?  What is your cluster used for?  RBD? RGW? CephFS?
> >
> > >
> > > Take this log entry below, 72 minutes and still backfilling
> undersized?  Should it be that slow?
> > >
> > > pg 12.15 is stuck undersized for 72m, current state
> active+undersized+degraded+remapped+backfilling, last acting [34,10,29,NONE]
> > >
> > > Thanks,
> > > Curt
> > >
> > >
> > > On Fri, Jun 24, 2022 at 8:53 PM Anthony D'Atri <
> anthony.da...@gmail.com> wrote:
> > > Your recovery is slow *because* there are only 2 PGs backfilling.
> > >
> > > What kind of OSD media are you using?
> > >
> > > > On Jun 24, 2022, at 09:46, Curt  wrote:
> > > >
> > > > Hello,
> > > >
> > > > I'm trying to understand why my recovery is so slow with only 2 pg
> > > > backfilling.  I'm only getting speeds of 3-4/MiB/s on a 10G
> network.  I
> > > > have tested the speed between machines with a few tools and all
> confirm 10G
> > > > speed.  I've tried changing various settings of priority and
> recovery sleep
> > > > hdd, but still the same. Is this a configuration issue or something
> else?
> > > >
> > > > It's just a small cluster right now with 4 hosts, 11 osd's per.
> Please let
> > > > me know if you need more information.
> > > >
> > > > Thanks,
> > > > Curt
> > > > ___
> > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> >
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph recovery network speed

2022-06-24 Thread Curt
On Fri, Jun 24, 2022 at 10:00 PM Stefan Kooman  wrote:

> On 6/24/22 19:49, Curt wrote:
> > Pool 12 is my erasure coding pool, 2+2.  How can I tell if it's
> > objects or keys recovering?
>
> ceph -s will tell you what type of recovery is going on.
>
> Is it a cephfs metadata pool? Or a rgw index pool?
>
> Gr. Stefan
>

Object recovery. I guess I'm used to it always showing objects, so I didn't
know it could be keys.

rbd pool.
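
(For reference: the io section of `ceph -s` is where the two show up. An object recovery reports something along the lines of

  io:
    recovery: 3.4 MiB/s, 1 objects/s

while an OMAP-heavy recovery also reports keys/s. The figures here are illustrative, not taken from this cluster.)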
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph recovery network speed

2022-06-24 Thread Curt
Pool 12 is my erasure coding pool, 2+2.  How can I tell if it's objects
or keys recovering?

Thanks,
Curt

On Fri, Jun 24, 2022 at 9:39 PM Stefan Kooman  wrote:

> On 6/24/22 19:04, Curt wrote:
> > 2 PG's shouldn't take hours to backfill in my opinion.  Just 2TB
> enterprise
> > HD's.
> >
> > Take this log entry below, 72 minutes and still backfilling undersized?
> > Should it be that slow?
> >
> > pg 12.15 is stuck undersized for 72m, current state
> > active+undersized+degraded+remapped+backfilling, last acting
> [34,10,29,NONE]
>
> What is in that pool 12? Is it objects that are recovering, or keys?
> OMAP data (keys) is slow.
>
> Gr. Stefan
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph recovery network speed

2022-06-24 Thread Curt
2 PGs shouldn't take hours to backfill, in my opinion. They're just 2 TB enterprise
HDDs.

Take this log entry below, 72 minutes and still backfilling undersized?
Should it be that slow?

pg 12.15 is stuck undersized for 72m, current state
active+undersized+degraded+remapped+backfilling, last acting [34,10,29,NONE]

Thanks,
Curt


On Fri, Jun 24, 2022 at 8:53 PM Anthony D'Atri 
wrote:

> Your recovery is slow *because* there are only 2 PGs backfilling.
>
> What kind of OSD media are you using?
>
> > On Jun 24, 2022, at 09:46, Curt  wrote:
> >
> > Hello,
> >
> > I'm trying to understand why my recovery is so slow with only 2 pg
> > backfilling.  I'm only getting speeds of 3-4/MiB/s on a 10G network.  I
> > have tested the speed between machines with a few tools and all confirm
> 10G
> > speed.  I've tried changing various settings of priority and recovery
> sleep
> > hdd, but still the same. Is this a configuration issue or something else?
> >
> > It's just a small cluster right now with 4 hosts, 11 osd's per.  Please
> let
> > me know if you need more information.
> >
> > Thanks,
> > Curt
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph recovery network speed

2022-06-24 Thread Curt
Hello,

I'm trying to understand why my recovery is so slow with only 2 PGs
backfilling.  I'm only getting speeds of 3-4 MiB/s on a 10G network.  I
have tested the speed between machines with a few tools and all confirm 10G
speed.  I've tried changing various settings of priority and recovery sleep
hdd, but still the same. Is this a configuration issue or something else?

It's just a small cluster right now with 4 hosts, 11 OSDs per host.  Please let
me know if you need more information.

Thanks,
Curt
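
P.S. The knobs usually involved here are along these lines (values purely illustrative; option names assume a Pacific-or-later cluster, and Quincy's mclock scheduler overrides some of them):

ceph config set osd osd_max_backfills 4            # default is 1
ceph config set osd osd_recovery_max_active_hdd 8  # default is 3
ceph config set osd osd_recovery_sleep_hdd 0.05    # default is 0.1 s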
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to remove TELEMETRY_CHANGED( Telemetry requires re-opt-in) message

2022-06-24 Thread Robert Sander

Am 24.06.22 um 16:44 schrieb Matthew Darwin:
Not sure.  Long enough to try the command and write this email, so at 
least 10 minutes.


I had that too today after upgrading my test cluster.

I just ran "ceph telemetry off" and "ceph telemetry on" and the message 
was gone.
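
In command form (the --license variant is the one from Matthew's original post, the status check is just an extra sanity check):

ceph telemetry off
ceph telemetry on --license sharing-1-0
ceph telemetry status    # confirm it is enabled again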


Regards
--
Robert Sander
Heinlein Consulting GmbH
Schwedter Str. 8/9b, 10119 Berlin

http://www.heinlein-support.de

Tel: 030 / 405051-43
Fax: 030 / 405051-19

Mandatory disclosures per §35a GmbHG:
HRB 220009 B / Amtsgericht Berlin-Charlottenburg,
Managing director: Peer Heinlein -- Registered office: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to remove TELEMETRY_CHANGED( Telemetry requires re-opt-in) message

2022-06-24 Thread Matthew Darwin
Not sure.  Long enough to try the command and write this email, so at 
least 10 minutes.


I expected it to disappear after 30 seconds or so.

On 2022-06-24 10:34, Laura Flores wrote:

Hi Matthew,

About how long did the warning stay up after you ran the `ceph 
telemetry on` command?


- Laura

On Fri, Jun 24, 2022 at 9:03 AM Yaarit Hatuka  
wrote:


We added a new collection in 17.2.1 to indicate Rook
deployments, since we
want to understand its volume in the wild, thus the module asks for
re-opting-in.

On Fri, Jun 24, 2022 at 9:52 AM Matthew Darwin 
wrote:

> Thanks Yaarit,
>
> The cluster I was using is just a test cluster with a few OSD and
> almost no data. Not sure why I have to re-opt in upgrading
from 17.2.0
> to 17.2.1
>
> On 2022-06-24 09:41, Yaarit Hatuka wrote:
> > Hi Matthew,
> >
> > Thanks for your update. How big is the cluster?
> >
> > Thanks for opting-in to telemetry!
> > Yaarit
> >
> >
> > On Thu, Jun 23, 2022 at 11:53 PM Matthew Darwin
 wrote:
> >
> >     Sorry. Eventually it goes away.  Just slower than I was
expecting.
> >
> >     On 2022-06-23 23:42, Matthew Darwin wrote:
> >     >
> >     > I just updated quincy from 17.2.0 to 17.2.1.  Ceph
status reports
> >     > "Telemetry requires re-opt-in". I then run
> >     >
> >     > $ ceph telemetry on
> >     >
> >     > $ ceph telemetry on --license sharing-1-0
> >     >
> >     > Still the message "TELEMETRY_CHANGED( Telemetry requires
> >     re-opt-in)
> >     > message" remains in the log.
> >     >
> >     > Any ideas how to get rid of this warning? There was no
warning
> >     > before the upgrade.
> >     >
> >     ___
> >     ceph-users mailing list -- ceph-users@ceph.io
> >     To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



--

Laura Flores

She/Her/Hers

Associate Software Engineer, Ceph Storage

Red Hat Inc. 

La Grange Park, IL

lflo...@redhat.com
M: +17087388804 







___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to remove TELEMETRY_CHANGED( Telemetry requires re-opt-in) message

2022-06-24 Thread Laura Flores
Hi Matthew,

About how long did the warning stay up after you ran the `ceph telemetry
on` command?

- Laura

On Fri, Jun 24, 2022 at 9:03 AM Yaarit Hatuka  wrote:

> We added a new collection in 17.2.1 to indicate Rook deployments, since we
> want to understand its volume in the wild, thus the module asks for
> re-opting-in.
>
> On Fri, Jun 24, 2022 at 9:52 AM Matthew Darwin  wrote:
>
> > Thanks Yaarit,
> >
> > The cluster I was using is just a test cluster with a few OSD and
> > almost no data. Not sure why I have to re-opt in upgrading from 17.2.0
> > to 17.2.1
> >
> > On 2022-06-24 09:41, Yaarit Hatuka wrote:
> > > Hi Matthew,
> > >
> > > Thanks for your update. How big is the cluster?
> > >
> > > Thanks for opting-in to telemetry!
> > > Yaarit
> > >
> > >
> > > On Thu, Jun 23, 2022 at 11:53 PM Matthew Darwin 
> wrote:
> > >
> > > Sorry. Eventually it goes away.  Just slower than I was expecting.
> > >
> > > On 2022-06-23 23:42, Matthew Darwin wrote:
> > > >
> > > > I just updated quincy from 17.2.0 to 17.2.1.  Ceph status reports
> > > > "Telemetry requires re-opt-in". I then run
> > > >
> > > > $ ceph telemetry on
> > > >
> > > > $ ceph telemetry on --license sharing-1-0
> > > >
> > > > Still the message "TELEMETRY_CHANGED( Telemetry requires
> > > re-opt-in)
> > > > message" remains in the log.
> > > >
> > > > Any ideas how to get rid of this warning?  There was no warning
> > > > before the upgrade.
> > > >
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 

Laura Flores

She/Her/Hers

Associate Software Engineer, Ceph Storage

Red Hat Inc. 

La Grange Park, IL

lflo...@redhat.com
M: +17087388804


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: librbd leaks memory on crushmap updates

2022-06-24 Thread Peter Lieven
On 23.06.22 at 12:59, Ilya Dryomov wrote:
> On Thu, Jun 23, 2022 at 11:32 AM Peter Lieven  wrote:
>> On 22.06.22 at 15:46, Josh Baergen wrote:
>>> Hey Peter,
>>>
 I found relatively large allocations in the qemu smaps and checked the 
 contents. It contained several hundred repetitions of osd and pool names. 
 We use the default builds on Ubuntu 20.04. Is there a special memory 
 allocator in place that might not clean up properly?
>>> I'm sure you would have noticed this and mentioned it if it was so -
>>> any chance the contents of these regions look like log messages of
>>> some kind? I recently tracked down a high client memory usage that
>>> looked like a leak that turned out to be a broken config option
>>> resulting in higher in-memory log retention:
>>> https://tracker.ceph.com/issues/56093. AFAICT it affects Nautilus+.
>>
>> Hi Josh, hi Ilya,
>>
>>
>> it seems we were in fact facing 2 leaks with 14.x. Our long running VMs with 
>> librbd 14.x have several million items in the osdmap mempool.
>>
>> In our testing environment with 15.x I see no unlimited increase in the 
>> osdmap mempool (compared this to a second dev host with 14.x client where I 
>> see the increase wiht my tests),
>>
>> but I still see leaking memory when I generate a lot of osdmap changes, but 
>> this in fact seem to be log messages - thanks Josh.
>>
>>
>> So I would appreciate if #56093 would be backported to Octopus before its 
>> final release.
> I picked up Josh's PR that was sitting there unnoticed but I'm not sure
> it is the issue you are hitting.  I think Josh's change just resurrects
> the behavior where clients stored only up to 500 log entries instead of
> up to 1 (the default for daemons).  There is no memory leak there,
> just a difference in how much memory is legitimately consumed.  The
> usage is bounded either way.
>
> However in your case, the usage is slowly but constantly growing.
> In the original post you said that it was observed both on 14.2.22 and
> 15.2.16.  Are you saying that you are no longer seeing it in 15.x?


After I understood what the background of Josh's issue is, I can confirm that I
still see increasing memory which is caused neither by osdmap items nor by log
entries. There must be something else going on.
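
In case it is useful to others chasing this: assuming the VM's librbd client has an admin socket configured, the per-client mempools (including osdmap items) can be inspected with something like

ceph daemon /var/run/ceph/ceph-client.<id>.<pid>.<cctid>.asok dump_mempools

where the socket path depends on the local admin_socket setting.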


Peter



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to remove TELEMETRY_CHANGED( Telemetry requires re-opt-in) message

2022-06-24 Thread Yaarit Hatuka
We added a new collection in 17.2.1 to indicate Rook deployments, since we
want to understand their volume in the wild, so the module asks for a
re-opt-in.
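
For anyone who wants to review what the re-opt-in would share, the prospective report (including the new Rook fields) can be printed first with:

ceph telemetry show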

On Fri, Jun 24, 2022 at 9:52 AM Matthew Darwin  wrote:

> Thanks Yaarit,
>
> The cluster I was using is just a test cluster with a few OSD and
> almost no data. Not sure why I have to re-opt in upgrading from 17.2.0
> to 17.2.1
>
> On 2022-06-24 09:41, Yaarit Hatuka wrote:
> > Hi Matthew,
> >
> > Thanks for your update. How big is the cluster?
> >
> > Thanks for opting-in to telemetry!
> > Yaarit
> >
> >
> > On Thu, Jun 23, 2022 at 11:53 PM Matthew Darwin  wrote:
> >
> > Sorry. Eventually it goes away.  Just slower than I was expecting.
> >
> > On 2022-06-23 23:42, Matthew Darwin wrote:
> > >
> > > I just updated quincy from 17.2.0 to 17.2.1.  Ceph status reports
> > > "Telemetry requires re-opt-in". I then run
> > >
> > > $ ceph telemetry on
> > >
> > > $ ceph telemetry on --license sharing-1-0
> > >
> > > Still the message "TELEMETRY_CHANGED( Telemetry requires
> > re-opt-in)
> > > message" remains in the log.
> > >
> > > Any ideas how to get rid of this warning?  There was no warning
> > > before the upgrade.
> > >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to remove TELEMETRY_CHANGED( Telemetry requires re-opt-in) message

2022-06-24 Thread Matthew Darwin

Thanks Yaarit,

The cluster I was using is just a test cluster with a few OSD and 
almost no data. Not sure why I have to re-opt in upgrading from 17.2.0 
to 17.2.1


On 2022-06-24 09:41, Yaarit Hatuka wrote:

Hi Matthew,

Thanks for your update. How big is the cluster?

Thanks for opting-in to telemetry!
Yaarit


On Thu, Jun 23, 2022 at 11:53 PM Matthew Darwin  wrote:

Sorry. Eventually it goes away.  Just slower than I was expecting.

On 2022-06-23 23:42, Matthew Darwin wrote:
>
> I just updated quincy from 17.2.0 to 17.2.1.  Ceph status reports
> "Telemetry requires re-opt-in". I then run
>
> $ ceph telemetry on
>
> $ ceph telemetry on --license sharing-1-0
>
> Still the message "TELEMETRY_CHANGED( Telemetry requires
re-opt-in)
> message" remains in the log.
>
> Any ideas how to get rid of this warning?  There was no warning
> before the upgrade.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to remove TELEMETRY_CHANGED( Telemetry requires re-opt-in) message

2022-06-24 Thread Yaarit Hatuka
Hi Matthew,

Thanks for your update. How big is the cluster?

Thanks for opting-in to telemetry!
Yaarit


On Thu, Jun 23, 2022 at 11:53 PM Matthew Darwin  wrote:

> Sorry. Eventually it goes away.  Just slower than I was expecting.
>
> On 2022-06-23 23:42, Matthew Darwin wrote:
> >
> > I just updated quincy from 17.2.0 to 17.2.1.  Ceph status reports
> > "Telemetry requires re-opt-in". I then run
> >
> > $ ceph telemetry on
> >
> > $ ceph telemetry on --license sharing-1-0
> >
> > Still the message "TELEMETRY_CHANGED( Telemetry requires re-opt-in)
> > message" remains in the log.
> >
> > Any ideas how to get rid of this warning?  There was no warning
> > before the upgrade.
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Inconsistent PGs after upgrade to Pacific

2022-06-24 Thread Dan van der Ster
Hi,

From what I can tell, the ceph osd pool command is indeed the same as
rados mksnap.

But bizarrely I just created a new snapshot, changed max_mds, then
removed the snap -- this time I can't manage to "fix" the
inconsistency.
It may be that my first test was so simple (no client IO, no fs
snapshots) that removing the snap fixed it.

In this case, the inconsistent object appears to be an old version of
mds0_openfiles.0

# rados list-inconsistent-obj 3.6 | jq .
{
  "epoch": 7754,
  "inconsistents": [
{
  "object": {
"name": "mds0_openfiles.0",
"nspace": "",
"locator": "",
"snap": 3,
"version": 2467
  },

I tried modifying the current version of that with setomapval, but the
object stays inconsistent.
I even removed it from the pool (head version) and somehow that old
snapshotted version remains with the wrong checksum even though the
snap exists.

# rados rm -p cephfs.cephfs.meta mds0_openfiles.0
#

# ceph pg ls inconsistent
PG   OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES OMAP_BYTES*
OMAP_KEYS*  LOG  STATE  SINCE  VERSIONREPORTED
   UP ACTING SCRUB_STAMP
DEEP_SCRUB_STAMP
3.6   13 0  00  209715200
 0   41  active+clean+inconsistent 2m  7852'2479  7852:12048
[0,3,2]p0  [0,3,2]p0  2022-06-24T11:31:05.605434+0200
2022-06-24T11:31:05.605434+0200

# rados lssnap -p cephfs.cephfs.meta
0 snaps

This is getting super weird (I can list the object but not stat it):

# rados ls -p cephfs.cephfs.meta | grep open
mds1_openfiles.0
mds3_openfiles.0
mds0_openfiles.0
mds2_openfiles.0

# rados stat -p cephfs.cephfs.meta mds0_openfiles.0
 error stat-ing cephfs.cephfs.meta/mds0_openfiles.0: (2) No such file
or directory

I then failed over the mds to a standby so mds0_openfiles.0 exists
again, but the PG remains inconsistent with that old version of the
object.

I will add this to the tracker.

Clearly the objects are not all trimmed correctly when the pool
snapshot is removed.
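
A related check, for whoever picks this up (the removed_snaps_queue field only appears on recent releases and when it is non-empty):

ceph osd dump | grep cephfs.cephfs.meta

which should show whether the deleted pool snap is still queued for trimming.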

-- dan



On Fri, Jun 24, 2022 at 11:10 AM Pascal Ehlert  wrote:
>
> Hi Dan,
>
> Just a quick addition here:
>
> I have not used the rados command to create the snapshot but "ceph osd
> pool mksnap $POOL $SNAPNAME" - which I think is the same internally?
>
> And yes, our CephFS has numerous snapshots itself for backup purposes.
>
>
> Cheers,
> Pascal
>
>
>
> Dan van der Ster wrote on 24.06.22 11:06:
> > Hi Pascal,
> >
> > I'm not sure why you don't see that snap, and I'm also not sure if you
> > can just delete the objects directly.
> > BTW, does your CephFS have snapshots itself (e.g. create via mkdir
> > .snap/foobar)?
> >
> > Cheers, Dan
> >
> > On Fri, Jun 24, 2022 at 10:34 AM Pascal Ehlert  wrote:
> >> Hi Dan,
> >>
> >> Thank you so much for going through the effort of reproducing this!
> >> I was just about to plan how to bring up a test cluster but it would've
> >> taken me much longer.
> >>
> >> While I totally assume this is the root cause for our issues, there is
> >> one small difference.
> >> rados lssnap does not list any snapshots for me:
> >>
> >> root@srv01:~# rados lssnap -p kubernetes_cephfs_metadata
> >> 0 snaps
> >>
> >> I do definitely recall having made a snapshot and apparently there are
> >> snapshot objects present in the pool.
> >> Not sure how the reference seemingly got lost.
> >>
> >> Do you have any ideas how I could anyway remove the broken snapshot 
> >> objects?
> >>
> >>
> >> Cheers,
> >>
> >> Pascal
> >>
> >>
> >> Dan van der Ster wrote on 24.06.22 09:27:
> >>> Hi,
> >>>
> >>> It's trivial to reproduce. Running 16.2.9 with max_mds=2, take a pool
> >>> snapshot of the meta pool, then decrease to max_mds=1, then deep scrub
> >>> each meta pg.
> >>>
> >>> In my test I could list and remove the pool snap, then deep-scrub
> >>> again cleared the inconsistencies.
> >>>
> >>> https://tracker.ceph.com/issues/56386
> >>>
> >>> Cheers, Dan
> >>>
> >>> On Fri, Jun 24, 2022 at 8:41 AM Ansgar Jazdzewski
> >>>  wrote:
>  Hi,
> 
>  I would say yes but it would be nice if other people can confirm it too.
> 
>  also can you create a test cluster and do the same tasks
>  * create it with octopus
>  * create snapshot
>  * reduce rank to 1
>  * upgrade to pacific
> 
>  and then try to fix the PG, assuming that you will have the same
>  issues in your test-cluster,
> 
>  cheers,
>  Ansgar
> 
>  Am Do., 23. Juni 2022 um 22:12 Uhr schrieb Pascal Ehlert 
>  :
> > Hi,
> >
> > I have now tried to "ceph osd pool rmsnap $POOL beforefixes" and it 
> > says the snapshot could not be found although I have definitely run 
> > "ceph osd pool mksnap $POOL beforefixes" about three weeks ago.
> > When running rados list-inconsistent-obj $PG on one of the affected 
> > PGs, all of the objects returned have "snap" set to 1:
> >
> > root@srv01:~# for i in $(rados list-inconsistent-pg $POOL | jq -er 
> > 

[ceph-users] Re: Inconsistent PGs after upgrade to Pacific [EXT]

2022-06-24 Thread Pascal Ehlert

Hi Dave,

We have checked the hardware and it seems fine.
The same OSDs host numerous other PGs which are unaffected by this issue.

All of the PGs reported as inconsistent/failed_repair belong to the
same metadata pool.
We did run a `ceph pg repair` on them initially, which is when the "Too many
repaired reads" error popped up, I think.


Cheers,
Pascal

Dave Holland wrote on 24.06.22 11:25:

Hi,

I can't comment on the CephFS side but "Too many repaired reads on 2
OSDs" makes me suggest you check the hardware -- when I've seen that
recently it was due to failing HDDs. I say "failing" not "failed"
because the disks were giving errors on a few sectors but most I/O was
working OK, so neither Linux nor Ceph ejected the disk; and repeated
PG repair attempts were unsuccessful.

Dave


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Inconsistent PGs after upgrade to Pacific [EXT]

2022-06-24 Thread Dave Holland
Hi,

I can't comment on the CephFS side but "Too many repaired reads on 2
OSDs" makes me suggest you check the hardware -- when I've seen that
recently it was due to failing HDDs. I say "failing" not "failed"
because the disks were giving errors on a few sectors but most I/O was
working OK, so neither Linux nor Ceph ejected the disk; and repeated
PG repair attempts were unsuccessful.

Dave
-- 
**   Dave Holland   ** Systems Support -- Informatics Systems Group **
** d...@sanger.ac.uk **Wellcome Sanger Institute, Hinxton, UK**


On Wed, Jun 22, 2022 at 07:41:12PM +0200, Pascal Ehlert wrote:
> Hi all,
> 
> I am currently battling inconsistent PGs after a far-reaching mistake during
> the upgrade from Octopus to Pacific.
> While otherwise following the guide, I restarted the Ceph MDS daemons (and
> this started the Pacific daemons) without previously reducing the ranks to 1
> (from 2).
> 
> This resulted in daemons not coming up and reporting inconsistencies.
> After later reducing the ranks and bringing the MDS back up (I did not
> record every step as this was an emergency situation), we started seeing
> health errors on every scrub.
> 
> Now after three weeks, while our CephFS is still working fine and we haven't
> noticed any data damage, we realized that every single PG of the cephfs
> metadata pool is affected.
> Below you can find some information on the actual status and a detailed
> inspection of one of the affected pgs. I am happy to provide any other
> information that could be useful of course.
> 
> A repair of the affected PGs does not resolve the issue.
> Does anyone else here have an idea what we could try apart from copying all
> the data to a new CephFS pool?
> 
> 
> 
> Thank you!
> 
> Pascal
> 
> 
> 
> 
> root@srv02:~# ceph status
>   cluster:
>     id: f0d6d4d0-8c17-471a-9f95-ebc80f1fee78
>     health: HEALTH_ERR
>     insufficient standby MDS daemons available
>     69262 scrub errors
>     Too many repaired reads on 2 OSDs
>     Possible data damage: 64 pgs inconsistent
> 
>   services:
>     mon: 3 daemons, quorum srv02,srv03,srv01 (age 3w)
>     mgr: srv03(active, since 3w), standbys: srv01, srv02
>     mds: 2/2 daemons up, 1 hot standby
>     osd: 44 osds: 44 up (since 3w), 44 in (since 10M)
> 
>   data:
>     volumes: 2/2 healthy
>     pools:   13 pools, 1217 pgs
>     objects: 75.72M objects, 26 TiB
>     usage:   80 TiB used, 42 TiB / 122 TiB avail
>     pgs: 1153 active+clean
>  55   active+clean+inconsistent
>  9    active+clean+inconsistent+failed_repair
> 
>   io:
>     client:   2.0 MiB/s rd, 21 MiB/s wr, 240 op/s rd, 1.75k op/s wr
> 
> 
> {
>   "epoch": 4962617,
>   "inconsistents": [
>     {
>   "object": {
>     "name": "100cc8e.",
>     "nspace": "",
>     "locator": "",
>     "snap": 1,
>     "version": 4253817
>   },
>   "errors": [],
>   "union_shard_errors": [
>     "omap_digest_mismatch_info"
>   ],
>   "selected_object_info": {
>     "oid": {
>   "oid": "100cc8e.",
>   "key": "",
>   "snapid": 1,
>   "hash": 1369745244,
>   "max": 0,
>   "pool": 7,
>   "namespace": ""
>     },
>     "version": "4962847'6209730",
>     "prior_version": "3916665'4306116",
>     "last_reqid": "osd.27.0:757107407",
>     "user_version": 4253817,
>     "size": 0,
>     "mtime": "2022-02-26T12:56:55.612420+0100",
>     "local_mtime": "2022-02-26T12:56:55.614429+0100",
>     "lost": 0,
>     "flags": [
>   "dirty",
>   "omap",
>   "data_digest",
>   "omap_digest"
>     ],
>     "truncate_seq": 0,
>     "truncate_size": 0,
>     "data_digest": "0x",
>     "omap_digest": "0xe5211a9e",
>     "expected_object_size": 0,
>     "expected_write_size": 0,
>     "alloc_hint_flags": 0,
>     "manifest": {
>   "type": 0
>     },
>     "watchers": {}
>   },
>   "shards": [
>     {
>   "osd": 20,
>   "primary": false,
>   "errors": [
>     "omap_digest_mismatch_info"
>   ],
>   "size": 0,
>   "omap_digest": "0x",
>   "data_digest": "0x"
>     },
>     {
>   "osd": 27,
>   "primary": true,
>   "errors": [
>     "omap_digest_mismatch_info"
>   ],
>   "size": 0,
>   "omap_digest": "0x",
>   "data_digest": "0x"
>     },
>     {
>   "osd": 43,
>   "primary": false,
>   "errors": [
>     "omap_digest_mismatch_info"
>   ],
>   "size": 0,
>   "omap_digest": "0x",
>   "data_digest": "0x"
>     }
>   ]
>     },
> 
> 
> 
> 
> ___
> ceph-users mailing list -- 

[ceph-users] Re: Inconsistent PGs after upgrade to Pacific

2022-06-24 Thread Pascal Ehlert

Hi Dan,

Just a quick addition here:

I have not used the rados command to create the snapshot but "ceph osd 
pool mksnap $POOL $SNAPNAME" - which I think is the same internally?


And yes, our CephFS has numerous snapshots itself for backup purposes.


Cheers,
Pascal



Dan van der Ster wrote on 24.06.22 11:06:

Hi Pascal,

I'm not sure why you don't see that snap, and I'm also not sure if you
can just delete the objects directly.
BTW, does your CephFS have snapshots itself (e.g. create via mkdir
.snap/foobar)?

Cheers, Dan

On Fri, Jun 24, 2022 at 10:34 AM Pascal Ehlert  wrote:

Hi Dan,

Thank you so much for going through the effort of reproducing this!
I was just about to plan how to bring up a test cluster but it would've
taken me much longer.

While I totally assume this is the root cause for our issues, there is
one small difference.
rados lssnap does not list any snapshots for me:

root@srv01:~# rados lssnap -p kubernetes_cephfs_metadata
0 snaps

I do definitely recall having made a snapshot and apparently there are
snapshot objects present in the pool.
Not sure how the reference seemingly got lost.

Do you have any ideas how I could anyway remove the broken snapshot objects?


Cheers,

Pascal


Dan van der Ster wrote on 24.06.22 09:27:

Hi,

It's trivial to reproduce. Running 16.2.9 with max_mds=2, take a pool
snapshot of the meta pool, then decrease to max_mds=1, then deep scrub
each meta pg.

In my test I could list and remove the pool snap, then deep-scrub
again cleared the inconsistencies.

https://tracker.ceph.com/issues/56386

Cheers, Dan

On Fri, Jun 24, 2022 at 8:41 AM Ansgar Jazdzewski
 wrote:

Hi,

I would say yes but it would be nice if other people can confirm it too.

also can you create a test cluster and do the same tasks
* create it with octopus
* create snapshot
* reduce rank to 1
* upgrade to pacific

and then try to fix the PG, assuming that you will have the same
issues in your test-cluster,

cheers,
Ansgar

Am Do., 23. Juni 2022 um 22:12 Uhr schrieb Pascal Ehlert :

Hi,

I have now tried to "ceph osd pool rmsnap $POOL beforefixes" and it says the snapshot 
could not be found although I have definitely run "ceph osd pool mksnap $POOL 
beforefixes" about three weeks ago.
When running rados list-inconsistent-obj $PG on one of the affected PGs, all of the 
objects returned have "snap" set to 1:

root@srv01:~# for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); do 
rados list-inconsistent-obj $i | jq -er .inconsistents[].object; done
[..]
{
"name": "200020744f4.",
"nspace": "",
"locator": "",
"snap": 1,
"version": 5704208
}
{
"name": "200021aeb16.",
"nspace": "",
"locator": "",
"snap": 1,
"version": 6189078
}
[..]

Running listsnaps on any of them then looks like this:

root@srv01:~# rados listsnaps 200020744f4. -p $POOL
200020744f4.:
cloneid  snaps  size  overlap
1        1      0     []
head     -      0


Is it save to assume that these objects belong to a somewhat broken snapshot 
and can be removed safely without causing further damage?


Thanks,

Pascal



Ansgar Jazdzewski wrote on 23.06.22 20:36:

Hi,

we could identify the rbd images that wehre affected and did an export before, 
but in the case of cephfs metadata i have no plan that will work.

can you try to delete the snapshot?
also if the filesystem can be shutdown? try to do a backup of the metadatapool

hope you will have some luck, let me know if I can help,
Ansgar

Pascal Ehlert  schrieb am Do., 23. Juni 2022, 16:45:

Hi Ansgar,

Thank you very much for the response.
Running your first command to obtain inconsistent objects, I retrieve a
total of 23114 only some of which are snaps.

You mentioning snapshots did remind me of the fact however that I
created a snapshot on the Ceph metadata pool via "ceph osd pool $POOL
mksnap" before I reduced the number of ranks.
Maybe that has causes the inconsistencies and would explain why the
actual file system appears unaffected?

Is there any way to validate that theory? I am a bit hesitant to just
run "rmsnap". Could that cause inconsistent data to be written back to
the actual objects?


Best regards,

Pascal



Ansgar Jazdzewski wrote on 23.06.22 16:11:

Hi Pascal,

We just had a similar situation on our RBD and had found some bad data
in RADOS here is How we did it:

for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); do rados
list-inconsistent-obj $i | jq -er .inconsistents[].object.name| awk
-F'.' '{print $2}'; done

we than found inconsistent snaps on the Object:

rados list-inconsistent-snapset $PG --format=json-pretty | jq
.inconsistents[].name

List the data on the OSD's (ceph pg map $PG)

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${OSD}/ --op
list ${OBJ} --pgid ${PG}

and finally remove the object, like:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-459/ --op
list rbd_data.762a94d768c04d.0036b7ac --pgid
2.704ceph-objectstore-tool --data-path 

[ceph-users] Re: Inconsistent PGs after upgrade to Pacific

2022-06-24 Thread Dan van der Ster
Hi Pascal,

I'm not sure why you don't see that snap, and I'm also not sure if you
can just delete the objects directly.
BTW, does your CephFS have snapshots itself (e.g. create via mkdir
.snap/foobar)?

Cheers, Dan

On Fri, Jun 24, 2022 at 10:34 AM Pascal Ehlert  wrote:
>
> Hi Dan,
>
> Thank you so much for going through the effort of reproducing this!
> I was just about to plan how to bring up a test cluster but it would've
> taken me much longer.
>
> While I totally assume this is the root cause for our issues, there is
> one small difference.
> rados lssnap does not list any snapshots for me:
>
> root@srv01:~# rados lssnap -p kubernetes_cephfs_metadata
> 0 snaps
>
> I do definitely recall having made a snapshot and apparently there are
> snapshot objects present in the pool.
> Not sure how the reference seemingly got lost.
>
> Do you have any ideas how I could anyway remove the broken snapshot objects?
>
>
> Cheers,
>
> Pascal
>
>
> Dan van der Ster wrote on 24.06.22 09:27:
> > Hi,
> >
> > It's trivial to reproduce. Running 16.2.9 with max_mds=2, take a pool
> > snapshot of the meta pool, then decrease to max_mds=1, then deep scrub
> > each meta pg.
> >
> > In my test I could list and remove the pool snap, then deep-scrub
> > again cleared the inconsistencies.
> >
> > https://tracker.ceph.com/issues/56386
> >
> > Cheers, Dan
> >
> > On Fri, Jun 24, 2022 at 8:41 AM Ansgar Jazdzewski
> >  wrote:
> >> Hi,
> >>
> >> I would say yes but it would be nice if other people can confirm it too.
> >>
> >> also can you create a test cluster and do the same tasks
> >> * create it with octopus
> >> * create snapshot
> >> * reduce rank to 1
> >> * upgrade to pacific
> >>
> >> and then try to fix the PG, assuming that you will have the same
> >> issues in your test-cluster,
> >>
> >> cheers,
> >> Ansgar
> >>
> >> Am Do., 23. Juni 2022 um 22:12 Uhr schrieb Pascal Ehlert 
> >> :
> >>> Hi,
> >>>
> >>> I have now tried to "ceph osd pool rmsnap $POOL beforefixes" and it says 
> >>> the snapshot could not be found although I have definitely run "ceph osd 
> >>> pool mksnap $POOL beforefixes" about three weeks ago.
> >>> When running rados list-inconsistent-obj $PG on one of the affected PGs, 
> >>> all of the objects returned have "snap" set to 1:
> >>>
> >>> root@srv01:~# for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); 
> >>> do rados list-inconsistent-obj $i | jq -er .inconsistents[].object; done
> >>> [..]
> >>> {
> >>>"name": "200020744f4.",
> >>>"nspace": "",
> >>>"locator": "",
> >>>"snap": 1,
> >>>"version": 5704208
> >>> }
> >>> {
> >>>"name": "200021aeb16.",
> >>>"nspace": "",
> >>>"locator": "",
> >>>"snap": 1,
> >>>"version": 6189078
> >>> }
> >>> [..]
> >>>
> >>> Running listsnaps on any of them then looks like this:
> >>>
> >>> root@srv01:~# rados listsnaps 200020744f4. -p $POOL
> >>> 200020744f4.:
> >>> cloneidsnapssizeoverlap
> >>> 110[]
> >>> head-0
> >>>
> >>>
> >>> Is it save to assume that these objects belong to a somewhat broken 
> >>> snapshot and can be removed safely without causing further damage?
> >>>
> >>>
> >>> Thanks,
> >>>
> >>> Pascal
> >>>
> >>>
> >>>
> >>> Ansgar Jazdzewski wrote on 23.06.22 20:36:
> >>>
> >>> Hi,
> >>>
> >>> we could identify the rbd images that wehre affected and did an export 
> >>> before, but in the case of cephfs metadata i have no plan that will work.
> >>>
> >>> can you try to delete the snapshot?
> >>> also if the filesystem can be shutdown? try to do a backup of the 
> >>> metadatapool
> >>>
> >>> hope you will have some luck, let me know if I can help,
> >>> Ansgar
> >>>
> >>> Pascal Ehlert  schrieb am Do., 23. Juni 2022, 16:45:
>  Hi Ansgar,
> 
>  Thank you very much for the response.
>  Running your first command to obtain inconsistent objects, I retrieve a
>  total of 23114 only some of which are snaps.
> 
>  You mentioning snapshots did remind me of the fact however that I
>  created a snapshot on the Ceph metadata pool via "ceph osd pool $POOL
>  mksnap" before I reduced the number of ranks.
>  Maybe that has causes the inconsistencies and would explain why the
>  actual file system appears unaffected?
> 
>  Is there any way to validate that theory? I am a bit hesitant to just
>  run "rmsnap". Could that cause inconsistent data to be written back to
>  the actual objects?
> 
> 
>  Best regards,
> 
>  Pascal
> 
> 
> 
>  Ansgar Jazdzewski wrote on 23.06.22 16:11:
> > Hi Pascal,
> >
> > We just had a similar situation on our RBD and had found some bad data
> > in RADOS here is How we did it:
> >
> > for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); do rados
> > list-inconsistent-obj $i | jq -er .inconsistents[].object.name| awk
> > -F'.' '{print $2}'; done
> >
> > we than found inconsistent snaps on the 

[ceph-users] Re: Inconsistent PGs after upgrade to Pacific

2022-06-24 Thread Pascal Ehlert

Hi Dan,

Thank you so much for going through the effort of reproducing this!
I was just about to plan how to bring up a test cluster but it would've 
taken me much longer.


While I totally assume this is the root cause for our issues, there is 
one small difference.

rados lssnap does not list any snapshots for me:

root@srv01:~# rados lssnap -p kubernetes_cephfs_metadata
0 snaps

I do definitely recall having made a snapshot and apparently there are 
snapshot objects present in the pool.

Not sure how the reference seemingly got lost.

Do you have any ideas how I could anyway remove the broken snapshot objects?


Cheers,

Pascal


Dan van der Ster wrote on 24.06.22 09:27:

Hi,

It's trivial to reproduce. Running 16.2.9 with max_mds=2, take a pool
snapshot of the meta pool, then decrease to max_mds=1, then deep scrub
each meta pg.

In my test I could list and remove the pool snap, then deep-scrub
again cleared the inconsistencies.

https://tracker.ceph.com/issues/56386

Cheers, Dan

On Fri, Jun 24, 2022 at 8:41 AM Ansgar Jazdzewski
 wrote:

Hi,

I would say yes but it would be nice if other people can confirm it too.

also can you create a test cluster and do the same tasks
* create it with octopus
* create snapshot
* reduce rank to 1
* upgrade to pacific

and then try to fix the PG, assuming that you will have the same
issues in your test-cluster,

cheers,
Ansgar

Am Do., 23. Juni 2022 um 22:12 Uhr schrieb Pascal Ehlert :

Hi,

I have now tried to "ceph osd pool rmsnap $POOL beforefixes" and it says the snapshot 
could not be found although I have definitely run "ceph osd pool mksnap $POOL 
beforefixes" about three weeks ago.
When running rados list-inconsistent-obj $PG on one of the affected PGs, all of the 
objects returned have "snap" set to 1:

root@srv01:~# for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); do 
rados list-inconsistent-obj $i | jq -er .inconsistents[].object; done
[..]
{
   "name": "200020744f4.",
   "nspace": "",
   "locator": "",
   "snap": 1,
   "version": 5704208
}
{
   "name": "200021aeb16.",
   "nspace": "",
   "locator": "",
   "snap": 1,
   "version": 6189078
}
[..]

Running listsnaps on any of them then looks like this:

root@srv01:~# rados listsnaps 200020744f4. -p $POOL
200020744f4.:
cloneid  snaps  size  overlap
1        1      0     []
head     -      0


Is it save to assume that these objects belong to a somewhat broken snapshot 
and can be removed safely without causing further damage?


Thanks,

Pascal



Ansgar Jazdzewski wrote on 23.06.22 20:36:

Hi,

we could identify the rbd images that wehre affected and did an export before, 
but in the case of cephfs metadata i have no plan that will work.

can you try to delete the snapshot?
also if the filesystem can be shutdown? try to do a backup of the metadatapool

hope you will have some luck, let me know if I can help,
Ansgar

Pascal Ehlert  schrieb am Do., 23. Juni 2022, 16:45:

Hi Ansgar,

Thank you very much for the response.
Running your first command to obtain inconsistent objects, I retrieve a
total of 23114 only some of which are snaps.

You mentioning snapshots did remind me of the fact however that I
created a snapshot on the Ceph metadata pool via "ceph osd pool $POOL
mksnap" before I reduced the number of ranks.
Maybe that has causes the inconsistencies and would explain why the
actual file system appears unaffected?

Is there any way to validate that theory? I am a bit hesitant to just
run "rmsnap". Could that cause inconsistent data to be written back to
the actual objects?


Best regards,

Pascal



Ansgar Jazdzewski wrote on 23.06.22 16:11:

Hi Pascal,

We just had a similar situation on our RBD and had found some bad data
in RADOS here is How we did it:

for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); do rados
list-inconsistent-obj $i | jq -er .inconsistents[].object.name| awk
-F'.' '{print $2}'; done

we than found inconsistent snaps on the Object:

rados list-inconsistent-snapset $PG --format=json-pretty | jq
.inconsistents[].name

List the data on the OSD's (ceph pg map $PG)

ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${OSD}/ --op
list ${OBJ} --pgid ${PG}

and finally remove the object, like:
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-459/ --op
list rbd_data.762a94d768c04d.0036b7ac --pgid
2.704ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-459/
'["2.704",{"oid":"rbd_data.801e1d1d9c719d.00044943","key":"","snapid":125458,"hash":4136961796,"max":0,"pool":2,"namespace":"","max":0}]'
remove

we had to do it for all OSD one after the other after this a 'pg repair' worked

i hope it will help
Ansgar

Am Do., 23. Juni 2022 um 15:02 Uhr schrieb Dan van der Ster
:

Hi Pascal,

It's not clear to me how the upgrade procedure you described would
lead to inconsistent PGs.

Even if you didn't record every step, do you have the ceph.log, the
mds logs, perhaps some osd logs from this time?
And which versions did 

[ceph-users] Re: cephadm permission denied when extending cluster

2022-06-24 Thread Robert Reihs
Hi, I tested with the 17.2.1 release with a non-root ssh user and it worked
fine.
Best
Robert
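
P.S. For reference, setting the non-root user and the non-standard ssh port on a running cluster looks roughly like this (user name, port and file name are placeholders):

ceph cephadm set-user deploy                 # the non-root ssh user
ceph cephadm set-ssh-config -i ssh_config    # file containing e.g. "Port 2222" and "User deploy"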

On Thu, Jun 23, 2022 at 9:12 PM Robert Reihs  wrote:

> Thanks for the help, It was the same issue.
>
> Best
> Robert
>
> On Thu, Jun 23, 2022 at 8:37 PM Robert Reihs 
> wrote:
>
>> Hi Adam,
>>
>> Yes looks like the same error, I will test it with the root user.
>>
>> Thanks for the quick help.
>> Best
>> Robert
>>
>> On Thu, Jun 23, 2022 at 7:46 PM Adam King  wrote:
>>
>>> Hi Robert,
>>>
>>> Do you think this could be https://tracker.ceph.com/issues/54620 (the
>>> traceback looks similar enough I think it likely is)? We had this issue in
>>> 17.2.0 when using a non-root ssh user. If it's the same thing, it should be
>>> fixed for the 17.2.1 release which is planned to be soon.
>>>
>>> Thanks,
>>>   - Adam King
>>>
>>> On Thu, Jun 23, 2022 at 11:20 AM Robert Reihs 
>>> wrote:
>>>
 Hi all,
 I am currently trying to setup a test cluster with cephadm on a system
 with
 ipv6 setup.
 In the ceph.conf I have:
   ms_bind_ipv4 = false
   ms_bind_ipv6 = true
 I also have a non standard ssh port set, configured with the ssh-config
 settings.

 After bootstrapping the first server I am adding hosts with monitors and
 managers the cluster. The first system has mon and mrg running.

 I am getting a permission denied error:
 debug 2022-06-23T14:51:41.776+ 7f1a31ec8700 -1 log_channel(cephadm)
 log
 [ERR] : executing refresh((['fsn1-ceph-01', 'fsn1-ceph-02',
 'fsn1-ceph-03'],)) failed.
 Traceback (most recent call last):
   File "/usr/share/ceph/mgr/cephadm/ssh.py", line 221, in
 _write_remote_file
 await asyncssh.scp(f.name, (conn, tmp_path))
   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 922, in scp
 await source.run(srcpath)
   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 458, in run
 self.handle_error(exc)
   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in
 handle_error
 raise exc from None
   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 456, in run
 await self._send_files(path, b'')
   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 438, in
 _send_files
 self.handle_error(exc)
   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 307, in
 handle_error
 raise exc from None
   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 434, in
 _send_files
 await self._send_file(srcpath, dstpath, attrs)
   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 365, in
 _send_file
 await self._make_cd_request(b'C', attrs, size, srcpath)
   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 343, in
 _make_cd_request
 self._fs.basename(path))
   File "/lib/python3.6/site-packages/asyncssh/scp.py", line 224, in
 make_request
 raise exc
 asyncssh.sftp.SFTPFailure: scp: /tmp/etc/ceph/ceph.conf.new: Permission
 denied
 During handling of the above exception, another exception occurred:
 Traceback (most recent call last):
   File "/usr/share/ceph/mgr/cephadm/utils.py", line 76, in do_work
 return f(*arg)
   File "/usr/share/ceph/mgr/cephadm/serve.py", line 265, in refresh
 self._write_client_files(client_files, host)
   File "/usr/share/ceph/mgr/cephadm/serve.py", line 1052, in
 _write_client_files
 self.mgr.ssh.write_remote_file(host, path, content, mode, uid, gid)
   File "/usr/share/ceph/mgr/cephadm/ssh.py", line 238, in
 write_remote_file
 host, path, content, mode, uid, gid, addr))
   File "/usr/share/ceph/mgr/cephadm/module.py", line 569, in wait_async
 return self.event_loop.get_result(coro)
   File "/usr/share/ceph/mgr/cephadm/ssh.py", line 48, in get_result
 return asyncio.run_coroutine_threadsafe(coro, self._loop).result()
   File "/lib64/python3.6/concurrent/futures/_base.py", line 432, in
 result
 return self.__get_result()
   File "/lib64/python3.6/concurrent/futures/_base.py", line 384, in
 __get_result
 raise self._exception
   File "/usr/share/ceph/mgr/cephadm/ssh.py", line 226, in
 _write_remote_file
 raise OrchestratorError(msg)
 orchestrator._interface.OrchestratorError: Unable to write
 fsn1-ceph-01:/etc/ceph/ceph.conf: scp: /tmp/etc/ceph/ceph.conf.new:
 Permission denied

 Thanks!
 Best
 Robert Reihs
 ___
 ceph-users mailing list -- ceph-users@ceph.io
 To unsubscribe send an email to ceph-users-le...@ceph.io


>>
>> --
>> Robert Reihs
>> Jakobsweg 22
>> 8046 Stattegg
>> AUSTRIA
>>
>> mobile: +43 (664) 51 035 90
>> robert.re...@gmail.com
>>
>> TECHNISCHE UNIVERSITÄT GRAZ - GRAZ UNIVERSITY OF TECHNOLOGY
>> Biomedizinische Technik - Biomedical Engineering
>>
>
>

[ceph-users] Re: Inconsistent PGs after upgrade to Pacific

2022-06-24 Thread Dan van der Ster
Hi,

It's trivial to reproduce. Running 16.2.9 with max_mds=2, take a pool
snapshot of the meta pool, then decrease to max_mds=1, then deep scrub
each meta pg.

In my test I could list and remove the pool snap, then deep-scrub
again cleared the inconsistencies.

https://tracker.ceph.com/issues/56386

Cheers, Dan
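
Spelled out as commands, the reproducer is roughly the following (snapshot name is a placeholder; pool and fs names are from my test setup):

ceph osd pool mksnap cephfs.cephfs.meta testsnap
ceph fs set cephfs max_mds 1
# deep-scrub every PG of the metadata pool; the inconsistencies show up here
for pg in $(ceph pg ls-by-pool cephfs.cephfs.meta | awk '$1 ~ /^[0-9]+\./ {print $1}'); do
  ceph pg deep-scrub $pg
done
# removing the snap and deep-scrubbing again cleared them
ceph osd pool rmsnap cephfs.cephfs.meta testsnap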

On Fri, Jun 24, 2022 at 8:41 AM Ansgar Jazdzewski
 wrote:
>
> Hi,
>
> I would say yes but it would be nice if other people can confirm it too.
>
> also can you create a test cluster and do the same tasks
> * create it with octopus
> * create snapshot
> * reduce rank to 1
> * upgrade to pacific
>
> and then try to fix the PG, assuming that you will have the same
> issues in your test-cluster,
>
> cheers,
> Ansgar
>
> Am Do., 23. Juni 2022 um 22:12 Uhr schrieb Pascal Ehlert 
> :
> >
> > Hi,
> >
> > I have now tried to "ceph osd pool rmsnap $POOL beforefixes" and it says 
> > the snapshot could not be found although I have definitely run "ceph osd 
> > pool mksnap $POOL beforefixes" about three weeks ago.
> > When running rados list-inconsistent-obj $PG on one of the affected PGs, 
> > all of the objects returned have "snap" set to 1:
> >
> > root@srv01:~# for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); do 
> > rados list-inconsistent-obj $i | jq -er .inconsistents[].object; done
> > [..]
> > {
> >   "name": "200020744f4.",
> >   "nspace": "",
> >   "locator": "",
> >   "snap": 1,
> >   "version": 5704208
> > }
> > {
> >   "name": "200021aeb16.",
> >   "nspace": "",
> >   "locator": "",
> >   "snap": 1,
> >   "version": 6189078
> > }
> > [..]
> >
> > Running listsnaps on any of them then looks like this:
> >
> > root@srv01:~# rados listsnaps 200020744f4. -p $POOL
> > 200020744f4.:
> > cloneidsnapssizeoverlap
> > 110[]
> > head-0
> >
> >
> > Is it save to assume that these objects belong to a somewhat broken 
> > snapshot and can be removed safely without causing further damage?
> >
> >
> > Thanks,
> >
> > Pascal
> >
> >
> >
> > Ansgar Jazdzewski wrote on 23.06.22 20:36:
> >
> > Hi,
> >
> > we could identify the rbd images that wehre affected and did an export 
> > before, but in the case of cephfs metadata i have no plan that will work.
> >
> > can you try to delete the snapshot?
> > also if the filesystem can be shutdown? try to do a backup of the 
> > metadatapool
> >
> > hope you will have some luck, let me know if I can help,
> > Ansgar
> >
> > Pascal Ehlert  schrieb am Do., 23. Juni 2022, 16:45:
> >>
> >> Hi Ansgar,
> >>
> >> Thank you very much for the response.
> >> Running your first command to obtain inconsistent objects, I retrieve a
> >> total of 23114 only some of which are snaps.
> >>
> >> You mentioning snapshots did remind me of the fact however that I
> >> created a snapshot on the Ceph metadata pool via "ceph osd pool $POOL
> >> mksnap" before I reduced the number of ranks.
> >> Maybe that has causes the inconsistencies and would explain why the
> >> actual file system appears unaffected?
> >>
> >> Is there any way to validate that theory? I am a bit hesitant to just
> >> run "rmsnap". Could that cause inconsistent data to be written back to
> >> the actual objects?
> >>
> >>
> >> Best regards,
> >>
> >> Pascal
> >>
> >>
> >>
> >> Ansgar Jazdzewski wrote on 23.06.22 16:11:
> >> > Hi Pascal,
> >> >
> >> > We just had a similar situation on our RBD and had found some bad data
> >> > in RADOS here is How we did it:
> >> >
> >> > for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); do rados
> >> > list-inconsistent-obj $i | jq -er .inconsistents[].object.name| awk
> >> > -F'.' '{print $2}'; done
> >> >
> >> > we than found inconsistent snaps on the Object:
> >> >
> >> > rados list-inconsistent-snapset $PG --format=json-pretty | jq
> >> > .inconsistents[].name
> >> >
> >> > List the data on the OSD's (ceph pg map $PG)
> >> >
> >> > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${OSD}/ --op
> >> > list ${OBJ} --pgid ${PG}
> >> >
> >> > and finally remove the object, like:
> >> > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-459/ --op
> >> > list rbd_data.762a94d768c04d.0036b7ac --pgid
> >> > 2.704ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-459/
> >> > '["2.704",{"oid":"rbd_data.801e1d1d9c719d.00044943","key":"","snapid":125458,"hash":4136961796,"max":0,"pool":2,"namespace":"","max":0}]'
> >> > remove
> >> >
> >> > we had to do it for all OSD one after the other after this a 'pg repair' 
> >> > worked
> >> >
> >> > i hope it will help
> >> > Ansgar
> >> >
> >> > Am Do., 23. Juni 2022 um 15:02 Uhr schrieb Dan van der Ster
> >> > :
> >> >> Hi Pascal,
> >> >>
> >> >> It's not clear to me how the upgrade procedure you described would
> >> >> lead to inconsistent PGs.
> >> >>
> >> >> Even if you didn't record every step, do you have the ceph.log, the
> >> >> mds logs, perhaps some osd logs from this time?
> >> >> And which versions did you upgrade from / to ?
> >> 

[ceph-users] Re: Inconsistent PGs after upgrade to Pacific

2022-06-24 Thread Ansgar Jazdzewski
Hi,

I would say yes, but it would be nice if other people could confirm it too.

also can you create a test cluster and do the same tasks
* create it with octopus
* create snapshot
* reduce rank to 1
* upgrade to pacific

and then try to fix the PG, assuming that you will have the same
issues in your test-cluster,

cheers,
Ansgar

Am Do., 23. Juni 2022 um 22:12 Uhr schrieb Pascal Ehlert :
>
> Hi,
>
> I have now tried to "ceph osd pool rmsnap $POOL beforefixes" and it says the 
> snapshot could not be found although I have definitely run "ceph osd pool 
> mksnap $POOL beforefixes" about three weeks ago.
> When running rados list-inconsistent-obj $PG on one of the affected PGs, all 
> of the objects returned have "snap" set to 1:
>
> root@srv01:~# for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); do 
> rados list-inconsistent-obj $i | jq -er .inconsistents[].object; done
> [..]
> {
>   "name": "200020744f4.",
>   "nspace": "",
>   "locator": "",
>   "snap": 1,
>   "version": 5704208
> }
> {
>   "name": "200021aeb16.",
>   "nspace": "",
>   "locator": "",
>   "snap": 1,
>   "version": 6189078
> }
> [..]
>
> Running listsnaps on any of them then looks like this:
>
> root@srv01:~# rados listsnaps 200020744f4. -p $POOL
> 200020744f4.:
> cloneidsnapssizeoverlap
> 110[]
> head-0
>
>
> Is it save to assume that these objects belong to a somewhat broken snapshot 
> and can be removed safely without causing further damage?
>
>
> Thanks,
>
> Pascal
>
>
>
> Ansgar Jazdzewski wrote on 23.06.22 20:36:
>
> Hi,
>
> we could identify the rbd images that wehre affected and did an export 
> before, but in the case of cephfs metadata i have no plan that will work.
>
> can you try to delete the snapshot?
> also if the filesystem can be shutdown? try to do a backup of the metadatapool
>
> hope you will have some luck, let me know if I can help,
> Ansgar
>
> Pascal Ehlert  schrieb am Do., 23. Juni 2022, 16:45:
>>
>> Hi Ansgar,
>>
>> Thank you very much for the response.
>> Running your first command to obtain inconsistent objects, I retrieve a
>> total of 23114 only some of which are snaps.
>>
>> You mentioning snapshots did remind me of the fact however that I
>> created a snapshot on the Ceph metadata pool via "ceph osd pool $POOL
>> mksnap" before I reduced the number of ranks.
>> Maybe that has causes the inconsistencies and would explain why the
>> actual file system appears unaffected?
>>
>> Is there any way to validate that theory? I am a bit hesitant to just
>> run "rmsnap". Could that cause inconsistent data to be written back to
>> the actual objects?
>>
>>
>> Best regards,
>>
>> Pascal
>>
>>
>>
>> Ansgar Jazdzewski wrote on 23.06.22 16:11:
>> > Hi Pascal,
>> >
>> > We just had a similar situation on our RBD and had found some bad data
>> > in RADOS here is How we did it:
>> >
>> > for i in $(rados list-inconsistent-pg $POOL | jq -er .[]); do rados
>> > list-inconsistent-obj $i | jq -er .inconsistents[].object.name| awk
>> > -F'.' '{print $2}'; done
>> >
>> > we than found inconsistent snaps on the Object:
>> >
>> > rados list-inconsistent-snapset $PG --format=json-pretty | jq
>> > .inconsistents[].name
>> >
>> > List the data on the OSD's (ceph pg map $PG)
>> >
>> > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-${OSD}/ --op
>> > list ${OBJ} --pgid ${PG}
>> >
>> > and finally remove the object, like:
>> > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-459/ --op
>> > list rbd_data.762a94d768c04d.0036b7ac --pgid
>> > 2.704ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-459/
>> > '["2.704",{"oid":"rbd_data.801e1d1d9c719d.00044943","key":"","snapid":125458,"hash":4136961796,"max":0,"pool":2,"namespace":"","max":0}]'
>> > remove
>> >
>> > we had to do it for all OSD one after the other after this a 'pg repair' 
>> > worked
>> >
>> > i hope it will help
>> > Ansgar
>> >
>> > Am Do., 23. Juni 2022 um 15:02 Uhr schrieb Dan van der Ster
>> > :
>> >> Hi Pascal,
>> >>
>> >> It's not clear to me how the upgrade procedure you described would
>> >> lead to inconsistent PGs.
>> >>
>> >> Even if you didn't record every step, do you have the ceph.log, the
>> >> mds logs, perhaps some osd logs from this time?
>> >> And which versions did you upgrade from / to ?
>> >>
>> >> Cheers, Dan
>> >>
>> >> On Wed, Jun 22, 2022 at 7:41 PM Pascal Ehlert  wrote:
>> >>> Hi all,
>> >>>
>> >>> I am currently battling inconsistent PGs after a far-reaching mistake
>> >>> during the upgrade from Octopus to Pacific.
>> >>> While otherwise following the guide, I restarted the Ceph MDS daemons
>> >>> (and this started the Pacific daemons) without previously reducing the
>> >>> ranks to 1 (from 2).
>> >>>
>> >>> This resulted in daemons not coming up and reporting inconsistencies.
>> >>> After later reducing the ranks and bringing the MDS back up (I did not
>> >>> record every step as this was an emergency situation), we started seeing
>> >>>