[ceph-users] Re: MDS crashes to damaged metadata

2023-01-08 Thread Venky Shankar
Hi Felix,

On Thu, Dec 15, 2022 at 8:03 PM Stolte, Felix  wrote:
>
> Hi Patrick,
>
> we used your script to repair the damaged objects on the weekend and it went 
> smoothly. Thanks for your support.
>
> We adjusted your script to scan for damaged files on a daily basis; the
> runtime is about 6h. Until Thursday last week, we had exactly the same 17
> files. On Thursday at 13:05 a snapshot was created, and our active MDS
> crashed once at that moment:
>
> 2022-12-08T13:05:48.919+0100 7f440afec700 -1 
> /build/ceph-16.2.10/src/mds/ScatterLock.h: In function 'void 
> ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f440afec700 time 
> 2022-12-08T13:05:48.921223+0100
> /build/ceph-16.2.10/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state ==
> LOCK_XLOCK || state == LOCK_XLOCKDONE)

This crash is the same as the one detailed in
https://tracker.ceph.com/issues/49132. The fix is being backported to the
pacific and quincy releases.

>
> 12 minutes later the unlink_local error crashes appeared again, this time
> with a new file. During debugging we noticed an MTU mismatch between the MDS
> (1500) and the client (9000) with the cephfs kernel mount. The client is also
> creating the snapshots via mkdir in the .snap directory.
>
> We disabled snapshot creation for now, but we really need this feature. I
> uploaded the mds logs of the first crash along with the information above to
> https://tracker.ceph.com/issues/38452
>
> I would greatly appreciate it if you could answer the following question:
>
> Is the bug related to our MTU mismatch? We also fixed the MTU issue over the
> weekend, going back to 1500 on all nodes in the ceph public network.
>
> If you need a debug level 20 log of the ScatterLock for further analysis, I
> could schedule snapshots at the end of our workdays and increase the debug
> level for 5 minutes around snapshot creation.
>
> Regards
> Felix
> -
> -
> Forschungszentrum Juelich GmbH
> 52425 Juelich
> Sitz der Gesellschaft: Juelich
> Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir Volker Rieke
> Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
> Dr. Astrid Lambrecht, Prof. Dr. Frauke Melchior
> -
> -
>
> Am 02.12.2022 um 20:08 schrieb Patrick Donnelly :
>
> On Thu, Dec 1, 2022 at 5:08 PM Stolte, Felix  wrote:
>
> The script has been running for ~2 hours and, according to the line count in
> the memo file, we are at 40% (cephfs is still online).
>
> We had to modify the script, putting a try/except around the for loop in
> lines 78 to 87. For some reason there are some objects (186 at the moment)
> which throw a UnicodeDecodeError exception during the iteration:
>
> Traceback (most recent call last):
>   File "first-damage.py", line 138, in <module>
> traverse(f, ioctx)
>   File "first-damage.py", line 79, in traverse
> for (dnk, val) in it:
>   File "rados.pyx", line 1382, in rados.OmapIterator.__next__
>   File "rados.pyx", line 311, in rados.decode_cstr
> UnicodeDecodeError: 'utf-8' codec can't decode bytes in position 10-11:
> invalid continuation byte
>
> Don't know if this is because of the filesystem still running. We saved the
> object names in a separate file and I will investigate further tomorrow. We
> should be able to modify the script to only check the objects which threw
> the exception instead of searching through the whole pool again.
>
> That shouldn't be caused by the fs running. It may be that you have some
> file names which contain invalid unicode characters?
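For reference, a minimal sketch of the kind of try/except wrapper described
above. This is hypothetical (the real iteration lives in first-damage.py and
the exact names differ), but it illustrates skipping objects whose omap keys
are not valid UTF-8:

import rados

def scan_dir_object(ioctx, obj_name, skipped_log):
    # Read the omap entries (dentries) of one CephFS directory object.
    with rados.ReadOpCtx() as rctx:
        it, ret = ioctx.get_omap_vals(rctx, "", "", -1)
        ioctx.operate_read_op(rctx, obj_name)
        try:
            for (dnk, val) in it:
                pass  # ... the per-dentry damage checks would go here ...
        except UnicodeDecodeError:
            # Some omap keys are not valid UTF-8 (e.g. file names containing
            # raw bytes); record the object for later inspection and move on
            # to the next object instead of aborting the whole scan.
            skipped_log.write(obj_name + "\n")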
>
> Regarding the mds logfiles with debug 20:
> We cannot run this debug level for longer than one hour since the logfile
> size increase is too high for the local storage on the mds servers where
> the logs are stored (we don't have central logging yet).
>
> Okay.
>
> But if you are just interested in the time frame around the crash, I could
> set the debug level to 20, trigger the crash on the weekend and send you
> the logs.
>
> The crash is unlikely to point to what causes the corruption. I was
> hoping we could locate an instance of damage while the MDS is running.
>
> Regards Felix
>
>

[ceph-users] Serious cluster issue - Incomplete PGs

2023-01-08 Thread Deep Dish
Hello.   I really screwed up my ceph cluster.   Hoping to get data off it
so I can rebuild it.

In summary, too many changes too quickly caused the cluster to develop
incomplete PGs.  Some PGs were reporting that OSDs were to be probed.
I've created those OSD IDs (empty), however this didn't clear the
incompletes.  The incompletes are part of EC pools.  Running 17.2.5.

This is the overall state:

  cluster:

id: 49057622-69fc-11ed-b46e-d5acdedaae33

health: HEALTH_WARN

Failed to apply 1 service(s): osd.dashboard-admin-1669078094056

1 hosts fail cephadm check

cephadm background work is paused

Reduced data availability: 28 pgs inactive, 28 pgs incomplete

Degraded data redundancy: 55 pgs undersized

2 slow ops, oldest one blocked for 4449 sec, daemons
[osd.25,osd.50,osd.51] have slow ops.



These are PGs that are incomplete that HAVE DATA (Objects > 0) [ via ceph
pg ls incomplete ]:

2.35 23199 0  00  959802736640
  0  2477   incomplete10s  2104'46277   28260:686871
 [44,4,37,3,40,32]p44[44,4,37,3,40,32]p44
 2023-01-03T03:54:47.821280+  2022-12-29T18:53:09.287203+
14  queued for deep scrub
2.53 22821 0  00  944011755520
  0  2745  remapped+incomplete10s  2104'45845   28260:565267
[60,48,52,65,67,7]p60 [60]p60
 2023-01-03T10:18:13.388383+  2023-01-03T10:18:13.388383+
   408  queued for scrub
2.9f 22858 0  00  945559838720
  0  2736  remapped+incomplete10s  2104'45636   28260:759872
 [56,59,3,57,5,32]p56 [56]p56
 2023-01-03T10:55:49.848693+  2023-01-03T10:55:49.848693+
   376  queued for scrub
2.be 22870 0  00  944291102720
  0  2661  remapped+incomplete10s  2104'45561   28260:813759
 [41,31,37,9,7,69]p41 [41]p41
 2023-01-03T14:02:15.790077+  2023-01-03T14:02:15.790077+
   360  queued for scrub
2.e4 22953 0  00  949122785280
  0  2648  remapped+incomplete20m  2104'46048   28259:732896
[37,46,33,4,48,49]p37 [37]p37
 2023-01-02T18:38:46.268723+  2022-12-29T18:05:47.431468+
18  queued for deep scrub
17.7820169 0  00  845178344000
  0  2198  remapped+incomplete10s  3735'53405  28260:1243673
 [4,37,2,36,66,0]p4 [41]p41
 2023-01-03T14:21:41.563424+  2023-01-03T14:21:41.563424+
   348  queued for scrub
17.d820328 0  00  851960531300
  0  1852  remapped+incomplete10s  3735'54458  28260:1309564
 [38,65,61,37,58,39]p38 [53]p53
 2023-01-02T18:32:35.371071+  2022-12-28T19:08:29.492244+
21  queued for deep scrub

At present I'm unable to reliably access my data due to the incomplete PGs
above.  I'll post whatever outputs are requested (not posting them now as
they can be rather verbose).  Is there hope?
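For reference, a hedged first diagnostic step for incomplete PGs is to query
one of the affected PGs and look at why peering is stuck (the PG id below is
just an example taken from the listing above):

ceph pg ls incomplete
ceph pg 2.35 query
# in the "recovery_state" section of the query output, fields such as
# "down_osds_we_would_probe" and "peering_blocked_by" usually show which
# OSDs the PG is still waiting to hear from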
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph-users list archive missing almost all mail

2023-01-08 Thread Matthias Ferdinand
Hi,

I found some mailing list archive links from my notes to throw "Page not
found" errors, e.g. 
https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/J4U24YRJEJWSSMZVEVKQYQFTFNUGIG3N/

Looking around in the archive web interface, it appears only some of the
most recent threads are found, everything else says "no email threads
could be found for this month".

Could somebody please look into this?


Regards
Matthias Ferdinand
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 16.2.11 pacific QE validation status

2023-01-08 Thread Yuri Weinstein
Happy New Year all!

This release remains in "in progress"/"on hold" status while we sort
out the infrastructure-related issues.

Unless I hear objections, I suggest doing a full rebase/retest QE
cycle (adding the PRs merged lately) once sepia is back online, since
this is taking much longer than anticipated.

Objections?

Thx
YuriW

On Thu, Dec 15, 2022 at 9:14 AM Yuri Weinstein  wrote:
>
> Details of this release are summarized here:
>
> https://tracker.ceph.com/issues/58257#note-1
> Release Notes - TBD
>
> Seeking approvals for:
>
> rados - Neha (https://github.com/ceph/ceph/pull/49431 is still being
> tested and will be merged soon)
> rook - Sébastien Han
> cephadm - Adam
> dashboard - Ernesto
> rgw - Casey (rgw will be rerun on the latest SHA1)
> rbd - Ilya, Deepika
> krbd - Ilya, Deepika
> fs - Venky, Patrick
> upgrade/nautilus-x (pacific) - Neha, Laura
> upgrade/octopus-x (pacific) - Neha, Laura
> upgrade/pacific-p2p - Neha, Laura
> powercycle - Brad
> ceph-volume - Guillaume, Adam K
>
> Thx
> YuriW
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Serious cluster issue - data inaccessible

2023-01-08 Thread Deep Dish
Hello.   I really screwed up my ceph cluster.   Hoping to get data off it
so I can rebuild it.

In summary, too many changes too quickly caused the cluster to develop
incomplete PGs.  Some PGs were reporting that OSDs were to be probed.
I've created those OSD IDs (empty), however this didn't clear the
incompletes.  The incompletes are part of EC pools.  Running 17.2.5.

This is the overall state:

  cluster:

id: 49057622-69fc-11ed-b46e-d5acdedaae33

health: HEALTH_WARN

Failed to apply 1 service(s): osd.dashboard-admin-1669078094056

1 hosts fail cephadm check

cephadm background work is paused

Reduced data availability: 28 pgs inactive, 28 pgs incomplete

Degraded data redundancy: 55 pgs undersized

2 slow ops, oldest one blocked for 4449 sec, daemons
[osd.25,osd.50,osd.51] have slow ops.



These are PGs that are incomplete that HAVE DATA (Objects > 0) [ via ceph
pg ls incomplete ]:

2.35 23199 0  00  959802736640
  0  2477   incomplete10s  2104'46277   28260:686871
 [44,4,37,3,40,32]p44[44,4,37,3,40,32]p44
 2023-01-03T03:54:47.821280+  2022-12-29T18:53:09.287203+
14  queued for deep scrub
2.53 22821 0  00  944011755520
  0  2745  remapped+incomplete10s  2104'45845   28260:565267
[60,48,52,65,67,7]p60 [60]p60
 2023-01-03T10:18:13.388383+  2023-01-03T10:18:13.388383+
   408  queued for scrub
2.9f 22858 0  00  945559838720
  0  2736  remapped+incomplete10s  2104'45636   28260:759872
 [56,59,3,57,5,32]p56 [56]p56
 2023-01-03T10:55:49.848693+  2023-01-03T10:55:49.848693+
   376  queued for scrub
2.be 22870 0  00  944291102720
  0  2661  remapped+incomplete10s  2104'45561   28260:813759
 [41,31,37,9,7,69]p41 [41]p41
 2023-01-03T14:02:15.790077+  2023-01-03T14:02:15.790077+
   360  queued for scrub
2.e4 22953 0  00  949122785280
  0  2648  remapped+incomplete20m  2104'46048   28259:732896
[37,46,33,4,48,49]p37 [37]p37
 2023-01-02T18:38:46.268723+  2022-12-29T18:05:47.431468+
18  queued for deep scrub
17.7820169 0  00  845178344000
  0  2198  remapped+incomplete10s  3735'53405  28260:1243673
 [4,37,2,36,66,0]p4 [41]p41
 2023-01-03T14:21:41.563424+  2023-01-03T14:21:41.563424+
   348  queued for scrub
17.d820328 0  00  851960531300
  0  1852  remapped+incomplete10s  3735'54458  28260:1309564
 [38,65,61,37,58,39]p38 [53]p53
 2023-01-02T18:32:35.371071+  2022-12-28T19:08:29.492244+
21  queued for deep scrub

At present I'm unable to reliably access my data due to the incomplete PGs
above.  I'll post whatever outputs are requested (not posting them now as
they can be rather verbose).  Is there hope?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Setting Prometheus retention_time

2023-01-08 Thread Eugen Block
Hi, I noticed the same and created a tracker issue:  
https://tracker.ceph.com/issues/58262



Zitat von Robert Sander :


Hi,

The Quincy documentation shows that we could set the Prometheus  
retention_time within a service specification:


https://docs.ceph.com/en/quincy/cephadm/services/monitoring/#setting-up-prometheus

When trying this "ceph orch apply" only shows:

Error EINVAL: ServiceSpec: __init__() got an unexpected keyword  
argument 'retention_time'


It looks like release 17.2.5 does not contain this code yet.

Why is the content of the documentation already online when  
https://github.com/ceph/ceph/pull/47943 has not been released yet?


Regards
--
Robert Sander
Heinlein Support GmbH
Linux: Akademie - Support - Hosting
http://www.heinlein-support.de

Tel: 030-405051-43
Fax: 030-405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein  -- Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Setting Prometheus retention_time

2023-01-08 Thread Robert Sander

Hi,

The Quincy documentation shows that we could set the Prometheus 
retention_time within a service specification:


https://docs.ceph.com/en/quincy/cephadm/services/monitoring/#setting-up-prometheus
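For reference, the linked page describes a service spec roughly along these
lines (a hedged example; as shown below, 17.2.5 rejects the retention_time
field):

service_type: prometheus
placement:
  count: 1
spec:
  retention_time: "1y"
  retention_size: "1GB"

applied with something like "ceph orch apply -i prometheus.yaml".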

When trying this "ceph orch apply" only shows:

Error EINVAL: ServiceSpec: __init__() got an unexpected keyword argument 
'retention_time'


It looks like release 17.2.5 does not contain this code yet.

Why is the content of the documentation already online when 
https://github.com/ceph/ceph/pull/47943 has not been released yet?


Regards
--
Robert Sander
Heinlein Support GmbH
Linux: Akademie - Support - Hosting
http://www.heinlein-support.de

Tel: 030-405051-43
Fax: 030-405051-19

Zwangsangaben lt. §35a GmbHG:
HRB 93818 B / Amtsgericht Berlin-Charlottenburg,
Geschäftsführer: Peer Heinlein  -- Sitz: Berlin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Erasing Disk to the initial state

2023-01-08 Thread Michel Niyoyita
Hello team,

I have deployed a ceph cluster in production. The cluster is composed of two
types of disks, HDD and SSD, and was deployed using ceph-ansible.
Unfortunately, after deployment only the HDD disks appear, without the SSDs.
I would like to restart the deployment from scratch, but I don't know how to
erase the disks back to their initial state. I tried to format the disks, but
the LVM volumes come back.

sda
8:00   7.3T  0 disk
└─ceph--da4a5d58--73ef--473b--9960--371f837cb5ed-osd--block--6e800937--c4d2--4fc9--84ca--083c39d057a8
253:10   7.3T  0 lvm
sdb
8:16   0   7.3T  0 disk
└─ceph--773f50a1--79ed--4908--8f81--74f85efeb473-osd--block--9737a046--ba8b--4494--91f7--b80dd894df0b
253:70   7.3T  0 lvm
sdc
8:32   0   7.3T  0 disk
└─ceph--02000cec--fdbc--4def--967e--a7c32c851964-osd--block--c54d8182--b5e7--4c73--8d7b--7d24c7a3ce15
253:60   7.3T  0 lvm


Kindly help me to sort this out.

Best regards
Michel
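
For reference, a hedged sketch of how leftover ceph-volume LVM state is
usually cleared so the disks return to a clean state. This is destructive,
and the device and VG names below are only examples taken from the lsblk
output above; double-check them first:

ceph-volume lvm zap --destroy /dev/sda
ceph-volume lvm zap --destroy /dev/sdb
ceph-volume lvm zap --destroy /dev/sdc
# or, without ceph-volume: remove the VG and wipe the LVM signatures, e.g.
#   vgremove -y ceph-da4a5d58-73ef-473b-9960-371f837cb5ed
#   wipefs -a /dev/sda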
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MDS crashes to damaged metadata

2023-01-08 Thread Patrick Donnelly
On Thu, Dec 15, 2022 at 9:32 AM Stolte, Felix  wrote:
>
> Hi Patrick,
>
> we used your script to repair the damaged objects on the weekend and it went 
> smoothly. Thanks for your support.
>
> We adjusted your script to scan for damaged files on a daily basis; the
> runtime is about 6h. Until Thursday last week, we had exactly the same 17
> files. On Thursday at 13:05 a snapshot was created, and our active MDS
> crashed once at that moment:
>
> 2022-12-08T13:05:48.919+0100 7f440afec700 -1 
> /build/ceph-16.2.10/src/mds/ScatterLock.h: In function 'void 
> ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f440afec700 time 
> 2022-12-08T13:05:48.921223+0100
> /build/ceph-16.2.10/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state ==
> LOCK_XLOCK || state == LOCK_XLOCKDONE)
>
> 12 minutes later the unlink_local error crashes appeared again, this time
> with a new file. During debugging we noticed an MTU mismatch between the MDS
> (1500) and the client (9000) with the cephfs kernel mount. The client is also
> creating the snapshots via mkdir in the .snap directory.
>
> We disabled snapshot creation for now, but we really need this feature. I
> uploaded the mds logs of the first crash along with the information above to
> https://tracker.ceph.com/issues/38452
>
> I would greatly appreciate it if you could answer the following question:
>
> Is the bug related to our MTU mismatch? We also fixed the MTU issue over the
> weekend, going back to 1500 on all nodes in the ceph public network.

I doubt it.

> If you need a debug level 20 log of the ScatterLock for further analysis, I
> could schedule snapshots at the end of our workdays and increase the debug
> level for 5 minutes around snapshot creation.

This would be very helpful!
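
For reference, a hedged sketch of how the debug level could be raised around
snapshot creation (the exact values, targets and mount path would need
adjusting):

# shortly before the scheduled snapshot
ceph config set mds debug_mds 20
ceph config set mds debug_ms 1
# create the snapshot from the client, e.g.
mkdir /mnt/cephfs/some/dir/.snap/eod-$(date +%F)
# a few minutes later, revert to the default levels
ceph config rm mds debug_mds
ceph config rm mds debug_ms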

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] test message

2023-01-08 Thread Joe Comeau
 
Hi 
Just testing as I have not received a message from the list in a couple days
 
Thanks Joe
 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Missing SSDs disk on ceph deployment

2023-01-08 Thread Michel Niyoyita
Dear team,

Kindly help on this, I am completely blocked.

Best Regards

Michel

On Thu, Jan 5, 2023 at 2:45 PM Michel Niyoyita  wrote:

> Dear team,
>
> I have deployed the ceph cluster in production using ceph-ansible on
> Ubuntu 20.04. It consists of 3 monitors and 3 OSD nodes (each node has 20
> disks: 16 HDD and 4 SSD). After deployment the cluster health is OK, but
> instead of the expected 60 OSDs only 48 appear, and they are HDD only
> (according to the output of ceph osd df tree). Below are the outputs of
> ceph osd df tree and the lsblk command.
>
> hdd: 7.71349 T
> ssd: 7 T
>
> ceph osd df tree
> root@ceph-mon3:~# ceph osd df tree
> ID  CLASS  WEIGHT REWEIGHT  SIZE RAW USE  DATA OMAP  META
> AVAIL%USE  VAR   PGS  STATUS  TYPE NAME
> -1 370.24731 -  370 TiB   21 TiB  249 MiB   0 B  1.2 GiB
>  349 TiB  5.66  1.00-  root default
> -5 123.41577 -  123 TiB  7.0 TiB   83 MiB   0 B  403 MiB
>  116 TiB  5.66  1.00-  host ceph-osd1
>  0hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   30 MiB
>  7.3 TiB  5.66  1.006  up  osd.0
>  3hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   25 MiB
>  7.3 TiB  5.66  1.008  up  osd.3
>  6hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   35 MiB
>  7.3 TiB  5.66  1.00   12  up  osd.6
>  9hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   21 MiB
>  7.3 TiB  5.66  1.004  up  osd.9
> 12hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   25 MiB
>  7.3 TiB  5.66  1.006  up  osd.12
> 16hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   21 MiB
>  7.3 TiB  5.66  1.003  up  osd.16
> 18hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   25 MiB
>  7.3 TiB  5.66  1.005  up  osd.18
> 21hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   21 MiB
>  7.3 TiB  5.66  1.006  up  osd.21
> 24hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   35 MiB
>  7.3 TiB  5.66  1.007  up  osd.24
> 27hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   20 MiB
>  7.3 TiB  5.66  1.007  up  osd.27
> 30hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   20 MiB
>  7.3 TiB  5.66  1.007  up  osd.30
> 33hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   25 MiB
>  7.3 TiB  5.66  1.007  up  osd.33
> 36hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   25 MiB
>  7.3 TiB  5.66  1.005  up  osd.36
> 39hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   25 MiB
>  7.3 TiB  5.66  1.009  up  osd.39
> 42hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   30 MiB
>  7.3 TiB  5.66  1.006  up  osd.42
> 45hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   20 MiB
>  7.3 TiB  5.66  1.007  up  osd.45
> -3 123.41577 -  123 TiB  7.0 TiB   83 MiB   0 B  397 MiB
>  116 TiB  5.66  1.00-  host ceph-osd2
>  1hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   26 MiB
>  7.3 TiB  5.66  1.007  up  osd.1
>  5hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   26 MiB
>  7.3 TiB  5.66  1.007  up  osd.5
>  8hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   26 MiB
>  7.3 TiB  5.66  1.006  up  osd.8
> 11hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   21 MiB
>  7.3 TiB  5.66  1.005  up  osd.11
> 14hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   25 MiB
>  7.3 TiB  5.66  1.007  up  osd.14
> 15hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   26 MiB
>  7.3 TiB  5.66  1.00   12  up  osd.15
> 19hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   20 MiB
>  7.3 TiB  5.66  1.005  up  osd.19
> 22hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   26 MiB
>  7.3 TiB  5.66  1.004  up  osd.22
> 25hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   20 MiB
>  7.3 TiB  5.66  1.002  up  osd.25
> 28hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   21 MiB
>  7.3 TiB  5.66  1.003  up  osd.28
> 31hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   20 MiB
>  7.3 TiB  5.66  1.006  up  osd.31
> 34hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   30 MiB
>  7.3 TiB  5.66  1.007  up  osd.34
> 37hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   30 MiB
>  7.3 TiB  5.66  1.008  up  osd.37
> 40hdd7.71349   1.0  7.7 TiB  447 GiB  5.2 MiB   0 B   30 MiB
>  7.3 TiB  5.66  1.008  up  osd.40
> 43hdd7.71349   1.0  7.7 TiB  447 

[ceph-users] Re: docs.ceph.com -- Do you use the header navigation bar? (RESPONSES REQUESTED)

2023-01-08 Thread John Mulligan
On Wednesday, January 4, 2023 10:35:56 AM EST John Zachary Dover wrote:
> Do you use the header navigation bar on docs.ceph.com? See the attached
> file (sticky_header.png) if you are unsure of what "header navigation bar"
> means. In the attached file, the header navigation bar is indicated by
> means of two large, ugly, red-and-green arrows.
> 
> *Cards on the Table*
> The navigation bar is the kind of thing that is sometimes referred to as a
> "sticky header", and it can get in the way of linked-to sections. I would
> like to remove this header bar. If there is community support for the
> header bar, though, I won't remove it.
> 
> *What is Zac Complaining About?*
> Follow this procedure to see the behavior that has provoked my complaint:
> 
>1. Go to https://docs.ceph.com/en/quincy/glossary/
>2. Scroll down to the "Ceph Cluster Map" entry.
>3. Click the "Cluster Map" link in the line that reads "See Cluster Map".
> 4. Notice that the header navigation bar obscures the headword "Cluster
> Map".
> 
> If you have any opinion at all on this matter, voice it. Please.
> 

FWIW I am not able to reproduce the problem you are describing. In all cases 
the thin blue-green bar appeared above the  term with the selected anchor 
link.

I tried Firefox (108, Linux), Chromium (107, Linux) and for giggles Firefox on 
Android.  In all cases things looked fine to me and the selected term was not 
hidden by that nav bar.  I share because I was surprised by the result given 
that others on the list seem to see the problem.  But I also don't see what I 
would describe as "two large, ugly, red-and-green arrows."  Perhaps the page 
is rendering differently for some people and we don't hit the issue in that 
case?

PS. I also didn't see the png file in question. Perhaps this list strips 
attachments?



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] VolumeGroup must have a non-empty name / 17.2.5

2023-01-08 Thread Peter Eisch
Hi,

I updated from pacific 16.2.10 to 17.2.5 and the orchestration update went 
perfectly.  Very impressive.

I have one host which then started throwing a cephadm warning after the upgrade.

2023-01-07 11:17:50,080 7f0b26c8ab80 INFO Non-zero exit code 1 from 
/usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host 
--entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e 
CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45
 -e NODE_NAME=kelli.domain.name -e CEPH_USE_RANDOM_NONCE=1 -e 
CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v 
/var/run/ceph/404b94ab-b4d6-4218-9a4e-ecb8899108ca:/var/run/ceph:z -v 
/var/log/ceph/404b94ab-b4d6-4218-9a4e-ecb8899108ca:/var/log/ceph:z -v 
/var/lib/ceph/404b94ab-b4d6-4218-9a4e-ecb8899108ca/crash:/var/lib/ceph/crash:z 
-v /run/systemd/journal:/run/systemd/journal -v /dev:/dev -v 
/run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v 
/run/lock/lvm:/run/lock/lvm -v 
/var/lib/ceph/404b94ab-b4d6-4218-9a4e-ecb8899108ca/selinux:/sys/fs/selinux:ro 
-v /:/rootfs -v /tmp/ceph-tmpltrnmxf8:/etc/ceph/ceph.conf:z 
quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45
 inventory --format=json-pretty --filter-for-batch
2023-01-07 11:17:50,081 7f0b26c8ab80 INFO /usr/bin/podman: stderr Traceback 
(most recent call last):
2023-01-07 11:17:50,081 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File 
"/usr/sbin/ceph-volume", line 11, in 
2023-01-07 11:17:50,081 7f0b26c8ab80 INFO /usr/bin/podman: stderr 
load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
2023-01-07 11:17:50,081 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File 
"/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in __init__
2023-01-07 11:17:50,081 7f0b26c8ab80 INFO /usr/bin/podman: stderr 
self.main(self.argv)
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File 
"/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in 
newfunc
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr return 
f(*a, **kw)
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File 
"/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in main
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr 
terminal.dispatch(self.mapper, subcommand_args)
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File 
"/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in 
dispatch
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr 
instance.main()
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File 
"/usr/lib/python3.6/site-packages/ceph_volume/inventory/main.py", line 53, in 
main
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr 
with_lsm=self.args.with_lsm))
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File 
"/usr/lib/python3.6/site-packages/ceph_volume/util/device.py", line 39, in 
__init__
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr 
all_devices_vgs = lvm.get_all_devices_vgs()
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File 
"/usr/lib/python3.6/site-packages/ceph_volume/api/lvm.py", line 797, in 
get_all_devices_vgs
2023-01-07 11:17:50,083 7f0b26c8ab80 INFO /usr/bin/podman: stderr return 
[VolumeGroup(**vg) for vg in vgs]
2023-01-07 11:17:50,083 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File 
"/usr/lib/python3.6/site-packages/ceph_volume/api/lvm.py", line 797, in 

2023-01-07 11:17:50,083 7f0b26c8ab80 INFO /usr/bin/podman: stderr return 
[VolumeGroup(**vg) for vg in vgs]
2023-01-07 11:17:50,083 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File 
"/usr/lib/python3.6/site-packages/ceph_volume/api/lvm.py", line 517, in __init__
2023-01-07 11:17:50,083 7f0b26c8ab80 INFO /usr/bin/podman: stderr raise 
ValueError('VolumeGroup must have a non-empty name')
2023-01-07 11:17:50,083 7f0b26c8ab80 INFO /usr/bin/podman: stderr ValueError: 
VolumeGroup must have a non-empty name

This host is the only one which has 14 drives that aren't being used.  I'm
guessing this is why it's getting this error.  The drives may have been used
previously in a cluster (maybe not the same cluster) or something; I don't know.

Any suggestions for what to try to get past this issue?
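
For reference, a hedged first check: the ValueError comes from ceph-volume
parsing LVM volume groups, so listing what LVM itself reports on that host
may reveal a physical volume with an empty VG name (e.g. a leftover PV from a
previous cluster):

pvs -o pv_name,vg_name
vgs -o vg_name,pv_count
# a PV whose VG column is empty (an orphan PV) is the usual suspect; wiping
# it, for example with pvremove or "ceph-volume lvm zap --destroy <device>",
# should let the inventory call succeed again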

peter


Peter Eisch
DevOps Manager
peter.ei...@virginpulse.com
T1.612.445.5135