[ceph-users] Re: MDS crashes to damaged metadata
Hi Felix, On Thu, Dec 15, 2022 at 8:03 PM Stolte, Felix wrote: > > Hi Patrick, > > we used your script to repair the damaged objects on the weekend and it went > smoothly. Thanks for your support. > > We adjusted your script to scan for damaged files on a daily basis, runtime > is about 6h. Until Thursday last week, we had exactly the same 17 files. On > Thursday at 13:05 a snapshot was created and our active mds crashed once at > exactly that time: > > 2022-12-08T13:05:48.919+0100 7f440afec700 -1 > /build/ceph-16.2.10/src/mds/ScatterLock.h: In function 'void > ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f440afec700 time > 2022-12-08T13:05:48.921223+0100 > /build/ceph-16.2.10/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE) This crash is the same as detailed in https://tracker.ceph.com/issues/49132. The fix is being backported to the pacific/quincy releases. > > 12 minutes later the unlink_local error crashes appeared again. This time > with a new file. During debugging we noticed an MTU mismatch between MDS > (1500) and client (9000) with a cephfs kernel mount. The client is also > creating the snapshots via mkdir in the .snap directory. > > We disabled snapshot creation for now, but really need this feature. I > uploaded the mds logs of the first crash along with the information above to > https://tracker.ceph.com/issues/38452 > > I would greatly appreciate it if you could answer the following question: > > Is the bug related to our MTU mismatch? We fixed the MTU issue by going back to > 1500 on all nodes in the ceph public network on the weekend as well. > > If you need a debug level 20 log of the ScatterLock for further analysis, I > could schedule snapshots at the end of our workdays and increase the debug > level 5 minutes around snapshot creation. > > Regards > Felix > - > - > Forschungszentrum Juelich GmbH > 52425 Juelich > Registered office: Juelich > Registered in the Commercial Register of the Local Court of Dueren, No. HR B 3498 > Chairman of the Supervisory Board: MinDir Volker Rieke > Management: Prof. Dr.-Ing. Wolfgang Marquardt (Chairman), > Karsten Beneke (Deputy Chairman), Prof. Dr.-Ing. Harald Bolt, > Dr. Astrid Lambrecht, Prof. Dr. Frauke Melchior > - > - > > On 02.12.2022 at 20:08, Patrick Donnelly wrote: > > On Thu, Dec 1, 2022 at 5:08 PM Stolte, Felix wrote: > > The script has been running for ~2 hours and according to the line count in the memo > file we are at 40% (cephfs is still online). > > We had to modify the script, putting a try/except around the for loop in lines > 78 to 87. For some reason there are some objects (186 at this moment) which > throw a UnicodeDecodeError exception during the iteration: > > Traceback (most recent call > last): File "first-damage.py", line 138, in <module> traverse(f, ioctx) File > "first-damage.py", line 79, in traverse for (dnk, val) in it: File > "rados.pyx", line 1382, in rados.OmapIterator.__next__ File "rados.pyx", line > 311, in rados.decode_cstr UnicodeDecodeError: 'utf-8' codec can't decode > bytes in position 10-11: invalid continuation byte > > Don’t know if this is because of the filesystem still running. We saved the > object names in a separate file and I will investigate further tomorrow. We > should be able to modify the script to only check the objects which threw > the exception instead of searching through the whole pool again. > > That shouldn't be caused by the fs running. It may be that you have some > file names which contain invalid unicode characters?
> > Regarding the mds logfiles with debug 20: > We cannot run this debug level for longer than one hour since the logfile > size increase is too high for the local storage on the mds servers where logs > are stored (we don't have central logging yet). > > Okay. > > But if you are just interested in the time frame around the crash, I could > set the debug level to 20, trigger the crash on the weekend and send you the > logs. > > The crash is unlikely to point to what causes the corruption. I was > hoping we could locate an instance of damage while the MDS is running. > > Regards Felix > > > - > - > Forschungszentrum Juelich GmbH > 52425 Juelich > Registered office: Juelich > Registered in the Commercial Register of the Local Court of Dueren, No. HR B 3498 > Chairman of the Supervisory Board: MinDir Volker Rieke > Management: Prof. Dr.-Ing. Wolfgang Marquardt (Chairman),
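[Editor's note] For anyone wanting to capture a similar window, one way to raise MDS verbosity only around snapshot creation is via the config subsystem (a sketch with commonly suggested debug levels, not taken from this thread; the mount path and snapshot name are examples):

    # raise MDS log verbosity shortly before the snapshot is taken
    ceph config set mds debug_mds 20
    ceph config set mds debug_ms 1
    # trigger the snapshot from the client (mkdir in the .snap directory)
    mkdir /mnt/cephfs/mydir/.snap/debug-snap
    # drop verbosity again once the window of interest has passed
    ceph config set mds debug_mds 1
    ceph config set mds debug_ms 0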
[ceph-users] Serious cluster issue - Incomplete PGs
Hello. I really screwed up my ceph cluster. Hoping to get data off it so I can rebuild it. In summary, too many changes too quickly caused the cluster to develop incomplete PGs. Some PGs were reporting that OSDs were to be probed. I've created those OSD IDs (empty), however this wouldn't clear the incompletes. The incompletes are part of EC pools. Running 17.2.5. This is the overall state:

  cluster:
    id: 49057622-69fc-11ed-b46e-d5acdedaae33
    health: HEALTH_WARN
            Failed to apply 1 service(s): osd.dashboard-admin-1669078094056
            1 hosts fail cephadm check
            cephadm background work is paused
            Reduced data availability: 28 pgs inactive, 28 pgs incomplete
            Degraded data redundancy: 55 pgs undersized
            2 slow ops, oldest one blocked for 4449 sec, daemons [osd.25,osd.50,osd.51] have slow ops.

These are the incomplete PGs that HAVE DATA (objects > 0) [via ceph pg ls incomplete]:

PG     OBJECTS  DEGRADED  MISPLACED  UNFOUND  BYTES        OMAP_BYTES*  OMAP_KEYS*  LOG   STATE                SINCE  VERSION     REPORTED       UP                      ACTING                  SCRUB_STAMP                      DEEP_SCRUB_STAMP                 LAST_SCRUB_DURATION  SCRUB_SCHEDULING
2.35   23199    0         0          0        95980273664  0            0           2477  incomplete           10s    2104'46277  28260:686871   [44,4,37,3,40,32]p44    [44,4,37,3,40,32]p44    2023-01-03T03:54:47.821280+0000  2022-12-29T18:53:09.287203+0000  14                   queued for deep scrub
2.53   22821    0         0          0        94401175552  0            0           2745  remapped+incomplete  10s    2104'45845  28260:565267   [60,48,52,65,67,7]p60   [60]p60                 2023-01-03T10:18:13.388383+0000  2023-01-03T10:18:13.388383+0000  408                  queued for scrub
2.9f   22858    0         0          0        94555983872  0            0           2736  remapped+incomplete  10s    2104'45636  28260:759872   [56,59,3,57,5,32]p56    [56]p56                 2023-01-03T10:55:49.848693+0000  2023-01-03T10:55:49.848693+0000  376                  queued for scrub
2.be   22870    0         0          0        94429110272  0            0           2661  remapped+incomplete  10s    2104'45561  28260:813759   [41,31,37,9,7,69]p41    [41]p41                 2023-01-03T14:02:15.790077+0000  2023-01-03T14:02:15.790077+0000  360                  queued for scrub
2.e4   22953    0         0          0        94912278528  0            0           2648  remapped+incomplete  20m    2104'46048  28259:732896   [37,46,33,4,48,49]p37   [37]p37                 2023-01-02T18:38:46.268723+0000  2022-12-29T18:05:47.431468+0000  18                   queued for deep scrub
17.78  20169    0         0          0        84517834400  0            0           2198  remapped+incomplete  10s    3735'53405  28260:1243673  [4,37,2,36,66,0]p4      [41]p41                 2023-01-03T14:21:41.563424+0000  2023-01-03T14:21:41.563424+0000  348                  queued for scrub
17.d8  20328    0         0          0        85196053130  0            0           1852  remapped+incomplete  10s    3735'54458  28260:1309564  [38,65,61,37,58,39]p38  [53]p53                 2023-01-02T18:32:35.371071+0000  2022-12-28T19:08:29.492244+0000  21                   queued for deep scrub

At present I'm unable to reliably access my data due to the incomplete PGs above. I'll post whatever outputs are requested (not posting now as it can be rather verbose). Is there hope? ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
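[Editor's note] For anyone searching the archive later: the usual first step with incomplete PGs is to ask each one why peering is blocked (a sketch; the PG id is taken from the listing above):

    # dump the PG's peering state; look at the recovery_state section,
    # in particular down_osds_we_would_probe and any "blocked by" entries
    ceph pg 2.35 query
    # re-list the incomplete PGs to watch for changes
    ceph pg ls incomplete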
[ceph-users] ceph-users list archive missing almost all mail
Hi, some mailing list archive links saved in my notes now throw "Page not found" errors, e.g. https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/J4U24YRJEJWSSMZVEVKQYQFTFNUGIG3N/ Looking around in the archive web interface, it appears that only some of the most recent threads can be found; everything else says "no email threads could be found for this month". Could somebody please look into this? Regards Matthias Ferdinand ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: 16.2.11 pacific QE validation status
Happy New Year all! This release remains in "in progress"/"on hold" status while we sort out the infrastructure-related issues. Unless I hear objections, I suggest doing a full rebase/retest QE cycle (adding PRs merged lately) once sepia is back online, since this is taking much longer than anticipated. Objections? Thx YuriW On Thu, Dec 15, 2022 at 9:14 AM Yuri Weinstein wrote: > > Details of this release are summarized here: > > https://tracker.ceph.com/issues/58257#note-1 > Release Notes - TBD > > Seeking approvals for: > > rados - Neha (https://github.com/ceph/ceph/pull/49431 is still being > tested and will be merged soon) > rook - Sébastien Han > cephadm - Adam > dashboard - Ernesto > rgw - Casey (rgw will be rerun on the latest SHA1) > rbd - Ilya, Deepika > krbd - Ilya, Deepika > fs - Venky, Patrick > upgrade/nautilus-x (pacific) - Neha, Laura > upgrade/octopus-x (pacific) - Neha, Laura > upgrade/pacific-p2p - Neha, Laura > powercycle - Brad > ceph-volume - Guillaume, Adam K > > Thx > YuriW ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Setting Prometheus retention_time
Hi, I noticed the same and created a tracker issue: https://tracker.ceph.com/issues/58262 Quoting Robert Sander: Hi, The Quincy documentation shows that we could set the Prometheus retention_time within a service specification: https://docs.ceph.com/en/quincy/cephadm/services/monitoring/#setting-up-prometheus When trying this, "ceph orch apply" only shows: Error EINVAL: ServiceSpec: __init__() got an unexpected keyword argument 'retention_time' It looks like release 17.2.5 does not contain this code yet. Why is the content of the documentation already online when https://github.com/ceph/ceph/pull/47943 has not been released yet? Regards -- Robert Sander Heinlein Support GmbH Linux: Akademie - Support - Hosting http://www.heinlein-support.de Tel: 030-405051-43 Fax: 030-405051-19 Mandatory disclosures per §35a GmbHG: HRB 93818 B / Local Court Berlin-Charlottenburg, Managing Director: Peer Heinlein -- Registered office: Berlin ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Setting Prometheus retention_time
Hi, The Quincy documentation shows that we could set the Prometheus retention_time within a service specification: https://docs.ceph.com/en/quincy/cephadm/services/monitoring/#setting-up-prometheus When trying this, "ceph orch apply" only shows: Error EINVAL: ServiceSpec: __init__() got an unexpected keyword argument 'retention_time' It looks like release 17.2.5 does not contain this code yet. Why is the content of the documentation already online when https://github.com/ceph/ceph/pull/47943 has not been released yet? Regards -- Robert Sander Heinlein Support GmbH Linux: Akademie - Support - Hosting http://www.heinlein-support.de Tel: 030-405051-43 Fax: 030-405051-19 Mandatory disclosures per §35a GmbHG: HRB 93818 B / Local Court Berlin-Charlottenburg, Managing Director: Peer Heinlein -- Registered office: Berlin ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
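[Editor's note] For reference, the spec format described in the linked documentation applies the retention settings through "ceph orch apply" with a YAML service specification (a sketch based on the Quincy docs/PR above; the values are examples):

    cat > prometheus.yaml <<EOF
    service_type: prometheus
    placement:
      count: 1
    spec:
      retention_time: "1y"
      retention_size: "1GB"
    EOF
    ceph orch apply -i prometheus.yaml

As the thread notes, this only works once the change from https://github.com/ceph/ceph/pull/47943 is actually present in the installed release.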
[ceph-users] Erasing Disk to the initial state
Hello team, I have deployed a ceph cluster in production. The cluster is composed of two types of disks, HDD and SSD, and was deployed using ceph-ansible. Unfortunately, after deployment only the HDD disks appear, without the SSDs. I would like to restart the deployment from scratch, but I do not know how to erase the disks back to their initial state. I tried to format the disks, but the LVM volumes come back:

sda                                                        8:0    0  7.3T  0 disk
└─ceph--da4a5d58--73ef--473b--9960--371f837cb5ed-osd--block--6e800937--c4d2--4fc9--84ca--083c39d057a8
                                                         253:1    0  7.3T  0 lvm
sdb                                                        8:16   0  7.3T  0 disk
└─ceph--773f50a1--79ed--4908--8f81--74f85efeb473-osd--block--9737a046--ba8b--4494--91f7--b80dd894df0b
                                                         253:7    0  7.3T  0 lvm
sdc                                                        8:32   0  7.3T  0 disk
└─ceph--02000cec--fdbc--4def--967e--a7c32c851964-osd--block--c54d8182--b5e7--4c73--8d7b--7d24c7a3ce15
                                                         253:6    0  7.3T  0 lvm

Kindly help me to sort this out. Best regards Michel ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
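[Editor's note] The usual way to return such disks to a blank state is to remove the leftover ceph LVM volumes rather than formatting over them (a sketch; the device and VG names are taken from the lsblk output above, where lsblk doubles the hyphens inside names, and these commands are destructive):

    # let ceph-volume tear down its own LV/VG/PV and wipe the device
    ceph-volume lvm zap --destroy /dev/sda
    # or, done by hand: remove the volume group, then clear remaining signatures
    vgremove -f ceph-da4a5d58-73ef-473b-9960-371f837cb5ed
    wipefs -a /dev/sda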
[ceph-users] Re: MDS crashes to damaged metadata
On Thu, Dec 15, 2022 at 9:32 AM Stolte, Felix wrote: > > Hi Patrick, > > we used your script to repair the damaged objects on the weekend and it went > smoothly. Thanks for your support. > > We adjusted your script to scan for damaged files on a daily basis, runtime > is about 6h. Until Thursday last week, we had exactly the same 17 files. On > Thursday at 13:05 a snapshot was created and our active mds crashed once at > exactly that time: > > 2022-12-08T13:05:48.919+0100 7f440afec700 -1 > /build/ceph-16.2.10/src/mds/ScatterLock.h: In function 'void > ScatterLock::set_xlock_snap_sync(MDSContext*)' thread 7f440afec700 time > 2022-12-08T13:05:48.921223+0100 > /build/ceph-16.2.10/src/mds/ScatterLock.h: 59: FAILED ceph_assert(state == LOCK_XLOCK || state == LOCK_XLOCKDONE) > > 12 minutes later the unlink_local error crashes appeared again. This time > with a new file. During debugging we noticed an MTU mismatch between MDS > (1500) and client (9000) with a cephfs kernel mount. The client is also > creating the snapshots via mkdir in the .snap directory. > > We disabled snapshot creation for now, but really need this feature. I > uploaded the mds logs of the first crash along with the information above to > https://tracker.ceph.com/issues/38452 > > I would greatly appreciate it if you could answer the following question: > > Is the bug related to our MTU mismatch? We fixed the MTU issue by going back to > 1500 on all nodes in the ceph public network on the weekend as well. I doubt it. > If you need a debug level 20 log of the ScatterLock for further analysis, I > could schedule snapshots at the end of our workdays and increase the debug > level 5 minutes around snapshot creation. This would be very helpful! -- Patrick Donnelly, Ph.D. He / Him / His Principal Software Engineer Red Hat, Inc. GPG: 19F28A586F808C2402351B93C3301A3E258DD79D ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] test message
Hi, just testing as I have not received a message from the list in a couple of days. Thanks Joe ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: Missing SSDs disk on ceph deployment
Dear team, Kindly help on this, I am completely blocked. Best Regards Michel On Thu, Jan 5, 2023 at 2:45 PM Michel Niyoyita wrote:
> Dear team,
>
> I have deployed the ceph cluster in production using ceph-ansible on
> ubuntu OS 20.04. It consists of 3 monitors and 3 OSD hosts (each host has
> 20 disks, 16 hdd and 4 ssd). After deployment the ceph cluster health is
> OK, but instead of the expected 60 disks only 48 appear, which are hdd
> only (according to the output of ceph osd df tree). Below are the outputs
> of ceph osd df tree and the lsblk command.
>
> hdd: 7.71349 T
> ssd: 7 T
>
> root@ceph-mon3:~# ceph osd df tree
> ID  CLASS  WEIGHT     REWEIGHT  SIZE     RAW USE  DATA     OMAP  META     AVAIL    %USE  VAR   PGS  STATUS  TYPE NAME
> -1         370.24731         -  370 TiB   21 TiB  249 MiB   0 B  1.2 GiB  349 TiB  5.66  1.00    -          root default
> -5         123.41577         -  123 TiB  7.0 TiB   83 MiB   0 B  403 MiB  116 TiB  5.66  1.00    -          host ceph-osd1
>  0    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   30 MiB  7.3 TiB  5.66  1.00    6      up  osd.0
>  3    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   25 MiB  7.3 TiB  5.66  1.00    8      up  osd.3
>  6    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   35 MiB  7.3 TiB  5.66  1.00   12      up  osd.6
>  9    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   21 MiB  7.3 TiB  5.66  1.00    4      up  osd.9
> 12    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   25 MiB  7.3 TiB  5.66  1.00    6      up  osd.12
> 16    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   21 MiB  7.3 TiB  5.66  1.00    3      up  osd.16
> 18    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   25 MiB  7.3 TiB  5.66  1.00    5      up  osd.18
> 21    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   21 MiB  7.3 TiB  5.66  1.00    6      up  osd.21
> 24    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   35 MiB  7.3 TiB  5.66  1.00    7      up  osd.24
> 27    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   20 MiB  7.3 TiB  5.66  1.00    7      up  osd.27
> 30    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   20 MiB  7.3 TiB  5.66  1.00    7      up  osd.30
> 33    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   25 MiB  7.3 TiB  5.66  1.00    7      up  osd.33
> 36    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   25 MiB  7.3 TiB  5.66  1.00    5      up  osd.36
> 39    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   25 MiB  7.3 TiB  5.66  1.00    9      up  osd.39
> 42    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   30 MiB  7.3 TiB  5.66  1.00    6      up  osd.42
> 45    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   20 MiB  7.3 TiB  5.66  1.00    7      up  osd.45
> -3         123.41577         -  123 TiB  7.0 TiB   83 MiB   0 B  397 MiB  116 TiB  5.66  1.00    -          host ceph-osd2
>  1    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   26 MiB  7.3 TiB  5.66  1.00    7      up  osd.1
>  5    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   26 MiB  7.3 TiB  5.66  1.00    7      up  osd.5
>  8    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   26 MiB  7.3 TiB  5.66  1.00    6      up  osd.8
> 11    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   21 MiB  7.3 TiB  5.66  1.00    5      up  osd.11
> 14    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   25 MiB  7.3 TiB  5.66  1.00    7      up  osd.14
> 15    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   26 MiB  7.3 TiB  5.66  1.00   12      up  osd.15
> 19    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   20 MiB  7.3 TiB  5.66  1.00    5      up  osd.19
> 22    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   26 MiB  7.3 TiB  5.66  1.00    4      up  osd.22
> 25    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   20 MiB  7.3 TiB  5.66  1.00    2      up  osd.25
> 28    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   21 MiB  7.3 TiB  5.66  1.00    3      up  osd.28
> 31    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   20 MiB  7.3 TiB  5.66  1.00    6      up  osd.31
> 34    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   30 MiB  7.3 TiB  5.66  1.00    7      up  osd.34
> 37    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   30 MiB  7.3 TiB  5.66  1.00    8      up  osd.37
> 40    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   30 MiB  7.3 TiB  5.66  1.00    8      up  osd.40
> 43    hdd    7.71349   1.00000  7.7 TiB  447 GiB  5.2 MiB   0 B   30 MiB
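[Editor's note] A couple of checks that usually narrow this kind of problem down (a sketch; the first command runs on an OSD host, the others against the cluster):

    # on an OSD host: does ceph-volume see the SSDs at all, and are they "available"?
    ceph-volume inventory
    # which device classes exist in the CRUSH map?
    ceph osd crush class ls
    # count OSDs per class in the tree
    ceph osd df tree | grep -c ssd

If the SSDs show up as unavailable in the inventory, they may carry leftover partitions or LVM metadata from a previous attempt; if they do not show up at all, the ceph-ansible devices list for the OSD hosts is the first thing to re-check.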
[ceph-users] Re: docs.ceph.com -- Do you use the header navigation bar? (RESPONSES REQUESTED)
On Wednesday, January 4, 2023 10:35:56 AM EST John Zachary Dover wrote: > Do you use the header navigation bar on docs.ceph.com? See the attached > file (sticky_header.png) if you are unsure of what "header navigation bar" > means. In the attached file, the header navigation bar is indicated by > means of two large, ugly, red-and-green arrows. > > *Cards on the Table* > The navigation bar is the kind of thing that is sometimes referred to as a > "sticky header", and it can get in the way of linked-to sections. I would > like to remove this header bar. If there is community support for the > header bar, though, I won't remove it. > > *What is Zac Complaining About?* > Follow this procedure to see the behavior that has provoked my complaint: > >1. Go to https://docs.ceph.com/en/quincy/glossary/ >2. Scroll down to the "Ceph Cluster Map" entry. >3. Click the "Cluster Map" link in the line that reads "See Cluster Map". > 4. Notice that the header navigation bar obscures the headword "Cluster > Map". > > If you have any opinion at all on this matter, voice it. Please. > FWIW I am not able to reproduce the problem you are describing. In all cases the thin blue-green bar appeared above the term with the selected anchor link. I tried Firefox (108, Linux), Chromium (107, Linux) and for giggles Firefox on Android. In all cases things looked fine to me and the selected term was not hidden by that nav bar. I share because I was surprised by the result given that others on the list seem to see the problem. But I also don't see what I would describe as "two large, ugly, red-and-green arrows." Perhaps the page is rendering differently for some people and we don't hit the issue in that case? PS. I also didn't see the png file in question. Perhaps this list strips attachments? ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] VolumeGroup must have a non-empty name / 17.2.5
Hi, I updated from pacific 16.2.10 to 17.2.5 and the orchestration update went perfectly. Very impressive. I have one host which then started throwing a cephadm warning after the upgrade.

2023-01-07 11:17:50,080 7f0b26c8ab80 INFO Non-zero exit code 1 from /usr/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk --init -e CONTAINER_IMAGE=quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 -e NODE_NAME=kelli.domain.name -e CEPH_USE_RANDOM_NONCE=1 -e CEPH_VOLUME_SKIP_RESTORECON=yes -e CEPH_VOLUME_DEBUG=1 -v /var/run/ceph/404b94ab-b4d6-4218-9a4e-ecb8899108ca:/var/run/ceph:z -v /var/log/ceph/404b94ab-b4d6-4218-9a4e-ecb8899108ca:/var/log/ceph:z -v /var/lib/ceph/404b94ab-b4d6-4218-9a4e-ecb8899108ca/crash:/var/lib/ceph/crash:z -v /run/systemd/journal:/run/systemd/journal -v /dev:/dev -v /run/udev:/run/udev -v /sys:/sys -v /run/lvm:/run/lvm -v /run/lock/lvm:/run/lock/lvm -v /var/lib/ceph/404b94ab-b4d6-4218-9a4e-ecb8899108ca/selinux:/sys/fs/selinux:ro -v /:/rootfs -v /tmp/ceph-tmpltrnmxf8:/etc/ceph/ceph.conf:z quay.io/ceph/ceph@sha256:0560b16bec6e84345f29fb6693cd2430884e6efff16a95d5bdd0bb06d7661c45 inventory --format=json-pretty --filter-for-batch
2023-01-07 11:17:50,081 7f0b26c8ab80 INFO /usr/bin/podman: stderr Traceback (most recent call last):
2023-01-07 11:17:50,081 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File "/usr/sbin/ceph-volume", line 11, in <module>
2023-01-07 11:17:50,081 7f0b26c8ab80 INFO /usr/bin/podman: stderr     load_entry_point('ceph-volume==1.0.0', 'console_scripts', 'ceph-volume')()
2023-01-07 11:17:50,081 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 41, in __init__
2023-01-07 11:17:50,081 7f0b26c8ab80 INFO /usr/bin/podman: stderr     self.main(self.argv)
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/decorators.py", line 59, in newfunc
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr     return f(*a, **kw)
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/main.py", line 153, in main
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr     terminal.dispatch(self.mapper, subcommand_args)
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/terminal.py", line 194, in dispatch
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr     instance.main()
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/inventory/main.py", line 53, in main
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr     with_lsm=self.args.with_lsm))
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/util/device.py", line 39, in __init__
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr     all_devices_vgs = lvm.get_all_devices_vgs()
2023-01-07 11:17:50,082 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/api/lvm.py", line 797, in get_all_devices_vgs
2023-01-07 11:17:50,083 7f0b26c8ab80 INFO /usr/bin/podman: stderr     return [VolumeGroup(**vg) for vg in vgs]
2023-01-07 11:17:50,083 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/api/lvm.py", line 797, in <listcomp>
2023-01-07 11:17:50,083 7f0b26c8ab80 INFO /usr/bin/podman: stderr     return [VolumeGroup(**vg) for vg in vgs]
2023-01-07 11:17:50,083 7f0b26c8ab80 INFO /usr/bin/podman: stderr   File "/usr/lib/python3.6/site-packages/ceph_volume/api/lvm.py", line 517, in __init__
2023-01-07 11:17:50,083 7f0b26c8ab80 INFO /usr/bin/podman: stderr     raise ValueError('VolumeGroup must have a non-empty name')
2023-01-07 11:17:50,083 7f0b26c8ab80 INFO /usr/bin/podman: stderr ValueError: VolumeGroup must have a non-empty name

This host is the only one which has 14 drives that aren't being used. I'm guessing this is why it's getting this error. The drives may have been used previously in a cluster (maybe not the same cluster) or something, I don't know. Any suggestions for what to try to get past this issue? peter

Peter Eisch
DevOps Manager
peter.ei...@virginpulse.com
T 1.612.445.5135

Confidentiality Notice: This email was sent securely using Transport Layer Security (TLS) Encryption. Please ensure your email systems support TLS before replying with any confidential information. The information contained in this e-mail, including any attachment(s), is intended solely for use by the designated recipient(s). Unauthorized use, dissemination, distribution, or reproduction of this message by anyone other than the intended recipient(s), or a person designated as responsible
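[Editor's note] The ValueError comes from ceph-volume enumerating every volume group on the host, so stray or nameless LVM metadata left on those 14 unused drives is the usual suspect (a sketch; /dev/sdX is a placeholder and zap is destructive):

    # look for physical volumes whose VG column is empty
    pvs -o pv_name,vg_name
    vgs -o vg_name,pv_count
    # if an orphaned/unnamed VG shows up on one of the unused drives, wipe that drive
    ceph-volume lvm zap --destroy /dev/sdX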