[ceph-users] Re: Frequent LARGE_OMAP_OBJECTS in cephfs metadata pool
Thanks Patrick, is this the bug you are referring to: https://tracker.ceph.com/issues/42515 ? We also see performance issues, mainly on metadata operations such as file stat lookups, however mds perf dump shows no sign of any latencies. Could this bug cause performance issues like that? Here are the perf dump metrics: https://pastebin.com/178anAe1 . Do you see any clue in there that could cause a slowdown in such operations? Our metadata pool has around 1.7 GB of data and I gave the MDS a 3 GB cache, but I am not sure where to check how much of those 3 GB is used, or what the cache hit/miss counts and ratio are. We have a huge cluster and there is definitely not enough IO to saturate the actual disk capacity, so it must be the MDS; I am just not sure what to check to pinpoint the issue. Could you point me to where I can start digging into this?

Thanks,
Uday.
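For reference, the MDS admin socket can report cache usage and slow requests directly; a rough sketch of what to look at (assuming the active MDS daemon is cephfs01-b and the commands are run on the host carrying it):

# ceph daemon mds.cephfs01-b cache status
# ceph daemon mds.cephfs01-b perf dump mds_mem
# ceph daemon mds.cephfs01-b dump_historic_ops
# ceph daemon mds.cephfs01-b ops

"cache status" compares bytes in cache against mds_cache_memory_limit, "perf dump mds_mem" shows inode/dentry counts and memory counters, and "dump_historic_ops" / "ops" list the slowest recent client requests and the ones currently in flight, which is usually the quickest way to see whether the MDS itself is the bottleneck.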
[ceph-users] Re: Frequent LARGE_OMAP_OBJECTS in cephfs metadata pool
It's probably a recently fixed openfiletable bug. Please upgrade to v14.2.8 when it is released in the next week or so.

On Mon, Feb 24, 2020 at 1:46 PM Uday Bhaskar jalagam wrote:
>
> Hello Patrick,
>
> File system created around 4 months back. Using ceph version 14.2.3.
>
> [root@knode25 /]# ceph fs dump
> dumped fsmap epoch 577
> e577
> enable_multiple, ever_enabled_multiple: 0,0
> compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
> legacy client fscid: 1
>
> Filesystem 'cephfs01' (1)
> fs_name cephfs01
> epoch   577
> flags   32
> created 2019-10-18 23:59:29.610249
> modified        2020-02-22 03:13:09.425905
> tableserver     0
> root    0
> session_timeout 60
> session_autoclose       300
> max_file_size   1099511627776
> min_compat_client       -1 (unspecified)
> last_failure    0
> last_failure_osd_epoch  1608
> compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
> max_mds 1
> in      0
> up      {0=2981519}
> failed
> damaged
> stopped
> data_pools      [2]
> metadata_pool   1
> inline_data     disabled
> balancer
> standby_count_wanted    1
> 2981519: [v2:10.131.16.30:6808/3209191719,v1:10.131.16.30:6809/3209191719] 'cephfs01-b' mds.0.572 up:active seq 22141
> 2998684: [v2:10.131.16.89:6832/54557615,v1:10.131.16.89:6833/54557615] 'cephfs01-a' mds.0.0 up:standby-replay seq 2
>
>
> [root@knode25 /]# ceph fs status
> cephfs01 - 290 clients
>
> +------+----------------+------------+---------------+-------+-------+
> | Rank |     State      |    MDS     |    Activity   |  dns  |  inos |
> +------+----------------+------------+---------------+-------+-------+
> |  0   |     active     | cephfs01-b | Reqs:  333 /s | 2738k | 2735k |
> | 0-s  | standby-replay | cephfs01-a | Evts:  795 /s | 1368k | 1363k |
> +------+----------------+------------+---------------+-------+-------+
> +-------------------+----------+-------+-------+
> |        Pool       |   type   |  used | avail |
> +-------------------+----------+-------+-------+
> | cephfs01-metadata | metadata | 2193M | 78.1T |
> |   cephfs01-data0  |   data   |  753G | 78.1T |
> +-------------------+----------+-------+-------+

--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
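If you want to confirm before upgrading that the large object really is the open file table, a rough check (assuming rank 0 and the default openfiles object naming, which I have not verified against your cluster):

# rados -p cephfs01-metadata ls | grep openfiles
# rados -p cephfs01-metadata listomapkeys mds0_openfiles.0 | wc -l

The warning fires once a single object crosses osd_deep_scrub_large_omap_object_key_threshold keys during deep scrub, so a key count far above that threshold on an mds*_openfiles.* object would point at the same bug.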
[ceph-users] Re: Frequent LARGE_OMAP_OBJECTS in cephfs metadata pool
Hello Patrick,

File system created around 4 months back. Using ceph version 14.2.3.

[root@knode25 /]# ceph fs dump
dumped fsmap epoch 577
e577
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
legacy client fscid: 1

Filesystem 'cephfs01' (1)
fs_name cephfs01
epoch   577
flags   32
created 2019-10-18 23:59:29.610249
modified        2020-02-22 03:13:09.425905
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
min_compat_client       -1 (unspecified)
last_failure    0
last_failure_osd_epoch  1608
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
max_mds 1
in      0
up      {0=2981519}
failed
damaged
stopped
data_pools      [2]
metadata_pool   1
inline_data     disabled
balancer
standby_count_wanted    1
2981519: [v2:10.131.16.30:6808/3209191719,v1:10.131.16.30:6809/3209191719] 'cephfs01-b' mds.0.572 up:active seq 22141
2998684: [v2:10.131.16.89:6832/54557615,v1:10.131.16.89:6833/54557615] 'cephfs01-a' mds.0.0 up:standby-replay seq 2


[root@knode25 /]# ceph fs status
cephfs01 - 290 clients

+------+----------------+------------+---------------+-------+-------+
| Rank |     State      |    MDS     |    Activity   |  dns  |  inos |
+------+----------------+------------+---------------+-------+-------+
|  0   |     active     | cephfs01-b | Reqs:  333 /s | 2738k | 2735k |
| 0-s  | standby-replay | cephfs01-a | Evts:  795 /s | 1368k | 1363k |
+------+----------------+------------+---------------+-------+-------+
+-------------------+----------+-------+-------+
|        Pool       |   type   |  used | avail |
+-------------------+----------+-------+-------+
| cephfs01-metadata | metadata | 2193M | 78.1T |
|   cephfs01-data0  |   data   |  753G | 78.1T |
+-------------------+----------+-------+-------+
[ceph-users] Changing allocation size
Hi all,

A while back, I indicated we had an issue with our cluster filling up too fast. After checking everything, we concluded this was because we have a lot of small files and the allocation size on BlueStore was too high (64 KB). We are now recreating the OSDs (two disks at a time), but this will take a very long time as we are dealing with 130 OSDs. The current process we are following is removing two OSDs and recreating them. We are using erasure coding (6+3). Has anyone some advice on how we can move forward with this? We have already increased some parameters to speed up recovery, but even then it would still cost us too much time. If we could recreate them faster, that would be great... or adapt the allocation size on the fly? Any suggestions are welcome...

Thank you,
Kristof.
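For reference, a rough sketch of the redeploy loop per batch (hedged: the 4 KiB value, device names and OSD ids are only examples, the new allocation size only applies to OSDs created after it is set, and on releases older than Nautilus the option goes into ceph.conf as "bluestore min alloc size hdd = 4096" instead of ceph config):

# ceph config set osd bluestore_min_alloc_size_hdd 4096
# ceph osd out 12 13
... wait until the data has drained, e.g. until this stops complaining:
# ceph osd safe-to-destroy 12 13
# ceph osd purge 12 --yes-i-really-mean-it
# ceph osd purge 13 --yes-i-really-mean-it
# ceph-volume lvm zap --destroy /dev/sdc
# ceph-volume lvm create --data /dev/sdc --block.db /dev/nvme0n1p3

As far as I know there is no way to change the allocation size of an existing BlueStore OSD in place, so redeploying (and, within reason, raising osd_max_backfills / osd_recovery_max_active to speed up the data movement) is about the only option.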
[ceph-users] Re: Frequent LARGE_OMAP_OBJECTS in cephfs metadata pool
On Mon, Feb 24, 2020 at 11:14 AM Uday Bhaskar jalagam wrote:
>
> Hello Team,
>
> I am getting frequent LARGE_OMAP_OBJECTS (1 large omap objects) in one of my cephfs metadata pools. Can anyone explain why this pool keeps getting into this state and how I could prevent it in future?
>
> # ceph health detail
> HEALTH_WARN 1 large omap objects
> LARGE_OMAP_OBJECTS 1 large omap objects
>     1 large objects found in pool 'cephfs01-metadata'
>     Search the cluster log for 'Large omap object found' for more details.

When was the file system created? What version is running? Please also share `ceph fs dump`.

--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
[ceph-users] Frequent LARGE_OMAP_OBJECTS in cephfs metadata pool
Hello Team,

I am getting frequent LARGE_OMAP_OBJECTS (1 large omap objects) in one of my cephfs metadata pools. Can anyone explain why this pool keeps getting into this state and how I could prevent it in future?

# ceph health detail
HEALTH_WARN 1 large omap objects
LARGE_OMAP_OBJECTS 1 large omap objects
    1 large objects found in pool 'cephfs01-metadata'
    Search the cluster log for 'Large omap object found' for more details.

Thanks,
Uday
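For reference, the offending object is named in the cluster log entry that the health message points at; a rough way to dig it out (assuming a mon host with the default log location and that the cluster log is written to file):

# ceph log last 1000 | grep -i 'large omap object'
# grep -i 'Large omap object found' /var/log/ceph/ceph.log

The matching line includes the object name and its key count; once the name is known, `rados -p cephfs01-metadata stat <object>` and `rados -p cephfs01-metadata listomapkeys <object> | wc -l` (object name left as a placeholder here) show how big it currently is.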
[ceph-users] Limited performance
Hi,

we are currently creating a new cluster. This cluster is (as far as we can tell) a config copy (ansible) of our existing cluster, just 5 years later - with new hardware (NVMe instead of SSD, bigger disks, ...).

The setup:
* NVMe for journals and "cache" pool
* HDD with NVMe journals for "data" pool
* cache pool as writeback tier on the data pool
* We are using 12.2.13 without BlueStore.

If we run a rados benchmark against this pool, everything seems fine, but as soon as we start a fio benchmark

-<-
[global]
ioengine=rbd
clientname=cinder
pool=cinder
rbdname=fio_test
rw=write
bs=4M

[rbd_iodepth32]
iodepth=32
->-

after some seconds the bandwidth drops to <15 MB/s and our HDDs are doing more IOs than our journal disks. We also unconfigured the caching completely, but the issue remains. The output of "ceph osd pool stats" shows ~100 op/s, but our disks are doing:

-<-
Device:  rrqm/s  wrqm/s  r/s   w/s     rMB/s  wMB/s  avgrq-sz  avgqu-sz  await    r_await  w_await  svctm  %util
nvme0n1  0.00    0.00    0.00  278.50  0.00   34.07  250.51    0.14      0.50     0.00     0.50     0.03   0.80
nvme1n1  0.00    0.00    0.00  64.00   0.00   7.77   248.50    0.01      0.22     0.00     0.22     0.03   0.20
sda      0.00    1.50    0.00  557.00  0.00   29.49  108.45    180.57    160.59   0.00     160.59   1.80   100.00
sdb      0.00    42.00   0.00  592.00  0.00   28.21  97.60     176.51    1105.79  0.00     1105.79  1.69   100.00
sdc      0.00    14.50   0.00  528.50  0.00   27.95  108.31    183.02    179.47   0.00     179.47   1.89   100.00
sde      0.00    134.50  0.00  223.50  0.00   14.05  128.72    17.38     60.05    0.00     60.05    0.89   20.00
sdg      0.00    76.00   0.00  492.00  0.00   26.32  109.54    191.81    1474.96  0.00     1474.96  2.03   100.00
sdf      0.00    0.00    0.00  491.50  0.00   26.76  111.49    176.55    326.05   0.00     326.05   2.03   100.00
sdh      0.00    0.00    0.00  548.50  0.00   26.71  99.75     204.39    327.57   0.00     327.57   1.82   100.00
sdi      0.00    112.00  0.00  526.00  0.00   23.15  90.14     158.32    1325.61  0.00     1325.61  1.90   100.00
sdj      0.00    12.00   0.00  641.00  0.00   34.78  111.13    185.51    278.29   0.00     278.29   1.56   100.00
sdk      0.00    23.50   0.00  399.50  0.00   20.38  104.46    166.77    461.67   0.00     461.67   2.50   100.00
sdl      0.00    267.00  0.00  498.50  0.00   34.46  141.58    200.37    490.80   0.00     490.80   2.01   100.00
->-

Any hints how to debug the issue?

Thanks a lot,
Fabian
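A few things worth checking while the fio run is active (just a sketch; the pool name and OSD id below are examples, and the last check only applies while the cache tier is still configured):

# ceph osd pool stats cinder
# ceph osd perf
# ceph daemon osd.12 dump_historic_ops
# ceph osd pool get <cache-pool> all

"ceph osd pool stats" shows whether client writes are being accounted to the base pool or the tier, "ceph osd perf" shows per-OSD commit/apply latency (high values on the HDD OSDs during the drop would match the iostat picture above), "dump_historic_ops" on one of the saturated OSDs shows where individual ops spend their time, and the pool get output shows the hit_set / target_max settings that control when the writeback tier starts flushing and evicting to the HDDs.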
[ceph-users] Migrating data to a more efficient EC pool
Hello,

I have ~300TB of data in a default.rgw.buckets.data k2m2 pool and I would like to move it to a new k5m2 pool. I found instructions using cache tiering [1], but they come with a vague, scary warning, and it looks like EC-to-EC tiering may not even be possible [2] (is that still the case?).

Can anybody recommend a safe procedure to copy an EC pool's data to another pool with a more efficient erasure coding? Perhaps there is a tool out there that could do it? A few days of downtime would be tolerable if it simplifies things. Also, I have enough free space to temporarily store the k2m2 data in a replicated pool (if EC-to-EC tiering is not possible, but EC-to-replicated and replicated-to-EC tiering is).

Is there a tool or some efficient way to verify that the content of two pools is the same?

Thanks,
Vlad

[1] https://ceph.io/geen-categorie/ceph-pool-migration/
[2] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-February/016109.html
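On the verification question, a crude but workable sketch (it only compares object names, not payload checksums, it assumes both pools are quiescent while the listings are taken, and the new pool name is just an example):

# rados -p default.rgw.buckets.data ls | sort > old-pool.txt
# rados -p default.rgw.buckets.data.new ls | sort > new-pool.txt
# diff old-pool.txt new-pool.txt

If payload verification is needed, the same idea can be extended to fetch each object from both pools with rados get and compare checksums, but at ~300TB that effectively means reading everything twice, so checking a sampled subset may be more practical.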
[ceph-users] Ceph @ SoCal Linux Expo
Hey all, we're excited to be returning properly to SCaLE in Pasadena[1] this year (March 5-8) with a Thursday Birds-of-a-Feather session[2] and a booth in the expo hall. Please come by if you're attending the conference or are in the area to get face time with other area users and Ceph developers. :) Also, I got drafted into organizing this so if you'd be willing to help man the booth in exchange for an Expo pass, shoot me an email! I think I've got 3 spots left. -Greg [1]: https://www.socallinuxexpo.org/scale/18x [2]: https://www.socallinuxexpo.org/scale/18x/presentations/ceph-storage
[ceph-users] Re: ceph-mon using 100% CPU after upgrade to 14.2.5
Hi Bryan,

Did you ever learn more about this, or see it again? I'm facing 100% ceph-mon CPU usage now, and putting my observations here: https://tracker.ceph.com/issues/42830

Cheers, Dan

On Mon, Dec 16, 2019 at 10:58 PM Bryan Stillwell wrote:
>
> Sasha,
>
> I was able to get past it by restarting the ceph-mon processes every time it got stuck, but that's not a very good solution for a production cluster.
>
> Right now I'm trying to narrow down what is causing the problem. Rebuilding the OSDs with BlueStore doesn't seem to be enough. I believe it could be related to us using the extra space on the journal device as an SSD-based OSD. During the conversion process I'm removing this SSD-based OSD (since with BlueStore the omap data is ending up on the SSD anyways), and I'm suspecting it might be causing this problem.
>
> Bryan
>
> On Dec 14, 2019, at 10:27 AM, Sasha Litvak wrote:
>
> Notice: This email is from an external sender.
>
> Bryan,
>
> Were you able to resolve this? If yes, can you please share with the list?
>
> On Fri, Dec 13, 2019 at 10:08 AM Bryan Stillwell wrote:
>>
>> Adding the dev list since it seems like a bug in 14.2.5.
>>
>> I was able to capture the output from perf top:
>>
>>   21.58%  libceph-common.so.0  [.] ceph::buffer::v14_2_0::list::append
>>   20.90%  libstdc++.so.6.0.19  [.] std::getline<char, std::char_traits<char>, std::allocator<char> >
>>   13.25%  libceph-common.so.0  [.] ceph::buffer::v14_2_0::list::append
>>   10.11%  libstdc++.so.6.0.19  [.] std::istream::sentry::sentry
>>    8.94%  libstdc++.so.6.0.19  [.] std::basic_ios<char, std::char_traits<char> >::clear
>>    3.24%  libceph-common.so.0  [.] ceph::buffer::v14_2_0::ptr::unused_tail_length
>>    1.69%  libceph-common.so.0  [.] std::getline<char, std::char_traits<char>, std::allocator<char> >@plt
>>    1.63%  libstdc++.so.6.0.19  [.] std::istream::sentry::sentry@plt
>>    1.21%  [kernel]             [k] __do_softirq
>>    0.77%  libpython2.7.so.1.0  [.] PyEval_EvalFrameEx
>>    0.55%  [kernel]             [k] _raw_spin_unlock_irqrestore
>>
>> I increased mon debugging to 20 and nothing stuck out to me.
>>
>> Bryan
>>
>> > On Dec 12, 2019, at 4:46 PM, Bryan Stillwell wrote:
>> >
>> > On our test cluster after upgrading to 14.2.5 I'm having problems with the mons pegging a CPU core while moving data around. I'm currently converting the OSDs from FileStore to BlueStore by marking the OSDs out in multiple nodes, destroying the OSDs, and then recreating them with ceph-volume lvm batch. This seems to get the ceph-mon process into a state where it pegs a CPU core on one of the mons:
>> >
>> > 1764450 ceph  20  0  4802412  2.1g  16980 S 100.0 28.1  4:54.72 ceph-mon
>> >
>> > Has anyone else run into this with 14.2.5 yet? I didn't see this problem while the cluster was running 14.2.4.
>> >
>> > Thanks,
>> > Bryan
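For anyone else chasing this, the data I'm grabbing while the mon spins, in case it is useful (a sketch; it assumes the mon id matches the short hostname and the default mon data path):

# perf top -p $(pidof ceph-mon)
# ceph daemon mon.$(hostname -s) perf dump
# ceph daemon mon.$(hostname -s) sessions
# du -sh /var/lib/ceph/mon/*/store.db

perf top shows where the CPU time goes (the buffer::list::append / std::getline mix above looks like something repeatedly parsing a large text blob), perf dump and sessions show what the mon is being asked to do and by whom, and the store.db size shows whether the mon store has grown large enough that compaction or iteration could explain the load.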
[ceph-users] Re: Unable to increase PG numbers
I have tried to increase to 16, with the same result:

# ceph osd pool set cephfs_data pg_num 16
set pool 1 pg_num to 16

# ceph osd pool get cephfs_data pg_num
pg_num: 8

On 24/2/20 at 15:10, Gabryel Mason-Williams wrote:
> Have you tried making a smaller increment instead of jumping from 8 to 128 as that is quite a big leap?

--
***
Andrés Rojas Guerrero
Unidad Sistemas Linux
Area Arquitectura Tecnológica
Secretaría General Adjunta de Informática
Consejo Superior de Investigaciones Científicas (CSIC)
Pinar 19
28006 - Madrid
Tel: +34 915680059 -- Ext. 990059
email: a.ro...@csic.es
ID comunicate.csic.es: @50852720l:matrix.csic.es
***
[ceph-users] Re: Unable to increase PG numbers
Have you tried making a smaller increment instead of jumping from 8 to 128 as that is quite a big leap?
[ceph-users] Unable to increase PG numbers
Hi, I have a Nautilus installation, version 14.2.1, with a very unbalanced cephfs pool: there are 430 OSDs in the cluster, but this pool only has 8 PGs (and PGPs) for 118 TB used:

# ceph -s
  cluster:
    id:     a2269da7-e399-484a-b6ae-4ee1a31a4154
    health: HEALTH_WARN
            1 nearfull osd(s)
            2 pool(s) nearfull

  services:
    mon: 3 daemons, quorum mon21,mon22,mon23 (age 7M)
    mgr: mon23(active, since 8M), standbys: mon22, mon21
    mds: cephfs:2 {0=mon21=up:active,1=mon22=up:active} 1 up:standby
    osd: 430 osds: 430 up, 430 in

  data:
    pools:   2 pools, 16 pgs
    objects: 10.07M objects, 38 TiB
    usage:   118 TiB used, 4.5 PiB / 4.6 PiB avail
    pgs:     15 active+clean
             1  active+clean+scrubbing+deep

# ceph osd pool get cephfs_data pg_num
pg_num: 8

Due to this bad configuration I have this warning message:

# ceph status
  cluster:
    id:     a2269da7-e399-484a-b6ae-4ee1a31a4154
    health: HEALTH_WARN
            1 nearfull osd(s)
            2 pool(s) nearfull

I've discovered that some OSDs are full:

# ceph osd status
| 113 | osd23 | 9824G | 1351G | 0 | 0 | 0 | 0 | exists,nearfull,up |

I've tried to reweight this OSD:

ceph osd reweight osd.113 0.9

But the reweight process doesn't start. Otherwise I've tried to increase the PG and PGP numbers, but it doesn't work:

# ceph osd pool set cephfs_data pg_num 128
set pool 1 pg_num to 128

# ceph osd pool get cephfs_data pg_num
pg_num: 8

What could be the reason for this problem?
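One way to see whether the increase was actually accepted (a sketch; on Nautilus, pg_num changes are recorded as a target and then applied gradually, so the pool detail output is more informative than `pool get`):

# ceph osd pool ls detail | grep cephfs_data
# ceph osd dump | grep require_osd_release

The first command shows pg_num, pgp_num and, on 14.2.x, the pg_num_target / pgp_num_target values; if the target shows 128 while pg_num is still 8, the mons have queued the split but are not making progress on it. If the cluster was upgraded from an older release, the second check matters too, since the new pg_num handling expects require_osd_release to be set to nautilus.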
[ceph-users] Re: RGW do not show up in 'ceph status'
Sorry for the noise - problem was introduced by a missing iptables rule :-(

On Fri, 2020-02-21 at 09:04 +0100, Andreas Haupt wrote:
> Dear all,
>
> we recently added two additional RGWs to our CEPH cluster (version 14.2.7). They work flawlessly, however they do not show up in 'ceph status':
>
> [cephmon1] /root # ceph -s | grep -A 6 services
>   services:
>     mon: 3 daemons, quorum cephmon1,cephmon2,cephmon3 (age 14h)
>     mgr: cephmon1(active, since 14h), standbys: cephmon2, cephmon3
>     mds: cephfs:1 {0=cephmon1=up:active} 2 up:standby
>     osd: 168 osds: 168 up (since 2w), 168 in (since 6w)
>     rgw: 1 daemon active (ceph-s3)
>
> As you can see, only the first, old RGW (ceph-s3) is listed. Is there any place where the RGWs need to get "announced"? Any idea how to debug this?
>
> Thanks,
> Andreas

--
| Andreas Haupt    | E-Mail: andreas.ha...@desy.de
| DESY Zeuthen     | WWW: http://www-zeuthen.desy.de/~ahaupt
| Platanenallee 6  | Phone: +49/33762/7-7359
| D-15738 Zeuthen  | Fax: +49/33762/7-7216
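For anyone hitting the same symptom: the rgw line in 'ceph status' comes from the mgr service map, which an RGW daemon can only register in if it can reach the mons and the active mgr. So besides the usual S3 port, the RGW hosts need the mon ports (3300/6789) and the 6800-7300 range open towards the mon/mgr hosts. As an example only (the source network and exact ports depend on the setup), a rule of roughly this shape on the mon/mgr side covers it:

# iptables -A INPUT -p tcp -m multiport --dports 3300,6789,6800:7300 -s 10.0.0.0/24 -j ACCEPT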
[ceph-users] One PG is stuck and reading is not possible
ceph version 12.2.13 luminous (stable)

My whole ceph cluster went into a kind of read-only state. Ceph status showed that client reads were 0 op/s for the whole cluster, while a normal amount of writes was going on. I checked health and it said:

# ceph health detail
HEALTH_WARN Reduced data availability: 1 pg inactive, 1 pg peering
PG_AVAILABILITY Reduced data availability: 1 pg inactive, 1 pg peering
    pg 26.13b is stuck peering for 25523.506788, current state peering, last acting [2,0,33]

All OSDs showed as up and all monitors are good. All pools are 3/2 (size/min_size) and space usage is ~30%.

I fixed this by restarting first osd.2 (nothing happened) and then osd.0. After that everything went back to normal.

So what can cause "stuck peering" and how can I prevent this from happening again?
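For the record, a few commands that usually show why a PG is stuck peering, rather than just which daemon to bounce (a sketch, using the PG and OSD ids from above; the daemon commands need to run on the host carrying that OSD):

# ceph pg 26.13b query
# ceph pg dump_stuck inactive
# ceph daemon osd.2 dump_blocked_ops
# ceph daemon osd.2 ops

In the pg query output, the recovery_state section (in particular any peering_blocked_by entry) names the OSD the primary is waiting on, and dump_blocked_ops / ops on that OSD show whether it is sitting on slow or stuck operations. That is usually enough to tell whether it is a network problem, a single slow OSD, or a daemon that simply needs a restart, as in this case.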