[ceph-users] Re: Weird ceph df

2020-12-14 Thread Osama Elswah
ceph df [detail] output (POOLS section) has been modified in plain format:
- ‘BYTES USED’ column renamed to ‘STORED’. Represents the amount of data stored by the user.
- ‘USED’ column now represents the amount of space allocated purely for data by all OSD nodes, in KB.
source:
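For illustration, a rough sketch of how the renamed columns appear in the POOLS section (pool name and figures are invented, and the exact column set varies by release):
  ceph df detail
  --- POOLS ---
  POOL  ID  STORED   OBJECTS  USED     %USED  MAX AVAIL
  rbd    1  1.2 GiB    2.5k   3.6 GiB   0.12    950 GiB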

[ceph-users] Re: performance degradation every 30 seconds

2020-12-14 Thread Nathan Fish
Perhaps WAL is filling up when iodepth is so high? Is WAL on the same SSDs? If you double the WAL size, does it change? On Mon, Dec 14, 2020 at 9:05 PM Jason Dillaman wrote: > > On Mon, Dec 14, 2020 at 1:28 PM Philip Brown wrote: > > > > Our goal is to put up a high performance ceph cluster
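If it helps to check, a minimal sketch of inspecting and doubling the BlueStore WAL size (standard option name; the 2 GiB value is only an example, and the size only applies to newly provisioned OSDs):
  # current WAL size used when provisioning OSDs
  ceph config get osd bluestore_block_wal_size
  # double it, e.g. to 2 GiB, for OSDs created afterwards
  ceph config set osd bluestore_block_wal_size 2147483648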

[ceph-users] Re: performance degradation every 30 seconds

2020-12-14 Thread Jason Dillaman
On Mon, Dec 14, 2020 at 1:28 PM Philip Brown wrote: > > Our goal is to put up a high performance ceph cluster that can deal with 100 > very active clients. So for us, testing with iodepth=256 is actually fairly > realistic. 100 active clients on the same node or just 100 active clients? > but

[ceph-users] Re: Monitors not starting, getting "e3 handle_auth_request failed to assign global_id"

2020-12-14 Thread Hoan Nguyen Van
I found a merge request, ceph mon has a new option: mon_sync_max_payload_keys https://github.com/ceph/ceph/commit/d6037b7f484e13cfc9136e63e4cf7fac6ad68960#diff-495ccc5deb4f8fbd94e795e66c3720677f821314d4b9042f99664cd48a9506fd My value of the option mon_sync_max_payload_size is 4096. If
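For reference, a sketch of checking and setting both options (4096 is the value mentioned here; mon_sync_max_payload_keys only exists on releases that contain the linked commit, and if the mons are out of quorum the same values can go into ceph.conf instead):
  ceph config get mon mon_sync_max_payload_size
  ceph config set mon mon_sync_max_payload_size 4096
  # only on versions that ship the new option
  ceph config get mon mon_sync_max_payload_keys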

[ceph-users] Re: Monitors not starting, getting "e3 handle_auth_request failed to assign global_id"

2020-12-14 Thread Wesley Dillingham
We had to rebuild our mons on a few occasions because of this. Only one mon was ever dropped from quorum at a time in our case. In other scenarios with the same error the mon was able to rejoin after thirty minutes or so. We believe we may have tracked it down (in our case) to the upgrade of an AV

[ceph-users] Re: Provide more documentation for MDS performance tuning on large file systems

2020-12-14 Thread Patrick Donnelly
On Mon, Dec 7, 2020 at 12:06 PM Patrick Donnelly wrote: > > Hi Dan & Janek, > > On Sat, Dec 5, 2020 at 6:26 AM Dan van der Ster wrote: > > My understanding is that the recall thresholds (see my list below) > > should be scaled proportionally. OTOH, I haven't played with the decay > > rates (and

[ceph-users] Re: OSD reboot loop after running out of memory

2020-12-14 Thread Frédéric Nass
I forgot to mention "If with bluefs_buffered_io=false, the %util is over 75% most of the time ** during data removal (like snapshot removal) **, then you'd better change it to true." Regards, Frédéric. Le 14/12/2020 à 21:35, Frédéric Nass a écrit : Hi Stefan, Initial data removal could
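A sketch of the checks described here (osd.0 and sdb are placeholders; depending on the release an OSD restart may be needed for the change to take effect):
  # watch device utilization while removal is running
  iostat -xm 1 sdb
  # flip the OSDs to buffered BlueFS I/O
  ceph config set osd bluefs_buffered_io true
  ceph config show osd.0 bluefs_buffered_io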

[ceph-users] Re: OSD reboot loop after running out of memory

2020-12-14 Thread Stefan Wild
Hi Frédéric, Thanks for the additional input. We are currently only running RGW on the cluster, so no snapshot removal, but there have been plenty of remappings with the OSDs failing (all of them at first during and after the OOM incident, then one-by-one). I haven't had a chance to look into

[ceph-users] Re: OSD reboot loop after running out of memory

2020-12-14 Thread Frédéric Nass
Hi Stefan, Initial data removal could also have resulted from a snapshot removal leading to OSDs OOMing and then pg remappings leading to more removals after OOMed OSDs rejoined the cluster and so on. As mentioned by Igor: "Additionally there are users' reports that recent default value's

[ceph-users] Re: performance degradation every 30 seconds

2020-12-14 Thread Philip Brown
Our goal is to put up a high performance ceph cluster that can deal with 100 very active clients. So for us, testing with iodepth=256 is actually fairly realistic. but it does also exhibit the problem with iodepth=32 [root@irviscsi03 ~]# fio --filename=/dev/rbd0 --direct=1 --rw=randwrite
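For reference, the iodepth=32 job presumably looks like the full command quoted elsewhere in the thread with only the queue depth changed, e.g.:
  fio --filename=/dev/rbd0 --direct=1 --rw=randwrite --bs=4k --ioengine=libaio --iodepth=32 --numjobs=1 --time_based --group_reporting --name=iops-test-job --runtime=120 --eta-newline=1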

[ceph-users] Re: performance degradation every 30 seconds

2020-12-14 Thread Jason Dillaman
On Mon, Dec 14, 2020 at 12:46 PM Philip Brown wrote: > > Further experimentation with fio's -rw flag, setting to rw=read, and > rw=randwrite, in addition to the original rw=randrw, indicates that it is > tied to writes. > > Possibly some kind of buffer flush delay or cache sync delay when using

[ceph-users] Re: performance degradation every 30 seconds

2020-12-14 Thread Philip Brown
Further experimentation with fio's -rw flag, setting to rw=read, and rw=randwrite, in addition to the original rw=randrw, indicates that it is tied to writes. Possibly some kind of buffer flush delay or cache sync delay when using rbd device, even though fio specified --direct=1 ? -

[ceph-users] Re: performance degradation every 30 seconds

2020-12-14 Thread Philip Brown
Aha, insightful question! Running rados bench write to the same pool does not exhibit any problems. It consistently shows around 480M/sec throughput, every second. So this would seem to be something to do with using rbd devices. Which we need to do. For what it's worth, I'm using Micron
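For reference, a sketch of a comparable rados bench run (the pool name rbd is a placeholder):
  # 60-second write benchmark with the default 4 MiB objects, keeping objects for later read tests
  rados bench -p rbd 60 write --no-cleanup
  # remove the benchmark objects afterwards
  rados -p rbd cleanup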

[ceph-users] Re: performance degradation every 30 seconds

2020-12-14 Thread Jason Dillaman
On Mon, Dec 14, 2020 at 11:28 AM Philip Brown wrote: > > > I have a new 3 node octopus cluster, set up on SSDs. > > I'm running fio to benchmark the setup, with > > fio --filename=/dev/rbd0 --direct=1 --rw=randrw --bs=4k --ioengine=libaio > --iodepth=256 --numjobs=1 --time_based

[ceph-users] performance degradation every 30 seconds

2020-12-14 Thread Philip Brown
I have a new 3 node octopus cluster, set up on SSDs. I'm running fio to benchmark the setup, with fio --filename=/dev/rbd0 --direct=1 --rw=randrw --bs=4k --ioengine=libaio --iodepth=256 --numjobs=1 --time_based --group_reporting --name=iops-test-job --runtime=120 --eta-newline=1 However,

[ceph-users] Re: Slow Replication on Campus

2020-12-14 Thread Eugen Block
Hi, could you share more information about your setup? How much bandwidth does the uplink have? Are there any custom configs regarding rbd_journal or rbd_mirror settings? If there were lots of changes on those images the sync would always be behind per design. But if there's no activity
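A sketch of the kind of checks being asked about (pool and image names are placeholders):
  # per-image replication state and how far the remote journal replay lags
  rbd mirror image status rbd/myimage
  # pool-wide summary of rbd-mirror daemon health and image states
  rbd mirror pool status rbd --verbose
  # any non-default journal/mirror settings
  ceph config dump | grep -E 'rbd_journal|rbd_mirror'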

[ceph-users] Re: iscsi and iser

2020-12-14 Thread Jason Dillaman
On Mon, Dec 14, 2020 at 9:39 AM Marc Boisis wrote: > > > Hi, > > I would like to know if you support iser in gwcli like the traditional > targetcli or if this is planned in a future version of ceph ? We don't have the (HW) resources to test with iSER so it's not something that anyone is looking

[ceph-users] Re: Ceph 15.2.4 segfault, msgr-worker

2020-12-14 Thread alexandre derumier
Hi, I had an OSD crash yesterday, with 15.2.7. Seems similar:
ceph crash info 2020-12-13T02:37:57.475315Z_63f91999-ca9c-49a5-b381-5fad9780dbbb
{
    "backtrace": [
        "(()+0x12730) [0x7f6bccbb5730]",
        "(std::_Rb_tree, boost::intrusive_ptr, std::_Identity >, std::less >, std::allocator

[ceph-users] iscsi and iser

2020-12-14 Thread Marc Boisis
Hi, I would like to know if you support iser in gwcli like the traditional targetcli or if this is planned in a future version of ceph ? Thanks Marc ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to

[ceph-users] The ceph balancer sets upmap items which violate my crush rule

2020-12-14 Thread Manuel Lausch
The ceph balancer sets upmap items which violate my crush rule. The rule:
rule cslivebapfirst {
        id 0
        type replicated
        min_size 2
        max_size 4
        step take csliveeubap-u01dc
        step chooseleaf firstn 2 type room
        step emit
        step take csliveeubs-u01dc
        step chooseleaf firstn
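For investigation, a sketch of listing and removing the balancer's upmap exceptions (the PG id 1.2f is a placeholder):
  # list current pg-upmap-items entries
  ceph osd dump | grep pg_upmap_items
  # drop one entry so the PG maps purely by CRUSH again
  ceph osd rm-pg-upmap-items 1.2f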

[ceph-users] Re: Removing an applied service set

2020-12-14 Thread Michael Wodniok
Thank you Eugen, it worked. For the record this is what I have done to remove the services completely. My CephFS had the name "testfs". * `ceph orch ls mds mds.testfs --export yaml >change.yaml` * removed the placement-spec from `change.yaml`. * reapplied using `cephadm shell -m change.yaml --

[ceph-users] Re: OSD reboot loop after running out of memory

2020-12-14 Thread Stefan Wild
Hi Igor, Thank you for the detailed analysis. That makes me hopeful we can get the cluster back on track. No pools have been removed, but yes, due to the initial crash of multiple OSDs and the subsequent issues with individual OSDs we’ve had substantial PG remappings happening constantly. I

[ceph-users] Re: OSD reboot loop after running out of memory

2020-12-14 Thread Igor Fedotov
Just a note - all the below is almost completely unrelated to high RAM usage. The latter is a different issue which presumably just triggered the PG removal one... On 12/14/2020 2:39 PM, Igor Fedotov wrote: Hi Stefan, given the crash backtrace in your log I presume some data removal is in

[ceph-users] Re: OSD reboot loop after running out of memory

2020-12-14 Thread Stefan Wild
Hi Kalle, Memory usage is back on track for the OSDs since the OOM crash. I don’t know what caused it back then, but until all OSDs were back up together, each one of them (10 TiB capacity, 7 TiB used) ballooned to over 15 GB memory used. I’m happy to dump the stats if they’re showing any
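A sketch of the stats dump being offered here, assuming access to the OSD admin socket:
  # per-OSD memory pool breakdown (osd_pglog, buffer_anon, bluestore caches, ...)
  ceph daemon osd.0 dump_mempools
  # the memory target the autotuner is working against
  ceph daemon osd.0 config get osd_memory_target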

[ceph-users] Re: OSD reboot loop after running out of memory

2020-12-14 Thread Igor Fedotov
Hi Stefan, given the crash backtrace in your log I presume some data removal is in progress:
Dec 12 21:58:38 ceph-tpa-server1 bash[784256]:  3: (KernelDevice::direct_read_unaligned(unsigned long, unsigned long, char*)+0xd8) [0x5587b9364a48]
Dec 12 21:58:38 ceph-tpa-server1 bash[784256]:  4:

[ceph-users] Re: PGs down

2020-12-14 Thread Igor Fedotov
Hi Jeremy, I think you lost the data for OSD.11 & .12. I'm not aware of any reliable enough way to recover RocksDB from this sort of error. Theoretically you might want to disable auto compaction for RocksDB for these daemons and try to bring them up and attempt to drain the data out of
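One possible (untested) way to express the auto-compaction idea, assuming the existing option string is extended rather than replaced:
  # see the current RocksDB option string
  ceph config get osd bluestore_rocksdb_options
  # append disable_auto_compactions=true to that string for the affected OSDs only, e.g.
  ceph config set osd.11 bluestore_rocksdb_options "<existing options>,disable_auto_compactions=true"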

[ceph-users] Re: Removing an applied service set

2020-12-14 Thread Eugen Block
Do you have a spec file for the mds services or how did you deploy the services? If you have a yml file with the mds placement just remove the entries from that file and run 'ceph orch apply -i mds.yml'. You can export your current config with this command and then modify the file to your
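For reference, a minimal sketch of what such an mds spec might contain before editing (service id and hosts are made up):
  service_type: mds
  service_id: testfs
  placement:
    hosts:
      - host1
      - host2
Removing the placement block and re-applying the spec is the workaround described elsewhere in this thread.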

[ceph-users] Re: OSD reboot loop after running out of memory

2020-12-14 Thread Kalle Happonen
Hi Samuel, I think we're hitting some niche cases. Most of our experience (and links to other posts) is here. https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/EWPPEMPAJQT6GGYSHM7GIM3BZWS2PSUY/ For the pg_log issue, the default of 3000 might be too large for some installations,
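For reference, a sketch of the pg_log knobs in question (the lower value is only an example):
  # the defaults discussed here
  ceph config get osd osd_min_pg_log_entries
  ceph config get osd osd_max_pg_log_entries
  # example of capping the per-PG log length cluster-wide
  ceph config set osd osd_min_pg_log_entries 1500
  ceph config set osd osd_max_pg_log_entries 1500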

[ceph-users] Removing an applied service set

2020-12-14 Thread Michael Wodniok
Hi, we created multiple CephFS filesystems; this involved deploying multiple mds services using `ceph orch apply mds [...]`. Worked like a charm. Now the filesystem has been removed and the leftovers of the filesystem should also be removed, but I can't delete the services as cephadm/orchestration module

[ceph-users] Re: OSD reboot loop after running out of memory

2020-12-14 Thread huxia...@horebdata.cn
Hello Kalle, Your comments about some bugs with pg_log memory and buffer_anon memory growth worry me a lot, as I am planning to build a cluster with the latest Nautilus version. Could you please comment on how to safely deal with these bugs, or how to avoid them, if they indeed occur? Thanks a

[ceph-users] Re: osd_pglog memory hoarding - another case

2020-12-14 Thread Kalle Happonen
Hi all, Ok, so I have some updates on this. We noticed that we had a bucket with tons of RGW garbage collection pending. It was growing faster than we could clean it up. We suspect this was because users tried to do "s3cmd sync" operations on Swift-uploaded large files. This could logically
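A sketch of inspecting and manually driving the pending GC (standard radosgw-admin tooling):
  # count pending GC entries, including ones not yet due
  radosgw-admin gc list --include-all | grep -c '"oid"'
  # run a GC pass by hand
  radosgw-admin gc process --include-all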