Dzianis Kahanovich writes:
> Christian Balzer writes:
>
>>> New problem (unsure, but probably not observed in Hammer, definitely seen in
>>> Infernalis): copying large (tens of GB) files into kernel cephfs (from
>>> outside the cluster, bare metal - non-VM, preempt kernel) causes slow requests
>>> on some of the OSDs (a repeated range) - mostly 3 Gbps channels (slow).
>>>
>>> All OSDs use the default thread counts. Scheduler=noop. size=3 min_size=2
>>>
>>> No such problem with fuse.
>>>
>>> Looks like a broken or unbalanced congestion mechanism, or I don't know how
>>> to moderate it. I tried setting write_congestion_kb low (=1) - nothing
>>> interesting.
>>>
>> I think cause and effect are not quite what you think they are.
>>
>> Firstly, let me state that I have no experience with CephFS at all, but
>> what you're seeing isn't likely related to it at all.
>>
>> Next, let's establish some parameters.
>> You're testing kernel and fuse from the same machine, right?
>> What is the write speed (throughput) when doing this with fuse compared to
>> the speed when doing this via the kernel module?
>
> Right now I am adding 2 OSDs and taking 2 OSDs out on 1 of the 3 nodes
> (2T->4T), so the cluster is under heavy load and I can't run benchmarks now.
> But I understand this point well. And after that message I got slow requests
> on fuse too.
>
>> What is the top speed of your cluster when doing a
>> "rados -p <yourpoolname> bench 60 write -t 32" from your test machine?
>> Does this result in slow requests as well?
>
> Hmm... maybe later. Right now I have no spare rados pool for benchmarking,
> only RBD, DATA & METADATA.
>
>> What I think is happening is that you're simply at the limits of your
>> current cluster and that fuse is slower, thus not exposing this.
>> The kernel module is likely fast AND also will use pagecache, thus creating
>> very large writes (how much memory does your test machine have) when it
>> gets flushed.
>
> I have limited all read/write values in the kernel client more than in fuse.
>
> Mostly I understand - the problem is fast writes & slow HDDs. But IMHO some
> mechanism should prevent this (congestion-like), and earlier I did not observe
> this problem on similar configs.
>
> Later, if I have more info, I will say more. Maybe the PREEMPT kernel is
> "wrong" there...
>
After a series of experiments (and multiple "slow requests" during OSD add/remove and backfills) I found a solution (and along the way created unrecoverable "inconsistent" PGs in the data pool, outside the real files - the data pool has been re-created now, all OK).

So, the solution: the caps_wanted_delay_max=5 option on the kernel mount.

-- 
WBR, Dzianis Kahanovich AKA Denis Kaganovich, http://mahatma.bspu.unibel.by/
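P.S. For anyone who wants to try the same, a minimal sketch of how that option
can be passed to the kernel client - the monitor address, secret file path and
mount point below are placeholders for illustration only, adjust to your
cluster:

  # one-shot mount with the caps_wanted_delay_max tuning
  mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs \
      -o name=admin,secretfile=/etc/ceph/admin.secret,caps_wanted_delay_max=5

  # or the equivalent /etc/fstab entry
  192.168.0.1:6789:/  /mnt/cephfs  ceph  name=admin,secretfile=/etc/ceph/admin.secret,caps_wanted_delay_max=5,_netdev  0 0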