Hi Dan,
I don't have a solution to the problem; I can only second that we've
also been seeing strange problems when more than one node accesses the
same file in CephFS and at least one of them opens it for writing. I've
tried verbose logging on the client (fuse), and it seems that the fuse
client sometimes sends a cap request to the MDS and does not get a
response. It also looks like there is some 5-second polling interval
that sometimes (but not always) saves the day, so the client continues
after a roughly 5-second delay. This does not happen when multiple
processes open the file for reading, but it does when processes open it
for writing (even if they never write to the file and only read
afterwards). I have some earlier mailing list messages from a week or
two ago describing what we see in more detail (including log outputs).
I think the issue has something to do with cap requests being lost or
miscommunicated between the client and the MDS.
Andras
On 04/13/2017 01:41 PM, Dan van der Ster wrote:
Dear ceph-*,
A couple weeks ago I wrote this simple tool to measure the round-trip
latency of a shared filesystem.
https://github.com/dvanders/fsping
In our case, the tool is to be run from two clients who mount the same
CephFS.
First, start the server (a.k.a. the ping reflector) on one machine in
a CephFS directory:
./fsping --server
Then, from another client machine and in the same directory, start the
fsping client (aka the ping emitter):
./fsping --prefix <prefix from the server above>
The idea is that the "client" writes a syn file, the reflector notices
it, and writes an ack file. The time for the client to notice the ack
file is what I call the rtt.
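For anyone curious about the mechanism, the syn/ack round trip can be
sketched roughly like this. This is a minimal stand-in, not the actual
fsping code: a thread plays the reflector (in real use it is a separate
process on a second client mounting the same filesystem), a temporary
directory stands in for the CephFS mount, and the fixed "syn"/"ack"
filenames are my own choice (fsping uses a --prefix):

```python
import os
import tempfile
import threading
import time

def reflector(dirpath, poll=0.001, timeout=10.0):
    """Poll for the syn file, then answer by writing the ack file.

    In real use this runs on a second client mounting the same shared
    filesystem; here a thread stands in for it.
    """
    syn = os.path.join(dirpath, "syn")
    ack = os.path.join(dirpath, "ack")
    deadline = time.monotonic() + timeout
    while not os.path.exists(syn):
        if time.monotonic() > deadline:
            return  # give up if no syn ever arrives
        time.sleep(poll)
    with open(ack, "w") as f:
        f.write("ack")

def emit_once(dirpath, poll=0.001, timeout=10.0):
    """Write the syn file and time how long the ack takes to appear."""
    syn = os.path.join(dirpath, "syn")
    ack = os.path.join(dirpath, "ack")
    start = time.monotonic()
    with open(syn, "w") as f:
        f.write("syn")
    # The rtt is the time from writing syn to observing ack.
    while not os.path.exists(ack):
        if time.monotonic() - start > timeout:
            raise TimeoutError("no ack within timeout")
        time.sleep(poll)
    return time.monotonic() - start

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        t = threading.Thread(target=reflector, args=(d,))
        t.start()
        rtt = emit_once(d)
        t.join()
        print("rtt: %.3f ms" % (rtt * 1000))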
And the output looks like normal ping, so that's neat. (The README.md
shows a working example)
Anyway, two weeks ago when I wrote this, it was working very well on
my CephFS clusters (running 10.2.5, IIRC). I was seeing ~20ms rtt for
small files, which is more or less what I was expecting on my test
cluster.
But when I run fsping today, it misbehaves in one of two ways:
1. Most of the time it just hangs, both on the reflector and on the
emitter. The fsping processes are stuck in some uninterruptible state
-- only an MDS failover breaks them out. I tried with and without
fuse_disable_pagecache -- no big difference.
2. When I increase the fsping --size to 512kB, it works a bit more
reliably. But there is a weird bimodal distribution with most
"packets" having 20-30ms rtt, some ~20% having ~5-6 seconds rtt, and
some ~5% taking ~10-11s. I suspected the mds_tick_interval -- but
decreasing that didn't help.
In summary, if someone is curious, please give this tool a try on your
CephFS cluster -- let me know if it's working or not (and what rtt you
can achieve with which configuration).
And perhaps a dev could explain why it is not working with the latest
jewel ceph-fuse / ceph MDSes?
Best Regards,
Dan
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com