Hi Andras,

On Thu, Jan 18, 2018 at 3:38 AM, Andras Pataki
<apat...@flatironinstitute.org> wrote:
> Hi John,
> Some other symptoms of the problem:  when the MDS has been running for a few
> days, it starts looking really busy.  At this time, listing directories
> becomes really slow.  An "ls -l" on a directory with about 250 entries takes
> about 2.5 seconds.  All the metadata is on OSDs with NVMe backing stores.
> Interestingly enough the memory usage seems pretty low (compared to the
> allowed cache limit).
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+
> 1604408 ceph      20   0 3710304 2.387g  18360 S 100.0  0.9 757:06.92
> /usr/bin/ceph-mds -f --cluster ceph --id cephmon00 --setuser ceph --setgroup
> ceph
> Once I bounce it (fail it over), the CPU usage goes down to the 10-25%
> range.  The same ls -l after the bounce takes about 0.5 seconds.  I
> remounted the filesystem before each test to ensure there isn't anything
> cached.
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+
>   111100 ceph      20   0 6537052 5.864g  18500 S  17.6  2.3   9:23.55
> /usr/bin/ceph-mds -f --cluster ceph --id cephmon02 --setuser ceph --setgroup
> ceph
> Also, I have a crawler that crawls the file system periodically.  Normally
> the full crawl runs for about 24 hours, but with the slowing down MDS, now
> it has been running for more than 2 days and isn't close to finishing.
> The MDS related settings we are running with are:
> mds_cache_memory_limit = 17179869184
> mds_cache_reservation = 0.10

Debug logs from the MDS, captured while it is in that slow state, would
be helpful. Please collect them with `debug mds = 20` and
`debug ms = 1`. Feel free to create a tracker ticket and use
ceph-post-file [1] to share the logs.
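
As a minimal sketch, the two settings could be placed in ceph.conf on
the MDS host as shown below. Putting them in the [mds] section is an
assumption on my part; adjust it to your layout. Note that ceph.conf is
read when the daemon starts, and restarting the MDS appears to clear the
slowdown, so for a daemon that is already slow the same values would
need to be applied at runtime instead (for example through the admin
socket with `ceph daemon mds.<id> config set`).

```
# Assumed placement: [mds] applies to every MDS daemon on this host.
[mds]
# Verbose MDS subsystem logging, as requested above.
debug mds = 20
# Messenger logging at level 1.
debug ms = 1
```

Logging at this level is verbose, so it is probably worth reverting the
values once the logs have been collected.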

[1] http://docs.ceph.com/docs/hammer/man/8/ceph-post-file/

Patrick Donnelly
ceph-users mailing list