I changed "mds_cache_size" from 100000 to 500000 to get rid of the
WARN temporarily. Dumping the mds daemon now shows:
    "inode_max": 500000,
    "inodes": 124213,
But I have no idea what to do if "inodes" rises above 500000. Should I
change "mds_cache_size" again?
Thanks.
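For what it's worth, the option being changed here is `mds cache size`. A sketch of the persistent form of the change, using the value from this thread (adjust to your own cluster):

```ini
# ceph.conf fragment on the MDS nodes -- raises the MDS inode cache
# limit from the default 100000 to the value used in this thread.
[mds]
    mds cache size = 500000
```

On a running daemon the same option can usually be injected without a restart, e.g. `ceph tell mds.<id> injectargs '--mds-cache-size 500000'` (syntax may vary by release). Note that raising the limit only buys headroom; it does not fix a client that is failing to release capabilities under cache pressure.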
2015-07-15 13:34 GMT+08:00 谷枫 <feiche...@gmail.com>:
> I changed "mds_cache_size" from 100000 to 500000 to get rid of the
> WARN temporarily. Dumping the mds daemon now shows:
>     "inode_max": 500000,
>     "inodes": 124213,
> But I have no idea what to do if "inodes" rises above 500000. Should I
> change "mds_cache_size" again?
> Thanks.
>
> 2015-07-15 11:06 GMT+08:00 Eric Eastman <eric.east...@keepertech.com>:
>
>> Hi John,
>>
>> I cut the test down to a single client running only Ganesha NFS,
>> without any Ceph drivers loaded on the CephFS client. After deleting
>> all the files in the Ceph file system and rebooting all the nodes, I
>> restarted the create-5-million-file test using 2 NFS clients against
>> the one CephFS node running Ganesha NFS. After a couple of hours I am
>> seeing the "Client ede-c2-gw01 failing to respond to cache pressure"
>> error:
>>
>> $ ceph -s
>>     cluster 6d8aae1e-1125-11e5-a708-001b78e265be
>>      health HEALTH_WARN
>>             mds0: Client ede-c2-gw01 failing to respond to cache pressure
>>      monmap e1: 3 mons at {ede-c2-mon01=10.15.2.121:6789/0,ede-c2-mon02=10.15.2.122:6789/0,ede-c2-mon03=10.15.2.123:6789/0}
>>             election epoch 22, quorum 0,1,2 ede-c2-mon01,ede-c2-mon02,ede-c2-mon03
>>      mdsmap e1860: 1/1/1 up {0=ede-c2-mds02=up:active}, 2 up:standby
>>      osdmap e323: 8 osds: 8 up, 8 in
>>       pgmap v302142: 832 pgs, 4 pools, 162 GB data, 4312 kobjects
>>             182 GB used, 78459 MB / 263 GB avail
>>                  832 active+clean
>>
>> Dumping the mds daemon shows inodes > inode_max:
>>
>> # ceph daemon mds.ede-c2-mds02 perf dump mds
>> {
>>     "mds": {
>>         "request": 21862302,
>>         "reply": 21862302,
>>         "reply_latency": {
>>             "avgcount": 21862302,
>>             "sum": 16728.480772060
>>         },
>>         "forward": 0,
>>         "dir_fetch": 13,
>>         "dir_commit": 50788,
>>         "dir_split": 0,
>>         "inode_max": 100000,
>>         "inodes": 100010,
>>         "inodes_top": 0,
>>         "inodes_bottom": 0,
>>         "inodes_pin_tail": 100010,
>>         "inodes_pinned": 100010,
>>         "inodes_expired": 4308279,
>>         "inodes_with_caps": 99998,
>>         "caps": 99998,
>>         "subtrees": 2,
>>         "traverse": 30802465,
>>         "traverse_hit": 26394836,
>>         "traverse_forward": 0,
>>         "traverse_discover": 0,
>>         "traverse_dir_fetch": 0,
>>         "traverse_remote_ino": 0,
>>         "traverse_lock": 0,
>>         "load_cent": 2186230200,
>>         "q": 0,
>>         "exported": 0,
>>         "exported_inodes": 0,
>>         "imported": 0,
>>         "imported_inodes": 0
>>     }
>> }
>>
>> Once this test finishes and I verify the files were all correctly
>> written, I will retest using the Samba VFS interface, followed by the
>> kernel test.
>>
>> Please let me know if there is more info you need and whether you
>> want me to open a ticket.
>>
>> Best regards,
>> Eric
>>
>>
>> On Mon, Jul 13, 2015 at 9:40 AM, Eric Eastman
>> <eric.east...@keepertech.com> wrote:
>> > Thanks John. I will back the test down to the simple case of 1 client
>> > without the kernel driver and only running NFS Ganesha, work forward
>> > until I trip the problem, and report my findings.
>> >
>> > Eric
>> >
>> > On Mon, Jul 13, 2015 at 2:18 AM, John Spray <john.sp...@redhat.com> wrote:
>> >>
>> >> On 13/07/2015 04:02, Eric Eastman wrote:
>> >>>
>> >>> Hi John,
>> >>>
>> >>> I am seeing this problem with Ceph v9.0.1 with the v4.1 kernel on all
>> >>> nodes. This system is using 4 CephFS client systems. They all have
>> >>> the kernel driver version of CephFS loaded, but none are mounting the
>> >>> file system. All 4 clients are using the libcephfs VFS interface to
>> >>> Ganesha NFS (v2.2.0-2) and Samba (version 4.3.0pre1-GIT-0791bb0) to
>> >>> share out the Ceph file system.
>> >>>
>> >>> # ceph -s
>> >>>     cluster 6d8aae1e-1125-11e5-a708-001b78e265be
>> >>>      health HEALTH_WARN
>> >>>             4 near full osd(s)
>> >>>             mds0: Client ede-c2-gw01 failing to respond to cache pressure
>> >>>             mds0: Client ede-c2-gw02:cephfs failing to respond to cache pressure
>> >>>             mds0: Client ede-c2-gw03:cephfs failing to respond to cache pressure
>> >>>      monmap e1: 3 mons at {ede-c2-mon01=10.15.2.121:6789/0,ede-c2-mon02=10.15.2.122:6789/0,ede-c2-mon03=10.15.2.123:6789/0}
>> >>>             election epoch 8, quorum 0,1,2 ede-c2-mon01,ede-c2-mon02,ede-c2-mon03
>> >>>      mdsmap e912: 1/1/1 up {0=ede-c2-mds03=up:active}, 2 up:standby
>> >>>      osdmap e272: 8 osds: 8 up, 8 in
>> >>>       pgmap v225264: 832 pgs, 4 pools, 188 GB data, 5173 kobjects
>> >>>             212 GB used, 48715 MB / 263 GB avail
>> >>>                  832 active+clean
>> >>>   client io 1379 kB/s rd, 20653 B/s wr, 98 op/s
>> >>
>> >> It would help if we knew whether it is the kernel clients or the
>> >> userspace clients that are generating the warnings here. You've
>> >> probably already done this, but I'd get rid of any unused kernel
>> >> client mounts to simplify the situation.
>> >>
>> >> We haven't tested the cache limit enforcement with NFS Ganesha, so
>> >> there is a decent chance that it is broken. The Ganesha FSAL is doing
>> >> ll_get/ll_put reference counting on inodes, so it seems quite
>> >> possible that its cache is pinning things that we would otherwise be
>> >> evicting in response to cache pressure. You mention Samba as well.
>> >>
>> >> You can see if the MDS cache is indeed exceeding its limit by looking
>> >> at the output of:
>> >>
>> >>     ceph daemon mds.<daemon id> perf dump mds
>> >>
>> >> ...where the "inodes" value tells you how many are in the cache, vs.
>> >> inode_max.
>> >>
>> >> If you can, it would be useful to boil this down to a straightforward
>> >> test case: if you start with a healthy cluster, mount a single
>> >> Ganesha client, and do your 5-million-file procedure, do you get the
>> >> warning? Same for the Samba and kernel mounts -- this is likely to be
>> >> a client-side issue, so we need to confirm which client is
>> >> misbehaving.
>> >>
>> >> Cheers,
>> >> John
>> >>
>> >>> # cat /proc/version
>> >>> Linux version 4.1.0-040100-generic (kernel@gomeisa) (gcc version 4.6.3
>> >>> (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #201506220235 SMP Mon Jun 22 06:36:19
>> >>> UTC 2015
>> >>>
>> >>> # ceph -v
>> >>> ceph version 9.0.1 (997b3f998d565a744bfefaaf34b08b891f8dbf64)
>> >>>
>> >>> The systems are all running Ubuntu Trusty upgraded to the 4.1
>> >>> kernel. These are all physical machines, no VMs. The test run that
>> >>> caused the problem was creating and verifying 5 million small files.
>> >>>
>> >>> We have some tools that flag when Ceph is in a WARN state, so it
>> >>> would be nice to get rid of this warning.
>> >>>
>> >>> Please let me know what additional information you need.
>> >>>
>> >>> Thanks,
>> >>>
>> >>> Eric
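The inodes-vs-inode_max check described above is easy to script against the JSON that `perf dump` emits. A minimal sketch (the function name and the 90% warning threshold are illustrative, not from the thread):

```python
import json

def cache_headroom(perf_dump: dict, warn_ratio: float = 0.9):
    """Given parsed `ceph daemon mds.<id> perf dump mds` output, return
    (inodes, inode_max, near_limit), where near_limit is True once the
    cache reaches warn_ratio of its configured maximum."""
    mds = perf_dump["mds"]
    inodes = mds["inodes"]
    inode_max = mds["inode_max"]
    return inodes, inode_max, inodes >= warn_ratio * inode_max

# Numbers from the top of this thread: 124213 cached inodes vs a 500000 limit.
dump = json.loads('{"mds": {"inode_max": 500000, "inodes": 124213}}')
print(cache_headroom(dump))  # (124213, 500000, False)
```

Feeding it the earlier dump (inodes 100010 against inode_max 100000) would flag the overflow that accompanied the cache-pressure warning.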
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com