Hi,
    I've deployed a small Hammer cluster (0.94.1) and mounted it via
ceph-fuse on Ubuntu 14.04. After several hours I found that the ceph-fuse
process had crashed; the crash log from
/var/log/ceph/ceph-client.admin.log is at the end of this message. The
memory usage of the ceph-fuse process was huge (more than 4 GB) when it
crashed.
    I then ran some tests and found that the following actions increase the
memory usage of ceph-fuse rapidly, and the memory usage never seems to
decrease:

   - rsyncing many small files (rsync -a /mnt/some_small /srv/ceph)
   - recursive chown/chmod (e.g. chmod 775 /srv/ceph -R)

However, chown/chmod on files that have already been accessed does not
increase memory usage. It seems that ceph-fuse caches the inodes but never
releases them. I don't know whether there is an option to control the
cache size. I set mds cache size = 2147483647 to improve the performance
of the MDS, and I tried setting mds cache size = 1000 on the client side,
but it had no effect.
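
For clarity, this is roughly what I have in ceph.conf now. The [client]
entry is my attempt at limiting the client cache, and I am not sure that
mds cache size is even read by ceph-fuse; possibly the client-side knob is
a separate option such as client cache size instead (I have not confirmed
this):

```ini
[mds]
; raised to improve MDS performance
mds cache size = 2147483647

[client]
; attempted on the ceph-fuse client, but it had no effect;
; perhaps "client cache size" is the relevant option here instead
mds cache size = 1000
```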

Here is the crash log:
   -85> 2015-04-27 11:25:32.263743 7ff7c3fff700  3 client.74478 ll_forget
1000033ebe6 1
   -84> 2015-04-27 11:25:32.263748 7ff7f1ffb700  3 client.74478 ll_forget
1000033ebe6 1
   -83> 2015-04-27 11:25:32.263760 7ff7c3fff700  3 client.74478 ll_getattr
100003436d6.head
   -82> 2015-04-27 11:25:32.263763 7ff7c3fff700  3 client.74478 ll_getattr
100003436d6.head = 0
   -81> 2015-04-27 11:25:32.263770 7ff7c18f0700  3 client.74478 ll_getattr
10000015146.head
   -80> 2015-04-27 11:25:32.263775 7ff7c18f0700  3 client.74478 ll_getattr
10000015146.head = 0
   -79> 2015-04-27 11:25:32.263781 7ff7c18f0700  3 client.74478 ll_forget
10000015146 1
   -78> 2015-04-27 11:25:32.263789 7ff7f17fa700  3 client.74478 ll_lookup
0x7ff6ed91fd00 2822
   -77> 2015-04-27 11:25:32.263794 7ff7f17fa700  3 client.74478 ll_lookup
0x7ff6ed91fd00 2822 -> 0 (100003459d0)
   -76> 2015-04-27 11:25:32.263800 7ff7f17fa700  3 client.74478 ll_forget
100003436d6 1
   -75> 2015-04-27 11:25:32.263807 7ff7c10ef700  3 client.74478 ll_lookup
0x7ff6e49b42d0 4519
   -74> 2015-04-27 11:25:32.263812 7ff7c10ef700  3 client.74478 ll_lookup
0x7ff6e49b42d0 4519 -> 0 (1000001a4d7)
   -73> 2015-04-27 11:25:32.263820 7ff7c10ef700  3 client.74478 ll_forget
10000015146 1
   -72> 2015-04-27 11:25:32.263827 7ff7037fe700  3 client.74478 ll_getattr
100003459d0.head
   -71> 2015-04-27 11:25:32.263832 7ff7037fe700  3 client.74478 ll_getattr
100003459d0.head = 0
   -70> 2015-04-27 11:25:32.263840 7ff7c3fff700  3 client.74478 ll_forget
100003436d6 1
   -69> 2015-04-27 11:25:32.263849 7ff7c3fff700  3 client.74478 ll_lookup
0x7ff6ed92e8c0 4_o_contour.jpg
   -68> 2015-04-27 11:25:32.263854 7ff7c3fff700  3 client.74478 ll_lookup
0x7ff6ed92e8c0 4_o_contour.jpg -> 0 (100003464c2)
   -67> 2015-04-27 11:25:32.263863 7ff7c08ee700  3 client.74478 ll_getattr
1000001a4d7.head
   -66> 2015-04-27 11:25:32.263866 7ff7c08ee700  3 client.74478 ll_getattr
1000001a4d7.head = 0
   -65> 2015-04-27 11:25:32.263872 7ff7c08ee700  3 client.74478 ll_forget
1000001a4d7 1
   -64> 2015-04-27 11:25:32.263874 7ff7037fe700  3 client.74478 ll_forget
100003459d0 1
   -63> 2015-04-27 11:25:32.263886 7ff7c08ee700  3 client.74478 ll_getattr
100003464c2.head
   -62> 2015-04-27 11:25:32.263889 7ff7c08ee700  3 client.74478 ll_getattr
100003464c2.head = 0
   -61> 2015-04-27 11:25:32.263891 7ff7c3fff700  3 client.74478 ll_forget
100003459d0 1
   -60> 2015-04-27 11:25:32.263900 7ff7c08ee700  3 client.74478 ll_forget
100003464c2 1
   -59> 2015-04-27 11:25:32.263911 7ff7f2ffd700  3 client.74478 ll_lookup
0x7ff6de277990 5_o_rectImg.png
   -58> 2015-04-27 11:25:32.263924 7ff7f2ffd700  1 -- 192.168.1.201:0/24527
--> 192.168.1.210:6800/1299 -- client_request(client.74478:1304984 lookup
#1000001a4d7/5_o_rectImg.png 2015-04-27 11:25:32.263921) v2 -- ?+0
0x7ff7d8010a50 con 0x2b43690
   -57> 2015-04-27 11:25:32.264026 7ff703fff700  3 client.74478 ll_getattr
10000000000.head
   -56> 2015-04-27 11:25:32.264031 7ff703fff700  3 client.74478 ll_getattr
10000000000.head = 0
   -55> 2015-04-27 11:25:32.264035 7ff703fff700  3 client.74478 ll_forget
10000000000 1
   -54> 2015-04-27 11:25:32.264046 7ff7f27fc700  3 client.74478 ll_lookup
0x7ff80000ad70 backup
   -53> 2015-04-27 11:25:32.264052 7ff7f27fc700  3 client.74478 ll_lookup
0x7ff80000ad70 backup -> 0 (100000003e9)
   -52> 2015-04-27 11:25:32.264057 7ff7f27fc700  3 client.74478 ll_forget
10000000000 1
   -51> 2015-04-27 11:25:32.264071 7ff7f1ffb700  3 client.74478 ll_getattr
100000003e9.head
   -50> 2015-04-27 11:25:32.264076 7ff7f1ffb700  3 client.74478 ll_getattr
100000003e9.head = 0
   -49> 2015-04-27 11:25:32.264080 7ff7f1ffb700  3 client.74478 ll_forget
100000003e9 1
   -48> 2015-04-27 11:25:32.264092 7ff7c18f0700  3 client.74478 ll_lookup
0x7ff80000b6c0 11
   -47> 2015-04-27 11:25:32.264098 7ff7c18f0700  3 client.74478 ll_lookup
0x7ff80000b6c0 11 -> 0 (100000b883c)
   -46> 2015-04-27 11:25:32.264104 7ff7c18f0700  3 client.74478 ll_forget
100000003e9 1
   -45> 2015-04-27 11:25:32.264118 7ff7f17fa700  3 client.74478 ll_getattr
100000b883c.head
   -44> 2015-04-27 11:25:32.264124 7ff7f17fa700  3 client.74478 ll_getattr
100000b883c.head = 0
   -43> 2015-04-27 11:25:32.264129 7ff7f17fa700  3 client.74478 ll_forget
100000b883c 1
   -42> 2015-04-27 11:25:32.264141 7ff7c10ef700  3 client.74478 ll_lookup
0x7ff7d3933130 BDH2EY0784
   -41> 2015-04-27 11:25:32.264145 7ff7c10ef700  3 client.74478 ll_lookup
0x7ff7d3933130 BDH2EY0784 -> 0 (1000033ebe6)
   -40> 2015-04-27 11:25:32.264150 7ff7c10ef700  3 client.74478 ll_forget
100000b883c 1
   -39> 2015-04-27 11:25:32.264163 7ff7037fe700  3 client.74478 ll_getattr
1000033ebe6.head
   -38> 2015-04-27 11:25:32.264166 7ff7037fe700  3 client.74478 ll_getattr
1000033ebe6.head = 0
   -37> 2015-04-27 11:25:32.264170 7ff7037fe700  3 client.74478 ll_forget
1000033ebe6 1
   -36> 2015-04-27 11:25:32.264182 7ff7c3fff700  3 client.74478 ll_lookup
0x7ff6fd5f81e0 11-BDH2EY0784-BDH2EY0784B60M
   -35> 2015-04-27 11:25:32.264188 7ff7c3fff700  3 client.74478 ll_lookup
0x7ff6fd5f81e0 11-BDH2EY0784-BDH2EY0784B60M -> 0 (100003436d6)
   -34> 2015-04-27 11:25:32.264198 7ff7c3fff700  3 client.74478 ll_forget
1000033ebe6 1
   -33> 2015-04-27 11:25:32.264204 7ff7c3fff700  3 client.74478 ll_getattr
100003436d6.head
   -32> 2015-04-27 11:25:32.264208 7ff7c3fff700  3 client.74478 ll_getattr
100003436d6.head = 0
   -31> 2015-04-27 11:25:32.264214 7ff7c3fff700  3 client.74478 ll_forget
100003436d6 1
   -30> 2015-04-27 11:25:32.264219 7ff7c3fff700  3 client.74478 ll_lookup
0x7ff6ed91fd00 2822
   -29> 2015-04-27 11:25:32.264224 7ff7c3fff700  3 client.74478 ll_lookup
0x7ff6ed91fd00 2822 -> 0 (100003459d0)
   -28> 2015-04-27 11:25:32.264232 7ff7c3fff700  3 client.74478 ll_forget
100003436d6 1
   -27> 2015-04-27 11:25:32.264238 7ff7c3fff700  3 client.74478 ll_getattr
100003459d0.head
   -26> 2015-04-27 11:25:32.264242 7ff7c3fff700  3 client.74478 ll_getattr
100003459d0.head = 0
   -25> 2015-04-27 11:25:32.264248 7ff7c3fff700  3 client.74478 ll_forget
100003459d0 1
   -24> 2015-04-27 11:25:32.264261 7ff7c08ee700  3 client.74478 ll_lookup
0x7ff6ed92e8c0 4_o_contour.jpg
   -23> 2015-04-27 11:25:32.264267 7ff7c08ee700  3 client.74478 ll_lookup
0x7ff6ed92e8c0 4_o_contour.jpg -> 0 (100003464c2)
   -22> 2015-04-27 11:25:32.264275 7ff7c08ee700  3 client.74478 ll_forget
100003459d0 1
   -21> 2015-04-27 11:25:32.264282 7ff7c08ee700  3 client.74478 ll_getattr
100003464c2.head
   -20> 2015-04-27 11:25:32.264286 7ff7c08ee700  3 client.74478 ll_getattr
100003464c2.head = 0
   -19> 2015-04-27 11:25:32.264272 7ff80bfff700  1 -- 192.168.1.201:0/24527
<== mds.0 192.168.1.210:6800/1299 1640799 ==== client_reply(???:1304984 = 0
(0) Success) v1 ==== 662+0+0 (1952303493 0 0) 0x7ff7f8003830 con 0x2b43690
   -18> 2015-04-27 11:25:32.264295 7ff7c08ee700  3 client.74478 ll_forget
100003464c2 1
   -17> 2015-04-27 11:25:32.264321 7ff7c08ee700  3 client.74478 ll_open
100003464c2.head 32768
   -16> 2015-04-27 11:25:32.264327 7ff7c08ee700  1 -- 192.168.1.201:0/24527
--> 192.168.1.210:6800/1299 -- client_caps(update ino 100003464c2 12575430
seq 4 caps=pAsLsXsFscr dirty=- wanted=pFscr follows 0 size 738117/0 ts 1
mtime 2015-04-24 10:17:11.805738 tws 1) v5 -- ?+0 0x2ba68e0 con 0x2b43690
   -15> 2015-04-27 11:25:32.264358 7ff7c08ee700  3 client.74478 ll_open
100003464c2.head 32768 = 0 (0x2b934c0)
   -14> 2015-04-27 11:25:32.264366 7ff7c08ee700  3 client.74478 ll_forget
100003464c2 1
   -13> 2015-04-27 11:25:32.264382 7ff7f2ffd700  3 client.74478 ll_lookup
0x7ff6de277990 5_o_rectImg.png -> 0 (1000001c1bb)
   -12> 2015-04-27 11:25:32.264394 7ff7f2ffd700  3 client.74478 ll_forget
1000001a4d7 1
   -11> 2015-04-27 11:25:32.264403 7ff703fff700  3 client.74478 ll_getattr
100003464c2.head
   -10> 2015-04-27 11:25:32.264411 7ff703fff700  3 client.74478 ll_getattr
100003464c2.head = 0
    -9> 2015-04-27 11:25:32.264635 7ff810178700  2 -- 192.168.1.201:0/24527
>> 192.168.1.212:6804/3029 pipe(0x7ff7ec023730 sd=2 :55878 s=2 pgs=169 cs=1
l=1 c=0x7ff7ec03c5a0).reader couldn't read tag, (11) Resource temporarily
unavailable
    -8> 2015-04-27 11:25:32.264659 7ff810178700  2 -- 192.168.1.201:0/24527
>> 192.168.1.212:6804/3029 pipe(0x7ff7ec023730 sd=2 :55878 s=2 pgs=169 cs=1
l=1 c=0x7ff7ec03c5a0).fault (11) Resource temporarily unavailable
    -7> 2015-04-27 11:25:32.264695 7ff80bfff700  1 client.74478.objecter
ms_handle_reset on osd.18
    -6> 2015-04-27 11:25:32.264709 7ff80bfff700  1 -- 192.168.1.201:0/24527
mark_down 0x7ff7ec03c5a0 -- pipe dne
    -5> 2015-04-27 11:25:32.264761 7ff80bfff700 10 monclient: renew_subs
    -4> 2015-04-27 11:25:32.264767 7ff80bfff700 10 monclient:
_send_mon_message to mon.node0 at 192.168.1.210:6789/0
    -3> 2015-04-27 11:25:32.264774 7ff80bfff700  1 -- 192.168.1.201:0/24527
--> 192.168.1.210:6789/0 --
mon_subscribe({mdsmap=26+,monmap=4+,osdmap=178}) v2 -- ?+0 0x7ff6dfe539e0
con 0x2b40800
    -2> 2015-04-27 11:25:32.265065 7ff80bfff700  1 -- 192.168.1.201:0/24527
<== mon.0 192.168.1.210:6789/0 8520 ==== mon_subscribe_ack(300s) v1 ====
20+0+0 (946084862 0 0) 0x7ff7fc0090f0 con 0x2b40800
    -1> 2015-04-27 11:25:32.265075 7ff80bfff700 10 monclient:
handle_subscribe_ack sent 2015-04-27 11:25:32.264766 renew after 2015-04-27
11:28:02.264766
     0> 2015-04-27 11:25:32.276419 7ff7f27fc700 -1 *** Caught signal
(Segmentation fault) **
 in thread 7ff7f27fc700

 ceph version 0.94.1 (e4bfad3a3c51054df7e537a724c8d0bf9be972ff)
 1: ceph-fuse() [0x62118a]
 2: (()+0x10340) [0x7ff81b9b3340]
 3: (Client::_get_vino(Inode*)+0) [0x54b9f0]
 4: (Client::ll_getattr(Inode*, stat*, int, int)+0x3e) [0x58e3ae]
 5: ceph-fuse() [0x54774d]
 6: (()+0x14b75) [0x7ff81bdebb75]
 7: (()+0x1525b) [0x7ff81bdec25b]
 8: (()+0x11e79) [0x7ff81bde8e79]
 9: (()+0x8182) [0x7ff81b9ab182]
 10: (clone()+0x6d) [0x7ff81a32ffbd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 rbd_replay
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/10 civetweb
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
   0/ 0 refs
   1/ 5 xio
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/ceph-client.admin.log
--- end dump of recent events ---
2015-04-27 11:26:28.009698 7f1586bb97c0  0 ceph version 0.94.1
(e4bfad3a3c51054df7e537a724c8d0bf9be972ff), process ceph-fuse, pid 26677
2015-04-27 11:26:28.011671 7f1586bb97c0 -1 init, newargv = 0x3336360
newargc=11
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
