Re: [ceph-users] ceph-fuse auto down
I attach the filesystem locally with this command:

    ceph-fuse -k /etc/ceph.new/ceph.client.admin.keyring -m 10.3.1.11,10.3.1.12,10.3.1.13:6789 /data

The key is correct. I attached client1.tar in my last mail. Please check it. Thank you!

2015-09-13 15:12 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:

> How do you attach filesystem to local file?
>
> Make sure, keyring is located at:
>
>     /etc/ceph.new/ceph.client.admin.keyring
>
> And your cluster, public networks are fine.
>
> If you face same problem again, check:
>
>     uptime
>
> And how about this:
>
> > tar cvf .tar \
> >     /sys/class/net//statistics/*
>
> When did you face this issue?
> From the beginning or...?
>
> Shinobu
>
> ----- Original Message -----
> From: "谷枫" <feiche...@gmail.com>
> To: "Shinobu Kinjo" <ski...@redhat.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Sunday, September 13, 2015 12:06:25 PM
> Subject: Re: [ceph-users] ceph-fuse auto down
>
> All clients use the same ceph-fuse version, and all of them are troubled
> by this problem. Only the crash times differ.
>
> 2015-09-13 10:39 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:
>
> > So you are using same version on other clients?
> > But only one client has problem?
> >
> > Can you provide:
> >
> >     /sys/class/net//statistics/*
> >
> > just do:
> >
> >     tar cvf .tar \
> >         /sys/class/net//statistics/*
> >
> > Can you hold when same issue happen next?
> > No reboot is necessary.
> >
> > But if you have to reboot, of course you can.
> >
> > Shinobu
> >
> > [...]
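One thing worth double-checking in the mount command above: the `-m` monitor list must reach ceph-fuse as a single comma-separated argument with no embedded spaces; if line wrapping introduces a space, the shell splits it into separate arguments and the command is misparsed. A small sketch (IPs and keyring path taken from this thread; the actual mount is commented out since it needs the live cluster):

```shell
# Monitor list as a single, space-free argument (IPs from the thread).
mon_list="10.3.1.11:6789,10.3.1.12:6789,10.3.1.13:6789"

# Reject a list that was mangled by line wrapping (e.g. "..., 10.3.1.13:6789").
case "$mon_list" in
  *[[:space:]]*) echo "bad monitor list: contains whitespace" ;;
  *)             echo "monitor list ok" ;;
esac

# ceph-fuse -k /etc/ceph.new/ceph.client.admin.keyring -m "$mon_list" /data
```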
Re: [ceph-users] ceph-fuse auto down
Hi Shinobu,

I found the logrotate script at /etc/logrotate.d/ceph. In this script the
osd, mon and mds daemons are reloaded when rotation is done. The logrotate
run and the ceph-fuse crashes mostly happen at the same time, so I think the
problem is related to this. What do you think?

The code snippet in /etc/logrotate.d/ceph:
***
for daemon in osd mon mds ; do
    find -L /var/lib/ceph/$daemon/ -mindepth 1 -maxdepth 1 \
        -regextype posix-egrep -regex '.*/[A-Za-z0-9]+-[A-Za-z0-9._-]+' -printf '%P\n' \
    | while read f; do
        if [ -e "/var/lib/ceph/$daemon/$f/done" -o -e "/var/lib/ceph/$daemon/$f/ready" ] \
            && [ -e "/var/lib/ceph/$daemon/$f/upstart" ] \
            && [ ! -e "/var/lib/ceph/$daemon/$f/sysvinit" ]; then
            cluster="${f%%-*}"
            id="${f#*-}"

            initctl reload ceph-$daemon cluster="$cluster" id="$id" 2>/dev/null || :
        fi
    done
done
***
Thank you!

2015-09-14 8:50 GMT+08:00 谷枫 <feiche...@gmail.com>:

> I attach filesystem to local use this command: ceph-fuse -k
> /etc/ceph.new/ceph.client.admin.keyring -m 10.3.1.11,10.3.1.12,10.3.1.13:6789 /data.
>
> The key is right.
> I attach the client1.tar in last mail. Please check it. Thank you!
>
> [...]
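The postrotate loop above only reloads osd/mon/mds daemons whose data directory looks upstart-managed. Its decision can be pulled out into a small function and exercised against a throwaway directory tree instead of the real /var/lib/ceph (a sketch; `should_reload` is a hypothetical helper, not part of the script):

```shell
# should_reload: hypothetical extraction of the test the postrotate loop
# applies to each /var/lib/ceph/<daemon>/<cluster>-<id> directory.
should_reload() {
  d="$1"
  { [ -e "$d/done" ] || [ -e "$d/ready" ]; } \
      && [ -e "$d/upstart" ] && [ ! -e "$d/sysvinit" ]
}

# Check it against a fake layout rather than the production one.
root=$(mktemp -d)
mkdir -p "$root/mds/ceph-0"
touch "$root/mds/ceph-0/done" "$root/mds/ceph-0/upstart"

if should_reload "$root/mds/ceph-0"; then
  echo "ceph-mds would be reloaded"
fi

rm -rf "$root"
```

Note that ceph-fuse is not in the `for daemon in osd mon mds` list, so if client crashes line up with the rotation it may be worth checking whether the rotation of /var/log/ceph/*.log on the client delivers a signal (e.g. SIGHUP) to the ceph-fuse process itself.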
Re: [ceph-users] ceph-fuse auto down
I deployed the cluster with ceph-deploy; the script was put there by
ceph-deploy, not written by me.

2015-09-14 10:01 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:

> Did you made that script, or be there by default?
>
> Shinobu
>
> [...]
Re: [ceph-users] ceph-fuse auto down
The logrotate runs at 6:25 every day; I see this in the crontab (the line
wrapped in asterisks below is the one I highlighted in red):

cat /etc/crontab
SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# m h dom mon dow user  command
17 *  * * *  root  cd / && run-parts --report /etc/cron.hourly
*25 6  * * *  root  test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )*
47 6  * * 7  root  test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
52 6  1 * *  root  test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )
#

This time mostly matches the crash time.

2015-09-14 9:48 GMT+08:00 谷枫 <feiche...@gmail.com>:

> Hi Shinobu,
>
> I found the logrotate script at /etc/logrotate.d/ceph. In this script the
> osd, mon and mds daemons are reloaded when rotation is done.
> The logrotate run and the ceph-fuse crashes mostly happen at the same time,
> so I think the problem is related to this. What do you think?
>
> [...]
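The correlation can be checked mechanically. A minimal POSIX-shell sketch (`to_min` and `within_window` are hypothetical helpers): take the 06:25 cron.daily start as the reference and test whether a crash timestamp, such as the 06:37:47 stamp in the apport report earlier in this thread, falls shortly after it.

```shell
# within_window CRASH(HH:MM) CRON(HH:MM) WINDOW_MIN
# Succeeds when the crash time falls 0..WINDOW_MIN minutes after the cron time.
to_min() {
  h=${1%%:*}; m=${1##*:}
  # strip one leading zero so "06"/"08" are not read as octal
  echo $(( ${h#0} * 60 + ${m#0} ))
}
within_window() {
  diff=$(( $(to_min "$1") - $(to_min "$2") ))
  [ "$diff" -ge 0 ] && [ "$diff" -le "$3" ]
}

# The apport report was stamped Sat Sep 12 06:37:47; cron.daily fires at 06:25.
if within_window 06:37 06:25 30; then
  echo "crash time falls inside the logrotate window"
fi
```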
Re: [ceph-users] ceph-fuse auto down
Thank you, Shinobu and Zheng Yan.

The ceph cluster runs in production and I can't put up with it crashing
every day, so I switched to the kernel mount yesterday. I will try to test
logrotate together with ceph-fuse on some other server; I will have to
generate a large amount of reads/writes, because with little or no I/O
ceph-fuse does not crash.

@Shinobu,
Can you do me a favor and inspect the coredump with gdb? I have no
experience with C/C++ or gdb. But if you are very busy and have no time for
this, that's OK; thank you for your help.

2015-09-14 11:24 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:

> Yes, that is exactly what I'm going to do.
> Thanks for your follow-up.
>
> Shinobu
>
> ----- Original Message -----
> From: "Zheng Yan" <uker...@gmail.com>
> To: "谷枫" <feiche...@gmail.com>
> Cc: "Shinobu Kinjo" <ski...@redhat.com>, "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Monday, September 14, 2015 12:19:44 PM
> Subject: Re: [ceph-users] ceph-fuse auto down
>
> If it's caused by a ceph-fuse crash, please enable coredump when using
> ceph-fuse. When ceph-fuse crashes, use gdb to inspect the coredump and
> send the calltrace to us. This will help us locate the bug quickly.
>
> Yan, Zheng

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] ceph-fuse auto down
Roger that!

2015-09-14 12:14 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:

> Before dumping core:
>
>     ulimit -c unlimited
>
> After dumping core:
>
>     gdb
>
>     # Just do backtrace
>     (gdb) bt
>
> # There would be some signal.
> Then give full output to us.
>
> Shinobu
>
> [...]
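Put together, the workflow Shinobu describes looks roughly like the sketch below. The core file location depends on kernel.core_pattern (and, on Ubuntu, on whether apport intercepts cores), so the paths here are assumptions:

```shell
# 1. Allow unlimited core dumps in the shell that will start ceph-fuse.
ulimit -c unlimited
echo "core limit is now: $(ulimit -c)"

# 2. (Assumption: write cores to /tmp instead of handing them to apport.)
#    echo '/tmp/core.%e.%p' > /proc/sys/kernel/core_pattern   # as root

# 3. After the next crash, extract the backtrace non-interactively:
#    gdb /usr/bin/ceph-fuse /tmp/core.ceph-fuse.<pid> \
#        -batch -ex 'bt' -ex 'thread apply all bt'
```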
Re: [ceph-users] ceph-fuse auto down
Sorry Shinobu,
I don't understand what the output you pasted means.
Multiple ceph-fuse instances crashed just now today.
ceph-fuse is completely unusable for me now.
Maybe I must switch to the kernel mount instead.

2015-09-12 20:08 GMT+08:00 Shinobu Kinjo:

> In _usr_bin_ceph-fuse.0.crash.client2.tar
>
> What I'm seeing now is:
>
>   3 Date: Sat Sep 12 06:37:47 2015
> ...
>   6 ExecutableTimestamp: 1440614242
> ...
>   7 ProcCmdline: ceph-fuse -k /etc/ceph.new/ceph.client.admin.keyring -m
>     10.3.1.11,10.3.1.12,10.3.1.13 /grdata
> ...
>  30 7f32de7fe000-7f32deffe000 rw-p 00:00 0  [stack:17270]
> ...
> 250 7f341021d000-7f3410295000 r-xp fd:01 267219  /usr/lib/x86_64-linux-gnu/nss/libfreebl3.so
> ...
> 255 7f341049b000-7f341054f000 r-xp fd:01 266443  /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6
> ...
> 260 7f3410754000-7f3410794000 r-xp fd:01 267222  /usr/lib/x86_64-linux-gnu/nss/libsoftokn3.so
> ...
> 266 7f3411197000-7f341119a000 r-xp fd:01 264953  /usr/lib/x86_64-linux-gnu/libplds4.so
> ...
> 271 7f341139f000-7f341159e000 ---p 4000 fd:01 264955  /usr/lib/x86_64-linux-gnu/libplc4.so
> ...
> 274 7f34115a-7f34115c5000 r-xp fd:01 267214  /usr/lib/x86_64-linux-gnu/libnssutil3.so
> ...
> 278 7f34117cb000-7f34117ce000 r-xp fd:01 1189512  /lib/x86_64-linux-gnu/libdl-2.19.so
> ...
> 287 7f3411d94000-7f3411daa000 r-xp fd:01 1179825  /lib/x86_64-linux-gnu/libgcc_s.so.1
> ...
> 294 7f34122b-7f3412396000 r-xp fd:01 266069  /usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19
> ...
> 458 State: D (disk sleep)
> ...
> 359 VmPeak: 5250648 kB
> 360 VmSize: 4955592 kB
> ...
>
> What were you trying to do?
>
> Shinobu
Re: [ceph-users] ceph-fuse auto down
Yes, when a ceph-fuse instance crashes, the mount is gone and can't be
remounted; rebooting the server is the only thing I can do.
But other clients with ceph-fuse mounts are working well and can read and
write data.

ceph-fuse --version
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)

ceph -s
    cluster 0fddc8e0-9e64-4049-902a-2f0f6d531630
     health HEALTH_OK
     monmap e1: 3 mons at {ceph01=10.3.1.11:6789/0,ceph02=10.3.1.12:6789/0,ceph03=10.3.1.13:6789/0}
            election epoch 8, quorum 0,1,2 ceph01,ceph02,ceph03
     mdsmap e29: 1/1/1 up {0=ceph04=up:active}, 1 up:standby
     osdmap e26: 4 osds: 4 up, 4 in
      pgmap v94931: 320 pgs, 3 pools, 90235 MB data, 241 kobjects
            289 GB used, 1709 GB / 1999 GB avail
                 320 active+clean
  client io 1023 kB/s rd, 1210 kB/s wr, 72 op/s

2015-09-13 10:23 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:

> Can you give us package version of ceph-fuse?
>
> > Multi ceph-fuse crash just now today.
>
> Did you just mount filesystem or was there any
> activity on filesystem?
>
> e.g: writing / reading data
>
> Can you give us output of on cluster side:
>
>     ceph -s
>
> Shinobu
>
> [...]
Re: [ceph-users] ceph-fuse auto down
_usr_bin_ceph-fuse.0.crash.client1.tar.gz
<https://drive.google.com/file/d/0Bw059OTYnFqAbDA2R0RlS3ZHNlU/view?usp=drive_web>
_usr_bin_ceph-fuse.0.crash.client2.tar.gz
<https://drive.google.com/file/d/0Bw059OTYnFqAMXVCWTV6UXVnRjg/view?usp=drive_web>

Hi Shinobu,
I checked /var/log/dmesg carefully again, but found no useful message about
the ceph-fuse crash. So I am attaching the two _usr_bin_ceph-fuse.0.crash
files from the two clients. Please let me know if you want any other log.
Thank you very much!

2015-09-12 13:01 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:

> Ah, you are using ubuntu, sorry for that.
> How about:
>
>     /var/log/dmesg
>
> I believe you can attach file not paste.
> Pasting a bunch of logs would not be good for me -;
>
> And when did you notice that cephfs was hung?
>
> Shinobu
>
> [...]
Re: [ceph-users] ceph-fuse auto down
Sorry about that. I re-attach the crash log files and the MDS logs.
The MDS log shows that the client session timed out and then the MDS closed
the socket, right? I think this happened after the ceph-fuse crash, so the
root cause is the ceph-fuse crash itself.

_usr_bin_ceph-fuse.0.crash.client1.tar.gz
<https://drive.google.com/file/d/0Bw059OTYnFqAbDA2R0RlS3ZHNlU/view?usp=drive_web>
_usr_bin_ceph-fuse.0.crash.client2.tar.gz
<https://drive.google.com/file/d/0Bw059OTYnFqAMXVCWTV6UXVnRjg/view?usp=drive_web>

2015-09-12 19:10 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:

> Thank you for log archives.
>
> I went to dentist -;
> Please do not forget CCing ceph-users from the next because there is a
> bunch of really **awesome** guys;
>
> Can you re-attach log files again so that they see?
>
> Shinobu
>
> [...]
[ceph-users] ceph-fuse auto down
Hi all,
My cephfs cluster is deployed on three nodes with Ceph Hammer 0.94.3 on
Ubuntu 14.04; the kernel version is 3.19.0.

I mount cephfs with ceph-fuse on 9 clients, but on some of them the
ceph-fuse process goes down sometimes and I can't find the reason. There
seem to be no logs other than /var/log/ceph/ceph-client.admin.log, which
contains nothing useful for me.

When ceph-fuse goes down, the mount is gone.
How can I find the cause of this problem? Can someone give me good ideas?

Regards
Re: [ceph-users] ceph-fuse auto down
hi,Shinobu There is no /var/log/messages on my system but i saw the /var/log/syslog and no useful messages be found. I discover the /var/crash/_usr_bin_ceph-fuse.0.crash with grep the "fuse" on the system. Below is the message in it : ProcStatus: Name: ceph-fuse State: D (disk sleep) Tgid: 2903 Ngid: 0 Pid: 2903 PPid: 1 TracerPid: 0 Uid: 0 0 0 0 Gid: 0 0 0 0 FDSize:64 Groups:0 VmPeak: 7428552 kB VmSize: 6838728 kB VmLck:0 kB VmPin:0 kB VmHWM: 1175864 kB VmRSS: 343116 kB VmData: 6786232 kB VmStk: 136 kB VmExe: 5628 kB VmLib: 7456 kB VmPTE: 3404 kB VmSwap: 0 kB Threads: 37 SigQ: 1/64103 SigPnd: ShdPnd: SigBlk:1000 SigIgn:1000 SigCgt:0001c18040eb CapInh: CapPrm:003f CapEff:003f CapBnd:003f Seccomp: 0 Cpus_allowed: Cpus_allowed_list: 0-15 Mems_allowed: ,0001 Mems_allowed_list: 0 voluntary_ctxt_switches: 25 nonvoluntary_ctxt_switches:2 Signal: 11 Uname: Linux 3.19.0-28-generic x86_64 UserGroups: CoreDump: base64 Is this useful infomations? 2015-09-12 12:33 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>: > There should be some complains in /var/log/messages. > Can you attach? > > Shinobu > > - Original Message - > From: "谷枫" <feiche...@gmail.com> > To: "ceph-users" <ceph-users@lists.ceph.com> > Sent: Saturday, September 12, 2015 1:30:49 PM > Subject: [ceph-users] ceph-fuse auto down > > Hi,all > My cephfs cluster deploy on three nodes with Ceph Hammer 0.94.3 on Ubuntu > 14.04 the kernal version is 3.19.0. > > I mount the cephfs with ceph-fuse on 9 clients,but some of them (ceph-fuse > process) auto down sometimes and i can't find the reason seems like there > is no other logs can be found except this file > /var/log/ceph/ceph-client.admin.log that without useful messages for me. > > When the ceph-fuse down . The mount driver is gone. > How can i find the reason of this problem. Can some guys give me good > ideas? 
> > Regards
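For what it's worth, the ProcStatus block in the crash report is just the `Key: value` text of /proc/<pid>/status, so fields like State, VmRSS, and Threads are easy to pull out with a few lines (an illustrative sketch, not from the thread):

```python
def parse_proc_status(text):
    """Parse /proc/<pid>/status-style 'Key: value' lines into a dict."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

# A few fields from the crash report above:
sample = """Name: ceph-fuse
State: D (disk sleep)
VmRSS: 343116 kB
Threads: 37"""

status = parse_proc_status(sample)
print(status["State"])    # D (disk sleep)
print(status["Threads"])  # 37
```

Note the `State: D (disk sleep)` here: an uninterruptible-sleep ceph-fuse is consistent with a hung FUSE mount.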
Re: [ceph-users] mds0: Client failing to respond to cache pressure
I changed mds_cache_size to 50 from 10 and got rid of the WARN temporarily. Dumping the mds daemon now shows:

inode_max: 50, inodes: 124213

But I have no idea what to do if the inode count rises above 50 again. Change mds_cache_size again? Thanks.

2015-07-15 11:06 GMT+08:00 Eric Eastman <eric.east...@keepertech.com>:
> Hi John,
>
> I cut the test down to a single client running only Ganesha NFS, without
> any ceph drivers loaded on the Ceph FS client. After deleting all the
> files in the Ceph file system and rebooting all the nodes, I restarted
> the create-5-million-files test using 2 NFS clients against the one Ceph
> file system node running Ganesha NFS. After a couple hours I am seeing
> the client ede-c2-gw01 "failing to respond to cache pressure" error:
>
> $ ceph -s
>     cluster 6d8aae1e-1125-11e5-a708-001b78e265be
>      health HEALTH_WARN
>             mds0: Client ede-c2-gw01 failing to respond to cache pressure
>      monmap e1: 3 mons at {ede-c2-mon01=10.15.2.121:6789/0,ede-c2-mon02=10.15.2.122:6789/0,ede-c2-mon03=10.15.2.123:6789/0}
>             election epoch 22, quorum 0,1,2 ede-c2-mon01,ede-c2-mon02,ede-c2-mon03
>      mdsmap e1860: 1/1/1 up {0=ede-c2-mds02=up:active}, 2 up:standby
>      osdmap e323: 8 osds: 8 up, 8 in
>       pgmap v302142: 832 pgs, 4 pools, 162 GB data, 4312 kobjects
>             182 GB used, 78459 MB / 263 GB avail
>                  832 active+clean
>
> Dumping the mds daemon shows inodes above inode_max:
>
> # ceph daemon mds.ede-c2-mds02 perf dump mds
> {
>     "mds": {
>         "request": 21862302,
>         "reply": 21862302,
>         "reply_latency": {
>             "avgcount": 21862302,
>             "sum": 16728.480772060
>         },
>         "forward": 0,
>         "dir_fetch": 13,
>         "dir_commit": 50788,
>         "dir_split": 0,
>         "inode_max": 10,
>         "inodes": 100010,
>         "inodes_top": 0,
>         "inodes_bottom": 0,
>         "inodes_pin_tail": 100010,
>         "inodes_pinned": 100010,
>         "inodes_expired": 4308279,
>         "inodes_with_caps": 8,
>         "caps": 8,
>         "subtrees": 2,
>         "traverse": 30802465,
>         "traverse_hit": 26394836,
>         "traverse_forward": 0,
>         "traverse_discover": 0,
>         "traverse_dir_fetch": 0,
>         "traverse_remote_ino": 0,
>         "traverse_lock": 0,
>         "load_cent": 2186230200,
>         "q": 0,
>         "exported": 0,
>         "exported_inodes": 0,
>         "imported": 0,
>         "imported_inodes": 0
>     }
> }
>
> Once this test finishes and I verify the files were all correctly
> written, I will retest using the SAMBA VFS interface, followed by the
> kernel test. Please let me know if there is more info you need and if
> you want me to open a ticket.
>
> Best regards
> Eric
>
> On Mon, Jul 13, 2015 at 9:40 AM, Eric Eastman <eric.east...@keepertech.com> wrote:
>> Thanks John. I will back the test down to the simple case of 1 client
>> without the kernel driver, running only NFS Ganesha, and work forward
>> till I trip the problem and report my findings.
>> Eric
>
> On Mon, Jul 13, 2015 at 2:18 AM, John Spray <john.sp...@redhat.com> wrote:
>> On 13/07/2015 04:02, Eric Eastman wrote:
>>> Hi John,
>>> I am seeing this problem with Ceph v9.0.1 with the v4.1 kernel on all
>>> nodes. This system is using 4 Ceph FS client systems. They all have
>>> the kernel driver version of CephFS loaded, but none are mounting the
>>> file system. All 4 clients are using the libcephfs VFS interface to
>>> Ganesha NFS (V2.2.0-2) and Samba (Version 4.3.0pre1-GIT-0791bb0) to
>>> share out the Ceph file system.
>>>
>>> # ceph -s
>>>     cluster 6d8aae1e-1125-11e5-a708-001b78e265be
>>>      health HEALTH_WARN
>>>             4 near full osd(s)
>>>             mds0: Client ede-c2-gw01 failing to respond to cache pressure
>>>             mds0: Client ede-c2-gw02:cephfs failing to respond to cache pressure
>>>             mds0: Client ede-c2-gw03:cephfs failing to respond to cache pressure
>>>      monmap e1: 3 mons at {ede-c2-mon01=10.15.2.121:6789/0,ede-c2-mon02=10.15.2.122:6789/0,ede-c2-mon03=10.15.2.123:6789/0}
>>>             election epoch 8, quorum 0,1,2 ede-c2-mon01,ede-c2-mon02,ede-c2-mon03
>>>      mdsmap e912: 1/1/1 up {0=ede-c2-mds03=up:active}, 2 up:standby
>>>      osdmap e272: 8 osds: 8 up, 8 in
>>>       pgmap v225264: 832 pgs, 4 pools, 188 GB data, 5173 kobjects
>>>             212 GB used, 48715 MB / 263 GB avail
>>>                  832 active+clean
>>>   client io 1379 kB/s rd, 20653 B/s wr, 98 op/s
>>
>> It would help if we knew whether it's the kernel clients or the
>> userspace clients that are generating the warnings here.
>>
>> You've probably already done this, but I'd get rid of any unused kernel
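As a side note, the inodes-vs-inode_max comparison made in the perf dumps above is easy to automate against the JSON from `ceph daemon mds.<id> perf dump`. A minimal sketch (the sample numbers below are hypothetical, since the counters in the archived mails appear truncated):

```python
def mds_cache_over_target(perf_dump):
    """Return (inodes, inode_max, over_target) from an MDS perf dump dict.

    over_target is True when the cache holds more inodes than
    mds_cache_size allows, which is what the 'failing to respond to
    cache pressure' warning is ultimately about.
    """
    mds = perf_dump["mds"]
    return mds["inodes"], mds["inode_max"], mds["inodes"] > mds["inode_max"]

# Hypothetical sample, shaped like the perf dump output above:
sample = {"mds": {"inodes": 124213, "inode_max": 100000}}
inodes, inode_max, over = mds_cache_over_target(sample)
if over:
    print("MDS cache over target: %d inodes vs inode_max %d" % (inodes, inode_max))
```

In practice the dict would come from `json.loads()` over the admin-socket output rather than being hard-coded.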
Re: [ceph-users] mds0: Client failing to respond to cache pressure
Thank you John,

All my servers are Ubuntu 14.04 with the 3.16 kernel. Not all clients show this problem, and the cluster seems to be functioning well now. As you suggest, I will change mds_cache_size to 50 from 10 and test. Thanks again!

2015-07-10 17:00 GMT+08:00 John Spray <john.sp...@redhat.com>:

This is usually caused by use of older kernel clients. I don't remember exactly what version it was fixed in, but iirc we've seen the problem with 3.14 and seen it go away with 3.18. If your system is otherwise functioning well, this is not a critical error -- it just means that the MDS might not be able to fully control its memory usage (i.e. it can exceed mds_cache_size).

John

On 10/07/2015 05:25, 谷枫 wrote:

hi,
I use CephFS in a production environment with 7 osd, 1 mds, 3 mon now. So far so good, but I ran into a problem with it today. ceph status reports:

    cluster ad3421a43-9fd4-4b7a-92ba-09asde3b1a228
     health HEALTH_WARN
            mds0: Client 34271 failing to respond to cache pressure
            mds0: Client 74175 failing to respond to cache pressure
            mds0: Client 74181 failing to respond to cache pressure
            mds0: Client 34247 failing to respond to cache pressure
            mds0: Client 64162 failing to respond to cache pressure
            mds0: Client 136744 failing to respond to cache pressure
     monmap e2: 3 mons at {node01=10.3.1.2:6789/0,node02=10.3.1.3:6789/0,node03=10.3.1.4:6789/0}
            election epoch 186, quorum 0,1,2 node01,node02,node03
     mdsmap e46: 1/1/1 up {0=tree01=up:active}
     osdmap e717: 7 osds: 7 up, 7 in
      pgmap v995836: 264 pgs, 3 pools, 51544 MB data, 118 kobjects
            138 GB used, 1364 GB / 1502 GB avail
                 264 active+clean
  client io 1018 B/s rd, 1273 B/s wr, 0 op/s

I added two osds running 0.94.2 yesterday, while the other old osds are on 0.94.1. Does this matter? What does the warning mean, and how can I solve this problem? Thanks!
This is my cluster config message with mds: name: mds.tree01, debug_mds: 1\/5, debug_mds_balancer: 1\/5, debug_mds_locker: 1\/5, debug_mds_log: 1\/5, debug_mds_log_expire: 1\/5, debug_mds_migrator: 1\/5, admin_socket: \/var\/run\/ceph\/ceph-mds.tree01.asok, log_file: \/var\/log\/ceph\/ceph-mds.tree01.log, keyring: \/var\/lib\/ceph\/mds\/ceph-tree01\/keyring, mon_max_mdsmap_epochs: 500, mon_mds_force_trim_to: 0, mon_debug_dump_location: \/var\/log\/ceph\/ceph-mds.tree01.tdump, client_use_random_mds: false, mds_data: \/var\/lib\/ceph\/mds\/ceph-tree01, mds_max_file_size: 1099511627776, mds_cache_size: 10, mds_cache_mid: 0.7, mds_max_file_recover: 32, mds_mem_max: 1048576, mds_dir_max_commit_size: 10, mds_decay_halflife: 5, mds_beacon_interval: 4, mds_beacon_grace: 15, mds_enforce_unique_name: true, mds_blacklist_interval: 1440, mds_session_timeout: 120, mds_revoke_cap_timeout: 60, mds_recall_state_timeout: 60, mds_freeze_tree_timeout: 30, mds_session_autoclose: 600, mds_health_summarize_threshold: 10, mds_reconnect_timeout: 45, mds_tick_interval: 5, mds_dirstat_min_interval: 1, mds_scatter_nudge_interval: 5, mds_client_prealloc_inos: 1000, mds_early_reply: true, mds_default_dir_hash: 2, mds_log: true, mds_log_skip_corrupt_events: false, mds_log_max_events: -1, mds_log_events_per_segment: 1024, mds_log_segment_size: 0, mds_log_max_segments: 30, mds_log_max_expiring: 20, mds_bal_sample_interval: 3, mds_bal_replicate_threshold: 8000, mds_bal_unreplicate_threshold: 0, mds_bal_frag: false, mds_bal_split_size: 1, mds_bal_split_rd: 25000, mds_bal_split_wr: 1, mds_bal_split_bits: 3, mds_bal_merge_size: 50, mds_bal_merge_rd: 1000, mds_bal_merge_wr: 1000, mds_bal_interval: 10, mds_bal_fragment_interval: 5, mds_bal_idle_threshold: 0, mds_bal_max: -1, mds_bal_max_until: -1, mds_bal_mode: 0, mds_bal_min_rebalance: 0.1, mds_bal_min_start: 0.2, mds_bal_need_min: 0.8, mds_bal_need_max: 1.2, mds_bal_midchunk: 0.3, mds_bal_minchunk: 0.001, mds_bal_target_removal_min: 5, 
mds_bal_target_removal_max: 10, mds_replay_interval: 1, mds_shutdown_check: 0, mds_thrash_exports: 0, mds_thrash_fragments: 0, mds_dump_cache_on_map: false, mds_dump_cache_after_rejoin: false, mds_verify_scatter: false, mds_debug_scatterstat: false, mds_debug_frag: false, mds_debug_auth_pins: false, mds_debug_subtrees: false, mds_kill_mdstable_at: 0, mds_kill_export_at: 0
[ceph-users] mds0: Client failing to respond to cache pressure
hi,
I use CephFS in a production environment with 7 osd, 1 mds, 3 mon now. So far so good, but I ran into a problem with it today. ceph status reports:

    cluster ad3421a43-9fd4-4b7a-92ba-09asde3b1a228
     health HEALTH_WARN
            mds0: Client 34271 failing to respond to cache pressure
            mds0: Client 74175 failing to respond to cache pressure
            mds0: Client 74181 failing to respond to cache pressure
            mds0: Client 34247 failing to respond to cache pressure
            mds0: Client 64162 failing to respond to cache pressure
            mds0: Client 136744 failing to respond to cache pressure
     monmap e2: 3 mons at {node01=10.3.1.2:6789/0,node02=10.3.1.3:6789/0,node03=10.3.1.4:6789/0}
            election epoch 186, quorum 0,1,2 node01,node02,node03
     mdsmap e46: 1/1/1 up {0=tree01=up:active}
     osdmap e717: 7 osds: 7 up, 7 in
      pgmap v995836: 264 pgs, 3 pools, 51544 MB data, 118 kobjects
            138 GB used, 1364 GB / 1502 GB avail
                 264 active+clean
  client io 1018 B/s rd, 1273 B/s wr, 0 op/s

I added two osds running 0.94.2 yesterday, while the other old osds are on 0.94.1. Does this matter? What does the warning mean, and how can I solve this problem? Thanks!
This is my cluster config message with mds: name: mds.tree01, debug_mds: 1\/5, debug_mds_balancer: 1\/5, debug_mds_locker: 1\/5, debug_mds_log: 1\/5, debug_mds_log_expire: 1\/5, debug_mds_migrator: 1\/5, admin_socket: \/var\/run\/ceph\/ceph-mds.tree01.asok, log_file: \/var\/log\/ceph\/ceph-mds.tree01.log, keyring: \/var\/lib\/ceph\/mds\/ceph-tree01\/keyring, mon_max_mdsmap_epochs: 500, mon_mds_force_trim_to: 0, mon_debug_dump_location: \/var\/log\/ceph\/ceph-mds.tree01.tdump, client_use_random_mds: false, mds_data: \/var\/lib\/ceph\/mds\/ceph-tree01, mds_max_file_size: 1099511627776, mds_cache_size: 10, mds_cache_mid: 0.7, mds_max_file_recover: 32, mds_mem_max: 1048576, mds_dir_max_commit_size: 10, mds_decay_halflife: 5, mds_beacon_interval: 4, mds_beacon_grace: 15, mds_enforce_unique_name: true, mds_blacklist_interval: 1440, mds_session_timeout: 120, mds_revoke_cap_timeout: 60, mds_recall_state_timeout: 60, mds_freeze_tree_timeout: 30, mds_session_autoclose: 600, mds_health_summarize_threshold: 10, mds_reconnect_timeout: 45, mds_tick_interval: 5, mds_dirstat_min_interval: 1, mds_scatter_nudge_interval: 5, mds_client_prealloc_inos: 1000, mds_early_reply: true, mds_default_dir_hash: 2, mds_log: true, mds_log_skip_corrupt_events: false, mds_log_max_events: -1, mds_log_events_per_segment: 1024, mds_log_segment_size: 0, mds_log_max_segments: 30, mds_log_max_expiring: 20, mds_bal_sample_interval: 3, mds_bal_replicate_threshold: 8000, mds_bal_unreplicate_threshold: 0, mds_bal_frag: false, mds_bal_split_size: 1, mds_bal_split_rd: 25000, mds_bal_split_wr: 1, mds_bal_split_bits: 3, mds_bal_merge_size: 50, mds_bal_merge_rd: 1000, mds_bal_merge_wr: 1000, mds_bal_interval: 10, mds_bal_fragment_interval: 5, mds_bal_idle_threshold: 0, mds_bal_max: -1, mds_bal_max_until: -1, mds_bal_mode: 0, mds_bal_min_rebalance: 0.1, mds_bal_min_start: 0.2, mds_bal_need_min: 0.8, mds_bal_need_max: 1.2, mds_bal_midchunk: 0.3, mds_bal_minchunk: 0.001, mds_bal_target_removal_min: 5, 
mds_bal_target_removal_max: 10, mds_replay_interval: 1, mds_shutdown_check: 0, mds_thrash_exports: 0, mds_thrash_fragments: 0, mds_dump_cache_on_map: false, mds_dump_cache_after_rejoin: false, mds_verify_scatter: false, mds_debug_scatterstat: false, mds_debug_frag: false, mds_debug_auth_pins: false, mds_debug_subtrees: false, mds_kill_mdstable_at: 0, mds_kill_export_at: 0, mds_kill_import_at: 0, mds_kill_link_at: 0, mds_kill_rename_at: 0, mds_kill_openc_at: 0, mds_kill_journal_at: 0, mds_kill_journal_expire_at: 0, mds_kill_journal_replay_at: 0, mds_journal_format: 1, mds_kill_create_at: 0, mds_inject_traceless_reply_probability: 0, mds_wipe_sessions: false, mds_wipe_ino_prealloc: false, mds_skip_ino: 0, max_mds: 1, mds_standby_for_name: , mds_standby_for_rank: -1, mds_standby_replay: false, mds_enable_op_tracker: true, mds_op_history_size: 20, mds_op_history_duration: 600, mds_op_complaint_time: 30, mds_op_log_threshold: 5, mds_snap_min_uid: 0, mds_snap_max_uid: 65536, mds_verify_backtrace: 1, mds_action_on_write_error: 1
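When several clients are flagged at once, pulling the client ids out of the health text makes it easier to track which ones recover after a cache-size change (a throwaway parsing sketch over status output like the one earlier in this thread; not something from the mails):

```python
import re

def cache_pressure_clients(status_text):
    """Extract client ids from 'failing to respond to cache pressure' lines."""
    return re.findall(r"Client (\S+) failing to respond to cache pressure",
                      status_text)

# Excerpt shaped like the 'ceph -s' health section above:
status = """health HEALTH_WARN
 mds0: Client 34271 failing to respond to cache pressure
 mds0: Client 74175 failing to respond to cache pressure
 mds0: Client 136744 failing to respond to cache pressure"""

print(cache_pressure_clients(status))  # ['34271', '74175', '136744']
```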
[ceph-users] MDS closing stale session
Hi everyone,

I have a five-node ceph cluster with CephFS; the ceph partition is mounted with the ceph-fuse tool. I met a serious problem with no omens: on one of the nodes the ceph-fuse process went down, and the ceph partition mounted with ceph-fuse became unavailable. ls on the ceph partition shows:

d? ? ?? ?? ceph-data/
Re: [ceph-users] MDS closing stale session
Sorry, I sent that mail carelessly. To continue:

The mds error is:

2015-06-05 09:59:25.012130 7fa1ed118700 0 -- 10.3.1.5:6800/1365 >> 10.3.1.4:0/18748 pipe(0x5f81000 sd=22 :6800 s=2 pgs=1252 cs=1 l=0 c=0x4f935a0).fault with nothing to send, going to standby
2015-06-05 10:03:40.767822 7fa1f0a27700 0 log_channel(cluster) log [INF] : closing stale session client.24153 10.3.1.4:0/18748 after 300.071624

The apport error is:

ERROR: apport (pid 30184) Fri Jun 5 12:13:03 2015: called for pid 6331, signal 11, core limit 0
ERROR: apport (pid 30184) Fri Jun 5 12:13:03 2015: executable: /usr/bin/ceph-fuse (command line ceph-fuse -k /etc/ceph/ceph.client.admin.keyring -m node01,node02,node03 /grdata)
ERROR: apport (pid 30184) Fri Jun 5 12:13:03 2015: is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment
ERROR: apport (pid 30184) Fri Jun 5 12:13:33 2015: wrote report /var/crash/_usr_bin_ceph-fuse.0.crash

But ceph -s looks OK:

    cluster add8fa43-9f84-4b5d-df32-095e3421a228
     health HEALTH_OK
     monmap e2: 3 mons at {node01=10.3.1.2:6789/0,node02=10.3.1.3:6789/0,node03=10.3.1.4:6789/0}
            election epoch 44, quorum 0,1,2 node01,node02,node03
     mdsmap e37: 1/1/1 up {0=osd01=up:active}
     osdmap e526: 5 osds: 5 up, 5 in
      pgmap v392315: 264 pgs, 3 pools, 26953 MB data, 106 kobjects
            81519 MB used, 1036 GB / 1115 GB avail
                 264 active+clean
  client io 10171 B/s wr, 1 op/s

When I remount the ceph partition, it returns to normal. I want to know: is this a ceph bug, or a bug in the ceph-fuse tool? Should I change the mount type to mount -t ceph?

2015-06-05 22:34 GMT+08:00 谷枫 <feiche...@gmail.com>:
> Hi everyone,
> I have a five-node ceph cluster with CephFS; the ceph partition is
> mounted with the ceph-fuse tool. I met a serious problem with no omens:
> on one of the nodes the ceph-fuse process went down and the ceph
> partition became unavailable. ls on the ceph partition shows:
>
> d? ? ?? ?? ceph-data/
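The "closing stale session" line above has a fixed shape, so the client id and how long the session had been stale can be pulled out mechanically (a regex sketch derived from the single log line shown; not a general MDS log parser):

```python
import re

# The exact MDS log line from the mail above:
line = ("2015-06-05 10:03:40.767822 7fa1f0a27700 0 log_channel(cluster) "
        "log [INF] : closing stale session client.24153 "
        "10.3.1.4:0/18748 after 300.071624")

m = re.search(r"closing stale session client\.(\d+) (\S+) after ([\d.]+)", line)
client_id, addr, stale_secs = m.group(1), m.group(2), float(m.group(3))
print(client_id, addr, stale_secs)  # 24153 10.3.1.4:0/18748 300.071624
```

Here the ~300 s matches the mds_session_autoclose-style timeout: the client stopped responding roughly five minutes before the MDS gave up on the session.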
Re: [ceph-users] MDS closing stale session
This is /var/log/ceph/ceph-client.admin.log, but I found that its timestamps are later than the fault:

2015-06-05 10:29:07.180531 7f3f601dd7c0 0 ceph version 0.94.1 (e4bfad3a3c51054df7e234234ac8d0bf9be972ff), process ceph-fuse, pid 14002
2015-06-05 10:29:07.186763 7f3f601dd7c0 -1 init, newargv = 0x2c846e0 newargc=11
2015-06-05 10:29:07.186832 7f3f601dd7c0 -1 fuse_parse_cmdline failed.
2015-06-05 10:29:59.349762 7fed376c37c0 0 ceph version 0.94.1 (e4bfad3a3c51054df7e234234ac8d0bf9be972ff), process ceph-fuse, pid 14283
2015-06-05 10:29:59.352227 7fed376c37c0 -1 init, newargv = 0x2ca06b0 newargc=11
2015-06-05 12:42:40.005628 7fed277fe700 0 monclient: hunting for new mon
2015-06-05 12:42:40.005772 7fed277fe700 0 client.44334 ms_handle_reset on 10.3.1.2:6789/0
2015-06-05 14:31:46.431353 7fed2c478700 0 -- 10.3.1.4:0/14285 >> 10.3.1.5:6800/1365 pipe(0x2cab000 sd=9 :56628 s=2 pgs=186 cs=1 l=0 c=0x2c9b790).fault, initiating reconnect
2015-06-05 14:31:46.433065 7fed2c579700 0 -- 10.3.1.4:0/14285 >> 10.3.1.5:6800/1365 pipe(0x2cab000 sd=9 :57017 s=1 pgs=186 cs=2 l=0 c=0x2c9b790).fault

2015-06-05 22:51 GMT+08:00 John Spray <john.sp...@redhat.com>:
> On 05/06/2015 15:41, 谷枫 wrote:
>> sorry i send this mail careless, continue
>>
>> The mds error is:
>> 2015-06-05 09:59:25.012130 7fa1ed118700 0 -- 10.3.1.5:6800/1365 >> 10.3.1.4:0/18748 pipe(0x5f81000 sd=22 :6800 s=2 pgs=1252 cs=1 l=0 c=0x4f935a0).fault with nothing to send, going to standby
>> 2015-06-05 10:03:40.767822 7fa1f0a27700 0 log_channel(cluster) log [INF] : closing stale session client.24153 10.3.1.4:0/18748 after 300.071624
>>
>> The apport error is:
>> ERROR: apport (pid 30184) Fri Jun 5 12:13:03 2015: called for pid 6331, signal 11, core limit 0
>> ERROR: apport (pid 30184) Fri Jun 5 12:13:03 2015: executable: /usr/bin/ceph-fuse (command line ceph-fuse -k /etc/ceph/ceph.client.admin.keyring -m node01,node02,node03 /grdata)
>> ERROR: apport (pid 30184) Fri Jun 5 12:13:03 2015: is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment
>> ERROR: apport (pid 30184) Fri Jun 5 12:13:33 2015: wrote report /var/crash/_usr_bin_ceph-fuse.0.crash
>
> Signal 11 is a segmentation fault, so it seems likely that ceph-fuse has
> hit a bad bug. Look in /var/log/ceph on the client machine for a client
> log with a backtrace in it.
>
> John
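Apport's "called for pid N, signal S" lines are likewise easy to scan mechanically when hunting for repeat crashes (a sketch over the exact line format shown above):

```python
import re

# The exact apport line from the mail above:
apport_line = ("ERROR: apport (pid 30184) Fri Jun 5 12:13:03 2015: "
               "called for pid 6331, signal 11, core limit 0")

m = re.search(r"called for pid (\d+), signal (\d+)", apport_line)
pid, sig = int(m.group(1)), int(m.group(2))
print("ceph-fuse pid %d died with signal %d" % (pid, sig))
```

Signal 11 (SIGSEGV) here backs up John's point that this is a crash in ceph-fuse itself rather than an orderly shutdown.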
Re: [ceph-users] MDS closing stale session
Sorry, I sent the wrong apport log, because I met the same problem twice today. This is the apport log from the right time:

ERROR: apport (pid 7601) Fri Jun 5 09:58:45 2015: called for pid 18748, signal 6, core limit 0
ERROR: apport (pid 7601) Fri Jun 5 09:58:45 2015: executable: /usr/bin/ceph-fuse (command line ceph-fuse -k /etc/ceph/ceph.client.admin.keyring -m rain01,rain02,rain03 /grdata)
ERROR: apport (pid 7601) Fri Jun 5 09:58:45 2015: is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment
ERROR: apport (pid 7601) Fri Jun 5 09:59:25 2015: wrote report /var/crash/_usr_bin_ceph-fuse.0.crash
ERROR: apport (pid 7733) Fri Jun 5 09:59:25 2015: pid 7578 crashed in a container

At that time, the osd node had error logs too:

2015-06-05 09:58:44.809822 7fac0de07700 0 -- 10.3.1.5:68**/ **.3.1.4:0/18748 pipe(0x10a8e000 sd=47 :6801 s=2 pgs=1253 cs=1 l=1 c=0x1112b860).reader bad tag 116
2015-06-05 09:58:44.812270 7fac0de07700 0 -- 10.3.1.5:68**/ **.3.1.4:0/18748 pipe(0x146e3000 sd=47 :6801 s=2 pgs=1359 cs=1 l=1 c=0x15310ec0).reader bad tag 32

2015-06-05 22:41 GMT+08:00 谷枫 <feiche...@gmail.com>:
> sorry i send this mail careless, continue
>
> The mds error is:
> 2015-06-05 09:59:25.012130 7fa1ed118700 0 -- 10.3.1.5:6800/1365 >> 10.3.1.4:0/18748 pipe(0x5f81000 sd=22 :6800 s=2 pgs=1252 cs=1 l=0 c=0x4f935a0).fault with nothing to send, going to standby
> 2015-06-05 10:03:40.767822 7fa1f0a27700 0 log_channel(cluster) log [INF] : closing stale session client.24153 10.3.1.4:0/18748 after 300.071624
>
> The apport error is:
> ERROR: apport (pid 30184) Fri Jun 5 12:13:03 2015: called for pid 6331, signal 11, core limit 0
> ERROR: apport (pid 30184) Fri Jun 5 12:13:03 2015: executable: /usr/bin/ceph-fuse (command line ceph-fuse -k /etc/ceph/ceph.client.admin.keyring -m node01,node02,node03 /grdata)
> ERROR: apport (pid 30184) Fri Jun 5 12:13:03 2015: is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment
> ERROR: apport (pid 30184) Fri Jun 5 12:13:33 2015: wrote report /var/crash/_usr_bin_ceph-fuse.0.crash
>
> But ceph -s is OK:
>     cluster add8fa43-9f84-4b5d-df32-095e3421a228
>      health HEALTH_OK
>      monmap e2: 3 mons at {node01=10.3.1.2:6789/0,node02=10.3.1.3:6789/0,node03=10.3.1.4:6789/0}
>             election epoch 44, quorum 0,1,2 node01,node02,node03
>      mdsmap e37: 1/1/1 up {0=osd01=up:active}
>      osdmap e526: 5 osds: 5 up, 5 in
>       pgmap v392315: 264 pgs, 3 pools, 26953 MB data, 106 kobjects
>             81519 MB used, 1036 GB / 1115 GB avail
>                  264 active+clean
>   client io 10171 B/s wr, 1 op/s
>
> When I remount the ceph partition, it returns to normal. Is this a ceph
> bug, or a bug in the ceph-fuse tool? Should I change the mount type to
> mount -t ceph?
>
> 2015-06-05 22:34 GMT+08:00 谷枫 <feiche...@gmail.com>:
>> Hi everyone,
>> I have a five-node ceph cluster with CephFS; the ceph partition is
>> mounted with the ceph-fuse tool. I met a serious problem with no omens:
>> on one of the nodes the ceph-fuse process went down and the ceph
>> partition became unavailable. ls on the ceph partition shows:
>>
>> d? ? ?? ?? ceph-data/