Re: [ceph-users] ceph-fuse auto down

2015-09-13 Thread
I mount the filesystem locally with this command: ceph-fuse -k
/etc/ceph.new/ceph.client.admin.keyring -m 10.3.1.11,10.3.1.12,10.3.1.13:6789 /data

The key is correct.
I attached client1.tar in my last mail. Please check it. Thank you!

2015-09-13 15:12 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:

> How do you attach filesystem to local file?
>
> Make sure, keyring is located at:
>
>   /etc/ceph.new/ceph.client.admin.keyring
>
> And your cluster, public networks are fine.
>
> If you face same problem again, check:
>
>   uptime
>
> And how about this:
>
> >   tar cvf .tar \
> >   /sys/class/net//statistics/*
>
> When did you face this issue?
> From the beginning or...?
>
> Shinobu
>
> - Original Message -
> From: "谷枫" <feiche...@gmail.com>
> To: "Shinobu Kinjo" <ski...@redhat.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Sunday, September 13, 2015 12:06:25 PM
> Subject: Re: [ceph-users] ceph-fuse auto down
>
> All clients use the same ceph-fuse version. All of them are troubled by this
> problem; only the crash times differ.
>
>
> 2015-09-13 10:39 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:
>
> > So you are using same version on other clients?
> > But only one client has problem?
> >
> > Can you provide:
> >
> >   /sys/class/net//statistics/*
> >
> > just do:
> >
> >   tar cvf .tar \
> >   /sys/class/net//statistics/*
> >
> > Can you hold when same issue happen next?
> > No reboot is necessary.
> >
> > But if you have to reboot, of course you can.
> >
> > Shinobu
> >
> > - Original Message -
> > From: "谷枫" <feiche...@gmail.com>
> > To: "Shinobu Kinjo" <ski...@redhat.com>
> > Cc: "ceph-users" <ceph-users@lists.ceph.com>
> > Sent: Sunday, September 13, 2015 11:30:57 AM
> > Subject: Re: [ceph-users] ceph-fuse auto down
> >
> > Yes, when some ceph-fuse crash , the mount driver has gone, and can't
> > remount . Reboot the server is the only way I can do.
> > But other client with ceph-fuse mount on them working well. Can writing /
> > reading data on them.
> >
> > ceph-fuse --version
> > ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
> >
> > ceph -s
> > cluster 0fddc8e0-9e64-4049-902a-2f0f6d531630
> >  health HEALTH_OK
> >  monmap e1: 3 mons at {ceph01=
> > 10.3.1.11:6789/0,ceph02=10.3.1.12:6789/0,ceph03=10.3.1.13:6789/0}
> > election epoch 8, quorum 0,1,2 ceph01,ceph02,ceph03
> >  mdsmap e29: 1/1/1 up {0=ceph04=up:active}, 1 up:standby
> >  osdmap e26: 4 osds: 4 up, 4 in
> >   pgmap v94931: 320 pgs, 3 pools, 90235 MB data, 241 kobjects
> > 289 GB used, 1709 GB / 1999 GB avail
> >  320 active+clean
> >   client io 1023 kB/s rd, 1210 kB/s wr, 72 op/s
> >
> > 2015-09-13 10:23 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:
> >
> > > Can you give us package version of ceph-fuse?
> > >
> > > > Multi ceph-fuse crash just now today.
> > >
> > > Did you just mount filesystem or was there any
> > > activity on filesystem?
> > >
> > >   e.g: writing / reading data
> > >
> > > Can you give us output of on cluster side:
> > >
> > >   ceph -s
> > >
> > > Shinobu
> > >
> > > - Original Message -
> > > From: "谷枫" <feiche...@gmail.com>
> > > To: "Shinobu Kinjo" <ski...@redhat.com>
> > > Cc: "ceph-users" <ceph-users@lists.ceph.com>
> > > Sent: Sunday, September 13, 2015 10:51:35 AM
> > > Subject: Re: [ceph-users] ceph-fuse auto down
> > >
> > > sorry Shinobu,
> > > I don't understand what's the means what you pasted.
> > > Multi ceph-fuse crash just now today.
> > > The ceph-fuse completely unusable for me now.
> > > Maybe i must change the kernal mount with it.
> > >
> > > 2015-09-12 20:08 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:
> > >
> > > > In _usr_bin_ceph-fuse.0.crash.client2.tar
> > > >
> > > > What I'm seeing now is:
> > > >
> > > >   3 Date: Sat Sep 12 06:37:47 2015
> > > >  ...
> > > >   6 ExecutableTimestamp: 1440614242
> > > >  ...
> > > >   7 ProcCmdline: ceph-fuse -k /etc/ceph.new/ceph.client.admin.keyring
> > -m
> > > > 10.3.1.11,10.3.1.12,10.3.1.13 /grdat

Re: [ceph-users] ceph-fuse auto down

2015-09-13 Thread
Hi,Shinobu

I found the logrotate script at /etc/logrotate.d/ceph. In this script the osd,
mon, and mds daemons are reloaded when the rotation is done.
The logrotate run and the ceph-fuse crash happen at roughly the same time.
So I think the problem is related to this.
What do you think?


The code snippet in  /etc/logrotate.d/ceph:
***
for daemon in osd mon mds ; do
    find -L /var/lib/ceph/$daemon/ -mindepth 1 -maxdepth 1 \
        -regextype posix-egrep -regex '.*/[A-Za-z0-9]+-[A-Za-z0-9._-]+' -printf '%P\n' \
    | while read f; do
        if [ -e "/var/lib/ceph/$daemon/$f/done" -o -e "/var/lib/ceph/$daemon/$f/ready" ] \
            && [ -e "/var/lib/ceph/$daemon/$f/upstart" ] \
            && [ ! -e "/var/lib/ceph/$daemon/$f/sysvinit" ]; then
            cluster="${f%%-*}"
            id="${f#*-}"
            initctl reload ceph-$daemon cluster="$cluster" id="$id" 2>/dev/null || :
        fi
    done
done
***
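
One way I can check this correlation (a rough sketch; the paths are the Ubuntu
defaults used on these clients) is to compare the daily cron run with the
timestamp apport recorded in the crash report:

  # when did cron.daily / logrotate last run?
  grep -i 'cron.daily\|logrotate' /var/log/syslog | tail
  # when did ceph-fuse crash? apport writes a Date: field into the report
  grep '^Date:' /var/crash/_usr_bin_ceph-fuse.0.crash

If the two timestamps line up day after day, that points at the logrotate
reload.
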
Thank you!



2015-09-14 8:50 GMT+08:00 谷枫 <feiche...@gmail.com>:

> I attach filesystem to local use this command: ceph-fuse -k
> /etc/ceph.new/ceph.client.admin.keyring -m 10.3.1.11,10.3.1.12,
> 10.3.1.13:6789 /data.
>
> The key is right.
> I attach the client1.tar in last mail. Please check it . Thank you !
>
> 2015-09-13 15:12 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:
>
>> How do you attach filesystem to local file?
>>
>> Make sure, keyring is located at:
>>
>>   /etc/ceph.new/ceph.client.admin.keyring
>>
>> And your cluster, public networks are fine.
>>
>> If you face same problem again, check:
>>
>>   uptime
>>
>> And how about this:
>>
>> >   tar cvf .tar \
>> >   /sys/class/net//statistics/*
>>
>> When did you face this issue?
>> From the beginning or...?
>>
>> Shinobu
>>
>> - Original Message -
>> From: "谷枫" <feiche...@gmail.com>
>> To: "Shinobu Kinjo" <ski...@redhat.com>
>> Cc: "ceph-users" <ceph-users@lists.ceph.com>
>> Sent: Sunday, September 13, 2015 12:06:25 PM
>> Subject: Re: [ceph-users] ceph-fuse auto down
>>
>> All clients use same ceph-fuse version. All of them by this problem
>> troubled. Just crash time different.
>>
>>
>> 2015-09-13 10:39 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:
>>
>> > So you are using same version on other clients?
>> > But only one client has problem?
>> >
>> > Can you provide:
>> >
>> >   /sys/class/net//statistics/*
>> >
>> > just do:
>> >
>> >   tar cvf .tar \
>> >   /sys/class/net//statistics/*
>> >
>> > Can you hold when same issue happen next?
>> > No reboot is necessary.
>> >
>> > But if you have to reboot, of course you can.
>> >
>> > Shinobu
>> >
>> > - Original Message -
>> > From: "谷枫" <feiche...@gmail.com>
>> > To: "Shinobu Kinjo" <ski...@redhat.com>
>> > Cc: "ceph-users" <ceph-users@lists.ceph.com>
>> > Sent: Sunday, September 13, 2015 11:30:57 AM
>> > Subject: Re: [ceph-users] ceph-fuse auto down
>> >
>> > Yes, when some ceph-fuse crash , the mount driver has gone, and can't
>> > remount . Reboot the server is the only way I can do.
>> > But other client with ceph-fuse mount on them working well. Can writing
>> /
>> > reading data on them.
>> >
>> > ceph-fuse --version
>> > ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)
>> >
>> > ceph -s
>> > cluster 0fddc8e0-9e64-4049-902a-2f0f6d531630
>> >  health HEALTH_OK
>> >  monmap e1: 3 mons at {ceph01=
>> > 10.3.1.11:6789/0,ceph02=10.3.1.12:6789/0,ceph03=10.3.1.13:6789/0}
>> > election epoch 8, quorum 0,1,2 ceph01,ceph02,ceph03
>> >  mdsmap e29: 1/1/1 up {0=ceph04=up:active}, 1 up:standby
>> >  osdmap e26: 4 osds: 4 up, 4 in
>> >   pgmap v94931: 320 pgs, 3 pools, 90235 MB data, 241 kobjects
>> > 289 GB used, 1709 GB / 1999 GB avail
>> >  320 active+clean
>> >   client io 1023 kB/s rd, 1210 kB/s wr, 72 op/s
>> >
>> > 2015-09-13 10:23 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:
>> >
>> > > Can you give us package version of ceph-fuse?
>> > >
>> > > > Multi ceph-fuse 

Re: [ceph-users] ceph-fuse auto down

2015-09-13 Thread
I deployed the cluster with ceph-deploy, and that script was put there by ceph-deploy.

2015-09-14 10:01 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:

> Did you make that script, or was it there by default?
>
> Shinobu
>
> - Original Message -
> From: "谷枫" <feiche...@gmail.com>
> To: "Shinobu Kinjo" <ski...@redhat.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Monday, September 14, 2015 10:48:16 AM
> Subject: Re: [ceph-users] ceph-fuse auto down
>
> Hi,Shinobu
>
> I found the logrotate script at /etc/logrotate.d/ceph. In this script osd
> mon mds will be reload when rotate done.
> The logrotate and the ceph-fuse crash at same time mainly.
> So i think the problem with this matter.
> How do you think?
>
>
> The code snippet in  /etc/logrotate.d/ceph:
> ***
> for daemon in osd mon mds ; do
>   find -L /var/lib/ceph/$daemon/ -mindepth 1 -maxdepth 1
> -regextype posix-egrep -regex '.*/[A-Za-z0-9]+-[A-Za-z0-9._-]+' -printf
> '%P\n' \
> | while read f; do
> if [ -e "/var/lib/ceph/$daemon/$f/done" -o -e
> "/var/lib/ceph/$daemon/$f/ready" ] && [ -e
> "/var/lib/ceph/$daemon/$f/upstart" ] && [ ! -e
> "/var/lib/ceph/$daemon/$f/sysvinit" ]; then
>   cluster="${f%%-*}"
>   id="${f#*-}"
>
>   initctl reload ceph-$daemon cluster="$cluster"
> id="$id" 2>/dev/null || :
> fi
>   done
> done
> ***
> Thank you!
>
>
>
> 2015-09-14 8:50 GMT+08:00 谷枫 <feiche...@gmail.com>:
>
> > I attach filesystem to local use this command: ceph-fuse -k
> > /etc/ceph.new/ceph.client.admin.keyring -m 10.3.1.11,10.3.1.12,
> > 10.3.1.13:6789 /data.
> >
> > The key is right.
> > I attach the client1.tar in last mail. Please check it . Thank you !
> >
> > 2015-09-13 15:12 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:
> >
> >> How do you attach filesystem to local file?
> >>
> >> Make sure, keyring is located at:
> >>
> >>   /etc/ceph.new/ceph.client.admin.keyring
> >>
> >> And your cluster, public networks are fine.
> >>
> >> If you face same problem again, check:
> >>
> >>   uptime
> >>
> >> And how about this:
> >>
> >> >   tar cvf .tar \
> >> >   /sys/class/net//statistics/*
> >>
> >> When did you face this issue?
> >> From the beginning or...?
> >>
> >> Shinobu
> >>
> >> - Original Message -
> >> From: "谷枫" <feiche...@gmail.com>
> >> To: "Shinobu Kinjo" <ski...@redhat.com>
> >> Cc: "ceph-users" <ceph-users@lists.ceph.com>
> >> Sent: Sunday, September 13, 2015 12:06:25 PM
> >> Subject: Re: [ceph-users] ceph-fuse auto down
> >>
> >> All clients use same ceph-fuse version. All of them by this problem
> >> troubled. Just crash time different.
> >>
> >>
> >> 2015-09-13 10:39 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:
> >>
> >> > So you are using same version on other clients?
> >> > But only one client has problem?
> >> >
> >> > Can you provide:
> >> >
> >> >   /sys/class/net//statistics/*
> >> >
> >> > just do:
> >> >
> >> >   tar cvf .tar \
> >> >   /sys/class/net//statistics/*
> >> >
> >> > Can you hold when same issue happen next?
> >> > No reboot is necessary.
> >> >
> >> > But if you have to reboot, of course you can.
> >> >
> >> > Shinobu
> >> >
> >> > - Original Message -
> >> > From: "谷枫" <feiche...@gmail.com>
> >> > To: "Shinobu Kinjo" <ski...@redhat.com>
> >> > Cc: "ceph-users" <ceph-users@lists.ceph.com>
> >> > Sent: Sunday, September 13, 2015 11:30:57 AM
> >> > Subject: Re: [ceph-users] ceph-fuse auto down
> >> >
> >> > Yes, when some ceph-fuse crash , the mount driver has gone, and can't
> >> > remount . Reboot the server is the only way I can do.
> >> > But other client with ceph-fuse mount on them working well. Can
> writing
> >> /
> >> > 

Re: [ceph-users] ceph-fuse auto down

2015-09-13 Thread
The logrotate runs at 6:25 every day (I saw this in crontab; the relevant line
is the /etc/cron.daily entry below).

cat /etc/crontab

SHELL=/bin/sh
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin

# m h dom mon dow user command
17 *  * * *   root    cd / && run-parts --report /etc/cron.hourly
25 6  * * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
47 6  * * 7   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.weekly )
52 6  1 * *   root    test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.monthly )
#

This time mostly matches the crash time.

2015-09-14 9:48 GMT+08:00 谷枫 <feiche...@gmail.com>:

> Hi,Shinobu
>
> I found the logrotate script at /etc/logrotate.d/ceph. In this script osd
> mon mds will be reload when rotate done.
> The logrotate and the ceph-fuse crash at same time mainly.
> So i think the problem with this matter.
> How do you think?
>
>
> The code snippet in  /etc/logrotate.d/ceph:
> ***
> for daemon in osd mon mds ; do
>   find -L /var/lib/ceph/$daemon/ -mindepth 1 -maxdepth 1
> -regextype posix-egrep -regex '.*/[A-Za-z0-9]+-[A-Za-z0-9._-]+' -printf
> '%P\n' \
> | while read f; do
> if [ -e "/var/lib/ceph/$daemon/$f/done" -o -e
> "/var/lib/ceph/$daemon/$f/ready" ] && [ -e
> "/var/lib/ceph/$daemon/$f/upstart" ] && [ ! -e
> "/var/lib/ceph/$daemon/$f/sysvinit" ]; then
>   cluster="${f%%-*}"
>   id="${f#*-}"
>
>   initctl reload ceph-$daemon cluster="$cluster"
> id="$id" 2>/dev/null || :
> fi
>   done
> done
> ***
> Thank you!
>
>
>
> 2015-09-14 8:50 GMT+08:00 谷枫 <feiche...@gmail.com>:
>
>> I attach filesystem to local use this command: ceph-fuse -k
>> /etc/ceph.new/ceph.client.admin.keyring -m 10.3.1.11,10.3.1.12,
>> 10.3.1.13:6789 /data.
>>
>> The key is right.
>> I attach the client1.tar in last mail. Please check it . Thank you !
>>
>> 2015-09-13 15:12 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:
>>
>>> How do you attach filesystem to local file?
>>>
>>> Make sure, keyring is located at:
>>>
>>>   /etc/ceph.new/ceph.client.admin.keyring
>>>
>>> And your cluster, public networks are fine.
>>>
>>> If you face same problem again, check:
>>>
>>>   uptime
>>>
>>> And how about this:
>>>
>>> >   tar cvf .tar \
>>> >   /sys/class/net//statistics/*
>>>
>>> When did you face this issue?
>>> From the beginning or...?
>>>
>>> Shinobu
>>>
>>> - Original Message -
>>> From: "谷枫" <feiche...@gmail.com>
>>> To: "Shinobu Kinjo" <ski...@redhat.com>
>>> Cc: "ceph-users" <ceph-users@lists.ceph.com>
>>> Sent: Sunday, September 13, 2015 12:06:25 PM
>>> Subject: Re: [ceph-users] ceph-fuse auto down
>>>
>>> All clients use same ceph-fuse version. All of them by this problem
>>> troubled. Just crash time different.
>>>
>>>
>>> 2015-09-13 10:39 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:
>>>
>>> > So you are using same version on other clients?
>>> > But only one client has problem?
>>> >
>>> > Can you provide:
>>> >
>>> >   /sys/class/net//statistics/*
>>> >
>>> > just do:
>>> >
>>> >   tar cvf .tar \
>>> >   /sys/class/net//statistics/*
>>> >
>>> > Can you hold when same issue happen next?
>>> > No reboot is necessary.
>>> >
>>> > But if you have to reboot, of course you can.
>>> >
>>> > Shinobu
>>> >
>>> > - Original Message -
>>> > From: "谷枫" <feiche...@gmail.com>
>>> > To: "Shinobu Kinjo" <ski...@redhat.com>
>>> > Cc: "ceph-users" <ceph-users@lists.ceph.com>
>>> > Sent: Sunday, September 13, 2015 11:30:57 AM
>>> > Subject: Re: [ceph-users] ceph-fuse auto down
>>> >
>>> > Yes, when some ceph-fuse crash , the mount driver has gone, and can't
>>> > remount . Reboot the server is the only way I can do.
>>> > But other client with ceph-fuse mount on them 

Re: [ceph-users] ceph-fuse auto down

2015-09-13 Thread
Thank you Shinobu and Zheng Yan.
Because the ceph cluster runs in production, I can't put up with it crashing
every day.
I switched to the kernel mount yesterday.
I will try to test logrotate together with ceph-fuse on some other server, and
I will have to generate a large amount of reads and writes, because with little
or no read/write activity ceph-fuse does not crash.

@Shinobu,
Can you do me a favor and inspect the coredump with gdb? I have no experience
with C/C++ or gdb. But if you are very busy and have no time for this, that's
OK with me. Thank you for your help.

2015-09-14 11:24 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:

> Yes, that is exactly what I'm going to do.
> Thanks for your follow-up.
>
> Shinobu
>
> - Original Message -
> From: "Zheng Yan" <uker...@gmail.com>
> To: "谷枫" <feiche...@gmail.com>
> Cc: "Shinobu Kinjo" <ski...@redhat.com>, "ceph-users" <
> ceph-users@lists.ceph.com>
> Sent: Monday, September 14, 2015 12:19:44 PM
> Subject: Re: [ceph-users] ceph-fuse auto down
>
> If it's caused by ceph-fuse crash. Please enable coredump when using
> ceph-fuse. When ceph-fuse crash, use gdb to inspect the coredump and
> send calltrace to us. This will helps us to locate the bug quickly.
>
> Yan, Zheng
>


Re: [ceph-users] ceph-fuse auto down

2015-09-13 Thread
Roger that!
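
For reference, the full sequence Shinobu describes below would look roughly
like this (the core file name is only an example; where the core actually lands
depends on the system's core_pattern):

  # before the next crash, allow core dumps in the shell that starts ceph-fuse
  ulimit -c unlimited
  ceph-fuse -k /etc/ceph.new/ceph.client.admin.keyring -m 10.3.1.11,10.3.1.12,10.3.1.13:6789 /data
  # after a crash, open the core against the binary and take the backtrace
  gdb /usr/bin/ceph-fuse ./core
  (gdb) bt
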

2015-09-14 12:14 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:

> Before dumping core:
>
>  ulimit -c unlimited
>
> After dumping core:
>
>  gdb  
>
> # Just do backtrace
>  (gdb)bt
>
> # There would be some signal.
> Then give full output to us.
>
> Shinobu
>
> - Original Message -
> From: "谷枫" <feiche...@gmail.com>
> To: "Shinobu Kinjo" <ski...@redhat.com>
> Cc: "Zheng Yan" <uker...@gmail.com>, "ceph-users" <
> ceph-users@lists.ceph.com>
> Sent: Monday, September 14, 2015 12:52:40 PM
> Subject: Re: [ceph-users] ceph-fuse auto down
>
> Thank you Shinobu and ZhengYan.
> Because the ceph cluster run on production and i can't put up with it crash
> what happend everyday.
> I change the kernal mount yesterday.
> I will try to have a test with the logrotate and ceph-fuse on some other
> server and i have to make a large number of write/read because if there is
> no or less write/read,the ceph-fuse will not crash.
>
> @Shinobu,
> Can you do me a favor to inspect the coredump with gdb, because i have
> no experiences with the C、c++  and gdb. But if you very busy and have no
> time do this , it's ok to me, thank you for you help.
>
> 2015-09-14 11:24 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:
>
> > Yes, that is exactly what I'm going to do.
> > Thanks for your follow-up.
> >
> > Shinobu
> >
> > - Original Message -
> > From: "Zheng Yan" <uker...@gmail.com>
> > To: "谷枫" <feiche...@gmail.com>
> > Cc: "Shinobu Kinjo" <ski...@redhat.com>, "ceph-users" <
> > ceph-users@lists.ceph.com>
> > Sent: Monday, September 14, 2015 12:19:44 PM
> > Subject: Re: [ceph-users] ceph-fuse auto down
> >
> > If it's caused by ceph-fuse crash. Please enable coredump when using
> > ceph-fuse. When ceph-fuse crash, use gdb to inspect the coredump and
> > send calltrace to us. This will helps us to locate the bug quickly.
> >
> > Yan, Zheng
> >
>


Re: [ceph-users] ceph-fuse auto down

2015-09-12 Thread
Sorry Shinobu,
I don't understand the meaning of what you pasted.
Multiple ceph-fuse instances crashed just now today.
ceph-fuse is completely unusable for me now.
Maybe I must switch to the kernel mount instead.

2015-09-12 20:08 GMT+08:00 Shinobu Kinjo :

> In _usr_bin_ceph-fuse.0.crash.client2.tar
>
> What I'm seeing now is:
>
>   3 Date: Sat Sep 12 06:37:47 2015
>  ...
>   6 ExecutableTimestamp: 1440614242
>  ...
>   7 ProcCmdline: ceph-fuse -k /etc/ceph.new/ceph.client.admin.keyring -m
> 10.3.1.11,10.3.1.12,10.3.1.13 /grdata
>  ...
>  30  7f32de7fe000-7f32deffe000 rw-p  00:00 0
> [stack:17270]
>  ...
> 250  7f341021d000-7f3410295000 r-xp  fd:01 267219
>/usr/lib/x86_64-linux-gnu/nss/libfreebl3.so
>  ...
> 255  7f341049b000-7f341054f000 r-xp  fd:01 266443
>/usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6
>  ...
> 260  7f3410754000-7f3410794000 r-xp  fd:01 267222
>/usr/lib/x86_64-linux-gnu/nss/libsoftokn3.so
>  ...
> 266  7f3411197000-7f341119a000 r-xp  fd:01 264953
>/usr/lib/x86_64-linux-gnu/libplds4.so
>  ...
> 271  7f341139f000-7f341159e000 ---p 4000 fd:01 264955
>/usr/lib/x86_64-linux-gnu/libplc4.so
>  ...
> 274  7f34115a-7f34115c5000 r-xp  fd:01 267214
>/usr/lib/x86_64-linux-gnu/libnssutil3.so
>  ...
> 278  7f34117cb000-7f34117ce000 r-xp  fd:01 1189512
> /lib/x86_64-linux-gnu/libdl-2.19.so
>  ...
> 287  7f3411d94000-7f3411daa000 r-xp  fd:01 1179825
> /lib/x86_64-linux-gnu/libgcc_s.so.1
>  ...
> 294  7f34122b-7f3412396000 r-xp  fd:01 266069
>/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19
>  ...
> 458  State: D (disk sleep)
>  ...
> 359  VmPeak: 5250648 kB
> 360  VmSize: 4955592 kB
>  ...
>
> What were you trying to do?
>
> Shinobu
>
>


Re: [ceph-users] ceph-fuse auto down

2015-09-12 Thread
Yes, when a ceph-fuse instance crashes, the mount is gone and can't be
remounted. Rebooting the server is the only thing I can do.
But the other clients with ceph-fuse mounts on them are working well and can
read and write data.

ceph-fuse --version
ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)

ceph -s
cluster 0fddc8e0-9e64-4049-902a-2f0f6d531630
 health HEALTH_OK
 monmap e1: 3 mons at {ceph01=
10.3.1.11:6789/0,ceph02=10.3.1.12:6789/0,ceph03=10.3.1.13:6789/0}
election epoch 8, quorum 0,1,2 ceph01,ceph02,ceph03
 mdsmap e29: 1/1/1 up {0=ceph04=up:active}, 1 up:standby
 osdmap e26: 4 osds: 4 up, 4 in
  pgmap v94931: 320 pgs, 3 pools, 90235 MB data, 241 kobjects
289 GB used, 1709 GB / 1999 GB avail
 320 active+clean
  client io 1023 kB/s rd, 1210 kB/s wr, 72 op/s

2015-09-13 10:23 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:

> Can you give us package version of ceph-fuse?
>
> > Multi ceph-fuse crash just now today.
>
> Did you just mount filesystem or was there any
> activity on filesystem?
>
>   e.g: writing / reading data
>
> Can you give us output of on cluster side:
>
>   ceph -s
>
> Shinobu
>
> - Original Message -
> From: "谷枫" <feiche...@gmail.com>
> To: "Shinobu Kinjo" <ski...@redhat.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Sunday, September 13, 2015 10:51:35 AM
> Subject: Re: [ceph-users] ceph-fuse auto down
>
> sorry Shinobu,
> I don't understand what's the means what you pasted.
> Multi ceph-fuse crash just now today.
> The ceph-fuse completely unusable for me now.
> Maybe i must change the kernal mount with it.
>
> 2015-09-12 20:08 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:
>
> > In _usr_bin_ceph-fuse.0.crash.client2.tar
> >
> > What I'm seeing now is:
> >
> >   3 Date: Sat Sep 12 06:37:47 2015
> >  ...
> >   6 ExecutableTimestamp: 1440614242
> >  ...
> >   7 ProcCmdline: ceph-fuse -k /etc/ceph.new/ceph.client.admin.keyring -m
> > 10.3.1.11,10.3.1.12,10.3.1.13 /grdata
> >  ...
> >  30  7f32de7fe000-7f32deffe000 rw-p  00:00 0
> > [stack:17270]
> >  ...
> > 250  7f341021d000-7f3410295000 r-xp  fd:01 267219
> >/usr/lib/x86_64-linux-gnu/nss/libfreebl3.so
> >  ...
> > 255  7f341049b000-7f341054f000 r-xp  fd:01 266443
> >/usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6
> >  ...
> > 260  7f3410754000-7f3410794000 r-xp  fd:01 267222
> >/usr/lib/x86_64-linux-gnu/nss/libsoftokn3.so
> >  ...
> > 266  7f3411197000-7f341119a000 r-xp  fd:01 264953
> >/usr/lib/x86_64-linux-gnu/libplds4.so
> >  ...
> > 271  7f341139f000-7f341159e000 ---p 4000 fd:01 264955
> >/usr/lib/x86_64-linux-gnu/libplc4.so
> >  ...
> > 274  7f34115a-7f34115c5000 r-xp  fd:01 267214
> >/usr/lib/x86_64-linux-gnu/libnssutil3.so
> >  ...
> > 278  7f34117cb000-7f34117ce000 r-xp  fd:01 1189512
> > /lib/x86_64-linux-gnu/libdl-2.19.so
> >  ...
> > 287  7f3411d94000-7f3411daa000 r-xp  fd:01 1179825
> > /lib/x86_64-linux-gnu/libgcc_s.so.1
> >  ...
> > 294  7f34122b-7f3412396000 r-xp  fd:01 266069
> >/usr/lib/x86_64-linux-gnu/libstdc++.so.6.0.19
> >  ...
> > 458  State: D (disk sleep)
> >  ...
> > 359  VmPeak: 5250648 kB
> > 360  VmSize: 4955592 kB
> >  ...
> >
> > What were you trying to do?
> >
> > Shinobu
> >
> >
>


Re: [ceph-users] ceph-fuse auto down

2015-09-12 Thread
​
 _usr_bin_ceph-fuse.0.crash.client1.tar.gz
<https://drive.google.com/file/d/0Bw059OTYnFqAbDA2R0RlS3ZHNlU/view?usp=drive_web>
 _usr_bin_ceph-fuse.0.crash.client2.tar.gz
<https://drive.google.com/file/d/0Bw059OTYnFqAMXVCWTV6UXVnRjg/view?usp=drive_web>
Hi Shinobu,
I checked /var/log/dmesg carefully again, but could not find any useful message
about the ceph-fuse crash.
So I am attaching the two _usr_bin_ceph-fuse.0.crash files from the two clients.
Please let me know if you want any other logs. Thank you very much!

2015-09-12 13:01 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:

> Ah, you are using ubuntu, sorry for that.
> How about:
>
>   /var/log/dmesg
>
> I believe you can attach file not paste.
> Pasting a bunch of logs would not be good for me -;
>
> And when did you notice that cephfs was hung?
>
> Shinobu
>
> - Original Message -
> From: "谷枫" <feiche...@gmail.com>
> To: "Shinobu Kinjo" <ski...@redhat.com>
> Cc: "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Saturday, September 12, 2015 1:50:05 PM
> Subject: Re: [ceph-users] ceph-fuse auto down
>
> hi,Shinobu
> There is no /var/log/messages on my system but i saw the /var/log/syslog
> and no useful messages be found.
> I discover the /var/crash/_usr_bin_ceph-fuse.0.crash with grep the "fuse"
> on the system.
> Below is the message in it :
> ProcStatus:
>  Name:  ceph-fuse
>  State: D (disk sleep)
>  Tgid:  2903
>  Ngid:  0
>  Pid:   2903
>  PPid:  1
>  TracerPid: 0
>  Uid:   0   0   0   0
>  Gid:   0   0   0   0
>  FDSize:64
>  Groups:0
>  VmPeak: 7428552 kB
>  VmSize: 6838728 kB
>  VmLck:0 kB
>  VmPin:0 kB
>  VmHWM:  1175864 kB
>  VmRSS:   343116 kB
>  VmData: 6786232 kB
>  VmStk:  136 kB
>  VmExe: 5628 kB
>  VmLib: 7456 kB
>  VmPTE: 3404 kB
>  VmSwap:   0 kB
>  Threads:   37
>  SigQ:  1/64103
>  SigPnd:
>  ShdPnd:
>  SigBlk:1000
>  SigIgn:1000
>  SigCgt:0001c18040eb
>  CapInh:
>  CapPrm:003f
>  CapEff:003f
>  CapBnd:003f
>  Seccomp:   0
>  Cpus_allowed:  
>  Cpus_allowed_list: 0-15
>  Mems_allowed:  ,0001
>  Mems_allowed_list: 0
>  voluntary_ctxt_switches:   25
>  nonvoluntary_ctxt_switches:2
> Signal: 11
> Uname: Linux 3.19.0-28-generic x86_64
> UserGroups:
> CoreDump: base64
>
> Is this useful infomations?
>
>
> 2015-09-12 12:33 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:
>
> > There should be some complains in /var/log/messages.
> > Can you attach?
> >
> > Shinobu
> >
> > - Original Message -
> > From: "谷枫" <feiche...@gmail.com>
> > To: "ceph-users" <ceph-users@lists.ceph.com>
> > Sent: Saturday, September 12, 2015 1:30:49 PM
> > Subject: [ceph-users] ceph-fuse auto down
> >
> > Hi,all
> > My cephfs cluster deploy on three nodes with Ceph Hammer 0.94.3 on Ubuntu
> > 14.04 the kernal version is 3.19.0.
> >
> > I mount the cephfs with ceph-fuse on 9 clients,but some of them
> (ceph-fuse
> > process) auto down sometimes and i can't find the reason seems like there
> > is no other logs can be found except this file
> > /var/log/ceph/ceph-client.admin.log that without useful messages for me.
> >
> > When the ceph-fuse down . The mount driver is gone.
> > How can i find the reason of this problem. Can some guys give me good
> > ideas?
> >
> > Regards
> >
> >
> > ___
> > ceph-users mailing list
> > ceph-users@lists.ceph.com
> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >
>


Re: [ceph-users] ceph-fuse auto down

2015-09-12 Thread
Sorry about that.
I am re-attaching the crash log files and the mds logs.
The mds log shows that the client session timed out and then the mds closed the
socket, right?
I think this happened after the ceph-fuse crash, so the root cause is the
ceph-fuse crash.
 _usr_bin_ceph-fuse.0.crash.client1.tar.gz
<https://drive.google.com/file/d/0Bw059OTYnFqAbDA2R0RlS3ZHNlU/view?usp=drive_web>
 _usr_bin_ceph-fuse.0.crash.client2.tar.gz
<https://drive.google.com/file/d/0Bw059OTYnFqAMXVCWTV6UXVnRjg/view?usp=drive_web>

2015-09-12 19:10 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:

> Thank you for log archives.
>
> I went to dentist -;
> Please do not forget CCing ceph-users from the next because there is a
> bunch of really **awesome** guys;
>
> Can you re-attach log files again so that they see?
>
> Shinobu
>
> - Original Message -
> From: "谷枫" <feiche...@gmail.com>
> To: "Shinobu Kinjo" <ski...@redhat.com>
> Sent: Saturday, September 12, 2015 5:32:31 PM
> Subject: Re: [ceph-users] ceph-fuse auto down
>
> The mds log shows that the client session timeout then mds close the socket
> right?
> I think this happend behind the ceph-fuse crash. So the root cause is the
> ceph-fuse crash .
>
> 2015-09-12 15:09 GMT+08:00 谷枫 <feiche...@gmail.com>:
>
> > Hi,Shinobu
> > I notice some useful message in the mds-log.
> >
> > 2015-09-12 14:29 GMT+08:00 谷枫 <feiche...@gmail.com>:
> >
> >> ​
> >>  _usr_bin_ceph-fuse.0.crash.client1.tar.gz
> >> <
> https://drive.google.com/file/d/0Bw059OTYnFqAbDA2R0RlS3ZHNlU/view?usp=drive_web
> >
> >> ​​
> >>  _usr_bin_ceph-fuse.0.crash.client2.tar.gz
> >> <
> https://drive.google.com/file/d/0Bw059OTYnFqAMXVCWTV6UXVnRjg/view?usp=drive_web
> >
> >> ​Hi,Shinobu
> >> I check the /var/log/dmesg carefully again, but not find the useful
> >> message about ceph-fuse crash.
> >> So i attach two _usr_bin_ceph-fuse.0.crash files on two clients to you.
> >> Please let me know if you want more about other log. Thank you very
> much!
> >>
> >> 2015-09-12 13:01 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:
> >>
> >>> Ah, you are using ubuntu, sorry for that.
> >>> How about:
> >>>
> >>>   /var/log/dmesg
> >>>
> >>> I believe you can attach file not paste.
> >>> Pasting a bunch of logs would not be good for me -;
> >>>
> >>> And when did you notice that cephfs was hung?
> >>>
> >>> Shinobu
> >>>
> >>> - Original Message -
> >>> From: "谷枫" <feiche...@gmail.com>
> >>> To: "Shinobu Kinjo" <ski...@redhat.com>
> >>> Cc: "ceph-users" <ceph-users@lists.ceph.com>
> >>> Sent: Saturday, September 12, 2015 1:50:05 PM
> >>> Subject: Re: [ceph-users] ceph-fuse auto down
> >>>
> >>> hi,Shinobu
> >>> There is no /var/log/messages on my system but i saw the
> /var/log/syslog
> >>> and no useful messages be found.
> >>> I discover the /var/crash/_usr_bin_ceph-fuse.0.crash with grep the
> "fuse"
> >>> on the system.
> >>> Below is the message in it :
> >>> ProcStatus:
> >>>  Name:  ceph-fuse
> >>>  State: D (disk sleep)
> >>>  Tgid:  2903
> >>>  Ngid:  0
> >>>  Pid:   2903
> >>>  PPid:  1
> >>>  TracerPid: 0
> >>>  Uid:   0   0   0   0
> >>>  Gid:   0   0   0   0
> >>>  FDSize:64
> >>>  Groups:0
> >>>  VmPeak: 7428552 kB
> >>>  VmSize: 6838728 kB
> >>>  VmLck:0 kB
> >>>  VmPin:0 kB
> >>>  VmHWM:  1175864 kB
> >>>  VmRSS:   343116 kB
> >>>  VmData: 6786232 kB
> >>>  VmStk:  136 kB
> >>>  VmExe: 5628 kB
> >>>  VmLib: 7456 kB
> >>>  VmPTE: 3404 kB
> >>>  VmSwap:   0 kB
> >>>  Threads:   37
> >>>  SigQ:  1/64103
> >>>  SigPnd:
> >>>  ShdPnd:
> >>>  SigBlk:1000
> >>>  SigIgn:1000
> >>>  SigCgt:0001c18040eb
> >>>  CapInh:
> >>>  CapPrm:003f
> >>>  CapEff:003f
> >>>

[ceph-users] ceph-fuse auto down

2015-09-11 Thread
Hi all,
My cephfs cluster is deployed on three nodes with Ceph Hammer 0.94.3 on Ubuntu
14.04; the kernel version is 3.19.0.

I mount cephfs with ceph-fuse on 9 clients, but some of them (the ceph-fuse
process) go down on their own sometimes and I can't find the reason. There seem
to be no logs other than /var/log/ceph/ceph-client.admin.log, and that file has
no useful messages for me.
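
To get more detail into that log I could raise the client debug levels in
ceph.conf on the clients (a sketch; I have not verified how much these options
help for ceph-fuse on this release):

  [client]
      debug client = 20
      debug ms = 1
      log file = /var/log/ceph/ceph-client.admin.log

but at the default levels the log shows nothing useful.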

When ceph-fuse goes down, the mount is gone.
How can I find the reason for this problem? Can someone give me some good
ideas?

Regards


Re: [ceph-users] ceph-fuse auto down

2015-09-11 Thread
Hi Shinobu,
There is no /var/log/messages on my system, but I looked at /var/log/syslog
and found no useful messages there.
I discovered /var/crash/_usr_bin_ceph-fuse.0.crash by grepping for "fuse"
on the system.
Below is the content of it:
ProcStatus:
 Name:  ceph-fuse
 State: D (disk sleep)
 Tgid:  2903
 Ngid:  0
 Pid:   2903
 PPid:  1
 TracerPid: 0
 Uid:   0   0   0   0
 Gid:   0   0   0   0
 FDSize:64
 Groups:0
 VmPeak: 7428552 kB
 VmSize: 6838728 kB
 VmLck:0 kB
 VmPin:0 kB
 VmHWM:  1175864 kB
 VmRSS:   343116 kB
 VmData: 6786232 kB
 VmStk:  136 kB
 VmExe: 5628 kB
 VmLib: 7456 kB
 VmPTE: 3404 kB
 VmSwap:   0 kB
 Threads:   37
 SigQ:  1/64103
 SigPnd:
 ShdPnd:
 SigBlk:1000
 SigIgn:1000
 SigCgt:0001c18040eb
 CapInh:
 CapPrm:003f
 CapEff:003f
 CapBnd:003f
 Seccomp:   0
 Cpus_allowed:  
 Cpus_allowed_list: 0-15
 Mems_allowed:  ,0001
 Mems_allowed_list: 0
 voluntary_ctxt_switches:   25
 nonvoluntary_ctxt_switches:2
Signal: 11
Uname: Linux 3.19.0-28-generic x86_64
UserGroups:
CoreDump: base64

Is this useful information?


2015-09-12 12:33 GMT+08:00 Shinobu Kinjo <ski...@redhat.com>:

> There should be some complains in /var/log/messages.
> Can you attach?
>
> Shinobu
>
> - Original Message -
> From: "谷枫" <feiche...@gmail.com>
> To: "ceph-users" <ceph-users@lists.ceph.com>
> Sent: Saturday, September 12, 2015 1:30:49 PM
> Subject: [ceph-users] ceph-fuse auto down
>
> Hi,all
> My cephfs cluster deploy on three nodes with Ceph Hammer 0.94.3 on Ubuntu
> 14.04 the kernal version is 3.19.0.
>
> I mount the cephfs with ceph-fuse on 9 clients,but some of them (ceph-fuse
> process) auto down sometimes and i can't find the reason seems like there
> is no other logs can be found except this file
> /var/log/ceph/ceph-client.admin.log that without useful messages for me.
>
> When the ceph-fuse down . The mount driver is gone.
> How can i find the reason of this problem. Can some guys give me good
> ideas?
>
> Regards
>
>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>


Re: [ceph-users] mds0: Client failing to respond to cache pressure

2015-07-14 Thread
I changed mds_cache_size to 500000 from 100000 to get rid of the
WARN temporarily.
Now dumping the mds daemon shows this:
inode_max: 500000,
inodes: 124213,
But I have no idea what to do if inodes rises above 500000; do I change
mds_cache_size again?
Thanks.
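
A quick way to keep an eye on it (a sketch; the mds name and the new value are
only examples) is to compare inodes against inode_max on the mds admin socket,
and raise the limit at runtime if it gets close again:

  # on the mds host, check current cache usage vs. the configured limit
  ceph daemon mds.tree01 perf dump mds | grep -E '"inode_max"|"inodes"'
  # raise the limit without restarting the mds (value is only an example)
  ceph daemon mds.tree01 config set mds_cache_size 1000000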

2015-07-15 13:34 GMT+08:00 谷枫 feiche...@gmail.com:

 I change the mds_cache_size to 50 from 10 get rid of the
 WARN temporary.
 Now dumping the mds daemon shows like this:
 inode_max: 50,
 inodes: 124213,
 But i have no idea if the indoes rises more than 50 , change the
 mds_cache_size again?
 Thanks.

 2015-07-15 11:06 GMT+08:00 Eric Eastman eric.east...@keepertech.com:

 Hi John,

 I cut the test down to a single client running only Ganesha NFS
 without any ceph drivers loaded on the Ceph FS client.  After deleting
 all the files in the Ceph file system, rebooting all the nodes, I
 restarted the create 5 million file test using 2 NFS clients to the
 one Ceph file system node running Ganesha NFS. After a couple hours I
 am seeing the  client ede-c2-gw01 failing to respond to cache pressure
 error:

 $ ceph -s
 cluster 6d8aae1e-1125-11e5-a708-001b78e265be
  health HEALTH_WARN
 mds0: Client ede-c2-gw01 failing to respond to cache pressure
  monmap e1: 3 mons at
 {ede-c2-mon01=
 10.15.2.121:6789/0,ede-c2-mon02=10.15.2.122:6789/0,ede-c2-mon03=10.15.2.123:6789/0
 }
 election epoch 22, quorum 0,1,2
 ede-c2-mon01,ede-c2-mon02,ede-c2-mon03
  mdsmap e1860: 1/1/1 up {0=ede-c2-mds02=up:active}, 2 up:standby
  osdmap e323: 8 osds: 8 up, 8 in
   pgmap v302142: 832 pgs, 4 pools, 162 GB data, 4312 kobjects
 182 GB used, 78459 MB / 263 GB avail
  832 active+clean

 Dumping the mds daemon shows inodes > inode_max:

 # ceph daemon mds.ede-c2-mds02 perf dump mds
 {
 mds: {
 request: 21862302,
 reply: 21862302,
 reply_latency: {
 avgcount: 21862302,
 sum: 16728.480772060
 },
 forward: 0,
 dir_fetch: 13,
 dir_commit: 50788,
 dir_split: 0,
 inode_max: 100000,
 inodes: 100010,
 inodes_top: 0,
 inodes_bottom: 0,
 inodes_pin_tail: 100010,
 inodes_pinned: 100010,
 inodes_expired: 4308279,
 inodes_with_caps: 8,
 caps: 8,
 subtrees: 2,
 traverse: 30802465,
 traverse_hit: 26394836,
 traverse_forward: 0,
 traverse_discover: 0,
 traverse_dir_fetch: 0,
 traverse_remote_ino: 0,
 traverse_lock: 0,
 load_cent: 2186230200,
 q: 0,
 exported: 0,
 exported_inodes: 0,
 imported: 0,
 imported_inodes: 0
 }
 }

 Once this test finishes and I verify the files were all correctly
 written, I will retest using the SAMBA VFS interface, followed by the
 kernel test.

 Please let me know if there is more info you need and if you want me
 to open a ticket.

 Best regards
 Eric



 On Mon, Jul 13, 2015 at 9:40 AM, Eric Eastman
 eric.east...@keepertech.com wrote:
  Thanks John. I will back the test down to the simple case of 1 client
  without the kernel driver and only running NFS Ganesha, and work forward
  till I trip the problem and report my findings.
 
  Eric
 
  On Mon, Jul 13, 2015 at 2:18 AM, John Spray john.sp...@redhat.com
 wrote:
 
 
 
  On 13/07/2015 04:02, Eric Eastman wrote:
 
  Hi John,
 
  I am seeing this problem with Ceph v9.0.1 with the v4.1 kernel on all
  nodes.  This system is using 4 Ceph FS client systems. They all have
  the kernel driver version of CephFS loaded, but none are mounting the
  file system. All 4 clients are using the libcephfs VFS interface to
  Ganesha NFS (V2.2.0-2) and Samba (Version 4.3.0pre1-GIT-0791bb0) to
  share out the Ceph file system.
 
  # ceph -s
   cluster 6d8aae1e-1125-11e5-a708-001b78e265be
health HEALTH_WARN
   4 near full osd(s)
   mds0: Client ede-c2-gw01 failing to respond to cache
  pressure
   mds0: Client ede-c2-gw02:cephfs failing to respond to
 cache
  pressure
   mds0: Client ede-c2-gw03:cephfs failing to respond to
 cache
  pressure
monmap e1: 3 mons at
 
  {ede-c2-mon01=
 10.15.2.121:6789/0,ede-c2-mon02=10.15.2.122:6789/0,ede-c2-mon03=10.15.2.123:6789/0
 }
   election epoch 8, quorum 0,1,2
  ede-c2-mon01,ede-c2-mon02,ede-c2-mon03
mdsmap e912: 1/1/1 up {0=ede-c2-mds03=up:active}, 2 up:standby
osdmap e272: 8 osds: 8 up, 8 in
 pgmap v225264: 832 pgs, 4 pools, 188 GB data, 5173 kobjects
   212 GB used, 48715 MB / 263 GB avail
832 active+clean
 client io 1379 kB/s rd, 20653 B/s wr, 98 op/s
 
 
  It would help if we knew whether it's the kernel clients or the
 userspace
  clients that are generating the warnings here.  You've probably
 already done
  this, but I'd get rid of any unused kernel

Re: [ceph-users] mds0: Client failing to respond to cache pressure

2015-07-10 Thread
Thank you John,
All my servers are Ubuntu 14.04 with the 3.16 kernel.
Not all of the clients show this problem, and the cluster seems to be
functioning well now.
As you say, I will change mds_cache_size to 500000 from 100000 and test that,
thanks again!

2015-07-10 17:00 GMT+08:00 John Spray john.sp...@redhat.com:


 This is usually caused by use of older kernel clients.  I don't remember
 exactly what version it was fixed in, but iirc we've seen the problem with
 3.14 and seen it go away with 3.18.

 If your system is otherwise functioning well, this is not a critical error
 -- it just means that the MDS might not be able to fully control its memory
 usage (i.e. it can exceed mds_cache_size).

 John

 On 10/07/2015 05:25, 谷枫 wrote:

 hi,
 I use CephFS in production environnement with 7osd,1mds,3mon now.
 So far so good,but i have a problem with it today.
 The ceph status report this:
 cluster ad3421a43-9fd4-4b7a-92ba-09asde3b1a228
   health HEALTH_WARN
  mds0: Client 34271 failing to respond to cache pressure
  mds0: Client 74175 failing to respond to cache pressure
  mds0: Client 74181 failing to respond to cache pressure
  mds0: Client 34247 failing to respond to cache pressure
  mds0: Client 64162 failing to respond to cache pressure
  mds0: Client 136744 failing to respond to cache pressure
   monmap e2: 3 mons at {node01=
 10.3.1.2:6789/0,node02=10.3.1.3:6789/0,node03=10.3.1.4:6789/0}
  election epoch 186, quorum 0,1,2 node01,node02,node03
   mdsmap e46: 1/1/1 up {0=tree01=up:active}
   osdmap e717: 7 osds: 7 up, 7 in
pgmap v995836: 264 pgs, 3 pools, 51544 MB data, 118 kobjects
  138 GB used, 1364 GB / 1502 GB avail
   264 active+clean
client io 1018 B/s rd, 1273 B/s wr, 0 op/s

 I add two osds with the version 0.94.2 and other old osds is 0.94.1
 yesterday.
 So the question is does this matter?
 What's the warning mean ,and how can i solve this problem.Thanks!
 This is my cluster config message with mds:
  name: mds.tree01,
  debug_mds: 1\/5,
  debug_mds_balancer: 1\/5,
  debug_mds_locker: 1\/5,
  debug_mds_log: 1\/5,
  debug_mds_log_expire: 1\/5,
  debug_mds_migrator: 1\/5,
  admin_socket: \/var\/run\/ceph\/ceph-mds.tree01.asok,
  log_file: \/var\/log\/ceph\/ceph-mds.tree01.log,
  keyring: \/var\/lib\/ceph\/mds\/ceph-tree01\/keyring,
  mon_max_mdsmap_epochs: 500,
  mon_mds_force_trim_to: 0,
  mon_debug_dump_location: \/var\/log\/ceph\/ceph-mds.tree01.tdump,
  client_use_random_mds: false,
  mds_data: \/var\/lib\/ceph\/mds\/ceph-tree01,
  mds_max_file_size: 1099511627776,
  mds_cache_size: 10,
  mds_cache_mid: 0.7,
  mds_max_file_recover: 32,
  mds_mem_max: 1048576,
  mds_dir_max_commit_size: 10,
  mds_decay_halflife: 5,
  mds_beacon_interval: 4,
  mds_beacon_grace: 15,
  mds_enforce_unique_name: true,
  mds_blacklist_interval: 1440,
  mds_session_timeout: 120,
  mds_revoke_cap_timeout: 60,
  mds_recall_state_timeout: 60,
  mds_freeze_tree_timeout: 30,
  mds_session_autoclose: 600,
  mds_health_summarize_threshold: 10,
  mds_reconnect_timeout: 45,
  mds_tick_interval: 5,
  mds_dirstat_min_interval: 1,
  mds_scatter_nudge_interval: 5,
  mds_client_prealloc_inos: 1000,
  mds_early_reply: true,
  mds_default_dir_hash: 2,
  mds_log: true,
  mds_log_skip_corrupt_events: false,
  mds_log_max_events: -1,
  mds_log_events_per_segment: 1024,
  mds_log_segment_size: 0,
  mds_log_max_segments: 30,
  mds_log_max_expiring: 20,
  mds_bal_sample_interval: 3,
  mds_bal_replicate_threshold: 8000,
  mds_bal_unreplicate_threshold: 0,
  mds_bal_frag: false,
  mds_bal_split_size: 1,
  mds_bal_split_rd: 25000,
  mds_bal_split_wr: 1,
  mds_bal_split_bits: 3,
  mds_bal_merge_size: 50,
  mds_bal_merge_rd: 1000,
  mds_bal_merge_wr: 1000,
  mds_bal_interval: 10,
  mds_bal_fragment_interval: 5,
  mds_bal_idle_threshold: 0,
  mds_bal_max: -1,
  mds_bal_max_until: -1,
  mds_bal_mode: 0,
  mds_bal_min_rebalance: 0.1,
  mds_bal_min_start: 0.2,
  mds_bal_need_min: 0.8,
  mds_bal_need_max: 1.2,
  mds_bal_midchunk: 0.3,
  mds_bal_minchunk: 0.001,
  mds_bal_target_removal_min: 5,
  mds_bal_target_removal_max: 10,
  mds_replay_interval: 1,
  mds_shutdown_check: 0,
  mds_thrash_exports: 0,
  mds_thrash_fragments: 0,
  mds_dump_cache_on_map: false,
  mds_dump_cache_after_rejoin: false,
  mds_verify_scatter: false,
  mds_debug_scatterstat: false,
  mds_debug_frag: false,
  mds_debug_auth_pins: false,
  mds_debug_subtrees: false,
  mds_kill_mdstable_at: 0,
  mds_kill_export_at: 0

[ceph-users] mds0: Client failing to respond to cache pressure

2015-07-09 Thread
Hi,

I use CephFS in a production environment with 7 osd, 1 mds, 3 mon now.

So far so good, but I have a problem with it today.

The ceph status report this:

cluster ad3421a43-9fd4-4b7a-92ba-09asde3b1a228
 health HEALTH_WARN
mds0: Client 34271 failing to respond to cache pressure
mds0: Client 74175 failing to respond to cache pressure
mds0: Client 74181 failing to respond to cache pressure
mds0: Client 34247 failing to respond to cache pressure
mds0: Client 64162 failing to respond to cache pressure
mds0: Client 136744 failing to respond to cache pressure
 monmap e2: 3 mons at
{node01=10.3.1.2:6789/0,node02=10.3.1.3:6789/0,node03=10.3.1.4:6789/0}
election epoch 186, quorum 0,1,2 node01,node02,node03
 mdsmap e46: 1/1/1 up {0=tree01=up:active}
 osdmap e717: 7 osds: 7 up, 7 in
  pgmap v995836: 264 pgs, 3 pools, 51544 MB data, 118 kobjects
138 GB used, 1364 GB / 1502 GB avail
 264 active+clean
  client io 1018 B/s rd, 1273 B/s wr, 0 op/s


I added two osds with version 0.94.2 yesterday; the other, older osds are 0.94.1.

So the question is: does this matter?

What does the warning mean, and how can I solve this problem? Thanks!

This is my cluster config for the mds:

name: mds.tree01,
debug_mds: 1\/5,
debug_mds_balancer: 1\/5,
debug_mds_locker: 1\/5,
debug_mds_log: 1\/5,
debug_mds_log_expire: 1\/5,
debug_mds_migrator: 1\/5,
admin_socket: \/var\/run\/ceph\/ceph-mds.tree01.asok,
log_file: \/var\/log\/ceph\/ceph-mds.tree01.log,
keyring: \/var\/lib\/ceph\/mds\/ceph-tree01\/keyring,
mon_max_mdsmap_epochs: 500,
mon_mds_force_trim_to: 0,
mon_debug_dump_location: \/var\/log\/ceph\/ceph-mds.tree01.tdump,
client_use_random_mds: false,
mds_data: \/var\/lib\/ceph\/mds\/ceph-tree01,
mds_max_file_size: 1099511627776,
mds_cache_size: 10,
mds_cache_mid: 0.7,
mds_max_file_recover: 32,
mds_mem_max: 1048576,
mds_dir_max_commit_size: 10,
mds_decay_halflife: 5,
mds_beacon_interval: 4,
mds_beacon_grace: 15,
mds_enforce_unique_name: true,
mds_blacklist_interval: 1440,
mds_session_timeout: 120,
mds_revoke_cap_timeout: 60,
mds_recall_state_timeout: 60,
mds_freeze_tree_timeout: 30,
mds_session_autoclose: 600,
mds_health_summarize_threshold: 10,
mds_reconnect_timeout: 45,
mds_tick_interval: 5,
mds_dirstat_min_interval: 1,
mds_scatter_nudge_interval: 5,
mds_client_prealloc_inos: 1000,
mds_early_reply: true,
mds_default_dir_hash: 2,
mds_log: true,
mds_log_skip_corrupt_events: false,
mds_log_max_events: -1,
mds_log_events_per_segment: 1024,
mds_log_segment_size: 0,
mds_log_max_segments: 30,
mds_log_max_expiring: 20,
mds_bal_sample_interval: 3,
mds_bal_replicate_threshold: 8000,
mds_bal_unreplicate_threshold: 0,
mds_bal_frag: false,
mds_bal_split_size: 1,
mds_bal_split_rd: 25000,
mds_bal_split_wr: 1,
mds_bal_split_bits: 3,
mds_bal_merge_size: 50,
mds_bal_merge_rd: 1000,
mds_bal_merge_wr: 1000,
mds_bal_interval: 10,
mds_bal_fragment_interval: 5,
mds_bal_idle_threshold: 0,
mds_bal_max: -1,
mds_bal_max_until: -1,
mds_bal_mode: 0,
mds_bal_min_rebalance: 0.1,
mds_bal_min_start: 0.2,
mds_bal_need_min: 0.8,
mds_bal_need_max: 1.2,
mds_bal_midchunk: 0.3,
mds_bal_minchunk: 0.001,
mds_bal_target_removal_min: 5,
mds_bal_target_removal_max: 10,
mds_replay_interval: 1,
mds_shutdown_check: 0,
mds_thrash_exports: 0,
mds_thrash_fragments: 0,
mds_dump_cache_on_map: false,
mds_dump_cache_after_rejoin: false,
mds_verify_scatter: false,
mds_debug_scatterstat: false,
mds_debug_frag: false,
mds_debug_auth_pins: false,
mds_debug_subtrees: false,
mds_kill_mdstable_at: 0,
mds_kill_export_at: 0,
mds_kill_import_at: 0,
mds_kill_link_at: 0,
mds_kill_rename_at: 0,
mds_kill_openc_at: 0,
mds_kill_journal_at: 0,
mds_kill_journal_expire_at: 0,
mds_kill_journal_replay_at: 0,
mds_journal_format: 1,
mds_kill_create_at: 0,
mds_inject_traceless_reply_probability: 0,
mds_wipe_sessions: false,
mds_wipe_ino_prealloc: false,
mds_skip_ino: 0,
max_mds: 1,
mds_standby_for_name: ,
mds_standby_for_rank: -1,
mds_standby_replay: false,
mds_enable_op_tracker: true,
mds_op_history_size: 20,
mds_op_history_duration: 600,
mds_op_complaint_time: 30,
mds_op_log_threshold: 5,
mds_snap_min_uid: 0,
mds_snap_max_uid: 65536,
mds_verify_backtrace: 1,
mds_action_on_write_error: 1,


[ceph-users] MDS closing stale session

2015-06-05 Thread
Hi everyone,
I have a five-node ceph cluster with cephfs. I mount the ceph partition with
the ceph-fuse tool.
I met a serious problem with no warning signs.
On one of the nodes the ceph-fuse process went down, and the ceph partition
mounted with ceph-fuse became unavailable.
ls on the ceph partition shows this:
d?   ? ??   ?? ceph-data/


Re: [ceph-users] MDS closing stale session

2015-06-05 Thread
Sorry, I sent that mail carelessly; continuing here.
The mds error is:
2015-06-05 09:59:25.012130 7fa1ed118700  0 -- 10.3.1.5:6800/1365 >>
10.3.1.4:0/18748 pipe(0x5f81000 sd=22 :6800 s=2 pgs=1252 cs=1 l=0
c=0x4f935a0).fault with nothing to send, going to standby
2015-06-05 10:03:40.767822 7fa1f0a27700  0 log_channel(cluster) log [INF] :
closing stale session client.24153 10.3.1.4:0/18748 after 300.071624

The apport error is :

ERROR: apport (pid 30184) Fri Jun  5 12:13:03 2015: called for pid 6331,
signal 11, core limit 0
ERROR: apport (pid 30184) Fri Jun  5 12:13:03 2015: executable:
/usr/bin/ceph-fuse (command line ceph-fuse -k
/etc/ceph/ceph.client.admin.keyring -m node01,node02,node03 /grdata)
ERROR: apport (pid 30184) Fri Jun  5 12:13:03 2015: is_closing_session():
no DBUS_SESSION_BUS_ADDRESS in environment
ERROR: apport (pid 30184) Fri Jun  5 12:13:33 2015: wrote report
/var/crash/_usr_bin_ceph-fuse.0.crash

But ceph -s is OK:
cluster add8fa43-9f84-4b5d-df32-095e3421a228
 health HEALTH_OK
 monmap e2: 3 mons at {node01=
10.3.1.2:6789/0,node02=10.3.1.3:6789/0,node03=10.3.1.4:6789/0}
election epoch 44, quorum 0,1,2 node01,node02,node03
 mdsmap e37: 1/1/1 up {0=osd01=up:active}
 osdmap e526: 5 osds: 5 up, 5 in
  pgmap v392315: 264 pgs, 3 pools, 26953 MB data, 106 kobjects
81519 MB used, 1036 GB / 1115 GB avail
 264 active+clean
  client io 10171 B/s wr, 1 op/s

When I remount the ceph partition, it gets back to normal.

I want to know: is this a ceph bug, or a bug in the ceph-fuse tool?
Should I change the mount type to a kernel mount with mount -t ceph?
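
For reference, the remount I do is roughly this (the lazy unmount is my own
workaround for the dead FUSE mount point, not something from the ceph docs):

  # drop the dead FUSE mount, then mount again
  fusermount -uz /grdata    # or: umount -l /grdata
  ceph-fuse -k /etc/ceph/ceph.client.admin.keyring -m node01,node02,node03 /grdata

After that the partition behaves normally again, until the next crash.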

2015-06-05 22:34 GMT+08:00 谷枫 feiche...@gmail.com:

 Hi everyone,
 I hava a five nodes ceph cluster with cephfs.Mount the ceph partition with
 ceph-fuse tools.
 I met a serious problem has no omens.
 One of the node the ceph-fuse procs down and the ceph partition that
 mounted with the ceph-fuse tools change to unavailable.
 ls the ceph partition, it's like this:
 d?   ? ??   ?? ceph-data/



Re: [ceph-users] MDS closing stale session

2015-06-05 Thread
This is /var/log/ceph/ceph-client.admin.log, but I found the timestamps in it
are later than the fault.

2015-06-05 10:29:07.180531 7f3f601dd7c0  0 ceph version 0.94.1
(e4bfad3a3c51054df7e234234ac8d0bf9be972ff), process ceph-fuse, pid 14002
2015-06-05 10:29:07.186763 7f3f601dd7c0 -1 init, newargv = 0x2c846e0
newargc=11
2015-06-05 10:29:07.186832 7f3f601dd7c0 -1 fuse_parse_cmdline failed.
2015-06-05 10:29:59.349762 7fed376c37c0  0 ceph version 0.94.1
(e4bfad3a3c51054df7e234234ac8d0bf9be972ff), process ceph-fuse, pid 14283
2015-06-05 10:29:59.352227 7fed376c37c0 -1 init, newargv = 0x2ca06b0
newargc=11
2015-06-05 12:42:40.005628 7fed277fe700  0 monclient: hunting for new mon
2015-06-05 12:42:40.005772 7fed277fe700  0 client.44334 ms_handle_reset on
10.3.1.2:6789/0
2015-06-05 14:31:46.431353 7fed2c478700  0 -- 10.3.1.4:0/14285 >>
10.3.1.5:6800/1365 pipe(0x2cab000 sd=9 :56628 s=2 pgs=186 cs=1 l=0
c=0x2c9b790).fault, initiating reconnect
2015-06-05 14:31:46.433065 7fed2c579700  0 -- 10.3.1.4:0/14285 >>
10.3.1.5:6800/1365 pipe(0x2cab000 sd=9 :57017 s=1 pgs=186 cs=2 l=0
c=0x2c9b790).fault



2015-06-05 22:51 GMT+08:00 John Spray john.sp...@redhat.com:



 On 05/06/2015 15:41, 谷枫 wrote:

 sorry i send this mail careless, continue
 The mds error is :
 2015-06-05 09:59:25.012130 7fa1ed118700  0 -- 10.3.1.5:6800/1365 >>
 10.3.1.4:0/18748 pipe(0x5f81000 sd=22 :6800 s=2 pgs=1252 cs=1 l=0
 c=0x4f935a0).fault with nothing to send, going to standby
 2015-06-05 10:03:40.767822 7fa1f0a27700  0 log_channel(cluster) log [INF]
 : closing stale session client.24153 10.3.1.4:0/18748 after 300.071624

 The apport error is :

 ERROR: apport (pid 30184) Fri Jun  5 12:13:03 2015: called for pid 6331,
 signal 11, core limit 0
 ERROR: apport (pid 30184) Fri Jun  5 12:13:03 2015: executable:
 /usr/bin/ceph-fuse (command line ceph-fuse -k
 /etc/ceph/ceph.client.admin.keyring -m node01,node02,node03 /grdata)
 ERROR: apport (pid 30184) Fri Jun  5 12:13:03 2015: is_closing_session():
 no DBUS_SESSION_BUS_ADDRESS in environment
 ERROR: apport (pid 30184) Fri Jun  5 12:13:33 2015: wrote report
 /var/crash/_usr_bin_ceph-fuse.0.crash


 Signal 11 is a segmentation fault, so it seems likely that ceph-fuse has
 hit a bad bug.

 Look in /var/log/ceph on the client machine for a client log with a
 backtrace in it.

 John



Re: [ceph-users] MDS closing stale session

2015-06-05 Thread
Sorry, I sent the wrong apport log, because I met the same problem twice today.
This is the apport log from the right time:
ERROR: apport (pid 7601) Fri Jun  5 09:58:45 2015: called for pid 18748,
signal 6, core limit 0
ERROR: apport (pid 7601) Fri Jun  5 09:58:45 2015: executable:
/usr/bin/ceph-fuse (command line ceph-fuse -k
/etc/ceph/ceph.client.admin.keyring -m rain01,rain02,rain03 /grdata)
ERROR: apport (pid 7601) Fri Jun  5 09:58:45 2015: is_closing_session(): no
DBUS_SESSION_BUS_ADDRESS in environment
ERROR: apport (pid 7601) Fri Jun  5 09:59:25 2015: wrote report
/var/crash/_usr_bin_ceph-fuse.0.crash
ERROR: apport (pid 7733) Fri Jun  5 09:59:25 2015: pid 7578 crashed in a
container

At the same time, the osd node has error logs too.

2015-06-05 09:58:44.809822 7fac0de07700  0 -- 10.3.1.5:68**/ 
**.3.1.4:0/18748 pipe(0x10a8e000 sd=47 :6801 s=2 pgs=1253 cs=1 l=1
c=0x1112b860).reader bad tag 116
2015-06-05 09:58:44.812270 7fac0de07700  0 -- 10.3.1.5:68**/ 
**.3.1.4:0/18748 pipe(0x146e3000 sd=47 :6801 s=2 pgs=1359 cs=1 l=1
c=0x15310ec0).reader bad tag 32

2015-06-05 22:41 GMT+08:00 谷枫 feiche...@gmail.com:

 sorry i send this mail careless, continue
 The mds error is :
 2015-06-05 09:59:25.012130 7fa1ed118700  0 -- 10.3.1.5:6800/1365 
 10.3.1.4:0/18748 pipe(0x5f81000 sd=22 :6800 s=2 pgs=1252 cs=1 l=0
 c=0x4f935a0).fault with nothing to send, going to standby
 2015-06-05 10:03:40.767822 7fa1f0a27700  0 log_channel(cluster) log [INF]
 : closing stale session client.24153 10.3.1.4:0/18748 after 300.071624

 The apport error is :

 ERROR: apport (pid 30184) Fri Jun  5 12:13:03 2015: called for pid 6331,
 signal 11, core limit 0
 ERROR: apport (pid 30184) Fri Jun  5 12:13:03 2015: executable:
 /usr/bin/ceph-fuse (command line ceph-fuse -k
 /etc/ceph/ceph.client.admin.keyring -m node01,node02,node03 /grdata)
 ERROR: apport (pid 30184) Fri Jun  5 12:13:03 2015: is_closing_session():
 no DBUS_SESSION_BUS_ADDRESS in environment
 ERROR: apport (pid 30184) Fri Jun  5 12:13:33 2015: wrote report
 /var/crash/_usr_bin_ceph-fuse.0.crash

 But the ceph-s is OK
 cluster add8fa43-9f84-4b5d-df32-095e3421a228
  health HEALTH_OK
  monmap e2: 3 mons at {node01=
 10.3.1.2:6789/0,node02=10.3.1.3:6789/0,node03=10.3.1.4:6789/0}
 election epoch 44, quorum 0,1,2 node01,node02,node03
  mdsmap e37: 1/1/1 up {0=osd01=up:active}
  osdmap e526: 5 osds: 5 up, 5 in
   pgmap v392315: 264 pgs, 3 pools, 26953 MB data, 106 kobjects
 81519 MB used, 1036 GB / 1115 GB avail
  264 active+clean
   client io 10171 B/s wr, 1 op/s

 When i remount the ceph-partition, it's get nomal.

 I want to know is this a ceph bug ? or the ceph-fuse tool's bug?
 Should i change the mount type with the mount -t ceph ?

 2015-06-05 22:34 GMT+08:00 谷枫 feiche...@gmail.com:

 Hi everyone,
 I hava a five nodes ceph cluster with cephfs.Mount the ceph partition
 with ceph-fuse tools.
 I met a serious problem has no omens.
 One of the node the ceph-fuse procs down and the ceph partition that
 mounted with the ceph-fuse tools change to unavailable.
 ls the ceph partition, it's like this:
 d?   ? ??   ?? ceph-data/


