Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-19 Thread Yan, Zheng
reading) > and I am not so sure anymore that this is memory related. > > For further debugging, I've updated > http://tracker.ceph.com/issues/16610 > with a summary of my findings plus some log files: > - The gdb.txt I get after running > $ gdb /path/to/ceph-fuse cor

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-18 Thread Goncalo Borges
nning ceph-fuse in debug mode with 'debug client 20' and 'debug objectcacher = 20' Cheers Goncalo From: Gregory Farnum [gfar...@redhat.com] Sent: 12 July 2016 03:07 To: Goncalo Borges Cc: John Spray; ceph-users Subject: Re: [ceph-users] ceph-fuse segfaults ( jewe

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Goncalo Borges
-fuse in debug mode with 'debug client 20' and 'debug objectcacher = 20' Cheers Goncalo From: Gregory Farnum [gfar...@redhat.com] Sent: 12 July 2016 03:07 To: Goncalo Borges Cc: John Spray; ceph-users Subject: Re: [ceph-users] ceph-fuse segfaults ( jewe

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Brad Hubbard
some log files: > > - The gdb.txt I get after running > > $ gdb /path/to/ceph-fuse core. > > (gdb) set pag off > > (gdb) set log on > > (gdb) thread apply all bt > > (gdb) thread apply all bt full > > as advised by Brad > > - The deb

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Yan, Zheng
b) thread apply all bt > (gdb) thread apply all bt full > as advised by Brad > - The debug.out (gzipped) I get after running ceph-fuse in debug mode with > 'debug client 20' and 'debug objectcacher = 20' > > Cheers > Goncalo >

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Brad Hubbard
-fuse core. > >(gdb) set pag off > >(gdb) set log on > > (gdb) thread apply all bt > >(gdb) thread apply all bt full > >as advised by Brad > > - The debug.out (gzipped) I get after running ceph-fuse in debug mode with > > 'debug client 20' and '

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-14 Thread Goncalo Borges
:07 To: Goncalo Borges Cc: John Spray; ceph-users Subject: Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2) Oh, is this one of your custom-built packages? Are they using tcmalloc? That difference between VSZ and RSS looks like a glibc malloc problem. -Greg On Mon, Jul 11, 2016 at 12:04 AM, Goncalo Bor

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-12 Thread Goncalo Borges
ers Goncalo From: Gregory Farnum [gfar...@redhat.com] Sent: 12 July 2016 03:07 To: Goncalo Borges Cc: John Spray; ceph-users Subject: Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2) Oh, is this one of your custom-built packages? Are they using tcmal

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-11 Thread Yan, Zheng
On Tue, Jul 12, 2016 at 1:07 AM, Gregory Farnum wrote: > Oh, is this one of your custom-built packages? Are they using > tcmalloc? That difference between VSZ and RSS looks like a glibc > malloc problem. > -Greg > ceph-fuse at http://download.ceph.com/rpm-jewel/el7/x86_64/ is

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-11 Thread Gregory Farnum
Oh, is this one of your custom-built packages? Are they using tcmalloc? That difference between VSZ and RSS looks like a glibc malloc problem. -Greg On Mon, Jul 11, 2016 at 12:04 AM, Goncalo Borges wrote: > Hi John... > > Thank you for replying. > > Here is the

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-11 Thread Patrick Donnelly
Hi Goncalo, On Fri, Jul 8, 2016 at 3:01 AM, Goncalo Borges wrote: > 5./ I have noticed that ceph-fuse (in 10.2.2) consumes about 1.5 GB of > virtual memory when there are no applications using the filesystem. > > 7152 root 20 0 1108m 12m 5496 S 0.0 0.0

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-11 Thread John Spray
On Mon, Jul 11, 2016 at 8:04 AM, Goncalo Borges wrote: > Hi John... > > Thank you for replying. > > Here is the result of the tests you asked for, but I do not see anything abnormal. Thanks for running through that. Yes, nothing in the output struck me as unreasonable

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-11 Thread Goncalo Borges
On 07/11/2016 05:04 PM, Goncalo Borges wrote: Hi John... Thank you for replying. Here is the result of the tests you asked for, but I do not see anything abnormal. Actually, your suggestions made me see that: 1) ceph-fuse 9.2.0 is presenting the same behaviour but with less memory

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-11 Thread Goncalo Borges
Hi John... Thank you for replying. Here is the result of the tests you asked for, but I do not see anything abnormal. Actually, your suggestions made me see that: 1) ceph-fuse 9.2.0 is presenting the same behaviour but with less memory consumption, probably less enough so that it doesn't break

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-08 Thread John Spray
On Fri, Jul 8, 2016 at 8:01 AM, Goncalo Borges wrote: > Hi Brad, Patrick, All... > > I think I've understood this second problem. In summary, it is memory > related. > > This is how I found the source of the problem: > > 1./ I copied and adapted the user application

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-08 Thread Goncalo Borges
Hi Brad, Patrick, All... I think I've understood this second problem. In summary, it is memory related. This is how I found the source of the problem: 1./ I copied and adapted the user application to run in another cluster of ours. The idea was for me to understand the application

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-07 Thread Brad Hubbard
Hi Goncalo, If possible, it would be great if you could capture a core file for this with full debugging symbols (preferably glibc debuginfo as well). How you do that will depend on the ceph version and your OS, but we can offer help if required, I'm sure. Once you have the core, do the following.

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-07 Thread Patrick Donnelly
On Thu, Jul 7, 2016 at 2:01 AM, Goncalo Borges wrote: > Unfortunately, the other user application breaks ceph-fuse again (It is a > completely different application than in my previous test). > > We have tested it on 4 machines with 4 cores. The user is submitting 16

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-07 Thread Goncalo Borges
My previous email did not go through because of its size. Here goes a new attempt: Cheers Goncalo --- * --- Hi Patrick, Brad... Unfortunately, the other user application breaks ceph-fuse again (It is a completely different application than in my previous test). We have tested it on 4

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-06 Thread Brad Hubbard
On Thu, Jul 7, 2016 at 12:31 AM, Patrick Donnelly wrote: > > The locks were missing in 9.2.0. There were probably instances of the > segfault unreported/unresolved. Or even unseen :) Race conditions are funny things and extremely subtle changes in timing introduced by any

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-06 Thread Patrick Donnelly
Hi Goncalo, On Wed, Jul 6, 2016 at 2:18 AM, Goncalo Borges wrote: > Just to confirm that, after applying the patch and recompiling, we are no > longer seeing segfaults. > > I just tested with a user application which would kill ceph-fuse almost > instantaneously.

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-06 Thread Goncalo Borges
Hi All... Just to confirm that, after applying the patch and recompiling, we are no longer seeing segfaults. I just tested with a user application which would kill ceph-fuse almost instantaneously. Now it is running for quite some time, reading and updating the files that it should. I

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Goncalo Borges
Will do, Brad. From your answer it should be a safe thing to do. Will report later. Thanks for the help Cheers Goncalo On 07/05/2016 02:42 PM, Brad Hubbard wrote: On Tue, Jul 5, 2016 at 1:34 PM, Patrick Donnelly wrote: Hi Goncalo, I believe this segfault may be the

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Goncalo Borges
Hi Brad, Shinobu, Patrick... Indeed if I run with 'debug client = 20' it seems I get a very similar log to what Patrick has in the patch. However it is difficult for me to really say if it is exactly the same thing. One thing I could try is simply to apply the fix in the source code and

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Brad Hubbard
On Tue, Jul 5, 2016 at 1:34 PM, Patrick Donnelly wrote: > Hi Goncalo, > > I believe this segfault may be the one fixed here: > > https://github.com/ceph/ceph/pull/10027 Ah, nice one Patrick. Goncalo, the patch is fairly simple, just the addition of a lock on two lines to

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Patrick Donnelly
Hi Goncalo, I believe this segfault may be the one fixed here: https://github.com/ceph/ceph/pull/10027 (Sorry for the brief top-post. I'm on mobile.) On Jul 4, 2016 9:16 PM, "Goncalo Borges" wrote: > > Dear All... > > We have recently migrated all our ceph

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Brad Hubbard
On Tue, Jul 5, 2016 at 12:13 PM, Shinobu Kinjo wrote: > Can you reproduce with debug client = 20? In addition to this I would suggest making sure you have debug symbols in your build and capturing a core file. You can do that by setting "ulimit -c unlimited" in the

Re: [ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Shinobu Kinjo
Can you reproduce with debug client = 20? On Tue, Jul 5, 2016 at 10:16 AM, Goncalo Borges < goncalo.bor...@sydney.edu.au> wrote: > Dear All... > > We have recently migrated all our ceph infrastructure from 9.2.0 to 10.2.2. > > We are currently using ceph-fuse to mount cephfs in a number of

[ceph-users] ceph-fuse segfaults ( jewel 10.2.2)

2016-07-04 Thread Goncalo Borges
Dear All... We have recently migrated all our ceph infrastructure from 9.2.0 to 10.2.2. We are currently using ceph-fuse to mount cephfs in a number of clients. ceph-fuse 10.2.2 client is segfaulting in some situations. One of the scenarios where ceph-fuse segfaults is when a user submits a