Re: [Gluster-devel] crypt xlator bug

2015-04-03 Thread Emmanuel Dreyfus
On Fri, Apr 03, 2015 at 11:26:40AM +0530, Pranith Kumar Karampuri wrote:
 I still don't understand why http://review.gluster.org/10109 is working.
 Does anyone know the reason? How are you guys re-creating the crash? I ran
 crypt.t but no crashes on my laptop. Could some one help me re-create this
 issue.

The problem is 100% reproductible on NetBSD. Give it a try on nbslave75
which is disabled in jenkins right now.

-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] crypt xlator bug

2015-04-03 Thread Pranith Kumar Karampuri


On 04/01/2015 03:27 PM, Emmanuel Dreyfus wrote:

Hi

crypt.t was recently broken in NetBSD regression. The glusterfs returns
a node with file type invalid to FUSE, and that breaks the test.

After running a git bisect, I found the offending commit after which
this behavior appeared:
 8a2e2b88fc21dc7879f838d18cd0413dd88023b7
 mem-pool: invalidate memory on GF_FREE to aid debugging

This means the bug has always been there, but this debugging aid
caused it to be reliable.

With the help of an assertion, I can detect when inode-ia_type gets
a corrupted value. It gives me this backtrace where in frame 4,
inode = 0xb9611880 and inode-ia_type = 12475 (which is wrong).
inode value comes from FUSE state-loc-inode and we get it from
frame 20 which is in crypt.c:

#4  0xb9bd2adf in mdc_inode_iatt_get (this=0xbb1df030,
 inode=0xb9611880, iatt=0xbf7fdfa0) at md-cache.c:471
#5  0xb9bd34e1 in mdc_lookup (frame=0xb9aa82b0, this=0xbb1df030,
 loc=0xb9608840, xdata=0x0) at md-cache.c:847
#6  0xb9bc216e in io_stats_lookup (frame=0xb9aa8200, this=0xbb1e0030,
 loc=0xb9608840, xdata=0x0) at io-stats.c:1934
#7  0xbb76755f in default_lookup (frame=0xb9aa8200, this=0xbb1d0030,
 loc=0xb9608840, xdata=0x0) at defaults.c:2138
#8  0xb9ba69cd in meta_lookup (frame=0xb9aa8200, this=0xbb1d0030,
 loc=0xb9608840, xdata=0x0) at meta.c:49
#9  0xbb277365 in fuse_lookup_resume (state=0xb9608830) at fuse-bridge.c:607
#10 0xbb276e07 in fuse_fop_resume (state=0xb9608830) at fuse-bridge.c:569
#11 0xbb274969 in fuse_resolve_done (state=0xb9608830) at fuse-resolve.c:644
#12 0xbb274a29 in fuse_resolve_all (state=0xb9608830) at fuse-resolve.c:671
#13 0xbb274941 in fuse_resolve (state=0xb9608830) at fuse-resolve.c:635
#14 0xbb274a06 in fuse_resolve_all (state=0xb9608830) at fuse-resolve.c:667
#15 0xbb274a8e in fuse_resolve_continue (state=0xb9608830) at fuse-resolve.c:687
#16 0xbb2731f4 in fuse_resolve_entry_cbk (frame=0xb9609688,
 cookie=0xb96140a0, this=0xbb193030, op_ret=0, op_errno=0,
 inode=0xb9611880, buf=0xb961e558, xattr=0xbb18a1a0,
 postparent=0xb961e628) at fuse-resolve.c:81
#17 0xb9bbd0c1 in io_stats_lookup_cbk (frame=0xb96140a0,
 cookie=0xb9614150, this=0xbb1e0030, op_ret=0, op_errno=0,
 inode=0xb9611880, buf=0xb961e558, xdata=0xbb18a1a0,
 postparent=0xb961e628) at io-stats.c:1512
#18 0xb9bd33ff in mdc_lookup_cbk (frame=0xb9614150, cookie=0xb9614410,
 this=0xbb1df030, op_ret=0, op_errno=0,
 inode=0xb9611880, stbuf=0xb961e558, dict=0xbb18a1a0,
  postparent=0xb961e628) at md-cache.c:816
#19 0xb9be2b10 in ioc_lookup_cbk (frame=0xb9614410, cookie=0xb96144c0,
 this=0xbb1de030, op_ret=0, op_errno=0,
 inode=0xb9611880, stbuf=0xb961e558, xdata=0xbb18a1a0,
 postparent=0xb961e628) at io-cache.c:260
#20 0xbb227fb5 in load_file_size (frame=0xb96144c0, cookie=0xb9aa8200,
 this=0xbb1db030, op_ret=0, op_errno=0,
 dict=0xbb18a470, xdata=0x0) at crypt.c:3830

In frame 20:
 case GF_FOP_LOOKUP:
STACK_UNWIND_STRICT(lookup,
frame,
op_ret,
op_errno,
op_ret = 0 ? local-inode : NULL,
op_ret = 0 ? local-buf : NULL,
local-xdata,
op_ret = 0 local-postbuf : NULL);
  
Here is the problem, local-inode is not the 0xb9611880 value anymore,

which means local got corrupted:

(gdb) print local-inode
$2 = (inode_t *) 0x1db030de

I now suspect local has been freed, but I do not find where in crypt.c
this operation is done. There is a local = mem_get0(this-local_pool)
in crypt_alloc_local, but where is that structure freed? There is
no mem_put() call in crypt xlator.
I joined this thread after seeing raghavendra talur's patch which fixed 
the issue, which seemed extremely odd to me. Just checked this mail from 
you and
local-inode in crypt need not be same as state-loc-inode because, 
inode_link in fuse_resolve_entry_cbk will give address of already linked 
inode with same gfid if one exists. I see hardlink related commands in 
crypt.t so this could be part of looking up extra link may be? which is 
resolving to older inode that is already linked. It is still some memory 
problem, but may not be anything to do with crypt. Could you let me know 
the details of the setup where you saw this issue? I can take a look.


Pranith





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] crypt xlator bug

2015-04-02 Thread Pranith Kumar Karampuri


On 04/02/2015 12:27 AM, Raghavendra Talur wrote:



On Wed, Apr 1, 2015 at 10:34 PM, Justin Clift jus...@gluster.org 
mailto:jus...@gluster.org wrote:


On 1 Apr 2015, at 10:57, Emmanuel Dreyfus m...@netbsd.org
mailto:m...@netbsd.org wrote:
 Hi

 crypt.t was recently broken in NetBSD regression. The glusterfs
returns
 a node with file type invalid to FUSE, and that breaks the test.

 After running a git bisect, I found the offending commit after which
 this behavior appeared:
8a2e2b88fc21dc7879f838d18cd0413dd88023b7
mem-pool: invalidate memory on GF_FREE to aid debugging

 This means the bug has always been there, but this debugging aid
 caused it to be reliable.

Sounds like that commit is a good win then. :)

Harsha/Pranith/Lala, your names are on the git blame for crypt.c...
any ideas? :)


I found one issue that local is not allocated using GF_CALLOC and with 
a mem-type.

This is a patch which *might* fix it.

diff --git a/xlators/encryption/crypt/src/crypt-mem-types.h 
b/xlators/encryption/crypt/src/crypt-mem-types.h

index 2eab921..c417b67 100644
--- a/xlators/encryption/crypt/src/crypt-mem-types.h
+++ b/xlators/encryption/crypt/src/crypt-mem-types.h
@@ -24,6 +24,7 @@ enum gf_crypt_mem_types_ {
gf_crypt_mt_key,
gf_crypt_mt_iovec,
gf_crypt_mt_char,
+gf_crypt_mt_local,
gf_crypt_mt_end,
 };
diff --git a/xlators/encryption/crypt/src/crypt.c 
b/xlators/encryption/crypt/src/crypt.c

index ae8cdb2..63c0977 100644
--- a/xlators/encryption/crypt/src/crypt.c
+++ b/xlators/encryption/crypt/src/crypt.c
@@ -48,7 +48,7 @@ static crypt_local_t *crypt_alloc_local(call_frame_t 
*frame, xlator_t *this,

 {
crypt_local_t *local = NULL;
-   local = mem_get0(this-local_pool);
+local = GF_CALLOC (sizeof (*local), 1, gf_crypt_mt_local);
local is using the memory from pool earlier(i.e. with mem_get0()). Which 
seems ok to me. Changing it this way will include memory allocation in 
fop I/O path which is why xlators generally use the mem-pool approach.


Pranith

if (!local) {
gf_log(this-name, GF_LOG_ERROR, out of memory);
return NULL;


Niels should be able to recognize if this is sufficient fix or not.

Thanks,
Raghavendra Talur

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift
http://twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org mailto:Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




--
*Raghavendra Talur *



___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] crypt xlator bug

2015-04-02 Thread Niels de Vos
On Thu, Apr 02, 2015 at 01:58:39PM +0530, Raghavendra Bhat wrote:
 On Thursday 02 April 2015 01:00 PM, Pranith Kumar Karampuri wrote:
 
 On 04/02/2015 12:27 AM, Raghavendra Talur wrote:
 
 
 On Wed, Apr 1, 2015 at 10:34 PM, Justin Clift jus...@gluster.org
 mailto:jus...@gluster.org wrote:
 
 On 1 Apr 2015, at 10:57, Emmanuel Dreyfus m...@netbsd.org
 mailto:m...@netbsd.org wrote:
  Hi
 
  crypt.t was recently broken in NetBSD regression. The glusterfs
 returns
  a node with file type invalid to FUSE, and that breaks the test.
 
  After running a git bisect, I found the offending commit after
 which
  this behavior appeared:
 8a2e2b88fc21dc7879f838d18cd0413dd88023b7
 mem-pool: invalidate memory on GF_FREE to aid debugging
 
  This means the bug has always been there, but this debugging aid
  caused it to be reliable.
 
 Sounds like that commit is a good win then. :)
 
 Harsha/Pranith/Lala, your names are on the git blame for crypt.c...
 any ideas? :)
 
 
 I found one issue that local is not allocated using GF_CALLOC and with a
 mem-type.
 This is a patch which *might* fix it.
 
 diff --git a/xlators/encryption/crypt/src/crypt-mem-types.h
 b/xlators/encryption/crypt/src/crypt-mem-types.h
 index 2eab921..c417b67 100644
 --- a/xlators/encryption/crypt/src/crypt-mem-types.h
 +++ b/xlators/encryption/crypt/src/crypt-mem-types.h
 @@ -24,6 +24,7 @@ enum gf_crypt_mem_types_ {
 gf_crypt_mt_key,
 gf_crypt_mt_iovec,
 gf_crypt_mt_char,
 +gf_crypt_mt_local,
 gf_crypt_mt_end,
  };
 diff --git a/xlators/encryption/crypt/src/crypt.c
 b/xlators/encryption/crypt/src/crypt.c
 index ae8cdb2..63c0977 100644
 --- a/xlators/encryption/crypt/src/crypt.c
 +++ b/xlators/encryption/crypt/src/crypt.c
 @@ -48,7 +48,7 @@ static crypt_local_t *crypt_alloc_local(call_frame_t
 *frame, xlator_t *this,
  {
 crypt_local_t *local = NULL;
 -   local = mem_get0(this-local_pool);
 +local = GF_CALLOC (sizeof (*local), 1, gf_crypt_mt_local);
 local is using the memory from pool earlier(i.e. with mem_get0()). Which
 seems ok to me. Changing it this way will include memory allocation in fop
 I/O path which is why xlators generally use the mem-pool approach.
 
 Pranith
 
 I think, crypt xlator should do a mem_put of local after doing STACK_UNWIND
 like other xlators which also use mem_get for local (such as AFR). I am
 suspecting crypt not doing mem_put might be the reason for the bug
 mentioned.

I've looked at this now too. The use of mem_get0() seems fine to me. But
indeed, calling mem_put() is missing. Whenever the crypt_local_t should
be released, mem_put() should get called, just like any
GF_CALLOC/GF_FREE combinations.

HTH,
Niels

 
 Regards,
 Raghavendra Bat
 
 if (!local) {
 gf_log(this-name, GF_LOG_ERROR, out of memory);
 return NULL;
 
 
 Niels should be able to recognize if this is sufficient fix or not.
 
 Thanks,
 Raghavendra Talur
 
 + Justin
 
 --
 GlusterFS - http://www.gluster.org
 
 An open source, distributed file system scaling to several
 petabytes, and handling thousands of clients.
 
 My personal twitter: twitter.com/realjustinclift
 http://twitter.com/realjustinclift
 
 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org mailto:Gluster-devel@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-devel
 
 
 
 
 -- 
 *Raghavendra Talur *
 
 
 
 
 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-devel
 

 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-devel



pgp8J6DUYue6M.pgp
Description: PGP signature
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] crypt xlator bug

2015-04-02 Thread Raghavendra Bhat

On Thursday 02 April 2015 01:00 PM, Pranith Kumar Karampuri wrote:


On 04/02/2015 12:27 AM, Raghavendra Talur wrote:



On Wed, Apr 1, 2015 at 10:34 PM, Justin Clift jus...@gluster.org 
mailto:jus...@gluster.org wrote:


On 1 Apr 2015, at 10:57, Emmanuel Dreyfus m...@netbsd.org
mailto:m...@netbsd.org wrote:
 Hi

 crypt.t was recently broken in NetBSD regression. The glusterfs
returns
 a node with file type invalid to FUSE, and that breaks the test.

 After running a git bisect, I found the offending commit after
which
 this behavior appeared:
8a2e2b88fc21dc7879f838d18cd0413dd88023b7
mem-pool: invalidate memory on GF_FREE to aid debugging

 This means the bug has always been there, but this debugging aid
 caused it to be reliable.

Sounds like that commit is a good win then. :)

Harsha/Pranith/Lala, your names are on the git blame for crypt.c...
any ideas? :)


I found one issue that local is not allocated using GF_CALLOC and 
with a mem-type.

This is a patch which *might* fix it.

diff --git a/xlators/encryption/crypt/src/crypt-mem-types.h 
b/xlators/encryption/crypt/src/crypt-mem-types.h

index 2eab921..c417b67 100644
--- a/xlators/encryption/crypt/src/crypt-mem-types.h
+++ b/xlators/encryption/crypt/src/crypt-mem-types.h
@@ -24,6 +24,7 @@ enum gf_crypt_mem_types_ {
gf_crypt_mt_key,
gf_crypt_mt_iovec,
gf_crypt_mt_char,
+gf_crypt_mt_local,
gf_crypt_mt_end,
 };
diff --git a/xlators/encryption/crypt/src/crypt.c 
b/xlators/encryption/crypt/src/crypt.c

index ae8cdb2..63c0977 100644
--- a/xlators/encryption/crypt/src/crypt.c
+++ b/xlators/encryption/crypt/src/crypt.c
@@ -48,7 +48,7 @@ static crypt_local_t 
*crypt_alloc_local(call_frame_t *frame, xlator_t *this,

 {
crypt_local_t *local = NULL;
-   local = mem_get0(this-local_pool);
+local = GF_CALLOC (sizeof (*local), 1, gf_crypt_mt_local);
local is using the memory from pool earlier(i.e. with mem_get0()). 
Which seems ok to me. Changing it this way will include memory 
allocation in fop I/O path which is why xlators generally use the 
mem-pool approach.


Pranith


I think, crypt xlator should do a mem_put of local after doing 
STACK_UNWIND like other xlators which also use mem_get for local (such 
as AFR). I am suspecting crypt not doing mem_put might be the reason for 
the bug mentioned.


Regards,
Raghavendra Bat


if (!local) {
gf_log(this-name, GF_LOG_ERROR, out of memory);
return NULL;


Niels should be able to recognize if this is sufficient fix or not.

Thanks,
Raghavendra Talur

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift
http://twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org mailto:Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel




--
*Raghavendra Talur *





___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] crypt xlator bug

2015-04-02 Thread Jeff Darcy
 I think, crypt xlator should do a mem_put of local after doing STACK_UNWIND
 like other xlators which also use mem_get for local (such as AFR). I am
 suspecting crypt not doing mem_put might be the reason for the bug
 mentioned.

My understanding was that mem_put should be called automatically from
FRAME_DESTROY, which is itself called from STACK_DESTROY when the fop
completes (e.g. at FUSE or GFAPI).  On the other hand, I see that AFR
and others call mem_put themselves, without zeroing the local pointer.
In my (possibly no longer relevant) experience, freeing local myself
without zeroing the pointer would lead to a double free, and I don't
see why that's not the case here.  What am I missing?
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] crypt xlator bug

2015-04-02 Thread Pranith Kumar Karampuri


On 04/02/2015 07:27 PM, Raghavendra Bhat wrote:

On Thursday 02 April 2015 05:50 PM, Jeff Darcy wrote:
I think, crypt xlator should do a mem_put of local after doing 
STACK_UNWIND

like other xlators which also use mem_get for local (such as AFR). I am
suspecting crypt not doing mem_put might be the reason for the bug
mentioned.

My understanding was that mem_put should be called automatically from
FRAME_DESTROY, which is itself called from STACK_DESTROY when the fop
completes (e.g. at FUSE or GFAPI).  On the other hand, I see that AFR
and others call mem_put themselves, without zeroing the local pointer.
In my (possibly no longer relevant) experience, freeing local myself
without zeroing the pointer would lead to a double free, and I don't
see why that's not the case here.  What am I missing?


As per my understanding, the xlators which get local by mem_get should 
be doing below things in callback funtion  just before unwinding:


1) save frame-local pointer (i.e. local = frame-local);
2) STACK_UNWIND
3) mem_put (local)

After STACK_UNWIND and before mem_put any reference to fd or inode or 
dict that might be present in the local should be unrefed (also any 
allocated resources that are present in local should be freed). So 
mem_put is done at last. To avoid double free in FRAME_DESTROY, 
frame-local is set to NULL before doing STACK_UNWIND.


I suspect not doing 1 of the above three operations (may be either 1st 
or 3rd) in crypt xlator might be the reason for the bug.
I still don't understand why http://review.gluster.org/10109 is working. 
Does anyone know the reason? How are you guys re-creating the crash? I 
ran crypt.t but no crashes on my laptop. Could some one help me 
re-create this issue.


Pranith


Regards,
Raghavendra Bhat


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] crypt xlator bug

2015-04-02 Thread Raghavendra Bhat

On Thursday 02 April 2015 05:50 PM, Jeff Darcy wrote:

I think, crypt xlator should do a mem_put of local after doing STACK_UNWIND
like other xlators which also use mem_get for local (such as AFR). I am
suspecting crypt not doing mem_put might be the reason for the bug
mentioned.

My understanding was that mem_put should be called automatically from
FRAME_DESTROY, which is itself called from STACK_DESTROY when the fop
completes (e.g. at FUSE or GFAPI).  On the other hand, I see that AFR
and others call mem_put themselves, without zeroing the local pointer.
In my (possibly no longer relevant) experience, freeing local myself
without zeroing the pointer would lead to a double free, and I don't
see why that's not the case here.  What am I missing?


As per my understanding, the xlators which get local by mem_get should 
be doing below things in callback funtion  just before unwinding:


1) save frame-local pointer (i.e. local = frame-local);
2) STACK_UNWIND
3) mem_put (local)

After STACK_UNWIND and before mem_put any reference to fd or inode or 
dict that might be present in the local should be unrefed (also any 
allocated resources that are present in local should be freed). So 
mem_put is done at last. To avoid double free in FRAME_DESTROY, 
frame-local is set to NULL before doing STACK_UNWIND.


I suspect not doing 1 of the above three operations (may be either 1st 
or 3rd) in crypt xlator might be the reason for the bug.


Regards,
Raghavendra Bhat


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] crypt xlator bug

2015-04-01 Thread Emmanuel Dreyfus
Hi

crypt.t was recently broken in NetBSD regression. The glusterfs returns
a node with file type invalid to FUSE, and that breaks the test.

After running a git bisect, I found the offending commit after which 
this behavior appeared:
8a2e2b88fc21dc7879f838d18cd0413dd88023b7
mem-pool: invalidate memory on GF_FREE to aid debugging

This means the bug has always been there, but this debugging aid 
caused it to be reliable.

With the help of an assertion, I can detect when inode-ia_type gets
a corrupted value. It gives me this backtrace where in frame 4, 
inode = 0xb9611880 and inode-ia_type = 12475 (which is wrong). 
inode value comes from FUSE state-loc-inode and we get it from
frame 20 which is in crypt.c:

#4  0xb9bd2adf in mdc_inode_iatt_get (this=0xbb1df030, 
inode=0xb9611880, iatt=0xbf7fdfa0) at md-cache.c:471
#5  0xb9bd34e1 in mdc_lookup (frame=0xb9aa82b0, this=0xbb1df030, 
loc=0xb9608840, xdata=0x0) at md-cache.c:847
#6  0xb9bc216e in io_stats_lookup (frame=0xb9aa8200, this=0xbb1e0030, 
loc=0xb9608840, xdata=0x0) at io-stats.c:1934
#7  0xbb76755f in default_lookup (frame=0xb9aa8200, this=0xbb1d0030, 
loc=0xb9608840, xdata=0x0) at defaults.c:2138
#8  0xb9ba69cd in meta_lookup (frame=0xb9aa8200, this=0xbb1d0030, 
loc=0xb9608840, xdata=0x0) at meta.c:49
#9  0xbb277365 in fuse_lookup_resume (state=0xb9608830) at fuse-bridge.c:607
#10 0xbb276e07 in fuse_fop_resume (state=0xb9608830) at fuse-bridge.c:569
#11 0xbb274969 in fuse_resolve_done (state=0xb9608830) at fuse-resolve.c:644
#12 0xbb274a29 in fuse_resolve_all (state=0xb9608830) at fuse-resolve.c:671
#13 0xbb274941 in fuse_resolve (state=0xb9608830) at fuse-resolve.c:635
#14 0xbb274a06 in fuse_resolve_all (state=0xb9608830) at fuse-resolve.c:667
#15 0xbb274a8e in fuse_resolve_continue (state=0xb9608830) at fuse-resolve.c:687
#16 0xbb2731f4 in fuse_resolve_entry_cbk (frame=0xb9609688, 
cookie=0xb96140a0, this=0xbb193030, op_ret=0, op_errno=0, 
inode=0xb9611880, buf=0xb961e558, xattr=0xbb18a1a0, 
postparent=0xb961e628) at fuse-resolve.c:81
#17 0xb9bbd0c1 in io_stats_lookup_cbk (frame=0xb96140a0, 
cookie=0xb9614150, this=0xbb1e0030, op_ret=0, op_errno=0, 
inode=0xb9611880, buf=0xb961e558, xdata=0xbb18a1a0, 
postparent=0xb961e628) at io-stats.c:1512
#18 0xb9bd33ff in mdc_lookup_cbk (frame=0xb9614150, cookie=0xb9614410,
this=0xbb1df030, op_ret=0, op_errno=0, 
inode=0xb9611880, stbuf=0xb961e558, dict=0xbb18a1a0, 
 postparent=0xb961e628) at md-cache.c:816
#19 0xb9be2b10 in ioc_lookup_cbk (frame=0xb9614410, cookie=0xb96144c0,
this=0xbb1de030, op_ret=0, op_errno=0, 
inode=0xb9611880, stbuf=0xb961e558, xdata=0xbb18a1a0, 
postparent=0xb961e628) at io-cache.c:260
#20 0xbb227fb5 in load_file_size (frame=0xb96144c0, cookie=0xb9aa8200,
this=0xbb1db030, op_ret=0, op_errno=0, 
dict=0xbb18a470, xdata=0x0) at crypt.c:3830

In frame 20:
case GF_FOP_LOOKUP:
STACK_UNWIND_STRICT(lookup,
frame,
op_ret,
op_errno,
op_ret = 0 ? local-inode : NULL,
op_ret = 0 ? local-buf : NULL,
local-xdata,
op_ret = 0 local-postbuf : NULL);
 
Here is the problem, local-inode is not the 0xb9611880 value anymore,
which means local got corrupted:

(gdb) print local-inode
$2 = (inode_t *) 0x1db030de

I now suspect local has been freed, but I do not find where in crypt.c 
this operation is done. There is a local = mem_get0(this-local_pool) 
in crypt_alloc_local, but where is that structure freed? There is 
no mem_put() call in crypt xlator.


-- 
Emmanuel Dreyfus
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] crypt xlator bug

2015-04-01 Thread Emmanuel Dreyfus
Raghavendra Talur raghavendra.ta...@gmail.com wrote:

 I found one issue that local is not allocated using GF_CALLOC and with a
 mem-type.
 This is a patch which *might* fix it.

It does. The memory corruption disapeared and the test can complete.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] crypt xlator bug

2015-04-01 Thread Jeff Darcy
  I found one issue that local is not allocated using GF_CALLOC and with a
  mem-type.
  This is a patch which *might* fix it.
 
 It does. The memory corruption disapeared and the test can complete.

Interesting.  I suspect this means that we *are* in the case where the
previous comment came from.  Mem_get can allocate objects two ways:

* As one of many objects in a slab, tracking internally.

* As a singleton, directly via GF_*ALLOC.

In mem_put, we do some pretty nasty pointer arithmetic to figure out
which way an object was allocated.  If we get it wrong, and therefore
use the wrong *de*allocate method (either way I believe) then we'll
corrupt memory.  The symptoms so far suggest that an object was
allocated within a slab, then deallocated as a singleton (causing its
memory to be poisoned).  That sucks.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] crypt xlator bug

2015-04-01 Thread Emmanuel Dreyfus
Jeff Darcy jda...@redhat.com wrote:

  It does. The memory corruption disapeared and the test can complete.
 
 Interesting.  

FWIW I made 147 successful runs in a row with the patch, while it always
failed without.

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] crypt xlator bug

2015-04-01 Thread Justin Clift
On 1 Apr 2015, at 10:57, Emmanuel Dreyfus m...@netbsd.org wrote:
 Hi
 
 crypt.t was recently broken in NetBSD regression. The glusterfs returns
 a node with file type invalid to FUSE, and that breaks the test.
 
 After running a git bisect, I found the offending commit after which 
 this behavior appeared:
8a2e2b88fc21dc7879f838d18cd0413dd88023b7
mem-pool: invalidate memory on GF_FREE to aid debugging
 
 This means the bug has always been there, but this debugging aid 
 caused it to be reliable.

Sounds like that commit is a good win then. :)

Harsha/Pranith/Lala, your names are on the git blame for crypt.c...
any ideas? :)

+ Justin

--
GlusterFS - http://www.gluster.org

An open source, distributed file system scaling to several
petabytes, and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] crypt xlator bug

2015-04-01 Thread Raghavendra Talur
On Wed, Apr 1, 2015 at 10:34 PM, Justin Clift jus...@gluster.org wrote:

 On 1 Apr 2015, at 10:57, Emmanuel Dreyfus m...@netbsd.org wrote:
  Hi
 
  crypt.t was recently broken in NetBSD regression. The glusterfs returns
  a node with file type invalid to FUSE, and that breaks the test.
 
  After running a git bisect, I found the offending commit after which
  this behavior appeared:
 8a2e2b88fc21dc7879f838d18cd0413dd88023b7
 mem-pool: invalidate memory on GF_FREE to aid debugging
 
  This means the bug has always been there, but this debugging aid
  caused it to be reliable.

 Sounds like that commit is a good win then. :)

 Harsha/Pranith/Lala, your names are on the git blame for crypt.c...
 any ideas? :)


I found one issue that local is not allocated using GF_CALLOC and with a
mem-type.
This is a patch which *might* fix it.

diff --git a/xlators/encryption/crypt/src/crypt-mem-types.h
b/xlators/encryption/crypt/src/crypt-mem-types.h
index 2eab921..c417b67 100644
--- a/xlators/encryption/crypt/src/crypt-mem-types.h
+++ b/xlators/encryption/crypt/src/crypt-mem-types.h
@@ -24,6 +24,7 @@ enum gf_crypt_mem_types_ {
gf_crypt_mt_key,
gf_crypt_mt_iovec,
gf_crypt_mt_char,
+gf_crypt_mt_local,
gf_crypt_mt_end,
 };

diff --git a/xlators/encryption/crypt/src/crypt.c
b/xlators/encryption/crypt/src/crypt.c
index ae8cdb2..63c0977 100644
--- a/xlators/encryption/crypt/src/crypt.c
+++ b/xlators/encryption/crypt/src/crypt.c
@@ -48,7 +48,7 @@ static crypt_local_t *crypt_alloc_local(call_frame_t
*frame, xlator_t *this,
 {
crypt_local_t *local = NULL;

-   local = mem_get0(this-local_pool);
+local = GF_CALLOC (sizeof (*local), 1, gf_crypt_mt_local);
if (!local) {
gf_log(this-name, GF_LOG_ERROR, out of memory);
return NULL;


Niels should be able to recognize if this is sufficient fix or not.

Thanks,
Raghavendra Talur


 + Justin

 --
 GlusterFS - http://www.gluster.org

 An open source, distributed file system scaling to several
 petabytes, and handling thousands of clients.

 My personal twitter: twitter.com/realjustinclift

 ___
 Gluster-devel mailing list
 Gluster-devel@gluster.org
 http://www.gluster.org/mailman/listinfo/gluster-devel




-- 
*Raghavendra Talur *
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] crypt xlator bug

2015-04-01 Thread Jeff Darcy
 - local = mem_get0(this-local_pool);
 + local = GF_CALLOC (sizeof (*local), 1, gf_crypt_mt_local);

As I understand it, mem_get0 is a valid (and even more
efficient) way to allocate such objects.  The frame cleanup
code should recognize which method to use when deallocating.
If that's broken, we're going to have more numerous and
serious problems than this.  I'll look into it further.

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] crypt xlator bug

2015-04-01 Thread Jeff Darcy
 As I understand it, mem_get0 is a valid (and even more
 efficient) way to allocate such objects.  The frame cleanup
 code should recognize which method to use when deallocating.
 If that's broken, we're going to have more numerous and
 serious problems than this.  I'll look into it further.

I don't see anything obviously wrong, but I did find this
gem of a comment in mem_get:

* I am working around this by performing a regular allocation
* , just the way the caller would've done when not using the
* mem-pool. That also means, we're not padding the size with
* the list_head structure because, this will not be added to
* the list of chunks that belong to the mem-pool allocated
* initially.
*
* This is the best we can do without adding functionality for
* managing multiple slabs. That does not interest us at present
* because it is too much work knowing that a better slab
* allocator is coming RSN.

Now I'm curious to find out what effect your change will have,
but I suspect we'll still be a while figuring this out.
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel