Re: [Gluster-devel] crypt xlator bug
On Fri, Apr 03, 2015 at 11:26:40AM +0530, Pranith Kumar Karampuri wrote: I still don't understand why http://review.gluster.org/10109 is working. Does anyone know the reason? How are you guys re-creating the crash? I ran crypt.t but no crashes on my laptop. Could some one help me re-create this issue. The problem is 100% reproductible on NetBSD. Give it a try on nbslave75 which is disabled in jenkins right now. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] crypt xlator bug
On 04/01/2015 03:27 PM, Emmanuel Dreyfus wrote: Hi crypt.t was recently broken in NetBSD regression. The glusterfs returns a node with file type invalid to FUSE, and that breaks the test. After running a git bisect, I found the offending commit after which this behavior appeared: 8a2e2b88fc21dc7879f838d18cd0413dd88023b7 mem-pool: invalidate memory on GF_FREE to aid debugging This means the bug has always been there, but this debugging aid caused it to be reliable. With the help of an assertion, I can detect when inode-ia_type gets a corrupted value. It gives me this backtrace where in frame 4, inode = 0xb9611880 and inode-ia_type = 12475 (which is wrong). inode value comes from FUSE state-loc-inode and we get it from frame 20 which is in crypt.c: #4 0xb9bd2adf in mdc_inode_iatt_get (this=0xbb1df030, inode=0xb9611880, iatt=0xbf7fdfa0) at md-cache.c:471 #5 0xb9bd34e1 in mdc_lookup (frame=0xb9aa82b0, this=0xbb1df030, loc=0xb9608840, xdata=0x0) at md-cache.c:847 #6 0xb9bc216e in io_stats_lookup (frame=0xb9aa8200, this=0xbb1e0030, loc=0xb9608840, xdata=0x0) at io-stats.c:1934 #7 0xbb76755f in default_lookup (frame=0xb9aa8200, this=0xbb1d0030, loc=0xb9608840, xdata=0x0) at defaults.c:2138 #8 0xb9ba69cd in meta_lookup (frame=0xb9aa8200, this=0xbb1d0030, loc=0xb9608840, xdata=0x0) at meta.c:49 #9 0xbb277365 in fuse_lookup_resume (state=0xb9608830) at fuse-bridge.c:607 #10 0xbb276e07 in fuse_fop_resume (state=0xb9608830) at fuse-bridge.c:569 #11 0xbb274969 in fuse_resolve_done (state=0xb9608830) at fuse-resolve.c:644 #12 0xbb274a29 in fuse_resolve_all (state=0xb9608830) at fuse-resolve.c:671 #13 0xbb274941 in fuse_resolve (state=0xb9608830) at fuse-resolve.c:635 #14 0xbb274a06 in fuse_resolve_all (state=0xb9608830) at fuse-resolve.c:667 #15 0xbb274a8e in fuse_resolve_continue (state=0xb9608830) at fuse-resolve.c:687 #16 0xbb2731f4 in fuse_resolve_entry_cbk (frame=0xb9609688, cookie=0xb96140a0, this=0xbb193030, op_ret=0, op_errno=0, inode=0xb9611880, buf=0xb961e558, xattr=0xbb18a1a0, postparent=0xb961e628) at fuse-resolve.c:81 #17 0xb9bbd0c1 in io_stats_lookup_cbk (frame=0xb96140a0, cookie=0xb9614150, this=0xbb1e0030, op_ret=0, op_errno=0, inode=0xb9611880, buf=0xb961e558, xdata=0xbb18a1a0, postparent=0xb961e628) at io-stats.c:1512 #18 0xb9bd33ff in mdc_lookup_cbk (frame=0xb9614150, cookie=0xb9614410, this=0xbb1df030, op_ret=0, op_errno=0, inode=0xb9611880, stbuf=0xb961e558, dict=0xbb18a1a0, postparent=0xb961e628) at md-cache.c:816 #19 0xb9be2b10 in ioc_lookup_cbk (frame=0xb9614410, cookie=0xb96144c0, this=0xbb1de030, op_ret=0, op_errno=0, inode=0xb9611880, stbuf=0xb961e558, xdata=0xbb18a1a0, postparent=0xb961e628) at io-cache.c:260 #20 0xbb227fb5 in load_file_size (frame=0xb96144c0, cookie=0xb9aa8200, this=0xbb1db030, op_ret=0, op_errno=0, dict=0xbb18a470, xdata=0x0) at crypt.c:3830 In frame 20: case GF_FOP_LOOKUP: STACK_UNWIND_STRICT(lookup, frame, op_ret, op_errno, op_ret = 0 ? local-inode : NULL, op_ret = 0 ? local-buf : NULL, local-xdata, op_ret = 0 local-postbuf : NULL); Here is the problem, local-inode is not the 0xb9611880 value anymore, which means local got corrupted: (gdb) print local-inode $2 = (inode_t *) 0x1db030de I now suspect local has been freed, but I do not find where in crypt.c this operation is done. There is a local = mem_get0(this-local_pool) in crypt_alloc_local, but where is that structure freed? There is no mem_put() call in crypt xlator. I joined this thread after seeing raghavendra talur's patch which fixed the issue, which seemed extremely odd to me. Just checked this mail from you and local-inode in crypt need not be same as state-loc-inode because, inode_link in fuse_resolve_entry_cbk will give address of already linked inode with same gfid if one exists. I see hardlink related commands in crypt.t so this could be part of looking up extra link may be? which is resolving to older inode that is already linked. It is still some memory problem, but may not be anything to do with crypt. Could you let me know the details of the setup where you saw this issue? I can take a look. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] crypt xlator bug
On 04/02/2015 12:27 AM, Raghavendra Talur wrote: On Wed, Apr 1, 2015 at 10:34 PM, Justin Clift jus...@gluster.org mailto:jus...@gluster.org wrote: On 1 Apr 2015, at 10:57, Emmanuel Dreyfus m...@netbsd.org mailto:m...@netbsd.org wrote: Hi crypt.t was recently broken in NetBSD regression. The glusterfs returns a node with file type invalid to FUSE, and that breaks the test. After running a git bisect, I found the offending commit after which this behavior appeared: 8a2e2b88fc21dc7879f838d18cd0413dd88023b7 mem-pool: invalidate memory on GF_FREE to aid debugging This means the bug has always been there, but this debugging aid caused it to be reliable. Sounds like that commit is a good win then. :) Harsha/Pranith/Lala, your names are on the git blame for crypt.c... any ideas? :) I found one issue that local is not allocated using GF_CALLOC and with a mem-type. This is a patch which *might* fix it. diff --git a/xlators/encryption/crypt/src/crypt-mem-types.h b/xlators/encryption/crypt/src/crypt-mem-types.h index 2eab921..c417b67 100644 --- a/xlators/encryption/crypt/src/crypt-mem-types.h +++ b/xlators/encryption/crypt/src/crypt-mem-types.h @@ -24,6 +24,7 @@ enum gf_crypt_mem_types_ { gf_crypt_mt_key, gf_crypt_mt_iovec, gf_crypt_mt_char, +gf_crypt_mt_local, gf_crypt_mt_end, }; diff --git a/xlators/encryption/crypt/src/crypt.c b/xlators/encryption/crypt/src/crypt.c index ae8cdb2..63c0977 100644 --- a/xlators/encryption/crypt/src/crypt.c +++ b/xlators/encryption/crypt/src/crypt.c @@ -48,7 +48,7 @@ static crypt_local_t *crypt_alloc_local(call_frame_t *frame, xlator_t *this, { crypt_local_t *local = NULL; - local = mem_get0(this-local_pool); +local = GF_CALLOC (sizeof (*local), 1, gf_crypt_mt_local); local is using the memory from pool earlier(i.e. with mem_get0()). Which seems ok to me. Changing it this way will include memory allocation in fop I/O path which is why xlators generally use the mem-pool approach. Pranith if (!local) { gf_log(this-name, GF_LOG_ERROR, out of memory); return NULL; Niels should be able to recognize if this is sufficient fix or not. Thanks, Raghavendra Talur + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift http://twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org mailto:Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel -- *Raghavendra Talur * ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] crypt xlator bug
On Thu, Apr 02, 2015 at 01:58:39PM +0530, Raghavendra Bhat wrote: On Thursday 02 April 2015 01:00 PM, Pranith Kumar Karampuri wrote: On 04/02/2015 12:27 AM, Raghavendra Talur wrote: On Wed, Apr 1, 2015 at 10:34 PM, Justin Clift jus...@gluster.org mailto:jus...@gluster.org wrote: On 1 Apr 2015, at 10:57, Emmanuel Dreyfus m...@netbsd.org mailto:m...@netbsd.org wrote: Hi crypt.t was recently broken in NetBSD regression. The glusterfs returns a node with file type invalid to FUSE, and that breaks the test. After running a git bisect, I found the offending commit after which this behavior appeared: 8a2e2b88fc21dc7879f838d18cd0413dd88023b7 mem-pool: invalidate memory on GF_FREE to aid debugging This means the bug has always been there, but this debugging aid caused it to be reliable. Sounds like that commit is a good win then. :) Harsha/Pranith/Lala, your names are on the git blame for crypt.c... any ideas? :) I found one issue that local is not allocated using GF_CALLOC and with a mem-type. This is a patch which *might* fix it. diff --git a/xlators/encryption/crypt/src/crypt-mem-types.h b/xlators/encryption/crypt/src/crypt-mem-types.h index 2eab921..c417b67 100644 --- a/xlators/encryption/crypt/src/crypt-mem-types.h +++ b/xlators/encryption/crypt/src/crypt-mem-types.h @@ -24,6 +24,7 @@ enum gf_crypt_mem_types_ { gf_crypt_mt_key, gf_crypt_mt_iovec, gf_crypt_mt_char, +gf_crypt_mt_local, gf_crypt_mt_end, }; diff --git a/xlators/encryption/crypt/src/crypt.c b/xlators/encryption/crypt/src/crypt.c index ae8cdb2..63c0977 100644 --- a/xlators/encryption/crypt/src/crypt.c +++ b/xlators/encryption/crypt/src/crypt.c @@ -48,7 +48,7 @@ static crypt_local_t *crypt_alloc_local(call_frame_t *frame, xlator_t *this, { crypt_local_t *local = NULL; - local = mem_get0(this-local_pool); +local = GF_CALLOC (sizeof (*local), 1, gf_crypt_mt_local); local is using the memory from pool earlier(i.e. with mem_get0()). Which seems ok to me. Changing it this way will include memory allocation in fop I/O path which is why xlators generally use the mem-pool approach. Pranith I think, crypt xlator should do a mem_put of local after doing STACK_UNWIND like other xlators which also use mem_get for local (such as AFR). I am suspecting crypt not doing mem_put might be the reason for the bug mentioned. I've looked at this now too. The use of mem_get0() seems fine to me. But indeed, calling mem_put() is missing. Whenever the crypt_local_t should be released, mem_put() should get called, just like any GF_CALLOC/GF_FREE combinations. HTH, Niels Regards, Raghavendra Bat if (!local) { gf_log(this-name, GF_LOG_ERROR, out of memory); return NULL; Niels should be able to recognize if this is sufficient fix or not. Thanks, Raghavendra Talur + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift http://twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org mailto:Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel -- *Raghavendra Talur * ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel pgp8J6DUYue6M.pgp Description: PGP signature ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] crypt xlator bug
On Thursday 02 April 2015 01:00 PM, Pranith Kumar Karampuri wrote: On 04/02/2015 12:27 AM, Raghavendra Talur wrote: On Wed, Apr 1, 2015 at 10:34 PM, Justin Clift jus...@gluster.org mailto:jus...@gluster.org wrote: On 1 Apr 2015, at 10:57, Emmanuel Dreyfus m...@netbsd.org mailto:m...@netbsd.org wrote: Hi crypt.t was recently broken in NetBSD regression. The glusterfs returns a node with file type invalid to FUSE, and that breaks the test. After running a git bisect, I found the offending commit after which this behavior appeared: 8a2e2b88fc21dc7879f838d18cd0413dd88023b7 mem-pool: invalidate memory on GF_FREE to aid debugging This means the bug has always been there, but this debugging aid caused it to be reliable. Sounds like that commit is a good win then. :) Harsha/Pranith/Lala, your names are on the git blame for crypt.c... any ideas? :) I found one issue that local is not allocated using GF_CALLOC and with a mem-type. This is a patch which *might* fix it. diff --git a/xlators/encryption/crypt/src/crypt-mem-types.h b/xlators/encryption/crypt/src/crypt-mem-types.h index 2eab921..c417b67 100644 --- a/xlators/encryption/crypt/src/crypt-mem-types.h +++ b/xlators/encryption/crypt/src/crypt-mem-types.h @@ -24,6 +24,7 @@ enum gf_crypt_mem_types_ { gf_crypt_mt_key, gf_crypt_mt_iovec, gf_crypt_mt_char, +gf_crypt_mt_local, gf_crypt_mt_end, }; diff --git a/xlators/encryption/crypt/src/crypt.c b/xlators/encryption/crypt/src/crypt.c index ae8cdb2..63c0977 100644 --- a/xlators/encryption/crypt/src/crypt.c +++ b/xlators/encryption/crypt/src/crypt.c @@ -48,7 +48,7 @@ static crypt_local_t *crypt_alloc_local(call_frame_t *frame, xlator_t *this, { crypt_local_t *local = NULL; - local = mem_get0(this-local_pool); +local = GF_CALLOC (sizeof (*local), 1, gf_crypt_mt_local); local is using the memory from pool earlier(i.e. with mem_get0()). Which seems ok to me. Changing it this way will include memory allocation in fop I/O path which is why xlators generally use the mem-pool approach. Pranith I think, crypt xlator should do a mem_put of local after doing STACK_UNWIND like other xlators which also use mem_get for local (such as AFR). I am suspecting crypt not doing mem_put might be the reason for the bug mentioned. Regards, Raghavendra Bat if (!local) { gf_log(this-name, GF_LOG_ERROR, out of memory); return NULL; Niels should be able to recognize if this is sufficient fix or not. Thanks, Raghavendra Talur + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift http://twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org mailto:Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel -- *Raghavendra Talur * ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] crypt xlator bug
I think, crypt xlator should do a mem_put of local after doing STACK_UNWIND like other xlators which also use mem_get for local (such as AFR). I am suspecting crypt not doing mem_put might be the reason for the bug mentioned. My understanding was that mem_put should be called automatically from FRAME_DESTROY, which is itself called from STACK_DESTROY when the fop completes (e.g. at FUSE or GFAPI). On the other hand, I see that AFR and others call mem_put themselves, without zeroing the local pointer. In my (possibly no longer relevant) experience, freeing local myself without zeroing the pointer would lead to a double free, and I don't see why that's not the case here. What am I missing? ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] crypt xlator bug
On 04/02/2015 07:27 PM, Raghavendra Bhat wrote: On Thursday 02 April 2015 05:50 PM, Jeff Darcy wrote: I think, crypt xlator should do a mem_put of local after doing STACK_UNWIND like other xlators which also use mem_get for local (such as AFR). I am suspecting crypt not doing mem_put might be the reason for the bug mentioned. My understanding was that mem_put should be called automatically from FRAME_DESTROY, which is itself called from STACK_DESTROY when the fop completes (e.g. at FUSE or GFAPI). On the other hand, I see that AFR and others call mem_put themselves, without zeroing the local pointer. In my (possibly no longer relevant) experience, freeing local myself without zeroing the pointer would lead to a double free, and I don't see why that's not the case here. What am I missing? As per my understanding, the xlators which get local by mem_get should be doing below things in callback funtion just before unwinding: 1) save frame-local pointer (i.e. local = frame-local); 2) STACK_UNWIND 3) mem_put (local) After STACK_UNWIND and before mem_put any reference to fd or inode or dict that might be present in the local should be unrefed (also any allocated resources that are present in local should be freed). So mem_put is done at last. To avoid double free in FRAME_DESTROY, frame-local is set to NULL before doing STACK_UNWIND. I suspect not doing 1 of the above three operations (may be either 1st or 3rd) in crypt xlator might be the reason for the bug. I still don't understand why http://review.gluster.org/10109 is working. Does anyone know the reason? How are you guys re-creating the crash? I ran crypt.t but no crashes on my laptop. Could some one help me re-create this issue. Pranith Regards, Raghavendra Bhat ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] crypt xlator bug
On Thursday 02 April 2015 05:50 PM, Jeff Darcy wrote: I think, crypt xlator should do a mem_put of local after doing STACK_UNWIND like other xlators which also use mem_get for local (such as AFR). I am suspecting crypt not doing mem_put might be the reason for the bug mentioned. My understanding was that mem_put should be called automatically from FRAME_DESTROY, which is itself called from STACK_DESTROY when the fop completes (e.g. at FUSE or GFAPI). On the other hand, I see that AFR and others call mem_put themselves, without zeroing the local pointer. In my (possibly no longer relevant) experience, freeing local myself without zeroing the pointer would lead to a double free, and I don't see why that's not the case here. What am I missing? As per my understanding, the xlators which get local by mem_get should be doing below things in callback funtion just before unwinding: 1) save frame-local pointer (i.e. local = frame-local); 2) STACK_UNWIND 3) mem_put (local) After STACK_UNWIND and before mem_put any reference to fd or inode or dict that might be present in the local should be unrefed (also any allocated resources that are present in local should be freed). So mem_put is done at last. To avoid double free in FRAME_DESTROY, frame-local is set to NULL before doing STACK_UNWIND. I suspect not doing 1 of the above three operations (may be either 1st or 3rd) in crypt xlator might be the reason for the bug. Regards, Raghavendra Bhat ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] crypt xlator bug
Hi crypt.t was recently broken in NetBSD regression. The glusterfs returns a node with file type invalid to FUSE, and that breaks the test. After running a git bisect, I found the offending commit after which this behavior appeared: 8a2e2b88fc21dc7879f838d18cd0413dd88023b7 mem-pool: invalidate memory on GF_FREE to aid debugging This means the bug has always been there, but this debugging aid caused it to be reliable. With the help of an assertion, I can detect when inode-ia_type gets a corrupted value. It gives me this backtrace where in frame 4, inode = 0xb9611880 and inode-ia_type = 12475 (which is wrong). inode value comes from FUSE state-loc-inode and we get it from frame 20 which is in crypt.c: #4 0xb9bd2adf in mdc_inode_iatt_get (this=0xbb1df030, inode=0xb9611880, iatt=0xbf7fdfa0) at md-cache.c:471 #5 0xb9bd34e1 in mdc_lookup (frame=0xb9aa82b0, this=0xbb1df030, loc=0xb9608840, xdata=0x0) at md-cache.c:847 #6 0xb9bc216e in io_stats_lookup (frame=0xb9aa8200, this=0xbb1e0030, loc=0xb9608840, xdata=0x0) at io-stats.c:1934 #7 0xbb76755f in default_lookup (frame=0xb9aa8200, this=0xbb1d0030, loc=0xb9608840, xdata=0x0) at defaults.c:2138 #8 0xb9ba69cd in meta_lookup (frame=0xb9aa8200, this=0xbb1d0030, loc=0xb9608840, xdata=0x0) at meta.c:49 #9 0xbb277365 in fuse_lookup_resume (state=0xb9608830) at fuse-bridge.c:607 #10 0xbb276e07 in fuse_fop_resume (state=0xb9608830) at fuse-bridge.c:569 #11 0xbb274969 in fuse_resolve_done (state=0xb9608830) at fuse-resolve.c:644 #12 0xbb274a29 in fuse_resolve_all (state=0xb9608830) at fuse-resolve.c:671 #13 0xbb274941 in fuse_resolve (state=0xb9608830) at fuse-resolve.c:635 #14 0xbb274a06 in fuse_resolve_all (state=0xb9608830) at fuse-resolve.c:667 #15 0xbb274a8e in fuse_resolve_continue (state=0xb9608830) at fuse-resolve.c:687 #16 0xbb2731f4 in fuse_resolve_entry_cbk (frame=0xb9609688, cookie=0xb96140a0, this=0xbb193030, op_ret=0, op_errno=0, inode=0xb9611880, buf=0xb961e558, xattr=0xbb18a1a0, postparent=0xb961e628) at fuse-resolve.c:81 #17 0xb9bbd0c1 in io_stats_lookup_cbk (frame=0xb96140a0, cookie=0xb9614150, this=0xbb1e0030, op_ret=0, op_errno=0, inode=0xb9611880, buf=0xb961e558, xdata=0xbb18a1a0, postparent=0xb961e628) at io-stats.c:1512 #18 0xb9bd33ff in mdc_lookup_cbk (frame=0xb9614150, cookie=0xb9614410, this=0xbb1df030, op_ret=0, op_errno=0, inode=0xb9611880, stbuf=0xb961e558, dict=0xbb18a1a0, postparent=0xb961e628) at md-cache.c:816 #19 0xb9be2b10 in ioc_lookup_cbk (frame=0xb9614410, cookie=0xb96144c0, this=0xbb1de030, op_ret=0, op_errno=0, inode=0xb9611880, stbuf=0xb961e558, xdata=0xbb18a1a0, postparent=0xb961e628) at io-cache.c:260 #20 0xbb227fb5 in load_file_size (frame=0xb96144c0, cookie=0xb9aa8200, this=0xbb1db030, op_ret=0, op_errno=0, dict=0xbb18a470, xdata=0x0) at crypt.c:3830 In frame 20: case GF_FOP_LOOKUP: STACK_UNWIND_STRICT(lookup, frame, op_ret, op_errno, op_ret = 0 ? local-inode : NULL, op_ret = 0 ? local-buf : NULL, local-xdata, op_ret = 0 local-postbuf : NULL); Here is the problem, local-inode is not the 0xb9611880 value anymore, which means local got corrupted: (gdb) print local-inode $2 = (inode_t *) 0x1db030de I now suspect local has been freed, but I do not find where in crypt.c this operation is done. There is a local = mem_get0(this-local_pool) in crypt_alloc_local, but where is that structure freed? There is no mem_put() call in crypt xlator. -- Emmanuel Dreyfus m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] crypt xlator bug
Raghavendra Talur raghavendra.ta...@gmail.com wrote: I found one issue that local is not allocated using GF_CALLOC and with a mem-type. This is a patch which *might* fix it. It does. The memory corruption disapeared and the test can complete. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] crypt xlator bug
I found one issue that local is not allocated using GF_CALLOC and with a mem-type. This is a patch which *might* fix it. It does. The memory corruption disapeared and the test can complete. Interesting. I suspect this means that we *are* in the case where the previous comment came from. Mem_get can allocate objects two ways: * As one of many objects in a slab, tracking internally. * As a singleton, directly via GF_*ALLOC. In mem_put, we do some pretty nasty pointer arithmetic to figure out which way an object was allocated. If we get it wrong, and therefore use the wrong *de*allocate method (either way I believe) then we'll corrupt memory. The symptoms so far suggest that an object was allocated within a slab, then deallocated as a singleton (causing its memory to be poisoned). That sucks. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] crypt xlator bug
Jeff Darcy jda...@redhat.com wrote: It does. The memory corruption disapeared and the test can complete. Interesting. FWIW I made 147 successful runs in a row with the patch, while it always failed without. -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] crypt xlator bug
On 1 Apr 2015, at 10:57, Emmanuel Dreyfus m...@netbsd.org wrote: Hi crypt.t was recently broken in NetBSD regression. The glusterfs returns a node with file type invalid to FUSE, and that breaks the test. After running a git bisect, I found the offending commit after which this behavior appeared: 8a2e2b88fc21dc7879f838d18cd0413dd88023b7 mem-pool: invalidate memory on GF_FREE to aid debugging This means the bug has always been there, but this debugging aid caused it to be reliable. Sounds like that commit is a good win then. :) Harsha/Pranith/Lala, your names are on the git blame for crypt.c... any ideas? :) + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] crypt xlator bug
On Wed, Apr 1, 2015 at 10:34 PM, Justin Clift jus...@gluster.org wrote: On 1 Apr 2015, at 10:57, Emmanuel Dreyfus m...@netbsd.org wrote: Hi crypt.t was recently broken in NetBSD regression. The glusterfs returns a node with file type invalid to FUSE, and that breaks the test. After running a git bisect, I found the offending commit after which this behavior appeared: 8a2e2b88fc21dc7879f838d18cd0413dd88023b7 mem-pool: invalidate memory on GF_FREE to aid debugging This means the bug has always been there, but this debugging aid caused it to be reliable. Sounds like that commit is a good win then. :) Harsha/Pranith/Lala, your names are on the git blame for crypt.c... any ideas? :) I found one issue that local is not allocated using GF_CALLOC and with a mem-type. This is a patch which *might* fix it. diff --git a/xlators/encryption/crypt/src/crypt-mem-types.h b/xlators/encryption/crypt/src/crypt-mem-types.h index 2eab921..c417b67 100644 --- a/xlators/encryption/crypt/src/crypt-mem-types.h +++ b/xlators/encryption/crypt/src/crypt-mem-types.h @@ -24,6 +24,7 @@ enum gf_crypt_mem_types_ { gf_crypt_mt_key, gf_crypt_mt_iovec, gf_crypt_mt_char, +gf_crypt_mt_local, gf_crypt_mt_end, }; diff --git a/xlators/encryption/crypt/src/crypt.c b/xlators/encryption/crypt/src/crypt.c index ae8cdb2..63c0977 100644 --- a/xlators/encryption/crypt/src/crypt.c +++ b/xlators/encryption/crypt/src/crypt.c @@ -48,7 +48,7 @@ static crypt_local_t *crypt_alloc_local(call_frame_t *frame, xlator_t *this, { crypt_local_t *local = NULL; - local = mem_get0(this-local_pool); +local = GF_CALLOC (sizeof (*local), 1, gf_crypt_mt_local); if (!local) { gf_log(this-name, GF_LOG_ERROR, out of memory); return NULL; Niels should be able to recognize if this is sufficient fix or not. Thanks, Raghavendra Talur + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel -- *Raghavendra Talur * ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] crypt xlator bug
- local = mem_get0(this-local_pool); + local = GF_CALLOC (sizeof (*local), 1, gf_crypt_mt_local); As I understand it, mem_get0 is a valid (and even more efficient) way to allocate such objects. The frame cleanup code should recognize which method to use when deallocating. If that's broken, we're going to have more numerous and serious problems than this. I'll look into it further. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] crypt xlator bug
As I understand it, mem_get0 is a valid (and even more efficient) way to allocate such objects. The frame cleanup code should recognize which method to use when deallocating. If that's broken, we're going to have more numerous and serious problems than this. I'll look into it further. I don't see anything obviously wrong, but I did find this gem of a comment in mem_get: * I am working around this by performing a regular allocation * , just the way the caller would've done when not using the * mem-pool. That also means, we're not padding the size with * the list_head structure because, this will not be added to * the list of chunks that belong to the mem-pool allocated * initially. * * This is the best we can do without adding functionality for * managing multiple slabs. That does not interest us at present * because it is too much work knowing that a better slab * allocator is coming RSN. Now I'm curious to find out what effect your change will have, but I suspect we'll still be a while figuring this out. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel