Re: [Gluster-devel] gfid and volume-id extended attributes lost

2017-07-07 Thread Pranith Kumar Karampuri
Ram,
   As per the code, self-heal is the only candidate which *can* do it.
Could you check the logs of the self-heal daemon and the mount to see whether
there are any metadata heals on root?
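
(For example, something along these lines on the affected nodes should show
it; the log locations are the defaults and the mount log name is only a
guess, so adjust as needed. The root directory's gfid is always
00000000-0000-0000-0000-000000000001, so any heal of / should reference it:)

# glustershd log plus the fuse mount log (named after the mount point)
grep -i heal /var/log/glusterfs/glustershd.log /var/log/glusterfs/*mnt*.log \
    | grep -i 00000000-0000-0000-0000-000000000001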


+Sanoj

Sanoj,
   Is there any systemtap script we can use to detect which process is
removing these xattrs?
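
(Something along these lines might be a starting point. It is only an
untested sketch: the probed syscalls and the tapset variables path/name_str
are assumptions, to be verified first with `stap -L 'syscall.removexattr'`.)

# Log every process that removes a gfid/volume-id xattr (sketch, untested)
stap -e '
probe syscall.removexattr, syscall.lremovexattr {
    if (isinstr(name_str, "trusted.gfid") ||
        isinstr(name_str, "trusted.glusterfs.volume-id"))
        printf("%s pid=%d comm=%s removes %s from %s\n",
               ctime(gettimeofday_s()), pid(), execname(), name_str, path)
}'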

On Sat, Jul 8, 2017 at 2:58 AM, Ankireddypalle Reddy 
wrote:

> We lost the attributes on all the bricks on servers glusterfs2 and
> glusterfs3 again.
>
>
>
> [root@glusterfs2 Log_Files]# gluster volume info
>
>
>
> Volume Name: StoragePool
>
> Type: Distributed-Disperse
>
> Volume ID: 149e976f-4e21-451c-bf0f-f5691208531f
>
> Status: Started
>
> Number of Bricks: 20 x (2 + 1) = 60
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: glusterfs1sds:/ws/disk1/ws_brick
>
> Brick2: glusterfs2sds:/ws/disk1/ws_brick
>
> Brick3: glusterfs3sds:/ws/disk1/ws_brick
>
> Brick4: glusterfs1sds:/ws/disk2/ws_brick
>
> Brick5: glusterfs2sds:/ws/disk2/ws_brick
>
> Brick6: glusterfs3sds:/ws/disk2/ws_brick
>
> Brick7: glusterfs1sds:/ws/disk3/ws_brick
>
> Brick8: glusterfs2sds:/ws/disk3/ws_brick
>
> Brick9: glusterfs3sds:/ws/disk3/ws_brick
>
> Brick10: glusterfs1sds:/ws/disk4/ws_brick
>
> Brick11: glusterfs2sds:/ws/disk4/ws_brick
>
> Brick12: glusterfs3sds:/ws/disk4/ws_brick
>
> Brick13: glusterfs1sds:/ws/disk5/ws_brick
>
> Brick14: glusterfs2sds:/ws/disk5/ws_brick
>
> Brick15: glusterfs3sds:/ws/disk5/ws_brick
>
> Brick16: glusterfs1sds:/ws/disk6/ws_brick
>
> Brick17: glusterfs2sds:/ws/disk6/ws_brick
>
> Brick18: glusterfs3sds:/ws/disk6/ws_brick
>
> Brick19: glusterfs1sds:/ws/disk7/ws_brick
>
> Brick20: glusterfs2sds:/ws/disk7/ws_brick
>
> Brick21: glusterfs3sds:/ws/disk7/ws_brick
>
> Brick22: glusterfs1sds:/ws/disk8/ws_brick
>
> Brick23: glusterfs2sds:/ws/disk8/ws_brick
>
> Brick24: glusterfs3sds:/ws/disk8/ws_brick
>
> Brick25: glusterfs4sds.commvault.com:/ws/disk1/ws_brick
>
> Brick26: glusterfs5sds.commvault.com:/ws/disk1/ws_brick
>
> Brick27: glusterfs6sds.commvault.com:/ws/disk1/ws_brick
>
> Brick28: glusterfs4sds.commvault.com:/ws/disk10/ws_brick
>
> Brick29: glusterfs5sds.commvault.com:/ws/disk10/ws_brick
>
> Brick30: glusterfs6sds.commvault.com:/ws/disk10/ws_brick
>
> Brick31: glusterfs4sds.commvault.com:/ws/disk11/ws_brick
>
> Brick32: glusterfs5sds.commvault.com:/ws/disk11/ws_brick
>
> Brick33: glusterfs6sds.commvault.com:/ws/disk11/ws_brick
>
> Brick34: glusterfs4sds.commvault.com:/ws/disk12/ws_brick
>
> Brick35: glusterfs5sds.commvault.com:/ws/disk12/ws_brick
>
> Brick36: glusterfs6sds.commvault.com:/ws/disk12/ws_brick
>
> Brick37: glusterfs4sds.commvault.com:/ws/disk2/ws_brick
>
> Brick38: glusterfs5sds.commvault.com:/ws/disk2/ws_brick
>
> Brick39: glusterfs6sds.commvault.com:/ws/disk2/ws_brick
>
> Brick40: glusterfs4sds.commvault.com:/ws/disk3/ws_brick
>
> Brick41: glusterfs5sds.commvault.com:/ws/disk3/ws_brick
>
> Brick42: glusterfs6sds.commvault.com:/ws/disk3/ws_brick
>
> Brick43: glusterfs4sds.commvault.com:/ws/disk4/ws_brick
>
> Brick44: glusterfs5sds.commvault.com:/ws/disk4/ws_brick
>
> Brick45: glusterfs6sds.commvault.com:/ws/disk4/ws_brick
>
> Brick46: glusterfs4sds.commvault.com:/ws/disk5/ws_brick
>
> Brick47: glusterfs5sds.commvault.com:/ws/disk5/ws_brick
>
> Brick48: glusterfs6sds.commvault.com:/ws/disk5/ws_brick
>
> Brick49: glusterfs4sds.commvault.com:/ws/disk6/ws_brick
>
> Brick50: glusterfs5sds.commvault.com:/ws/disk6/ws_brick
>
> Brick51: glusterfs6sds.commvault.com:/ws/disk6/ws_brick
>
> Brick52: glusterfs4sds.commvault.com:/ws/disk7/ws_brick
>
> Brick53: glusterfs5sds.commvault.com:/ws/disk7/ws_brick
>
> Brick54: glusterfs6sds.commvault.com:/ws/disk7/ws_brick
>
> Brick55: glusterfs4sds.commvault.com:/ws/disk8/ws_brick
>
> Brick56: glusterfs5sds.commvault.com:/ws/disk8/ws_brick
>
> Brick57: glusterfs6sds.commvault.com:/ws/disk8/ws_brick
>
> Brick58: glusterfs4sds.commvault.com:/ws/disk9/ws_brick
>
> Brick59: glusterfs5sds.commvault.com:/ws/disk9/ws_brick
>
> Brick60: glusterfs6sds.commvault.com:/ws/disk9/ws_brick
>
> Options Reconfigured:
>
> performance.readdir-ahead: on
>
> diagnostics.client-log-level: INFO
>
> auth.allow: glusterfs1sds,glusterfs2sds,glusterfs3sds,glusterfs4sds.
> commvault.com,glusterfs5sds.commvault.com,glusterfs6sds.commvault.com
>
>
>
> Thanks and Regards,
>
> Ram
>
> *From:* Pranith Kumar Karampuri [mailto:pkara...@redhat.com]
> *Sent:* Friday, July 07, 2017 12:15 PM
>
> *To:* Ankireddypalle Reddy
> *Cc:* Gluster Devel (gluster-devel@gluster.org); gluster-us...@gluster.org
> *Subject:* Re: [Gluster-devel] gfid and volume-id extended attributes lost
>
>
>
>
>
>
>
> On Fri, Jul 7, 2017 at 9:25 PM, Ankireddypalle Reddy 
> wrote:
>
> 3.7.19
>
>
>
> These are the only callers for removexattr, and only _posix_remove_xattr
> has the potential to do removexattr, as posix_removexattr already makes sure
> that it is not gfid/volume-id. And, surprise surprise, _posix_remove_xattr
> happens only from the healing code of afr/ec. And this can only happen if the
> source brick doesn't have the gfid, which doesn't seem to match the
> situation you explained.

Re: [Gluster-devel] [Gluster-users] gfid and volume-id extended attributes lost

2017-07-07 Thread Vijay Bellur
Do you observe any event pattern (self-healing / disk failures / reboots
etc.) after which the extended attributes are missing?

Regards,
Vijay

On Fri, Jul 7, 2017 at 5:28 PM, Ankireddypalle Reddy 
wrote:

> We lost the attributes on all the bricks on servers glusterfs2 and
> glusterfs3 again.
>
>
>
> [root@glusterfs2 Log_Files]# gluster volume info
>
>
>
> Volume Name: StoragePool
>
> Type: Distributed-Disperse
>
> Volume ID: 149e976f-4e21-451c-bf0f-f5691208531f
>
> Status: Started
>
> Number of Bricks: 20 x (2 + 1) = 60
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: glusterfs1sds:/ws/disk1/ws_brick
>
> Brick2: glusterfs2sds:/ws/disk1/ws_brick
>
> Brick3: glusterfs3sds:/ws/disk1/ws_brick
>
> Brick4: glusterfs1sds:/ws/disk2/ws_brick
>
> Brick5: glusterfs2sds:/ws/disk2/ws_brick
>
> Brick6: glusterfs3sds:/ws/disk2/ws_brick
>
> Brick7: glusterfs1sds:/ws/disk3/ws_brick
>
> Brick8: glusterfs2sds:/ws/disk3/ws_brick
>
> Brick9: glusterfs3sds:/ws/disk3/ws_brick
>
> Brick10: glusterfs1sds:/ws/disk4/ws_brick
>
> Brick11: glusterfs2sds:/ws/disk4/ws_brick
>
> Brick12: glusterfs3sds:/ws/disk4/ws_brick
>
> Brick13: glusterfs1sds:/ws/disk5/ws_brick
>
> Brick14: glusterfs2sds:/ws/disk5/ws_brick
>
> Brick15: glusterfs3sds:/ws/disk5/ws_brick
>
> Brick16: glusterfs1sds:/ws/disk6/ws_brick
>
> Brick17: glusterfs2sds:/ws/disk6/ws_brick
>
> Brick18: glusterfs3sds:/ws/disk6/ws_brick
>
> Brick19: glusterfs1sds:/ws/disk7/ws_brick
>
> Brick20: glusterfs2sds:/ws/disk7/ws_brick
>
> Brick21: glusterfs3sds:/ws/disk7/ws_brick
>
> Brick22: glusterfs1sds:/ws/disk8/ws_brick
>
> Brick23: glusterfs2sds:/ws/disk8/ws_brick
>
> Brick24: glusterfs3sds:/ws/disk8/ws_brick
>
> Brick25: glusterfs4sds.commvault.com:/ws/disk1/ws_brick
>
> Brick26: glusterfs5sds.commvault.com:/ws/disk1/ws_brick
>
> Brick27: glusterfs6sds.commvault.com:/ws/disk1/ws_brick
>
> Brick28: glusterfs4sds.commvault.com:/ws/disk10/ws_brick
>
> Brick29: glusterfs5sds.commvault.com:/ws/disk10/ws_brick
>
> Brick30: glusterfs6sds.commvault.com:/ws/disk10/ws_brick
>
> Brick31: glusterfs4sds.commvault.com:/ws/disk11/ws_brick
>
> Brick32: glusterfs5sds.commvault.com:/ws/disk11/ws_brick
>
> Brick33: glusterfs6sds.commvault.com:/ws/disk11/ws_brick
>
> Brick34: glusterfs4sds.commvault.com:/ws/disk12/ws_brick
>
> Brick35: glusterfs5sds.commvault.com:/ws/disk12/ws_brick
>
> Brick36: glusterfs6sds.commvault.com:/ws/disk12/ws_brick
>
> Brick37: glusterfs4sds.commvault.com:/ws/disk2/ws_brick
>
> Brick38: glusterfs5sds.commvault.com:/ws/disk2/ws_brick
>
> Brick39: glusterfs6sds.commvault.com:/ws/disk2/ws_brick
>
> Brick40: glusterfs4sds.commvault.com:/ws/disk3/ws_brick
>
> Brick41: glusterfs5sds.commvault.com:/ws/disk3/ws_brick
>
> Brick42: glusterfs6sds.commvault.com:/ws/disk3/ws_brick
>
> Brick43: glusterfs4sds.commvault.com:/ws/disk4/ws_brick
>
> Brick44: glusterfs5sds.commvault.com:/ws/disk4/ws_brick
>
> Brick45: glusterfs6sds.commvault.com:/ws/disk4/ws_brick
>
> Brick46: glusterfs4sds.commvault.com:/ws/disk5/ws_brick
>
> Brick47: glusterfs5sds.commvault.com:/ws/disk5/ws_brick
>
> Brick48: glusterfs6sds.commvault.com:/ws/disk5/ws_brick
>
> Brick49: glusterfs4sds.commvault.com:/ws/disk6/ws_brick
>
> Brick50: glusterfs5sds.commvault.com:/ws/disk6/ws_brick
>
> Brick51: glusterfs6sds.commvault.com:/ws/disk6/ws_brick
>
> Brick52: glusterfs4sds.commvault.com:/ws/disk7/ws_brick
>
> Brick53: glusterfs5sds.commvault.com:/ws/disk7/ws_brick
>
> Brick54: glusterfs6sds.commvault.com:/ws/disk7/ws_brick
>
> Brick55: glusterfs4sds.commvault.com:/ws/disk8/ws_brick
>
> Brick56: glusterfs5sds.commvault.com:/ws/disk8/ws_brick
>
> Brick57: glusterfs6sds.commvault.com:/ws/disk8/ws_brick
>
> Brick58: glusterfs4sds.commvault.com:/ws/disk9/ws_brick
>
> Brick59: glusterfs5sds.commvault.com:/ws/disk9/ws_brick
>
> Brick60: glusterfs6sds.commvault.com:/ws/disk9/ws_brick
>
> Options Reconfigured:
>
> performance.readdir-ahead: on
>
> diagnostics.client-log-level: INFO
>
> auth.allow: glusterfs1sds,glusterfs2sds,glusterfs3sds,glusterfs4sds.
> commvault.com,glusterfs5sds.commvault.com,glusterfs6sds.commvault.com
>
>
>
> Thanks and Regards,
>
> Ram
>
> *From:* Pranith Kumar Karampuri [mailto:pkara...@redhat.com]
> *Sent:* Friday, July 07, 2017 12:15 PM
>
> *To:* Ankireddypalle Reddy
> *Cc:* Gluster Devel (gluster-devel@gluster.org); gluster-us...@gluster.org
> *Subject:* Re: [Gluster-devel] gfid and volume-id extended attributes lost
>
>
>
>
>
>
>
> On Fri, Jul 7, 2017 at 9:25 PM, Ankireddypalle Reddy 
> wrote:
>
> 3.7.19
>
>
>
> These are the only callers for removexattr and only _posix_remove_xattr
> has the potential to do removexattr as posix_removexattr already makes sure
> that it is not gfid/volume-id. And surprise surprise _posix_remove_xattr
> happens only from healing code of afr/ec. And this can only happen if the
> source brick doesn't have gfid, which doesn't seem to match with the
> situation you explained.
>
>#   line  filename / context / line
>1   1234  

Re: [Gluster-devel] gfid and volume-id extended attributes lost

2017-07-07 Thread Ankireddypalle Reddy
We lost the attributes on all the bricks on servers glusterfs2 and glusterfs3 
again.

[root@glusterfs2 Log_Files]# gluster volume info

Volume Name: StoragePool
Type: Distributed-Disperse
Volume ID: 149e976f-4e21-451c-bf0f-f5691208531f
Status: Started
Number of Bricks: 20 x (2 + 1) = 60
Transport-type: tcp
Bricks:
Brick1: glusterfs1sds:/ws/disk1/ws_brick
Brick2: glusterfs2sds:/ws/disk1/ws_brick
Brick3: glusterfs3sds:/ws/disk1/ws_brick
Brick4: glusterfs1sds:/ws/disk2/ws_brick
Brick5: glusterfs2sds:/ws/disk2/ws_brick
Brick6: glusterfs3sds:/ws/disk2/ws_brick
Brick7: glusterfs1sds:/ws/disk3/ws_brick
Brick8: glusterfs2sds:/ws/disk3/ws_brick
Brick9: glusterfs3sds:/ws/disk3/ws_brick
Brick10: glusterfs1sds:/ws/disk4/ws_brick
Brick11: glusterfs2sds:/ws/disk4/ws_brick
Brick12: glusterfs3sds:/ws/disk4/ws_brick
Brick13: glusterfs1sds:/ws/disk5/ws_brick
Brick14: glusterfs2sds:/ws/disk5/ws_brick
Brick15: glusterfs3sds:/ws/disk5/ws_brick
Brick16: glusterfs1sds:/ws/disk6/ws_brick
Brick17: glusterfs2sds:/ws/disk6/ws_brick
Brick18: glusterfs3sds:/ws/disk6/ws_brick
Brick19: glusterfs1sds:/ws/disk7/ws_brick
Brick20: glusterfs2sds:/ws/disk7/ws_brick
Brick21: glusterfs3sds:/ws/disk7/ws_brick
Brick22: glusterfs1sds:/ws/disk8/ws_brick
Brick23: glusterfs2sds:/ws/disk8/ws_brick
Brick24: glusterfs3sds:/ws/disk8/ws_brick
Brick25: glusterfs4sds.commvault.com:/ws/disk1/ws_brick
Brick26: glusterfs5sds.commvault.com:/ws/disk1/ws_brick
Brick27: glusterfs6sds.commvault.com:/ws/disk1/ws_brick
Brick28: glusterfs4sds.commvault.com:/ws/disk10/ws_brick
Brick29: glusterfs5sds.commvault.com:/ws/disk10/ws_brick
Brick30: glusterfs6sds.commvault.com:/ws/disk10/ws_brick
Brick31: glusterfs4sds.commvault.com:/ws/disk11/ws_brick
Brick32: glusterfs5sds.commvault.com:/ws/disk11/ws_brick
Brick33: glusterfs6sds.commvault.com:/ws/disk11/ws_brick
Brick34: glusterfs4sds.commvault.com:/ws/disk12/ws_brick
Brick35: glusterfs5sds.commvault.com:/ws/disk12/ws_brick
Brick36: glusterfs6sds.commvault.com:/ws/disk12/ws_brick
Brick37: glusterfs4sds.commvault.com:/ws/disk2/ws_brick
Brick38: glusterfs5sds.commvault.com:/ws/disk2/ws_brick
Brick39: glusterfs6sds.commvault.com:/ws/disk2/ws_brick
Brick40: glusterfs4sds.commvault.com:/ws/disk3/ws_brick
Brick41: glusterfs5sds.commvault.com:/ws/disk3/ws_brick
Brick42: glusterfs6sds.commvault.com:/ws/disk3/ws_brick
Brick43: glusterfs4sds.commvault.com:/ws/disk4/ws_brick
Brick44: glusterfs5sds.commvault.com:/ws/disk4/ws_brick
Brick45: glusterfs6sds.commvault.com:/ws/disk4/ws_brick
Brick46: glusterfs4sds.commvault.com:/ws/disk5/ws_brick
Brick47: glusterfs5sds.commvault.com:/ws/disk5/ws_brick
Brick48: glusterfs6sds.commvault.com:/ws/disk5/ws_brick
Brick49: glusterfs4sds.commvault.com:/ws/disk6/ws_brick
Brick50: glusterfs5sds.commvault.com:/ws/disk6/ws_brick
Brick51: glusterfs6sds.commvault.com:/ws/disk6/ws_brick
Brick52: glusterfs4sds.commvault.com:/ws/disk7/ws_brick
Brick53: glusterfs5sds.commvault.com:/ws/disk7/ws_brick
Brick54: glusterfs6sds.commvault.com:/ws/disk7/ws_brick
Brick55: glusterfs4sds.commvault.com:/ws/disk8/ws_brick
Brick56: glusterfs5sds.commvault.com:/ws/disk8/ws_brick
Brick57: glusterfs6sds.commvault.com:/ws/disk8/ws_brick
Brick58: glusterfs4sds.commvault.com:/ws/disk9/ws_brick
Brick59: glusterfs5sds.commvault.com:/ws/disk9/ws_brick
Brick60: glusterfs6sds.commvault.com:/ws/disk9/ws_brick
Options Reconfigured:
performance.readdir-ahead: on
diagnostics.client-log-level: INFO
auth.allow: 
glusterfs1sds,glusterfs2sds,glusterfs3sds,glusterfs4sds.commvault.com,glusterfs5sds.commvault.com,glusterfs6sds.commvault.com

Thanks and Regards,
Ram
From: Pranith Kumar Karampuri [mailto:pkara...@redhat.com]
Sent: Friday, July 07, 2017 12:15 PM
To: Ankireddypalle Reddy
Cc: Gluster Devel (gluster-devel@gluster.org); gluster-us...@gluster.org
Subject: Re: [Gluster-devel] gfid and volume-id extended attributes lost



On Fri, Jul 7, 2017 at 9:25 PM, Ankireddypalle Reddy 
> wrote:
3.7.19

These are the only callers for removexattr and only _posix_remove_xattr has the 
potential to do removexattr as posix_removexattr already makes sure that it is 
not gfid/volume-id. And surprise surprise _posix_remove_xattr happens only from 
healing code of afr/ec. And this can only happen if the source brick doesn't 
have gfid, which doesn't seem to match with the situation you explained.

   #   line  filename / context / line
   1   1234  xlators/mgmt/glusterd/src/glusterd-quota.c 
<>
 ret = sys_lremovexattr (abspath, QUOTA_LIMIT_KEY);
   2   1243  xlators/mgmt/glusterd/src/glusterd-quota.c 
<>
 ret = sys_lremovexattr (abspath, QUOTA_LIMIT_OBJECTS_KEY);
   3   6102  xlators/mgmt/glusterd/src/glusterd-utils.c 
<>
 sys_lremovexattr (path, "trusted.glusterfs.test");
   4 80  xlators/storage/posix/src/posix-handle.h <>
 op_ret = sys_lremovexattr (path, key); \
   5   5026  xlators/storage/posix/src/posix.c <<_posix_remove_xattr>>
 op_ret = sys_lremovexattr (filler->real_path, key);

Re: [Gluster-devel] gfid and volume-id extended attributes lost

2017-07-07 Thread Pranith Kumar Karampuri
On Fri, Jul 7, 2017 at 9:25 PM, Ankireddypalle Reddy 
wrote:

> 3.7.19
>

These are the only callers for removexattr and only _posix_remove_xattr has
the potential to do removexattr as posix_removexattr already makes sure
that it is not gfid/volume-id. And surprise surprise _posix_remove_xattr
happens only from healing code of afr/ec. And this can only happen if the
source brick doesn't have gfid, which doesn't seem to match with the
situation you explained.

   #   line  filename / context / line
   1   1234  xlators/mgmt/glusterd/src/glusterd-quota.c
<>
 ret = sys_lremovexattr (abspath, QUOTA_LIMIT_KEY);
   2   1243  xlators/mgmt/glusterd/src/glusterd-quota.c
<>
 ret = sys_lremovexattr (abspath, QUOTA_LIMIT_OBJECTS_KEY);
   3   6102  xlators/mgmt/glusterd/src/glusterd-utils.c
<>
 sys_lremovexattr (path, "trusted.glusterfs.test");
   4 80  xlators/storage/posix/src/posix-handle.h <>
 op_ret = sys_lremovexattr (path, key); \
   5   5026  xlators/storage/posix/src/posix.c <<_posix_remove_xattr>>
 op_ret = sys_lremovexattr (filler->real_path, key);
   6   5101  xlators/storage/posix/src/posix.c <>
 op_ret = sys_lremovexattr (real_path, name);
   7   6811  xlators/storage/posix/src/posix.c <>
 sys_lremovexattr (dir_data->data, "trusted.glusterfs.test");

So there are only two possibilities:
1) Source directory in ec/afr doesn't have gfid
2) Something else removed these xattrs.

What is your volume info? Maybe that will give more clues.

 PS: sys_fremovexattr is called only from posix_fremovexattr(), so that
doesn't seem to be the culprit either, as it also has checks to guard against
gfid/volume-id removal.


>
> Thanks and Regards,
>
> Ram
>
> *From:* Pranith Kumar Karampuri [mailto:pkara...@redhat.com]
> *Sent:* Friday, July 07, 2017 11:54 AM
>
> *To:* Ankireddypalle Reddy
> *Cc:* Gluster Devel (gluster-devel@gluster.org); gluster-us...@gluster.org
> *Subject:* Re: [Gluster-devel] gfid and volume-id extended attributes lost
>
>
>
>
>
>
>
> On Fri, Jul 7, 2017 at 9:20 PM, Ankireddypalle Reddy 
> wrote:
>
> Pranith,
>
>  Thanks for looking into the issue. The bricks were
> mounted after the reboot. One more thing that I noticed: when the
> attributes were set manually while glusterd was up, they were lost again
> on starting the volume. We had to stop glusterd, set the attributes,
> and then start glusterd. After that the volume start succeeded.
>
>
>
> Which version is this?
>
>
>
>
>
> Thanks and Regards,
>
> Ram
>
>
>
> *From:* Pranith Kumar Karampuri [mailto:pkara...@redhat.com]
> *Sent:* Friday, July 07, 2017 11:46 AM
> *To:* Ankireddypalle Reddy
> *Cc:* Gluster Devel (gluster-devel@gluster.org); gluster-us...@gluster.org
> *Subject:* Re: [Gluster-devel] gfid and volume-id extended attributes lost
>
>
>
> Did anything special happen on these two bricks? It can't happen in the
> I/O path:
> posix_removexattr() has:
>   0 if (!strcmp (GFID_XATTR_KEY, name))
> {
>
>
>   1 gf_msg (this->name, GF_LOG_WARNING, 0,
> P_MSG_XATTR_NOT_REMOVED,
>   2 "Remove xattr called on gfid for file %s",
> real_path);
>   3 op_ret = -1;
>
>   4 goto out;
>
>   5 }
>
>   6 if (!strcmp (GF_XATTR_VOL_ID_KEY, name))
> {
>   7 gf_msg (this->name, GF_LOG_WARNING, 0,
> P_MSG_XATTR_NOT_REMOVED,
>   8 "Remove xattr called on volume-id for file
> %s",
>   9 real_path);
>
>  10 op_ret = -1;
>
>  11 goto out;
>
>  12 }
>
> I just found that op_errno is not set correctly, but it can't happen in
> the I/O path, so self-heal/rebalance are off the hook.
>
> I also grepped for any removexattr of trusted.gfid from glusterd and
> didn't find any.
>
> So one thing that used to happen was that sometimes when machines reboot,
> the brick mounts wouldn't happen and this would lead to absence of both
> trusted.gfid and volume-id. So at the moment this is my wild guess.
>
>
>
>
>
> On Fri, Jul 7, 2017 at 8:39 PM, Ankireddypalle Reddy 
> wrote:
>
> Hi,
>
>We faced an issue in the production today. We had to stop the
> volume and reboot all the servers in the cluster.  Once the servers
> rebooted starting of the volume failed because the following extended
> attributes were not present on all the bricks on 2 servers.
>
> 1)  trusted.gfid
>
> 2)  trusted.glusterfs.volume-id
>
>
>
> We had to manually set these extended attributes to start the volume.  Are
> there any such known issues.
>
>
>
> Thanks and Regards,
>
> Ram
>
> ***Legal Disclaimer***
>
> "This communication may contain confidential and privileged material for
> the
>
> sole use of the intended recipient. Any unauthorized review, use or
> distribution
>
> by 

Re: [Gluster-devel] gfid and volume-id extended attributes lost

2017-07-07 Thread Ankireddypalle Reddy
3.7.19

Thanks and Regards,
Ram
From: Pranith Kumar Karampuri [mailto:pkara...@redhat.com]
Sent: Friday, July 07, 2017 11:54 AM
To: Ankireddypalle Reddy
Cc: Gluster Devel (gluster-devel@gluster.org); gluster-us...@gluster.org
Subject: Re: [Gluster-devel] gfid and volume-id extended attributes lost



On Fri, Jul 7, 2017 at 9:20 PM, Ankireddypalle Reddy 
> wrote:
Pranith,
 Thanks for looking into the issue. The bricks were mounted
after the reboot. One more thing that I noticed: when the attributes were
set manually while glusterd was up, they were lost again on starting the
volume. We had to stop glusterd, set the attributes, and then start glusterd.
After that the volume start succeeded.

Which version is this?


Thanks and Regards,
Ram

From: Pranith Kumar Karampuri 
[mailto:pkara...@redhat.com]
Sent: Friday, July 07, 2017 11:46 AM
To: Ankireddypalle Reddy
Cc: Gluster Devel 
(gluster-devel@gluster.org); 
gluster-us...@gluster.org
Subject: Re: [Gluster-devel] gfid and volume-id extended attributes lost

Did anything special happen on these two bricks? It can't happen in the I/O 
path:
posix_removexattr() has:
  0 if (!strcmp (GFID_XATTR_KEY, name)) {
  1 gf_msg (this->name, GF_LOG_WARNING, 0, 
P_MSG_XATTR_NOT_REMOVED,
  2 "Remove xattr called on gfid for file %s", 
real_path);
  3 op_ret = -1;
  4 goto out;
  5 }
  6 if (!strcmp (GF_XATTR_VOL_ID_KEY, name)) {
  7 gf_msg (this->name, GF_LOG_WARNING, 0, 
P_MSG_XATTR_NOT_REMOVED,
  8 "Remove xattr called on volume-id for file %s",
  9 real_path);
 10 op_ret = -1;
 11 goto out;
 12 }
I just found that op_errno is not set correctly, but it can't happen in the I/O 
path, so self-heal/rebalance are off the hook.
I also grepped for any removexattr of trusted.gfid from glusterd and didn't 
find any.
So one thing that used to happen was that sometimes when machines reboot, the 
brick mounts wouldn't happen and this would lead to absence of both 
trusted.gfid and volume-id. So at the moment this is my wild guess.


On Fri, Jul 7, 2017 at 8:39 PM, Ankireddypalle Reddy 
> wrote:
Hi,
   We faced an issue in production today. We had to stop the volume and
reboot all the servers in the cluster. Once the servers rebooted, starting
the volume failed because the following extended attributes were not present on
all the bricks on 2 servers.

1)  trusted.gfid

2)  trusted.glusterfs.volume-id

We had to manually set these extended attributes to start the volume. Are
there any such known issues?

Thanks and Regards,
Ram
***Legal Disclaimer***
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel



--
Pranith
***Legal Disclaimer***
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**



--
Pranith
***Legal Disclaimer***
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] gfid and volume-id extended attributes lost

2017-07-07 Thread Pranith Kumar Karampuri
On Fri, Jul 7, 2017 at 9:20 PM, Ankireddypalle Reddy 
wrote:

> Pranith,
>
>  Thanks for looking into the issue. The bricks were
> mounted after the reboot. One more thing that I noticed: when the
> attributes were set manually while glusterd was up, they were lost again
> on starting the volume. We had to stop glusterd, set the attributes,
> and then start glusterd. After that the volume start succeeded.
>

Which version is this?


>
>
> Thanks and Regards,
>
> Ram
>
>
>
> *From:* Pranith Kumar Karampuri [mailto:pkara...@redhat.com]
> *Sent:* Friday, July 07, 2017 11:46 AM
> *To:* Ankireddypalle Reddy
> *Cc:* Gluster Devel (gluster-devel@gluster.org); gluster-us...@gluster.org
> *Subject:* Re: [Gluster-devel] gfid and volume-id extended attributes lost
>
>
>
> Did anything special happen on these two bricks? It can't happen in the
> I/O path:
> posix_removexattr() has:
>   0 if (!strcmp (GFID_XATTR_KEY, name))
> {
>
>
>   1 gf_msg (this->name, GF_LOG_WARNING, 0,
> P_MSG_XATTR_NOT_REMOVED,
>   2 "Remove xattr called on gfid for file %s",
> real_path);
>   3 op_ret = -1;
>
>   4 goto out;
>
>   5 }
>
>   6 if (!strcmp (GF_XATTR_VOL_ID_KEY, name))
> {
>   7 gf_msg (this->name, GF_LOG_WARNING, 0,
> P_MSG_XATTR_NOT_REMOVED,
>   8 "Remove xattr called on volume-id for file
> %s",
>   9 real_path);
>
>  10 op_ret = -1;
>
>  11 goto out;
>
>  12 }
>
> I just found that op_errno is not set correctly, but it can't happen in
> the I/O path, so self-heal/rebalance are off the hook.
>
> I also grepped for any removexattr of trusted.gfid from glusterd and
> didn't find any.
>
> So one thing that used to happen was that sometimes when machines reboot,
> the brick mounts wouldn't happen and this would lead to absence of both
> trusted.gfid and volume-id. So at the moment this is my wild guess.
>
>
>
>
>
> On Fri, Jul 7, 2017 at 8:39 PM, Ankireddypalle Reddy 
> wrote:
>
> Hi,
>
>We faced an issue in the production today. We had to stop the
> volume and reboot all the servers in the cluster.  Once the servers
> rebooted starting of the volume failed because the following extended
> attributes were not present on all the bricks on 2 servers.
>
> 1)  trusted.gfid
>
> 2)  trusted.glusterfs.volume-id
>
>
>
> We had to manually set these extended attributes to start the volume.  Are
> there any such known issues.
>
>
>
> Thanks and Regards,
>
> Ram
>
> ***Legal Disclaimer***
>
> "This communication may contain confidential and privileged material for
> the
>
> sole use of the intended recipient. Any unauthorized review, use or
> distribution
>
> by others is strictly prohibited. If you have received the message by
> mistake,
>
> please advise the sender by reply email and delete the message. Thank you."
>
> **
>
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>
>
>
>
> --
>
> Pranith
> ***Legal Disclaimer***
> "This communication may contain confidential and privileged material for
> the
> sole use of the intended recipient. Any unauthorized review, use or
> distribution
> by others is strictly prohibited. If you have received the message by
> mistake,
> please advise the sender by reply email and delete the message. Thank you."
> **
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] gfid and volume-id extended attributes lost

2017-07-07 Thread Ankireddypalle Reddy
Pranith,
 Thanks for looking into the issue. The bricks were mounted
after the reboot. One more thing that I noticed: when the attributes were
set manually while glusterd was up, they were lost again on starting the
volume. We had to stop glusterd, set the attributes, and then start glusterd.
After that the volume start succeeded.
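
(For the archives, the manual restore described above is roughly the
following, shown for one brick path of this volume: the volume-id value is
the Volume ID from `gluster volume info` with the dashes removed, and a
brick root's trusted.gfid is always the root gfid. Repeat per brick, with
glusterd stopped.)

getfattr -d -m . -e hex /ws/disk1/ws_brick    # check which xattrs survived
setfattr -n trusted.glusterfs.volume-id -v 0x149e976f4e21451cbf0ff5691208531f /ws/disk1/ws_brick
setfattr -n trusted.gfid -v 0x00000000000000000000000000000001 /ws/disk1/ws_brick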

Thanks and Regards,
Ram

From: Pranith Kumar Karampuri [mailto:pkara...@redhat.com]
Sent: Friday, July 07, 2017 11:46 AM
To: Ankireddypalle Reddy
Cc: Gluster Devel (gluster-devel@gluster.org); gluster-us...@gluster.org
Subject: Re: [Gluster-devel] gfid and volume-id extended attributes lost

Did anything special happen on these two bricks? It can't happen in the I/O 
path:
posix_removexattr() has:
  0 if (!strcmp (GFID_XATTR_KEY, name)) {
  1 gf_msg (this->name, GF_LOG_WARNING, 0, 
P_MSG_XATTR_NOT_REMOVED,
  2 "Remove xattr called on gfid for file %s", 
real_path);
  3 op_ret = -1;
  4 goto out;
  5 }
  6 if (!strcmp (GF_XATTR_VOL_ID_KEY, name)) {
  7 gf_msg (this->name, GF_LOG_WARNING, 0, 
P_MSG_XATTR_NOT_REMOVED,
  8 "Remove xattr called on volume-id for file %s",
  9 real_path);
 10 op_ret = -1;
 11 goto out;
 12 }
I just found that op_errno is not set correctly, but it can't happen in the I/O 
path, so self-heal/rebalance are off the hook.
I also grepped for any removexattr of trusted.gfid from glusterd and didn't 
find any.
So one thing that used to happen was that sometimes when machines reboot, the 
brick mounts wouldn't happen and this would lead to absence of both 
trusted.gfid and volume-id. So at the moment this is my wild guess.


On Fri, Jul 7, 2017 at 8:39 PM, Ankireddypalle Reddy 
> wrote:
Hi,
   We faced an issue in production today. We had to stop the volume and
reboot all the servers in the cluster. Once the servers rebooted, starting
the volume failed because the following extended attributes were not present on
all the bricks on 2 servers.

1)  trusted.gfid

2)  trusted.glusterfs.volume-id

We had to manually set these extended attributes to start the volume. Are
there any such known issues?

Thanks and Regards,
Ram
***Legal Disclaimer***
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel



--
Pranith
***Legal Disclaimer***
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] gfid and volume-id extended attributes lost

2017-07-07 Thread Pranith Kumar Karampuri
On Fri, Jul 7, 2017 at 9:15 PM, Pranith Kumar Karampuri  wrote:

> Did anything special happen on these two bricks? It can't happen in the
> I/O path:
> posix_removexattr() has:
>   0 if (!strcmp (GFID_XATTR_KEY, name))
> {
>
>
>   1 gf_msg (this->name, GF_LOG_WARNING, 0,
> P_MSG_XATTR_NOT_REMOVED,
>   2 "Remove xattr called on gfid for file %s",
> real_path);
>   3 op_ret = -1;
>
>   4 goto out;
>
>   5 }
>
>   6 if (!strcmp (GF_XATTR_VOL_ID_KEY, name))
> {
>   7 gf_msg (this->name, GF_LOG_WARNING, 0,
> P_MSG_XATTR_NOT_REMOVED,
>   8 "Remove xattr called on volume-id for file
> %s",
>   9 real_path);
>
>  10 op_ret = -1;
>
>  11 goto out;
>
>  12 }
>
> I just found that op_errno is not set correctly, but it can't happen in
> the I/O path, so self-heal/rebalance are off the hook.
>
> I also grepped for any removexattr of trusted.gfid from glusterd and
> didn't find any.
>
> So one thing that used to happen was that sometimes when machines reboot,
> the brick mounts wouldn't happen and this would lead to absence of both
> trusted.gfid and volume-id. So at the moment this is my wild guess.
>

The fix for that was to mount the bricks. But considering that you set the
xattrs manually instead, I am guessing the other data was intact and only
these particular xattrs were missing? I wonder what new problem this is.


>
>
> On Fri, Jul 7, 2017 at 8:39 PM, Ankireddypalle Reddy  > wrote:
>
>> Hi,
>>
>>We faced an issue in the production today. We had to stop the
>> volume and reboot all the servers in the cluster.  Once the servers
>> rebooted starting of the volume failed because the following extended
>> attributes were not present on all the bricks on 2 servers.
>>
>> 1)  trusted.gfid
>>
>> 2)  trusted.glusterfs.volume-id
>>
>>
>>
>> We had to manually set these extended attributes to start the volume.
>> Are there any such known issues.
>>
>>
>>
>> Thanks and Regards,
>>
>> Ram
>> ***Legal Disclaimer***
>> "This communication may contain confidential and privileged material for
>> the
>> sole use of the intended recipient. Any unauthorized review, use or
>> distribution
>> by others is strictly prohibited. If you have received the message by
>> mistake,
>> please advise the sender by reply email and delete the message. Thank
>> you."
>> **
>>
>> ___
>> Gluster-devel mailing list
>> Gluster-devel@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>
>
>
> --
> Pranith
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] gfid and volume-id extended attributes lost

2017-07-07 Thread Pranith Kumar Karampuri
Did anything special happen on these two bricks? It can't happen in the I/O
path:
posix_removexattr() has:
  0 if (!strcmp (GFID_XATTR_KEY, name)) {
  1 gf_msg (this->name, GF_LOG_WARNING, 0, P_MSG_XATTR_NOT_REMOVED,
  2 "Remove xattr called on gfid for file %s", real_path);
  3 op_ret = -1;
  4 goto out;
  5 }
  6 if (!strcmp (GF_XATTR_VOL_ID_KEY, name)) {
  7 gf_msg (this->name, GF_LOG_WARNING, 0, P_MSG_XATTR_NOT_REMOVED,
  8 "Remove xattr called on volume-id for file %s",
  9 real_path);
 10 op_ret = -1;
 11 goto out;
 12 }

I just found that op_errno is not set correctly, but it can't happen in the
I/O path, so self-heal/rebalance are off the hook.

I also grepped for any removexattr of trusted.gfid from glusterd and didn't
find any.

So one thing that used to happen was that sometimes when machines reboot,
the brick mounts wouldn't happen and this would lead to absence of both
trusted.gfid and volume-id. So at the moment this is my wild guess.
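
(A quick way to test that guess on the affected servers, assuming the
/ws/diskN layout from the volume info: an unmounted brick directory just
sits on the root filesystem and carries no gluster xattrs.)

for d in /ws/disk*; do
    mountpoint -q "$d" || echo "$d is NOT a mount point"
done
getfattr -n trusted.glusterfs.volume-id -e hex /ws/disk1/ws_brick
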


On Fri, Jul 7, 2017 at 8:39 PM, Ankireddypalle Reddy 
wrote:

> Hi,
>
>We faced an issue in the production today. We had to stop the
> volume and reboot all the servers in the cluster.  Once the servers
> rebooted starting of the volume failed because the following extended
> attributes were not present on all the bricks on 2 servers.
>
> 1)  trusted.gfid
>
> 2)  trusted.glusterfs.volume-id
>
>
>
> We had to manually set these extended attributes to start the volume.  Are
> there any such known issues.
>
>
>
> Thanks and Regards,
>
> Ram
> ***Legal Disclaimer***
> "This communication may contain confidential and privileged material for
> the
> sole use of the intended recipient. Any unauthorized review, use or
> distribution
> by others is strictly prohibited. If you have received the message by
> mistake,
> please advise the sender by reply email and delete the message. Thank you."
> **
>
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-devel
>



-- 
Pranith
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] gfid and volume-id extended attributes lost

2017-07-07 Thread Ankireddypalle Reddy
Hi,
   We faced an issue in production today. We had to stop the volume and
reboot all the servers in the cluster. Once the servers rebooted, starting
the volume failed because the following extended attributes were not present on
all the bricks on 2 servers.

1)  trusted.gfid

2)  trusted.glusterfs.volume-id

We had to manually set these extended attributes to start the volume. Are
there any such known issues?

Thanks and Regards,
Ram
***Legal Disclaimer***
"This communication may contain confidential and privileged material for the
sole use of the intended recipient. Any unauthorized review, use or distribution
by others is strictly prohibited. If you have received the message by mistake,
please advise the sender by reply email and delete the message. Thank you."
**
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

[Gluster-devel] Coverity covscan for 2017-07-07-0ae38df6 (master branch)

2017-07-07 Thread staticanalysis
GlusterFS Coverity covscan results are available from
http://download.gluster.org/pub/gluster/glusterfs/static-analysis/master/glusterfs-coverity/2017-07-07-0ae38df6
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel


Re: [Gluster-devel] is tests/basic/gfapi/libgfapi-fini-hang.t broken in NetBSD ?

2017-07-07 Thread Niels de Vos
On Fri, Jul 07, 2017 at 07:42:08AM -0400, Jeff Darcy wrote:
> 
> 
> On Fri, Jul 7, 2017, at 03:36 AM, Niels de Vos wrote:
> > The segfault is caused by GF_ASSERT() on
> > https://review.gluster.org/#/c/17662/2/libglusterfs/src/mem-pool.c@563 .
> > At the moment I'm not sure how this can happen, unless glfs_fini() is
> > called more than once on a glfs_t object.
> 
> It's because the test deliberately calls glfs_fini without calling
> glfs_init.  Unlike most APIs, in GFAPI "init" is neither a module-level
> operation  nor the first thing users should call.  Looking at what
> already happens in glfs_new vs. glfs_fini, it looks like the call to
> mem_pools_init should have gone into glfs_new anyway.  It's easy to fix;
> I'll send a patch for it shortly.

Thanks! Yes, glfs_new() would probably be more appropriate.

> The other mystery is why this same test passed on Centos.  Seems like it
> should have failed in exactly the same way.

A difference could be that ./configure is called with --enable-debug on
NetBSD, but not on CentOS. I think GF_ASSERT() only logs when in
non-debug mode.
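
(To reproduce the NetBSD behaviour locally, building with assertions enabled
should be enough; a sketch, assuming a glusterfs source checkout:)

./autogen.sh
./configure --enable-debug    # GF_ASSERT() aborts here instead of only logging
make -j4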

Niels


___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] geo-rep regression because of node-uuid change

2017-07-07 Thread Xavier Hernandez

On 07/07/17 11:25, Pranith Kumar Karampuri wrote:

On Fri, Jul 7, 2017 at 2:46 PM, Xavier Hernandez wrote:

On 07/07/17 10:12, Pranith Kumar Karampuri wrote:

On Fri, Jul 7, 2017 at 1:13 PM, Xavier Hernandez wrote:

Hi Pranith,

On 05/07/17 12:28, Pranith Kumar Karampuri wrote:

On Tue, Jul 4, 2017 at 2:26 PM, Xavier Hernandez wrote:

Re: [Gluster-devel] upstream regression suite is broken

2017-07-07 Thread Atin Mukherjee
On Fri, Jul 7, 2017 at 12:33 PM, Krutika Dhananjay 
wrote:

> The patch[1] that introduced tests/basic/stats-dump.t was merged in
> October 2015 and
> my patch underwent (and passed too![2]) centos regression tests, including
> stats-dump.t on 05 June, 2017.
> The only change that the test script underwent during this time was this
> line in 2016, which is harmless:
>
> a4f84d78 (Kaleb S KEITHLEY 2016-03-15 06:16:31 -0400 17) EXPECT_WITHIN
> $NFS_EXPORT_TIMEOUT "1" is_nfs_export_available
>
> So there was NO change that went between 5th June and the time my patch
> was merged, which could have broken the test suite, that could have been
> caught easily with a mere rebase? Or am I missing something here? The
> problem is simply that the test script didn't fail on my patch.
>

It's not the test, but probably something in the code that has gone in, which
now makes this test fail. So it's in no way something the developer/maintainer
has to take the blame for.


>
>
> [1] - https://review.gluster.org/12209
> [2] - https://build.gluster.org/job/centos6-regression/4897/consoleFull
>
> -Krutika
>
> On Fri, Jul 7, 2017 at 9:51 AM, Atin Mukherjee 
> wrote:
>
>> Krutika,
>>
>> tests/basic/stats-dump.t is failing all the time, and as per my initial
>> analysis the failures are seen after https://review.gluster.org/#/c/17709/
>> got into the mainline; reverting this patch makes the test run
>> successfully. I do understand that the centos vote for this patch was
>> green, but the last run was on 5th June, which was 1 month back. So some
>> other changes have gone in in between which are now causing this patch to
>> break the test.
>>
>> This makes me think that, as maintainers, we do need to ensure that if the
>> regression vote on a patch is quite old, a rebase of the patch is a must,
>> to be on the safer side?
>>
>> ~Atin
>>
>
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] geo-rep regression because of node-uuid change

2017-07-07 Thread Xavier Hernandez

Hi Pranith,

On 05/07/17 12:28, Pranith Kumar Karampuri wrote:



On Tue, Jul 4, 2017 at 2:26 PM, Xavier Hernandez > wrote:

Hi Pranith,

On 03/07/17 08:33, Pranith Kumar Karampuri wrote:

Xavi,
  Now that the change has been reverted, we can resume this
discussion and decide on the exact format that considers, tier, dht,
afr, ec. People working geo-rep/dht/afr/ec had an internal
discussion
and we all agreed that this proposal would be a good way forward. I
think once we agree on the format and decide on the initial
encoding/decoding functions of the xattr and this change is
merged, we
can send patches on afr/ec/dht and geo-rep to take it to closure.

Could you propose the new format you have in mind that considers
all of
the xlators?


My idea was to create a new xattr not bound to any particular
function but which could give enough information to be used in many
places.

Currently we have another attribute called glusterfs.pathinfo that
returns hierarchical information about the location of a file. Maybe
we can extend this to unify all these attributes into a single
feature that could be used for multiple purposes.
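
(For reference, pathinfo is what clients already query today to locate the
bricks backing a file, e.g. from a client mount; the path below is just an
example:)

getfattr -n trusted.glusterfs.pathinfo /mnt/storagepool/some/file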

Since we have time to discuss it, I would like to design it with
more information than we already talked.

First of all, the amount of information that this attribute can
contain is quite big if we expect to have volumes with thousands of
bricks. Even in the most simple case of returning only an UUID, we
can easily go beyond the limit of 64KB.

Consider also, for example, what shard should return when pathinfo
is requested for a file. Probably it should return a list of shards,
each one with all its associated pathinfo. We are talking about big
amounts of data here.

I think this kind of information doesn't fit very well in an
extended attribute. Another thing to consider is that most probably
the requester of the data only needs a fragment of it, so we are
generating big amounts of data only to be parsed and reduced later,
discarding most of it.

What do you think about using a very special virtual file to manage
all this information? It could be easily read using normal read
fops, so it could manage big amounts of data easily. Also, by accessing
only some parts of the file we could go directly where we want,
avoiding reading all the remaining data.

A very basic idea could be this:

Each xlator would have a reserved area of the file. We can reserve
up to 4GB per xlator (32 bits). The remaining 32 bits of the offset
would indicate the xlator we want to access.

At offset 0 we have generic information about the volume. One of the
things that this information should include is a basic hierarchy
of the whole volume and the offset for each xlator.

After reading this, the user will seek to the desired offset and
read the information related to the xlator it is interested in.

All the information should be stored in a format easily extensible
that will be kept compatible even if new information is added in the
future (for example doing special mappings of the 32 bits offsets
reserved for the xlator).

For example, we can reserve the first megabyte of the xlator area to
have a mapping of attributes with their respective offsets.

I think that using a binary format would simplify all this a lot.

Do you think this is a way to explore or should I stop wasting time
here ?


I think this just became a very big feature :-). Shall we just live with
it the way it is now?


I supposed it...

The only thing we need to check is whether shard needs to handle this xattr. If
so, what should it return? Only the UUIDs corresponding to the first
shard, or the UUIDs of all bricks containing at least one shard? I
guess that the first one is enough, but just to be sure...


My proposal was to implement a new xattr, for example glusterfs.layout, 
that contains enough information to be usable in all current use cases.


The idea would be that each xlator that makes a significant change in 
the way or the place where files are stored, should put information in 
this xattr. The information should include:


* Type (basically AFR, EC, DHT, ...)
* Basic configuration (replication and arbiter for AFR, data and 
redundancy for EC, # subvolumes for DHT, shard size for sharding, ...)

* Quorum imposed by the xlator
* UUID data coming from subvolumes (sorted by brick position)
* It should be easily extensible in the future

The last point is very important to avoid the issues we have seen now. 
We must be able to incorporate more information without breaking 
backward compatibility. To do so, we can add tags for each value.


For example, a distribute 2, replica 2 volume with 1 

Re: [Gluster-devel] upstream regression suite is broken

2017-07-07 Thread Krutika Dhananjay
The patch[1] that introduced tests/basic/stats-dump.t was merged in October
2015 and
my patch underwent (and passed too![2]) centos regression tests, including
stats-dump.t on 05 June, 2017.
The only change that the test script underwent during this time was this
line in 2016, which is harmless:

a4f84d78 (Kaleb S KEITHLEY 2016-03-15 06:16:31 -0400 17) EXPECT_WITHIN
$NFS_EXPORT_TIMEOUT "1" is_nfs_export_available

So there was NO change between 5th June and the time my patch was merged
that could have broken the test suite and that could have been caught
easily with a mere rebase? Or am I missing something here? The problem is
simply that the test script didn't fail on my patch.
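
(The check described above can be reproduced against a glusterfs checkout,
e.g.:)

# What touched the test between the regression vote and the merge?
git log --oneline --since=2017-06-05 --until=2017-07-07 -- tests/basic/stats-dump.t
git blame tests/basic/stats-dump.t | grep a4f84d78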



[1] - https://review.gluster.org/12209
[2] - https://build.gluster.org/job/centos6-regression/4897/consoleFull

-Krutika

On Fri, Jul 7, 2017 at 9:51 AM, Atin Mukherjee  wrote:

> Krutika,
>
> tests/basic/stats-dump.t is failing all the time, and as per my initial
> analysis the failures are seen after https://review.gluster.org/#/c/17709/
> got into the mainline; reverting this patch makes the test run successfully.
> I do understand that the centos vote for this patch was green, but the last
> run was on 5th June, which was 1 month back. So some other changes have gone
> in in between which are now causing this patch to break the test.
>
> This makes me think that, as maintainers, we do need to ensure that if the
> regression vote on a patch is quite old, a rebase of the patch is a must, to
> be on the safer side?
>
> ~Atin
>
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel

Re: [Gluster-devel] upstream regression suite is broken

2017-07-07 Thread Nigel Babu
Hello,

I've spoken to Atin, Ravi, and Amar and we agreed on a solution that I've just
completed.

1. I've landed a patch directly on master, bypassing review. This patch
   disables the test that is currently failing. Now patches that are waiting
   to land in master can be landed safely while we get regressions to run on
   Ravi's patch.
2. Ravi can rebase his patch on HEAD of master and re-enable the test along
   with his fix.

We've talked about this use-case in the past and how to best handle a broken
master. I'll work on a guide to handling this sort of incident in the future.

On Fri, Jul 07, 2017 at 10:48:42AM +0530, Ravishankar N wrote:
> I've sent a fix @ https://review.gluster.org/#/c/17721
>
> On 07/07/2017 09:51 AM, Atin Mukherjee wrote:
> > Krutika,
> >
> > tests/basic/stats-dump.t is failing all the time, and as per my initial
> > analysis the failures are seen after https://review.gluster.org/#/c/17709/
> > got into the mainline; reverting this patch makes the test run
> > successfully. I do understand that the centos vote for this patch was
> > green, but the last run was on 5th June, which was 1 month back. So some
> > other changes have gone in in between which are now causing this patch
> > to break the test.
> >
> > This makes me think that, as maintainers, we do need to ensure that if the
> > regression vote on a patch is quite old, a rebase of the patch is a must,
> > to be on the safer side?
> >

--
nigelb
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-devel