Re: [Gluster-devel] Rolling upgrades from glusterfs 3.4 to 3.5
On 06/12/2014 11:13 PM, Anand Avati wrote:
> On Thu, Jun 12, 2014 at 10:33 AM, Vijay Bellur <vbel...@redhat.com> wrote:
>> On 06/12/2014 06:52 PM, Ravishankar N wrote:
>>> Hi Vijay,
>>>
>>> Since glusterfs 3.5, posix_lookup() sends ESTALE instead of ENOENT [1]
>>> when a parent gfid (entry) is not present on the brick. In a replicate
>>> setup this causes a problem because AFR gives more priority to ESTALE
>>> than to ENOENT, causing I/O to fail [2]. The fix is in progress at [3];
>>> it is client-side specific and would make it into 3.5.2. But we will
>>> still hit the problem when a rolling upgrade is performed from 3.4 to
>>> 3.5, unless the clients are also upgraded to 3.5. To elaborate with an
>>> example:
>>>
>>> 0) Create a 1x2 volume using 2 nodes and mount it from a client. All
>>>    machines run glusterfs 3.4.
>>> 1) Run: for i in {1..30}; do mkdir $i; tar xf glusterfs-3.5git.tar.gz -C $i; done
>>> 2) While this is going on, kill one of the nodes in the replica pair
>>>    and upgrade it to glusterfs 3.5 (simulating a rolling upgrade).
>>> 3) After a while, kill all tar processes.
>>> 4) Create a 'backup' directory and move all the 1..30 dirs inside it.
>>> 5) Start the untar processes from 1) again.
>>> 6) Bring up the upgraded node. tar fails with ESTALE errors.
>>>
>>> Essentially the errors occur because [3] is a client-side fix, but
>>> rolling upgrades target the servers while older clients still need to
>>> access them without issues. A solution is to have a fix in the posix
>>> translator wherein a newer client passes its version (3.5) to
>>> posix_lookup(), which then sends ESTALE if the client is 3.5 or newer
>>> but ENOENT if it is an older client. Does this seem okay? I cannot
>>> think of a better solution.
>>
>> Seamless rolling upgrades are necessary for us, and the proposed fix
>> does seem okay for that reason.
>>
>> Thanks,
>> Vijay
>
> I also like Justin's proposal of having fixes in 3.4.X and requiring
> clients to be at least 3.4.X in order to do a rolling upgrade to 3.5.Y.
> That way we can add the special fix in the 3.4.X client (just like the
> 3.5.2 client). Ravi's proposal works, but all LOOKUPs will carry an
> extra xattr, and we will be carrying the compatibility-code burden for a
> very long time, whereas a 3.4.X client fix will remain on the 3.4 branch.
>
> Thanks

I have sent a fix for review (http://review.gluster.org/#/c/8080/). The
change is on the server side only. I reckon that if we are asking users
to upgrade clients to a 3.4.x release, which involves application
downtime anyway, we might as well ask them to upgrade to 3.5. The fix is
sent only on 3.5; it does not need to go to master, since I understand
from Pranith that we only support compatibility between the current two
releases (meaning 3.6 servers require clients to be at least 3.5, not
lower).

Regards,
Ravi
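As an aside for readers of this archive: below is a minimal, self-contained
sketch of the errno-downgrade idea discussed above. The op-version encoding
and the helper name are assumptions made purely for illustration; this is not
the code from the posix translator or from the review linked above.

    /*
     * Model of the proposed compatibility behaviour: the server keeps
     * returning ESTALE for a missing parent gfid, but downgrades it to
     * ENOENT when the requesting client identifies itself as older than
     * 3.5. OP_VERSION_3_5 and lookup_errno_for_client() are hypothetical.
     */
    #include <errno.h>
    #include <stdio.h>

    #define OP_VERSION_3_5 30500   /* assumed encoding: major*10000 + minor*100 */

    /* Map the error posix_lookup() wants to send to what the client can handle. */
    static int
    lookup_errno_for_client (int server_errno, int client_op_version)
    {
            if (server_errno == ESTALE && client_op_version < OP_VERSION_3_5)
                    return ENOENT;  /* older AFR clients treat ESTALE as fatal */
            return server_errno;
    }

    int
    main (void)
    {
            printf ("3.4 client gets errno %d (ENOENT=%d)\n",
                    lookup_errno_for_client (ESTALE, 30400), ENOENT);
            printf ("3.5 client gets errno %d (ESTALE=%d)\n",
                    lookup_errno_for_client (ESTALE, 30500), ESTALE);
            return 0;
    }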
Re: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on directories
On 2014-06-17 18:47, Anders Blomdell wrote:
> On 2014-06-17 17:49, Shyamsundar Ranganathan wrote:
>> You may be looking at the problem being fixed here [1]. On a lookup, an
>> attribute mismatch was not being healed across directories, and this
>> patch attempts to address that. The current version of the patch does
>> not heal the S_ISUID and S_ISGID bits, which is work in progress (but
>> easy enough to incorporate and test based on the patch at [1]).
>
> Thanks, will look into it tomorrow.
>
>> On a separate note, add-brick just adds a brick to the cluster; the
>> lookup is where the heal (or creation of the directory across all
>> subvolumes in the DHT xlator) is done.
>
> Thanks for the clarification (I guess that a rebalance would trigger it
> as well?)

The attached, slightly modified version of patch [1] seems to work
correctly after a rebalance that is allowed to run to completion on its
own. If directories are traversed during the rebalance, some dirs show
spurious 01777 or 0 modes and sometimes end up with the wrong
permissions. Continuing debugging tomorrow...

>> Shyam
>>
>> [1] http://review.gluster.org/#/c/6983/
>>
>> ----- Original Message -----
>> From: Anders Blomdell <anders.blomd...@control.lth.se>
>> To: Gluster Devel <gluster-devel@gluster.org>
>> Sent: Tuesday, June 17, 2014 10:53:52 AM
>> Subject: [Gluster-devel] 3.5.1-beta2 Problems with suid and sgid bits on directories
>>
>> With glusterfs-3.5.1-0.3.beta2.fc20.x86_64 and a reverted
>> 3dc56cbd16b1074d7ca1a4fe4c5bf44400eb63ff (due to a local lack of IPv4
>> addresses), I get weird behavior if I:
>>
>> 1. Create a directory with the suid/sgid/sticky bits set (/mnt/gluster/test)
>> 2. Make a subdirectory of #1 (/mnt/gluster/test/dir1)
>> 3. Do an add-brick
>>
>> Before add-brick:
>>  755 /mnt/gluster
>> 7775 /mnt/gluster/test
>> 2755 /mnt/gluster/test/dir1
>>
>> After add-brick:
>>  755 /mnt/gluster
>> 1775 /mnt/gluster/test
>>  755 /mnt/gluster/test/dir1
>>
>> On the server it looks like this:
>> 7775 /data/disk1/gluster/test
>> 2755 /data/disk1/gluster/test/dir1
>> 1775 /data/disk2/gluster/test
>>  755 /data/disk2/gluster/test/dir1
>>
>> Filed as bug: https://bugzilla.redhat.com/show_bug.cgi?id=1110262
>>
>> If somebody can point me to where the logic of add-brick is placed, I
>> can give it a shot (a find/grep on mkdir didn't immediately point me to
>> the right place).
>>
>> /Anders

/Anders

--
Anders Blomdell                  Email: anders.blomd...@control.lth.se
Department of Automatic Control
Lund University                  Phone: +46 46 222 4625
P.O. Box 118                     Fax:   +46 46 138118
SE-221 00 Lund, Sweden

diff -urb glusterfs-3.5.1beta2/xlators/cluster/dht/src/dht-common.c glusterfs-3.5.1.orig/xlators/cluster/dht/src/dht-common.c
--- glusterfs-3.5.1beta2/xlators/cluster/dht/src/dht-common.c 2014-06-10 18:55:22.0 +0200
+++ glusterfs-3.5.1.orig/xlators/cluster/dht/src/dht-common.c 2014-06-17 22:46:28.710636632 +0200
@@ -523,6 +523,28 @@
 }
 
 int
+permission_changed (ia_prot_t *local, ia_prot_t *stbuf)
+{
+        if ((local->owner.read  != stbuf->owner.read)  ||
+            (local->owner.write != stbuf->owner.write) ||
+            (local->owner.exec  != stbuf->owner.exec)  ||
+            (local->group.read  != stbuf->group.read)  ||
+            (local->group.write != stbuf->group.write) ||
+            (local->group.exec  != stbuf->group.exec)  ||
+            (local->other.read  != stbuf->other.read)  ||
+            (local->other.write != stbuf->other.write) ||
+            (local->other.exec  != stbuf->other.exec)  ||
+            (local->suid   != stbuf->suid)   ||
+            (local->sgid   != stbuf->sgid)   ||
+            (local->sticky != stbuf->sticky))
+        {
+                return 1;
+        } else {
+                return 0;
+        }
+}
+
+int
 dht_revalidate_cbk (call_frame_t *frame, void *cookie, xlator_t *this,
                     int op_ret, int op_errno, inode_t *inode,
                     struct iatt *stbuf, dict_t *xattr,
@@ -617,12 +639,16 @@
                             stbuf->ia_ctime_nsec)) {
                         local->prebuf.ia_gid = stbuf->ia_gid;
                         local->prebuf.ia_uid = stbuf->ia_uid;
+                        local->prebuf.ia_prot = stbuf->ia_prot;
                 }
         }
 
         if (local->stbuf.ia_type != IA_INVAL) {
                 if ((local->stbuf.ia_gid != stbuf->ia_gid) ||
-                    (local->stbuf.ia_uid != stbuf->ia_uid)) {
+                    (local->stbuf.ia_uid != stbuf->ia_uid) ||
+                    (permission_changed (&(local->stbuf.ia_prot)
+                                         , &(stbuf->ia_prot))))
+                {
                         local->need_selfheal = 1;
                 }
         }
@@ -669,6 +695,8 @@
         uuid_copy (local->gfid, local->stbuf.ia_gfid);
         local->stbuf.ia_gid =
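As an aside: the comparison the patch introduces can be modelled with plain
POSIX mode bits. The sketch below is a simplified stand-in (mode_t instead of
GlusterFS's ia_prot_t, with example modes taken from the report above), not
the DHT code itself.

    /*
     * Model of the check added by the patch: two directory modes are
     * considered mismatched (and would need a self-heal) if any rwx bit
     * or any of the suid/sgid/sticky bits differ.
     */
    #include <stdio.h>
    #include <sys/types.h>

    static int
    permission_changed (mode_t a, mode_t b)
    {
            /* 07777 covers rwxrwxrwx plus the suid, sgid and sticky bits. */
            return (a & 07777) != (b & 07777);
    }

    int
    main (void)
    {
            mode_t on_mount  = 07775;   /* suid+sgid+sticky, as created by the user */
            mode_t on_brick2 = 01775;   /* what the newly added brick ended up with */

            printf ("heal needed: %s\n",
                    permission_changed (on_mount, on_brick2) ? "yes" : "no");
            return 0;
    }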
[Gluster-devel] GlusterFS 3.6 Feature Freeze date pushed back 2 weeks
Hi all,

Just a small heads up. We're pushing back the GlusterFS Feature Freeze
date by two weeks. This lets us focus on fixing bugs in 3.5 that have
been reported recently, so people don't have to burn themselves out
developing 3.6 features in a massive rush at the same time. ;)

Regards and best wishes,

Justin Clift

--
GlusterFS - http://www.gluster.org
An open source, distributed file system scaling to several petabytes,
and handling thousands of clients.

My personal twitter: twitter.com/realjustinclift
Re: [Gluster-devel] tests and umount
On 06/16/2014 09:08 PM, Pranith Kumar Karampuri wrote:
> On 06/16/2014 09:00 PM, Jeff Darcy wrote:
>>> I see that most of the tests are doing umount, and these may fail
>>> sometimes because of EBUSY etc. I am wondering if we should change
>>> all of them to umount -l. Let me know if you foresee any problems.
>>
>> I think I'd try umount -f first. Using -l too much can cause an
>> accumulation of zombie mounts. When I'm hacking around on my own, I
>> sometimes have to do umount -f twice, but that's always sufficient.
>
> Cool, I will do some kind of EXPECT_WITHIN with umount -f, maybe 5
> times, just to be on the safer side.

I submitted http://review.gluster.com/8104 for one of the tests as it is
failing frequently. Will do the next round later.

Pranith

>>> If no one has any objections I will send out a patch tomorrow for this.
>>>
>>> Pranith
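As a rough illustration of the retry-with-force idea: the real change lives in
the shell-based test framework (EXPECT_WITHIN plus umount -f), but the same
loop can be modelled with the Linux umount2() syscall. The mount point and
retry count below are arbitrary examples.

    /*
     * Retry a forced unmount a few times before giving up, on the
     * assumption that EBUSY is usually transient while I/O drains.
     */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/mount.h>

    static int
    force_umount_with_retries (const char *mountpoint, int attempts)
    {
            int i;

            for (i = 0; i < attempts; i++) {
                    if (umount2 (mountpoint, MNT_FORCE) == 0)
                            return 0;               /* unmounted cleanly      */
                    if (errno != EBUSY)
                            break;                  /* not a transient error  */
                    sleep (1);                      /* give I/O time to drain */
            }
            fprintf (stderr, "umount %s failed: %s\n",
                     mountpoint, strerror (errno));
            return -1;
    }

    int
    main (void)
    {
            /* Hypothetical mount point used only for illustration. */
            return force_umount_with_retries ("/mnt/glusterfs-test", 5) ? 1 : 0;
    }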