Re: [Gluster-devel] Spurious failures because of nfs and snapshots
On 05/21/2014 08:50 PM, Vijaikumar M wrote:

KP, Atin and myself did some debugging and found that there was a deadlock in glusterd. When creating a volume snapshot, the back-end operations 'take an lvm_snapshot and start the brick' are executed in parallel for each brick using the synctask framework. brick_start was releasing the big_lock with brick_connect and taking the lock again. This caused a deadlock in some race conditions: the main thread was waiting for one of the synctask threads to finish, while the synctask thread was waiting for the big_lock. We are working on fixing this issue.

If this fix is going to take more time, can we please log a bug to track this problem and remove the affected test cases from the test unit? This way other valid patches will not be blocked by the failure of the snapshot test unit. We can reintroduce these tests as part of the fix for the problem.

-Vijay

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Regression testing results for master branch
It should be possible. I'll check and make the change.

~kaushal

On Thu, May 22, 2014 at 8:14 AM, Pranith Kumar Karampuri pkara...@redhat.com wrote:

----- Original Message -----
From: Pranith Kumar Karampuri pkara...@redhat.com
To: Justin Clift jus...@gluster.org
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Thursday, May 22, 2014 6:23:16 AM
Subject: Re: [Gluster-devel] Regression testing results for master branch

----- Original Message -----
From: Justin Clift jus...@gluster.org
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Wednesday, May 21, 2014 11:01:36 PM
Subject: Re: [Gluster-devel] Regression testing results for master branch

On 21/05/2014, at 6:17 PM, Justin Clift wrote:

Hi all,

Kicked off 21 VMs in Rackspace earlier today, running the regression tests against the master branch. Only 3 of the 21 VMs failed (86% pass, 14% fail), with all three failing on the same test:

Test Summary Report
-------------------
./tests/bugs/bug-948686.t (Wstat: 0 Tests: 20 Failed: 2)
  Failed tests: 13-14
Files=230, Tests=4373, 5601 wallclock secs ( 2.09 usr 1.58 sys + 1012.66 cusr 688.80 csys = 1705.13 CPU)
Result: FAIL

Interestingly, this one looks like a simple time-based thing too. The failed tests are the ones after the sleep:

...
#modify volume config to see change in volume-sync
TEST $CLI_1 volume set $V0 write-behind off

#add some files to the volume to see effect of volume-heal cmd
TEST touch $M0/{1..100};
TEST $CLI_1 volume stop $V0;
TEST $glusterd_3;
sleep 3;
TEST $CLI_3 volume start $V0;
TEST $CLI_2 volume stop $V0;
TEST $CLI_2 volume delete $V0;

Do you already have this one on your radar?

It wasn't, thanks for bringing it onto my radar :-). Sent http://review.gluster.org/7837 to address this.

Kaushal, I made this fix based on the assumption that the script is waiting for all glusterds to be online. I could not check the logs, because glusterds spawned by cluster.rc do not seem to store their logs in the default location. Do you think we can change the script so that we can also get logs from the glusterds spawned by cluster.rc?

Pranith

+ Justin

--
Open Source and Standards @ Red Hat
twitter.com/realjustinclift
Re: [Gluster-devel] Regression testing results for master branch
The glusterds spawned using cluster.rc store their logs at /d/backends/N/glusterd.log. But the cleanup() function cleans /d/backends/, so those logs are lost before we can archive them. cluster.rc should be fixed to use a better location for the logs.

~kaushal

On Thu, May 22, 2014 at 11:45 AM, Kaushal M kshlms...@gmail.com wrote:

It should be possible. I'll check and make the change.

~kaushal

On Thu, May 22, 2014 at 8:14 AM, Pranith Kumar Karampuri pkara...@redhat.com wrote:

[rest of quoted thread snipped; quoted in full in the previous message]
Re: [Gluster-devel] Regression testing results for master branch
somepath/glusterd-backend%N.log maybe?

On 22/05/2014, at 8:03 AM, Kaushal M wrote:

The glusterds spawned using cluster.rc store their logs at /d/backends/N/glusterd.log. But the cleanup() function cleans /d/backends/, so those logs are lost before we can archive them. cluster.rc should be fixed to use a better location for the logs.

~kaushal

--
Open Source and Standards @ Red Hat
twitter.com/realjustinclift
Re: [Gluster-devel] bug-857330/normal.t failure
Kaushal,
The 'rebalance status' command seems to be failing sometimes. I sent a mail about such a spurious failure earlier today. Did you get a chance to look at the logs and confirm that rebalance didn't fail and it is indeed a timeout?

Pranith

----- Original Message -----
From: Kaushal M kshlms...@gmail.com
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: Justin Clift jus...@gluster.org, Gluster Devel gluster-devel@gluster.org
Sent: Thursday, May 22, 2014 4:40:25 PM
Subject: Re: [Gluster-devel] bug-857330/normal.t failure

The test is waiting for rebalance to finish. This is a rebalance with some actual data, so it could have taken a long time to finish. I did set a pretty high timeout, but it seems like it's not enough for the new VMs. Possible options are:
- Increase this timeout further
- Reduce the amount of data. Currently this is 100 directories with 10 files each, of size between 10-500KB.

~kaushal

On Thu, May 22, 2014 at 3:59 PM, Pranith Kumar Karampuri pkara...@redhat.com wrote:

Kaushal (CCed) has more context about these. Keep the setup until he responds so that he can take a look.

Pranith

----- Original Message -----
From: Justin Clift jus...@gluster.org
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: Gluster Devel gluster-devel@gluster.org
Sent: Thursday, May 22, 2014 3:54:46 PM
Subject: bug-857330/normal.t failure

Hi Pranith,

Ran a few VMs with your Gerrit CR 7835 applied, and in DEBUG mode (I think). One of the VMs had a failure in bug-857330/normal.t:

Test Summary Report
-------------------
./tests/basic/rpm.t (Wstat: 0 Tests: 0 Failed: 0)
  Parse errors: Bad plan. You planned 8 tests but ran 0.
./tests/bugs/bug-857330/normal.t (Wstat: 0 Tests: 24 Failed: 1)
  Failed test: 13
Files=230, Tests=4369, 5407 wallclock secs ( 2.13 usr 1.73 sys + 941.82 cusr 645.54 csys = 1591.22 CPU)
Result: FAIL

Seems to be this test:

COMMAND="volume rebalance $V0 status"
PATTERN="completed"
EXPECT_WITHIN 300 $PATTERN get-task-status

Is this one on your radar already? Btw, this VM is still online. Can give you access to retrieve logs if useful.

+ Justin

--
Open Source and Standards @ Red Hat
twitter.com/realjustinclift
Re: [Gluster-devel] [Gluster-users] Guidelines for Maintainers
[Adding the right alias for gluster-devel this time around]

On 05/22/2014 05:29 PM, Vijay Bellur wrote:

Hi All,

Given the addition of new sub-maintainers and release maintainers to the community [1], I have felt the need to publish a set of guidelines for all categories of maintainers, so that we have an unambiguous operational state. A first cut of one such document can be found at [2]. I would love to hear your thoughts and feedback to make the proposal clear to everybody. We can convert this draft to a real set of guidelines once there is consensus.

Cheers,
Vijay

[1] http://thread.gmane.org/gmane.comp.file-systems.gluster.devel/6249
[2] http://www.gluster.org/community/documentation/index.php/Guidelines_For_Maintainers

___
Gluster-users mailing list
gluster-us...@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-users
Re: [Gluster-devel] bug-857330/normal.t failure
Thanks Justin, I found the problem. The VM can be deleted now.

Turns out, there was more than enough time for the rebalance to complete. But we hit a race which caused a command to fail.

The particular test that failed waits for rebalance to finish. It does this by running the 'gluster volume rebalance status' command and checking the result. The EXPECT_WITHIN function runs this command until we have a match, the command fails, or the timeout happens.

For a rebalance status command, glusterd sends a request to the rebalance process (as a brick_op) to get the latest stats. It had done the same in this case as well. But while glusterd was waiting for the reply, the rebalance completed and the process stopped itself. This closed the rpc connection between glusterd and the rebalance process, which caused all pending requests to be unwound as failures, which in turn led to the command failing.

I cannot think of a way to avoid this race from within glusterd. For this particular test, we could avoid using the 'rebalance status' command if we directly checked the rebalance process state using its pid etc. I don't particularly approve of this approach, as I think I used the 'rebalance status' command for a reason. But I currently cannot recall the reason, and if I cannot come up with it soon, I wouldn't mind changing the test to avoid 'rebalance status'.

~kaushal

On Thu, May 22, 2014 at 5:22 PM, Justin Clift jus...@gluster.org wrote:

On 22/05/2014, at 12:32 PM, Kaushal M wrote:
I haven't yet. But I will. Justin, can I take a peek inside the VM?

Sure.

IP: 23.253.57.20
User: root
Password: foobar123

The stdout log from the regression test is in /tmp/regression.log. The GlusterFS git repo is in /root/glusterfs. Um, you should be able to find everything else pretty easily. Btw, this is just a temp VM, so feel free to do anything you want with it. When you're finished with it let me know so I can delete it. :)

+ Justin

[rest of quoted thread snipped; quoted in full earlier in the thread]
Re: [Gluster-devel] Changes needing review before a glusterfs-3.5.1 Beta is released
On Wed, May 21, 2014 at 06:40:57PM +0200, Niels de Vos wrote:

A lot of work has been done on getting blockers resolved for the next 3.5 release. We're not there yet, but we're definitely getting close to releasing a first beta. Humble will follow up with an email about the documentation that is still missing for features introduced with 3.5. We will not hold back the beta if the documentation is still incomplete, but it is seen as a major blocker for the final 3.5.1 release.

The following list is based on the bugs that have been requested as blockers¹:

* 1089054 - gf-error-codes.h is missing from source tarball
  Depends on 1038391 for getting the changes reviewed and included in the master branch first:
  - http://review.gluster.org/7714
  - http://review.gluster.org/7786
  These have been reviewed and merged in the master branch. Backports have been posted for review:
  - http://review.gluster.org/7850
  - http://review.gluster.org/7851

* 1096425 - i/o error when one user tries to access RHS volume over NFS with 100+
  Patches for 3.5 posted for review:
  - http://review.gluster.org/7829
  - http://review.gluster.org/7830
  Review of the backport http://review.gluster.org/7830 is still pending.

* 1099878 - Need support for handle based Ops to fetch/modify extended attributes of a file
  Patch for 3.5 posted for review:
  - http://review.gluster.org/7825
  Got reviewed and merged!

New addition, confirmed yesterday:

* 1081016 - glusterd needs xfsprogs and e2fsprogs packages (don't leave zombies if required programs aren't installed)
  Needs review and merging in master: http://review.gluster.org/7361
  After approval for master, a backport for release-3.5 can be sent.

Thanks,
Niels
Re: [Gluster-devel] Gluster driver for Archipelago - Development process
On 05/22/2014 02:10 AM, Alex Pyrgiotis wrote:

On 02/17/2014 06:22 PM, Vijay Bellur wrote:

On 02/17/2014 05:11 PM, Alex Pyrgiotis wrote:

On 02/10/2014 07:06 PM, Vijay Bellur wrote:

On 02/05/2014 04:10 PM, Alex Pyrgiotis wrote:

Hi all,

Just wondering, do we have any news on that?

Hi Alex,

I have started some work on this. The progress has been rather slow owing to the 3.5 release cycle, amongst other things. I intend to propose this as a feature for 3.6 and will keep you posted once we have something more to get you going.

Hi Vijay,

That sounds good. I suppose that if it gets included in 3.6, we will see it on this page [1], right?

Hi Alex,

Yes, that is correct.

Thanks,
Vijay

Hi Vijay,

On the planning page for 3.6 [1], I see that Archipelago is included (great!) and that the feature freeze was due on the 21st of May. So, do we have any news on which features will get included in 3.6, as well as more info about the Archipelago integration?

Yes, the gfapi and related changes needed by Archipelago are planned for inclusion in 3.6. The feature freeze was moved by a month after a discussion in yesterday's GlusterFS community meeting. I will ping you back once we have something tangible to get started with integration testing.

Cheers,
Vijay
Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr
Here are the important locations in the XFS tree, coming from the 2.6.32 branch:

STATIC int
xfs_set_acl(struct inode *inode, int type, struct posix_acl *acl)
{
	struct xfs_inode *ip = XFS_I(inode);
	unsigned char *ea_name;
	int error;

	if (S_ISLNK(inode->i_mode))	/* I would generally think this is the issue. */
		return -EOPNOTSUPP;

STATIC long
xfs_vn_fallocate(
	struct inode	*inode,
	int		mode,
	loff_t		offset,
	loff_t		len)
{
	long		error;
	loff_t		new_size = 0;
	xfs_flock64_t	bf;
	xfs_inode_t	*ip = XFS_I(inode);
	int		cmd = XFS_IOC_RESVSP;
	int		attr_flags = XFS_ATTR_NOLOCK;

	if (mode & ~(FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE))
		return -EOPNOTSUPP;

STATIC int
xfs_ioc_setxflags(
	xfs_inode_t	*ip,
	struct file	*filp,
	void __user	*arg)
{
	struct fsxattr	fa;
	unsigned int	flags;
	unsigned int	mask;
	int		error;

	if (copy_from_user(&flags, arg, sizeof(flags)))
		return -EFAULT;

	if (flags & ~(FS_IMMUTABLE_FL | FS_APPEND_FL | \
		      FS_NOATIME_FL | FS_NODUMP_FL | \
		      FS_SYNC_FL))
		return -EOPNOTSUPP;

Perhaps some sort of system-level ACLs are being propagated by us over symlinks? Perhaps this is related to the same issue of following symlinks?
On Sun, May 18, 2014 at 10:48 AM, Pranith Kumar Karampuri pkara...@redhat.com wrote:

Sent the following patch to remove the special treatment of ENOTSUP here: http://review.gluster.org/7788

Pranith

----- Original Message -----
From: Kaleb KEITHLEY kkeit...@redhat.com
To: gluster-devel@gluster.org
Sent: Tuesday, May 13, 2014 8:01:53 PM
Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr

On 05/13/2014 08:00 AM, Nagaprasad Sathyanarayana wrote:

On 05/07/2014 03:44 PM, Pranith Kumar Karampuri wrote:

----- Original Message -----
From: Raghavendra Gowdappa rgowd...@redhat.com
To: Pranith Kumar Karampuri pkara...@redhat.com
Cc: Vijay Bellur vbel...@redhat.com, gluster-devel@gluster.org, Anand Avati aav...@redhat.com
Sent: Wednesday, May 7, 2014 3:42:16 PM
Subject: Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr

I think with the repetitive log message suppression patch being merged, we don't really need gf_log_occasionally (except if they are logged in DEBUG or TRACE levels).

That definitely helps. But still, setxattr calls are not supposed to fail with ENOTSUP on filesystems where we support gluster. If there are special keys which fail with ENOTSUPP, can we conditionally log setxattr failures only when the key is something new?

I know this is about EOPNOTSUPP (a.k.a. ENOTSUPP) returned by setxattr(2) for legitimate attrs. But I can't help but wonder if this isn't related to other bugs we've had with, e.g., lgetxattr(2) called on invalid xattrs? E.g. see https://bugzilla.redhat.com/show_bug.cgi?id=765202. We have a hack where xlators communicate with each other by getting (and setting?) invalid xattrs; the posix xlator has logic to filter out invalid xattrs, but due to bugs this hasn't always worked perfectly. It would be interesting to know which xattrs are getting errors and on which fs types.

FWIW, in a quick perusal of a fairly recent (3.14.3) kernel: in xfs there are only six places where EOPNOTSUPP is returned, none of them related to xattrs. In ext[34], EOPNOTSUPP can be returned if the user_xattr option is not enabled (it is enabled by default in ext4). And in the higher-level vfs xattr code there are many places where EOPNOTSUPP _might_ be returned, primarily only if subordinate function calls aren't invoked which would clear the default or return a different error.

--
Kaleb

--
Religious confuse piety with mere ritual, the virtuous confuse regulation with outcomes
Re: [Gluster-devel] regarding special treatment of ENOTSUP for setxattr
http://review.gluster.com/#/c/7823/ - the fix here

On Thu, May 22, 2014 at 1:41 PM, Harshavardhana har...@harshavardhana.net wrote:

[XFS code locations and quoted discussion snipped; quoted in full in the previous message]

--
Religious confuse piety with mere ritual, the virtuous confuse regulation with outcomes

___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] bug-857330/normal.t failure
----- Original Message -----
From: Kaushal M kshlms...@gmail.com
To: Justin Clift jus...@gluster.org, Gluster Devel gluster-devel@gluster.org
Sent: Thursday, May 22, 2014 6:04:29 PM
Subject: Re: [Gluster-devel] bug-857330/normal.t failure

Thanks Justin, I found the problem. The VM can be deleted now.

Turns out, there was more than enough time for the rebalance to complete. But we hit a race which caused a command to fail. [...] While glusterd was waiting for the reply, the rebalance completed and the process stopped itself. This closed the rpc connection between glusterd and the rebalance process, which caused all pending requests to be unwound as failures, which in turn led to the command failing.

Do you think we can print the status of the process as 'not-responding' when such a thing happens, instead of failing the command?

Pranith

[rest of quoted thread snipped; quoted in full earlier in the thread]
Re: [Gluster-devel] bug-857330/normal.t failure
----- Original Message -----

On 22/05/2014, at 1:34 PM, Kaushal M wrote:

Thanks Justin, I found the problem. The VM can be deleted now.

Done. :)

[Kaushal's explanation of the race snipped; quoted in full earlier in the thread]

I think it is the rebalance daemon's life cycle which is problematic. It makes it inconvenient, if not impossible, for glusterd to gather progress/status deterministically. The rebalance process could wait for the rebalance-commit subcommand to terminate. No other daemon managed by glusterd has this kind of life cycle. I don't see any good reason why rebalance should kill itself on completion of data migration. Thoughts?

~Krish

Hmmm, is it the kind of thing where the rebalance status command should retry if its connection gets closed by a just-completed rebalance (as happened here)? Or would that not work as well?

+ Justin

--
Open Source and Standards @ Red Hat
twitter.com/realjustinclift