[Gluster-devel] spurious regression failure in tests/bugs/quota/inode-quota.t
hi, http://build.gluster.org/job/rackspace-regression-2GB-triggered/8621/consoleFull failed regression. Could you please look into it? Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious test failure in tests/bugs/replicate/bug-1015990.t
- Original Message - On 05/07/2015 02:41 PM, Krishnan Parthasarathi wrote: Pranith, The above snippet says that the volume has to be stopped before being deleted. It also says that volume-stop failed. I would look into glusterd logs to see why volume-stop failed; cmd-history.log tells us only so much. http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull has the logs. I didn't find much information. Please feel free to take a look. What can we add to the code so that this failure can be debugged better in future? Please at least add that much for now? Atin is already looking into this. Without the root cause, it's not useful to speculate how we could help debugging this. As we root cause, I am sure we will find things that we could have logged to reduce time to root cause. Does that make sense? Pranith HTH, KP - Original Message - hi, Volume delete is failing without logging much about why it is failing. Know anything about this? (http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull)
1 [2015-05-06 13:09:58.311519] : volume heal patchy statistics heal-count : SUCCESS
0 [2015-05-06 13:09:58.534917] : volume stop patchy : FAILED :
1 [2015-05-06 13:09:58.904333] : volume delete patchy : FAILED : Volume patchy has been started.Volume needs to be stopped before deletion.
Pranith
Re: [Gluster-devel] Release Notes draft for 3.7.0
You're welcome! Without the link this mail thread would remain incomplete in the archives forever :-) - Original Message - On 05/07/2015 03:02 PM, Krishnan Parthasarathi wrote: Could you provide the link to the etherpad containing the release notes draft? Thanks, missed it here. The link is posted at [1]. Posted it in #gluster-dev earlier. One more reason for our core developers to be hanging out there ;-). -Vijay [1] https://public.pad.fsfe.org/p/gluster-3.7-release-notes
[Gluster-devel] failures in tests/basic/tier/tier.t
Dan/Joseph, Could you look into it please.
[22:04:31] ./tests/basic/tier/tier.t ..
not ok 25 Got 1 instead of 0
not ok 26 Got 1 instead of 0
Failed 2/34 subtests
[22:04:31]
Test Summary Report
---
./tests/basic/tier/tier.t (Wstat: 0 Tests: 34 Failed: 2) Failed tests: 25-26
Files=1, Tests=34, 72 wallclock secs ( 0.02 usr 0.00 sys + 1.68 cusr 0.81 csys = 2.51 CPU)
Result: FAIL
./tests/basic/tier/tier.t: bad status 1
./tests/basic/tier/tier.t: 1 new core files
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8588/consoleFull
Pranith
[Gluster-devel] failure in tests/bugs/glusterd/bug-974007.t
Nitya, It seems like rebalance is not completing in this test. Could you take a look? http://build.gluster.org/job/rackspace-regression-2GB-triggered/8595/consoleFull Pranith
Re: [Gluster-devel] spurious test failure in tests/bugs/replicate/bug-1015990.t
On 05/07/2015 02:41 PM, Krishnan Parthasarathi wrote: Pranith, The above snippet says that the volume has to be stopped before being deleted. It also says that volume-stop failed. I would look into glusterd logs to see why volume-stop failed; cmd-history.log tells us only so much. http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull has the logs. I didn't find much information. Please feel free to take a look. What can we add to the code so that this failure can be debugged better in future? Please at least add that much for now? Pranith HTH, KP - Original Message - hi, Volume delete is failing without logging much about why it is failing. Know anything about this? (http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull)
1 [2015-05-06 13:09:58.311519] : volume heal patchy statistics heal-count : SUCCESS
0 [2015-05-06 13:09:58.534917] : volume stop patchy : FAILED :
1 [2015-05-06 13:09:58.904333] : volume delete patchy : FAILED : Volume patchy has been started.Volume needs to be stopped before deletion.
Pranith
Re: [Gluster-devel] Release Notes draft for 3.7.0
Could you provide the link to the etherpad containing the release notes draft? - Original Message - Hi All, Humble, Raghavendra Bhat and I have worked on putting together a draft of the release notes for 3.7.0 at [1]. The release notes draft does need help with known issues and minor improvements. Can you please review the content and update the etherpad with known issues/minor improvements from your components? Thanks, Vijay
[Gluster-devel] spurious failure in tests/basic/quota-nfs.t
hi Du, Please help with this one?
* tests/basic/quota-nfs.t
* Happens in: master
* Being investigated by: ?
* Tried to re-create it for more than an hour and it is not failing.
* http://build.gluster.org/job/rackspace-regression-2GB-triggered/8625/consoleFull
Pranith
[Gluster-devel] failures in tests/bugs/distribute/bug-1161156.t
Du, This seems like a quota issue as well. Could you look into this one? http://build.gluster.org/job/rackspace-regression-2GB-triggered/8582/consoleFull Pranith
[Gluster-devel] spurious test failure in tests/basic/quota-anon-fd-nfs.t
hi, I compared the logs of a failed run and a successful run of the test based on the timestamps. It seems like it is not able to find the parent on which the quota contribution is supposed to be updated, as per the following logs:
[2015-05-04 04:02:13.537672] E [marker-quota.c:2870:mq_start_quota_txn_v2] 0-patchy-marker: contribution node list is empty (c22c0d82-2027-46b3-8bd6-278df1b39774)
[2015-05-04 04:02:14.904655] E [marker-quota.c:2870:mq_start_quota_txn_v2] 0-patchy-marker: contribution node list is empty (c22c0d82-2027-46b3-8bd6-278df1b39774)
[2015-05-04 04:02:16.228797] E [marker-quota.c:2870:mq_start_quota_txn_v2] 0-patchy-marker: contribution node list is empty (c22c0d82-2027-46b3-8bd6-278df1b39774)
Does that help? Seems like a bug in quota? This is the test that fails in the file:
TEST ! $(dirname $0)/quota $N0/$deep/new_file_2 1048576
Pranith
Re: [Gluster-devel] spurious test failure in tests/bugs/replicate/bug-1015990.t
On 05/07/2015 02:53 PM, Krishnan Parthasarathi wrote: - Original Message - On 05/07/2015 02:41 PM, Krishnan Parthasarathi wrote: Pranith, The above snippet says that the volume has to be stopped before being deleted. It also says that volume-stop failed. I would look into glusterd logs to see why volume-stop failed; cmd-history.log tells us only so much. http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull has the logs. I didn't find much information. Please feel free to take a look. What can we add to the code so that this failure can be debugged better in future? Please at least add that much for now? Atin is already looking into this. Without the root cause, it's not useful to speculate how we could help debugging this. As we root cause, I am sure we will find things that we could have logged to reduce time to root cause. Does that make sense? Cool. Could you please update the pad https://public.pad.fsfe.org/p/gluster-spurious-failures with the latest info on this issue. Pranith Pranith HTH, KP - Original Message - hi, Volume delete is failing without logging much about why it is failing. Know anything about this? (http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull)
1 [2015-05-06 13:09:58.311519] : volume heal patchy statistics heal-count : SUCCESS
0 [2015-05-06 13:09:58.534917] : volume stop patchy : FAILED :
1 [2015-05-06 13:09:58.904333] : volume delete patchy : FAILED : Volume patchy has been started.Volume needs to be stopped before deletion.
Pranith
Re: [Gluster-devel] regarding ./tests/bugs/replicate/bug-1015990.t
Sorry, wrong test. The correct test is: tests/bugs/quota/bug-1035576.t (http://build.gluster.org/job/rackspace-regression-2GB-triggered/8329/consoleFull) Pranith On 05/07/2015 01:53 PM, Pranith Kumar Karampuri wrote: Seems like the file $M0/a/f is not healed, based on the execution log. Ravi, Could you please help? build log: http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull Pranith
[Gluster-devel] Release Notes draft for 3.7.0
Hi All, Humble, Raghavendra Bhat and I have worked on putting together a draft of the release notes for 3.7.0 at [1]. The release notes draft does need help with known issues and minor improvements. Can you please review the content and update the etherpad with known issues/minor improvements from your components? Thanks, Vijay
[Gluster-devel] spurious test failure in tests/bugs/replicate/bug-1015990.t
hi, Volume delete is failing without logging much about why it is failing. Know anything about this? (http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull)
1 [2015-05-06 13:09:58.311519] : volume heal patchy statistics heal-count : SUCCESS
0 [2015-05-06 13:09:58.534917] : volume stop patchy : FAILED :
1 [2015-05-06 13:09:58.904333] : volume delete patchy : FAILED : Volume patchy has been started.Volume needs to be stopped before deletion.
Pranith
Re: [Gluster-devel] Release Notes draft for 3.7.0
On 05/07/2015 03:02 PM, Krishnan Parthasarathi wrote: Could you provide the link to the etherpad containing the release notes draft? Thanks, missed it here. The link is posted at [1]. Posted it in #gluster-dev earlier. One more reason for our core developers to be hanging out there ;-). -Vijay [1] https://public.pad.fsfe.org/p/gluster-3.7-release-notes
Re: [Gluster-devel] NetBSD regression status update
- Original Message - From: Emmanuel Dreyfus m...@netbsd.org To: gluster-devel@gluster.org Sent: Thursday, May 7, 2015 10:20:40 PM Subject: [Gluster-devel] NetBSD regression status update
Here is the NetBSD regression status update for remaining broken tests:
Summary: split-brain-resolution.t is on track, quota-anon-fd-nfs.t still needs some work. basic/ec/ needs to be tested again.
Details:
- tests/basic/afr/split-brain-resolution.t: Anuradha Talur is working on it, the change being still under review: http://review.gluster.org/10134 This change is now merged in master. Yet to be merged in release-3.7. A patch has been sent: http://review.gluster.org/#/c/10624 .
- tests/basic/cdc.t: fixed
- tests/basic/ec/: This worked, but with rare spurious failures. They are the same as on Linux and work has been done, hence I think I should probably enable it again, but it may have rotted a lot. I have to give it a try.
- tests/basic/quota-anon-fd-nfs.t: A problem with kernel cache was reported. ( cd $N0 ; umount $N0 ) works around it, but the test still fails.
- tests/basic/mgmt_v3-locks.t: fixed
- tests/basic/tier/tier.t: fixed, change awaits merge for release-3.7: http://review.gluster.org/10648
- tests/bugs: Mostly uncharted territory; we will not work on it for release-3.7.
- tests/geo-rep: I started investigating with Kotresh Hiremath Ravishankar but it seems we are stalled now.
- tests/features/trash.t: fixed
- tests/features/glupy.t: fixes await review/merge: http://review.gluster.org/10648 http://review.gluster.org/10616
-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
-- Thanks, Anuradha.
Re: [Gluster-devel] 4.0 ideas
On 05/07/2015 02:15 PM, Jeff Darcy wrote: * Centralized logging - The intention of the change/move from gf_log to gf_msg was to enable centralized logging mechanisms, among other things. In the discussions do consider needs and how this can fit into the same. Ref: http://www.gluster.org/community/documentation/index.php/Features/better-logging * Finer-grain version/feature negotiation between nodes. - Adding to this, one thought for DHT was to allow/disallow clients with older layouts, using something akin to a generation number rather than version/feature, and to allow clients to reconfigure themselves to the latest graph/conf. Just posting this here, so that it may trigger thoughts at the summit. Additions: - I would like to add a framework for fault injection. I know, I had bigger dreams on this in the past, but this time around something simpler: an extensible framework that we can add fault points to, and exercise in the regression tests, or other tests, triggering specific faults, or injecting specific waits. This can help test out a lot of the new (and older) code in various scenarios. For example, exercising FOPs between rebalance phase 1 and rebalance phase 2, which requires a _wait/sleep_ in this state to be injected in the rebalance daemon. Shyam
[Gluster-devel] 4.0 ideas
Last week, those of us who were together in Bangalore had a meeting to discuss the GlusterFS 4.0 plan. Once we'd covered what's already in the plan[1], we had a very productive brainstorming session on what else we might want to consider adding. Here are some of my notes, in no particular order, for discussion either here on the list or in person at the upcoming Barcelona summit.
* Traffic throttling: Many internal services need this to keep from crowding out new user requests.
* Centralized logging
* Third-party copy (server to server, at client request): AIUI both SMB and NFS can make such requests, which we currently must satisfy by bouncing data through the proxy node. We could add it to GFAPI as well, for users there who also want to avoid the extra network traffic.
* Better memory management (talloc, maybe even a real garbage collector)
* Virtual nodes (DHT feature to improve rebalance behavior)
* Hot-spare nodes/bricks
* Better failure detection: Detecting failures via pairwise heartbeat (what we do now) doesn't work at scale. This might become part of the GlusterD v2 plan.
* File level snapshots
* Finer-grain version/feature negotiation between nodes
* Better GFID-to-path translation
* Retire NFSv3
* Make snapshots more modular (not solely dependent on LVM)
* FTP or SFTP (sshfs) client using GFAPI: I've proposed this as a potential intern project.
[1] http://www.gluster.org/community/documentation/index.php/Planning40
Re: [Gluster-devel] NetBSD build broken
On 05/07/2015 11:49 PM, Emmanuel Dreyfus wrote: NetBSD build was broken here: http://review.gluster.org/10526/
CC dht-rebalance.lo
dht-rebalance.c:22:25: fatal error: sys/sysinfo.h: No such file or directory
#include <sys/sysinfo.h>
And the change even went into the release branch: http://review.gluster.org/10629 We have two problems here: 1) the Jenkins bugs forced us to take many NetBSD slave VMs offline, and now we are scarce on this resource and have a huge backlog of regression tests to run, so many changes get through without a single NetBSD regression run. We need to fix it, but how? Restoring the VM from an image does not fix the problem, as it is in the Jenkins config. 2) Despite the huge backlog, we should catch this kind of obvious problem using the smoke test. All builds failed: http://build.gluster.org/job/netbsd6-smoke/5991/ http://build.gluster.org/job/netbsd6-smoke/5968/ http://build.gluster.org/job/netbsd6-smoke/5921/ http://build.gluster.org/job/netbsd6-smoke/5829/ But that was never reported in gerrit. Why? Smoke tests are failing to report back after the gerrit upgrade. We have made some efforts to figure out why but have not found an easy/convincing answer. I suspect that the answer is hidden somewhere in this page [1]. -Vijay [1] https://wiki.jenkins-ci.org/display/JENKINS/Gerrit+Trigger
Re: [Gluster-devel] 4.0 ideas
On 05/07/2015 02:15 PM, Jeff Darcy wrote: * Retire NFSv3 I believe the right way to express this is: retire the Gluster NFS (gnfs) server. (Ganesha does NFSv3, and will continue to do NFSv3, as well as 4, 4.1, 4.2, and pNFS.) Regards -- Kaleb
Re: [Gluster-devel] NetBSD build broken
Emmanuel Dreyfus m...@netbsd.org wrote: NetBSD build was broken here: http://review.gluster.org/10526/
CC dht-rebalance.lo
dht-rebalance.c:22:25: fatal error: sys/sysinfo.h: No such file or directory
#include <sys/sysinfo.h>
Here are the fixes, please merge soon:
http://review.gluster.org/10652 (master)
http://review.gluster.org/10653 (release-3.7)
-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
Re: [Gluster-devel] 4.0 ideas
I believe the right way to express this is: retire the Gluster NFS (gnfs) server. (Ganesha does NFSv3, and will continue to do NFSv3, as well as 4, 4.1, 4.2, and pNFS.) Personally I'd like to go further and say that any features/omissions in NFSv3 (even Ganesha's) shouldn't be fixed at this point, but for the project I think you're correct.
Re: [Gluster-devel] 4.0 ideas
On 05/07/2015 11:15 AM, Jeff Darcy wrote: Last week, those of us who were together in Bangalore had a meeting to discuss the GlusterFS 4.0 plan. Once we'd covered what's already in the plan[1], we had a very productive brainstorming session on what else we might want to consider adding. Here are some of my notes, in no particular order, for discussion either here on the list or in person at the upcoming Barcelona summit. * Traffic throttling Many internal services need this to keep from crowding out new user requests. +1 * Centralized logging -1 There are already third-party applications that do that. Is it worth the effort to duplicate something that's done very well already? * Third-party copy (server to server, at client request) AIUI both SMB and NFS can make such requests, which we currently must satisfy by bouncing data through the proxy node. We could add it to GFAPI as well, for users there who also want to avoid the extra network traffic. +1!!1!111!!1!! * Better memory management (talloc, maybe even a real garbage collector) +1 * Virtual nodes (DHT feature to improve rebalance behavior) * Hot-spare nodes/bricks -1 From what I've seen, most users are against spending money on hardware that's not being used. An auto-rebalance, or some such mechanism, to ensure redundancy after a failure would be much more welcome. * Better failure detection Detecting failures via pairwise heartbeat (what we do now) doesn't work at scale. This might become part of the GlusterD v2 plan. * File level snapshots. * Finer-grain version/feature negotiation between nodes. Or at least one that doesn't require user intervention. Right now that mechanism sometimes fails and the user has to manually set the op-version. * Better GFID-to-path translation * Retire NFSv3 * Make snapshots more modular (not solely dependent on LVM) +1 * FTP or SFTP (sshfs) client using GFAPI I've proposed this as a potential intern project.
[1] http://www.gluster.org/community/documentation/index.php/Planning40
Re: [Gluster-devel] failures in tests/basic/tier/tier.t
Sure, looking into it. - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Dan Lambright dlamb...@redhat.com, Joseph Fernandes josfe...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Thursday, May 7, 2015 3:24:29 PM Subject: failures in tests/basic/tier/tier.t Dan/Joseph, Could you look into it please.
[22:04:31] ./tests/basic/tier/tier.t ..
not ok 25 Got 1 instead of 0
not ok 26 Got 1 instead of 0
Failed 2/34 subtests
[22:04:31]
Test Summary Report
---
./tests/basic/tier/tier.t (Wstat: 0 Tests: 34 Failed: 2) Failed tests: 25-26
Files=1, Tests=34, 72 wallclock secs ( 0.02 usr 0.00 sys + 1.68 cusr 0.81 csys = 2.51 CPU)
Result: FAIL
./tests/basic/tier/tier.t: bad status 1
./tests/basic/tier/tier.t: 1 new core files
http://build.gluster.org/job/rackspace-regression-2GB-triggered/8588/consoleFull
Pranith
[Gluster-devel] quota test failures
hi, It seems like the test failures in quota are happening because of feature bugs. Sachin/Du, Please feel free to update the status of the problems, what your recommendations are for the release, etc. Pranith
Re: [Gluster-devel] good job on fixing heavy hitters in spurious regressions
- Original Message - hi, I think we fixed quite a few heavy hitters in the past week and a reasonable number of regression runs are passing, which is a good sign. Most of the new heavy hitters in regression failures seem to be code problems in quota/afr/ec; not sure about tier.t (need to get more info about arbiter.t, read-subvol.t etc.). Do you guys have any ideas on keeping the regression failures under control? The deluge of regression failures is a direct consequence of last-minute merges during (extended) feature freeze. We did well to contain this. Great stuff! If we want to avoid this we should not accept (large) feature merges just before feature freeze. Here are some of the things that I can think of: 0) Maintainers should also maintain tests that are in their component. It is not possible for me as glusterd co-maintainer to 'maintain' tests that are added under tests/bugs/glusterd. Most of them don't test core glusterd functionality. They are almost always tied to a particular feature whose implementation had bugs in its glusterd code. I would expect the test authors (esp. the more recent ones) to chip in. Thoughts/Suggestions?
Re: [Gluster-devel] NetBSD regression status update
On 05/08/2015 09:58 AM, Emmanuel Dreyfus wrote: Pranith Kumar Karampuri pkara...@redhat.com wrote: I just sent a mail about the known issues we found in ec :-). We have a fix for one submitted by Xavi, but the other one will take a bit of time. These bugs were there in 3.6.0 as well, so they are not really regressions. Just that they are failing more often. I just submitted a NetBSD fix for ec-readdir.t There is also a problem in quota.t, with the aux mount that does not get unmounted at volume stop, failing the last test. I found nothing relevant in the logs; where in the code should a volume stop trigger the aux mount unmount? Please look at glusterd_stop_volume () in xlators/mgmt/glusterd/src/glusterd-volume-ops.c ~Atin
Re: [Gluster-devel] NetBSD regression status update
Emmanuel Dreyfus m...@netbsd.org wrote: - tests/basic/ec/ This worked, but with rare spurious failures. They are the same as on Linux and work has been done, hence I think I should probably enable it again, but it may have rotted a lot. I have to give it a try. It is rather grim. NetBSD ec tests went from rare spurious failures a few weeks ago to completely reproducible failure (see below). Anyone interested in looking at it? A lot of errors are preceded by Input/Output error messages, which suggests a common root.
Test Summary Report
---
./tests/basic/ec/ec-3-1.t (Wstat: 0 Tests: 217 Failed: 4) Failed tests: 133-134, 138-139
./tests/basic/ec/ec-4-1.t (Wstat: 0 Tests: 253 Failed: 6) Failed tests: 152-153, 157-158, 162-163
./tests/basic/ec/ec-5-1.t (Wstat: 0 Tests: 289 Failed: 8) Failed tests: 171-172, 176-177, 181-182, 186-187
./tests/basic/ec/ec-readdir.t (Wstat: 0 Tests: 9 Failed: 1) Failed test: 9
./tests/basic/ec/quota.t (Wstat: 0 Tests: 24 Failed: 1) Failed test: 24
./tests/basic/ec/self-heal.t (Wstat: 0 Tests: 257 Failed: 5) Failed tests: 184, 195, 206, 217, 228
Files=15, Tests=2711, 3306 wallclock secs
-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org
[Gluster-devel] good job on fixing heavy hitters in spurious regressions
hi, I think we fixed quite a few heavy hitters in the past week and a reasonable number of regression runs are passing, which is a good sign. Most of the new heavy hitters in regression failures seem to be code problems in quota/afr/ec; not sure about tier.t (need to get more info about arbiter.t, read-subvol.t etc.). Do you guys have any ideas on keeping the regression failures under control? Here are some of the things that I can think of:
0) Maintainers should also maintain tests that are in their component.
1) If you see a spurious failure that has not been seen before, please add it to https://public.pad.fsfe.org/p/gluster-spurious-failures and send a mail on gluster-devel with the relevant info. CC the component owner.
2) If the same test fails on different patches more than 'x' number of times, we should do something drastic. Let us decide on 'x' and what the drastic measure is.
3) Tests that fail with too little information should at least be fixed by adding more info to the test, or by improving logs in the code, so that when it happens next time we have more information. The other option is to enable DEBUG logs; I am not a big fan of this, because when users report problems we should also have just enough information to debug the problem, and users are not going to enable DEBUG logs.
Some good things I found this time around compared to the 3.6.0 release:
1) Failing the regression on the first failure is helping locate the failure logs really fast.
2) More people chipped in fixing tests that are not at all their responsibility, which is always great to see.
I think we should remove the "if it is a known bad test, treat it as success" code in some time and never add it again in future.
Pranith
Re: [Gluster-devel] spurious test failure in tests/bugs/replicate/bug-1015990.t
On 05/08/2015 10:02 AM, Atin Mukherjee wrote: On 05/07/2015 03:00 PM, Krishnan Parthasarathi wrote: Atin would be doing this, since he is looking into it. HTH, KP - Original Message - On 05/07/2015 02:53 PM, Krishnan Parthasarathi wrote: - Original Message - On 05/07/2015 02:41 PM, Krishnan Parthasarathi wrote: Pranith, The above snippet says that the volume has to be stopped before being deleted. It also says that volume-stop failed. I would look into glusterd logs to see why volume-stop failed; cmd-history.log tells us only so much. http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull has the logs. I didn't find much information. Please feel free to take a look. What can we add to the code so that this failure can be debugged better in future? Please at least add that much for now? Atin is already looking into this. Without the root cause, it's not useful to speculate how we could help debugging this. As we root cause, I am sure we will find things that we could have logged to reduce time to root cause. Does that make sense? Cool. Could you please update the pad https://public.pad.fsfe.org/p/gluster-spurious-failures with the latest info on this issue. glusterd did log the following failure when volume stop was executed:
[2015-05-06 13:09:58.534114] I [socket.c:3358:socket_submit_request] 0-management: not connected (priv-connected = 0)
[2015-05-06 13:09:58.534137] W [rpc-clnt.c:1566:rpc_clnt_submit] 0-management: failed to submit rpc-request (XID: 0x1 Program: brick operations, ProgVers: 2, Proc: 1) to rpc-transport (management)
This indicates the underlying transport connection was broken and glusterd failed to send the rpc request to the brick. For this case, glusterd didn't populate errstr, because of which volume stop was logged in cmd_history.log as a failure with a blank error message. I've sent patch [1] to populate errstr for this failure. Thanks Atin, please move this test to the resolved section in the pad if not done already.
Pranith [1] http://review.gluster.org/10659 ~Atin Pranith Pranith HTH, KP - Original Message - hi, Volume delete is failing without logging much about why it is failing. Know anything about this? (http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull)
1 [2015-05-06 13:09:58.311519] : volume heal patchy statistics heal-count : SUCCESS
0 [2015-05-06 13:09:58.534917] : volume stop patchy : FAILED :
1 [2015-05-06 13:09:58.904333] : volume delete patchy : FAILED : Volume patchy has been started.Volume needs to be stopped before deletion.
Pranith
[Gluster-devel] failure in tests/basic/afr/arbiter.t
hi Ravi, Could you look into http://build.gluster.org/job/rackspace-regression-2GB-triggered/8723/consoleFull Pranith
Re: [Gluster-devel] NetBSD regression status update
On 05/08/2015 07:45 AM, Emmanuel Dreyfus wrote: Emmanuel Dreyfus m...@netbsd.org wrote: - tests/basic/ec/ This worked, but with rare spurious failures. They are the same as on Linux and work has been done, hence I think I should probably enable it again, but it may have rotted a lot. I have to give it a try. It is rather grim. NetBSD ec tests went from rare spurious failures a few weeks ago to completely reproducible failure (see below). Anyone interested in looking at it? A lot of errors are preceded by Input/Output error messages, which suggests a common root. I just sent a mail about the known issues we found in ec :-). We have a fix for one submitted by Xavi, but the other one will take a bit of time. These bugs were there in 3.6.0 as well, so they are not really regressions. Just that they are failing more often. Pranith
Test Summary Report
---
./tests/basic/ec/ec-3-1.t (Wstat: 0 Tests: 217 Failed: 4) Failed tests: 133-134, 138-139
./tests/basic/ec/ec-4-1.t (Wstat: 0 Tests: 253 Failed: 6) Failed tests: 152-153, 157-158, 162-163
./tests/basic/ec/ec-5-1.t (Wstat: 0 Tests: 289 Failed: 8) Failed tests: 171-172, 176-177, 181-182, 186-187
./tests/basic/ec/ec-readdir.t (Wstat: 0 Tests: 9 Failed: 1) Failed test: 9
./tests/basic/ec/quota.t (Wstat: 0 Tests: 24 Failed: 1) Failed test: 24
./tests/basic/ec/self-heal.t (Wstat: 0 Tests: 257 Failed: 5) Failed tests: 184, 195, 206, 217, 228
Files=15, Tests=2711, 3306 wallclock secs
Re: [Gluster-devel] NetBSD regression status update
Pranith Kumar Karampuri pkara...@redhat.com wrote: I just sent a mail about the known issues we found in ec :-). We have a fix for one, submitted by Xavi, but the other one will take a bit of time. These bugs were there in 3.6.0 as well, so they are not really regressions; they are just failing more often. I just submitted a NetBSD fix for ec-readdir.t. There is also a problem in quota.t: the aux mount does not get unmounted at volume stop, failing the last test. I found nothing relevant in the logs; where in the code should a volume stop trigger the aux mount unmount? -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
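For anyone reproducing the quota.t failure by hand, a quick way to check whether the aux mount is still lingering after `volume stop` is to scan the mount table. A rough sketch follows; note that the `/var/run/gluster/<volname>` location is an assumption about where the aux mount lives, and on NetBSD the table would come from parsed `mount` output rather than /proc/mounts:

```shell
#!/bin/sh
# find_aux_mount VOLNAME: read a /proc/mounts-style table on stdin and
# print the mountpoint of the quota aux mount for VOLNAME, if present.
# NOTE: the /var/run/gluster/<volname> path is an assumption; adjust it
# if your build mounts the aux mount elsewhere.
find_aux_mount() {
    awk -v vol="$1" '{ aux = "/var/run/gluster/" vol; if ($2 == aux) print $2 }'
}

# Simulated mount table with a lingering aux mount for volume "patchy":
cat <<'EOF' | find_aux_mount patchy
localhost:patchy /mnt/glusterfs/0 fuse.glusterfs rw 0 0
localhost:patchy /var/run/gluster/patchy fuse.glusterfs rw 0 0
EOF
```

If this prints a path after `gluster volume stop`, the aux mount leaked; a manual `umount` of that path is a workaround until the real fix lands.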
[Gluster-devel] failure in tests/basic/afr/read-subvol-entry.t
Ravi, Please look into http://build.gluster.org/job/rackspace-regression-2GB-triggered/8735/consoleFull Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] failure in tests/bugs/snapshot/bug-1166197.t
hi, Could you look into http://build.gluster.org/job/rackspace-regression-2GB-triggered/8734/consoleFull Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious ec failures
On 05/08/2015 09:46 AM, Emmanuel Dreyfus wrote: Pranith Kumar Karampuri pkara...@redhat.com wrote: 1) Fops failing with EIO when locks are failing with an errno other than EAGAIN (mostly ESTALE at the moment). http://review.gluster.com/9407 should fix it. 2) Fops failing with EIO because of a race between lookup and the version update code, which leads to fewer than 'ec-fragments' bricks agreeing on the version of the file. We are still working on this issue. On NetBSD, I have EIO because of this; does it fall into the second case? [2015-05-08 03:15:41.046889] W [socket.c:642:__socket_rwv] 0-patchy-client-1: readv on 23.253.160.60:49153 failed (No message available) [2015-05-08 03:15:41.047012] I [client.c:2086:client_rpc_notify] 0-patchy-client-1: disconnected from patchy-client-1. Client process will keep trying to connect to glusterd until brick's port is available [2015-05-08 03:15:41.095988] W [ec-common.c:412:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (2) [2015-05-08 03:15:41.218894] W [ec-combine.c:811:ec_combine_check] 0-patchy-disperse-0: Mismatching xdata in answers of 'LOOKUP' Yes, versions are obtained in xdata. 
Pranith [2015-05-08 03:15:41.219466] W [fuse-resolve.c:67:fuse_resolve_entry_cbk] 0-fuse: ----0001/dir1: failed to resolve (Input/output error) [2015-05-08 03:15:41.219624] W [ec-common.c:412:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (2) [2015-05-08 03:15:41.223435] W [ec-common.c:412:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (2) [2015-05-08 03:15:41.227372] W [ec-common.c:412:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (2) [2015-05-08 03:15:41.232227] W [ec-combine.c:811:ec_combine_check] 0-patchy-disperse-0: Mismatching xdata in answers of 'LOOKUP' [2015-05-08 03:15:41.232770] W [fuse-bridge.c:484:fuse_entry_cbk] 0-glusterfs-fuse: 2123: LOOKUP() /dir1/small = -1 (Input/output error) ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] A HowTo for setting up network encryption with GlusterFS
I don't think the post, as is, is in the correct style for a wiki (I may be influenced by Wikipedia's guidelines), but it can be added after some changes. On Fri, May 8, 2015 at 2:57 AM, Justin Clift jus...@gluster.org wrote: On 7 May 2015, at 16:53, Kaushal M kshlms...@gmail.com wrote: Forgot the link. :D [1]: https://kshlm.in/network-encryption-in-glusterfs/ Any interest in copying that onto the main wiki? :) + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] spurious ec failures
We have come to a point where the spurious failures in ec are because of bugs in the code. There are two problems that need to be solved: 1) Fops failing with EIO when locks are failing with an errno other than EAGAIN (mostly ESTALE at the moment). http://review.gluster.com/9407 should fix it. 2) Fops failing with EIO because of a race between lookup and the version update code, which leads to fewer than 'ec-fragments' bricks agreeing on the version of the file. We are still working on this issue. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
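To make problem (2) concrete: ec can only combine the answers for a file when at least 'fragments' bricks report the same version, and the lookup/version-update race leaves fewer bricks in agreement. A toy sketch of that counting logic (this is just the check the description implies, not the real ec code path):

```shell
#!/bin/sh
# version_quorum FRAGMENTS VER1 VER2 ...: decide whether enough bricks
# agree on a file version for ec to combine their answers.  Toy model
# of the failure in (2), not the actual ec implementation.
version_quorum() {
    fragments=$1; shift
    # size of the largest group of bricks reporting the same version
    best=$(printf '%s\n' "$@" | sort | uniq -c | sort -rn | awk 'NR == 1 { print $1 }')
    if [ "$best" -ge "$fragments" ]; then
        echo OK
    else
        echo EIO    # fewer than 'fragments' bricks agree: lookup fails
    fi
}

version_quorum 2 5 5 5    # healthy 2+1 volume: all bricks agree -> OK
version_quorum 2 4 5 6    # race hit: every brick differs -> EIO
```

This also shows why the bug surfaces as "Mismatching xdata in answers of 'LOOKUP'": the versions being compared travel in the lookup's xdata.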
Re: [Gluster-devel] spurious test failure in tests/bugs/replicate/bug-1015990.t
On 05/07/2015 03:00 PM, Krishnan Parthasarathi wrote: Atin would be doing this, since he is looking into it. HTH, KP - Original Message - On 05/07/2015 02:53 PM, Krishnan Parthasarathi wrote: - Original Message - On 05/07/2015 02:41 PM, Krishnan Parthasarathi wrote: Pranith, The above snippet says that the volume has to be stopped before being deleted. It also says that volume-stop failed. I would look into the glusterd logs to see why volume-stop failed; cmd-history.log tells us only so much. http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull has the logs. I didn't find much information. Please feel free to take a look. What can we add to the code so that this failure can be debugged better in the future? Please at least add that much for now? Atin is already looking into this. Without the root cause, it's not useful to speculate how we could help debugging this. As we root-cause it, I am sure we will find things that we could have logged to reduce time to root cause. Does that make sense? Cool. Could you please update the pad: https://public.pad.fsfe.org/p/gluster-spurious-failures with the latest info on this issue. glusterd did log the following failure when volume stop was executed: [2015-05-06 13:09:58.534114] I [socket.c:3358:socket_submit_request] 0-management: not connected (priv->connected = 0) [2015-05-06 13:09:58.534137] W [rpc-clnt.c:1566:rpc_clnt_submit] 0-management: failed to submit rpc-request (XID: 0x1 Program: brick operations, ProgVers: 2, Proc: 1) to rpc-transport (management) This indicates the underlying transport connection was broken and glusterd failed to send the rpc request to the brick. For this case, glusterd didn't populate errstr, which is why volume stop was logged in cmd_history.log as a failure with a blank error message. I've sent patch [1] to populate errstr for this failure. 
[1] http://review.gluster.org/10659 ~Atin Pranith Pranith HTH, KP - Original Message - hi, Volume delete is failing without logging much about why it is failing. Know anything about this? (http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull) 1 [2015-05-06 13:09:58.311519] : volume heal patchy statistics heal-count : SUCCESS 0 [2015-05-06 13:09:58.534917] : volume stop patchy : FAILED : 1 [2015-05-06 13:09:58.904333] : volume delete patchy : FAILED : Volume patchy has been started. Volume needs to be stopped before deletion. Pranith -- ~Atin ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD build broken
My apologies Emmanuel. We should have caught that. Regards, Nithya - Original Message - From: Emmanuel Dreyfus m...@netbsd.org To: gluster-devel@gluster.org Sent: Friday, May 8, 2015 12:11:23 AM Subject: Re: [Gluster-devel] NetBSD build broken Emmanuel Dreyfus m...@netbsd.org wrote: NetBSD build was broken here: http://review.gluster.org/10526/ CC dht-rebalance.lo dht-rebalance.c:22:25: fatal error: sys/sysinfo.h: No such file or directory #include <sys/sysinfo.h> Here are the fixes, please merge soon: http://review.gluster.org/10652 (master) http://review.gluster.org/10653 (release-3.7) -- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
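The break here is an unconditional `#include <sys/sysinfo.h>`, a glibc-only header that NetBSD lacks; it is exactly the kind of thing an autoconf `AC_CHECK_HEADERS` guard catches at configure time. A quick manual probe of whether a header exists on the build host (assuming a `cc` on PATH; without a compiler it simply reports "no"):

```shell
#!/bin/sh
# have_header NAME: report whether the build host's compiler can find a
# header -- the same idea as an autoconf AC_CHECK_HEADERS probe.
have_header() {
    printf '#include <%s>\n' "$1" | ${CC:-cc} -E -x c - >/dev/null 2>&1 \
        && echo yes || echo no
}

have_header sys/sysinfo.h    # "yes" on glibc Linux, "no" on NetBSD
have_header stdio.h
```

In the tree itself, the fix is of course to guard the include behind the configure check and provide a portable fallback, rather than include it unconditionally.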
[Gluster-devel] Jupyter notebook support on GitHub
This could be really useful for us: https://github.com/blog/1995-github-jupyter-notebooks-3 GitHub now supports Jupyter notebooks directly. Similar to how Markdown (.md) files are displayed in their rendered format, Jupyter notebook (.ipynb) files are now too. Should make for better docs for us, as we can now do intro material and other technical concept bits with graphics instead of just ASCII art. :) + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] A HowTo for setting up network encryption with GlusterFS
Forgot the link. :D [1]: https://kshlm.in/network-encryption-in-glusterfs/ On Thu, May 7, 2015 at 8:51 PM, Kaushal M kshlms...@gmail.com wrote: I've written a how-to for setting up network encryption on GlusterFS at [1]. This was something that was requested, as setting up network encryption is not really easy. I've tried to cover all possible cases. Please read through, and let me know of any changes/improvements needed. Thanks, Kaushal ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Configuration Error during gerrit login
On 05/07/2015 04:35 AM, Aravinda wrote: I faced this issue today. In the browser address bar I changed 'login' to 'logout', then I logged in again. It worked. Thanks! This seems to be a handy workaround :). -Vijay -- regards Aravinda On 05/01/2015 12:31 AM, Vijay Bellur wrote: Ran into Configuration Error several times today. The error message states: The HTTP server did not provide the username in the GITHUB_USER header when it forwarded the request to Gerrit Code Review... Switching browsers was useful for me to overcome the problem. Annoying for sure, but we seem to have a workaround :). HTH, Vijay ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] A HowTo for setting up network encryption with GlusterFS
I've written a how-to for setting up network encryption on GlusterFS at [1]. This was something that was requested, as setting up network encryption is not really easy. I've tried to cover all possible cases. Please read through, and let me know of any changes/improvements needed. Thanks, Kaushal ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
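To make the thread a bit more self-contained, here is a minimal sketch of the TLS material the setup needs. Treat the linked how-to as authoritative: the file names (glusterfs.key / glusterfs.pem / glusterfs.ca under /etc/ssl) and the volume options shown in the comments are from my reading of the upstream docs, not verified against 3.7:

```shell
#!/bin/sh
# Generate the per-node TLS files GlusterFS looks for: a private key, a
# certificate (self-signed here, for testing), and a CA bundle that is
# the concatenation of every node's certificate.
make_gluster_certs() {
    dir=$1
    openssl genrsa -out "$dir/glusterfs.key" 2048 2>/dev/null
    openssl req -new -x509 -key "$dir/glusterfs.key" \
        -subj "/CN=gluster-node1" -days 365 -out "$dir/glusterfs.pem"
    cat "$dir/glusterfs.pem" > "$dir/glusterfs.ca"
}

# Demo in a temp dir; in real use the three files go to /etc/ssl on
# every server and client, with the ca bundle holding all nodes' certs.
demo=$(mktemp -d)
make_gluster_certs "$demo"
ls "$demo"

# With the files in place, TLS is switched on per volume:
#   gluster volume set myvol client.ssl on
#   gluster volume set myvol server.ssl on
# and for the management plane:
#   touch /var/lib/glusterd/secure-access
```

The CN in the certificate matters because `auth.ssl-allow` (covered in the post) filters clients by certificate common name.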
[Gluster-devel] NetBSD regression status update
Here is the NetBSD regression status update for the remaining broken tests:
Summary: split-brain-resolution.t is on track, quota-anon-fd-nfs.t still needs some work. basic/ec/ needs to be tested again.
Details:
- tests/basic/afr/split-brain-resolution.t: Anuradha Talur is working on it, the change being still under review: http://review.gluster.org/10134
- tests/basic/cdc.t: fixed
- tests/basic/ec/: This worked but with rare spurious failures. They are the same as on Linux and work has been done on them, hence I think I should probably enable it again, but it may have bit-rotted a lot. I have to give it a try.
- tests/basic/quota-anon-fd-nfs.t: A problem with the kernel cache was reported. ( cd $N0 ; umount $N0 ) works around it, but the test still fails.
- tests/basic/mgmt_v3-locks.t: fixed
- tests/basic/tier/tier.t: fixed, change awaits merge for release-3.7: http://review.gluster.org/10648
- tests/bugs: Mostly uncharted territory, we will not work on it for release-3.7
- tests/geo-rep: I started investigating with Kotresh Hiremath Ravishankar but it seems we are stalled now
- tests/features/trash.t: fixed
- tests/features/glupy.t: fixes await review/merge: http://review.gluster.org/10648 http://review.gluster.org/10616
-- Emmanuel Dreyfus http://hcpnet.free.fr/pubz m...@netbsd.org ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failure in tests/basic/quota-nfs.t
On 05/07/2015 03:16 PM, Pranith Kumar Karampuri wrote: hi Du, Please help with this one?
* tests/basic/quota-nfs.t
* Happens in: master
* Being investigated by: ?
* Tried to re-create it for more than an hour and it is not failing.
* http://build.gluster.org/job/rackspace-regression-2GB-triggered/8625/consoleFull
Failed again: http://build.gluster.org/job/rackspace-regression-2GB-triggered/8722/consoleFull -Vijay ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel