[Gluster-devel] failures in tests/basic/tier/tier.t
Dan/Joseph, could you look into it please?

[22:04:31] ./tests/basic/tier/tier.t ..
not ok 25 Got 1 instead of 0
not ok 26 Got 1 instead of 0
Failed 2/34 subtests
[22:04:31]

Test Summary Report
-------------------
./tests/basic/tier/tier.t (Wstat: 0 Tests: 34 Failed: 2)
  Failed tests: 25-26
Files=1, Tests=34, 72 wallclock secs ( 0.02 usr 0.00 sys + 1.68 cusr 0.81 csys = 2.51 CPU)
Result: FAIL
./tests/basic/tier/tier.t: bad status 1
./tests/basic/tier/tier.t: 1 new core files

http://build.gluster.org/job/rackspace-regression-2GB-triggered/8588/consoleFull

Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] failure in tests/bugs/glusterd/bug-974007.t
Nitya, it seems like rebalance is not completing in this test. Could you take a look? http://build.gluster.org/job/rackspace-regression-2GB-triggered/8595/consoleFull Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious test failure in tests/bugs/replicate/bug-1015990.t
On 05/07/2015 02:41 PM, Krishnan Parthasarathi wrote: Pranith, The above snippet says that the volume has to be stopped before deleted. It also says that volume-stop failed. I would look into glusterd logs to see why volume-stop failed, cmd-history.log tells us only so much. http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull has the logs. I didn't find much information. Please feel free to take a look. What can we add to the code so that this failure can be debugged better in future? Please at least add that much for now? Pranith HTH, KP - Original Message - hi, Volume delete is failing without logging much about why it is failing. Know anything about this? (http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull) 1 [2015-05-06 13:09:58.311519] : volume heal patchy statistics heal-count : SUCCESS 0 [2015-05-06 13:09:58.534917] : volume stop patchy : FAILED : 1 [2015-05-06 13:09:58.904333] : volume delete patchy : FAILED : Volume patchy has been started.Volume needs to be stopped before deletion. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] spurious failure in tests/basic/quota-nfs.t
hi Du, Please help with this one?
* tests/basic/quota-nfs.t
* Happens in: master
* Being investigated by: ?
* Tried to re-create it for more than an hour and it is not failing.
* http://build.gluster.org/job/rackspace-regression-2GB-triggered/8625/consoleFull
Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] failures in tests/bugs/distribute/bug-1161156.t
Du, This seems like a quota issue as well. Could you look into this one? http://build.gluster.org/job/rackspace-regression-2GB-triggered/8582/consoleFull Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] spurious test failure in tests/basic/quota-anon-fd-nfs.t
hi, I compared the logs of a failed run and a successful run of the test based on the timestamps. It seems it is not able to find the parent on which the quota contribution is supposed to be updated, as per the following logs:

[2015-05-04 04:02:13.537672] E [marker-quota.c:2870:mq_start_quota_txn_v2] 0-patchy-marker: contribution node list is empty (c22c0d82-2027-46b3-8bd6-278df1b39774)
[2015-05-04 04:02:14.904655] E [marker-quota.c:2870:mq_start_quota_txn_v2] 0-patchy-marker: contribution node list is empty (c22c0d82-2027-46b3-8bd6-278df1b39774)
[2015-05-04 04:02:16.228797] E [marker-quota.c:2870:mq_start_quota_txn_v2] 0-patchy-marker: contribution node list is empty (c22c0d82-2027-46b3-8bd6-278df1b39774)

Does that help? Seems like a bug in quota? This is the test that fails in the file:

TEST ! $(dirname $0)/quota $N0/$deep/new_file_2 1048576

Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
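For whoever picks this up: a minimal debugging sketch, assuming access to a brick backend, for checking whether the contribution xattr toward the parent is really missing when the error above is hit. The brick path, the $deep expansion and the exact quota xattr names are assumptions (they vary by version), so treat this purely as an illustration, not as part of the test:

#!/bin/bash
# Dump quota/marker xattrs for the file and its parent directory on the brick
# backend. BRICK and DEEP below are placeholders, not values from the test.
BRICK=/d/backends/patchy1
DEEP=deep/dir/path            # whatever $deep expands to in the test
FILE="$BRICK/$DEEP/new_file_2"

for p in "$FILE" "$(dirname "$FILE")"; do
    echo "== $p =="
    # -m . matches all xattr names, -e hex prints values without mangling
    getfattr -d -m . -e hex "$p" 2>/dev/null | grep -i quota
done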
Re: [Gluster-devel] spurious test failure in tests/bugs/replicate/bug-1015990.t
On 05/07/2015 02:53 PM, Krishnan Parthasarathi wrote: - Original Message - On 05/07/2015 02:41 PM, Krishnan Parthasarathi wrote: Pranith, The above snippet says that the volume has to be stopped before deleted. It also says that volume-stop failed. I would look into glusterd logs to see why volume-stop failed, cmd-history.log tells us only so much. http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull has the logs. I didn't find much information. Please feel free to take a look. What can we add to the code so that this failure can be debugged better in future? Please at least add that much for now? Atin is already looking into this. Without the root cause, it's not useful to speculate how we could help debugging this. As we root cause, I am sure we will find things that we could have logged to reduce time to root cause. Does that make sense? Cool. Could you please update the pad: https://public.pad.fsfe.org/p/gluster-spurious-failures with latest info on this issue. Pranith Pranith HTH, KP - Original Message - hi, Volume delete is failing without logging much about why it is failing. Know anything about this? (http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull) 1 [2015-05-06 13:09:58.311519] : volume heal patchy statistics heal-count : SUCCESS 0 [2015-05-06 13:09:58.534917] : volume stop patchy : FAILED : 1 [2015-05-06 13:09:58.904333] : volume delete patchy : FAILED : Volume patchy has been started.Volume needs to be stopped before deletion. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] regarding ./tests/bugs/replicate/bug-1015990.t
Sorry, wrong test. The correct test is: tests/bugs/quota/bug-1035576.t (http://build.gluster.org/job/rackspace-regression-2GB-triggered/8329/consoleFull) Pranith On 05/07/2015 01:53 PM, Pranith Kumar Karampuri wrote: Seems like the file $M0/a/f is not healed, based on the execution log. Ravi, could you please help? build log: http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] spurious test failure in tests/bugs/replicate/bug-1015990.t
hi, Volume delete is failing without logging much about why it is failing. Know anything about this? (http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull) 1 [2015-05-06 13:09:58.311519] : volume heal patchy statistics heal-count : SUCCESS 0 [2015-05-06 13:09:58.534917] : volume stop patchy : FAILED : 1 [2015-05-06 13:09:58.904333] : volume delete patchy : FAILED : Volume patchy has been started.Volume needs to be stopped before deletion. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] quota test failures
hi, It seems like the test failures in quota are happening because of feature bugs. Sachin/Du, Please feel free to update the status of the problems, what your recommendations are for the release, etc. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] good job on fixing heavy hitters in spurious regressions
hi, I think we fixed quite a few heavy hitters in the past week and a reasonable number of regression runs are passing, which is a good sign. Most of the new heavy hitters in regression failures seem to be code problems in quota/afr/ec; not sure about tier.t (need to get more info about arbiter.t, read-subvol.t etc.). Do you guys have any ideas for keeping the regression failures under control? Here are some of the things that I can think of:
0) Maintainers should also maintain the tests that are in their component.
1) If you see a spurious failure that has not been seen before, please add it to https://public.pad.fsfe.org/p/gluster-spurious-failures and send a mail on gluster-devel with the relevant info. CC the component owner.
2) If the same test fails on different patches more than 'x' number of times we should do something drastic. Let us decide on 'x' and what the drastic measure is (a rough way to count such repeat offenders is sketched after this mail).
3) Tests that fail with too little information should at least be fixed by adding more info to the test or by improving logs in the code, so that when it happens next time we have more information. The other option is to enable DEBUG logs; I am not a big fan of this because when users report problems we should also have just enough information to debug the problem, and users are not going to enable DEBUG logs.
Some good things I found this time around compared to the 3.6.0 release:
1) Failing the regression on the first failure helps in locating the failure logs really fast.
2) More people chipped in to fix tests that are not at all their responsibility, which is always great to see.
I think we should remove the 'if it is a known bad test, treat it as success' code at some point and never add it back in future.
Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
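Point 2 above could be prototyped with something as simple as the sketch below: count how often each .t file shows up with a bad status across saved regression console logs and flag anything that crosses the threshold. The log directory layout is an assumption; the "bad status" marker is the one visible in the console output quoted elsewhere in this archive.

#!/bin/bash
# Count per-test failures across saved regression console logs and print the
# tests that failed at least X times. LOGDIR layout is assumed, not existing
# Jenkins infrastructure.
LOGDIR=${1:-./regression-logs}
X=${2:-3}

grep -h "bad status" "$LOGDIR"/*.log \
    | awk '{print $1}' | sed 's/:$//' \
    | sort | uniq -c | sort -rn \
    | awk -v x="$X" '$1 >= x {printf "%s failed %d times\n", $2, $1}'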
Re: [Gluster-devel] spurious test failure in tests/bugs/replicate/bug-1015990.t
On 05/08/2015 10:02 AM, Atin Mukherjee wrote: On 05/07/2015 03:00 PM, Krishnan Parthasarathi wrote: Atin would be doing this, since he is looking into it. HTH, KP - Original Message - On 05/07/2015 02:53 PM, Krishnan Parthasarathi wrote: - Original Message - On 05/07/2015 02:41 PM, Krishnan Parthasarathi wrote: Pranith, The above snippet says that the volume has to be stopped before deleted. It also says that volume-stop failed. I would look into glusterd logs to see why volume-stop failed, cmd-history.log tells us only so much. http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull has the logs. I didn't find much information. Please feel free to take a look. What can we add to the code so that this failure can be debugged better in future? Please at least add that much for now? Atin is already looking into this. Without the root cause, it's not useful to speculate how we could help debugging this. As we root cause, I am sure we will find things that we could have logged to reduce time to root cause. Does that make sense? Cool. Could you please update the pad: https://public.pad.fsfe.org/p/gluster-spurious-failures with latest info on this issue. glusterd did log the following failure when volume stop was executed: [2015-05-06 13:09:58.534114] I [socket.c:3358:socket_submit_request] 0-management: not connected (priv-connected = 0) [2015-05-06 13:09:58.534137] W [rpc-clnt.c:1566:rpc_clnt_submit] 0-management: failed to submit rpc-request (XID: 0x1 Program: brick operations, ProgVers: 2, Proc: 1) to rpc-transport (management) This indicates the underlying transport connection was broken and glusterd failed to send the rpc request to the brick. For this case, glusterd didn't populate errstr because of which in cmd_history.log volume stop was logged with a failure and a blank error message. I've sent patch [1] to populate errstr for this failure. Thanks Atin, please move this test to resolved section in the pad if not already. Pranith [1] http://review.gluster.org/10659 ~Atin Pranith Pranith HTH, KP - Original Message - hi, Volume delete is failing without logging much about why it is failing. Know anything about this? (http://build.gluster.org/job/rackspace-regression-2GB-triggered/8522/consoleFull) 1 [2015-05-06 13:09:58.311519] : volume heal patchy statistics heal-count : SUCCESS 0 [2015-05-06 13:09:58.534917] : volume stop patchy : FAILED : 1 [2015-05-06 13:09:58.904333] : volume delete patchy : FAILED : Volume patchy has been started.Volume needs to be stopped before deletion. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
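As a general aid for failures like this, where cmd_history.log only shows FAILED with a blank error string, something like the sketch below can pull out what glusterd itself logged in the same second. The log location and message format are the defaults matching what is quoted above, and are assumptions about the regression slaves:

#!/bin/bash
# Correlate a blank-errstr FAILED entry in cmd_history.log with glusterd's own
# warning/error messages from the same second. Paths and formats are assumed.
LOGDIR=/var/log/glusterfs
TS=$(grep "volume stop patchy : FAILED" "$LOGDIR/cmd_history.log" \
        | tail -1 | awk '{print $1, $2}' | tr -d '[]' | cut -d. -f1)
echo "volume stop failed at: $TS"
grep -h "$TS" "$LOGDIR"/*glusterd*.log | grep -E "\] [EW] \["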
[Gluster-devel] failure in tests/basic/afr/arbiter.t
hi Ravi, Could you look into http://build.gluster.org/job/rackspace-regression-2GB-triggered/8723/consoleFull Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD regression status update
On 05/08/2015 07:45 AM, Emmanuel Dreyfus wrote: Emmanuel Dreyfus m...@netbsd.org wrote: - tests/basic/ec/ This worked but with rare spurious failures. They are the same as on Linux and work has been done, hence I think I should probably enable it again, but it may have rotted a lot. I have to give it a try. It is rather grim. NetBSD ec tests went from rare spurious failures a few weeks ago to completely reproducible failure (see below). Anyone interested in looking at it? A lot of errors are preceded by Input/Output error messages that suggest a common root. I just sent a mail about the known issues we found in ec :-). We have a fix for one, submitted by Xavi, but the other one will take a bit of time. These bugs were there in 3.6.0 as well, so they are not really regressions. Just that they are failing more often. Pranith

Test Summary Report
-------------------
./tests/basic/ec/ec-3-1.t (Wstat: 0 Tests: 217 Failed: 4) Failed tests: 133-134, 138-139
./tests/basic/ec/ec-4-1.t (Wstat: 0 Tests: 253 Failed: 6) Failed tests: 152-153, 157-158, 162-163
./tests/basic/ec/ec-5-1.t (Wstat: 0 Tests: 289 Failed: 8) Failed tests: 171-172, 176-177, 181-182, 186-187
./tests/basic/ec/ec-readdir.t (Wstat: 0 Tests: 9 Failed: 1) Failed test: 9
./tests/basic/ec/quota.t (Wstat: 0 Tests: 24 Failed: 1) Failed test: 24
./tests/basic/ec/self-heal.t (Wstat: 0 Tests: 257 Failed: 5) Failed tests: 184, 195, 206, 217, 228
Files=15, Tests=2711, 3306 wallclock secs

___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] failure in tests/basic/afr/read-subvol-entry.t
Ravi, Please look into http://build.gluster.org/job/rackspace-regression-2GB-triggered/8735/consoleFull Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] failure in tests/bugs/snapshot/bug-1166197.t
hi, Could you look into http://build.gluster.org/job/rackspace-regression-2GB-triggered/8734/consoleFull Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious ec failures
On 05/08/2015 09:46 AM, Emmanuel Dreyfus wrote: Pranith Kumar Karampuri pkara...@redhat.com wrote: 1) Fops failing with EIO when locks are failing with errno other than EAGAIN (mostly ESTALE at the moment). http://review.gluster.com/9407 should fix it. 2) Fop failing with EIO because of race with lookup and version update code which leads to less than 'ec-fragments' number of bricks agreeing on the version of the file. We are still working on this issue On NetBSD, I have EIO because of this, does it falls into the second case? [2015-05-08 03:15:41.046889] W [socket.c:642:__socket_rwv] 0-patchy-client-1: readv on 23.253.160.60:49153 failed (No message available) [2015-05-08 03:15:41.047012] I [client.c:2086:client_rpc_notify] 0-patchy-client-1: disconnected from patchy-client-1. Client process will keep trying to connect to glusterd until brick's port is available [2015-05-08 03:15:41.095988] W [ec-common.c:412:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (2) [2015-05-08 03:15:41.218894] W [ec-combine.c:811:ec_combine_check] 0-patchy-disperse-0: Mismatching xdata in answers of 'LOOKUP' Yes versions are obtained in xdata. Pranith [2015-05-08 03:15:41.219466] W [fuse-resolve.c:67:fuse_resolve_entry_cbk] 0-fuse: ----0001/dir1: failed to resolve (Input/output error) [2015-05-08 03:15:41.219624] W [ec-common.c:412:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (2) [2015-05-08 03:15:41.223435] W [ec-common.c:412:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (2) [2015-05-08 03:15:41.227372] W [ec-common.c:412:ec_child_select] 0-patchy-disperse-0: Executing operation with some subvolumes unavailable (2) [2015-05-08 03:15:41.232227] W [ec-combine.c:811:ec_combine_check] 0-patchy-disperse-0: Mismatching xdata in answers of 'LOOKUP' [2015-05-08 03:15:41.232770] W [fuse-bridge.c:484:fuse_entry_cbk] 0-glusterfs-fuse: 2123: LOOKUP() /dir1/small = -1 (Input/output error) ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] spurious ec failures
We have come to a point where the spurious failures in ec are because of bugs in code. There are two problems that need to be solved: 1) Fops failing with EIO when locks are failing with errno other than EAGAIN (mostly ESTALE at the moment). http://review.gluster.com/9407 should fix it. 2) Fop failing with EIO because of race with lookup and version update code which leads to less than 'ec-fragments' number of bricks agreeing on the version of the file. We are still working on this issue Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] New regression failure
On 05/08/2015 03:47 PM, Atin Mukherjee wrote: http://build.gluster.org/job/rackspace-regression-2GB-triggered/8782/consoleFull Failed test case: tests/bugs/replicate/bug-976800.t I've added it in the etherpad as well. Thanks Atin! I see that the test doesn't disable flush-behind, which can lead to delayed closing of the file. For now I am adding that (see the snippet below) to see if it still shows up in future. http://review.gluster.org/10666 Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
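For reference, the change being referred to essentially boils down to a one-line volume set in the test's setup; the option name is the standard one, but its exact placement in bug-976800.t is an assumption here:

# Hypothetical excerpt of the fix: turn off flush-behind so that close() is
# propagated to the brick before the test asserts on lock/open state, instead
# of being delayed by write-behind.
TEST $CLI volume set $V0 performance.flush-behind off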
Re: [Gluster-devel] Regression failure release-3.7 for tests/basic/afr/entry-self-heal.t
On 05/08/2015 10:53 PM, Justin Clift wrote: Seems like a new one, so it's been added to the Etherpad. http://build.gluster.org/job/regression-test-burn-in/23/console This looks a lot like the data-self-heal.t case, where healing fails to happen because both threads end up not getting enough locks to perform heal in the self-heal domain. Taking blocking locks seems like an easy solution, but that will decrease self-heal throughput, so Ravi and I are still thinking about the best way to solve this problem. It will take some time. I can add this and data-self-heal.t to badtests for now, if that helps. Pranith It's on a new slave VM (slave1), which has been disconnected in Jenkins so it can be investigated. It's using our standard Jenkins auth. + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Proposal for improving throughput for regression test
On 05/08/2015 08:54 PM, Justin Clift wrote: On 8 May 2015, at 10:02, Mohammed Rafi K C rkavu...@redhat.com wrote: Hi All, As we all know, our regression tests are killing us. On average, one regression run takes approximately two and a half hours to complete. So I guess this is the right time to think about enhancing our regression.
Proposal 1: Create a new option for the daemons to specify that they are running in test mode, so that we can skip the fsync calls used for data durability.
Proposal 2: Use IP addresses instead of host names, because resolving host names takes a good amount of time and sometimes even causes spurious failures.
Proposal 3: Each component has a lot of .t files and there is redundancy in the tests. We can rework them to reduce the number of .t files, keeping a smaller set of tests that covers unit testing for a component, and run the full regression once a day (nightly).
Please provide your inputs on the proposed ideas, and feel free to add a new idea.
Proposal 4: Break the regression tests into parts that can be run in parallel. So, instead of the regression testing for a particular CR going from the first test to the last in a serial sequence, we break it up into a number of chunks (dir based?) and make each of these a task. That won't reduce the overall number of tests, but it should get the time down for the result to be finished. Caveat: We're going to need more VMs, as once we get into things queueing up it's not going to help. :/
Raghavendra Talur (CCed) did some work on this earlier, using multiple Docker instances on a single VM to get the running time under an hour. Pranith
+ Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
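To make Proposal 4 a little more concrete, here is a rough sketch (not the project's actual run-tests.sh) of running directory-based chunks in parallel workers. It assumes every worker gets an isolated environment (its own VM or container, as in the Docker experiment mentioned above), since the tests use fixed ports and mount points and cannot safely run side by side on one host:

#!/bin/bash
# Run directory-based chunks of the test suite in parallel workers and report
# any chunk whose log does not contain a PASS result. The chunk split below is
# only an example.
chunks=(tests/basic tests/bugs tests/features tests/performance)
for d in "${chunks[@]}"; do
    ( prove -rf "$d" > "regression-$(basename "$d").log" 2>&1 ) &
done
wait
grep -L "Result: PASS" regression-*.log    # chunks that did not pass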
Re: [Gluster-devel] ec spurious regression failures
On 05/05/2015 01:35 PM, Vijay Bellur wrote: On 05/05/2015 11:40 AM, Emmanuel Dreyfus wrote: Emmanuel Dreyfus m...@netbsd.org wrote: I sent http://review.gluster.org/10540 to address it completely. Not sure if it works on netBSD. Emmanuel help!! I launched test runs in a loop on nbslave70. More later. Failed on first pass: Test Summary Report --- ./tests/basic/ec/ec-3-1.t(Wstat: 0 Tests: 217 Failed: 4) Failed tests: 133-134, 138-139 ./tests/basic/ec/ec-4-1.t(Wstat: 0 Tests: 253 Failed: 6) Failed tests: 152-153, 157-158, 162-163 ./tests/basic/ec/ec-5-1.t(Wstat: 0 Tests: 289 Failed: 8) Failed tests: 171-172, 176-177, 181-182, 186-187 ./tests/basic/ec/ec-readdir.t(Wstat: 0 Tests: 9 Failed: 1) Failed test: 9 ./tests/basic/ec/quota.t (Wstat: 0 Tests: 24 Failed: 1) Failed test: 24 In addition ec-12-4.t has started failing again [1]. Have added a note about this to the etherpad. Already updated the status about this in the earlier mail. http://review.gluster.org/10539 is the fix. -Vijay [1] http://build.gluster.org/job/rackspace-regression-2GB-triggered/8312/consoleFull ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] ec spurious regression failures
On 05/05/2015 01:54 PM, Emmanuel Dreyfus wrote: On Tue, May 05, 2015 at 01:45:03PM +0530, Pranith Kumar Karampuri wrote: Already updated the status about this in the earlier mail. http://review.gluster.org/10539 is the fix. That one only touches bug-1202244-support-inode-quota.t ... RCA: http://www.gluster.org/pipermail/gluster-devel/2015-May/044799.html Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] good job on fixing heavy hitters in spurious regressions
On 05/08/2015 09:14 AM, Krishnan Parthasarathi wrote: - Original Message - hi, I think we fixed quite a few heavy hitters in the past week and reasonable number of regression runs are passing which is a good sign. Most of the new heavy hitters in regression failures seem to be code problems in quota/afr/ec, not sure about tier.t (Need to get more info about arbiter.t, read-subvol.t etc). Do you guys have any ideas in keeping the regression failures under control? The deluge of regression failures is a direct consequence of last minute merges during (extended) feature freeze. We did well to contain this. Great stuff! If we want to avoid this we should not accept (large) feature merges just before feature freeze. Hmm... I am not sure, most of the fixes in the last week I saw were bugs in tests or .rc files. The failures in afr and ec were problems that existed even in 3.6. They are showing up more now probably because 3.7 is a bit more parallel. Pranith Here are some of the things that I can think of: 0) Maintainers should also maintain tests that are in their component. It is not possible for me as glusterd co-maintainer to 'maintain' tests that are added under tests/bugs/glusterd. Most of them don't test core glusterd functionality. They are almost always tied to a particular feature whose implementation had bugs in its glusterd code. I would expect the test authors (esp. the more recent ones) to chip in. Thoughts/Suggestions? How about moving these tests to the respective component and not accepting tests in other components to be under tests/bugs/glusterd in future? Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] good job on fixing heavy hitters in spurious regressions
On 05/08/2015 04:45 PM, Ravishankar N wrote: On 05/08/2015 08:45 AM, Pranith Kumar Karampuri wrote: Do you guys have any ideas in keeping the regression failures under control? I sent a patch to append the commands being run in the .t files to gluster logs @ http://review.gluster.org/#/c/10667/ While it certainly doesn't help check regression failures, I think it makes log analysis a bit easier. Comments welcome. :-) Neat :-). Do you think we can also add .t before the test?

From: [2015-05-08 11:02:43.062108594]:++ TEST: 47 abc cat /mnt/glusterfs/0/b ++
To:   [2015-05-08 11:02:43.062108594]:++ test-name.t:TEST: 47 abc cat /mnt/glusterfs/0/b ++

Pranith -Ravi ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
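One way the harness could do that (purely a sketch, not the actual patch under review at 10667): derive the test name from $0 inside the logging helper and put it in the prefix. The function name and log locations below are made up for illustration:

# Hypothetical logging helper in the test framework: prefix each TEST
# statement with the .t file name (taken from $0) before appending it to
# every glusterfs log file.
log_test_to_gluster_logs ()
{
    local tname msg
    tname=$(basename "$0")          # e.g. bug-976800.t
    msg="[$(date +'%Y-%m-%d %H:%M:%S.%N')]:++++++++++ ${tname}:TEST: $* ++++++++++"
    for l in /var/log/glusterfs/*.log; do
        echo "$msg" >> "$l"
    done
}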
Re: [Gluster-devel] good job on fixing heavy hitters in spurious regressions
On 05/09/2015 12:33 AM, Jeff Darcy wrote: I submit a patch for a new component, or one changing the log level of a message that does not have a single caller after you moved it from INFO to DEBUG. So the code is not going to be executed at all. Yet the regressions will fail. I am 100% sure it has nothing to do with my patch. I have neither the time nor the expertise to debug a test that I have no clue about, so the least I can do is inform people who may do something about it, i.e. the owner of the test or the maintainer of the module. You feel let's ask the owner of the test what the problem is; the owner of the test has moved on to a different component and is busy with their own work. So you are left with going to the maintainer, who tells you so-and-so is the problem and so-and-so is the reason as soon as you show the test number, and you end up feeling why didn't I ask him/her first. What you describe sounds more like a problem than a solution. The component maintainers shouldn't be the only ones who have this information. I think this is already solved by having the public pad. Both patch submitters and test owners should be able to find it on a public test-status page. Yes, they are already referring to the pad. The test owner should be *very* well aware of the problem, because it should be at or near the top of their priority list. What is so special about 'test' code? It is still code; if maintainers are maintaining feature code and are held responsible for it, why not test code? It is not that the maintainer is the only one who fixes all the problems in the code they maintain, but they are still responsible for maintaining the quality of the code. Why shouldn't they do the same for the quality of the tests that cover the component they maintain? By putting the onus on the test owner, we achieve two positive things: we lessen the burden on component (or release) maintainers, and we give other people a strong incentive to fix problems in their own (test) code. This has been successful only when the people who wrote the tests are still working on the same component. Assigning primary responsibility to maintainers has the exact opposite effects. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] break glusterd into small parts (Re: good job on fixing heavy hitters in spurious regressions)
On 05/09/2015 03:19 PM, Krishnan Parthasarathi wrote: Why not break glusterd into small parts and distribute the load to different people? Did you guys plan anything for 4.0 for breaking glusterd? It is going to be a maintenance hell if we don't break it sooner. Good idea. We have thought about it. Just re-architecting glusterd doesn't (and will not) solve the division of responsibility issue that is being discussed here. It's already difficult to maintain glusterd. I have already explained the reasons in the previous thread. I was thinking *-cli xlators could be maintained by the respective fs team itself. It is easier to maintain it this way because each of those xls can be put in xlators/cluster/afr/cli, xlators/cluster/dht/cli, etc. There will be clear demarcation of who owns what this way is my feeling. Even the tests can be organized to tests/afr-cli, tests/dht-cli etc etc. Glusterd does a lot of things: Lets see how we can break things up one thing at a time. I would love to spend some quality time thinking about this problem once I am done with ec work, but this is a rough idea I have for glusterd. 1) CLI handling: Glusterd-cli-xlator should act something like fuse in fs. It just gets the commands and passes it down, just like fuse gets the fops and passes it down. In glusterd process there should be snapshot.so, afr-cli.so, ec-cli.so, dht-cli.so loaded as management-xlators. Just like we have fops lets have mops (management operations). LOCK/STAGE/BRICK-OP/COMMIT-OP if there are more add them as well. Every time the top xlator in glusterd receives commands from cli, it converts the params into the arguments (req, op, dict etc) which are needed to carryout the cli. Now it winds the fop to all its children. One of the children is going to handle it locally, while the other child will send the cli to different glusterds that are in cluster. Second child of gluster-cli-xlator (give it a better name, but for now lets call it: mgmtcluster) will collate the responses and give the list of responses to glusterd-cli-xlator, it will call COLLATE mop on the first-child(lets call it local-handler) to collate the responses, i.e. logic for collating responses should also be in snapshot.so, afr-cli.so, dht-cli.so etc etc. Once the top translator does LOCK, STAGE, BRICK-OP, COMMIT-OP send response to CLI. 2) Volinfo should become more like inode_t in fs where each *-cli xlator can store their own ctx like snapshot-cli can store all snapshot related info for that volume in that context and afr can store afr-related info in the ctx. Volinfo data strcuture should have very minimal information. Maybe name, bricks etc. 3) Daemon handling: Daemon-manager xlator should have MOPS like START/STOP/INFO and this xlator should be accessible for all the -cli xlators which want to do their own management of the daemons. i.e. ec-cli/afr-cli should do self-heal-daemon handling. dht should do rebalance process handling etc. to give an example: while winding START mop it has to specify the daemon as self-heal-daemon and give enough info etc. 4) Peer handling: mgmtcluster(second child of top-xlator) should have MOPS like PEER_ADD/PEER_DEL/PEER_UPDATE etc to do the needful. top xlator is going to wind these operations based on the peer-cli-commands to this xlator. 5) volgen: top xlator is going to wind MOP called GET_NODE_LINKS, which takes the type of volfile (i.e. mount/nfs/shd/brick etc) on which each *-cli will construct its node(s), stuff options and tell the parent xl-name to which it needs to be linked to. 
Top xlator is going to just link the nodes to construct the graph and does graph_print to generate the volfile. I am pretty sure I forgot some more aspects of what glusterd does but you get the picture right? Break each aspect into different xlator and have MOPS to solve them. We have some initial ideas on how glusterd for 4.0 would look like. We won't be continuing with glusterd is also a translator model. The above model would work well only if we stuck with the stack of translators approach. Oh nice, I might have missed the mails. Do you mind sharing the plan for 4.0? Any reason why you guys do not want to continue glusterd as translator model? Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] break glusterd into small parts (Re: good job on fixing heavy hitters in spurious regressions)
On 05/09/2015 11:08 AM, Krishnan Parthasarathi wrote: Ah! now I understood the confusion. I never said maintainer should fix all the bugs in tests. I am only saying that they maintain tests, just like we maintain code. Whether you personally work on it or not, you at least have an idea of what is the problem and what is the solution so someone can come and ask you and you know the status of it. Expectation is not to fix every test failure that comes maintainer's way by maintainer alone. But he/she would know about problem/solution because he/she at least reviews it and merges it. We want to make sure that the tests are in good quality as well just like we make sure code is of good quality. Core is a special case. We will handle it separately. Glusterd is also a 'special' case. As a glusterd maintainer, I am _not_ maintaining insert-your-favourite-gluster-command-here's implementation. So, I don't 'know'/'understand' how it has been implemented and by extension I wouldn't be able to fix it (forget maintaining it :-) ). Given the no. of gluster commands, I won't be surprised if I didn't have an inkling on how your-favourite-gluster-command worked ;-) I hope this encourages other contributors, i.e, any gluster (feature) contributor, to join Kaushal and me in maintaining glusterd. I understand the frustration kp :-). Human brain can only take so much. I think we are solving wrong problem by putting more people on the code. Why not break glusterd into small parts and distribute the load to different people? Did you guys plan anything for 4.0 for breaking glusterd? It is going to be a maintenance hell if we don't break it sooner. Glusterd does a lot of things: Lets see how we can break things up one thing at a time. I would love to spend some quality time thinking about this problem once I am done with ec work, but this is a rough idea I have for glusterd. 1) CLI handling: Glusterd-cli-xlator should act something like fuse in fs. It just gets the commands and passes it down, just like fuse gets the fops and passes it down. In glusterd process there should be snapshot.so, afr-cli.so, ec-cli.so, dht-cli.so loaded as management-xlators. Just like we have fops lets have mops (management operations). LOCK/STAGE/BRICK-OP/COMMIT-OP if there are more add them as well. Every time the top xlator in glusterd receives commands from cli, it converts the params into the arguments (req, op, dict etc) which are needed to carryout the cli. Now it winds the fop to all its children. One of the children is going to handle it locally, while the other child will send the cli to different glusterds that are in cluster. Second child of gluster-cli-xlator (give it a better name, but for now lets call it: mgmtcluster) will collate the responses and give the list of responses to glusterd-cli-xlator, it will call COLLATE mop on the first-child(lets call it local-handler) to collate the responses, i.e. logic for collating responses should also be in snapshot.so, afr-cli.so, dht-cli.so etc etc. Once the top translator does LOCK, STAGE, BRICK-OP, COMMIT-OP send response to CLI. 2) Volinfo should become more like inode_t in fs where each *-cli xlator can store their own ctx like snapshot-cli can store all snapshot related info for that volume in that context and afr can store afr-related info in the ctx. Volinfo data strcuture should have very minimal information. Maybe name, bricks etc. 
3) Daemon handling: Daemon-manager xlator should have MOPS like START/STOP/INFO and this xlator should be accessible for all the -cli xlators which want to do their own management of the daemons. i.e. ec-cli/afr-cli should do self-heal-daemon handling. dht should do rebalance process handling etc. to give an example: while winding START mop it has to specify the daemon as self-heal-daemon and give enough info etc. 4) Peer handling: mgmtcluster(second child of top-xlator) should have MOPS like PEER_ADD/PEER_DEL/PEER_UPDATE etc to do the needful. top xlator is going to wind these operations based on the peer-cli-commands to this xlator. 5) volgen: top xlator is going to wind MOP called GET_NODE_LINKS, which takes the type of volfile (i.e. mount/nfs/shd/brick etc) on which each *-cli will construct its node(s), stuff options and tell the parent xl-name to which it needs to be linked to. Top xlator is going to just link the nodes to construct the graph and does graph_print to generate the volfile. I am pretty sure I forgot some more aspects of what glusterd does but you get the picture right? Break each aspect into different xlator and have MOPS to solve them. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] break glusterd into small parts (Re: good job on fixing heavy hitters in spurious regressions)
On 05/09/2015 02:21 PM, Atin Mukherjee wrote: On 05/09/2015 01:36 PM, Pranith Kumar Karampuri wrote: snip Sounds interesting, but it needs to be thought out in detail. For 4.0, we do have a plan to make the core glusterd algorithms work as a glusterd engine, with other features having interfaces to connect to it. Your proposal looks like another alternative. I would like to hear from
Re: [Gluster-devel] break glusterd into small parts (Re: good job on fixing heavy hitters in spurious regressions)
On 05/09/2015 03:04 PM, Kaushal M wrote: Modularising GlusterD is something we plan to do. As of now, it's just a plan; we don't yet have a design to achieve it. What Atin mentioned and what you've mentioned seem to be the same at a high level. The core of GlusterD will be a co-ordinating engine, which defines an interface for commands to use to do their work. The commands will each be a separate module implementing this interface. Depending on how we implement it, the actual names will be different. Yes, this is a nice approach. It would be nice if there is a clear demarcation for the code as well, so there won't be any dependency between merging dht changes vs, say, afr changes in the cli. That is why I was suggesting an xlator-based solution. But other ways of doing it where there is a clear demarcation are welcome as well. Would love to know more about the other approaches :-). Pranith On Sat, May 9, 2015 at 2:24 PM, Pranith Kumar Karampuri pkara...@redhat.com wrote: snip
Re: [Gluster-devel] break glusterd into small parts (Re: good job on fixing heavy hitters in spurious regressions)
On 05/09/2015 04:23 PM, Krishnan Parthasarathi wrote: Oh nice, I might have missed the mails. Do you mind sharing the plan for 4.0? Any reason why you guys do not want to continue glusterd as translator model? I don't understand why we are using the translator model in the first place. I guess it was to reuse rpc code. You should be able to shed more light here. Even I am not sure :-). It was a translator by the time I got in. A quick google search with glusterd 2.0 gluster-users, gave me this http://www.gluster.org/pipermail/gluster-users/2014-September/018639.html. Interestingly you asked us to consider AFR/NSR for distributed configuration management, which lead to http://www.gluster.org/pipermail/gluster-devel/2014-November/042944.html This proposal didn't go in the expected direction. I don't want to get into why not use translators now. We are currently heading in the direction visible in the above threads. If glusterd can't be a translator anymore, so be it. Kaushal's response gave the answers I was looking for. We should probably discuss it more once you guys come up with the interface CLI handling code needs to follow. I was thinking it would be great if you come up with a model where the handler code will be separate from the code of glusterd, which is what you guys seem to be targeting. Translator model is one way of achieving it, I personally love it on the FS side, that is why I was curious why it was not used. But any other way where the above requirements are met is welcome. Really excited to see what will come up :-). Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] good job on fixing heavy hitters in spurious regressions
On 05/09/2015 02:31 AM, Jeff Darcy wrote: What is so special about 'test' code? A broken test blocks everybody's progress in a way that an incomplete feature does not. It is still code, if maintainers are maintaining feature code and held responsible, why not test code? It is not that maintainer is the only one who fixes all the problems in the code they maintain, but they are still responsible for maintaining quality of code. Why shouldn't they do the same for quality of tests that test the component they maintain? You said it yourself: the maintainer isn't the only one who fixes all of the problems. I would certainly hope that people working on a component would keep that component's maintainer informed about what they're doing, but that's not the same as making the component maintainer *directly* responsible for every fix. That especially doesn't work for core which is a huge grab-bag full of different things best understood by different people. To turn your own question around, what's so special about test code that we should short-circuit bugs to the maintainer right away? Ah! now I understood the confusion. I never said maintainer should fix all the bugs in tests. I am only saying that they maintain tests, just like we maintain code. Whether you personally work on it or not, you at least have an idea of what is the problem and what is the solution so someone can come and ask you and you know the status of it. Expectation is not to fix every test failure that comes maintainer's way by maintainer alone. But he/she would know about problem/solution because he/she at least reviews it and merges it. We want to make sure that the tests are in good quality as well just like we make sure code is of good quality. Core is a special case. We will handle it separately. Pranith By putting the onus on the test owner, we achieve two positive things: we lessen the burden on component (or release) maintainers, and we give other people a strong incentive to fix problems in their own (test) code. This has been successful only when people who wrote the tests are still working on same component. Owner and original author are not necessarily the same thing. If someone is unavailable (e.g. new job), or has forgotten too much to be effective, then ownership should already have been reassigned. Then owner first or maintainer first doesn't matter, because they're the same person. The only really tricky case is when the original author and person most qualified to work on a test is still around but unable/unwilling to work on fixing a test, e.g. because their employer insists they work on something else. Perhaps those issues are best addressed on a different mailing list. ;) As far as the project is concerned, what can we do? Our only practical option might be to have someone else fix the test. If that's the case then so be it, but that should be a case-by-case decision and not a default. In the more common cases, responsibility for fixing tests should rest with the same person who's responsible for the associated production code. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] regression: brick crashed because of changelog xlator init failure
hi Kotresh/Aravinda, Do you guys know anything about the following core, which comes from a changelog xlator init failure? It just failed regression on one of my patches: http://review.gluster.org/#/c/10688

[2015-05-08 21:34:47.750460] E [xlator.c:426:xlator_init] 0-patchy-changelog: Initialization of volume 'patchy-changelog' failed, review your volfile again
[2015-05-08 21:34:47.750485] E [graph.c:322:glusterfs_graph_init] 0-patchy-changelog: initializing translator failed
[2015-05-08 21:34:47.750497] E [graph.c:661:glusterfs_graph_activate] 0-graph: init failed
[2015-05-08 21:34:47.749020] I [event-epoll.c:629:event_dispatch_epoll_worker] 0-epoll: Started thread with index 2
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash:
2015-05-08 21:34:47
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.7.0beta1
pending frames:
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git

Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] regression: brick crashed because of changelog xlator init failure
On 05/09/2015 03:26 AM, Pranith Kumar Karampuri wrote: hi Kotresh/Aravinda, Do you guys know anything about the following core, which comes from a changelog xlator init failure? It just failed regression on one of my patches: http://review.gluster.org/#/c/10688 Sorry, wrong URL; this is the correct one: http://review.gluster.com/#/c/10693/ Pranith snip ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
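For whoever picks up the core from this run, a quick generic way to get a backtrace out of it is sketched below. The binary and core locations are assumptions about the regression-slave layout; adjust them to wherever the job archived the core:

#!/bin/bash
# Pull a full backtrace from the core file left behind by the regression run.
# BIN and the core path pattern are assumed, not taken from the job config.
CORE=$(ls /core.* 2>/dev/null | head -1)
BIN=/build/install/sbin/glusterfsd
gdb -batch -ex "thread apply all bt full" "$BIN" "$CORE" > backtrace.txt
grep -n changelog backtrace.txt | head -20    # frames mentioning the changelog xlator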
Re: [Gluster-devel] Possibly root cause for the Gluster regression test cores?
On 04/08/2015 07:08 PM, Justin Clift wrote: On 8 Apr 2015, at 14:13, Pranith Kumar Karampuri pkara...@redhat.com wrote: On 04/08/2015 06:20 PM, Justin Clift wrote: snip Hagarth mentioned in the weekly IRC meeting that you have an idea what might be causing the regression tests to generate cores? Can you outline that quickly, as Jeff has some time and might be able to help narrow it down further. :) (and these core files are really annoying :/) I feel it is a lot like https://bugzilla.redhat.com/show_bug.cgi?id=1184417. clear-locks command is not handled properly after we did the client_t refactor. I believe that is the reason for the crashes but I could be wrong. But After looking at the code I feel there is high probability that this is the issue. I didn't find it easy to fix. We will need to change the lock structure list maintenance heavily. Easier thing would be to disable clear-locks functionality tests in the regression as it is not something that is used by the users IMO and see if it indeed is the same issue. There are 2 tests using this command: 18:34:00 :) ⚡ git grep clear-locks tests tests/bugs/disperse/bug-1179050.t:TEST $CLI volume clear-locks $V0 / kind all inode tests/bugs/glusterd/bug-824753-file-locker.c: gluster volume clear-locks %s /%s kind all posix 0,7-1 | If even after disabling these two tests it fails then we will need to look again. I think jeff's patch which will find the test which triggered the core should help here. Thanks Pranith. :) Is this other problem when disconnecting BZ possibly related, or is that a different thing? https://bugzilla.redhat.com/show_bug.cgi?id=1195415 I feel 1195415 could be a duplicate of 1184417. Pranith + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] regarding sharding
hi, As I am not able to spend much time on sharding, Kritika is handling it completely now. I am only doing reviews. Just letting everyone know so that future communication will happen directly with the active developer :-). Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Possible root cause for the Gluster regression test cores?
On 04/08/2015 06:20 PM, Justin Clift wrote: Hi Pranith, Hagarth mentioned in the weekly IRC meeting that you have an idea what might be causing the regression tests to generate cores? Can you outline that quickly, as Jeff has some time and might be able to help narrow it down further. :) (and these core files are really annoying :/) I feel it is a lot like https://bugzilla.redhat.com/show_bug.cgi?id=1184417. The clear-locks command is not handled properly after we did the client_t refactor. I believe that is the reason for the crashes, but I could be wrong. But after looking at the code, I feel there is a high probability that this is the issue. I didn't find it easy to fix. We will need to change the lock structure list maintenance heavily. The easier thing would be to disable the clear-locks functionality tests in the regression, as it is not something that is used by users IMO, and see if it indeed is the same issue. There are 2 tests using this command:
18:34:00 :) ⚡ git grep clear-locks tests
tests/bugs/disperse/bug-1179050.t:TEST $CLI volume clear-locks $V0 / kind all inode
tests/bugs/glusterd/bug-824753-file-locker.c: gluster volume clear-locks %s /%s kind all posix 0,7-1 |
If even after disabling these two tests it fails then we will need to look again. I think Jeff's patch which will find the test that triggered the core should help here. Pranith Regards and best wishes, Justin Clift -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Moratorium on new patch acceptance
On 05/21/2015 12:07 AM, Vijay Bellur wrote: On 05/19/2015 11:56 PM, Vijay Bellur wrote: On 05/18/2015 08:03 PM, Vijay Bellur wrote: On 05/16/2015 03:34 PM, Vijay Bellur wrote: I will send daily status updates from Monday (05/18) about this so that we are clear about where we are and what needs to be done to remove this moratorium. Appreciate your help in having a clean set of regression tests going forward! We have made some progress since Saturday. The problem with glupy.t has been fixed - thanks to Niels! All but following tests have developers looking into them: ./tests/basic/afr/entry-self-heal.t ./tests/bugs/replicate/bug-976800.t ./tests/bugs/replicate/bug-1015990.t ./tests/bugs/quota/bug-1038598.t ./tests/basic/ec/quota.t ./tests/basic/quota-nfs.t ./tests/bugs/glusterd/bug-974007.t Can submitters of these test cases or current feature owners pick these up and start looking into the failures please? Do update the spurious failures etherpad [1] once you pick up a particular test. [1] https://public.pad.fsfe.org/p/gluster-spurious-failures Update for today - all tests that are known to fail have owners. Thanks everyone for chipping in! I think we should be able to lift this moratorium and resume normal patch acceptance shortly. Today's update - Pranith fixed a bunch of failures in erasure coding and Avra removed a test that was not relevant anymore - thanks for that! Xavi and I both sent a patch each for fixing these. But.. I ran the regression 4 times and it succeeded 3 times and failed once on xml.t before merging, I thought these were the last fixes for this problem. Ashish found a way to recreate these same EIO errors so all is not well yet. Xavi is sending one more patch tomorrow which addresses that problem as well. While testing another patch on master I found that there is use after free issue in ec :-(. I am not able to send the fix for it because gerrit ran out of space? Compressing objects: 100% (9/9), done. Writing objects: 100% (9/9), 1.10 KiB | 0 bytes/s, done. Total 9 (delta 7), reused 0 (delta 0) fatal: Unpack error, check server log error: unpack failed: error No space left on device -- PS: Since valgrind is giving so much pain, I used Address sanitizer for debugging this mem-corruption. It is amazing! I followed http://tsdgeos.blogspot.in/2014/03/asan-and-gcc-how-to-get-line-numbers-in.html for getting the backtrace with line-numbers. It doesn't generate core with gcc-4.8 though (I had to use -N flag for starting mount process to get the output on stderr). I think in future versions of gcc we don't need to do all this. I will try and post my experience once I upgrade to fedora22 which has gcc5. Pranith Quota, afr, snapshot tiering tests are being looked into. Will provide an update on where we are with these tomorrow. Thanks, Vijay ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
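To illustrate the AddressSanitizer approach mentioned above, here is a minimal stand-alone example (not GlusterFS code; the file name and compile line are just an assumed setup) of the kind of heap-use-after-free that ASan catches and reports with file:line backtraces:

/* uaf-demo.c: hypothetical stand-alone example, not part of the gluster tree.
 * Build with: gcc -g -O1 -fsanitize=address uaf-demo.c -o uaf-demo
 * Running ./uaf-demo makes ASan abort at the bad read and print the
 * allocation, free and access stacks with file:line information. */
#include <stdlib.h>
#include <string.h>

int
main (void)
{
        char *buf = malloc (64);

        strcpy (buf, "hello");
        free (buf);

        /* heap-use-after-free: reading freed memory */
        return buf[0];
}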
Re: [Gluster-devel] Spurious failures? (master)
On 06/05/2015 02:12 AM, Shyam wrote: Just checking. This review request: http://review.gluster.org/#/c/11073/ Failed in the following tests:
1) Linux
[20:20:16] ./tests/bugs/replicate/bug-880898.t .. not ok 4
This seems to be the same RC as in self-heald.t, where heal info sometimes does not fail when the brick is down.
Failed 1/4 subtests [20:20:16]
http://build.gluster.org/job/rackspace-regression-2GB-triggered/10088/consoleFull
2) NetBSD (Du seems to have faced the same)
[11:56:45] ./tests/basic/afr/sparse-file-self-heal.t ..
not ok 52 Got instead of 1
not ok 53 Got instead of 1
not ok 54
not ok 55 Got 2 instead of 0
not ok 56 Got d41d8cd98f00b204e9800998ecf8427e instead of b6d81b360a5672d80c27430f39153e2c
not ok 60 Got 0 instead of 1
Failed 6/64 subtests [11:56:45]
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/6233/consoleFull
There is a bug in the statedump code path; if it races with STACK_RESET then shd seems to crash. I see the following output indicating the process died:
kill: usage: kill [-s sigspec | -n signum | -sigspec] pid | jobspec ... or kill -l [sigspec]
I have not done any analysis, and also the change request should not affect the paths that this test is failing on. Checking the logs for Linux did not throw any more light on the cause, although the brick logs are not updated(?) to reflect the volume create and start as per the TC in (1). Anyone know anything (more) about this? Shyam ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failure with sparse-file-heal.t test
On 06/05/2015 09:10 AM, Krishnan Parthasarathi wrote: - Original Message - This seems to happen because of a race between STACK_RESET and stack statedump. Still thinking how to fix it without taking locks around writing to the file. Why should we still keep the stack being reset as part of the pending pool of frames? Even if we had to (can't guess why?), when we remove it we should do the following to prevent gf_proc_dump_pending_frames from crashing.
...
call_frame_t *toreset = NULL;
LOCK (stack->pool->lock)
{
        toreset = stack->frames;
        stack->frames = NULL;
}
UNLOCK (stack->pool->lock);
...
Now, perform all operations that are done on stack->frames on toreset instead. Thoughts? Is there a reason you want to avoid locks here? STACK_DESTROY uses the call_pool lock to remove the stack from the list of pending frames. It is always better to prevent spin-locks while doing a slow operation like write. That is the only reasoning behind it. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failure with sparse-file-heal.t test
On 06/07/2015 05:40 PM, Pranith Kumar Karampuri wrote: On 06/05/2015 09:10 AM, Krishnan Parthasarathi wrote: - Original Message - This seems to happen because of a race between STACK_RESET and stack statedump. Still thinking how to fix it without taking locks around writing to the file. Why should we still keep the stack being reset as part of the pending pool of frames? Even if we had to (can't guess why?), when we remove it we should do the following to prevent gf_proc_dump_pending_frames from crashing.
...
call_frame_t *toreset = NULL;
LOCK (stack->pool->lock)
{
        toreset = stack->frames;
        stack->frames = NULL;
}
UNLOCK (stack->pool->lock);
...
Now, perform all operations that are done on stack->frames on toreset instead. Thoughts? Is there a reason you want to avoid locks here? STACK_DESTROY uses the call_pool lock to remove the stack from the list of pending frames. It is always better to prevent spin-locks while doing a slow operation like write. That is the only reasoning behind it. Seems like we are already inside the pool->lock while doing statedump, which does writes to files, so maybe I shouldn't think too much :-/. I will take a look at your patch once. Pranith Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failure with sparse-file-heal.t test
On 06/05/2015 09:01 AM, Krishnan Parthasarathi wrote: This seems to happen because of a race between STACK_RESET and stack statedump. Still thinking how to fix it without taking locks around writing to the file. Why should we still keep the stack being reset as part of the pending pool of frames? Even if we had to (can't guess why?), when we remove it we should do the following to prevent gf_proc_dump_pending_frames from crashing. The C stack actually gives up the memory it takes when the function call returns, but there was no such mechanism for gluster stacks before STACK_RESET. So for long-running operations like big-file self-heal, big-directory read, etc., we can keep RESETting the stack to prevent it from growing to a large size. Pranith
...
call_frame_t *toreset = NULL;
LOCK (stack->pool->lock)
{
        toreset = stack->frames;
        stack->frames = NULL;
}
UNLOCK (stack->pool->lock);
...
Now, perform all operations that are done on stack->frames on toreset instead. Thoughts? ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
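To make the suggestion in this thread concrete, here is a rough sketch of detaching the frame list under the pool lock, using simplified stand-in types rather than the real call_stack_t/call_pool_t code and LOCK()/UNLOCK() macros:

/* Detach the frame list while holding the pool lock, then do the slow work
 * -- freeing frames or writing the statedump -- outside of it, so the dumper
 * and STACK_RESET never walk the same list concurrently. */
#include <pthread.h>
#include <stddef.h>

struct frame_sketch {
        struct frame_sketch *next;
};

struct stack_sketch {
        pthread_spinlock_t  *pool_lock;   /* shared with the statedump code */
        struct frame_sketch *frames;
};

static struct frame_sketch *
stack_detach_frames (struct stack_sketch *stk)
{
        struct frame_sketch *detached = NULL;

        pthread_spin_lock (stk->pool_lock);
        {
                detached = stk->frames;
                stk->frames = NULL;       /* dumpers now see an empty list */
        }
        pthread_spin_unlock (stk->pool_lock);

        return detached;
}

static void
stack_reset_sketch (struct stack_sketch *stk)
{
        struct frame_sketch *frame = stack_detach_frames (stk);

        while (frame != NULL) {           /* slow work, done without the spinlock */
                struct frame_sketch *next = frame->next;
                /* destroy or reuse the frame here */
                frame = next;
        }
}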
Re: [Gluster-devel] self-heald.t failures
On 06/05/2015 04:01 PM, Anuradha Talur wrote: gluster volume heal volname info doesn't seem to fail because the process is crashing in afr_notify when invoked by glfs_fini. As a result proper error codes are not being propagated. Pranith had recently sent a patch : http://review.gluster.org/#/c/11001/ to not invoke glfs_fini in non-debug builds. Given that regression is run on debug builds, we are observing the failure. I will send a patch to temporarily not invoke glfs_fini in glfs-heal.c. Sorry for the delayed response. I see that your patch is already merged. Did you get a chance to find why afr_notify is crashing? I would love to keep executing glfs_fini for DEBUG builds so that bugs are found as soon as possible in that code path. Pranith - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Vijay Bellur vbel...@redhat.com, Gluster Devel gluster-devel@gluster.org Sent: Thursday, June 4, 2015 3:03:02 PM Subject: Re: [Gluster-devel] self-heald.t failures Yeah, I am looking into it. Basically gluster volume heal volname info must fail after volume stop. But sometimes it doesn't seem to :-(. Will need some time to RC. Will update the list. Pranith On 06/04/2015 02:19 PM, Vijay Bellur wrote: On 06/03/2015 10:30 AM, Vijay Bellur wrote: self-heald.t seems to fail intermittently. One such instance was seen recently [1]. Can somebody look into this please? ./tests/basic/afr/self-heald.t (Wstat: 0 Tests: 83 Failed: 1) Failed test: 78 Thanks, Vijay http://build.gluster.org/job/rackspace-regression-2GB-triggered/10029/consoleFull One more failure of self-heald.t: http://build.gluster.org/job/rackspace-regression-2GB-triggered/10092/consoleFull -Vijay ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
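For reference, the change being discussed is roughly of the following shape; this is only a sketch, and the exact macro used to detect debug builds as well as the header path are assumptions:

/* Call glfs_fini() only on debug builds so release binaries avoid the
 * afr_notify crash, while debug regression runs still exercise the fini
 * code path and surface bugs there early. */
#include <glusterfs/api/glfs.h>   /* installed libgfapi header; path assumed */

int
cleanup_and_exit_sketch (glfs_t *fs, int ret)
{
#ifdef DEBUG
        if (fs)
                ret = glfs_fini (fs);
#else
        (void) fs;                /* skip fini in non-debug builds */
#endif
        return ret;
}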
Re: [Gluster-devel] Regression failures
On 06/03/2015 04:36 PM, Sachin Pandit wrote: Hi, http://review.gluster.org/#/c/11024/ failed in the tests/basic/volume-snapshot-clone.t testcase. http://build.gluster.org/job/rackspace-regression-2GB-triggered/10057/consoleFull http://review.gluster.org/#/c/11000/ failed in the tests/bugs/replicate/bug-979365.t testcase. http://build.gluster.org/job/rackspace-regression-2GB-triggered/9985/consoleFull It failed in the gluster volume stop test:
volume stop: patchy: failed
volume start: patchy: failed: Volume patchy already started
./tests/bugs/replicate/../../volume.rc: line 201: kill: (18684) - No such process
umount: /mnt/glusterfs/0: not mounted
[08:04:56] ./tests/bugs/replicate/bug-979365.t ..
Atin, could you help please? Pranith Seems like a spurious failure. Can anyone please have a look at this? Regards, Sachin Pandit. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] answer_list in EC xlator
On 06/03/2015 09:21 PM, fanghuang.d...@yahoo.com wrote: On Wednesday, 3 June 2015, 19:43, fanghuang.d...@yahoo.com fanghuang.d...@yahoo.com wrote: On Wednesday, 3 June 2015, 15:22, Xavier Hernandez xhernan...@datalab.es wrote: On 06/03/2015 05:40 AM, Pranith Kumar Karampuri wrote: On 06/02/2015 08:08 PM, fanghuang.d...@yahoo.com wrote: Hi all, As I read the source code of the EC xlator, I am confused by the cbk_list and answer_list defined in struct _ec_fop_data. Why do we need two lists to combine the results of callbacks? Especially for the answer_list: it is initialized in ec_fop_data_allocate, then the nodes are added in ec_cbk_data_allocate. Without being accessed at all during the lifetime of the fop, the whole list is finally released in ec_fop_cleanup. Am I missing something about the answer_list? +Xavi. hi, The only reason I found is that it is easier to clean up cbks using answer_list. You can check the ec_fop_cleanup() function on latest master to see how this is done. You are right. Currently answer_list is only used to clean up all cbks received, while cbk_list is used to track groups of consistent answers. Although it currently doesn't happen, if error coercing or special attribute handling are implemented, it could be possible that one cbk gets referenced more than once in cbk_list, making answer_list absolutely necessary. That's a good point, to put all the cbks into one group and put those with consistent answers into the other group. But this design policy cannot be understood easily from the comments, source code or the list names (cbk_list, answer_list). Could we rename the cbk_list to consist_list or something else easier to follow? Combining of cbks is a bit involved until you understand it, but once you do, it is amazing. I tried to add comments for this part of the code and sent a patch, but we forgot to merge it :-) http://review.gluster.org/9982. If you think we can add more comments/change this part of the code in a way that makes it easier, let us know. We would love your feedback :-). Wait for Xavi's response as well. This patch is much clearer. For the function ec_combine_update_groups, since we only operate on one list, should we use ec_combine_update_group? The word groups is confusing for readers, who may think there are two or more groups. I got it finally. The cbk_list actually maintains multiple groups of the same answer, sorted by the count. As Xavi said, one cbk may exist in different groups. So we need an answer_list to do the cleanup job. Pranith's patch explains it clearly. Well it is really amazing. Told ya! :-). I will resend the patch with updated comments about how the groups work. Pranith -- Fang Huang ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
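For readers without the source handy, the relationship between the two lists looks roughly like this; all member names other than the two list names are invented for illustration, and the real definitions live in ec-data.h:

/* Illustrative sketch only. */
struct ec_cbk_sketch {
        struct ec_cbk_sketch *next_in_group;   /* chains identical answers        */
        struct ec_cbk_sketch *next_answer;     /* chains every answer, always     */
        int                   op_ret;
        int                   op_errno;
        int                   count;           /* how many bricks gave this answer */
};

struct ec_fop_sketch {
        /* cbk_list: groups of matching answers, kept sorted so the largest
         * (most consistent) group sits at the head and becomes the fop result. */
        struct ec_cbk_sketch *cbk_groups;

        /* answer_list: flat list of every cbk ever allocated for this fop.
         * It is never consulted while combining, but ec_fop_cleanup() walks it
         * to free everything exactly once, even if a cbk ends up referenced
         * from more than one group. */
        struct ec_cbk_sketch *all_answers;
};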
[Gluster-devel] spurious failure with sparse-file-heal.t test
I see that statedump is generating core because of which this test spuriously fails. I am looking into it. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Valgrind + glusterfs
On 06/25/2015 12:53 PM, Venky Shankar wrote: On Thu, Jun 25, 2015 at 9:57 AM, Pranith Kumar Karampuri pkara...@redhat.com wrote: hi, Does anyone know why glusterfs hangs with valgrind? /proc/pid/stack ? That was giving futex_wait(). CPU shoots up to 100%. Pranith Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] build failure after merging http://review.gluster.com/10448
hi, I merged a patch before a dependent patch by mistake, which led to a build failure. Merged http://review.gluster.com/11413 to fix the same. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] GF_FOP_IPC changes
On 06/24/2015 07:44 PM, Soumya Koduri wrote: On 06/24/2015 10:14 AM, Krishnan Parthasarathi wrote: - Original Message - I've been looking at the recent patches to redirect GF_FOP_IPC to an active subvolume instead of always to the first. Specifically, these: http://review.gluster.org/11346 for DHT http://review.gluster.org/11347 for EC http://review.gluster.org/11348 for AFR I can't help but wonder if there's a simpler and more generic way to do this, instead of having to do this in a translator-specific way each time - then again for NSR, or for a separate tiering translator, and so on. For example what if each translator had a first_active_child callback? xlator_t * (*first_active_child) (xlator_t *parent); Then default_ipc could invoke this, if it exists, where it currently invokes FIRST_CHILD. Each translator could implement a bare minimum to select a child, then step out of the way for a fop it really wasn't all that interested in to begin with. Any thoughts? We should do this right away. This change doesn't affect external interfaces. we should be bold and implement the first solution. Over time we could improve on this. +1. It would definitely ease the implementation of many such fops which have to default to first active child. We need not keep track of all the fops which may get affected with new clustering xlators being added. I haven't seen the patches yet. Failures can happen just at the time of winding, leading to same failures. It at least needs to have the logic of picking next_active_child. EC needs to lock+xattrop the bricks to find bricks with good copies. AFR needs to perform getxattr to find good copies. Just giving more information to see if it helps. Pranith Thanks, Soumya ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] GF_FOP_IPC changes
On 06/24/2015 08:26 PM, Jeff Darcy wrote: I haven't seen the patches yet. Failures can happen just at the time of winding, leading to same failures. It at least needs to have the logic of picking next_active_child. EC needs to lock+xattrop the bricks to find bricks with good copies. AFR needs to perform getxattr to find good copies. Is that really true? I thought they each had a readily-available idea of which children are up or down, which they already use e.g. for reads. It knows which bricks are up/down. But they may not be the latest. Will that matter? Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Regresssion Failure (3.7 branch): afr-quota-xattr-mdata-heal.t
This is a known spurious failure. Pranith On 06/25/2015 11:14 AM, Kotresh Hiremath Ravishankar wrote: Hi, I see the above test case failing for my patch which is not related. Could some one from AFR team look into it? http://build.gluster.org/job/rackspace-regression-2GB-triggered/11332/consoleFull Thanks and Regards, Kotresh H R ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] GF_FOP_IPC changes
On 06/25/2015 02:49 AM, Jeff Darcy wrote: It knows which bricks are up/down. But they may not be the latest. Will that matter? AFAIK it's sufficient at this point to know which are up/down. In that case, we need two functions which give first active child and next_active_child in case of failure. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] GF_FOP_IPC changes
On 06/25/2015 12:10 PM, Soumya Koduri wrote: On 06/25/2015 09:00 AM, Pranith Kumar Karampuri wrote: On 06/25/2015 02:49 AM, Jeff Darcy wrote: It knows which bricks are up/down. But they may not be the latest. Will that matter? AFAIK it's sufficient at this point to know which are up/down. In that case, we need two functions which give first active child and next_active_child in case of failure. Do you suggest then in all default_*_cbk(), on receiving ENOTCONN failure, we re-send fop to next_active_child? Yeah, I think that would be more generic than depending on up-subvols of the cluster xlator. 1) In default_ipc(), wind it to first subvol. 2) If it gives ENOTCONN wind to next child as long as it is not the last child. Pranith Thanks, Soumya Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
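A rough sketch of that two-step behaviour; first_active_child/next_active_child are assumed helper callbacks (they do not exist yet), and the real implementation would go through default_ipc() and STACK_WIND rather than direct calls:

#include <errno.h>
#include <stddef.h>

typedef struct cluster_xl_sketch cluster_xl_sketch_t;

struct cluster_xl_sketch {
        cluster_xl_sketch_t *(*first_active_child) (cluster_xl_sketch_t *parent);
        cluster_xl_sketch_t *(*next_active_child) (cluster_xl_sketch_t *parent,
                                                   cluster_xl_sketch_t *prev);
        int (*ipc) (cluster_xl_sketch_t *self, int op);
};

/* Wind the IPC fop to the first active child; on ENOTCONN move on to the
 * next active child until there are no children left to try. */
static int
ipc_wind_sketch (cluster_xl_sketch_t *parent, int op)
{
        int                  ret   = -ENOTCONN;
        cluster_xl_sketch_t *child = parent->first_active_child (parent);

        while (child != NULL) {
                ret = child->ipc (child, op);
                if (ret != -ENOTCONN)
                        break;          /* success or a non-connection error */
                child = parent->next_active_child (parent, child);
        }

        return ret;
}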
Re: [Gluster-devel] Valgrind + glusterfs
I tried EC volume with 2+1 config. dd of=a.txt if=/dev/urandom bs=128k count=1024 worked fine. When I increased bs=1M it hung. This is on my laptop. Pranith On 06/25/2015 10:32 AM, Krishnan Parthasarathi wrote: - Original Message - hi, Does anyone know why glusterfs hangs with valgrind? When do you observe the hang? I started a single brick volume, enabled valgrind on bricks and mounted it via fuse. I didn't observe the mount hang. Could you share the set of steps which lead to the hang? ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Valgrind + glusterfs
hi, Does anyone know why glusterfs hangs with valgrind? Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Public key problem on new vms for NetBSD
hi, I see that NetBSD regressions are passing but not able to give +1 because of following problem: + ssh 'nb7bu...@review.gluster.org' gerrit review --message ''\''http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/7046/consoleFull : SUCCESS'\''' --project=glusterfs --code-review=0 '--verified=+1' 276ba2dbd076a2c4b86e8afd0eaf2db7376ea2a8 Permission denied (publickey). I saw it happened for 2 of my patches: http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/7046/console http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/7047/console Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] NetBSD regression tests hanging after ./tests/basic/mgmt_v3-locks.t
Emmanuel, I am not sure of the feasibility but just wanted to ask you. Do you think there is a possibility to error out operations on the mount when the mount crashes, instead of hanging? That would prevent a lot of manual intervention even in the future. Pranith. On 06/15/2015 01:35 PM, Niels de Vos wrote: Hi, sometimes the NetBSD regression tests hang with messages like this:
[12:29:07] ./tests/basic/mgmt_v3-locks.t ... ok 79867 ms
No volumes present
mount_nfs: can't access /patchy: Permission denied
mount_nfs: can't access /patchy: Permission denied
mount_nfs: can't access /patchy: Permission denied
Most (if not all) of these hangs are caused by a crashing Gluster/NFS process. Once the Gluster/NFS server is not reachable anymore, unmounting fails. The only way to recover is to reboot the VM and retrigger the test. For rebooting, the http://build.gluster.org/job/reboot-vm job can be used, and retriggering works by clicking the retrigger link in the left menu once the test has been marked as failed/aborted. When logging in on the NetBSD system that hangs, you can verify with these steps:
1. check if there is a /glusterfsd.core file
2. run gdb on the core:
# cd /build/install
# gdb --core=/glusterfsd.core sbin/glusterfs
...
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0xb9b94f0b in auth_cache_lookup (cache=0xb9aa2310, fh=0xb9044bf8, host_addr=0xb900e400 "104.130.205.187", timestamp=0xbf7fd900, can_write=0xbf7fd8fc) at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/nfs/server/src/auth-cache.c:164
164 *can_write = lookup_res->item->opts->rw;
3. verify the lookup_res structure:
(gdb) p *lookup_res
$1 = {timestamp = 1434284981, item = 0xb901e3b0}
(gdb) p *lookup_res->item
$2 = {name = 0xff00 error: Cannot access memory at address 0xff00, opts = 0x}
A fix for this has been sent; it is currently waiting for an update to the proposed reference counting:
- http://review.gluster.org/11022 core: add gf_ref_t for common refcounting structures
- http://review.gluster.org/11023 nfs: refcount each auth_cache_entry and related data_t
Thanks, Niels ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
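The reference-counting idea in very rough form; this is a stand-alone illustration with invented names, not the gf_ref_t API from the patches above:

/* Whoever looks the entry up takes a reference, so a concurrent expiry or
 * "forget" cannot free it mid-access; the last put is the one that frees. */
#include <stdlib.h>
#include <stdatomic.h>

struct auth_entry_sketch {
        atomic_int refs;
        int        rw;           /* stands in for item->opts->rw */
};

static struct auth_entry_sketch *
entry_get (struct auth_entry_sketch *e)
{
        atomic_fetch_add (&e->refs, 1);
        return e;
}

static void
entry_put (struct auth_entry_sketch *e)
{
        /* refcount dropped to zero: safe to free now */
        if (atomic_fetch_sub (&e->refs, 1) == 1)
                free (e);
}

/* lookup: take a ref (under the cache lock, omitted here), read the fields,
 * then drop the ref -- instead of dereferencing a pointer that another
 * thread may already have freed. */
static int
entry_can_write (struct auth_entry_sketch *e)
{
        int rw;

        entry_get (e);
        rw = e->rw;
        entry_put (e);

        return rw;
}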
Re: [Gluster-devel] Spurious failure in tests/bugs/glusterd/bug-963541.t
+gluster-devel On 06/11/2015 10:22 AM, Pranith Kumar Karampuri wrote: hi, Could you guys help in finding RCA for http://build.gluster.org/job/rackspace-regression-2GB-triggered/10449/consoleFull failures in tests/bugs/glusterd/bug-963541.t Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Unable to send patches to gerrit
Last time when this happened Kaushal/vijay fixed it if I remember correctly. +kaushal +Vijay Pranith On 06/11/2015 10:38 AM, Anoop C S wrote: On 06/11/2015 10:33 AM, Ravishankar N wrote: I'm unable to push a patch on release-3.6, getting different errors every time: This happens for master too. I continuously get the following error: error: unpack failed: error No space left on device [ravi@tuxpad glusterfs]$ ./rfc.sh [detached HEAD a59646a] afr: honour selfheal enable/disable volume set options Date: Sat May 30 10:23:33 2015 +0530 3 files changed, 108 insertions(+), 4 deletions(-) create mode 100644 tests/basic/afr/client-side-heal.t Successfully rebased and updated refs/heads/3.6_honour_heal_options. Counting objects: 11, done. Delta compression using up to 4 threads. Compressing objects: 100% (11/11), done. Writing objects: 100% (11/11), 1.77 KiB | 0 bytes/s, done. Total 11 (delta 9), reused 0 (delta 0) *error: unpack failed: error No space left on device** **fatal: Unpack error, check server log* To ssh://itisr...@git.gluster.org/glusterfs.git ! [remote rejected] HEAD - refs/for/release-3.6/bug-1230259 (n/a (unpacker error)) error: failed to push some refs to 'ssh://itisr...@git.gluster.org/glusterfs.git' [ravi@tuxpad glusterfs]$ [ravi@tuxpad glusterfs]$ ./rfc.sh [detached HEAD 8b28efd] afr: honour selfheal enable/disable volume set options Date: Sat May 30 10:23:33 2015 +0530 3 files changed, 108 insertions(+), 4 deletions(-) create mode 100644 tests/basic/afr/client-side-heal.t Successfully rebased and updated refs/heads/3.6_honour_heal_options. *fatal: internal server error** **fatal: Could not read from remote repository.** ** **Please make sure you have the correct access rights** **and the repository exists.* Anybody else facing problems? -Ravi ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] answer_list in EC xlator
On 06/02/2015 08:08 PM, fanghuang.d...@yahoo.com wrote: Hi all, As I read the source code of the EC xlator, I am confused by the cbk_list and answer_list defined in struct _ec_fop_data. Why do we need two lists to combine the results of callbacks? Especially for the answer_list: it is initialized in ec_fop_data_allocate, then the nodes are added in ec_cbk_data_allocate. Without being accessed at all during the lifetime of the fop, the whole list is finally released in ec_fop_cleanup. Am I missing something about the answer_list? +Xavi. hi, The only reason I found is that it is easier to clean up cbks using answer_list. You can check the ec_fop_cleanup() function on latest master to see how this is done. Combining of cbks is a bit involved until you understand it, but once you do, it is amazing. I tried to add comments for this part of the code and sent a patch, but we forgot to merge it :-) http://review.gluster.org/9982. If you think we can add more comments/change this part of the code in a way that makes it easier, let us know. We would love your feedback :-). Wait for Xavi's response as well. Pranith Regards, Fang Huang ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] DHTv2 design discussion
On 06/03/2015 01:14 AM, Jeff Darcy wrote: I've put together a document which I hope captures the most recent discussions I've had, particularly those in Barcelona. Commenting should be open to anyone, so please feel free to weigh in before too much code is written. ;) https://docs.google.com/document/d/1nJuG1KHtzU99HU9BK9Qxoo1ib9VXf2vwVuHzVQc_lKg/edit?usp=sharing Jeff, Do you guys have any date before which the comments need to be given? It helps in prioritizing with other work I have. I would love to make time and go through this in detail and ask questions. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] How to find total number of gluster mounts?
On 06/01/2015 11:07 AM, Bipin Kunal wrote: Hi All, Is there a way to find the total number of gluster mounts? If not, what would be the complexity for this RFE? As far as I understand, finding the number of fuse mounts should be possible but seems infeasible for nfs and samba mounts. True. Bricks have connections from each of the clients. Each of fuse/nfs/glustershd/quotad/glfsapi-based-clients (samba/glfsheal) would have a separate client context set on the bricks. So we can get this information. But like you said, I am not sure how it can be done in the nfs server/samba. Adding more people. Pranith Please let me know your thoughts on this. Thanks, Bipin Kunal ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] tests/bugs/glusterd/bug-948686.t gave a core
Glustershd is crashing because afr wound xattrop with null gfid in loc. Could one of you look into this failure? http://build.gluster.org/job/rackspace-regression-2GB-triggered/10095/consoleFull Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failure with sparse-file-heal.t test
This seems to happen because of race between STACK_RESET and stack statedump. Still thinking how to fix it without taking locks around writing to file. Pranith On 06/04/2015 02:13 PM, Pranith Kumar Karampuri wrote: I see that statedump is generating core because of which this test spuriously fails. I am looking into it. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Only netbsd regressions seem to be triggered
On 06/03/2015 10:26 AM, Raghavendra Gowdappa wrote: All, It seems only netbsd regressions are triggered. Linux-based regressions do not seem to be triggered. I've observed this with two patches [1][2]. Pranith also feels the same. Have any of you seen a similar issue? I saw it happen in reverse. I think the netbsd jobs on my patches failed more because they couldn't fetch the patch from gerrit. It does happen quite a bit though. Pranith [1] http://review.gluster.org/#/c/10943/ [2] http://review.gluster.org/#/c/10834/ regards, Raghavendra ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] git fetch is failing
hi, git fetch on a local repo fails with the following error. I asked on #gluster-dev; some of the people online now face the same error.
pk1@localhost - ~/workspace/gerrit-repo (cooperative-locking-3.7)
08:54:14 :( ⚡ git fetch
ssh_exchange_identification: Connection closed by remote host
fatal: Could not read from remote repository.
Please make sure you have the correct access rights and the repository exists.
Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Automated bug workflow
On 05/29/2015 10:41 PM, Nagaprasad Sathyanarayana wrote: When similar automation was discussed, somebody had raised the concern when more than one patch is associated with a BZ. Either we keep 1:1 between BZ and patch. Otherwise the workflow needs to be improvised to inform gerrit when the last patch is submitted for a BZ so that the state can be appropriately changed. Thoughts? rfc.sh will ask if this patch is the last one for the bug, or if more patches are expected. Based on this input it acts on bugzilla. Pranith Thanks Naga On 29-May-2015, at 10:21 pm, Niels de Vos nde...@redhat.com wrote: Hi all, today we had a discussion about how to get the status of reported bugs more correct and up to date. It is something that has come up several times already, but now we have a BIG solution as Pranith calls it. The goal is rather simple, but is requires some thinking about rules and components that can actually take care of the automation. The general user-visible results would be: * rfc.sh will ask if this patch it the last one for the bug, or if more patches are expected * Gerrit will receive the patch with the answer, and modify the status of the bug to POST * when the patch is merged, Gerrit will change (or not) the status of the bug to MODIFIED * when a nightly build is made, all bugs that have patches included and the status of the bug is MODIFIED, the build script will change the status to ON_QA and set a fixed in version This is a simplified view, there are some other cases that we need to take care of. These are documented in the etherpad linked below. We value any input for this, Kaleb and Rafi already gave some, thanks! Please let us know over email or IRC and we'll update the etherpad. Thanks, Pranith Niels Etherpad with detailed step by step actions to take: https://public.pad.fsfe.org/p/gluster-automated-bug-workflow IRC log, where the discussion started: https://botbot.me/freenode/gluster-dev/2015-05-29/?msg=40450336page=2 ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Automated bug workflow
On 05/29/2015 11:23 PM, Shyam wrote: On 05/29/2015 12:51 PM, Niels de Vos wrote: Hi all, today we had a discussion about how to get the status of reported bugs more correct and up to date. It is something that has come up several times already, but now we have a BIG solution as Pranith calls it. The goal is rather simple, but is requires some thinking about rules and components that can actually take care of the automation. The general user-visible results would be: * rfc.sh will ask if this patch it the last one for the bug, or if more patches are expected * Gerrit will receive the patch with the answer, and modify the status of the bug to POST I like to do this manually. Instead of just yes/no may be we should also let it accept an input 'disable' so that no automated BUG state modifications are done. * when the patch is merged, Gerrit will change (or not) the status of the bug to MODIFIED I like to do this manually too... but automation does not hurt, esp. when I control when the bug moves to POST. hmm... if we have the 'marker' to say 'disabled' even this part won't be automatically done when the patch is merged. ./rfc.sh needs to take more inputs about what kind of automation is needed and act occardingly i.e. don't do 'moving to POST' but if the bug is already in POST move it to MODIFIED etc. Pranith. * when a nightly build is made, all bugs that have patches included and the status of the bug is MODIFIED, the build script will change the status to ON_QA and set a fixed in version This I would like automated, as I am not tracking when it was released (of sorts). But, if I miss the nightly boat, I assume the automation would not pick this up, as a result automation on the MODIFIED step is good, as that would take care of this miss for me. This is a simplified view, there are some other cases that we need to take care of. These are documented in the etherpad linked below. We value any input for this, Kaleb and Rafi already gave some, thanks! Please let us know over email or IRC and we'll update the etherpad. Overall, we can have all of this, but I guess I will possibly never use the POST automation and do that myself. Is this a personal preference or you think improving something in the tool will persuade you to let the tool take care of moving to POST? Pranith Thanks, Pranith Niels Etherpad with detailed step by step actions to take: https://public.pad.fsfe.org/p/gluster-automated-bug-workflow IRC log, where the discussion started: https://botbot.me/freenode/gluster-dev/2015-05-29/?msg=40450336page=2 ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] gluster builds are failing in rpmbuilding
On 05/30/2015 08:10 AM, Pranith Kumar Karampuri wrote: I see that kaleb already sent a patch for this: http://review.gluster.org/#/c/11007 - master http://review.gluster.org/#/c/11008 - NetBSD I meant http://review.gluster.org/#/c/11008 for release-3.7 :-) Pranith I am going to abandon my patch. Pranith On 05/30/2015 07:54 AM, Pranith Kumar Karampuri wrote: On 05/30/2015 07:44 AM, Pranith Kumar Karampuri wrote: On 05/30/2015 07:33 AM, Nagaprasad Sathyanarayana wrote: It appears to me that glusterd-errno.h was added in the patch http://review.gluster.org/10313, which was merged on 29th. Please correct me if I am wrong. I think it is supposed to be added to Makefile as well. Let me do some testing. http://review.gluster.org/11010 fixes this. Thanks a lot Naga :-) Pranith Pranith Thanks Naga - Original Message - From: Nagaprasad Sathyanarayana nsath...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Saturday, May 30, 2015 7:23:21 AM Subject: Re: [Gluster-devel] gluster builds are failing in rpmbuilding Could it be due to the compilation errors? http://build.gluster.org/job/glusterfs-devrpms-el6/9019/ : glusterd-locks.c:24:28: error: glusterd-errno.h: No such file or directory CC glusterd_la-glusterd-mgmt-handler.lo glusterd-locks.c: In function 'glusterd_mgmt_v3_lock': glusterd-locks.c:557: error: 'EANOTRANS' undeclared (first use in this function) glusterd-locks.c:557: error: (Each undeclared identifier is reported only once glusterd-locks.c:557: error: for each function it appears in.) make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1 make[5]: *** Waiting for unfinished jobs make[4]: *** [all-recursive] Error 1 make[3]: *** [all-recursive] Error 1 make[2]: *** [all-recursive] Error 1 make[1]: *** [all-recursive] Error 1 make: *** [all] Error 2 RPM build errors: error: Bad exit status from /var/tmp/rpm-tmp.E46NjW (%build) Bad exit status from /var/tmp/rpm-tmp.E46NjW (%build) Child return code was: 1 http://build.gluster.org/job/glusterfs-devrpms/9141/ : glusterd-locks.c:24:28: fatal error: glusterd-errno.h: No such file or directory #include glusterd-errno.h ^ compilation terminated. CC glusterd_la-glusterd-mgmt-handler.lo CC glusterd_la-glusterd-mgmt.lo make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1 http://build.gluster.org/job/glusterfs-devrpms-el7/2179/ : glusterd-locks.c:24:28: fatal error: glusterd-errno.h: No such file or directory #include glusterd-errno.h ^ compilation terminated. CC glusterd_la-glusterd-mgmt.lo make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1 make[5]: *** Waiting for unfinished jobs glusterd-mgmt.c:26:28: fatal error: glusterd-errno.h: No such file or directory #include glusterd-errno.h ^ compilation terminated. Thanks Naga - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Gluster Devel gluster-devel@gluster.org Sent: Saturday, May 30, 2015 6:57:41 AM Subject: [Gluster-devel] gluster builds are failing in rpmbuilding hi, I don't understand rpmbuild logs that well. 
But the following seems to be the issue: Start: build phase for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm Start: build setup for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm Finish: build setup for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm Start: rpmbuild glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm ERROR: Exception(glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm) Config(epel-6-x86_64) 1 minutes 5 seconds Please feel free to take a look at the following links for sample runs: http://build.gluster.org/job/glusterfs-devrpms-el6/9019/console http://build.gluster.org/job/glusterfs-devrpms/9141/console http://build.gluster.org/job/glusterfs-devrpms-el7/2179/console Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Regression fails in tests/bugs/nfs/bug-904065.t
Niels, As per git you are author for the test above. Could you please help find RC for the failure. Log: http://build.gluster.org/job/rackspace-regression-2GB-triggered/9812/consoleFull I am going to re-trigger the build. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] gluster builds are failing in rpmbuilding
hi, I don't understand rpmbuild logs that well. But the following seems to be the issue: Start: build phase for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm Start: build setup for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm Finish: build setup for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm Start: rpmbuild glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm ERROR: Exception(glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm) Config(epel-6-x86_64) 1 minutes 5 seconds Please feel free to take a look at the following links for sample runs: http://build.gluster.org/job/glusterfs-devrpms-el6/9019/console http://build.gluster.org/job/glusterfs-devrpms/9141/console http://build.gluster.org/job/glusterfs-devrpms-el7/2179/console Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] gluster builds are failing in rpmbuilding
On 05/30/2015 07:44 AM, Pranith Kumar Karampuri wrote: On 05/30/2015 07:33 AM, Nagaprasad Sathyanarayana wrote: It appears to me that glusterd-errno.h was added in the patch http://review.gluster.org/10313, which was merged on 29th. Please correct me if I am wrong. I think it is supposed to be added to Makefile as well. Let me do some testing. http://review.gluster.org/11010 fixes this. Thanks a lot Naga :-) Pranith Pranith Thanks Naga - Original Message - From: Nagaprasad Sathyanarayana nsath...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Saturday, May 30, 2015 7:23:21 AM Subject: Re: [Gluster-devel] gluster builds are failing in rpmbuilding Could it be due to the compilation errors? http://build.gluster.org/job/glusterfs-devrpms-el6/9019/ : glusterd-locks.c:24:28: error: glusterd-errno.h: No such file or directory CC glusterd_la-glusterd-mgmt-handler.lo glusterd-locks.c: In function 'glusterd_mgmt_v3_lock': glusterd-locks.c:557: error: 'EANOTRANS' undeclared (first use in this function) glusterd-locks.c:557: error: (Each undeclared identifier is reported only once glusterd-locks.c:557: error: for each function it appears in.) make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1 make[5]: *** Waiting for unfinished jobs make[4]: *** [all-recursive] Error 1 make[3]: *** [all-recursive] Error 1 make[2]: *** [all-recursive] Error 1 make[1]: *** [all-recursive] Error 1 make: *** [all] Error 2 RPM build errors: error: Bad exit status from /var/tmp/rpm-tmp.E46NjW (%build) Bad exit status from /var/tmp/rpm-tmp.E46NjW (%build) Child return code was: 1 http://build.gluster.org/job/glusterfs-devrpms/9141/ : glusterd-locks.c:24:28: fatal error: glusterd-errno.h: No such file or directory #include glusterd-errno.h ^ compilation terminated. CC glusterd_la-glusterd-mgmt-handler.lo CC glusterd_la-glusterd-mgmt.lo make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1 http://build.gluster.org/job/glusterfs-devrpms-el7/2179/ : glusterd-locks.c:24:28: fatal error: glusterd-errno.h: No such file or directory #include glusterd-errno.h ^ compilation terminated. CC glusterd_la-glusterd-mgmt.lo make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1 make[5]: *** Waiting for unfinished jobs glusterd-mgmt.c:26:28: fatal error: glusterd-errno.h: No such file or directory #include glusterd-errno.h ^ compilation terminated. Thanks Naga - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Gluster Devel gluster-devel@gluster.org Sent: Saturday, May 30, 2015 6:57:41 AM Subject: [Gluster-devel] gluster builds are failing in rpmbuilding hi, I don't understand rpmbuild logs that well. 
But the following seems to be the issue: Start: build phase for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm Start: build setup for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm Finish: build setup for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm Start: rpmbuild glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm ERROR: Exception(glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm) Config(epel-6-x86_64) 1 minutes 5 seconds Please feel free to take a look at the following links for sample runs: http://build.gluster.org/job/glusterfs-devrpms-el6/9019/console http://build.gluster.org/job/glusterfs-devrpms/9141/console http://build.gluster.org/job/glusterfs-devrpms-el7/2179/console Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] gluster builds are failing in rpmbuilding
On 05/30/2015 07:33 AM, Nagaprasad Sathyanarayana wrote: It appears to me that glusterd-errno.h was added in the patch http://review.gluster.org/10313, which was merged on 29th. Please correct me if I am wrong. I think it is supposed to be added to Makefile as well. Let me do some testing. Pranith Thanks Naga - Original Message - From: Nagaprasad Sathyanarayana nsath...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Saturday, May 30, 2015 7:23:21 AM Subject: Re: [Gluster-devel] gluster builds are failing in rpmbuilding Could it be due to the compilation errors? http://build.gluster.org/job/glusterfs-devrpms-el6/9019/ : glusterd-locks.c:24:28: error: glusterd-errno.h: No such file or directory CC glusterd_la-glusterd-mgmt-handler.lo glusterd-locks.c: In function 'glusterd_mgmt_v3_lock': glusterd-locks.c:557: error: 'EANOTRANS' undeclared (first use in this function) glusterd-locks.c:557: error: (Each undeclared identifier is reported only once glusterd-locks.c:557: error: for each function it appears in.) make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1 make[5]: *** Waiting for unfinished jobs make[4]: *** [all-recursive] Error 1 make[3]: *** [all-recursive] Error 1 make[2]: *** [all-recursive] Error 1 make[1]: *** [all-recursive] Error 1 make: *** [all] Error 2 RPM build errors: error: Bad exit status from /var/tmp/rpm-tmp.E46NjW (%build) Bad exit status from /var/tmp/rpm-tmp.E46NjW (%build) Child return code was: 1 http://build.gluster.org/job/glusterfs-devrpms/9141/ : glusterd-locks.c:24:28: fatal error: glusterd-errno.h: No such file or directory #include glusterd-errno.h ^ compilation terminated. CC glusterd_la-glusterd-mgmt-handler.lo CC glusterd_la-glusterd-mgmt.lo make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1 http://build.gluster.org/job/glusterfs-devrpms-el7/2179/ : glusterd-locks.c:24:28: fatal error: glusterd-errno.h: No such file or directory #include glusterd-errno.h ^ compilation terminated. CC glusterd_la-glusterd-mgmt.lo make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1 make[5]: *** Waiting for unfinished jobs glusterd-mgmt.c:26:28: fatal error: glusterd-errno.h: No such file or directory #include glusterd-errno.h ^ compilation terminated. Thanks Naga - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Gluster Devel gluster-devel@gluster.org Sent: Saturday, May 30, 2015 6:57:41 AM Subject: [Gluster-devel] gluster builds are failing in rpmbuilding hi, I don't understand rpmbuild logs that well. 
But the following seems to be the issue: Start: build phase for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm Start: build setup for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm Finish: build setup for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm Start: rpmbuild glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm ERROR: Exception(glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm) Config(epel-6-x86_64) 1 minutes 5 seconds Please feel free to take a look at the following links for sample runs: http://build.gluster.org/job/glusterfs-devrpms-el6/9019/console http://build.gluster.org/job/glusterfs-devrpms/9141/console http://build.gluster.org/job/glusterfs-devrpms-el7/2179/console Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] gluster builds are failing in rpmbuilding
On 05/30/2015 09:20 AM, Avra Sengupta wrote: That is because the patch that introduces glusterd-errno.h is not yet merged in 3.7. So glusterd-errno.h is still not present int release 3.7. I will update the patch introducing the header file itself with the required change, and will abandon http://review.gluster.org/#/c/11008 Thanks avra. Seems like I cloned a branch from master but named it 3.7 haha :-D. Pranith Regards, Avra On 05/30/2015 08:29 AM, Pranith Kumar Karampuri wrote: On 05/30/2015 08:11 AM, Pranith Kumar Karampuri wrote: On 05/30/2015 08:10 AM, Pranith Kumar Karampuri wrote: I see that kaleb already sent a patch for this: http://review.gluster.org/#/c/11007 - master http://review.gluster.org/#/c/11008 - NetBSD I meant http://review.gluster.org/#/c/11008 for release-3.7 :-) This fails in smoke with the following failure :-(. make[4]: *** No rule to make target `glusterd-errno.h', needed by `all-am'. Stop. make[4]: *** Waiting for unfinished jobs On my laptop it succeeds though :-/. Any clues? Pranith Pranith I am going to abandon my patch. Pranith On 05/30/2015 07:54 AM, Pranith Kumar Karampuri wrote: On 05/30/2015 07:44 AM, Pranith Kumar Karampuri wrote: On 05/30/2015 07:33 AM, Nagaprasad Sathyanarayana wrote: It appears to me that glusterd-errno.h was added in the patch http://review.gluster.org/10313, which was merged on 29th. Please correct me if I am wrong. I think it is supposed to be added to Makefile as well. Let me do some testing. http://review.gluster.org/11010 fixes this. Thanks a lot Naga :-) Pranith Pranith Thanks Naga - Original Message - From: Nagaprasad Sathyanarayana nsath...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org Sent: Saturday, May 30, 2015 7:23:21 AM Subject: Re: [Gluster-devel] gluster builds are failing in rpmbuilding Could it be due to the compilation errors? http://build.gluster.org/job/glusterfs-devrpms-el6/9019/ : glusterd-locks.c:24:28: error: glusterd-errno.h: No such file or directory CC glusterd_la-glusterd-mgmt-handler.lo glusterd-locks.c: In function 'glusterd_mgmt_v3_lock': glusterd-locks.c:557: error: 'EANOTRANS' undeclared (first use in this function) glusterd-locks.c:557: error: (Each undeclared identifier is reported only once glusterd-locks.c:557: error: for each function it appears in.) make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1 make[5]: *** Waiting for unfinished jobs make[4]: *** [all-recursive] Error 1 make[3]: *** [all-recursive] Error 1 make[2]: *** [all-recursive] Error 1 make[1]: *** [all-recursive] Error 1 make: *** [all] Error 2 RPM build errors: error: Bad exit status from /var/tmp/rpm-tmp.E46NjW (%build) Bad exit status from /var/tmp/rpm-tmp.E46NjW (%build) Child return code was: 1 http://build.gluster.org/job/glusterfs-devrpms/9141/ : glusterd-locks.c:24:28: fatal error: glusterd-errno.h: No such file or directory #include glusterd-errno.h ^ compilation terminated. CC glusterd_la-glusterd-mgmt-handler.lo CC glusterd_la-glusterd-mgmt.lo make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1 http://build.gluster.org/job/glusterfs-devrpms-el7/2179/ : glusterd-locks.c:24:28: fatal error: glusterd-errno.h: No such file or directory #include glusterd-errno.h ^ compilation terminated. CC glusterd_la-glusterd-mgmt.lo make[5]: *** [glusterd_la-glusterd-locks.lo] Error 1 make[5]: *** Waiting for unfinished jobs glusterd-mgmt.c:26:28: fatal error: glusterd-errno.h: No such file or directory #include glusterd-errno.h ^ compilation terminated. 
Thanks Naga - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Gluster Devel gluster-devel@gluster.org Sent: Saturday, May 30, 2015 6:57:41 AM Subject: [Gluster-devel] gluster builds are failing in rpmbuilding hi, I don't understand rpmbuild logs that well. But the following seems to be the issue: Start: build phase for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm Start: build setup for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm Finish: build setup for glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm Start: rpmbuild glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm ERROR: Exception(glusterfs-3.8dev-0.314.git471b2e0.el6.src.rpm) Config(epel-6-x86_64) 1 minutes 5 seconds Please feel free to take a look at the following links for sample runs: http://build.gluster.org/job/glusterfs-devrpms-el6/9019/console http://build.gluster.org/job/glusterfs-devrpms/9141/console http://build.gluster.org/job/glusterfs-devrpms-el7/2179/console Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http
Re: [Gluster-devel] New regression failure with EC
I am looking into it. Pranith On 05/28/2015 11:03 AM, Kaushal M wrote: Got an EC test failure (./tests/bugs/disperse/bug-1161621.t) on http://build.gluster.org/job/rackspace-regression-2GB-triggered/9628/consoleFull The change being tested was a pure GlusterD change, so this is most likely a (new?) spurious failure. ~kaushal ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] ec/self-heal.t failure
On 06/02/2015 10:40 AM, Krishnan Parthasarathi wrote: ec/self-heal.t failed regression reporting: not ok 71 Got -rw--- instead of -rw-r--r-- (regression had passed with earlier patchset). Console output is: http://build.gluster.org/job/rackspace-regression-2GB-triggered/9881/consoleFull Mind having a look? ec/self-heal.t failed for me on release-3.7 branch. See http://build.gluster.org/job/rackspace-regression-2GB-triggered/9978/consoleFull snip ./tests/basic/ec/self-heal.t (Wstat: 0 Tests: 257 Failed: 4) Failed tests: 104, 115, 148, 159 /snip Would it help if we added this to is_bad_test() until it's root caused? http://review.gluster.org/11018 is the fix on master. http://review.gluster.org/11027 on release-3.7 Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] spurious failures tests/bugs/tier/bug-1205545-CTR-and-trash-integration.t
hi, http://build.gluster.org/job/rackspace-regression-2GB-triggered/11757/consoleFull has the logs. Could you please look into it. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Unable to send patches to review.gluster.org
I get the following error: error: unpack failed: error No space left on device fatal: Unpack error, check server log Pranith On 07/02/2015 09:58 AM, Atin Mukherjee wrote: + Infra, can any one of you just take a look at it? On 07/02/2015 09:53 AM, Anuradha Talur wrote: Hi, I'm unable to send patches to r.g.o, also not able to login. I'm getting the following errors respectively: 1) Permission denied (publickey). fatal: Could not read from remote repository. Please make sure you have the correct access rights and the repository exists. 2) Internal server error or forbidden access. Is anyone else facing the same issue? ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t
Thanks, Dan! Pranith On 07/02/2015 06:14 PM, Dan Lambright wrote: I'll check on this. - Original Message - From: Pranith Kumar Karampuri pkara...@redhat.com To: Gluster Devel gluster-devel@gluster.org, Joseph Fernandes josfe...@redhat.com Sent: Thursday, July 2, 2015 5:40:34 AM Subject: [Gluster-devel] Failure in tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t hi Joseph, Could you take a look at http://build.gluster.org/job/rackspace-regression-2GB-triggered/11842/consoleFull Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Mount hangs because of connection delays
hi, When the glusterfs mount process is coming up, all cluster xlators wait for at least one event from all of their children before propagating the status upwards. Sometimes the client xlator takes up to 2 minutes to propagate this event (https://bugzilla.redhat.com/show_bug.cgi?id=1054694#c0). Due to this, Xavi implemented a timer in ec's notify where we treat a child as down if it doesn't come up in 10 seconds. A similar patch went up for review @http://review.gluster.org/#/c/3 for afr. Kritika raised an interesting point in the review: all cluster xlators need to have this logic for the mount to not hang, so the correct place to fix it would be the client xlator itself, i.e. add the timer logic in the client xlator, which seems like a better approach. I just want to take inputs from everyone before we go ahead in that direction, i.e. on PARENT_UP the client xlator will start a timer, and if no rpc notification is received within that timeout it treats the client xlator as down. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
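To make the proposal concrete, here is a minimal standalone sketch of the timer pattern being discussed (plain pthreads, not actual xlator or gf_timer code; all names and the 10-second value are illustrative): arm a timeout when the parent comes up, and if no RPC notification arrives before it fires, report the child as down so parents can propagate status instead of hanging.
```
/* Standalone sketch of the proposed pattern (compile with -pthread):
 * on PARENT_UP, arm a timer; if no connection notification arrives
 * within the timeout, treat the child as down so parent xlators can
 * propagate status instead of waiting indefinitely. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

typedef struct {
    pthread_mutex_t lock;
    bool            event_seen;   /* set when the rpc notification arrives */
    int             timeout_sec;  /* e.g. 10s, as ec does today            */
} child_state_t;

static void *timeout_thread(void *arg)
{
    child_state_t *cs = arg;
    sleep(cs->timeout_sec);
    pthread_mutex_lock(&cs->lock);
    if (!cs->event_seen)
        printf("no rpc event within %ds: reporting CHILD_DOWN\n",
               cs->timeout_sec);
    pthread_mutex_unlock(&cs->lock);
    return NULL;
}

/* called when the real connection notification arrives */
static void on_rpc_event(child_state_t *cs)
{
    pthread_mutex_lock(&cs->lock);
    cs->event_seen = true;
    printf("rpc event received: reporting CHILD_UP\n");
    pthread_mutex_unlock(&cs->lock);
}

int main(void)
{
    child_state_t cs = { PTHREAD_MUTEX_INITIALIZER, false, 10 };
    pthread_t t;

    /* PARENT_UP handling would arm the timer like this; real code would
     * also cancel the timer once the event is seen */
    pthread_create(&t, NULL, timeout_thread, &cs);

    /* simulate a connection that comes up after 2 seconds; remove these
     * two lines to exercise the timeout path instead */
    sleep(2);
    on_rpc_event(&cs);

    pthread_join(t, NULL);
    return 0;
}
```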
[Gluster-devel] regarding mem_0filled, iov_0filled and memdup
hi, These functions return 0 when 0filled and a non-zero value when not 0filled. This is quite unintuitive, as people think they should return _gf_true when 0filled and _gf_false when not 0filled. This comes up as a bug in reviews quite a few times, so I decided maybe it is better to change the API itself. What do you say? Along the same lines is memdup. It is a function in common-utils which does GF_CALLOC, so the memory needs to be freed with GF_FREE. But since it sounds so much like a standard API, I have seen people do free instead of GF_FREE. Maybe it is better to change it to gf_memdup? Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
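A small self-contained illustration of the confusion follows. The demo_* functions merely mimic the return convention described above; the boolean wrapper is a suggestion, not an existing API.
```
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Same convention as the helper being discussed:
 * returns 0 when the buffer is entirely zero, non-zero otherwise. */
static int demo_mem_0filled(const char *buf, size_t size)
{
    for (size_t i = 0; i < size; i++)
        if (buf[i])
            return 1;
    return 0;
}

/* A boolean wrapper (hypothetical name) that makes intent obvious. */
static bool demo_is_mem_0filled(const char *buf, size_t size)
{
    return demo_mem_0filled(buf, size) == 0;
}

int main(void)
{
    char zeros[8] = {0};

    /* The misreading reviewers keep flagging: this condition is false
     * for an all-zero buffer, even though many readers expect true. */
    if (demo_mem_0filled(zeros, sizeof(zeros)))
        printf("not zero-filled\n");

    /* The intuitive form after the proposed API change. */
    if (demo_is_mem_0filled(zeros, sizeof(zeros)))
        printf("zero-filled\n");

    return 0;
}
```
The memdup half of the problem is analogous: a gf_memdup name would signal that the buffer came from GF_CALLOC and must be released with GF_FREE rather than free.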
[Gluster-devel] tests/bugs/fuse/bug-924726.t spurious failure
hi, tests/bugs/fuse/bug-924726.t failed in http://build.gluster.org/job/rackspace-regression-2GB-triggered/9553/consoleFull Could you take a look. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Serialization of fops acting on same dentry on server
+ Ravi, Anuradha On 08/17/2015 10:39 AM, Raghavendra Gowdappa wrote: All, Pranith and I were discussing the implementation of compound operations like create + lock, mkdir + lock, open + lock etc. These operations are useful in situations like: 1. To prevent locking on all subvols during directory creation as part of self heal in dht. Currently we follow the approach of locking _all_ subvols in both rmdir and lookup-heal [1]. 2. To lock a file in advance so that there is less performance hit during transactions in afr. While thinking about implementing such compound operations, it occurred to me that one of the problems would be how we handle a racing mkdir/create and a (named lookup - simply referred to as lookup from now on - followed by lock). This is because 1. creation of the directory/file on the backend and 2. linking of the inode with the gfid corresponding to that file/directory are not atomic. It is not guaranteed that the inode passed down during the mkdir/create call is the one that survives in the inode table. Since the posix-locks xlator maintains all the lock-state in the inode, it would be a problem if a different inode is linked in the inode table than the one passed during mkdir/create. One way to solve this problem is to serialize fops (like mkdir/create, lookup, rename, rmdir, unlink) that are happening on a particular dentry. This serialization would also solve other bugs like: 1. issues solved by [2][3] and possibly many such issues. 2. Stale dentries left out in bricks' inode table because of a racing lookup and dentry modification ops (like rmdir, unlink, rename etc). The initial idea I have now is to maintain the fops in progress on a dentry in the parent inode (maybe in the resolver code in protocol/server). Based on this we can serialize the operations. Since we need to serialize _only_ operations on a dentry (we don't serialize nameless lookups), it is guaranteed that we always have a parent inode. Any comments/discussion on this would be appreciated. [1] http://review.gluster.org/11725 [2] http://review.gluster.org/9913 [3] http://review.gluster.org/5240 regards, Raghavendra. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
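A rough standalone sketch of the per-parent serialization idea is below (plain pthreads; structure and function names are invented for illustration and are not protocol/server code). The point is only the shape: the parent directory tracks which dentry names have an operation in flight, and a second operation on the same name waits until the first one, including gfid/inode linking, completes.
```
/* Compile with -pthread. Illustrative only. */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_INFLIGHT 16
#define NAME_LEN     256

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    char inflight[MAX_INFLIGHT][NAME_LEN]; /* names with an fop in progress */
    int  count;
} parent_ctx_t;

static bool name_inflight(parent_ctx_t *p, const char *name)
{
    for (int i = 0; i < p->count; i++)
        if (strcmp(p->inflight[i], name) == 0)
            return true;
    return false;
}

/* resolver would call this before mkdir/create/rename/unlink/lookup(name) */
static void dentry_op_enter(parent_ctx_t *p, const char *name)
{
    pthread_mutex_lock(&p->lock);
    while (name_inflight(p, name) || p->count == MAX_INFLIGHT)
        pthread_cond_wait(&p->cond, &p->lock);
    snprintf(p->inflight[p->count++], NAME_LEN, "%s", name);
    pthread_mutex_unlock(&p->lock);
}

/* ...and this after the fop and the gfid/inode linking are both done */
static void dentry_op_exit(parent_ctx_t *p, const char *name)
{
    pthread_mutex_lock(&p->lock);
    for (int i = 0; i < p->count; i++) {
        if (strcmp(p->inflight[i], name) == 0) {
            memmove(p->inflight[i], p->inflight[i + 1],
                    (size_t)(p->count - i - 1) * sizeof(p->inflight[0]));
            p->count--;
            break;
        }
    }
    pthread_cond_broadcast(&p->cond);
    pthread_mutex_unlock(&p->lock);
}

int main(void)
{
    parent_ctx_t p = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER,
                       {{0}}, 0 };

    dentry_op_enter(&p, "dir1");   /* e.g. mkdir("dir1") starts            */
    /* a racing lookup("dir1") calling dentry_op_enter() would block here  */
    dentry_op_exit(&p, "dir1");    /* mkdir done and inode linked          */
    printf("operations on the same dentry are serialized\n");
    return 0;
}
```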
Re: [Gluster-devel] Justin's last day at Red Hat today ;)
All the best Justin! Pranith On 06/30/2015 08:11 PM, Justin Clift wrote: Hi us, It's my last day at Red Hat today, so I've just adjusted the jus...@gluster.org email address to redirect things to jus...@postgresql.org instead. So, people can still email me. I do have some Gluster things I'd like to finish off, it's just I need a bit of a break first. ;) + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] reviving spurious failures tracking
hi, I just updated https://public.pad.fsfe.org/p/gluster-spurious-failures with the latest spurious failures we saw in Linux and NetBSD regressions. Could you update it with any more spurious regressions you are observing that are not listed on the pad? Could you also help in fixing these issues fast, as the number of failures is increasing quite a bit nowadays. Tests to be fixed (Linux): tests/bugs/distribute/bug-1066798.t (http://build.gluster.org/job/rackspace-regression-2GB-triggered/12908/console) (http://build.gluster.org/job/rackspace-regression-2GB-triggered/12907/console) tests/bitrot/bug-1244613.t (http://build.gluster.org/job/rackspace-regression-2GB-triggered/12906/console) tests/bugs/snapshot/bug-1109889.t (http://build.gluster.org/job/rackspace-regression-2GB-triggered/12905/console) tests/bugs/replicate/bug-1238508-self-heal.t (http://build.gluster.org/job/rackspace-regression-2GB-triggered/12904/console) tests/basic/nufa.t (http://build.gluster.org/job/rackspace-regression-2GB-triggered/12902/console) On NetBSD: tests/basic/mount-nfs-auth.t (http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/8796/console) tests/basic/tier/tier-attach-many.t (http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/8789/console) tests/basic/afr/arbiter.t (http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/8785/console) tests/basic/tier/bug-1214222-directories_miising_after_attach_tier.t (http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/8784/console) tests/basic/quota.t (http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/8780/console) The first step is for the respective developers to move the tests above to the "Tests being looked at (please put your name against the test you are looking into):" section. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] reviving spurious failures tracking
On 07/29/2015 06:10 PM, Emmanuel Dreyfus wrote: On Wed, Jul 29, 2015 at 04:06:43PM +0530, Vijay Bellur wrote: - If there are tests that cannot be fixed easily in the near term, we move such tests to a different folder or drop such test units. A tests/disabled directory seems the way to go. But before going there, the test maintainer should be notified. Perhaps we should have a list of contacts in a comment on the top of each test? Jeff already implemented the bad-tests infra. We can use the same? Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] pluggability of some aspects in afr/nsr/ec
hi, I want to understand how you are planning to integrate NSR volumes into the existing CLIs. Here are some thoughts I had; I wanted to know yours: At the heart of both the replication/ec schemes we have 1) synchronization mechanisms a) afr,ec do it using locks b) nsr does it using leader election 2) Metadata to figure out the healing/reconciliation aspects a) afr,ec do it using xattrs b) nsr does it using journals I want to understand if there is a possibility of exposing these as different modules that we can mix and match, using options. If the users choose 1b, 2b it becomes nsr, and 1a, 2a becomes afr/ec. In the future, if we come up with better metadata journals/stores, it should be easy to plug them in; that's what I'm thinking. The idea is that, based on the workload, users should be able to decide which pair of synchronization/metadata works best for them (or we can also recommend based on our tests). Wanted to seek your inputs. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] pluggability of some aspects in afr/nsr/ec
On 10/29/2015 12:18 PM, Venky Shankar wrote: On Thu, Oct 29, 2015 at 11:36 AM, Pranith Kumar Karampuri <pkara...@redhat.com> wrote: hi, I want to understand how are you guys planning to integrate NSR volumes to the existing CLIs. Here are some thoughts I had, wanted to know your thoughts: At the heart of both the replication/ec schemes we have 1) synchronization mechanisms a) afr,ec does it using locks b) nsr does it using leader election 2) Metadata to figure out the healing/reconciliation aspects a) afr,ec does it using xattrs b) nsr does it using journals I want to understand if there is a possibility of exposing these as different modules that we can mix and match, using options. If the users Do you mean abstracting it out during volume creation? At a high level this could be in the form of client or server side replication. Not that AFR cannot be used on the server side (you'd know better than me), but, if at all this level of abstraction is used, we'd need to default to what fits best in what use case (as you already mentioned below) but still retaining the flexibility to override it. precisely. I think switching is not that difficult once we make sure healing is complete. Switching is a rare operation IMO so we can probably ask the users to do stop/choose-new-value/start the volume after choosing the options. This way is simpler than to migrate between the volumes where you have to probably copy the data. Pranith choose 1b, 2b it becomes nsr and 1a, 2a becomes afr/ec. In future if we come up with better metadata journals/stores it should be easy to plug them is what I'm thinking. The idea I have is based on the workload, users should be able to decide which pair of synchronization/metadata works best for them (Or we can also recommend based on our tests). Wanted to seek your inputs. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] pluggability of some aspects in afr/nsr/ec
On 10/29/2015 06:11 PM, Jeff Darcy wrote: I want to understand if there is a possibility of exposing these as different modules that we can mix and match, using options. It’s not only possible, but it’s easier than you might think. If an option is set (cluster.nsr IIRC) then we replace cluster/afr with cluster/nsr-client and then add some translators to the server-side stack. A year ago that was just one nsr-server translator. The journaling part has already been split out, and I plan to do the same with the leader-election parts (making them usable for server-side AFR or EC) as well. It shouldn’t be hard to control the addition and removal of these and related translators (e.g. index) with multiple options instead of just one. The biggest stumbling block I’ve actually hit when trying to do this with AFR on the server side is the *tests*, many of which can’t handle delays on the client side while the server side elects leaders and cross-connects peers. That’s all solvable. It just would have taken more time than I had available for the experiment. precisely. I think switching is not that difficult once we make sure healing is complete. Switching is a rare operation IMO so we can probably ask the users to do stop/choose-new-value/start the volume after choosing the options. This way is simpler than to migrate between the volumes where you have to probably copy the data. The two sets of metadata are *entirely* disjoint, which puts us in a good position compared e.g. to DHT/tiering which had overlaps. As long as the bricks are “clean” switching back and forth should be simple. In fact I expect to do this a lot when we get to characterizing performance etc. Good to hear this. choose 1b, 2b it becomes nsr and 1a, 2a becomes afr/ec. In future if we come up with better metadata journals/stores it should be easy to plug them is what I'm thinking. The idea I have is based on the workload, users should be able to decide which pair of synchronization/metadata works best for them (Or we can also recommend based on our tests). Wanted to seek your inputs. Absolutely. As I’m sure you’re tired of hearing, I believe NSR will outperform AFR by a significant margin for most workloads and configurations. I wouldn’t be the project’s initiator/leader if I didn’t believe that, but I’m OK if others disagree. We’ll find out eventually. ;) More importantly, “most” is still not “all”. Even by my own reckoning, there are cases in which AFR will perform better or be preferable for other reasons. EC’s durability and space-efficiency advantages make an even stronger case for preserving both kinds of data paths and metadata arrangements. That’s precisely why I want to make the journaling and leader-election parts more generic. All the best for your endeavors! Lets make users happy. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failure in ./tests/basic/ec/ec-readdir.t
Thanks Gaurav, Xavi is already looking into it. Meanwhile a patch to mark it as a bad test is already posted for review: http://review.gluster.org/#/c/12481/ Pranith On 11/02/2015 06:21 PM, Gaurav Garg wrote: Hi, the ./tests/basic/ec/ec-readdir.t test case seems to be a spurious failure in ec: https://build.gluster.org/job/rackspace-regression-2GB-triggered/15395/consoleFull https://build.gluster.org/job/rackspace-regression-2GB-triggered/15388/consoleFull https://build.gluster.org/job/rackspace-regression-2GB-triggered/15386/consoleFull ccing ec team members. Thanx, ~Gaurav ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] glusterfs-3.7.5 released
Hi all, I'm pleased to announce the release of GlusterFS-3.7.5. This release includes 70 changes after 3.7.4. The list of fixed bugs is included below. Tarball and RPMs can be downloaded from http://download.gluster.org/pub/gluster/glusterfs/3.7/3.7.5/ Ubuntu debs are available from https://launchpad.net/~gluster/+archive/ubuntu/glusterfs-3.7 Debian Unstable (sid) packages have been updated and should be available from default repos. NetBSD has updated ports at ftp://ftp.netbsd.org/pub/pkgsrc/current/pkgsrc/filesystems/glusterfs/README.html Upgrade notes from 3.7.2 and earlier GlusterFS uses insecure ports by default from release v3.7.3. This causes problems when upgrading from release 3.7.2 and below to 3.7.3 and above. Performing the following steps before upgrading helps avoid problems. - Enable insecure ports for all volumes. ``` gluster volume set server.allow-insecure on gluster volume set client.bind-insecure on ``` - Enable insecure ports for GlusterD. Set the following line in `/etc/glusterfs/glusterd.vol` ``` option rpc-auth-allow-insecure on ``` This needs to be done on all the members in the cluster. Fixed bugs == 1258313 - Start self-heal and display correct heal info after replace brick 1268804 - Test tests/bugs/shard/bug-1245547.t failing consistently when run with patch http://review.gluster.org/#/c/11938/ 1261234 - Possible memory leak during rebalance with large quantity of files 1259697 - Disperse volume: Huge memory leak of glusterfsd process 1267817 - No quota API to get real hard-limit value. 1267822 - Have a way to disable readdirp on dht from glusterd volume set command 1267823 - Perf: Getting bad performance while doing ls 1267532 - Data Tiering:CLI crashes with segmentation fault when user tries "gluster v tier" command 1267149 - Perf: Getting bad performance while doing ls 1266822 - Add more logs in failure code paths + port existing messages to the msg-id framework 1262335 - Fix invalid logic in tier.t 1251821 - /usr/lib/glusterfs/ganesha/ganesha_ha.sh is distro specific 1258338 - Data Tiering: Tiering related information is not displayed in gluster volume info xml output 1266872 - FOP handling during file migration is broken in the release-3.7 branch. 
1266882 - RFE: posix: xattrop 'GF_XATTROP_ADD_DEF_ARRAY' implementation 1246397 - POSIX ACLs as used by a FUSE mount can not use more than 32 groups 1265633 - AFR : "gluster volume heal dest=:1.65 reply_serial=2" 1265890 - rm command fails with "Transport end point not connected" during add brick 1261444 - cli : volume start will create/overwrite ganesha export file 1258347 - Data Tiering: Tiering related information is not displayed in gluster volume status xml output 1258340 - Data Tiering:Volume task status showing as remove brick when detach tier is trigger 1260919 - Quota+Rebalance : While rebalance is in progress , quota list shows 'Used Space' more than the Hard Limit set 1264738 - 'gluster v tier/attach-tier/detach-tier help' command shows the usage, and then throws 'Tier command failed' error message 1262700 - DHT + rebalance :- file permission got changed (sticky bit and setgid is set) after file migration failure 1263191 - Error not propagated correctly if selfheal layout lock fails 1258244 - Data Tieirng:Change error message as detach-tier error message throws as "remove-brick" 1263746 - Data Tiering:Setting only promote frequency and no demote frequency causes crash 1262408 - Data Tieirng:Detach tier status shows number of failures even when all files are migrated successfully 1262547 - `getfattr -n replica.split-brain-status ' command hung on the mount 1262547 - `getfattr -n replica.split-brain-status ' command hung on the mount 1262344 - quota: numbers of warning messages in nfs.log a single file itself 1260858 - glusterd: volume status backward compatibility 1261742 - Tier: glusterd crash when trying to detach , when hot tier is having exactly one brick and cold tier is of replica type 1262197 - DHT: Few files are missing after remove-brick operation 1261008 - Do not expose internal sharding xattrs to the application. 1262341 - Database locking due to write contention between CTR sql connection and tier migrator sql connection 1261715 - [HC] Fuse mount crashes, when client-quorum is not met 1260511 - fuse client crashed during i/o 1261664 - Tiering status command is very cumbersome. 1259694 - Data Tiering:Regression:Commit of detach tier passes without directly without even issuing a detach tier start 1260859 - snapshot: from nfs-ganesha mount no content seen in .snaps/ directory 1260856 - xml output for volume status on tiered volume 1260593 - man or info page of gluster needs to be updated with self-heal commands. 1257394 - Provide more meaningful errors on peer probe and peer detach 1258769 - Porting log messages to new framework 1255110 - client is sending io to arbiter with replica 2 1259652 - quota test 'quota-nfs.t'
Re: [Gluster-devel] Backup support for GlusterFS
Probably a good question on gluster-users (CCed) Pranith On 10/14/2015 03:57 AM, Brian Lahoue wrote: Has anyone tested backing up a fairly large Gluster implementation with Amanda/ZManda recently? ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] [release-3.7] seeing multiple crashes with 3.7 linux regression
On 10/07/2015 05:48 PM, Pranith Kumar Karampuri wrote: Sent the fix @http://review.gluster.org/12309 This fixes the afr issue. Will take a look at the other crash. Pranith On 10/07/2015 05:38 PM, Vijaikumar Mallikarjuna wrote: *https://build.gluster.org/job/rackspace-regression-2GB-triggered/14753/consoleFull* #gdb -ex 'set sysroot ./' -ex 'core-file ./build/install/cores/core.7122' /build/install/sbin/glusterfsd Program terminated with signal SIGSEGV, Segmentation fault. #0 0x00407eae in emancipate (ctx=0x0, ret=-1) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/glusterfsd/src/glusterfsd.c:1329 1329/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/glusterfsd/src/glusterfsd.c: No such file or directory. (gdb) bt #0 0x00407eae in emancipate (ctx=0x0, ret=-1) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/glusterfsd/src/glusterfsd.c:1329 #1 0x0040f806 in mgmt_pmap_signin_cbk (req=0x7f51e806fdec, iov=0x7f51eece15e0, count=1, myframe=0x7f51e806ee1c) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/glusterfsd/src/glusterfsd-mgmt.c:2174 #2 0x7f51fab1a6c7 in saved_frames_unwind (saved_frames=0x1535fb0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-clnt.c:366 #3 0x7f51fab1a766 in saved_frames_destroy (frames=0x1535fb0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-clnt.c:383 #4 0x7f51fab1abf8 in rpc_clnt_connection_cleanup (conn=0x1534b70) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-clnt.c:536 #5 0x7f51fab1b670 in rpc_clnt_notify (trans=0x1534fc0, mydata=0x1534b70, event=RPC_TRANSPORT_DISCONNECT, data=0x1534fc0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-clnt.c:856 #6 0x7f51fab17af3 in rpc_transport_notify (this=0x1534fc0, event=RPC_TRANSPORT_DISCONNECT, data=0x1534fc0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-lib/src/rpc-transport.c:544 #7 0x7f51f0305621 in socket_event_poll_err (this=0x1534fc0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-transport/socket/src/socket.c:1151 #8 0x7f51f030a34c in socket_event_handler (fd=9, idx=1, data=0x1534fc0, poll_in=1, poll_out=0, poll_err=24) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/rpc/rpc-transport/socket/src/socket.c:2356 #9 0x7f51fadcb7c0 in event_dispatch_epoll_handler (event_pool=0x14fac90, event=0x7f51eece1e70) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:575 #10 0x7f51fadcbbae in event_dispatch_epoll_worker (data=0x1536180) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/libglusterfs/src/event-epoll.c:678 #11 0x7f51fa032a51 in start_thread () from ./lib64/libpthread.so.0 #12 0x7f51f999c93d in clone () from ./lib64/libc.so.6 *https://build.gluster.org/job/rackspace-regression-2GB-triggered/14748/consoleFull* #gdb -ex 'set sysroot ./' -ex 'core-file ./build/install/cores/core.25320' ./build/install/sbin/glusterfs #0 0x7fae978ccb0f in afr_local_replies_wipe (local=0x0, priv=0x7fae900125b0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-common.c:1241 1241/home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-common.c: No such file or directory. 
(gdb) bt #0 0x7fae978ccb0f in afr_local_replies_wipe (local=0x0, priv=0x7fae900125b0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-common.c:1241 #1 0x7fae978b7aaf in afr_selfheal_inodelk (frame=0x7fae8c000c0c, this=0x7fae9000a6d0, inode=0x7fae8c00609c, dom=0x7fae900099f0 "patchy-replicate-0", off=8126464, size=131072, locked_on=0x7fae96b4f110 "") at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-common.c:879 #2 0x7fae978bbeb5 in afr_selfheal_data_block (frame=0x7fae8c000c0c, this=0x7fae9000a6d0, fd=0x7fae8c006e6c, source=0, healed_sinks=0x7fae96b4f8a0 "", offset=8126464, size=131072, type=1, replies=0x7fae96b4f2b0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-data.c:243 #3 0x7fae978bc91d in afr_selfheal_data_do (frame=0x7fae8c006c9c, this=0x7fae9000a6d0, fd=0x7fae8c006e6c, source=0, healed_sinks=0x7fae96b4f8a0 "", replies=0x7fae96b4f2b0) at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/xlators/cluster/afr/src/afr-self-heal-data.c:365 #4 0x7fae978bdc7b in __afr_selfheal_data (frame=0x7fae8c006c9c, this=0x7fae9000a6d0, fd=0x7fae8c006e6c, locked_on=0x7fae96b4fa00 "\001\001\240") at /home/jenkins/root/workspace/rackspace-regression-2GB-triggered/x
Re: [Gluster-devel] Automated bug workflow
On 07/07/2015 02:42 PM, Rafi Kavungal Chundattu Parambil wrote: Since we have some common interest in the proposed design, IMHO let's start doing the implementation by keeping all of these valuable suggestions in mind. If anyone is interested in volunteering for this project, please reply to this thread. I really want to contribute to this, but I am tied up with other work till the end of this month. When are we trying to start this? Pranith Regards Rafi KC - Original Message - From: Shyam srang...@redhat.com To: Niels de Vos nde...@redhat.com, gluster-devel@gluster.org Sent: Friday, May 29, 2015 11:23:34 PM Subject: Re: [Gluster-devel] Automated bug workflow On 05/29/2015 12:51 PM, Niels de Vos wrote: Hi all, today we had a discussion about how to get the status of reported bugs more correct and up to date. It is something that has come up several times already, but now we have a BIG solution as Pranith calls it. The goal is rather simple, but it requires some thinking about rules and components that can actually take care of the automation. The general user-visible results would be: * rfc.sh will ask if this patch is the last one for the bug, or if more patches are expected * Gerrit will receive the patch with the answer, and modify the status of the bug to POST I like to do this manually. * when the patch is merged, Gerrit will change (or not) the status of the bug to MODIFIED I like to do this manually too... but automation does not hurt, esp. when I control when the bug moves to POST. * when a nightly build is made, all bugs that have patches included and the status of the bug is MODIFIED, the build script will change the status to ON_QA and set a fixed-in version This I would like automated, as I am not tracking when it was released (of sorts). But if I miss the nightly boat, I assume the automation would not pick this up; as a result, automation on the MODIFIED step is good, as that would take care of this miss for me. This is a simplified view, there are some other cases that we need to take care of. These are documented in the etherpad linked below. We value any input for this, Kaleb and Rafi already gave some, thanks! Please let us know over email or IRC and we'll update the etherpad. Overall, we can have all of this, but I guess I will possibly never use the POST automation and do that myself. Thanks, Pranith Niels Etherpad with detailed step by step actions to take: https://public.pad.fsfe.org/p/gluster-automated-bug-workflow IRC log, where the discussion started: https://botbot.me/freenode/gluster-dev/2015-05-29/?msg=40450336page=2 ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] spurious failures in tests/bugs/snapshot/bug-1109889.t
Sorry, seems like this is already fixed, I just need to rebase. Pranith On 07/09/2015 03:56 PM, Pranith Kumar Karampuri wrote: hi, Could you please look into http://build.gluster.org/job/rackspace-regression-2GB-triggered/12150/consoleFull Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] gfapi 3.6.3 QEMU 2.3 Ubuntu 14.04 testing
CC Prasanna who will be looking into it. On 07/06/2015 07:30 PM, Josh Boon wrote: Hey folks, Does anyone have test environment running Ubuntu 14.04, QEMU 2.0, and Gluster 3.6.3? I'm looking to have some folks test out QEMU 2.3 for stability and performance and see if it removes the segfault errors. Another group of folks are experiencing the same segfaults I still experience but looking over their logs my theory of it being related to a self-heal didn't work out. I've included the stack trace below from their environment which matches mine. I've already put together a PPA over at https://launchpad.net/~josh-boon/+archive/ubuntu/qemu-edge-glusterfs with QEMU 2.3 and deps built for trusty. If anyone has the time or the resources that I could get into I'd appreciate the support. I'd like to get this ironed out so I can give my full vote of confidence to Gluster again. Thanks, Josh Stack #0 0x7f369c95248c in ?? () No symbol table info available. #1 0x7f369bd2b3b1 in glfs_io_async_cbk (ret=optimized out, frame=optimized out, data=0x7f369ee536c0) at glfs-fops.c:598 gio = 0x7f369ee536c0 #2 0x7f369badb66a in syncopctx_setfspid (pid=0x7f369ee536c0) at syncop.c:191 opctx = 0x0 ret = -1 #3 0x00100011 in ?? () No symbol table info available. #4 0x7f36a5ae26b0 in ?? () No symbol table info available. #5 0x7f36a81e2800 in ?? () No symbol table info available. #6 0x7f36a5ae26b0 in ?? () No symbol table info available. #7 0x7f36a81e2800 in ?? () No symbol table info available. #8 0x in ?? () No symbol table info available. Full log attached. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] spurious failures in tests/bugs/snapshot/bug-1109889.t
hi, Could you please look into http://build.gluster.org/job/rackspace-regression-2GB-triggered/12150/consoleFull Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] context based defaults for volume options in glusterd
hi, Afr needs context based defaults for quorum, where by default the quorum value is 'none' for 2-way replica and 'auto' for 3-way replica. Anuradha sent http://review.gluster.org/11872 to fix the same. Maybe we can come up with a more generic solution. The present solution remembers the default in volinfo->options and also writes it to the store. So the default will be shown in "volume info" output, and if we want to change the defaults in future we will need to carefully think of all the things that could go wrong, especially peers getting rejected because of the md5sum mismatch. Another way to solve the same problem is to generate the default value of the vme-option based on the context of the volume when we have to write to the volfile. In this particular case, we need to generate the default as 'none' for a 2-way-replica volume and 'auto' for a 3-way-replica volume. We also need to consider this dynamic default value when handling the volume-get command. For implementing this, we can add a new member 'context_based_default_value_get()' (please feel free to come up with a better name for the function :-) ) to the vme-table, which can be invoked to get the default value; it takes at least the volinfo as a parameter, and .value is not set, i.e. implicitly .value will be NULL. This is based on an earlier design detail in the comment of the vme-table: * Fourth field is . In this context they are used to specify * a default. That is, even the volume dict doesn't have a value, * we proceed as if the default value were set for it. We just want to enhance the existing behavior with this proposed change. It seems more generic than the present solution in the patch. In future people can write their own implementations of context based default value generation following the same procedure. Let me know your comments. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
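A self-contained sketch of what the proposed vme-table member could look like follows. The demo_* types stand in for the real glusterd structures (which have many more fields); only the shape of the callback and how volgen/volume-get would resolve the default is the point here.
```
#include <stdio.h>

/* stand-ins for glusterd_volinfo_t and the vme-table entry */
typedef struct {
    int replica_count;
} demo_volinfo_t;

typedef struct {
    const char *key;
    const char *value;  /* static default; NULL means "context based" */
    /* proposed new member (name as suggested in the mail) */
    const char *(*context_based_default_value_get)(demo_volinfo_t *volinfo);
} demo_vme_t;

static const char *afr_quorum_default(demo_volinfo_t *volinfo)
{
    /* 'auto' for 3-way replica, 'none' for 2-way replica */
    return (volinfo->replica_count >= 3) ? "auto" : "none";
}

static demo_vme_t demo_table[] = {
    { "cluster.quorum-type", NULL, afr_quorum_default },
};

/* volgen and volume-get would both resolve defaults through this */
static const char *resolve_default(demo_vme_t *vme, demo_volinfo_t *volinfo)
{
    if (vme->value)
        return vme->value;
    if (vme->context_based_default_value_get)
        return vme->context_based_default_value_get(volinfo);
    return NULL;
}

int main(void)
{
    demo_volinfo_t replica2 = { 2 }, replica3 = { 3 };

    printf("replica 2 default: %s\n",
           resolve_default(&demo_table[0], &replica2));
    printf("replica 3 default: %s\n",
           resolve_default(&demo_table[0], &replica3));
    return 0;
}
```
Since the value is computed at volfile-generation and volume-get time, nothing extra needs to be persisted in the store, which is what keeps the checksum/peer-rejection concerns out of the picture.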
Re: [Gluster-devel] context based defaults for volume options in glusterd
On 09/01/2015 11:55 AM, Krishnan Parthasarathi wrote: - Original Message - hi, Afr needs context based defaults for quorum where by default quorum value is 'none' for 2-way replica and 'auto' for 3 way replica. Anuradha sent http://review.gluster.org/11872 to fix the same. May be we can come up with more generic solution. The present solution remembers the default in volinfo->options and also written to the store. So the default will be shown in "volume info" output and if we want to change the defaults in future we will need to carefully think of all the things that could go wrong especially peers getting rejected because of the md5sum mismatch. Another way to solve the same problem is to generate the default value of the vme-option based on the context of the volume when we have to write to the volfile. In this particular case, we need to generate default as 'none' for 2-way-replica-count volume. and 'auto' for 3-replica-count volume. For volume-get command handling also we need to consider this dynamic default value. For implementing this, we can add a new member 'context_based_default_value_get()'(please feel free to come up with better name for the function :-) ) to the vme-table which can be invoked to get the default option which takes the volinfo as parameter at least, and not set .value i.e implicitly .value will be NULL. This is based on earlier design detail in the comment of vme-table: * Fourth field is . In this context they are used to specify * a default. That is, even the volume dict doesn't have a value, * we procced as if the default value were set for it. We just want to enhance the existing behavior with this proposed change. It seems more generic than the present solution in the patch. In future people can write their own implementations of context based default value generation following same procedure. Let me know your comments. Here are a few things that are not clear to me. 1) Does the context-based default value for an option comes into effect only when .value in vme table is NULL? If there is context based default then the static default value should be NULL is my feeling. 2) IIUC, the generated default value is applied to the volume files generated and persisted no place else. Is this correct? Yes. 3) What happens if the context_based_default_get() is not available in all glusterds in the cluster? e.g, upgrade from 3.6 to 3.7.x (where this may land). Shouldn't this behaviour also be 'versioned' to prevent different volume files being served by different nodes of the cluster? In the context_based_default_value_get() we can add the version checks and generate it the way we want. Pranith Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] context based defaults for volume options in glusterd
On 09/01/2015 12:05 PM, Krishnan Parthasarathi wrote: Here are a few things that are not clear to me. 1) Does the context-based default value for an option comes into effect only when .value in vme table is NULL? If there is context based default then the static default value should be NULL is my feeling. 2) IIUC, the generated default value is applied to the volume files generated and persisted no place else. Is this correct? Yes. 3) What happens if the context_based_default_get() is not available in all glusterds in the cluster? e.g, upgrade from 3.6 to 3.7.x (where this may land). Shouldn't this behaviour also be 'versioned' to prevent different volume files being served by different nodes of the cluster? In the context_based_default_value_get() we can add the version checks and generate it the way we want. Hmm. We have op-version for the options in vme table against which we ensure that all servers generate the same volume files. What versions would the context based default value generator functions use? I'd recommend documenting these details of the proposal and send a PR to gluster-specs repository. This needs to be reviewed carefully with all the details available in one place. The version that context-based defaults will use depends on the version the change needs to go into. We will do one thing: we will add this proposal to the specs repo, and as an example we will give the link to the patch for afr quorum which implements this proposal. It will be very similar to the current implementation Anuradha came up with, minus storing the value in the glusterd store, from an op-version point of view. Pranith ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel