Re: [Gluster-devel] changelog bug
On Mon, Feb 08, 2016 at 12:53:33AM -0500, Manikandan Selvaganesh wrote:
> Thanks and as you have mentioned, I have no clue how my changes
> produced a core due to a NULL pointer in changelog.

It is probably an unrelated bug that was nice enough to pop up here. Too often people disregard NetBSD failures and just retrigger without looking at the cause, but the NetBSD regression has already proven its ability to expose bugs that are reluctant to surface in Linux regressions yet still exist on Linux.

-- Emmanuel Dreyfus m...@netbsd.org

___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] Rebalance data migration and corruption
----- Original Message -----
> From: "Joe Julian"
> To: gluster-devel@gluster.org
> Sent: Monday, February 8, 2016 12:20:27 PM
> Subject: Re: [Gluster-devel] Rebalance data migration and corruption
>
> Is this in current release versions?

Yes. This bug is present in currently released versions. However, it can happen only if writes from the application land on a file while it is being migrated, so roughly speaking the probability is low.

> On 02/07/2016 07:43 PM, Shyam wrote:
> > On 02/06/2016 06:36 PM, Raghavendra Gowdappa wrote:
> >> - Original Message -
> >>> From: "Raghavendra Gowdappa"
> >>> To: "Sakshi Bansal", "Susant Palai"
> >>> Cc: "Gluster Devel", "Nithya Balachandran", "Shyamsundar Ranganathan"
> >>> Sent: Friday, February 5, 2016 4:32:40 PM
> >>> Subject: Re: Rebalance data migration and corruption
> >>>
> >>> +gluster-devel
> >>>
> >>> Hi Sakshi/Susant,
> >>>
> >>> - There is a data corruption issue in migration code. The rebalance process,
> >>>   1. Reads data from src
> >>>   2. Writes (say w1) it to dst
> >>>
> >>>   However, 1 and 2 are not atomic, so another write (say w2) to the same
> >>>   region can happen between 1 and 2. These two writes can reach dst in the
> >>>   order (w2, w1), resulting in a subtle corruption. This issue is not fixed
> >>>   yet and can cause subtle data corruptions. The fix is simple and involves
> >>>   the rebalance process acquiring a mandatory lock to make 1 and 2 atomic.
> >>>
> >>> We can make use of the compound fop framework to make sure we don't
> >>> suffer a significant performance hit. The sequence of operations done by
> >>> the rebalance process will be:
> >>>
> >>> 1. issue a compound (mandatory lock, read) operation on src.
> >>> 2. write this data to dst.
> >>> 3. issue an unlock of the lock acquired in 1.
> >>>
> >>> Please co-ordinate with Anuradha for implementation of this compound fop.
> >>>
> >>> Following are the issues I see with this approach:
> >>> 1. features/locks provides mandatory lock functionality only for
> >>> posix-locks (flock and fcntl based locks). So, mandatory locks will be
> >>> posix-locks, which will conflict with locks held by the application. So,
> >>> if an application has held an fcntl/flock, migration cannot proceed.
> >>
> >> We can implement a "special" domain for mandatory internal locks.
> >> These locks will behave similarly to posix mandatory locks in that
> >> conflicting fops (like write, read) are blocked/failed if they are
> >> done while a lock is held.
> >>
> >>> 2. data migration will be less efficient because of an extra unlock
> >>> (with compound lock + read) or an extra lock and unlock (for a
> >>> non-compound-fop based implementation) for every read it does from src.
> >>
> >> Can we use delegations here? The rebalance process can acquire a
> >> mandatory-write-delegation (an exclusive lock with the property that the
> >> delegation is recalled when a write operation happens). In that case the
> >> rebalance process can do something like:
> >>
> >> 1. Acquire a read delegation for the entire file.
> >> 2. Migrate the entire file.
> >> 3. Remove/unlock/give back the delegation it has acquired.
> >>
> >> If a recall is issued from the brick (when a write happens from a mount),
> >> it completes the current write to dst (or throws away the read from src)
> >> to maintain atomicity. Before doing the next set of (read, src) and
> >> (write, dst) it tries to reacquire the lock.
> >
> > With delegations this simplifies the normal path, when a file is
> > exclusively handled by rebalance. It also improves the case where a
> > client and rebalance are conflicting on a file, degrading to mandatory
> > locks by either party.
> >
> > I would prefer we take the delegation route for such needs in the future.
> >
> >> @Soumyak, can something like this be done with delegations?
> >>
> >> @Pranith,
> >> Afr does transactions for writing to its subvols. Can you suggest any
> >> optimizations here so that the rebalance process can have a transaction
> >> for (read, src) and (write, dst) with minimal performance overhead?
> >>
> >> regards,
> >> Raghavendra.
> >>
> >>> Comments?
>
> regards,
> Raghavendra.
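The race described above, and the proposed lock-based fix, can be sketched as follows. This is an illustrative Python model, not the actual DHT migration code: the `Migrator` class and a `threading.Lock` standing in for the mandatory lock are assumptions made for the sketch.

```python
import threading

# Illustrative model of the rebalance fix (not GlusterFS code).
# Rebalance copies data from src to dst; a concurrent application
# write (w2) to the same region could otherwise land on dst between
# the rebalance read and write, then be overwritten by stale data
# (order w2, w1 on dst). A shared mandatory lock makes the rebalance
# (read src, write dst) pair atomic with respect to application writes.

class Migrator:
    def __init__(self):
        self.src = bytearray(b"old-data")
        self.dst = bytearray(len(self.src))
        # Stands in for the proposed mandatory lock: both the rebalance
        # copy and application writes must hold it.
        self.mandatory_lock = threading.Lock()

    def app_write(self, offset, data):
        with self.mandatory_lock:
            self.src[offset:offset + len(data)] = data
            # File under migration: the write is applied to dst as well.
            self.dst[offset:offset + len(data)] = data

    def migrate(self):
        with self.mandatory_lock:     # steps 1 and 2 made atomic
            chunk = bytes(self.src)   # 1. read from src
            self.dst[:] = chunk       # 2. write (w1) to dst

m = Migrator()
t = threading.Thread(target=m.app_write, args=(0, b"new-data"))
t.start()
m.migrate()
t.join()
# Whichever order the lock grants, dst never ends up with stale data.
print(m.dst.decode())
```

Either interleaving the lock permits leaves dst consistent with src, which is exactly the atomicity property the mandatory lock is meant to provide.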
Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t
On Mon, Feb 08, 2016 at 03:26:54PM +0530, Milind Changire wrote:
> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14089/consoleFull
>
> [08:44:20] ./tests/basic/afr/self-heald.t ..
> not ok 37 Got "0" instead of "1"
> not ok 52 Got "0" instead of "1"
> not ok 67
> Failed 4/83 subtests

There is a core, but it is from the NetBSD FUSE subsystem. The trace is not helpful, but it suggests an abort() call because of an unexpected situation:

Core was generated by `perfused'.
Program terminated with signal SIGABRT, Aborted.
#0  0xbb7574b7 in _lwp_kill () from /usr/lib/libc.so.12
(gdb) bt
#0  0xbb7574b7 in _lwp_kill () from /usr/lib/libc.so.12

/var/log/messages has a hint:
Feb 8 08:43:15 nbslave7c perfused: file write grow without resize

Indeed, I have this assertion in NetBSD FUSE to catch a race condition. I think it is the first time I have seen it raised, but I am unable to conclude on the cause. Let us retrigger (I did it) and see if someone else ever hits it again. The bug is more likely in NetBSD FUSE than in glusterfs.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t
On Monday, 08 February 2016 at 16:22 +0530, Pranith Kumar Karampuri wrote:
> On 02/08/2016 04:16 PM, Ravishankar N wrote:
> > [Removing Milind, adding Pranith]
> > On 02/08/2016 04:09 PM, Emmanuel Dreyfus wrote:
> >> On Mon, Feb 08, 2016 at 04:05:44PM +0530, Ravishankar N wrote:
> >>> The patch to add it to bad tests has already been merged, so I guess
> >>> this .t's failure won't pop up again.
> >> IMO that was a bit too quick.
> > I guess Pranith merged it because of last week's complaint for the
> > same .t and not wanting to block other patches from being merged.
> Yes, two people came to my desk and said their patches are blocked
> because of this. So I had to merge it until we figure out the problem.

I suspect it would be better if people used the list rather than coming to the desk, as it would help others who are absent, in another office, or not working in the same company to be aware of the issue. Next time this happens, can you direct people to gluster-devel?

-- Michael Scherer
Sysadmin, Community Infrastructure and Platform, OSAS
Re: [Gluster-devel] Rebalance data migration and corruption
On 02/08/2016 09:13 AM, Shyam wrote:
> On 02/06/2016 06:36 PM, Raghavendra Gowdappa wrote:
>> - Original Message -
>> From: "Raghavendra Gowdappa"
>> To: "Sakshi Bansal", "Susant Palai"
>> Cc: "Gluster Devel", "Nithya Balachandran", "Shyamsundar Ranganathan"
>> Sent: Friday, February 5, 2016 4:32:40 PM
>> Subject: Re: Rebalance data migration and corruption
>>
>> +gluster-devel
>>
>> Hi Sakshi/Susant,
>>
>> - There is a data corruption issue in migration code. The rebalance process,
>>   1. Reads data from src
>>   2. Writes (say w1) it to dst
>>   However, 1 and 2 are not atomic, so another write (say w2) to the same
>>   region can happen between 1 and 2. These two writes can reach dst in the
>>   order (w2, w1), resulting in a subtle corruption. This issue is not fixed
>>   yet and can cause subtle data corruptions. The fix is simple and involves
>>   the rebalance process acquiring a mandatory lock to make 1 and 2 atomic.
>>
>> We can make use of the compound fop framework to make sure we don't suffer
>> a significant performance hit. The sequence of operations done by the
>> rebalance process will be:
>> 1. issue a compound (mandatory lock, read) operation on src.
>> 2. write this data to dst.
>> 3. issue an unlock of the lock acquired in 1.
>> Please co-ordinate with Anuradha for implementation of this compound fop.
>>
>> Following are the issues I see with this approach:
>> 1. features/locks provides mandatory lock functionality only for
>> posix-locks (flock and fcntl based locks). So, mandatory locks will be
>> posix-locks, which will conflict with locks held by the application. So, if
>> an application has held an fcntl/flock, migration cannot proceed.

What if the file is opened with O_NONBLOCK? Can't the rebalance process skip the file and continue in case mandatory lock acquisition fails?

>> We can implement a "special" domain for mandatory internal locks. These
>> locks will behave similarly to posix mandatory locks in that conflicting
>> fops (like write, read) are blocked/failed if they are done while a lock
>> is held.

So the only difference between mandatory internal locks and posix mandatory locks is that internal locks shall not conflict with other application locks (advisory/mandatory)?

>> 2. data migration will be less efficient because of an extra unlock (with
>> compound lock + read) or an extra lock and unlock (for a non-compound-fop
>> based implementation) for every read it does from src.
>>
>> Can we use delegations here? The rebalance process can acquire a
>> mandatory-write-delegation (an exclusive lock with the property that the
>> delegation is recalled when a write operation happens). In that case the
>> rebalance process can do something like:
>> 1. Acquire a read delegation for the entire file.
>> 2. Migrate the entire file.
>> 3. Remove/unlock/give back the delegation it has acquired.
>> If a recall is issued from the brick (when a write happens from a mount),
>> it completes the current write to dst (or throws away the read from src)
>> to maintain atomicity. Before doing the next set of (read, src) and
>> (write, dst) it tries to reacquire the lock.
>
> With delegations this simplifies the normal path, when a file is
> exclusively handled by rebalance. It also improves the case where a client
> and rebalance are conflicting on a file, degrading to mandatory locks by
> either party.
>
> I would prefer we take the delegation route for such needs in the future.

Right. But if there are simultaneous accesses to the same file from any other client and the rebalance process, delegations shall not be granted, or revoked if granted, even though they are operating at different offsets. So if you rely only on delegations, migration may not proceed if an application has held a lock or is doing any I/O. Also, ideally the rebalance process has to take a write delegation, as it would end up writing the data on the destination brick, which shall affect READ I/Os (though of course we can have special checks/hacks for internally generated fops). That said, having delegations shall definitely ensure correctness with respect to exclusive file access.

Thanks,
Soumya

>> @Soumyak, can something like this be done with delegations?
>>
>> @Pranith,
>> Afr does transactions for writing to its subvols. Can you suggest any
>> optimizations here so that the rebalance process can have a transaction
>> for (read, src) and (write, dst) with minimal performance overhead?
>>
>> regards,
>> Raghavendra.
>>
>> Comments?
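The delegation flow discussed here (grant, recall on a conflicting client write, give-back after completing the in-flight transaction) can be modeled in a few lines. This is a sketch under stated assumptions, not brick code: `Delegation`, `recall`, and `migrate_with_delegation` are hypothetical names invented for the illustration.

```python
import threading

# Illustrative model of the proposed delegation scheme (not GlusterFS
# code). The brick grants rebalance a delegation on the whole file; a
# client write triggers a recall, and rebalance completes its current
# (read src, write dst) transaction before giving the delegation back,
# preserving atomicity without per-chunk locking in the common case.

class Delegation:
    def __init__(self):
        self.recalled = threading.Event()
        self.returned = threading.Event()

    def recall(self):
        """Called by the brick when a conflicting client write arrives."""
        self.recalled.set()
        self.returned.wait()  # brick blocks the write until the give-back

def migrate_with_delegation(chunks, deleg):
    migrated = []
    for chunk in chunks:
        migrated.append(chunk)        # one (read src, write dst) transaction
        if deleg.recalled.is_set():
            deleg.returned.set()      # finish the current chunk, give back
            return migrated, False    # caller reacquires and resumes later
    deleg.returned.set()              # migration finished: release normally
    return migrated, True

deleg = Delegation()
deleg.recalled.set()  # simulate a recall arriving during migration
done, complete = migrate_with_delegation([b"c1", b"c2", b"c3"], deleg)
print(done, complete)
```

The point of the model is the give-back discipline: rebalance never abandons a half-done transaction, so the client write that triggered the recall can only observe chunk boundaries.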
[Gluster-devel] Reviewers needed for NSR client and server patches
Hi,

We have two patches (mentioned below) for the NSR client and NSR server available. These patches provide the basic client and server functionality as described in the design (https://docs.google.com/document/d/1bbxwjUmKNhA08wTmqJGkVd_KNCyaAMhpzx4dswokyyA/edit?usp=sharing). It would be great if people interested could have a look at the patches and review them.

NSR Client patch : http://review.gluster.org/#/c/12388/
NSR Server patch : http://review.gluster.org/#/c/12705/

Regards,
Avra
[Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/quota-anon-fd-nfs.t, ./tests/basic/tier/fops-during-migration.t, ./tests/basic/tier/record-metadata-heat.t
http://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14096/consoleFull

[11:56:33] ./tests/basic/quota-anon-fd-nfs.t ..
not ok 21
not ok 22
not ok 24
not ok 26
not ok 28
not ok 30
not ok 32
not ok 34
not ok 36
Failed 9/40 subtests

[12:10:07] ./tests/basic/tier/fops-during-migration.t ..
not ok 22
Failed 1/22 subtests

[12:14:30] ./tests/basic/tier/record-metadata-heat.t ..
not ok 16 Got "no" instead of "yes"
Failed 1/18 subtests

Looks like some cores are available as well. Please advise.

-- Milind
[Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t
https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14089/consoleFull

[08:44:20] ./tests/basic/afr/self-heald.t ..
not ok 37 Got "0" instead of "1"
not ok 52 Got "0" instead of "1"
not ok 67
Failed 4/83 subtests

Please advise.

-- Milind
Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t
On 02/08/2016 04:16 PM, Ravishankar N wrote:
> [Removing Milind, adding Pranith]
> On 02/08/2016 04:09 PM, Emmanuel Dreyfus wrote:
>> On Mon, Feb 08, 2016 at 04:05:44PM +0530, Ravishankar N wrote:
>>> The patch to add it to bad tests has already been merged, so I guess
>>> this .t's failure won't pop up again.
>> IMO that was a bit too quick.
> I guess Pranith merged it because of last week's complaint for the same
> .t and not wanting to block other patches from being merged.

Yes, two people came to my desk and said their patches are blocked because of this. So I had to merge it until we figure out the problem.

Pranith

>> What is the procedure to get out of the list?
> Usually, you just fix the problem with the testcase and send a patch
> with the fix, removing it from bad_tests. (For example
> http://review.gluster.org/13233)
Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t
On Mon, Feb 08, 2016 at 04:05:44PM +0530, Ravishankar N wrote:
> The patch to add it to bad tests has already been merged, so I guess this
> .t's failure won't pop up again.

IMO that was a bit too quick. What is the procedure to get out of the list?

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-devel] How to cope with spurious regression failures
On Tue, Jan 19, 2016 at 8:33 PM, Emmanuel Dreyfus wrote:
> On Tue, Jan 19, 2016 at 07:08:03PM +0530, Raghavendra Talur wrote:
>> a. Allowing re-running of tests to make them pass leads to complacency
>> with how tests are written.
>> b. A test is bad if it is not deterministic, and running a bad test has
>> *no* value. We are wasting time even if the test runs for a few seconds.
>
> I agree with your vision for the long term, but my proposal addresses the
> short term situation. But we could use the retry approach to fuel your
> blacklist approach:
>
> We could imagine a system where the retry feature casts votes on
> individual tests: each time a test fails once and succeeds on retry, cast
> a +1 unreliable vote for that test.
>
> After a few days, we would have a wall of shame for unreliable tests,
> which could either be fixed or go to the blacklist.
>
> I do not know what software to use to collect and display the results,
> though. Should we have a gerrit change for each test?

This should be the process for adding tests to the bad tests list. However, I have run out of time on this one. If someone would like to implement it, go ahead; I don't see myself trying this any time soon.

> -- Emmanuel Dreyfus m...@netbsd.org

Thanks for the inputs. I have refactored run-tests.sh to use a retry option. If run-tests.sh is started with the -r flag, failed tests are run once again and are not considered failed if they pass.

Note: adding the -r flag to the jenkins config is not done yet.

I have also implemented a better version of the blacklist, which complies with the requirement from Manu that bad tests be tracked at per-OS granularity. Here is the patch: http://review.gluster.org/#/c/13393/
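The -r semantics described above can be sketched as: run the suite once, rerun only the failures, and count a test as failed only if it fails twice. This is an illustrative Python model, not the actual run-tests.sh shell logic; `run_with_retry` and the toy runner are hypothetical names.

```python
# Sketch of the proposed -r retry semantics (illustrative; the real
# implementation lives in run-tests.sh). A test counts as failed only
# if it fails on both the first run and the retry; a pass-on-retry is
# recorded separately, which could feed the "wall of shame" of
# unreliable tests discussed in the thread.

def run_with_retry(tests, run_test):
    failed, flaky = [], []
    first_failures = [t for t in tests if not run_test(t)]
    for t in first_failures:
        if run_test(t):
            flaky.append(t)   # passed on retry: candidate for bad-tests list
        else:
            failed.append(t)  # failed twice: a real failure
    return failed, flaky

# Toy runner: test "b" fails once then passes, "c" always fails.
attempts = {}
def fake_run(name):
    attempts[name] = attempts.get(name, 0) + 1
    if name == "c":
        return False
    if name == "b":
        return attempts[name] > 1
    return True

failed, flaky = run_with_retry(["a", "b", "c"], fake_run)
print(failed, flaky)
```

Tracking `flaky` separately is what turns the retry mechanism into data for the blacklist, rather than a way to silently hide nondeterministic tests.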
Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t
On 02/08/2016 03:37 PM, Emmanuel Dreyfus wrote:
> On Mon, Feb 08, 2016 at 03:26:54PM +0530, Milind Changire wrote:
>> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14089/consoleFull
>> [08:44:20] ./tests/basic/afr/self-heald.t ..
>> not ok 37 Got "0" instead of "1"
>> not ok 52 Got "0" instead of "1"
>> not ok 67
>> Failed 4/83 subtests
>
> There is a core, but it is from the NetBSD FUSE subsystem. The trace is
> not helpful, but it suggests an abort() call because of an unexpected
> situation:
>
> Core was generated by `perfused'.
> Program terminated with signal SIGABRT, Aborted.
> #0 0xbb7574b7 in _lwp_kill () from /usr/lib/libc.so.12
> (gdb) bt
> #0 0xbb7574b7 in _lwp_kill () from /usr/lib/libc.so.12
>
> /var/log/messages has a hint:
> Feb 8 08:43:15 nbslave7c perfused: file write grow without resize
>
> Indeed, I have this assertion in NetBSD FUSE to catch a race condition.
> I think it is the first time I have seen it raised, but I am unable to
> conclude on the cause. Let us retrigger (I did it) and see if someone
> else ever hits it again. The bug is more likely in NetBSD FUSE than in
> glusterfs.

The .t has been added to bad tests for now @ http://review.gluster.org/#/c/13344/, so you can probably rebase your patch. I'm not sure this is a problem with the test case; the same issue was reported by Manikandan last week: https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/13895/consoleFull

Is it one of those vnconfig errors? The .t seems to have skipped a few tests:

./tests/basic/afr/self-heald.t (Wstat: 0 Tests: 82 Failed: 3)
Failed tests: 37, 52, 67
Parse errors: Tests out of sequence. Found (31) but expected (30)
Tests out of sequence. Found (32) but expected (31)
Tests out of sequence. Found (33) but expected (32)
Tests out of sequence. Found (34) but expected (33)
Tests out of sequence. Found (35) but expected (34)
Displayed the first 5 of 54 TAP syntax errors. Re-run prove with the -p option to see them all.
Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t
On Mon, Feb 08, 2016 at 10:26:22AM +0000, Emmanuel Dreyfus wrote:
> Indeed, same problem. But unfortunately it is not very reproducible since
> we need to make a full week of runs to see it again. I am tempted to
> just remove the assertion.

NB: this does not fail on a stock NetBSD release: the assertion is only there because FUSE is built with -DDEBUG on the NetBSD slave VMs.

OTOH, if it happens only in tests/basic/afr/self-heald.t I may be able to get it by looping on the test for a while. I will try this on nbslave70. In the meantime, if that one pops up too often and gets annoying, I can get rid of it by just disabling debug mode.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t
On 02/08/2016 04:22 PM, Pranith Kumar Karampuri wrote:
> On 02/08/2016 04:16 PM, Ravishankar N wrote:
>> [Removing Milind, adding Pranith]
>> On 02/08/2016 04:09 PM, Emmanuel Dreyfus wrote:
>>> On Mon, Feb 08, 2016 at 04:05:44PM +0530, Ravishankar N wrote:
>>>> The patch to add it to bad tests has already been merged, so I guess
>>>> this .t's failure won't pop up again.
>>> IMO that was a bit too quick.
>> I guess Pranith merged it because of last week's complaint for the same
>> .t and not wanting to block other patches from being merged.
> Yes, two people came to my desk and said their patches are blocked
> because of this. So I had to merge it until we figure out the problem.

The patch is from last week, though.

> Pranith
>>> What is the procedure to get out of the list?
>> Usually, you just fix the problem with the testcase and send a patch
>> with the fix, removing it from bad_tests. (For example
>> http://review.gluster.org/13233)
Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t
On Mon, Feb 08, 2016 at 03:44:43PM +0530, Ravishankar N wrote:
> The .t has been added to bad tests for now @

I am not sure this is relevant: does it fail again? I am very interested if it is reproducible.

> http://review.gluster.org/#/c/13344/, so you can probably rebase your
> patch. I'm not sure this is a problem with the case, the same issue was
> reported by Manikandan last week:
> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/13895/consoleFull

Indeed, same problem. But unfortunately it is not very reproducible, since we need to make a full week of runs to see it again. I am tempted to just remove the assertion.

> Is it one of those vnconfig errors? The .t seems to have skipped a few
> tests:

This is because FUSE went away during the test. The vnconfig problems are fixed now and should not happen anymore.

-- Emmanuel Dreyfus m...@netbsd.org
Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t
On 02/08/2016 04:00 PM, Emmanuel Dreyfus wrote:
> On Mon, Feb 08, 2016 at 10:26:22AM +0000, Emmanuel Dreyfus wrote:
>> Indeed, same problem. But unfortunately it is not very reproducible since
>> we need to make a full week of runs to see it again. I am tempted to
>> just remove the assertion.
>
> NB: this does not fail on a stock NetBSD release: the assertion is only
> there because FUSE is built with -DDEBUG on the NetBSD slave VMs.
>
> OTOH, if it happens only in tests/basic/afr/self-heald.t I may be able to
> get it by looping on the test for a while. I will try this on nbslave70.

Thanks Emmanuel!

> In the meantime, if that one pops up too often and gets annoying, I can
> get rid of it by just disabling debug mode.

The patch to add it to bad tests has already been merged, so I guess this .t's failure won't pop up again.
Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t
[Removing Milind, adding Pranith]

On 02/08/2016 04:09 PM, Emmanuel Dreyfus wrote:
> On Mon, Feb 08, 2016 at 04:05:44PM +0530, Ravishankar N wrote:
>> The patch to add it to bad tests has already been merged, so I guess
>> this .t's failure won't pop up again.
> IMO that was a bit too quick.

I guess Pranith merged it because of last week's complaint for the same .t and not wanting to block other patches from being merged.

> What is the procedure to get out of the list?

Usually, you just fix the problem with the testcase and send a patch with the fix, removing it from bad_tests. (For example http://review.gluster.org/13233)
Re: [Gluster-devel] Rebalance data migration and corruption
On Mon, Feb 8, 2016 at 4:31 PM, Soumya Koduriwrote: > > > On 02/08/2016 09:13 AM, Shyam wrote: > >> On 02/06/2016 06:36 PM, Raghavendra Gowdappa wrote: >> >>> >>> >>> - Original Message - >>> From: "Raghavendra Gowdappa" To: "Sakshi Bansal" , "Susant Palai" Cc: "Gluster Devel" , "Nithya Balachandran" , "Shyamsundar Ranganathan" Sent: Friday, February 5, 2016 4:32:40 PM Subject: Re: Rebalance data migration and corruption +gluster-devel > Hi Sakshi/Susant, > > - There is a data corruption issue in migration code. Rebalance > process, >1. Reads data from src >2. Writes (say w1) it to dst > >However, 1 and 2 are not atomic, so another write (say w2) to > same region >can happen between 1. But these two writes can reach dst in the > order >(w2, >w1) resulting in a subtle corruption. This issue is not fixed yet > and can >cause subtle data corruptions. The fix is simple and involves > rebalance >process acquiring a mandatory lock to make 1 and 2 atomic. > We can make use of compound fop framework to make sure we don't suffer a significant performance hit. Following will be the sequence of operations done by rebalance process: 1. issues a compound (mandatory lock, read) operation on src. 2. writes this data to dst. 3. issues unlock of lock acquired in 1. Please co-ordinate with Anuradha for implementation of this compound fop. Following are the issues I see with this approach: 1. features/locks provides mandatory lock functionality only for posix-locks (flock and fcntl based locks). So, mandatory locks will be posix-locks which will conflict with locks held by application. So, if an application has held an fcntl/flock, migration cannot proceed. >>> > What if the file is opened with O_NONBLOCK? Cant rebalance process skip > the file and continue in case if mandatory lock acquisition fails? Similar functionality can be achieved by acquiring non-blocking inodelk like SETLK (as opposed to SETLKW). However whether rebalance process should block or not depends on the use case. 
In some use-cases (like remove-brick) the rebalance process _has_ to migrate all the files. Even in other scenarios, skipping too many files is not a good idea, as it defeats the purpose of running rebalance. So one of the design goals is to migrate as many files as possible without making the design too complex.

>>> We can implement a "special" domain for mandatory internal locks.
>>> These locks will behave similarly to posix mandatory locks in that
>>> conflicting fops (like write, read) are blocked/failed if they are
>>> done while a lock is held.

> So is the only difference between mandatory internal locks and posix
> mandatory locks that internal locks shall not conflict with other
> application locks (advisory/mandatory)?

Yes. Mandatory internal locks (aka mandatory inodelks for this discussion) will conflict only in their domain. They also conflict with any fops that might change the file (primarily write here, but different fops can be added based on requirement). So in a fop like writev we need to check two lists: the external lock (posix lock) list _and_ the mandatory inodelk list. The reason (if not clear) for using mandatory locks in the rebalance process is that clients need not be bothered with acquiring a lock (which would unnecessarily degrade I/O performance when there is no rebalance going on). Thanks to Raghavendra Talur for suggesting this idea (though in a different context of lock migration, the use-cases are similar).

>>> 2. data migration will be less efficient because of an extra unlock
>>> (with compound lock + read) or an extra lock and unlock (for a
>>> non-compound-fop based implementation) for every read it does from src.
>>>
>>> Can we use delegations here? The rebalance process can acquire a
>>> mandatory-write-delegation (an exclusive lock with the property that
>>> the delegation is recalled when a write operation happens). In that
>>> case the rebalance process can do something like:
>>>
>>> 1. Acquire a read delegation for the entire file.
>>> 2.
Migrate the entire file. >>> 3. Remove/unlock/give-back the delegation it has acquired. >>> >>> If a recall is issued from brick (when a write happens from mount), it >>> completes the current write to dst (or throws away the read from src) >>> to maintain atomicity. Before doing next set of (read, src) and >>> (write, dst) tries to reacquire lock. >>> >> >> With delegations this simplifies the normal path, when a file is >> exclusively handled by rebalance. It also improves the case where a >> client and rebalance are
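The non-blocking behaviour mentioned above (SETLK as opposed to SETLKW) maps directly to fcntl semantics. A minimal sketch of "try the lock, skip the file if an application holds a conflicting lock" in Python, assuming a POSIX system; `try_migrate` is a hypothetical name for illustration:

```python
import errno
import fcntl
import tempfile

# Minimal sketch of a non-blocking lock attempt (F_SETLK rather than
# F_SETLKW). A rebalance-like process tries the lock and skips the file
# if an application already holds a conflicting lock, retrying later.

def try_migrate(path):
    with open(path, "rb+") as f:
        try:
            # LOCK_NB makes lockf fail immediately instead of blocking.
            fcntl.lockf(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as e:
            if e.errno in (errno.EACCES, errno.EAGAIN):
                return "skipped"  # conflicting lock held: skip for now
            raise
        # ... copy the src region to dst under the lock ...
        fcntl.lockf(f, fcntl.LOCK_UN)
        return "migrated"

with tempfile.NamedTemporaryFile() as tmp:
    tmp.write(b"data")
    tmp.flush()
    result = try_migrate(tmp.name)  # no conflicting holder here
    print(result)
```

Note that POSIX record locks are advisory and per-process, which is precisely why the thread proposes a separate mandatory internal-lock domain on the brick side rather than relying on application fcntl locks.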
Re: [Gluster-devel] Cores on NetBSD of brick https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14100/consoleFull
On 02/08/2016 08:20 PM, Emmanuel Dreyfus wrote:
> On Mon, Feb 08, 2016 at 07:27:46PM +0530, Pranith Kumar Karampuri wrote:
>> I don't see any logs in the archive. Did we change something?
> I think they are in a different tarball, in /archives/logs

I think the regression run is not giving that link anymore when a crash happens? Could you please add that also as a link in the regression run?

Pranith
Re: [Gluster-devel] Rebalance data migration and corruption
>> Right. But if there are simultaneous accesses to the same file from
>> any other client and the rebalance process, delegations shall not be
>> granted, or revoked if granted, even though they are operating at
>> different offsets. So if you rely only on delegations, migration may
>> not proceed if an application has held a lock or is doing any I/O.
>>
>> Does the brick process wait for the response of the delegation holder
>> (the rebalance process here) before it wipes out the delegation/locks?
>> If that's the case, the rebalance process can complete one transaction
>> of (read, src) and (write, dst) before responding to a delegation
>> recall. That way there is no starvation for either applications or the
>> rebalance process (though this makes both of them slower, but that
>> cannot be helped, I think).
>
> Yes. The brick process should wait for a certain period before revoking
> the delegations forcefully in case they are not returned by the client.
> Also, if required (like NFS servers do), we can choose to increase this
> timeout value at run time if the client is diligently flushing the data.

Hmm, I would prefer an infinite timeout. The only scenario where the brick process can forcefully flush leases would be a connection loss with the rebalance process. The more scenarios in which the brick can flush leases without the knowledge of the rebalance process, the more race windows we open for this bug to occur. In fact, at least in theory, to be correct the rebalance process should replay all the transactions that happened during a lease that got flushed out by the brick (after re-acquiring that lease). So we would like to avoid any such scenarios. Btw, what is the necessity of timeouts? Is it insurance against rogue clients who won't respond to lease recalls?
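The recall-timeout trade-off being debated here can be made concrete with a small model. This is an illustrative sketch, not brick code: `recall_delegation` and the events are hypothetical, and the timeout stands for the brick's insurance against an unresponsive holder.

```python
import threading

# Sketch of the recall-timeout trade-off (illustrative, not GlusterFS
# code). The brick recalls a delegation and waits; the timeout exists
# only as insurance against a client that never responds. A diligent
# holder returns the delegation as soon as its in-flight
# (read src, write dst) transaction completes.

def recall_delegation(holder_done, timeout):
    """Return 'returned' if the holder gave the delegation back in time,
    'revoked' if the brick had to flush it forcefully."""
    if holder_done.wait(timeout):
        return "returned"
    return "revoked"

# Diligent holder: finishes its current transaction quickly.
holder_done = threading.Event()
threading.Timer(0.05, holder_done.set).start()
print(recall_delegation(holder_done, timeout=5.0))   # holder responds

# Rogue client: never responds, so the brick eventually revokes.
unresponsive = threading.Event()
print(recall_delegation(unresponsive, timeout=0.05))
```

The "revoked" path is exactly the case Raghavendra wants to confine to connection loss: every forced flush the holder does not observe is a window in which the migrated copy can silently diverge.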
Re: [Gluster-devel] Rebalance data migration and corruption
- Original Message - > From: "Joe Julian"> To: "Raghavendra Gowdappa" > Cc: gluster-devel@gluster.org > Sent: Monday, February 8, 2016 9:08:45 PM > Subject: Re: [Gluster-devel] Rebalance data migration and corruption > > > > On 02/08/2016 12:18 AM, Raghavendra Gowdappa wrote: > > > > - Original Message - > >> From: "Joe Julian" > >> To: gluster-devel@gluster.org > >> Sent: Monday, February 8, 2016 12:20:27 PM > >> Subject: Re: [Gluster-devel] Rebalance data migration and corruption > >> > >> Is this in current release versions? > > Yes. This bug is present in currently released versions. However, it can > > happen only if writes from application are happening to a file when it is > > being migrated. So, vaguely one can say probability is less. > > Probability is quite high when the volume is used for VM images, which > many are. The primary requirement for this corruption is that file should be under migration. Given that rebalance is done only during add/remove brick scenarios (or may be as a routine housekeeping to make lookups faster), I added that probability is lower. However, this will not be the case with tier where files can be under constant promotion/demotion because of access patterns. If there is a constant migration, dht too is susceptible to this bug with similar probability. > > > > >> On 02/07/2016 07:43 PM, Shyam wrote: > >>> On 02/06/2016 06:36 PM, Raghavendra Gowdappa wrote: > > - Original Message - > > From: "Raghavendra Gowdappa" > > To: "Sakshi Bansal" , "Susant Palai" > > > > Cc: "Gluster Devel" , "Nithya > > Balachandran" , "Shyamsundar > > Ranganathan" > > Sent: Friday, February 5, 2016 4:32:40 PM > > Subject: Re: Rebalance data migration and corruption > > > > +gluster-devel > > > >> Hi Sakshi/Susant, > >> > >> - There is a data corruption issue in migration code. Rebalance > >> process, > >> 1. Reads data from src > >> 2. 
> >> Writes (say w1) it to dst
> >>
> >> However, 1 and 2 are not atomic, so another write (say w2) to the
> >> same region can happen between 1 and 2. But these two writes can
> >> reach dst in the order (w2, w1), resulting in a subtle corruption.
> >> This issue is not fixed yet and can cause subtle data corruptions.
> >> The fix is simple and involves the rebalance process acquiring a
> >> mandatory lock to make 1 and 2 atomic.
> >
> > We can make use of the compound fop framework to make sure we don't
> > suffer a significant performance hit. Following will be the sequence
> > of operations done by the rebalance process:
> >
> > 1. issues a compound (mandatory lock, read) operation on src.
> > 2. writes this data to dst.
> > 3. issues unlock of the lock acquired in 1.
> >
> > Please co-ordinate with Anuradha for implementation of this compound
> > fop.
> >
> > Following are the issues I see with this approach:
> > 1. features/locks provides mandatory lock functionality only for
> > posix-locks (flock and fcntl based locks). So, mandatory locks will
> > be posix-locks which will conflict with locks held by the
> > application. So, if an application has held an fcntl/flock, migration
> > cannot proceed.
> We can implement a "special" domain for mandatory internal locks.
> These locks will behave similar to posix mandatory locks in that
> conflicting fops (like write, read) are blocked/failed if they are
> done while a lock is held.
> > 2. data migration will be less efficient because of an extra unlock
> > (with compound lock + read) or an extra lock and unlock (for a
> > non-compound-fop based implementation) for every read it does from
> > src.
> Can we use delegations here? The rebalance process can acquire a
> mandatory-write-delegation (an exclusive lock with the functionality
> that the delegation is recalled when a write operation happens). In
> that case the rebalance process can do something like:
>
> 1. Acquire a read delegation for the entire file.
> 2.
Migrate the entire file.
> 3. Remove/unlock/give-back the delegation it has acquired.
>
> If a recall is issued from the brick (when a write happens from the
> mount), it completes the current write to dst (or throws away the read
> from src) to maintain atomicity. Before doing the next set of (read,
> src) and (write, dst) it tries to reacquire the lock.
> >>> With delegations this simplifies the normal path, when a file is
> >>> exclusively handled by rebalance. It also improves the case where a
> >>> client and rebalance are conflicting on a file, to degrade to
> >>> mandatory locks by either party.
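The compound-fop sequence proposed above (lock+read on src, write to dst, unlock) can be sketched as a single-process Python model. This is a hedged illustration of the atomicity argument, not GlusterFS code; `MandatoryRangeLock`, the chunk size, and the function names are assumptions:

```python
import threading

class MandatoryRangeLock:
    """Toy stand-in for a brick-side mandatory lock: while held, conflicting
    writes block. Illustrative only, not the features/locks implementation."""
    def __init__(self):
        self._lk = threading.Lock()

    def lock(self):
        self._lk.acquire()

    def unlock(self):
        self._lk.release()

def migrate_chunk(src, dst, off, size, lk):
    """One rebalance transaction, mirroring steps 1-3 from the thread:
    (1) mandatory lock + read from src, (2) write to dst, (3) unlock.
    Holding the lock across the read/write pair makes it atomic with
    respect to client writes that take the same lock."""
    lk.lock()                       # step 1a: mandatory lock (compounded with the read)
    data = src[off:off + size]      # step 1b: read from src
    dst[off:off + size] = data      # step 2: write to dst
    lk.unlock()                     # step 3: unlock

def client_write(f, off, data, lk):
    """A client write must honour the same lock, so it cannot land
    between the rebalance read and the rebalance write."""
    lk.lock()
    f[off:off + len(data)] = data
    lk.unlock()
```

The per-chunk lock/unlock is exactly the extra round-trip cost that issue 2 in the thread worries about, which is what motivates the delegation variant: take one delegation for the whole file instead of one lock per read.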
Re: [Gluster-devel] Rebalance data migration and corruption
On 02/09/2016 10:27 AM, Raghavendra G wrote:

On Mon, Feb 8, 2016 at 4:31 PM, Soumya Koduri wrote:

On 02/08/2016 09:13 AM, Shyam wrote:

On 02/06/2016 06:36 PM, Raghavendra Gowdappa wrote:

- Original Message -
From: "Raghavendra Gowdappa"
To: "Sakshi Bansal", "Susant Palai"
Cc: "Gluster Devel", "Nithya Balachandran", "Shyamsundar Ranganathan"
Sent: Friday, February 5, 2016 4:32:40 PM
Subject: Re: Rebalance data migration and corruption

+gluster-devel

Hi Sakshi/Susant,

- There is a data corruption issue in migration code. Rebalance process,
1. Reads data from src
2. Writes (say w1) it to dst

However, 1 and 2 are not atomic, so another write (say w2) to the same region can happen between 1 and 2. But these two writes can reach dst in the order (w2, w1), resulting in a subtle corruption. This issue is not fixed yet and can cause subtle data corruptions. The fix is simple and involves the rebalance process acquiring a mandatory lock to make 1 and 2 atomic.

We can make use of the compound fop framework to make sure we don't suffer a significant performance hit. Following will be the sequence of operations done by the rebalance process:

1. issues a compound (mandatory lock, read) operation on src.
2. writes this data to dst.
3. issues unlock of the lock acquired in 1.

Please co-ordinate with Anuradha for implementation of this compound fop.

Following are the issues I see with this approach:
1. features/locks provides mandatory lock functionality only for posix-locks (flock and fcntl based locks). So, mandatory locks will be posix-locks which will conflict with locks held by the application. So, if an application has held an fcntl/flock, migration cannot proceed.

What if the file is opened with O_NONBLOCK? Can't the rebalance process skip the file and continue in case mandatory lock acquisition fails?

Similar functionality can be achieved by acquiring a non-blocking inodelk like SETLK (as opposed to SETLKW). However, whether the rebalance process should block or not depends on the use case.
In some use-cases (like remove-brick) the rebalance process _has_ to migrate all the files. Even for other scenarios, skipping too many files is not a good idea as it beats the purpose of running rebalance. So one of the design goals is to migrate as many files as possible without making the design too complex.

We can implement a "special" domain for mandatory internal locks. These locks will behave similar to posix mandatory locks in that conflicting fops (like write, read) are blocked/failed if they are done while a lock is held.

So is the only difference between mandatory internal locks and posix mandatory locks that internal locks shall not conflict with other application locks (advisory/mandatory)?

Yes. Mandatory internal locks (aka mandatory inodelks for this discussion) will conflict only in their domain. They also conflict with any fops that might change the file (primarily write here, but different fops can be added based on requirement). So in a fop like writev we need to check in two lists - the external lock (posix lock) list _and_ the mandatory inodelk list.

The reason (if not clear) for using mandatory locks by the rebalance process is that clients need not be bothered with acquiring a lock (which would unnecessarily degrade performance of I/O when there is no rebalance going on). Thanks to Raghavendra Talur for suggesting this idea (though in a different context of lock migration, but the use-cases are similar).

2.
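The two-list check described above (writev consulting both the posix-lock list and the separate mandatory-inodelk domain) could look roughly like the following. This is a hedged sketch, not the features/locks implementation: the lock representation, list names, and `write_allowed` helper are all assumptions:

```python
def ranges_overlap(a_off, a_len, b_off, b_len):
    """True if two byte ranges overlap (whole-file and to-EOF semantics
    are simplified away here)."""
    return a_off < b_off + b_len and b_off < a_off + a_len

def write_allowed(posix_locks, mandatory_inodelks, owner, off, size):
    """Decide whether a writev may proceed, checking BOTH lists:

    - posix_locks: application locks; only mandatory ones block a write
      here (simplified -- real posix mandatory-lock semantics are richer).
    - mandatory_inodelks: internal locks in their own domain; they block
      any overlapping write from a different owner, but do NOT interact
      with application locks, so clients pay no locking cost when no
      rebalance is running.

    Each lock is a dict {'owner', 'off', 'len', 'mandatory'} (illustrative).
    """
    for lk in posix_locks:
        if lk['mandatory'] and lk['owner'] != owner \
                and ranges_overlap(lk['off'], lk['len'], off, size):
            return False
    for lk in mandatory_inodelks:
        if lk['owner'] != owner \
                and ranges_overlap(lk['off'], lk['len'], off, size):
            return False
    return True
```

The key property of the sketch is the one stated in the thread: a mandatory inodelk held by the rebalance process blocks a conflicting client write, while an ordinary advisory posix lock held by an application does not, and the two lists never conflict with each other.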
[Gluster-devel] Cores on NetBSD of brick https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14100/consoleFull
Emmanuel, I don't see any logs in the archive. Did we change something?

Pranith
Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/afr/self-heald.t
On 02/08/2016 05:04 PM, Michael Scherer wrote:

On Monday, February 8, 2016 at 16:22 +0530, Pranith Kumar Karampuri wrote:

On 02/08/2016 04:16 PM, Ravishankar N wrote:

[Removing Milind, adding Pranith]

On 02/08/2016 04:09 PM, Emmanuel Dreyfus wrote:

On Mon, Feb 08, 2016 at 04:05:44PM +0530, Ravishankar N wrote:

The patch to add it to bad tests has already been merged, so I guess this .t's failure won't pop up again.

IMO that was a bit too quick.

I guess Pranith merged it because of last week's complaint for the same .t and not wanting to block other patches from being merged.

Yes, two people came to my desk and said their patches were blocked because of this. So I had to merge it until we figure out the problem.

I suspect it would be better if people used the list rather than coming to the desk, as it would help others who are absent, in another office, or not working in the same company to be aware of the issue. Next time this happens, can you direct people to gluster-devel?

Will do :-).

Pranith
Re: [Gluster-devel] [FAILED] NetBSD-regression for ./tests/basic/quota-anon-fd-nfs.t, ./tests/basic/tier/fops-during-migration.t, ./tests/basic/tier/record-metadata-heat.t
On Mon, Feb 08, 2016 at 06:25:09PM +0530, Milind Changire wrote:
> Looks like some cores are available as well.
> Please advise.

#0  0xb99912b4 in gf_changelog_reborp_rpcsvc_notify (rpc=0xb7b160f0, mydata=0xb7b1a830, event=RPCSVC_EVENT_ACCEPT, data=0xb76a4030)
    at /home/jenkins/root/workspace/rackspace-netbsd7-regression-triggered/xlators/features/changelog/lib/src/gf-changelog-reborp.c:110
110         return 0;

Crash on return: that smells like stack corruption.

--
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-devel] Cores on NetBSD of brick https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/14100/consoleFull
On Mon, Feb 08, 2016 at 07:27:46PM +0530, Pranith Kumar Karampuri wrote:
> I don't see any logs in the archive. Did we change something?

I think they are in a different tarball, in /archives/logs

--
Emmanuel Dreyfus
m...@netbsd.org
Re: [Gluster-devel] Rebalance data migration and corruption
On 02/08/2016 12:18 AM, Raghavendra Gowdappa wrote:

- Original Message -
From: "Joe Julian"
To: gluster-devel@gluster.org
Sent: Monday, February 8, 2016 12:20:27 PM
Subject: Re: [Gluster-devel] Rebalance data migration and corruption

Is this in current release versions?

Yes. This bug is present in currently released versions. However, it can happen only if writes from the application are happening to a file while it is being migrated. So, vaguely, one can say the probability is low.

Probability is quite high when the volume is used for VM images, which many are.

On 02/07/2016 07:43 PM, Shyam wrote:

On 02/06/2016 06:36 PM, Raghavendra Gowdappa wrote:

- Original Message -
From: "Raghavendra Gowdappa"
To: "Sakshi Bansal", "Susant Palai"
Cc: "Gluster Devel", "Nithya Balachandran", "Shyamsundar Ranganathan"
Sent: Friday, February 5, 2016 4:32:40 PM
Subject: Re: Rebalance data migration and corruption

+gluster-devel

Hi Sakshi/Susant,

- There is a data corruption issue in migration code. Rebalance process,
1. Reads data from src
2. Writes (say w1) it to dst

However, 1 and 2 are not atomic, so another write (say w2) to the same region can happen between 1 and 2. But these two writes can reach dst in the order (w2, w1), resulting in a subtle corruption. This issue is not fixed yet and can cause subtle data corruptions. The fix is simple and involves the rebalance process acquiring a mandatory lock to make 1 and 2 atomic.

We can make use of the compound fop framework to make sure we don't suffer a significant performance hit. Following will be the sequence of operations done by the rebalance process:

1. issues a compound (mandatory lock, read) operation on src.
2. writes this data to dst.
3. issues unlock of the lock acquired in 1.

Please co-ordinate with Anuradha for implementation of this compound fop.

Following are the issues I see with this approach:
1. features/locks provides mandatory lock functionality only for posix-locks (flock and fcntl based locks).
So, mandatory locks will be posix-locks which will conflict with locks held by the application. So, if an application has held an fcntl/flock, migration cannot proceed.

We can implement a "special" domain for mandatory internal locks. These locks will behave similar to posix mandatory locks in that conflicting fops (like write, read) are blocked/failed if they are done while a lock is held.

2. data migration will be less efficient because of an extra unlock (with compound lock + read) or an extra lock and unlock (for a non-compound-fop based implementation) for every read it does from src.

Can we use delegations here? The rebalance process can acquire a mandatory-write-delegation (an exclusive lock with the functionality that the delegation is recalled when a write operation happens). In that case the rebalance process can do something like:

1. Acquire a read delegation for the entire file.
2. Migrate the entire file.
3. Remove/unlock/give-back the delegation it has acquired.

If a recall is issued from the brick (when a write happens from the mount), it completes the current write to dst (or throws away the read from src) to maintain atomicity. Before doing the next set of (read, src) and (write, dst) it tries to reacquire the lock.

With delegations this simplifies the normal path, when a file is exclusively handled by rebalance. It also improves the case where a client and rebalance are conflicting on a file, to degrade to mandatory locks by either party. I would prefer we take the delegation route for such needs in the future.

@Soumyak, can something like this be done with delegations?

@Pranith, AFR does transactions for writing to its subvols. Can you suggest any optimizations here so that the rebalance process can have a transaction for (read, src) and (write, dst) with minimal performance overhead?

regards,
Raghavendra.

Comments?

regards,
Raghavendra.
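The corruption described at the top of this thread (the stale rebalance write w1 reaching dst after the application write w2) can be demonstrated with a toy model. This is purely illustrative Python, not GlusterFS code; it only models the arrival order of two writes at the destination:

```python
def apply_in_order(dst, ops):
    """Apply (offset, data) writes to dst in list order, modelling the
    order in which they happen to reach the destination brick."""
    for off, data in ops:
        dst[off:off + len(data)] = data
    return bytes(dst)

# Rebalance reads the region from src (becoming w1) while the
# application issues a concurrent write w2 to the same region.
src = bytearray(b"AAAA")
w2 = (0, b"BBBB")        # application write
w1 = (0, bytes(src))     # rebalance's (now stale) read of src

# If dst sees (w1, w2), the result matches the application's intent:
good = apply_in_order(bytearray(4), [w1, w2])
# If they reach dst reordered as (w2, w1), the stale data wins:
bad = apply_in_order(bytearray(4), [w2, w1])

assert good == b"BBBB"   # correct final contents
assert bad == b"AAAA"    # corruption: the application's write w2 is lost
```

The proposed fixes in the thread (mandatory lock + compound fop, or a recallable delegation) both work by preventing w2 from landing between the rebalance read and the rebalance write, so the (w2, w1) ordering can never occur.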
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel