Re: [Gluster-devel] Sharding - what next?
- Original Message -
> From: "Lindsay Mathieson"
> To: "Krutika Dhananjay", "Gluster Devel", "gluster-users"
> Sent: Wednesday, December 9, 2015 6:48:40 PM
> Subject: Re: Sharding - what next?
>
> Hi Guys, sorry for the late reply, my attention tends to be somewhat sporadic
> due to work and the large number of rescue dogs/cats I care for :)
>
> On 3/12/2015 8:34 PM, Krutika Dhananjay wrote:
> > We would love to hear from you on what you think of the feature and where
> > it could be improved.
> > Specifically, the following are the questions we are seeking feedback on:
> > a) your experience testing sharding with VM store use-case - any bugs you
> > ran into, any performance issues, etc
>
> Testing was initially somewhat stressful as I regularly encountered file
> corruption. However I don't think that was due to bugs, rather incorrect
> settings for the VM use case. Once I got that sorted out it has been very
> stable - I have really stressed failure modes we run into at work - nodes
> going down while heavy writes were happening. Live migrations during heals.
> gluster software being killed while VMs were running on the host. So far it's
> held up without a hitch.
>
> To that end, one thing I think should be made more obvious is the settings
> required for VM Hosting:
>
> > quick-read=off
> > read-ahead=off
> > io-cache=off
> > stat-prefetch=off
> > eager-lock=enable
> > remote-dio=enable
> > quorum-type=auto
> > server-quorum-type=server
>
> They are quite crucial and very easy to miss in the online docs. And they are
> only recommended, with no mention that you will corrupt KVM VMs if you live
> migrate them between gluster nodes without them set. Also the virt group is
> missing from the Debian packages.

Hi Lindsay,

Thanks for the feedback. I will get in touch with Humble to find out what can be done about the docs.

> Setting them does seem to have slowed sequential writes by about 10% but I
> need to test that more.
>
> Something related - sharding is useful because it makes heals much more
> granular and hence faster. To that end it would be really useful if there
> was a heal info variant that gave an overview of the process - rather than
> list the shards that are being healed, just an aggregate total, e.g.
>
> $ gluster volume heal datastore1 status
> volume datastore1
> - split brain: 0
> - Wounded:65
> - healing:4
>
> It gives one an easy feeling of progress - heals aren't happening faster, but
> it would feel that way :)

There is a 'heal-info summary' command that is under review, written by Mohammed Ashiq @ http://review.gluster.org/#/c/12154/3 which prints the number of files that are yet to be healed. It could perhaps be enhanced to print files in split-brain and also files which are possibly being healed. Note that these counts are printed per brick. It does not print a single list of counts with aggregated values. Would that be something you would consider useful?

> Also, it would be great if the heal info command could return faster,
> sometimes it takes over a minute.

Yeah, I think part of the problem could be the eager-lock feature, which causes the GlusterFS client process to not relinquish the network lock on the file soon enough, so the heal info utility is blocked for a longer duration. There is an enhancement Anuradha Talur is working on where heal-info would do away with taking locks altogether. Once that is in place, heal-info should return faster.

-Krutika

> Thanks for the great work,
> Lindsay

___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
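The gap between the per-brick counts that the summary command under review prints and the single aggregate Lindsay asks for can be sketched roughly as below. This is only an illustration: it assumes the per-brick counts have already been parsed into dictionaries, and the brick names and counter names (split_brain, pending, healing) are hypothetical, not the output format of any gluster command.

```python
from collections import Counter


def aggregate_heal_counts(per_brick):
    """Sum per-brick heal counters into one volume-wide total."""
    total = Counter()
    for counts in per_brick.values():
        total.update(counts)  # adds each counter to the running total
    return dict(total)


# Hypothetical per-brick numbers, for illustration only.
per_brick = {
    "node1:/bricks/datastore1": {"split_brain": 0, "pending": 40, "healing": 3},
    "node2:/bricks/datastore1": {"split_brain": 0, "pending": 25, "healing": 1},
}

summary = aggregate_heal_counts(per_brick)
for key in ("split_brain", "pending", "healing"):
    print(f"- {key}: {summary[key]}")
```

A plain sum like this may count the same file more than once when it is pending on several bricks, so a real aggregate would probably need to deduplicate by file identity first.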
Re: [Gluster-devel] libgfapi compound operations - multiple writes
Answers inline.

Regards,
Poornima

- Original Message -
> From: "Raghavendra Gowdappa"
> To: "Poornima Gurusiddaiah"
> Cc: "Gluster Devel"
> Sent: Wednesday, December 9, 2015 9:00:55 PM
> Subject: libgfapi compound operations - multiple writes
>
> forking off since it muddles the original conversation. I've some questions:
>
> 1. Why do multiple writes need to be compounded together?

If the application splits the large write into fixed sized chunks (Samba 64KB), it would be an option to compound it.

> 2. If the reason is aggregation, can't we tune write-behind to do the same?

Yes surely. IO-cache would be a better candidate? Write-behind mostly doesn't aggregate. Since this can also be done by compound fops, I just added it as good to have. It is not mandatory, as it can be achieved otherwise.

> regards,
> Raghavendra.
Re: [Gluster-devel] test throws core intermittently: tests/bugs/snapshot/bug-1140162-file-snapshot-features-encrypt-opts-validation.t
Hi,

This issue has already been reported by the community, and it seems that there is a problem during cleanup when features.encryption is enabled.

Previous discussion on the same core:
http://nongnu.13855.n7.nabble.com/Upstream-regression-crash-https-build-gluster-org-job-rackspace-regression-2GB-triggered-16191-consol-td206079.html

I will look into this issue further.

Thanks,
Gaurav

- Original Message -
From: "Vijay Bellur"
To: "Michael Adam", gluster-devel@gluster.org, "Gaurav Garg"
Sent: Thursday, December 10, 2015 9:12:08 AM
Subject: Re: [Gluster-devel] test throws core intermittently: tests/bugs/snapshot/bug-1140162-file-snapshot-features-encrypt-opts-validation.t

On 12/09/2015 07:33 PM, Michael Adam wrote:
> by
>
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/16674/consoleFull

Gaurav - can you please check this test? It caused the baseline regression to fail as well:

https://build.gluster.org/job/regression-test-burn-in/47/console

Regards,
Vijay
Re: [Gluster-devel] libgfapi compound operations - multiple writes
- Original Message -
> From: "Jeff Darcy"
> To: "Raghavendra Gowdappa", "Poornima Gurusiddaiah"
> Cc: "Gluster Devel"
> Sent: Wednesday, December 9, 2015 10:36:43 PM
> Subject: Re: [Gluster-devel] libgfapi compound operations - multiple writes
>
> On December 9, 2015 at 10:31:03 AM, Raghavendra Gowdappa
> (rgowd...@redhat.com) wrote:
> > forking off since it muddles the original conversation. I've some
> > questions:
> >
> > 1. Why do multiple writes need to be compounded together?
> > 2. If the reason is aggregation, can't we tune write-behind to do the same?
>
> I think compounding (as we’ve been discussing it) is only necessary when
> there’s a dependency between operations. For example, if the first
> creates a value (e.g. file descriptor) used by the second, or if the
> second should not proceed unless the first (e.g. a lock) succeeded. If
> multiple operations are completely independent of one another, as is the
> case for writes without fsync, then I think we should rely on
> write-behind or something similar instead. Compounding is likely to be
> the wrong solution here for two reasons:
>
> * Correctness: if the writes are independent, there’s no reason why
> failure of the first should cause the second not to be issued (as
> would be the case with compounding).
>
> * Performance: compounding would keep the writes separate, whereas
> write-behind can reduce overhead even more by coalescing them into a
> single request.

Yes. I had similar thoughts while asking the question. Thanks for elaborating.

> There is, however, one case where compounding would be the right answer:
> when there really is a dependency between the writes. There’s no way to
> specify this through the POSIX/VFS interface (more’s the pity), but it’s
> easy to imagine GFAPI or internal use cases where a second write should
> not overtake or continue without the first - e.g. a key/value store
> that writes new data followed by an index update pointing to that data.
> The strictly-sequential behavior of a compound operation might be just
> the right match for such cases.

We have one such use-case already, i.e., O_APPEND writes. In fact write-behind has enough logic to address dependencies like conflicting writes, reads, stats etc. on just-written regions. (Of course, we would lose performance gains, as write-behind still winds calls across the network for dependent ops. But again, if the write-behind cache is large enough, this latency is not witnessed by the application.) So, I am wondering whether we can pass these dependency requirements down the stack and let write-behind handle them.

@Poornima and others, did you have any such use-cases in mind when you proposed compounding?

regards,
Raghavendra
[Gluster-devel] intermittent failure - tests/bugs/glusterd/bug-1225716-brick-online-validation-remove-brick.t
Hi Sakshi,

Can you please take a look into ./tests/bugs/glusterd/bug-1225716-brick-online-validation-remove-brick.t? An unrelated patch was affected by this test:

https://build.gluster.org/job/rackspace-regression-2GB-triggered/16686/consoleFull

Thanks,
Vijay
Re: [Gluster-devel] test throws core intermittently: tests/bugs/snapshot/bug-1140162-file-snapshot-features-encrypt-opts-validation.t
On 12/09/2015 07:33 PM, Michael Adam wrote:
> by
>
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/16674/consoleFull

Gaurav - can you please check this test? It caused the baseline regression to fail as well:

https://build.gluster.org/job/regression-test-burn-in/47/console

Regards,
Vijay
[Gluster-devel] everything builds and installs on FreeBSD via ports tarball
Hi,

Just to let you know, others are working on a "port" for FreeBSD and everything builds/installs when you use it. (It uses gcc and some other things I didn't use. I suspect my "make install" problem had to do with using BSD make, but I don't know.)

Anyhow, it builds/installs and works once you:

cp /usr/local/etc/glusterfs/glusterd.vol.sample /usr/local/etc/glusterfs/glusterd.vol

Hopefully the "port" will be fixed for this and set up soon, which will make it much easier for others to test on FreeBSD.

Just fyi, rick
[Gluster-devel] submitted one patch for marking several tests bad
Since all those patches mutually prevent each other's additions to the bad tests list from passing regression runs, I created a patch to add all those that I have seen recently:

http://review.gluster.org/#/c/12933/

If it is too much for your taste, I'll reduce... :-)

Cheers - Michael
[Gluster-devel] test throws core intermittently: tests/bugs/snapshot/bug-1140162-file-snapshot-features-encrypt-opts-validation.t
by

https://build.gluster.org/job/rackspace-regression-2GB-triggered/16674/consoleFull
[Gluster-devel] intermittent failure: tests/bugs/tier/bug-1279376-rename-demoted-file.t
Another one?

https://build.gluster.org/job/rackspace-regression-2GB-triggered/16675/console

Triggered by:

http://review.gluster.org/12930

Cheers - Michael
Re: [Gluster-devel] intermittent failure: tests/basic/afr/split-brain-healing.t
On 2015-12-09 at 17:00 +0100, Michael Adam wrote:
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/16652/consoleFull
>
> triggered by
>
> http://review.gluster.org/#/c/12826/

More of these happen. E.g.:

https://build.gluster.org/job/rackspace-regression-2GB-triggered/16680/consoleFull

Created a bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1290245

and a patch to mark the test as bad:
http://review.gluster.org/#/c/12932/

Thanks - Michael
Re: [Gluster-devel] intermittent failure: tests/features/weighted-rebalance.t
On 2015-12-09 at 19:59 +0100, Michael Adam wrote:
> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/12530/consoleFull
>
> http://review.gluster.org/#/c/12929/
>
> Michael

Having eliminated arbiter-statfs.t (in the review request above), this seems to be the next suspect.

https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/12538/consoleFull

Created a BZ:
https://bugzilla.redhat.com/show_bug.cgi?id=1290204

and a patch to mark it bad:
http://review.gluster.org/12931

Cheers - Michael
[Gluster-devel] intermittent failure: tests/features/weighted-rebalance.t
https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/12530/consoleFull

http://review.gluster.org/#/c/12929/

Michael
Re: [Gluster-devel] compound fop design first cut
On 12/09/2015 08:11 PM, Shyam wrote:
On 12/09/2015 02:37 AM, Soumya Koduri wrote:
On 12/09/2015 11:44 AM, Pranith Kumar Karampuri wrote:
On 12/09/2015 06:37 AM, Vijay Bellur wrote:
On 12/08/2015 03:45 PM, Jeff Darcy wrote:
On December 8, 2015 at 12:53:04 PM, Ira Cooper (i...@redhat.com) wrote:
Raghavendra Gowdappa writes:

I propose that we define a "compound op" that contains ops. Within each op, there are fields that can be "inherited" from the previous op, via use of a sentinel value. Sentinel is -1, for all of these examples. So:

LOOKUP (1, "foo") (Sets the gfid value to be picked up by compounding, 1 is the root directory, as a gfid, by convention.)
OPEN(-1, O_RDWR) (Uses the gfid value, sets the glfd compound value.)
WRITE(-1, "foo", 3) (Uses the glfd compound value.)
CLOSE(-1) (Uses the glfd compound value)

So, basically, what the programming-language types would call futures and promises. It’s a good and well studied concept, which is necessary to solve the second-order problem of how to specify an argument in sub-operation N+1 that’s not known until sub-operation N completes.

To be honest, some of the highly general approaches suggested here scare me too. Wrapping up the arguments for one sub-operation in xdata for another would get pretty hairy if we ever try to go beyond two sub-operations and have to nest sub-operation #3’s args within sub-operation #2’s xdata which is itself encoded within sub-operation #1’s xdata. There’s also not much clarity about how to handle errors in that model. Encoding N sub-operations’ arguments in a linear structure as Shyam proposes seems a bit cleaner that way. If I were to continue down that route I’d suggest just having start_compound and end-compound fops, plus an extra field (or by-convention xdata key) that either the client-side or server-side translator could use to build whatever structure it wants and schedule sub-operations however it wants.

However, I’d be even more comfortable with an even simpler approach that avoids the need to solve what the database folks (who have dealt with complex transactions for years) would tell us is a really hard problem. Instead of designing for every case we can imagine, let’s design for the cases that we know would be useful for improving performance. Open plus read/write plus close is an obvious one. Raghavendra mentions create+inodelk as well. For each of those, we can easily define a structure that contains the necessary fields, we don’t need a client-side translator, and the server-side translator can take care of “forwarding” results from one sub-operation to the next. We could even use GF_FOP_IPC to prototype this. If we later find that the number of “one-off” compound requests is growing too large, then at least we’ll have some experience to guide our design of a more general alternative. Right now, I think we’re trying to look further ahead than we can see clearly.

Yes, agreed. This makes implementation on the client side simpler as well. So it is welcome.

Just updating the solution.
1) New RPCs are going to be implemented.
2) client stack will use these new fops.
3) On the server side we have the server xlator implementing these new fops to decode the RPC request, then resolve_resume and compound-op-receiver (a better name for this is welcome), which sends one op after the other and sends the compound fop response.

@Pranith, I assume you would expand on this at a later date (something along the lines of what Soumya has done below), right?

I will talk to her tomorrow to know more about this. Not saying this is what I will be implementing (there doesn't seem to be any consensus yet). But I would love to know how it is implemented.

Pranith

List of compound fops identified so far:
Swift/S3: PUT: creat(), write()s, setxattr(), fsync(), close(), rename()
Dht: mkdir + inodelk
Afr: xattrop+writev, xattrop+unlock to begin with.

Could everyone who needs compound fops add to this list?

I see that Niels is back on 14th. Does anyone else know the list of compound fops he has in mind?

From the discussions we had with Niels regarding the kerberos support on GlusterFS, I think below are the set of compound fops which are required.

set_uid + set_gid + set_lkowner (or kerberos principal name) + actual_fop

Also gfapi does a lookup (first time/to refresh inode) before performing actual fops most of the time. It may really help if we can club such fops -

@Soumya +5 (just a random number :) ) This came to my mind as well, and is a good candidate for compounding.

LOOKUP + FOP (OPEN etc)

Coming to the design proposed, I agree with Shyam, Ira and Jeff's thoughts. Defining different compound fops for each specific set of operations and wrapping up those arguments in xdata seems rather complex and difficult to maintain going further. Having worked with NFS, may I suggest that we follow (or work along similar lines to) the approach taken by the NFS protocol to define and implem
Re: [Gluster-devel] Netbsd failures on ./tests/basic/afr/arbiter-statfs.t
On 2015-12-09 at 10:17 -0500, Vijay Bellur wrote:
> On 08/24/2015 07:01 AM, Susant Palai wrote:
> > Ravi,
> > The test case ./tests/basic/afr/arbiter-statfs.t failing frequently on
> > netbsd machine. Requesting to take a look.
>
> tests/basic/afr/arbiter-statfs.t seems to be affecting most NetBSD runs now.
> Ravi - can you please take a look in?
>
> Sample test run that got affected by this test unit:
>
> https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/12516/consoleFull

This seems to prevent any NetBSD regression run from succeeding currently. I have seen it many times since your mail.

I have created a bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1290125

and a patch to add the test to bad tests for now:
http://review.gluster.org/12929

Michael
Re: [Gluster-devel] libgfapi compound operations - multiple writes
On December 9, 2015 at 10:31:03 AM, Raghavendra Gowdappa (rgowd...@redhat.com) wrote:
> forking off since it muddles the original conversation. I've some questions:
>
> 1. Why do multiple writes need to be compounded together?
> 2. If the reason is aggregation, can't we tune write-behind to do the same?

I think compounding (as we’ve been discussing it) is only necessary when there’s a dependency between operations. For example, if the first creates a value (e.g. file descriptor) used by the second, or if the second should not proceed unless the first (e.g. a lock) succeeded. If multiple operations are completely independent of one another, as is the case for writes without fsync, then I think we should rely on write-behind or something similar instead. Compounding is likely to be the wrong solution here for two reasons:

* Correctness: if the writes are independent, there’s no reason why failure of the first should cause the second not to be issued (as would be the case with compounding).

* Performance: compounding would keep the writes separate, whereas write-behind can reduce overhead even more by coalescing them into a single request.

There is, however, one case where compounding would be the right answer: when there really is a dependency between the writes. There’s no way to specify this through the POSIX/VFS interface (more’s the pity), but it’s easy to imagine GFAPI or internal use cases where a second write should not overtake or continue without the first - e.g. a key/value store that writes new data followed by an index update pointing to that data. The strictly-sequential behavior of a compound operation might be just the right match for such cases.
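The performance point above, that independent writes can be coalesced into a single request rather than kept separate, can be illustrated with a toy merge of contiguous writes. This is a sketch of the general idea only, not the logic of the write-behind translator; the function name and the (offset, data) representation are assumptions made for the example.

```python
def coalesce_writes(writes):
    """Merge writes that are contiguous in file offset into single requests.

    `writes` is a list of (offset, bytes) pairs in issue order. A write is
    merged only when it starts exactly where the previous merged request
    ends; anything else starts a new request.
    """
    merged = []
    for offset, data in writes:
        if merged and merged[-1][0] + len(merged[-1][1]) == offset:
            last_offset, last_data = merged[-1]
            merged[-1] = (last_offset, last_data + data)
        else:
            merged.append((offset, data))
    return merged


# Three contiguous 4-byte chunks followed by one unrelated write.
chunks = [(0, b"a" * 4), (4, b"b" * 4), (8, b"c" * 4), (100, b"d" * 4)]
requests = coalesce_writes(chunks)
print(len(chunks), "writes ->", len(requests), "requests")
```

A compound request carrying the same four writes would still contain four sub-operations, which is the difference the message draws between the two approaches.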
Re: [Gluster-devel] compound fop design first cut
On 12/09/2015 08:08 PM, Shyam wrote:
On 12/09/2015 12:52 AM, Pranith Kumar Karampuri wrote:
On 12/09/2015 10:39 AM, Prashanth Pai wrote:

However, I’d be even more comfortable with an even simpler approach that avoids the need to solve what the database folks (who have dealt with complex transactions for years) would tell us is a really hard problem. Instead of designing for every case we can imagine, let’s design for the cases that we know would be useful for improving performance. Open plus read/write plus close is an obvious one. Raghavendra mentions create+inodelk as well.

From object interface (Swift/S3) perspective, this is the fop order and flow for object operations:

GET: open(), fstat(), fgetxattr()s, read()s, close()

Krutika implemented fstat+fgetxattr (http://review.gluster.org/10180). In posix there is an implementation of GF_CONTENT_KEY which is used to read a file in lookup by quick-read. This needs to be exposed for fds as well I think. So you can do all this using fstat on anon-fd.

HEAD: stat(), getxattr()s

Krutika already implemented this for sharding http://review.gluster.org/10158. You can do this using stat fop.

I believe we need to fork this part of the conversation, i.e. the stat + xattr information clubbing. My view on a stat for gluster is, POSIX stat + gluster extended information being returned. I state this as, a file system when it stats its inode, should get all information regarding the inode, and not just the POSIX ones. In the case of other local FS, the inode structure has more fields than just what POSIX needs, so when the inode is *read* the FS can populate all its internal inode information and return to the application/syscall the relevant fields that it needs. I believe gluster should do the same, so in the cases above, we should actually extend our stat information (not elaborating how) to include all information from the brick, i.e. stat from POSIX and all the extended attrs for the inode (file or dir). This can then be consumed by any layer as needed. Currently, each layer adds what it needs in addition to the stat information in the xdata, as an xattr request, this can continue or go away, if the relevant FOPs return the whole inode information upward. This also has useful outcomes in readdirp calls, where we get the extended stat information for each entry.

You can use "list-xattr" in xdata request to get this.

With the patches referred to, and older patches, this seems to be the direction sought (around 2013), any reasons why this is not prevalent across the stack and made so? Or am I mistaken?

No reason. We can revive it. There didn't seem to be any interest. So I didn't follow up to get it in.

Pranith

PUT: creat(), write()s, setxattr(), fsync(), close(), rename()

This I think should be a new compound fop. Nothing similar exists.

DELETE: getxattr(), unlink()

This can also be clubbed in unlink already because xdata exists on the wire already.

Compounding some of these ops and exposing them as consumable libgfapi APIs like glfs_get() and glfs_put() similar to librados compound APIs[1] would greatly improve performance for object based access.

[1]: https://github.com/ceph/ceph/blob/master/src/include/rados/librados.h#L2219

Thanks.
- Prashanth Pai
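To make the cost of the PUT flow listed above concrete, the sketch below replays that sequence (creat, write()s, setxattr, fsync, close, rename) against a local temporary directory and records each call. It uses ordinary POSIX calls through Python's os module, not libgfapi; put_object and the xattr key are hypothetical names for this example, and the glfs_put() mentioned in the thread is a proposal rather than an API shown here.

```python
import os
import tempfile


def put_object(directory, name, chunks, xattrs):
    """Write an object using the PUT sequence from the thread, one call each."""
    tmp_path = os.path.join(directory, "." + name + ".tmp")
    final_path = os.path.join(directory, name)
    calls = []

    fd = os.open(tmp_path, os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
    calls.append("creat")
    try:
        for chunk in chunks:
            os.write(fd, chunk)
            calls.append("write")
        for key, value in xattrs.items():
            try:
                os.setxattr(fd, key, value)
            except (AttributeError, OSError):
                pass  # xattrs unavailable here; the step is still counted
            calls.append("setxattr")
        os.fsync(fd)
        calls.append("fsync")
    finally:
        os.close(fd)
    calls.append("close")
    os.rename(tmp_path, final_path)  # publish the object under its final name
    calls.append("rename")
    return calls


with tempfile.TemporaryDirectory() as d:
    calls = put_object(d, "obj", [b"hello ", b"world"],
                       {"user.content-type": b"text/plain"})
    with open(os.path.join(d, "obj"), "rb") as f:
        body = f.read()

print(len(calls), "separate calls:", ", ".join(calls))
```

On a network file system each of these calls is a candidate for its own round trip, which is the overhead a single compound PUT would aim to remove.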
[Gluster-devel] intermittent failure: tests/basic/afr/split-brain-healing.t
https://build.gluster.org/job/rackspace-regression-2GB-triggered/16652/consoleFull

triggered by

http://review.gluster.org/#/c/12826/

Michael
Re: [Gluster-devel] intermittent test failure: tests/basic/afr/sparse-file-self-heal.t
On 2015-12-09 at 14:49 +0100, Michael Adam wrote:
> On 2015-12-09 at 13:20 +0100, Michael Adam wrote:
> > On 2015-12-09 at 09:19 +0100, Michael Adam wrote:
> > > Another one:
> > >
> > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/16601/consoleFull
> > >
> > > by
> > >
> > > http://review.gluster.org/#/c/12826/
> > >
> > > Cheers - Michael
> >
> > Again:
> >
> > https://build.gluster.org/job/rackspace-regression-2GB-triggered/16644/consoleFull
>
> and again:
>
> https://build.gluster.org/job/rackspace-regression-2GB-triggered/16652/consoleFull

Forget that -- it is a different test.

Michael
[Gluster-devel] libgfapi compound operations - multiple writes
forking off since it muddles the original conversation. I've some questions:

1. Why do multiple writes need to be compounded together?
2. If the reason is aggregation, can't we tune write-behind to do the same?

regards,
Raghavendra.
Re: [Gluster-devel] Netbsd failures on ./tests/basic/afr/arbiter-statfs.t
On 08/24/2015 07:01 AM, Susant Palai wrote:
> Ravi,
> The test case ./tests/basic/afr/arbiter-statfs.t failing frequently on
> netbsd machine. Requesting to take a look.

tests/basic/afr/arbiter-statfs.t seems to be affecting most NetBSD runs now. Ravi - can you please take a look in?

Sample test run that got affected by this test unit:

https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/12516/consoleFull

-Vijay
Re: [Gluster-devel] compound fop design first cut
On 12/09/2015 09:32 AM, Jeff Darcy wrote:
On December 9, 2015 at 7:07:06 AM, Ira Cooper (i...@redhat.com) wrote:

A simple "abort on failure" and let the higher levels clean it up is probably right for the type of compounding I propose. It is what SMB2 does. So, if you get an error return value, cancel the rest of the request, and have it return ECOMPOUND as the errno.

This is exactly the part that worries me. If a compound operation fails, some parts of it will often need to be undone. “Let the higher levels clean it up” means that rollback code will be scattered among all of the translators that use compound operations. Some of them will do it right. Others . . . less so. ;) All will have to be tested separately. If we centralize dispatch of compound operations into one piece of code, we can centralize error detection and recovery likewise. That ensures uniformity of implementation, and facilitates focused testing (or even formal proof) of that implementation.

My take on this is that whichever layer started the compounding takes care of the error handling. I do not see any requirement for undoing things that are done, and would almost say (without further thought (that's the gunslinger in me talking ;) )) that this is not supported as a part of the compounding.

Can we gain the same benefits with a more generic design? Perhaps. It would require that the compounding translator know how to reverse each type of operation, so that it can do so after an error. That’s feasible, though it does mean maintaining a stack of undo actions instead of a simple state. It might also mean testing combinations and scenarios that will actually never occur in other components’ usage of the compounding feature. More likely it means that people will *think* they can use the facility in unanticipated ways, until their unanticipated usage creates a combination or scenario that was never tested and doesn’t work. Those are going to be hard problems to debug. I think it’s better to be explicit about which permutations we actually expect to work, and have those working earlier.

Jeff, a clarification: are you suggesting fop_xxx extensions for each compound operation supported? Or suggesting a *single* FOP that carries compounded requests, but is specific about what requests can be compounded? (For example, allows open+write, but when building out the compound request, disallows *say* anything else.)

(If any doubt, I am with the latter and not so gaga about the former as it explodes the FOP list.)

Also, I think the compound list has exploded (in this mail conversation) and provided a lot of compounding requests... I would say this means we need a clear way of doing the latter.

P.S: Ignore this... gunslinger: "a man who carries a gun and shoots well." I claim to be neither... just stating
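The "stack of undo actions" that a generic compounding translator would need can be sketched in a few lines. This is a toy illustration of the bookkeeping only, assuming each sub-operation comes paired with a callable that reverses it; run_with_undo and the dictionary standing in for brick state are hypothetical and not part of any proposal in the thread.

```python
def run_with_undo(steps):
    """Run (do, undo) pairs in order; on failure, undo completed steps in reverse."""
    undo_stack = []
    try:
        for do, undo in steps:
            do()
            undo_stack.append(undo)  # only completed steps get undone
    except Exception:
        while undo_stack:
            undo_stack.pop()()
        raise


def fail():
    raise OSError("simulated failure of the third sub-operation")


store = {}  # toy stand-in for state changed by the sub-operations
steps = [
    (lambda: store.update(obj=b""), lambda: store.pop("obj")),
    (lambda: store.update(xattr=b"v1"), lambda: store.pop("xattr")),
    (fail, lambda: None),
]

try:
    run_with_undo(steps)
except OSError as exc:
    print("compound failed:", exc)

print("state after rollback:", store)
```

Even in this small form, every sub-operation type needs a correct inverse, which is the testing burden the message warns about.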
Re: [Gluster-devel] compound fop design first cut
- Original Message - > From: "Ira Cooper" > To: "Jeff Darcy" , "Raghavendra Gowdappa" > , "Pranith Kumar Karampuri" > > Cc: "Gluster Devel" > Sent: Wednesday, December 9, 2015 5:37:05 PM > Subject: Re: [Gluster-devel] compound fop design first cut > > Jeff Darcy writes: > > > However, I’d be even more comfortable with an even simpler approach that > > avoids the need to solve what the database folks (who have dealt with > > complex transactions for years) would tell us is a really hard problem. > > Instead of designing for every case we can imagine, let’s design for the > > cases that we know would be useful for improving performance. Open plus > > read/write plus close is an obvious one. Raghavendra mentions > > create+inodelk as well. For each of those, we can easily define a > > structure that contains the necessary fields, we don’t need a > > client-side translator, and the server-side translator can take care of > > “forwarding” results from one sub-operation to the next. We could even > > use GF_FOP_IPC to prototype this. If we later find that the number of > > “one-off” compound requests is growing too large, then at least we’ll > > have some experience to guide our design of a more general alternative. > > Right now, I think we’re trying to look further ahead than we can see > > clearly. > > Actually, I'm taking the design, I've seen another network protocol use, > SMB2, and proposing it here, I'd be shocked if NFS doesn't behave in the > same way. > > Interestingly, all the cases, really deal with a single file, and a > single lock, and a single... > > There's a reason I talked about a single sentinel value, and not > multiple ones. Because I wanted to keep it simple. Yes, the extensions > you mention are obvious, but they lead to a giant mess, that we may not > want initially. (But that we CAN extend into if we want them. I made > the choice not to go there because honestly, I found the complexity too > much for me.) 
> > A simple "abort on failure" and let the higher levels clean it up is > probably right for the type of compounding I propose. It is what SMB2 > does. So, if you get an error return value, cancel the rest of the > request, and have it return ECOMPOUND as the errno. > > Note: How you keep the list to be compounded doesn't matter much to me. > The semantics matter, because those are what I can ask for later, and > allow us to create ops the original designers hadn't thought of, which > is usually the hallmark of a good design. > > I think you should look for a simple design you can "grow into" instead > of creating one off ops, to satisfy a demand today. > I agree with Ira here. NFS and SMB have already addressed this problem, so instead of reinventing the wheel, let's pick the best bits from those solutions and incorporate them into Gluster. From a multi-protocol point of view, we would like to compound operations such as open + set_leaseID + lk, and many more. With the current approach it would be really messy to have a separate function for each such combination and a dedicated translator to handle them. As others have mentioned, I think it would be better to have a general fop (fop_compound) that can handle any compound fop. Each translator can choose whether to implement it, and each translator can decide whether to compound more fops or to de-compound them. For example, for now the protocol server could de-compound all compound fops. -Rajesh > My thoughts, > > -Ira ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] compound fop design first cut
On 12/09/2015 02:37 AM, Soumya Koduri wrote: On 12/09/2015 11:44 AM, Pranith Kumar Karampuri wrote: On 12/09/2015 06:37 AM, Vijay Bellur wrote: On 12/08/2015 03:45 PM, Jeff Darcy wrote: On December 8, 2015 at 12:53:04 PM, Ira Cooper (i...@redhat.com) wrote: Raghavendra Gowdappa writes: I propose that we define a "compound op" that contains ops. Within each op, there are fields that can be "inherited" from the previous op, via use of a sentinel value. Sentinel is -1, for all of these examples. So: LOOKUP (1, "foo") (Sets the gfid value to be picked up by compounding, 1 is the root directory, as a gfid, by convention.) OPEN(-1, O_RDWR) (Uses the gfid value, sets the glfd compound value.) WRITE(-1, "foo", 3) (Uses the glfd compound value.) CLOSE(-1) (Uses the glfd compound value) So, basically, what the programming-language types would call futures and promises. It’s a good and well studied concept, which is necessary to solve the second-order problem of how to specify an argument in sub-operation N+1 that’s not known until sub-operation N completes. To be honest, some of the highly general approaches suggested here scare me too. Wrapping up the arguments for one sub-operation in xdata for another would get pretty hairy if we ever try to go beyond two sub-operations and have to nest sub-operation #3’s args within sub-operation #2’s xdata which is itself encoded within sub-operation #1’s xdata. There’s also not much clarity about how to handle errors in that model. Encoding N sub-operations’ arguments in a linear structure as Shyam proposes seems a bit cleaner that way. If I were to continue down that route I’d suggest just having start_compound and end-compound fops, plus an extra field (or by-convention xdata key) that either the client-side or server-side translator could use to build whatever structure it wants and schedule sub-operations however it wants. 
However, I’d be even more comfortable with an even simpler approach that avoids the need to solve what the database folks (who have dealt with complex transactions for years) would tell us is a really hard problem. Instead of designing for every case we can imagine, let’s design for the cases that we know would be useful for improving performance. Open plus read/write plus close is an obvious one. Raghavendra mentions create+inodelk as well. For each of those, we can easily define a structure that contains the necessary fields, we don’t need a client-side translator, and the server-side translator can take care of “forwarding” results from one sub-operation to the next. We could even use GF_FOP_IPC to prototype this. If we later find that the number of “one-off” compound requests is growing too large, then at least we’ll have some experience to guide our design of a more general alternative. Right now, I think we’re trying to look further ahead than we can see clearly. Yes Agree. This makes implementation on the client side simpler as well. So it is welcome. Just updating the solution. 1) New RPCs are going to be implemented. 2) client stack will use these new fops. 3) On the server side we have server xlator implementing these new fops to decode the RPC request then resolve_resume and compound-op-receiver(Better name for this is welcome) which sends one op after other and send compound fop response. @Pranith, I assume you would expand on this at a later date (something along the lines of what Soumya has done below, right? List of compound fops identified so far: Swift/S3: PUT: creat(), write()s, setxattr(), fsync(), close(), rename() Dht: mkdir + inodelk Afr: xattrop+writev, xattrop+unlock to begin with. Could everyone who needs compound fops add to this list? I see that Niels is back on 14th. Does anyone else know the list of compound fops he has in mind? 
From the discussions we had with Niels regarding Kerberos support on GlusterFS, I think the following set of compound fops is required: set_uid + set_gid + set_lkowner (or kerberos principal name) + actual_fop Also, gfapi does a lookup (the first time, or to refresh the inode) before performing the actual fops most of the time. It may really help if we can club such fops - @Soumya +5 (just a random number :) ) This came to my mind as well, and is a good candidate for compounding. LOOKUP + FOP (OPEN etc) Coming to the design proposed, I agree with Shyam, Ira and Jeff's thoughts. Defining different compound fops for each specific set of operations and wrapping up those arguments in xdata seems rather complex and difficult to maintain going forward. Having worked with NFS, may I suggest that we follow (or stay close to) the approach taken by the NFS protocol to define and implement compound procedures? The basic structure of the NFS COMPOUND procedure is:

+-----+--------------+--------+-----------+-----------+-----------+--
| tag | minorversion | numops | op + args | op + args | op + args |
+-----+--------------+--------+-----------+-----------+-----------+--
Re: [Gluster-devel] compound fop design first cut
On 12/09/2015 12:52 AM, Pranith Kumar Karampuri wrote: On 12/09/2015 10:39 AM, Prashanth Pai wrote: However, I’d be even more comfortable with an even simpler approach that avoids the need to solve what the database folks (who have dealt with complex transactions for years) would tell us is a really hard problem. Instead of designing for every case we can imagine, let’s design for the cases that we know would be useful for improving performance. Open plus read/write plus close is an obvious one. Raghavendra mentions create+inodelk as well. From object interface (Swift/S3) perspective, this is the fop order and flow for object operations: GET: open(), fstat(), fgetxattr()s, read()s, close() Krutika implemented fstat+fgetxattr(http://review.gluster.org/10180). In posix there is an implementation of GF_CONTENT_KEY which is used to read a file in lookup by quick-read. This needs to be exposed for fds as well I think. So you can do all this using fstat on anon-fd. HEAD: stat(), getxattr()s Krutika already implemented this for sharding http://review.gluster.org/10158. You can do this using stat fop. I believe we need to fork this part of the conversation, i.e the stat + xattr information clubbing. My view on a stat for gluster is, POSIX stat + gluster extended information being returned. I state this as, a file system when it stats its inode, should get all information regarding the inode, and not just the POSIX ones. In the case of other local FS, the inode structure has more fields than just what POSIX needs, so when the inode is *read* the FS can populate all its internal inode information and return to the application/syscall the relevant fields that it needs. I believe gluster should do the same, so in the cases above, we should actually extend our stat information (not elaborating how) to include all information from the brick, i.e stat from POSIX and all the extended attrs for the inode (file or dir). This can then be consumed by any layer as needed. 
Currently, each layer adds what it needs in addition to the stat information in the xdata, as an xattr request, this can continue or go away, if the relevant FOPs return the whole inode information upward. This also has useful outcomes in readdirp calls, where we get the extended stat information for each entry. With the patches referred to, and older patches, this seems to be the direction sought (around 2013), any reasons why this is not prevalent across the stack and made so? Or am I mistaken? PUT: creat(), write()s, setxattr(), fsync(), close(), rename() This I think should be a new compound fop. Nothing similar exists. DELETE: getxattr(), unlink() This can also be clubbed in unlink already because xdata exists on the wire already. Compounding some of these ops and exposing them as consumable libgfapi APIs like glfs_get() and glfs_put() similar to librados compound APIs[1] would greatly improve performance for object based access. [1]: https://github.com/ceph/ceph/blob/master/src/include/rados/librados.h#L2219 Thanks. - Prashanth Pai ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] compound fop design first cut
libgfapi compound fops added inline. - Original Message - > From: "Kotresh Hiremath Ravishankar" > To: "Pranith Kumar Karampuri" > Cc: "Gluster Devel" > Sent: Wednesday, December 9, 2015 2:18:47 PM > Subject: Re: [Gluster-devel] compound fop design first cut > > Geo-rep requirements inline. > > Thanks and Regards, > Kotresh H R > > - Original Message - > > From: "Pranith Kumar Karampuri" > > To: "Vijay Bellur" , "Jeff Darcy" , > > "Raghavendra Gowdappa" > > , "Ira Cooper" > > Cc: "Gluster Devel" > > Sent: Wednesday, December 9, 2015 11:44:52 AM > > Subject: Re: [Gluster-devel] compound fop design first cut > > > > > > > > On 12/09/2015 06:37 AM, Vijay Bellur wrote: > > > On 12/08/2015 03:45 PM, Jeff Darcy wrote: > > >> > > >> > > >> > > >> On December 8, 2015 at 12:53:04 PM, Ira Cooper (i...@redhat.com) wrote: > > >>> Raghavendra Gowdappa writes: > > >>> I propose that we define a "compound op" that contains ops. > > >>> > > >>> Within each op, there are fields that can be "inherited" from the > > >>> previous op, via use of a sentinel value. > > >>> > > >>> Sentinel is -1, for all of these examples. > > >>> > > >>> So: > > >>> > > >>> LOOKUP (1, "foo") (Sets the gfid value to be picked up by > > >>> compounding, 1 > > >>> is the root directory, as a gfid, by convention.) > > >>> OPEN(-1, O_RDWR) (Uses the gfid value, sets the glfd compound value.) > > >>> WRITE(-1, "foo", 3) (Uses the glfd compound value.) > > >>> CLOSE(-1) (Uses the glfd compound value) > > >> > > >> So, basically, what the programming-language types would call futures > > >> and promises. It’s a good and well studied concept, which is necessary > > >> to solve the second-order problem of how to specify an argument in > > >> sub-operation N+1 that’s not known until sub-operation N completes. > > >> > > >> To be honest, some of the highly general approaches suggested here scare > > >> me too. 
Wrapping up the arguments for one sub-operation in xdata for > > >> another would get pretty hairy if we ever try to go beyond two > > >> sub-operations and have to nest sub-operation #3’s args within > > >> sub-operation #2’s xdata which is itself encoded within sub-operation > > >> #1’s xdata. There’s also not much clarity about how to handle errors in > > >> that model. Encoding N sub-operations’ arguments in a linear structure > > >> as Shyam proposes seems a bit cleaner that way. If I were to continue > > >> down that route I’d suggest just having start_compound and end-compound > > >> fops, plus an extra field (or by-convention xdata key) that either the > > >> client-side or server-side translator could use to build whatever > > >> structure it wants and schedule sub-operations however it wants. > > >> > > >> However, I’d be even more comfortable with an even simpler approach that > > >> avoids the need to solve what the database folks (who have dealt with > > >> complex transactions for years) would tell us is a really hard problem. > > >> Instead of designing for every case we can imagine, let’s design for the > > >> cases that we know would be useful for improving performance. Open plus > > >> read/write plus close is an obvious one. Raghavendra mentions > > >> create+inodelk as well. For each of those, we can easily define a > > >> structure that contains the necessary fields, we don’t need a > > >> client-side translator, and the server-side translator can take care of > > >> “forwarding” results from one sub-operation to the next. We could even > > >> use GF_FOP_IPC to prototype this. If we later find that the number of > > >> “one-off” compound requests is growing too large, then at least we’ll > > >> have some experience to guide our design of a more general alternative. > > >> Right now, I think we’re trying to look further ahead than we can see > > >> clearly. > > Yes Agree. This makes implementation on the client side simpler as well. 
> > So it is welcome. > > > > Just updating the solution. > > 1) New RPCs are going to be implemented. > > 2) client stack will use these new fops. > > 3) On the server side we have server xlator implementing these new fops > > to decode the RPC request then resolve_resume and > > compound-op-receiver(Better name for this is welcome) which sends one op > > after other and send compound fop response. > > > > List of compound fops identified so far: > > Swift/S3: > > PUT: creat(), write()s, setxattr(), fsync(), close(), rename() > > > > Dht: > > mkdir + inodelk > > > > Afr: > > xattrop+writev, xattrop+unlock to begin with. > > Geo-rep: > mknod,entrylk,stat(on backend gfid) > mkdir,entrylk,stat (on backend gfid) > symlink,entrylk,stat(on backend gfid) > libgfapi : glfs_setfsuid, glfs_setfsgid, glfs_setfsgroups, glfs_set_lkowner and leaseid - these are not network fops, hence mostly impact gfapi interface for compound fops. open/create + lease + lk readdir + stat + getxattrs => already being discussed to replace this with readdirplus M
Re: [Gluster-devel] compound fop design first cut
On December 9, 2015 at 7:07:06 AM, Ira Cooper (i...@redhat.com) wrote: > A simple "abort on failure" and let the higher levels clean it up is > probably right for the type of compounding I propose. It is what SMB2 > does. So, if you get an error return value, cancel the rest of the > request, and have it return ECOMPOUND as the errno. This is exactly the part that worries me. If a compound operation fails, some parts of it will often need to be undone. “Let the higher levels clean it up” means that rollback code will be scattered among all of the translators that use compound operations. Some of them will do it right. Others . . . less so. ;) All will have to be tested separately. If we centralize dispatch of compound operations into one piece of code, we can centralize error detection and recovery likewise. That ensures uniformity of implementation, and facilitates focused testing (or even formal proof) of that implementation. Can we gain the same benefits with a more generic design? Perhaps. It would require that the compounding translator know how to reverse each type of operation, so that it can do so after an error. That’s feasible, though it does mean maintaining a stack of undo actions instead of a simple state. It might also mean testing combinations and scenarios that will actually never occur in other components’ usage of the compounding feature. More likely it means that people will *think* they can use the facility in unanticipated ways, until their unanticipated usage creates a combination or scenario that was never tested and doesn’t work. Those are going to be hard problems to debug. I think it’s better to be explicit about which permutations we actually expect to work, and have those working earlier. ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
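The "stack of undo actions" idea in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions: `run_with_rollback`, `CompoundError`, and the toy `store` dict are hypothetical names invented for the sketch, not Gluster code, and undo actions are assumed never to fail.

```python
import errno


class CompoundError(Exception):
    """Raised when a sub-operation fails; records which step failed."""

    def __init__(self, failed_index, cause):
        super().__init__(f"sub-operation {failed_index} failed: {cause}")
        self.failed_index = failed_index
        self.cause = cause


def run_with_rollback(steps):
    """Run (do, undo) pairs in order from one central dispatcher.

    If a `do` raises OSError, every undo recorded so far runs in reverse
    order and CompoundError is raised. `undo` may be None for steps that
    need no reversal. Undo actions are assumed not to fail in this sketch.
    """
    undo_stack = []
    results = []
    for index, (do, undo) in enumerate(steps):
        try:
            results.append(do())
        except OSError as exc:
            while undo_stack:
                undo_stack.pop()()  # the most recent step is undone first
            raise CompoundError(index, exc) from exc
        if undo is not None:
            undo_stack.append(undo)
    return results


# Toy usage: a dict stands in for the brick's namespace (hypothetical).
store = {}


def create_tmp():
    store["tmp"] = {}


def unlink_tmp():
    store.pop("tmp", None)


def set_etag():
    store["tmp"]["user.etag"] = "abc"


def clear_etag():
    store["tmp"].pop("user.etag", None)


def failing_rename():
    raise OSError(errno.ENOSPC, "rename failed")
```

Calling `run_with_rollback([(create_tmp, unlink_tmp), (set_etag, clear_etag), (failing_rename, None)])` raises `CompoundError` and leaves `store` empty, because both recorded undo actions ran in reverse order. The point of the sketch is that rollback lives in one place, so it can be tested once rather than per translator.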
Re: [Gluster-devel] intermittent test failure: tests/bugs/tier/bug-1279376-rename-demoted-file.t
> > > - Original Message - > > From: "Michael Adam" > > To: gluster-devel@gluster.org > > Sent: Wednesday, December 9, 2015 1:46:32 PM > > Subject: [Gluster-devel] intermittent test failure: > > tests/bugs/tier/bug-1279376-rename-demoted-file.t > > > > Hi, > > > > found another one. See > > > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/16603/consoleFull > > This run failed because the rename operation could not get the inodelk - looks like the file was being migrated. I have posted a patch: http://review.gluster.org/#/c/12926/ which should prevent the demotion from happening too quickly for the dst file. > > Run by http://review.gluster.org/#/c/12830/ > > which should not change any test result. > > A bug has been filed at: > https://bugzilla.redhat.com/show_bug.cgi?id=1289845 > > > > > Michael > > > > ___ > > Gluster-devel mailing list > > Gluster-devel@gluster.org > > http://www.gluster.org/mailman/listinfo/gluster-devel > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel > ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] intermittent test failure: tests/basic/afr/sparse-file-self-heal.t
On 2015-12-09 at 13:20 +0100, Michael Adam wrote: > On 2015-12-09 at 09:19 +0100, Michael Adam wrote: > > Another one: > > > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/16601/consoleFull > > > > by > > > > http://review.gluster.org/#/c/12826/ > > > > Cheers - Michael > > > Again: > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/16644/consoleFull and again: https://build.gluster.org/job/rackspace-regression-2GB-triggered/16652/consoleFull ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] intermittent test failure: tests/bugs/tier/bug-1279376-rename-demoted-file.t
- Original Message - > From: "Michael Adam" > To: gluster-devel@gluster.org > Sent: Wednesday, December 9, 2015 1:46:32 PM > Subject: [Gluster-devel] intermittent test failure: > tests/bugs/tier/bug-1279376-rename-demoted-file.t > > Hi, > > found another one. See > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/16603/consoleFull > > Run by http://review.gluster.org/#/c/12830/ > which should not change any test result. A bug has been filed at: https://bugzilla.redhat.com/show_bug.cgi?id=1289845 > > Michael > > ___ > Gluster-devel mailing list > Gluster-devel@gluster.org > http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] intermittent test failure: tests/basic/afr/sparse-file-self-heal.t
I'm able to repro the issue (i.e Failed test #36 of sparse-file-self-heal.t) on my ancient rhs-2.1 VM but not on newer Fedora 21 machines: Create a 1x2 replica and from the mount, do : `dd if=/dev/zero of=file bs=1024 count=1024` When both bricks are up, `du /brick/file` = 1024 When one of the bricks is killed and the test repeated, `du /brick/file` = 1028 I have no idea why. The issue is reproducible on NFS and fuse mounts on the rhs-2.1 VM running 2.6.32 kernel, which is incidentally the same version running on slave29.cloud.gluster.org While I try to figure out the issue, I am adding the test case to bad tests for the moment @ http://review.gluster.org/#/c/12925/ . Makes me wonder if we can upgrade the build machines to at least centos7 if not fedora. 2.6 is really an old kernel! Thanks, Ravi On 12/09/2015 02:40 PM, Ravishankar N wrote: I'll take a look at this one. -Ravi On 12/09/2015 01:49 PM, Michael Adam wrote: Another one: https://build.gluster.org/job/rackspace-regression-2GB-triggered/16601/consoleFull by http://review.gluster.org/#/c/12826/ Cheers - Michael ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel -- Ravishankar N work: +91 80 3924 5143 extension: 8373143 mobile: +91 96118 43905 irc nick: itisravi ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel -- Ravishankar N work: +91 80 3924 5143 extension: 8373143 mobile: +91 96118 43905 irc nick: itisravi ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
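For anyone trying the reproduction above without `dd` and `du` at hand, the same comparison can be sketched with `os.stat`. The helper names `write_zeros` and `sizes_kib` are made up for this sketch; it assumes a Unix-like system where `st_blocks` counts 512-byte units, and it says nothing about why the brick reported 1028 instead of 1024.

```python
import os


def write_zeros(path, bs=1024, count=1024):
    """Rough equivalent of: dd if=/dev/zero of=path bs=1024 count=1024"""
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(b"\0" * bs)
        f.flush()
        os.fsync(f.fileno())  # make sure blocks are allocated before stat


def sizes_kib(path):
    """Return (apparent size, allocated size) in KiB.

    The allocated figure is what `du` reports in 1 KiB units; it can differ
    from the apparent size when the filesystem allocates extra blocks or
    leaves holes.
    """
    st = os.stat(path)
    return st.st_size // 1024, st.st_blocks * 512 // 1024
```

Running `sizes_kib()` on the brick's copy of the file with both bricks up, and again after killing one brick and rewriting, gives the two numbers to compare.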
[Gluster-devel] Meeting minutes of Gluster community meeting 2015-12-09
Minutes: http://meetbot.fedoraproject.org/gluster-meeting/2015-12-09/gluster_community_weekly_meeting.2015-12-09-12.00.html
Minutes (text): http://meetbot.fedoraproject.org/gluster-meeting/2015-12-09/gluster_community_weekly_meeting.2015-12-09-12.00.txt
Log: http://meetbot.fedoraproject.org/gluster-meeting/2015-12-09/gluster_community_weekly_meeting.2015-12-09-12.00.log.html

Meeting summary
---
* Roll Call (atinm, 12:01:08)
* AIs from last week (atinm, 12:03:58)
* ACTION: ndevos to send out a reminder to the maintainers about more actively enforcing backports of bugfixes (atinm, 12:05:23)
* ACTION: raghu to call for volunteers and help from maintainers for doing backports listed by rwareing to 3.6.8 (atinm, 12:06:52)
* bug triage meeting doodle poll result to be announced on December 22, need more votes (atinm, 12:09:10)
* agenda is right here https://public.pad.fsfe.org/p/gluster-community-meetings (atinm, 12:09:49)
* ACTION: rastar and msvbhat to publish a test exit criterion for major/minor releases on gluster.org (atinm, 12:10:39)
* ACTION: kshlm & csim to set up faux/pseudo user email for gerrit, bugzilla, github (atinm, 12:11:24)
* ACTION: hagarth to decide on 3.7.7 release manager (atinm, 12:14:12)
* ACTION: amye to get on top of discussion on long-term releases. (atinm, 12:15:28)
* ACTION: hagarth to post Gluster Monthly News this week (atinm, 12:17:47)
* GlusterFS 3.7 (atinm, 12:18:36)
* GlusterFS 3.6 (atinm, 12:19:49)
* raghu to create 3.6.8 tracker (atinm, 12:20:27)
* ACTION: hagarth to create 3.6.8 for bugzilla version (atinm, 12:21:24)
* ACTION: community needs to find out 3.6.8 release manager (atinm, 12:23:38)
* ACTION: raghu to ask for volunteers for release manager for 3.6.8 (atinm, 12:24:22)
* GlusterFS 3.8 (atinm, 12:25:16)
* GlusterFS 4.0 (atinm, 12:27:13)
* 3.8 feature freeze to happen on mid-last Jan 2016 (atinm, 12:31:36)
* ACTION: kkeithley_ to send a mail about using sanity checker tools in the codebase (atinm, 12:32:47)
* Another follow up meeting on 3.8 to take place on first week of January, 2016 (atinm, 12:33:23)
* Open Floor (atinm, 12:34:02)
* LINK: http://www.gluster.org/pipermail/gluster-devel/2015-November/047125.html (atinm, 12:37:25)
* ACTION: rastar to continue the discussion on rebase+fast forward as an option to gerrit submit type (atinm, 12:40:49)

Meeting ended at 12:49:26 UTC.

Action Items
* ndevos to send out a reminder to the maintainers about more actively enforcing backports of bugfixes
* raghu to call for volunteers and help from maintainers for doing backports listed by rwareing to 3.6.8
* rastar and msvbhat to publish a test exit criterion for major/minor releases on gluster.org
* kshlm & csim to set up faux/pseudo user email for gerrit, bugzilla, github
* hagarth to decide on 3.7.7 release manager
* amye to get on top of discussion on long-term releases.
* hagarth to post Gluster Monthly News this week
* hagarth to create 3.6.8 for bugzilla version
* community needs to find out 3.6.8 release manager
* raghu to ask for volunteers for release manager for 3.6.8
* kkeithley_ to send a mail about using sanity checker tools in the codebase
* rastar to continue the discussion on rebase+fast forward as an option to gerrit submit type

Action Items, by person
---
* kkeithley_
  * kkeithley_ to send a mail about using sanity checker tools in the codebase
* msvbhat
  * rastar and msvbhat to publish a test exit criterion for major/minor releases on gluster.org
* raghu
  * raghu to call for volunteers and help from maintainers for doing backports listed by rwareing to 3.6.8
  * raghu to ask for volunteers for release manager for 3.6.8
* rastar
  * rastar and msvbhat to publish a test exit criterion for major/minor releases on gluster.org
  * rastar to continue the discussion on rebase+fast forward as an option to gerrit submit type
* **UNASSIGNED**
  * ndevos to send out a reminder to the maintainers about more actively enforcing backports of bugfixes
  * kshlm & csim to set up faux/pseudo user email for gerrit, bugzilla, github
  * hagarth to decide on 3.7.7 release manager
  * amye to get on top of discussion on long-term releases.
  * hagarth to post Gluster Monthly News this week
  * hagarth to create 3.6.8 for bugzilla version
  * community needs to find out 3.6.8 release manager

People Present (lines said)
---
* atinm (116)
* obnox (21)
* kkeithley_ (15)
* raghu (10)
* rastar (10)
* jiffin (7)
* rafi (5)
* hgowtham (4)
* anoopcs (3)
* zodbot (3)
* pranithk (3)
* Manikandan (2)
* skoduri (1)
* msvbhat (1)
* ggarg (1)
* partner (1)
* rjoseph (1)

Cheers, Atin ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-d
Re: [Gluster-devel] intermittent test failure: tests/basic/afr/sparse-file-self-heal.t
On 2015-12-09 at 09:19 +0100, Michael Adam wrote: > Another one: > > https://build.gluster.org/job/rackspace-regression-2GB-triggered/16601/consoleFull > > by > > http://review.gluster.org/#/c/12826/ > > Cheers - Michael Again: https://build.gluster.org/job/rackspace-regression-2GB-triggered/16644/consoleFull same patch (rebased) ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] compound fop design first cut
Jeff Darcy writes: > However, I’d be even more comfortable with an even simpler approach that > avoids the need to solve what the database folks (who have dealt with > complex transactions for years) would tell us is a really hard problem. > Instead of designing for every case we can imagine, let’s design for the > cases that we know would be useful for improving performance. Open plus > read/write plus close is an obvious one. Raghavendra mentions > create+inodelk as well. For each of those, we can easily define a > structure that contains the necessary fields, we don’t need a > client-side translator, and the server-side translator can take care of > “forwarding” results from one sub-operation to the next. We could even > use GF_FOP_IPC to prototype this. If we later find that the number of > “one-off” compound requests is growing too large, then at least we’ll > have some experience to guide our design of a more general alternative. > Right now, I think we’re trying to look further ahead than we can see > clearly. Actually, I'm taking the design, I've seen another network protocol use, SMB2, and proposing it here, I'd be shocked if NFS doesn't behave in the same way. Interestingly, all the cases, really deal with a single file, and a single lock, and a single... There's a reason I talked about a single sentinel value, and not multiple ones. Because I wanted to keep it simple. Yes, the extensions you mention are obvious, but they lead to a giant mess, that we may not want initially. (But that we CAN extend into if we want them. I made the choice not to go there because honestly, I found the complexity too much for me.) A simple "abort on failure" and let the higher levels clean it up is probably right for the type of compounding I propose. It is what SMB2 does. So, if you get an error return value, cancel the rest of the request, and have it return ECOMPOUND as the errno. Note: How you keep the list to be compounded doesn't matter much to me. 
The semantics matter, because those are what I can ask for later, and they allow us to create ops the original designers hadn't thought of, which is usually the hallmark of a good design. I think you should look for a simple design you can "grow into" instead of creating one-off ops to satisfy a demand today. My thoughts, -Ira ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
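To make the "single sentinel plus abort on failure" semantics concrete, here is a minimal sketch of the LOOKUP / OPEN / WRITE / CLOSE chain quoted elsewhere in this thread. Everything in it is an assumption made for illustration: `ToyBrick` is an in-memory stand-in rather than a Gluster structure, the op tuples are an invented encoding, and the numeric value of `ECOMPOUND` is a placeholder because the thread names the errno without defining it.

```python
import errno
import os

SENTINEL = -1     # "take this argument from the previous op"
ECOMPOUND = 1000  # placeholder value; the thread defines no number


class ToyBrick:
    """Hypothetical in-memory stand-in for a brick, only for this sketch."""

    def __init__(self, names):
        self.names = dict(names)  # (parent gfid, name) -> gfid
        self.files = {}           # gfid -> bytearray of file contents
        self.fds = {}             # open fd -> gfid
        self.next_fd = 3

    def run_compound(self, ops):
        """Run ops in order and stop at the first failure.

        Returns (status, results): status is 0 or ECOMPOUND, and results
        holds one (op name, return value) pair per op that was attempted.
        """
        gfid = fd = None  # the two "compound values" carried forward
        results = []
        for name, *args in ops:
            if name == "LOOKUP":
                parent, entry = args
                gfid = self.names.get((parent, entry))
                ret = 0 if gfid is not None else -errno.ENOENT
            elif name == "OPEN":
                target = gfid if args[0] == SENTINEL else args[0]
                if target is None:
                    ret = -errno.EINVAL  # sentinel with nothing to inherit
                else:
                    fd, self.next_fd = self.next_fd, self.next_fd + 1
                    self.fds[fd] = target
                    ret = 0
            elif name == "WRITE":
                handle = fd if args[0] == SENTINEL else args[0]
                if handle not in self.fds:
                    ret = -errno.EBADF
                else:
                    payload = args[1][:args[2]]
                    self.files.setdefault(self.fds[handle], bytearray()).extend(payload)
                    ret = len(payload)
            elif name == "CLOSE":
                handle = fd if args[0] == SENTINEL else args[0]
                ret = 0 if self.fds.pop(handle, None) is not None else -errno.EBADF
            else:
                ret = -errno.ENOSYS
            results.append((name, ret))
            if ret < 0:
                return ECOMPOUND, results  # abort; later ops never run
        return 0, results


# The chain from the thread, with 1 as the root gfid by convention.
EXAMPLE_OPS = [
    ("LOOKUP", 1, "foo"),
    ("OPEN", SENTINEL, os.O_RDWR),
    ("WRITE", SENTINEL, b"foo", 3),
    ("CLOSE", SENTINEL),
]
```

With `ToyBrick({(1, "foo"): 42}).run_compound(EXAMPLE_OPS)` all four ops run; if the LOOKUP names a missing entry, the status is `ECOMPOUND` and only the LOOKUP result is reported. Note that the sketch does no cleanup on failure, which is exactly the "let the higher levels clean it up" trade-off under discussion.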
Re: [Gluster-devel] intermittent test failure: tests/basic/afr/sparse-file-self-heal.t
I'll take a look at this one. -Ravi On 12/09/2015 01:49 PM, Michael Adam wrote: Another one: https://build.gluster.org/job/rackspace-regression-2GB-triggered/16601/consoleFull by http://review.gluster.org/#/c/12826/ Cheers - Michael ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel -- Ravishankar N work: +91 80 3924 5143 extension: 8373143 mobile: +91 96118 43905 irc nick: itisravi ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
[Gluster-devel] Storing pNFS related state on GlusterFS
Hi,

pNFS is a feature introduced as part of the NFSv4.1 protocol to allow clients direct access to the storage devices that contain file data (in short, parallel I/O). Clients request the layout of an entire file or of a specific range. On receiving the layout information, they directly contact the server that holds the data to perform the I/O.

In a cluster of (NFS) servers:
* Meta-data servers (MDS) are responsible for providing the layouts of a file and recalling them if the layout changes.
* Data servers (DS) contain the actual data and process the I/O.

For more information, kindly refer to [1].

Currently, with NFS-Ganesha+GlusterFS, we support FILE_LAYOUTs but with a single MDS. So,
* to avoid a single point of failure and be able to support multiple MDSes, and
* to recall the layout in a cluster of (NFS) servers,
we need to store the layouts on the back-end filesystem (GlusterFS) and recall them on any conflicting access that may change the file layout.

Since this is along similar lines to storing and recalling lease state (with slightly different semantics), we are planning to store and process layouts as a special type of lease ('LAYOUT') in the lease xlator being worked on as part of [2].

More details are captured in the spec [3]: http://review.gluster.org/#/c/12367

Kindly review it and provide your inputs/comments.

Thanks, Soumya

[1] https://tools.ietf.org/rfc/rfc5661.txt (Section 12. Parallel NFS (pNFS))
[2] http://review.gluster.org/#/c/11980/
[3] http://review.gluster.org/#/c/12367

___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
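As a reading aid for the store-and-recall idea in the message above, here is a purely illustrative sketch. It is not the design in the spec under review: `Layout`, `LayoutStore`, and the `recall` callback are hypothetical names, and the conflict rule (any overlapping range held by a different client) is an assumption chosen to keep the example small.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """A layout handed out for a byte range of one file (hypothetical)."""
    client: str
    offset: int
    length: int

    def overlaps(self, offset, length):
        return offset < self.offset + self.length and self.offset < offset + length


class LayoutStore:
    """Toy stand-in for layout state kept on the back-end filesystem."""

    def __init__(self, recall):
        self._layouts = {}     # gfid -> list of Layout
        self._recall = recall  # callback(gfid, layout), e.g. notify the MDS

    def grant(self, gfid, client, offset, length):
        layout = Layout(client, offset, length)
        self._layouts.setdefault(gfid, []).append(layout)
        return layout

    def conflicting_access(self, gfid, client, offset, length):
        """Recall and drop overlapping layouts held by other clients."""
        kept, recalled = [], []
        for layout in self._layouts.get(gfid, []):
            if layout.client != client and layout.overlaps(offset, length):
                self._recall(gfid, layout)
                recalled.append(layout)
            else:
                kept.append(layout)
        self._layouts[gfid] = kept
        return recalled
```

Because the state lives in one shared store rather than inside a single MDS process, any server that sees a conflicting access can trigger the recall, which is the property the proposal is after.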
Re: [Gluster-devel] compound fop design first cut
Geo-rep requirements inline. Thanks and Regards, Kotresh H R - Original Message - > From: "Pranith Kumar Karampuri" > To: "Vijay Bellur" , "Jeff Darcy" , > "Raghavendra Gowdappa" > , "Ira Cooper" > Cc: "Gluster Devel" > Sent: Wednesday, December 9, 2015 11:44:52 AM > Subject: Re: [Gluster-devel] compound fop design first cut > > > > On 12/09/2015 06:37 AM, Vijay Bellur wrote: > > On 12/08/2015 03:45 PM, Jeff Darcy wrote: > >> > >> > >> > >> On December 8, 2015 at 12:53:04 PM, Ira Cooper (i...@redhat.com) wrote: > >>> Raghavendra Gowdappa writes: > >>> I propose that we define a "compound op" that contains ops. > >>> > >>> Within each op, there are fields that can be "inherited" from the > >>> previous op, via use of a sentinel value. > >>> > >>> Sentinel is -1, for all of these examples. > >>> > >>> So: > >>> > >>> LOOKUP (1, "foo") (Sets the gfid value to be picked up by > >>> compounding, 1 > >>> is the root directory, as a gfid, by convention.) > >>> OPEN(-1, O_RDWR) (Uses the gfid value, sets the glfd compound value.) > >>> WRITE(-1, "foo", 3) (Uses the glfd compound value.) > >>> CLOSE(-1) (Uses the glfd compound value) > >> > >> So, basically, what the programming-language types would call futures > >> and promises. It’s a good and well studied concept, which is necessary > >> to solve the second-order problem of how to specify an argument in > >> sub-operation N+1 that’s not known until sub-operation N completes. > >> > >> To be honest, some of the highly general approaches suggested here scare > >> me too. Wrapping up the arguments for one sub-operation in xdata for > >> another would get pretty hairy if we ever try to go beyond two > >> sub-operations and have to nest sub-operation #3’s args within > >> sub-operation #2’s xdata which is itself encoded within sub-operation > >> #1’s xdata. There’s also not much clarity about how to handle errors in > >> that model. 
> >> Encoding N sub-operations’ arguments in a linear structure
> >> as Shyam proposes seems a bit cleaner that way. If I were to continue
> >> down that route I’d suggest just having start_compound and end-compound
> >> fops, plus an extra field (or by-convention xdata key) that either the
> >> client-side or server-side translator could use to build whatever
> >> structure it wants and schedule sub-operations however it wants.
> >>
> >> However, I’d be even more comfortable with an even simpler approach that
> >> avoids the need to solve what the database folks (who have dealt with
> >> complex transactions for years) would tell us is a really hard problem.
> >> Instead of designing for every case we can imagine, let’s design for the
> >> cases that we know would be useful for improving performance. Open plus
> >> read/write plus close is an obvious one. Raghavendra mentions
> >> create+inodelk as well. For each of those, we can easily define a
> >> structure that contains the necessary fields, we don’t need a
> >> client-side translator, and the server-side translator can take care of
> >> “forwarding” results from one sub-operation to the next. We could even
> >> use GF_FOP_IPC to prototype this. If we later find that the number of
> >> “one-off” compound requests is growing too large, then at least we’ll
> >> have some experience to guide our design of a more general alternative.
> >> Right now, I think we’re trying to look further ahead than we can see
> >> clearly.
> Yes Agree. This makes implementation on the client side simpler as well.
> So it is welcome.
>
> Just updating the solution.
> 1) New RPCs are going to be implemented.
> 2) client stack will use these new fops.
> 3) On the server side we have server xlator implementing these new fops
> to decode the RPC request then resolve_resume and
> compound-op-receiver(Better name for this is welcome) which sends one op
> after other and send compound fop response.
>
> List of compound fops identified so far:
> Swift/S3:
> PUT: creat(), write()s, setxattr(), fsync(), close(), rename()
>
> Dht:
> mkdir + inodelk
>
> Afr:
> xattrop+writev, xattrop+unlock to begin with.

Geo-rep:
mknod, entrylk, stat (on backend gfid)
mkdir, entrylk, stat (on backend gfid)
symlink, entrylk, stat (on backend gfid)

>
> Could everyone who needs compound fops add to this list?
>
> I see that Niels is back on 14th. Does anyone else know the list of
> compound fops he has in mind?
>
> Pranith.
> >
> > Starting with a well defined set of operations for compounding has its
> > advantages. It would be easier to understand and maintain correctness
> > across the stack. Some of our translators perform transactions &
> > create/update internal metadata for certain fops. It would be easier
> > for such translators if the compound operations are well defined and
> > does not entail deep introspection of a generic representation to
> > ensure that the right behavior gets reflected at the end of a compound
> > operation.
> >
> > -Vijay
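As an illustration of the sentinel-based chaining quoted above (LOOKUP, OPEN, WRITE, CLOSE with -1 meaning "take the value from the previous sub-operation"), here is a minimal sketch of a server-side loop that forwards results from one sub-operation to the next. Everything in it is hypothetical: `FakeBrick` and `run_compound` are toy names, not GlusterFS APIs, and stopping at the first failing sub-operation is an assumed error policy, since the thread leaves error handling open.

```python
from dataclasses import dataclass, field

SENTINEL = -1   # "inherit from the previous sub-operation", as in the quoted example
ROOT_GFID = 1   # root directory gfid, by the convention quoted above


@dataclass
class FakeBrick:
    """Toy in-memory stand-in for a brick; not a GlusterFS API."""
    names: dict = field(default_factory=dict)     # (parent gfid, name) -> gfid
    data: dict = field(default_factory=dict)      # gfid -> bytearray
    open_fds: dict = field(default_factory=dict)  # fd -> gfid
    next_fd: int = 100


def run_compound(brick, ops):
    """Run sub-operations in order, forwarding gfid/fd results.

    Assumed policy: stop at the first failing sub-operation and report it.
    """
    ctx = {"gfid": None, "fd": None}
    results = []
    for name, *args in ops:
        try:
            if name == "LOOKUP":
                parent, entry = args
                ctx["gfid"] = brick.names[(parent, entry)]
                results.append((name, ctx["gfid"]))
            elif name == "OPEN":
                gfid, _flags = args
                gfid = ctx["gfid"] if gfid == SENTINEL else gfid
                fd = brick.next_fd
                brick.next_fd += 1
                brick.open_fds[fd] = gfid
                ctx["fd"] = fd
                results.append((name, fd))
            elif name == "WRITE":
                fd, payload, size = args
                fd = ctx["fd"] if fd == SENTINEL else fd
                target = brick.open_fds[fd]
                brick.data.setdefault(target, bytearray()).extend(payload[:size])
                results.append((name, size))
            elif name == "CLOSE":
                (fd,) = args
                fd = ctx["fd"] if fd == SENTINEL else fd
                del brick.open_fds[fd]
                results.append((name, 0))
            else:
                raise ValueError(f"unknown sub-operation {name}")
        except (KeyError, ValueError) as err:
            results.append((name, err))
            break
    return results
```

A fixed, well-defined compound such as open+write+close would hard-code this sequence in one structure instead of interpreting a generic list, which is the simpler route argued for in the thread.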
[Gluster-devel] intermittent test failure: tests/basic/afr/sparse-file-self-heal.t
Another one:
https://build.gluster.org/job/rackspace-regression-2GB-triggered/16601/consoleFull
by http://review.gluster.org/#/c/12826/

Cheers - Michael
[Gluster-devel] intermittent test failure: tests/bugs/tier/bug-1279376-rename-demoted-file.t
Hi,

found another one. See
https://build.gluster.org/job/rackspace-regression-2GB-triggered/16603/consoleFull

Run by http://review.gluster.org/#/c/12830/ which should not change any test result.

Michael