Re: [Gluster-devel] autodelete in snapshots
----- Original Message -----
From: M S Vishwanath Bhat msvb...@gmail.com
To: Vijay Bellur vbel...@redhat.com
Cc: Seema Naik sen...@redhat.com, Gluster Devel gluster-devel@gluster.org
Sent: Tuesday, June 3, 2014 1:02:08 AM
Subject: Re: [Gluster-devel] autodelete in snapshots

On 2 June 2014 20:22, Vijay Bellur vbel...@redhat.com wrote:
On 04/23/2014 05:50 AM, Vijay Bellur wrote:
On 04/20/2014 11:42 PM, Lalatendu Mohanty wrote:
On 04/16/2014 11:39 AM, Avra Sengupta wrote:

The whole purpose of introducing the soft-limit is that at any point in time the number of snaps should not exceed the hard limit. If we trigger auto-delete on hitting the hard-limit, then the purpose itself is lost, because at that point we would be taking a snap, making the count hard-limit + 1, and then triggering auto-delete, which violates the sanctity of the hard-limit. Also, what happens when we are at hard-limit + 1 and another snap is issued while auto-delete is yet to process the first delete? At that point we end up at hard-limit + 2. And what happens if auto-delete fails for a particular snap?

We should see the hard-limit as something set by the admin keeping resource consumption in mind, and at no point should we cross this limit, come what may. If we hit this limit, the create command should fail, asking the user to delete snaps using the snapshot delete command.

The two options Raghavendra mentioned are applicable to the soft-limit only. On hitting the soft-limit we can either:
1. Trigger auto-delete, or
2. Log a warning message for the user, saying the number of snaps is exceeding the snap-limit, and display the number of available snaps.

Which of these happens also depends on the user, because the auto-delete option is configurable. So if the auto-delete option is set to true, auto-delete should be triggered and the above message should also be logged. But if the option is set to false, only the message should be logged. This is the behaviour as designed. Adding Rahul and Seema to the mail, to reflect upon the behaviour as well.

Regards,
Avra

This sounds correct. However, we need to make sure that the usage and documentation around this are good enough, so that users understand each of the limits correctly.

It might be better to avoid the term soft-limit. soft-limit as used in quota and other places generally has an alerting connotation. Something like auto-deletion-limit might be better.

I still see references to soft-limit, and auto-deletion seems to get triggered upon reaching the soft-limit. Why is the ability to auto-delete not configurable? It does seem pretty nasty to go about deleting snapshots without obtaining explicit consent from the user.

I agree with Vijay here. It's not good to delete a snap (even if it is the oldest) without explicit consent from the user. FYI, it took me more than 2 weeks to figure out that my snaps were getting auto-deleted after reaching the soft-limit. For all I knew, I had not done anything and my snap restores were failing.

I propose to remove the terms soft and hard limit. I believe there should be a single limit (just "limit") after which all snapshot creates fail with proper error messages, and there can be a water-mark after which the user gets warning messages. So below is my proposal.

auto-delete + snap-limit: If the snap-limit is set to n, the next snap create (the (n+1)th) will succeed only if auto-delete is set to on/true/1, in which case the oldest snap will be deleted automatically. If auto-delete is set to off/false/0, the (n+1)th snap create will fail with a proper error message from the gluster CLI. But again, by default auto-delete should be off.

snap-water-mark: This should come into the picture only if auto-delete is turned off; it should have no meaning if auto-delete is turned on. Basically its purpose is to warn the user that the limit is almost being reached and that it is time for the admin to decide which snaps should be deleted (or which should be kept).

*my two cents*

-MS

The reason for having a hard-limit is to stop snapshot creation once we have reached this limit. This helps to keep control over resource consumption. Therefore, if we only have this one limit (as snap-limit), then there is no question of auto-delete: auto-delete can only be triggered once the count crosses the limit. That is why we introduced the concepts of a soft-limit and a hard-limit. As the name suggests, once the hard-limit is reached no more snaps will be created. So the idea is to keep the number of snapshots always below the hard-limit. To do so we introduced the soft-limit, wherein we still allow snapshots once this limit is crossed, and once the snapshot is taken we delete the oldest snap. If you consider this definition, then the names soft-limit and hard-limit look OK to me. In phase II we are planning to have the auto-delete feature configurable with different
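
For clarity, a minimal sketch of the soft-limit/hard-limit behaviour described in the last message above: the hard limit always blocks creation, while crossing the soft limit either auto-deletes the oldest snap or only logs a warning, depending on the configurable auto-delete option. This is plain Python, not glusterd code; the names (SnapConfig, create_snapshot) and default values are hypothetical.

from collections import deque
from dataclasses import dataclass

@dataclass
class SnapConfig:
    hard_limit: int = 256
    soft_limit: int = 230      # warning / auto-delete threshold (hypothetical defaults)
    auto_delete: bool = False  # off by default, as proposed in this thread

def create_snapshot(snaps: deque, name: str, cfg: SnapConfig) -> bool:
    if len(snaps) >= cfg.hard_limit:
        # Never cross the hard limit; ask the admin to delete snapshots first.
        print(f"error: snapshot count is at the hard limit ({cfg.hard_limit}); "
              "delete older snapshots with 'snapshot delete' and retry")
        return False

    snaps.append(name)  # take the snapshot first

    if len(snaps) > cfg.soft_limit:
        print(f"warning: snapshot count {len(snaps)} exceeds the "
              f"soft limit ({cfg.soft_limit})")
        if cfg.auto_delete:
            oldest = snaps.popleft()
            print(f"auto-delete enabled: removed oldest snapshot {oldest}")
    return True

With auto_delete left at its default of False, the sketch only warns past the soft limit and never removes anything, which matches the default-off behaviour MS is asking for.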
Re: [Gluster-devel] doubts in posix_handle_path and posix_handle_pump
On Tuesday 03 June 2014 15:42:19 Pranith Kumar Karampuri wrote:
On 06/03/2014 02:42 PM, Xavier Hernandez wrote:

The possible problem I see is that the comments say this function returns a path to an IA_IFDIR (i.e. an lstat on it will return IA_IFDIR). However, if one of the symlinks is missing or anything else fails, it won't return an error *but* it will return a path to an existing file, and an lstat on that path will return IA_IFLNK instead of IA_IFDIR. I don't know if this can be a problem in some places.

This is exactly what I was referring to. I don't see an easy way to find out if there is any failure in the function. One needs to do an extra lstat or a path-based syscall like getxattr on the returned path to check whether it returned a good path. So do you think the best thing is to ignore the return value of the function call and instead depend on an lstat or a path-based syscall on the path?

The only point to consider is gfids representing directories. Other types of file do not have any problem (the returned path can be considered valid even if lstat() fails). For directories there are 3 places where things can fail:

At line 360: I think this is not a problem. If lstat() fails (basically because the path does not exist), the returned path can be considered valid.

At line 367: If posix_handle_pump() fails, it could mean:
* The symlink is not a valid directory symlink:
  * it's a corrupted one: any operation on this file should be denied;
  * it's a normal symlink that has lost one of its hard-links: though it's bad to have damaged gfids, the returned path can be considered valid.
* readlink() failed: this would be very weird. Access to the file should be denied.

At line 374: If lstat() fails, it probably means that the symlink of one of the parents of the directory is missing. The returned path won't fail on lstat(), but it should, because lstat() will return symlink information instead of directory information.

I think it's very hard to determine whether something went wrong only by inspecting the returned path. I think the best approach would be for posix_handle_path() to return -1 if posix_handle_pump() or the lstat() at line 374 fails, and to let each caller decide what to do in case of failure. However, I don't know all the details of the posix xlator, so maybe I'm wrong and this is not necessary. Let's see if there is someone else with more experience on it to see what they think.

Xavi
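
A minimal caller-side sketch of the workaround Pranith mentions (an extra lstat() on the returned path): when a directory gfid has been resolved to a path, verify that the path really is a directory before trusting it. This is plain Python rather than posix xlator C code, and verify_dir_handle_path() is a hypothetical helper, not an existing function.

import os
import stat

def verify_dir_handle_path(path: str) -> bool:
    """Return True if 'path' can be trusted as a path to a directory handle."""
    try:
        st = os.lstat(path)
    except FileNotFoundError:
        # Handle not present yet; per the discussion above, the path may
        # still be considered valid in this case (the line 360 analogue).
        return True
    # If lstat() reports a symlink (or anything else) instead of a directory,
    # a parent handle symlink is likely missing or corrupted and the path
    # must not be used.
    return stat.S_ISDIR(st.st_mode)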
Re: [Gluster-devel] All builds are failing with BUILD ERROR
Guys, it's failing again with the same error:

Please proceed with configuring, compiling, and installing.
rm: cannot remove `/build/install/var/run/gluster/patchy': Device or resource busy
+ RET=1
+ '[' 1 '!=' 0 ']'
+ VERDICT='BUILD FAILURE'

Pranith

On 06/02/2014 09:08 PM, Justin Clift wrote:
On 02/06/2014, at 7:04 AM, Kaleb KEITHLEY wrote:
<snip>
Someone cleaned the loopback devices. I deleted 500 unix domain sockets in /d/install/var/run and requeued the regressions.

Interesting. The extra sockets problem is what prompted me to rewrite the cleanup function. The sockets are being created by glusterd during each test startup, but aren't removed by the existing cleanup function (so there is a substantial build-up over time).

I'm not sure which of those two things was the solution. _Probably_ the loopback device thing. The extra sockets seem messy, but (so far) I haven't seen them break anything.

+ Justin

--
Open Source and Standards @ Red Hat
twitter.com/realjustinclift
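
A hypothetical sketch of the kind of socket cleanup described above: walk the test install's run directory, find unix domain sockets that nothing is listening on, and remove them. The directory path and the staleness check are assumptions; this is not the regression harness's actual cleanup function.

import os
import socket
import stat

RUN_DIR = "/d/install/var/run"  # assumed location, taken from this thread

def remove_stale_sockets(run_dir: str = RUN_DIR) -> int:
    removed = 0
    for root, _dirs, files in os.walk(run_dir):
        for name in files:
            path = os.path.join(root, name)
            try:
                if not stat.S_ISSOCK(os.lstat(path).st_mode):
                    continue
                with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
                    s.settimeout(0.2)
                    try:
                        s.connect(path)
                        continue       # a live process is listening; keep it
                    except OSError:
                        pass           # nobody listening: treat as stale
                os.unlink(path)
                removed += 1
            except OSError:
                continue               # file vanished or permission issue; skip it
    return removed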
[Gluster-devel] Erasure coding doubts session
hi Xavier,
Some of the developers are reading the code you submitted for erasure coding. We want to know if you would be available on Friday (IST) so that we can have a discussion and doubt-clarification session on IRC. Could you tell us which time is good for you?

Pranith
Re: [Gluster-devel] Erasure coding doubts session
Hi Pranith,

On Tuesday 03 June 2014 17:04:12 Pranith Kumar Karampuri wrote:
hi Xavier,
Some of the developers are reading the code you submitted for erasure coding. We want to know if you would be available on Friday (IST) so that we can have a discussion and doubt-clarification session on IRC. Could you tell us which time is good for you?

Sure. I can find some time between 12:30 PM IST and 6:30 PM IST. Tell me which time you prefer.

Xavi
Re: [Gluster-devel] [Gluster-users] Need testers for GlusterFS 3.4.4
On 06/04/2014 01:35 AM, Ben Turner wrote:

----- Original Message -----
From: Justin Clift jus...@gluster.org
To: Ben Turner btur...@redhat.com
Cc: James purplei...@gmail.com, gluster-us...@gluster.org, Gluster Devel gluster-devel@gluster.org
Sent: Thursday, May 29, 2014 6:12:40 PM
Subject: Re: [Gluster-users] [Gluster-devel] Need testers for GlusterFS 3.4.4

On 29/05/2014, at 8:04 PM, Ben Turner wrote:
From: James purplei...@gmail.com
Sent: Wednesday, May 28, 2014 5:21:21 PM
On Wed, May 28, 2014 at 5:02 PM, Justin Clift jus...@gluster.org wrote:

Hi all,

Are there any Community members around who can test the GlusterFS 3.4.4 beta (rpms are available)?

I've provided all the tools and how-to to do this yourself. Should probably take ~20 min. Old example:
https://ttboj.wordpress.com/2014/01/16/testing-glusterfs-during-glusterfest/
The same process should work, except base your testing on the latest vagrant article, if you haven't set it up already:
https://ttboj.wordpress.com/2014/05/13/vagrant-on-fedora-with-libvirt-reprise/

I can help out here; I'll have a chance to run through some stuff this weekend. Where should I post feedback?

Excellent Ben! Please send feedback to gluster-devel. :)

So far so good on 3.4.4, sorry for the delay here. I had to fix my downstream test suites to run outside of RHS / downstream gluster. I did basic sanity testing on glusterfs mounts, including:

FSSANITY_TEST_LIST: arequal bonnie glusterfs_build compile_kernel dbench dd ffsb fileop fsx fs_mark iozone locks ltp multiple_files posix_compliance postmark read_large rpc syscallbench tiobench

I am starting on NFS now; I'll have results tonight or tomorrow morning. I'll look at updating the component scripts to work and run them as well.

Thanks a lot for this, Ben.

Justin, Ben,
Do you think we can automate running these scripts without a lot of human intervention? If yes, how can I help? We could use that just before making any release in future :-).

Pranith

-b

+ Justin

--
Open Source and Standards @ Red Hat
twitter.com/realjustinclift
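
Regarding Pranith's question about automation above, a rough sketch of what a hands-off run of these sanity tests could look like: iterate over the test list Ben posted, run each test against a mount point, and collect pass/fail results. The script locations, arguments, and mount path are assumptions for illustration, not the actual layout of Ben's test suites.

import subprocess

FSSANITY_TEST_LIST = (
    "arequal bonnie glusterfs_build compile_kernel dbench dd ffsb fileop "
    "fsx fs_mark iozone locks ltp multiple_files posix_compliance postmark "
    "read_large rpc syscallbench tiobench"
).split()

def run_sanity(mount_point: str, tests=FSSANITY_TEST_LIST) -> dict:
    results = {}
    for test in tests:
        # Assumed convention: each test is a self-contained script that
        # takes the mount point as its only argument.
        proc = subprocess.run(["./tests/{}.sh".format(test), mount_point],
                              capture_output=True, text=True)
        results[test] = "PASS" if proc.returncode == 0 else "FAIL"
        print("{:20s} {}".format(test, results[test]))
    return results

if __name__ == "__main__":
    run_sanity("/mnt/glusterfs")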