Re: hammer prune explanation
Matthew Dillon wrote:
> :Yeah, I was thinking about wildcarding as well.
> :
> :But is it possible to implement it within cmd_prune.c, or do I have to
> :modify the ioctl kernel code? If done in cmd_prune.c, I somehow have to
> :iterate over all deleted files and call the prune command for it.
> :
> :I thought, it's easier to introduce a check in the kernel, whether the
> :file that should be pruned matches a given pattern. Doesn't sound very
> :hard to do, if it is easy to get the pathname for a given inode.
> :
> :Are you thinking about something like the archive flag?
>
> I think it is probably best to implement that level of sophistication
> in the utility rather than in the kernel. The pruning ioctl code has
> no concept of files or directories... literally it has no concept.
> All it understands, really, are object ids (aka inode numbers) and
> records.
>
> The hammer utility on the other hand can actually scan the filesystem
> hierarchy. Locating wholly deleted files and directories is not hard
> to do. As-of queries can be used to access earlier versions of a
> directory.

Hm, how would that work, if I want it to behave like the prune command?
I'd need to traverse a lot of filesystem trees, to just determine which
files were deleted.

Imagine:

  compare /mnt with /[EMAIL PROTECTED] and prune deleted files.

  compare /[EMAIL PROTECTED] with /[EMAIL PROTECTED] ...

I wouldn't find files that were deleted in between 1-hour-ago and
2-hours-ago. To make it work, I'd need to compare the filesystem trees
of every possible timestamp.

It's probably easier, and more efficient, to have separate filesystems.

> We might want to add some kernel support to make it more efficient,
> for example to make it possible for the hammer utility to have
> visibility into all deleted directory entries. It could use that
> visibility to do as-of accesses and through that mechanic would thus
> have visibility into all deleted files and directories.

Does this mean, I'd see files like:

  /[EMAIL PROTECTED]
  /[EMAIL PROTECTED]/[EMAIL PROTECTED]
  /[EMAIL PROTECTED]

Regards,

  Michael
Re: hammer prune explanation
:Hm, how would that work, if I want it to behave like the prune command?
:I'd need to traverse a lot of filesystem trees, to just determine which
:files were deleted.
:
:Imagine:
:
:  compare /mnt with /[EMAIL PROTECTED] and prune deleted files.
:
:  compare /[EMAIL PROTECTED] with /[EMAIL PROTECTED] ...
:
:I wouldn't find files that were deleted in between 1-hour-ago and
:2-hours-ago. To make it work, I'd need to compare the filesystem trees
:of every possible timestamp.
:
:Regards,
:
:  Michael

Files and directories are two different beasts. A file can have a
complex history associated with it, because each data block within the
file may individually have its own history.

Directories are a lot simpler. The history for a directory is done on
a directory-entry by directory-entry basis. A directory entry can only
be created or deleted so all we need, really, is a mechanism to access
the deleted directory entries. This wouldn't be an 'as-of' style
access... it would be accessing all the active and deleted directory
entries regardless of when they became deleted.

Inode numbers are not reused so being able to access the deleted
directory entry will give you the inode number, and hence the object
id, of the deleted file(s) and sub-directories. The pruning code is
based entirely on the object id and doesn't care what that object's
representation or visibility is.

It might just be easiest to provide ioctl()'s to allow a user program
(aka the hammer utility) to directly scan the B-Tree. I dunno yet.

I'm not going to worry about enhancing the utility too much until I
get the filesystem stabilized under heavy parallel loads. That should
be this week. I'm making very good progress. It takes several hours of
extreme testing (parallel buildworld, blogbench, pruning, AND
reblocking all going on at the same time) to hit an assertion now and
it is clear there are only a handful of bugs left.

The HAMMER source has 280 KKASSERT directives in it plus CRCs on both
the data and meta-data so that is saying something.

-Matt
Matthew Dillon
[EMAIL PROTECTED]
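The idea above (scan active and deleted directory entries, and collect
the object ids that the pruning code would operate on) can be sketched
in a few lines of Python. This is only an illustration under stated
assumptions: the DirEntry class, its delete_tid field and the
deleted_object_ids() helper are hypothetical names for this example,
not HAMMER structures or ioctls. It also shows the point that a
directory which was deleted and re-created under the same name ends up
as two entries with different object ids.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass
class DirEntry:
    """Hypothetical in-memory model of one directory entry."""
    name: str
    obj_id: int                       # inode number, never reused
    delete_tid: Optional[int] = None  # None means the entry is still live
    children: List["DirEntry"] = field(default_factory=list)


def deleted_object_ids(entry: DirEntry,
                       parent_deleted: bool = False) -> Iterator[int]:
    """Yield object ids reachable only through deleted directory entries."""
    is_deleted = parent_deleted or entry.delete_tid is not None
    if is_deleted:
        yield entry.obj_id
    for child in entry.children:
        # Everything below a deleted directory counts as deleted too.
        yield from deleted_object_ids(child, is_deleted)


# "obj" was deleted and later re-created: same name, different object id.
root = DirEntry("/", 1, children=[
    DirEntry("obj", 10, delete_tid=500, children=[DirEntry("a.o", 11)]),
    DirEntry("obj", 20, children=[DirEntry("b.o", 21)]),
])

print(sorted(deleted_object_ids(root)))
```

The walk needs no as-of timestamps at all, which is the difference
from comparing snapshots of the tree at every possible point in time.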
Re: hammer prune explanation
> access the deleted directory entries. This wouldn't be an 'as-of'
> style access... it would be accessing all the active and deleted
> directory entries regardless of when they became deleted.

What if I deleted a directory and re-created it?

Best Regards,
Ben Cadieux
Re: hammer prune explanation
:    access the deleted directory entries. This wouldn't be an 'as-of' style
:    access... it would be accessing all the active and deleted directory
:    entries regardless of when they became deleted.
:
:What if I deleted a directory and re-created it?
:
:Best Regards,
:Ben Cadieux

The two directories (the old one and the new one) might be named the
same, but they are physically different entities and will have
different object ids. So the deleted directory will still be accessible
via its parent directory (and so on recursively through to the root).

-Matt
Matthew Dillon
[EMAIL PROTECTED]
Re: hammer prune explanation
:Hi,
:
:I don't understand the usage of
:
:  hammer prune from xxx to yyy every zzz
:
:Could someone enlighten me, what the from and to exactly mean?
:
:Does it mean, that all deleted records with an age between xxx and yyy
:are considered for pruning? Starting from xxx, just keep deleted
:records every zzz?
:
:Regards,
:
:  Michael

You got it. Note that 'deletions' also mean overwrites and changes.
For example, if you chmod a file HAMMER will remember the old modes as
a deleted record.

So here's an example:

    hammer prune /mnt from 1d to 30d every 1d

Anything between 1 day and 30 days old which has been deleted is
subject to pruning. But you are also saying 'every 1d', which means
you are telling HAMMER to *RETAIN* deleted records on 1-day boundaries.
The 'every' specification becomes your snapshot granularity.

So take some file X and say it was modified like this over the period
of a couple of days:

    [ day1 ][ day2 ][ day3 ]
    aabbcdgggiiijjjkkkl

The pruning run will physically remove and recover the space for
records 'b', 'c', 'e', 'f', 'g', 'h', 'j', and 'k'. I will expand the
range of the other deleted records that were retained in order to
maintain a continuum space for AS-OF lookups:

    [ day1 ][ day2 ][ day3 ]
    lll

Now I want you to also note a side effect of this, which is that the
timestamp range for the deleted records that were retained has been
expanded. This expansion of the range, which is done in order to
maintain a consistent continuum for AS-OF lookups, can cause 'record
creep'.

This record creep is why the hammer prune function puts so many
restrictions on the date ranges. You can do this:

    hammer prune /mnt from 5m to 1h every 5m

But you can't do this:

    hammer prune /mnt from 5m to 1h every 6m

The reason is that if you do not specify boundaries that are integral
multiples within the range being pruned, the 'record creep' becomes
unbounded... it just keeps creeping.
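The 'every' retention rule can be illustrated with a small Python
sketch. It rests on an assumption that the thread does not spell out:
a deleted record is taken to survive only if it was the visible version
at some 'every' boundary, that is, if a multiple of 'every' falls
inside its [create, delete) lifetime. The record tuples, the hour-based
timestamps and the function names are hypothetical, and the sketch
ignores both the from/to age window and the range expansion of the
retained records.

```python
def first_boundary_at_or_after(t, every):
    """Smallest multiple of `every` that is >= t (ceiling division)."""
    return -(-t // every) * every


def retained(records, every):
    """Return the labels of records that survive a prune.

    records: list of (label, create_time, delete_time) tuples, where
             delete_time is None for the live (undeleted) record.
    """
    kept = []
    for label, create, delete in records:
        if delete is None:
            kept.append(label)      # live data is never pruned
        elif first_boundary_at_or_after(create, every) < delete:
            kept.append(label)      # visible at a snapshot boundary
    return kept


# Timestamps in hours; 'every 1d' puts boundaries at 0, 24, 48, ...
history = [
    ("a", 0, 5), ("b", 5, 9), ("c", 9, 20),
    ("d", 20, 30), ("e", 30, 40), ("l", 40, None),
]
print(retained(history, every=24))
```

Under that assumption only the versions that existed at the start of
each day remain, which is why 'every' acts as the snapshot granularity.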
If you prune using a configuration file you can give multiple ranges,
like this (/etc/hammer.conf). Each successive range must be an
integral multiple of the previous one to avoid record creep.

    prune /usr/obj from 15m to 12h every 15m
    prune /usr/obj from 12h to 7d every 12h
    prune /usr/obj from 7d to 35d every 1d
    prune /usr/obj from 35d to 350d every 7d

I probably need to add some more features to the pruning directive,
like a shortcut to tell HAMMER not to retain any data at all past a
certain age.

-Matt
Matthew Dillon
[EMAIL PROTECTED]
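The alignment restrictions described above can be checked mechanically.
The following Python sketch encodes the rule as stated in this message:
'from' and 'to' must be integral multiples of 'every', and each
successive 'every' must be an integral multiple of the previous one.
The function names are hypothetical, only the m/h/d suffixes seen in
the thread are handled, and the check that the hammer utility itself
performs is not shown here and may differ.

```python
UNIT_SECONDS = {"m": 60, "h": 3600, "d": 86400}


def parse_duration(text):
    """Convert strings such as '15m', '12h' or '7d' to seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]


def range_is_aligned(frm, to, every):
    """True if both ends of the range are integral multiples of `every`."""
    f, t, e = (parse_duration(x) for x in (frm, to, every))
    return f < t and f % e == 0 and t % e == 0


def chain_is_aligned(ranges):
    """True if every range is aligned and each 'every' divides the next."""
    steps = [parse_duration(every) for _, _, every in ranges]
    return (all(range_is_aligned(*r) for r in ranges)
            and all(b % a == 0 for a, b in zip(steps, steps[1:])))


conf = [
    ("15m", "12h", "15m"),
    ("12h", "7d", "12h"),
    ("7d", "35d", "1d"),
    ("35d", "350d", "7d"),
]
print(range_is_aligned("5m", "1h", "5m"))   # the accepted example
print(range_is_aligned("5m", "1h", "6m"))   # the rejected example
print(chain_is_aligned(conf))               # the hammer.conf ranges
```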
Re: hammer prune explanation
Matthew Dillon wrote:
> :Hi,
> :
> :I don't understand the usage of
> :
> :  hammer prune from xxx to yyy every zzz
> :
> :Could someone enlighten me, what the from and to exactly mean?
> :
> :Does it mean, that all deleted records with an age between xxx and yyy
> :are considered for pruning? Starting from xxx, just keep deleted
> :records every zzz?
> :
> :Regards,
> :
> :  Michael
>
> You got it. Note that 'deletions' also mean overwrites and changes.
> For example, if you chmod a file HAMMER will remember the old modes
> as a deleted record.
>
> So here's an example:
>
>     hammer prune /mnt from 1d to 30d every 1d
>
> [...]

Thanks a lot! Could this great explanation (or parts of it) go into the
man-page? I think it's very helpful, especially the visualization.

Is it possible to prune according to the filename? For example:

  hammer prune /mnt/usr/obj from 2d everything
  hammer prune /mnt/usr/src from 1d to 10d every 1d

Don't know if it is possible to implement... but would avoid the need
for separate filesystems.

Regards,

  Michael
Re: hammer prune explanation
Matthew Dillon wrote:
> :Thanks a lot! Could this great explanation (or parts of it) go into the
> :man-page? I think it's very helpful, especially the visualization.
>
> I am going to write up a whole paper on HAMMER. It's almost time for
> me to sit down and do it.
>
> :Is it possible to prune according to the filename? For example:
> :
> :  hammer prune /mnt/usr/obj from 2d everything
> :  hammer prune /mnt/usr/src from 1d to 10d every 1d
> :
> :Don't know if it is possible to implement... but would avoid the need
> :for separate filesystems.
> :
> :Regards,
> :
> :  Michael
>
> The filesystem supports pruning on an object-by-object basis, so it
> is possible to prune a single file. The hammer utility does not
> currently have support for that, but it would not be difficult to add.
>
> If you want a little side project, add it to the utility! The core
> code that selects the object id range (aka inode numbers) is in
> /usr/src/sbin/hammer/cmd_prune.c line 74ish.

Sounds good :)

> What I would like to do is have a more sophisticated pruning
> capability in general, such as based on wildcarding and/or an
> inherited chflag flag, or perhaps be able to specify a pruning
> category selector on a file by file basis. I don't know what the
> best approach is.

Yeah, I was thinking about wildcarding as well.

But is it possible to implement it within cmd_prune.c, or do I have to
modify the ioctl kernel code? If done in cmd_prune.c, I somehow have to
iterate over all deleted files and call the prune command for it.

I thought, it's easier to introduce a check in the kernel, whether the
file that should be pruned matches a given pattern. Doesn't sound very
hard to do, if it is easy to get the pathname for a given inode.

Are you thinking about something like the archive flag?

> Right now any serious HAMMER user needs to set up at least a daily
> cron job to prune and reblock the filesystem. I added a '-t timeout'
> feature to the HAMMER utility to allow the operations to be set up
> in a cron job and keep the filesystem up to snuff over a long period
> of time. So, e.g. you would have a nightly cron job that did this:
>
>     # spend up to 5 minutes pruning the filesystem and another
>     # 5 minutes reblocking it, then stop.
>     hammer -t 300 prune /myfilesystem; hammer -t 300 reblock /myfilesystem

Does this degrade filesystem seriously?

Regards,

  Michael
Re: hammer prune explanation
:Yeah, I was thinking about wildcarding as well.
:
:But is it possible to implement it within cmd_prune.c, or do I have to
:modify the ioctl kernel code? If done in cmd_prune.c, I somehow have to
:iterate over all deleted files and call the prune command for it.
:
:I thought, it's easier to introduce a check in the kernel, whether the
:file that should be pruned matches a given pattern. Doesn't sound very
:hard to do, if it is easy to get the pathname for a given inode.
:
:Are you thinking about something like the archive flag?

I think it is probably best to implement that level of sophistication
in the utility rather than in the kernel. The pruning ioctl code has no
concept of files or directories... literally it has no concept. All it
understands, really, are object ids (aka inode numbers) and records.

The hammer utility on the other hand can actually scan the filesystem
hierarchy. Locating wholly deleted files and directories is not hard
to do. As-of queries can be used to access earlier versions of a
directory.

We might want to add some kernel support to make it more efficient,
for example to make it possible for the hammer utility to have
visibility into all deleted directory entries. It could use that
visibility to do as-of accesses and through that mechanic would thus
have visibility into all deleted files and directories.

Inode numbers are never reused, so the inode number (and hence object
id) of a deleted file will be just as unique as the inode number for
one that is still visible.

:    Right now any serious HAMMER user needs to set up at least a daily
:    cron job to prune and reblock the filesystem. I added a '-t timeout'
:    feature to the HAMMER utility to allow the operations to be
:    set up in a cron job and keep the filesystem up to snuff over a long
:    period of time. So, e.g. you would have a nightly cron job that
:    did this:
:
:    # spend up to 5 minutes pruning the filesystem and another
:    # 5 minutes reblocking it, then stop.
:    hammer -t 300 prune /myfilesystem; hammer -t 300 reblock /myfilesystem
:
:Does this degrade filesystem seriously?
:
:Regards,
:
:  Michael

For the time it is running it will be maxing out the filesystem, e.g.
similar to doing a 'find / ...'. The idea is to limit the run time
(hence the -t) so your nightly cron job does a small chunk of the
filesystem every night, resulting in a clean well ordered filesystem
over a long period of time. So, for example, spend 10 minutes a day
doing housekeeping.

Filesystems are rarely operating at 100% 24x7 and there are other ways
to spread out the overhead if it became necessary to do so. Usually
picking a chunk of time during off-hours is sufficient.

The reblocking code is very efficient when it doesn't have much to do,
meaning that it will very quickly skip over blocks that have already
been reblocked.

The pruning code is not quite as efficient: it must scan the B-Tree
within the object range specified (typically the whole tree), but it
will still be able to scan things very quickly until it hits B-Tree
nodes that require pruning. This means that it is effectively
incremental given a long enough time period, and could be made
incremental for real by adding an option to the hammer prune utility
to adjust the starting object id to pick up where it left off last
time.

-Matt
Matthew Dillon
[EMAIL PROTECTED]
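The combination of a run-time limit and a saved starting object id can
be sketched as follows. This is a hypothetical Python illustration of
the idea only: scan_with_budget(), its parameters and the callback are
assumptions made for the example and do not correspond to an existing
hammer option or ioctl.

```python
import time


def scan_with_budget(object_ids, start_after, budget_seconds, process,
                     clock=time.monotonic):
    """Process object ids greater than `start_after` until time runs out.

    object_ids is assumed to be sorted in ascending order. Returns the
    last object id that was fully processed, so the next run can pass
    it back in as `start_after` and resume from there.
    """
    deadline = clock() + budget_seconds
    last_done = start_after
    for obj_id in object_ids:
        if obj_id <= start_after:
            continue                 # handled by an earlier run
        if clock() >= deadline:
            break                    # budget used up, stop cleanly
        process(obj_id)
        last_done = obj_id
    return last_done


# One nightly run: spend at most 300 seconds, then remember the position.
position = scan_with_budget(range(1, 1001), start_after=0,
                            budget_seconds=300,
                            process=lambda obj_id: None)
print(position)
```

A cron job would store the returned position between runs and start
again from zero once the end of the object id range has been reached.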
Re: hammer prune explanation
On 2008-05-10 22:59, Matthew Dillon wrote:
> :Yeah, I was thinking about wildcarding as well.
> :
> :But is it possible to implement it within cmd_prune.c, or do I have to
> :modify the ioctl kernel code? If done in cmd_prune.c, I somehow have to
> :iterate over all deleted files and call the prune command for it.
> :
> :I thought, it's easier to introduce a check in the kernel, whether the
> :file that should be pruned matches a given pattern. Doesn't sound very
> :hard to do, if it is easy to get the pathname for a given inode.
> :
> :Are you thinking about something like the archive flag?
>
> I think it is probably best to implement that level of sophistication
> in the utility rather than in the kernel. The pruning ioctl code has
> no concept of files or directories... literally it has no concept.
> All it understands, really, are object ids (aka inode numbers) and
> records.
>
> The hammer utility on the other hand can actually scan the filesystem
> hierarchy. Locating wholly deleted files and directories is not hard
> to do. As-of queries can be used to access earlier versions of a
> directory.
>
> We might want to add some kernel support to make it more efficient,
> for example to make it possible for the hammer utility to have
> visibility into all deleted directory entries. It could use that
> visibility to do as-of accesses and through that mechanic would thus
> have visibility into all deleted files and directories.
>
> Inode numbers are never reused, so the inode number (and hence object
> id) of a deleted file will be just as unique as the inode number for
> one that is still visible.
>
> :    Right now any serious HAMMER user needs to set up at least a daily
> :    cron job to prune and reblock the filesystem. I added a '-t timeout'
> :    feature to the HAMMER utility to allow the operations to be
> :    set up in a cron job and keep the filesystem up to snuff over a long
> :    period of time. So, e.g. you would have a nightly cron job that
> :    did this:
> :
> :# spend up to 5 minutes pruning the filesystem and another
> :# 5 minutes reblocking it, then stop.
> :hammer -t 300 prune /myfilesystem; hammer -t 300 reblock /myfilesystem
> :
> :Does this degrade filesystem seriously?
> :
> :Regards,
> :
> :  Michael
>
> For the time it is running it will be maxing out the filesystem, e.g.
> similar to doing a 'find / ...'. The idea is to limit the run time
> (hence the -t) so your nightly cron job does a small chunk of the
> filesystem every night, resulting in a clean well ordered filesystem
> over a long period of time. So, for example, spend 10 minutes a day
> doing housekeeping.
>
> Filesystems are rarely operating at 100% 24x7 and there are other
> ways to spread out the overhead if it became necessary to do so.
> Usually picking a chunk of time during off-hours is sufficient.

That will probably work quite well for servers which are running 24x7,
but how about using HAMMER on desktops/laptops (which might not be
running except when in use, though the disk might not be used all the
time). Could some kind of low priority process be used instead? Perhaps
one that only runs for less than a minute but instead runs every 10
minutes or so, the idea being to spread out the pruning so that it
won't affect (severely) normal usage but still keep the FS in good
shape.

--
Erik Wikström
Re: hammer prune explanation
:That will probably work quite well for servers which are running 24x7,
:but how about using HAMMER on desktops/laptops (which might not be
:running except when in use, though the disk might not be used all the
:time). Could some kind of low priority process be used instead? Perhaps
:one that only runs for less than a minute but instead runs every 10
:minutes or so, the idea being to spread out the pruning so that it won't
:affect (severely) normal usage but still keep the FS in good shape.
:
:--
:Erik Wikström

Yes, it would be fairly easy to automate it. Maybe make a 'hammer -S'
and add directives to /etc/hammer.conf to tell it what to test for and
how often to run some pruning/reblocking ops. The ioctls could be
augmented to add a block I/O limit before it returns, to better control
the load.

I'm not going to do it right now but if someone wants a little side
project this would be a great one. If it isn't done in the next few
weeks I'll sit down and do it myself.

-Matt
Matthew Dillon
[EMAIL PROTECTED]
Re: hammer prune explanation
On Sat, May 10, 2008 8:16 pm, Matthew Dillon wrote:
> Yes, it would be fairly easy to automate it. Maybe make a 'hammer -S'
> and add directives to /etc/hammer.conf to tell it what to test for
> and how often to run some pruning/reblocking ops. The ioctls could be
> augmented to add a block I/O limit before it returns, to better
> control the load.
>
> I'm not going to do it right now but if someone wants a little side
> project this would be a great one. If it isn't done in the next few
> weeks I'll sit down and do it myself.

I was thinking when I read the earlier post about cron entries for
hammer maintenance that it would be nice to have a 'hammerd' daemon
with sensible defaults; the idea of having to make separate crontab
entries for a filesystem by default makes me itchy.