Hello! My name is Grzegorz Borowiak and I am a programmer. I work for a company that uses rsync internally to distribute our continuously changing development environment. The environment weighs several gigabytes and consists of over 100,000 files, most of them binary, so version control systems such as git and Subversion are not an option, but rsync handles it very efficiently.
However, I would like to add some features that we need. They seem generic enough to be useful to others, so I would like to implement them in a way that allows them to be contributed.

Feature 1: tags

Our environment is large but modularised, i.e. every file in it belongs to some module. Not every user needs every module, so the rsync download is parametrised by checking or unchecking modules. Currently this is implemented with filters, which include or exclude files by path or, in some cases, by substrings in file names.

To make modularisation more straightforward, and to remove the need to tell files apart by path or name, I propose introducing the concept of tags. Every file could be tagged with a string stored as an xattr (for example, user.rsync.tag=TAG), and the downloading rsync invocation could take a parameter --tag=TAG. This option could be given more than once. Invoked this way, rsync would affect:

- all files without any tag
- all files that match any of the specified tags

Another approach would be to allow multiple tags per file. This would be done by setting or unsetting xattrs such as user.rsync.tag.TAG. A file tagged with "a" and "b" would have the xattrs user.rsync.tag.a and user.rsync.tag.b. This would allow a finer division and the use of logical expressions: for example, --tag-expr='a || (b && !c)' would select all files that have tag "a", or that have tag "b" but not "c".

rsync already uses xattrs to store metadata in fake super mode, so this seems a natural way to implement tags.

In both approaches, the filtering could be integrated with filter rules. If a modifier "t" were appended after "+", "-", "H", "S", "P" or "R", the rule would treat the following expression not as a path-matching pattern but as a tag or a logical combination of tags.
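To make the multiple-tag variant more concrete, here is a minimal Python sketch of how an expression such as 'a || (b && !c)' might be evaluated against a file's set of tags. Nothing in it exists in rsync: the function names, the characters allowed in tag names, and the operator precedence ("!" binds tighter than "&&", which binds tighter than "||") are assumptions made for illustration. The xattr prefix is the one proposed above, and os.listxattr is only available on Linux.

```python
import os
import re

TAG_PREFIX = "user.rsync.tag."  # xattr prefix proposed above, not an rsync feature
TAG_NAME = r"[A-Za-z0-9_.-]+"   # assumed set of characters allowed in a tag name


def file_tags(path):
    """Collect tag names from the file's xattrs (Linux only)."""
    return {name[len(TAG_PREFIX):]
            for name in os.listxattr(path)
            if name.startswith(TAG_PREFIX)}


def matches(expr, tags):
    """Return True if the set `tags` satisfies the tag expression `expr`."""
    # Split into operators, parentheses and tag names; any other
    # non-space character becomes its own token and is rejected below.
    tokens = re.findall(r"\|\||&&|[!()]|" + TAG_NAME + r"|\S", expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def parse_or():
        value = parse_and()
        while peek() == "||":
            take()
            rhs = parse_and()  # always parse the right side, then combine
            value = value or rhs
        return value

    def parse_and():
        value = parse_not()
        while peek() == "&&":
            take()
            rhs = parse_not()
            value = value and rhs
        return value

    def parse_not():
        if peek() == "!":
            take()
            return not parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("missing ')'")
            return value
        if tok is None or not re.fullmatch(TAG_NAME, tok):
            raise ValueError(f"unexpected token: {tok!r}")
        return tok in tags

    result = parse_or()
    if peek() is not None:
        raise ValueError(f"unexpected token: {peek()!r}")
    return result
```

With this sketch, matches("a || (b && !c)", {"b"}) is true, while the same expression against {"b", "c"} is false. Whether untagged files should always pass, as in the single-tag variant, would be a separate decision made by the caller.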
For example, with the "t" modifier:

"+t base" would include all files with tag "base"
"Ht gui" would hide all files with tag "gui"
"Ht a && !b" would hide all files tagged with "a" but not "b"

Feature 2: saving local modifications

Our users frequently make local modifications. These are always lost when they rsync to a newer version. I would like to make it possible to detect such modifications and back up the affected files. There is already the --backup option, but it is insufficient because it saves too many files, including those that were not locally modified.

To solve this, I would like to use xattrs again and introduce user.rsync.md5sum, which would store the md5sum of the file. When a file is about to be overwritten or deleted by rsync, rsync would first calculate its md5sum, and if it differs from the value in the xattr, the file would be saved to backup. A file with no md5sum xattr at all would also be saved to backup, as it must have been created locally.

Another method, quicker and less demanding but imperfect, would be to create a special file after each downloading rsync run to serve as a timestamp, and to treat all files with a newer mtime as locally modified.

And here are my questions:

- Is any of the above features already implemented in some form, or being implemented now (in progress)?
- For feature 1, which solution would you prefer: single or multiple tagging?
- For feature 1, is it a good idea to extend filter rules to handle tags, or is it better to stay with standalone arguments?
- For feature 2, which solution would you prefer: md5sum, timestamp, or both (both can be implemented)?
- 'fake super' uses the user.rsync.%stat xattr; is the percent sign part of some convention that my xattrs should also follow?
- Did I miss something?
- Do you have other ideas for how to provide these features?
- What are the coding guidelines for rsync development?
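To illustrate the md5sum variant of feature 2, here is a rough Python sketch of the decision "should this file be backed up before it is overwritten or deleted?". It is only an illustration of the idea, not rsync code: the function names are hypothetical, the attribute name is the one proposed above, and os.getxattr is only available on Linux.

```python
import hashlib
import os

XATTR_NAME = "user.rsync.md5sum"  # attribute proposed above, not an rsync feature


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_stored_md5(path):
    """Return the checksum stored in the xattr, or None if it cannot be read."""
    try:
        return os.getxattr(path, XATTR_NAME).decode("ascii", "replace")
    except OSError:
        return None


def needs_backup(path, stored_md5):
    """Decide whether the file looks locally created or locally modified."""
    if stored_md5 is None:
        # No recorded checksum: treated as created locally.
        return True
    # A differing checksum means the content changed since the last download.
    return file_md5(path) != stored_md5
```

A caller would use it as needs_backup(path, read_stored_md5(path)) just before replacing or removing the file. The cost is one full read of every file that is about to be touched, which is the reason the timestamp method is quicker.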