Is it possible to set max file size on target?
Hi,

I'm trying to prevent files larger than a certain size from being backed up to my server. rsync has an option, --max-size, that limits the size of the files it transfers. However, the client can change this option. I was wondering whether it can be set on the server side, so that I can be sure my server won't accept files that exceed the size limit. I've checked both the rsync and rsyncd man pages and the archives so far.

Any help would be greatly appreciated! Thanks.

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: max file size
On Fri, 13 Nov 2009 01:38:48 -0500, Matt McCutchen m...@mattmccutchen.net wrote:
> On Mon, 2009-11-09 at 18:20 +0100, Heinz-Josef Claes wrote:
> > On Monday, 9 November 2009 at 17:48:35, Matt McCutchen wrote:
> > > On Mon, 2009-11-09 at 11:43 +0100, Heinz-Josef Claes wrote:
> > > > Does anybody know what the maximum file size (terabytes?) is when using rsync with the options --checksum and/or --inplace? What file sizes have been tested in reality? Are there any experiences using rsync (with --checksum and/or --inplace) for big files of several or dozens of terabytes?
> > > I don't believe rsync has a fixed maximum size other than what can fit in 64 bits, but I can't speak to any reliability issues that might come up with extremely large files.
> > I've read about a fix for overrun checksum buffers with more than some hundred terabytes but that was just something undefined . . .
> Indeed, I forgot about that. The delta-transfer algorithm doesn't work for files longer than 2^31 blocks. With the default maximum block size of 2^17 bytes, the limit is 2^48 bytes or 256 TB. You could stretch the limit by fixing a larger block size with --block-size. See: https://bugzilla.samba.org/show_bug.cgi?id=5459#c2

Thanks for that information! Have you (or has anybody) ever done a test with big file sizes?

> > > For what purpose are you considering --checksum? In the case where the file's size hasn't changed (probably true for large image files), it will add an extra full read of the file on both sides before the transfer begins, which would be very expensive for multi-terabyte files.
> > I want to check if the following is possible:
> > 1. Transport a big block of data (several terabytes) physically from location A to location B (very long distance) via tapes (or disks). (Location A and B use different storage technologies.) When the tapes arrive in location B, the block of data has changed in location A (a program / OS is running and storing data in it).
> > 2. Shut down the application / OS in location A, rsync the delta between location A and B online, then restart the system in location B.
> > (Perhaps step 2 has to be done multiple times.)
> Since the source and destination versions are practically certain to differ, --checksum would serve no purpose. See the man page description of --checksum.

I don't understand what you mean. Between steps 1 and 2, only a few percent of the data will change, so the idea is to transfer the differences only. Transferring the whole file online takes too long. How can this be done without checksums (either --checksum or --inplace)?

I'll probably be able to run a test with a file size of some terabytes in the next few weeks, but that's not guaranteed.

Regards, HJC
Re: max file size
On Fri, 2009-11-13 at 12:36 +0100, Heinz-Josef Claes wrote:
> On Fri, 13 Nov 2009 01:38:48 -0500, Matt McCutchen m...@mattmccutchen.net wrote:
> > On Mon, 2009-11-09 at 18:20 +0100, Heinz-Josef Claes wrote:
> > > I want to check if the following is possible:
> > > 1. Transport a big block of data (several terabytes) physically from location A to location B (very long distance) via tapes (or disks). (Location A and B use different storage technologies.) When the tapes arrive in location B, the block of data has changed in location A (a program / OS is running and storing data in it).
> > > 2. Shut down the application / OS in location A, rsync the delta between location A and B online, then restart the system in location B.
> > > (Perhaps step 2 has to be done multiple times.)
> > Since the source and destination versions are practically certain to differ, --checksum would serve no purpose. See the man page description of --checksum.
> I don't understand what you mean. Between steps 1 and 2, only a few percent of the data will change, so the idea is to transfer the differences only. Transferring the whole file online takes too long. How can this be done without checksums (either --checksum or --inplace)?

Did you read the description of --checksum as I suggested? It is an alternative "quick check" for deciding whether a file needs to be transferred at all, which is not what you want. You're talking about the delta-transfer algorithm, which is on by default for remote runs and is controlled by a separate option, --(no-)whole-file.

-- Matt
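To make the distinction concrete, here is a minimal sketch of how the delta-only update from step 2 might be invoked. The helper name, the paths, and the host are hypothetical placeholders, and the helper only prints the command so it can be reviewed before anyone runs it. It leaves out --checksum, since that option only changes how rsync decides whether a file needs transferring, and it spells out --no-whole-file even though delta transfer is already the default for remote runs.

```shell
#!/bin/sh
# Print an rsync command line for a delta-only update of one large file.
# delta_sync_cmd is a hypothetical helper; src and dest are placeholders.
delta_sync_cmd() {
    src=$1
    dest=$2
    # --no-whole-file  keep the delta-transfer algorithm enabled
    # --inplace        update the destination file directly instead of
    #                  building a complete temporary copy first
    # --stats          report how much data was actually sent
    printf 'rsync -a --inplace --no-whole-file --stats %s %s\n' "$src" "$dest"
}

# Example: review the command for a hypothetical image file and host.
delta_sync_cmd /data/image.bin backup@location-b:/data/image.bin
```

Whether --inplace helps or hurts the amount of data sent depends on how the changes are distributed in the file, so a trial run on a smaller copy would be a sensible first step.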
Re: max file size
On Mon, 2009-11-09 at 18:20 +0100, Heinz-Josef Claes wrote:
> On Monday, 9 November 2009 at 17:48:35, Matt McCutchen wrote:
> > On Mon, 2009-11-09 at 11:43 +0100, Heinz-Josef Claes wrote:
> > > Does anybody know what the maximum file size (terabytes?) is when using rsync with the options --checksum and/or --inplace? What file sizes have been tested in reality? Are there any experiences using rsync (with --checksum and/or --inplace) for big files of several or dozens of terabytes?
> > I don't believe rsync has a fixed maximum size other than what can fit in 64 bits, but I can't speak to any reliability issues that might come up with extremely large files.
> I've read about a fix for overrun checksum buffers with more than some hundred terabytes but that was just something undefined . . .

Indeed, I forgot about that. The delta-transfer algorithm doesn't work for files longer than 2^31 blocks. With the default maximum block size of 2^17 bytes, the limit is 2^48 bytes or 256 TB. You could stretch the limit by fixing a larger block size with --block-size. See: https://bugzilla.samba.org/show_bug.cgi?id=5459#c2

> > For what purpose are you considering --checksum? In the case where the file's size hasn't changed (probably true for large image files), it will add an extra full read of the file on both sides before the transfer begins, which would be very expensive for multi-terabyte files.
> I want to check if the following is possible:
> 1. Transport a big block of data (several terabytes) physically from location A to location B (very long distance) via tapes (or disks). (Location A and B use different storage technologies.) When the tapes arrive in location B, the block of data has changed in location A (a program / OS is running and storing data in it).
> 2. Shut down the application / OS in location A, rsync the delta between location A and B online, then restart the system in location B.
> (Perhaps step 2 has to be done multiple times.)

Since the source and destination versions are practically certain to differ, --checksum would serve no purpose. See the man page description of --checksum.

-- Matt
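The 256 TB figure in the message above is plain arithmetic: 2^31 blocks times 2^17 bytes per block. A minimal sketch for checking it, assuming a shell with 64-bit integer arithmetic; the function names are illustrative and not part of rsync:

```shell
#!/bin/sh
# Largest file size addressable by the delta-transfer algorithm, given
# log2 of the maximum block count ($1) and log2 of the block size ($2).
max_delta_bytes() {
    echo $(( (1 << $1) * (1 << $2) ))
}

# Convert a byte count to whole TiB (1 TiB = 2^40 bytes).
bytes_to_tib() {
    echo $(( $1 >> 40 ))
}

limit=$(max_delta_bytes 31 17)
echo "$limit bytes = $(bytes_to_tib "$limit") TiB"
# prints: 281474976710656 bytes = 256 TiB
```

Passing a larger second argument models the effect of raising the block size with --block-size, as suggested above.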
max file size
Hello,

Does anybody know what the maximum file size (terabytes?) is when using rsync with the options --checksum and/or --inplace? What file sizes have been tested in reality? Are there any experiences using rsync (with --checksum and/or --inplace) for big files of several or dozens of terabytes?

Thanks a lot,
Heinz-Josef Claes
Re: max file size
On Mon, 2009-11-09 at 11:43 +0100, Heinz-Josef Claes wrote:
> Does anybody know what the maximum file size (terabytes?) is when using rsync with the options --checksum and/or --inplace? What file sizes have been tested in reality? Are there any experiences using rsync (with --checksum and/or --inplace) for big files of several or dozens of terabytes?

I don't believe rsync has a fixed maximum size other than what can fit in 64 bits, but I can't speak to any reliability issues that might come up with extremely large files.

For what purpose are you considering --checksum? In the case where the file's size hasn't changed (probably true for large image files), it will add an extra full read of the file on both sides before the transfer begins, which would be very expensive for multi-terabyte files.

-- Matt
Re: max file size
On Monday, 9 November 2009 at 17:48:35, Matt McCutchen wrote:
> On Mon, 2009-11-09 at 11:43 +0100, Heinz-Josef Claes wrote:
> > Does anybody know what the maximum file size (terabytes?) is when using rsync with the options --checksum and/or --inplace? What file sizes have been tested in reality? Are there any experiences using rsync (with --checksum and/or --inplace) for big files of several or dozens of terabytes?
> I don't believe rsync has a fixed maximum size other than what can fit in 64 bits, but I can't speak to any reliability issues that might come up with extremely large files.

I've read about a fix for overrun checksum buffers with more than some hundred terabytes but that was just something undefined . . .

> For what purpose are you considering --checksum? In the case where the file's size hasn't changed (probably true for large image files), it will add an extra full read of the file on both sides before the transfer begins, which would be very expensive for multi-terabyte files.

I want to check if the following is possible:
1. Transport a big block of data (several terabytes) physically from location A to location B (very long distance) via tapes (or disks). (Location A and B use different storage technologies.) When the tapes arrive in location B, the block of data has changed in location A (a program / OS is running and storing data in it).
2. Shut down the application / OS in location A, rsync the delta between location A and B online, then restart the system in location B.
(Perhaps step 2 has to be done multiple times.)

There are lots of other aspects in this scenario, but that's another story.

Regards, HJC
Re: purge-empty-dirs and max-file-size confusion
On Fri, Apr 24, 2009 at 02:19:42PM -0400, Ian! D. Allen wrote:
> There is no mention of the concept of "transfer rule" in the rsync man page. I offer some proposed man page wording changes, below.

Thanks. I have committed some manpage changes that clarify this unexpected behavior. At some point rsync may allow actual filtering of files by their (non-name) attributes, which would avoid this situation.

..wayne..
Re: purge-empty-dirs and max-file-size confusion
On Fri, Apr 24, 2009 at 02:19:41PM -0400, Ian! D. Allen wrote:
> On Fri, Apr 24, 2009 at 07:51:35AM -0700, Wayne Davison wrote:
> > This is because --min-size is a transfer rule, not an exclude rule.
> There is no mention of the concept of "transfer rule" in the rsync man page.

There is another oblique reference to "transfer rule" in --compare-dest, for which I offer this man page clarification:

--compare-dest=DIR
    This transfer rule instructs rsync to use DIR on the destination machine as an additional hierarchy to compare destination files against doing transfers (if the files are missing in the destination directory). If a file is found in DIR that is identical to the sender's file, the file will NOT be transferred to the destination directory. This is useful for creating a sparse backup of just files that have changed from an earlier backup, though all the directories in the file-list will still be created (most of them likely empty). Unlike a filter/exclude rule, this option does not affect the file-list, so --prune-empty-dirs will not work with this option.

-m, --prune-empty-dirs
    This option tells the receiving rsync to get rid of empty directories from the file-list, including nested directories that have no non-directory children. This is useful for avoiding the creation of a bunch of useless directories when the sending rsync is recursively scanning a hierarchy of files using include/exclude/filter rules. It does not prevent the creation of empty directories that result from the use of transfer rules such as --max-size, --min-size, or --compare-dest, since transfer rules do not affect the file-list.

--
| Ian! D. Allen - idal...@idallen.ca - Ottawa, Ontario, Canada
| Home Page: http://idallen.com/ Contact Improv: http://contactimprov.ca/
| College professor (Open Source / Linux) via: http://teaching.idallen.com/
| Defend digital freedom: http://eff.org/ and have fun: http://fools.ca/
Re: purge-empty-dirs and max-file-size confusion
On Thu 23 Apr 2009, Ian! D. Allen wrote:
> In the man page it says in one place "tells the receiving rsync to get rid of empty directories from the file-list" and in another place it says "prune empty directory chains from file-list". The latter sounds like it operates on the source list, not on the receiving list, and if rsync were

Actually, to me it sounds quite like the same thing.

I don't think the intention is to actually delete empty directories at the receiving end, only to prevent them from being created. So once they've been created, perhaps by an earlier invocation without --prune-empty-dirs, you'll have to remove them by hand.

Paul
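For the "remove them by hand" step, a minimal sketch, assuming a find that supports -empty and -delete (GNU and BSD find do; they are not in POSIX). The function name is hypothetical. It removes every empty directory below the given root, including chains that only become empty once their empty children are gone, so it would also remove directories that were genuinely empty on the source side.

```shell
#!/bin/sh
# Delete all empty directories below $1, deepest first, leaving $1 itself.
# -delete implies depth-first traversal, so a parent whose only children
# were empty directories is itself removed in the same pass.
remove_empty_dirs() {
    find "$1" -mindepth 1 -depth -type d -empty -delete
}

# Example (hypothetical destination path):
#   remove_empty_dirs /tmp/foo
```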
Re: purge-empty-dirs and max-file-size confusion
On Fri, Apr 24, 2009 at 10:23:06AM +0200, Paul Slootman wrote:
> I don't think the intention is to actually delete empty directories at the receiving end, only to prevent them from being created.

I have not yet found out how to prevent empty directories from being created when using --max-size or --min-size. As I showed in my original post to this list, --prune-empty-dirs does not do it. Either the man page is wrong/misleading/incomplete, I am misunderstanding it badly, or rsync is broken. I am fully prepared to believe that I am misunderstanding something and will happily work on a better man page wording when the truth is revealed to me. I've supplied a short script below that you can use to see the problem yourself.

> So once they've been created, perhaps by an earlier invocation without --prune-empty-dirs, you'll have to remove them by hand.

As my script below shows, the destination directory does not even exist. There is no previously-created content in it at all, and yet rsync creates empty directories even though I say --prune-empty-dirs. Why? How do I make --prune-empty-dirs do what the man page says it does?

    #!/bin/sh -u
    # start with fresh empty directories for source and destination
    tmp1=/tmp/one$$
    tmp2=/tmp/two$$
    rm -rf "$tmp1" "$tmp2"

    echo '*** create the source directory with six subdirectories'
    for i in 1 2 3 4 5 6 ; do
        mkdir -p "$tmp1/dir$i"
    done

    echo '*** create three small files in dir1 dir2 dir3'
    for i in 1 2 3 ; do
        dd bs=1M count=1 if=/dev/zero of="$tmp1/dir$i/smallfile"
    done

    echo '*** create three big files in dir4 dir5 dir6'
    for i in 4 5 6 ; do
        dd bs=1M count=11 if=/dev/zero of="$tmp1/dir$i/BIGFILE"
    done

    echo '*** rsync should copy only the big files and prune all empty directories'
    rsync -ai --min-size 10M --prune-empty-dirs "$tmp1" "$tmp2"

    echo '*** find should show no empty directories, but there are three - why?'
    find "$tmp2" -empty

    echo '*** replace --min-size with an --exclude and it works fine:'
    rm -r "$tmp2"
    rsync -ai --exclude smallfile --prune-empty-dirs "$tmp1" "$tmp2"
    find "$tmp2" -empty   # shows no output - this is correct and expected

    echo "*** Why doesn't --prune-empty-dirs work with --min-size and --max-size?"
    rm -r "$tmp1" "$tmp2"

--
| Ian! D. Allen - idal...@idallen.ca - Ottawa, Ontario, Canada
| Home Page: http://idallen.com/ Contact Improv: http://contactimprov.ca/
| College professor (Open Source / Linux) via: http://teaching.idallen.com/
| Defend digital freedom: http://eff.org/ and have fun: http://fools.ca/
Re: purge-empty-dirs and max-file-size confusion
On Wed, Apr 22, 2009 at 02:20:37AM -0400, Ian! D. Allen wrote:
> I want to use --min-size to copy just large files (and their necessary parent directories), but everything I've tried copies *all* the source directories, and creates them empty on the destination even if they don't have any big files in them. I only want the minimal directory hierarchies that contain the big files.

This is because --min-size is a transfer rule, not an exclude rule. An exclude rule would affect deletions, and --min-size just affects what is transferred out of the full set of files that are present. Thus, the dirs with smaller files are not actually empty, they just don't have any files that match the transfer rule.

There is not currently a way to include/exclude files based on size in rsync.

..wayne..
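Since size cannot be used in a filter rule, one possible workaround (not proposed in this thread, so treat it as an assumption) is to select the files by size outside rsync and hand it the resulting list with --files-from. With a file list, rsync only creates the parent directories that the listed files need. The function names below are hypothetical, and the sketch assumes file names without embedded newlines and a size threshold of at least 1 byte.

```shell
#!/bin/sh
# List regular files under directory $1 that are at least $2 bytes in size,
# one path per line, relative to $1.
# "-size +Nc" means "more than N bytes", hence the "$2 - 1".
list_files_at_least() {
    ( cd "$1" && find . -type f -size +"$(( $2 - 1 ))"c -print | sed 's|^\./||' )
}

# Copy only those files from source dir $1 to destination $3.
# Assumes rsync is installed; --files-from=- reads the list from stdin.
copy_files_at_least() {
    list_files_at_least "$1" "$2" | rsync -a --files-from=- "$1" "$3"
}

# Example (paths taken from the original post, 10 MiB threshold):
#   copy_files_at_least /home/idallen/test 10485760 /tmp/foo
```

The trade-off is that the size test runs once on the sending side before rsync starts, so files that change size in between are not re-evaluated.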
Re: purge-empty-dirs and max-file-size confusion
On Fri, Apr 24, 2009 at 07:51:35AM -0700, Wayne Davison wrote:
> This is because --min-size is a transfer rule, not an exclude rule.

There is no mention of the concept of "transfer rule" in the rsync man page. I offer some proposed man page wording changes, below.

The man page says "This option tells the receiving rsync to get rid of empty directories from the file-list" - there is no mention that there must be two *kinds* of empty directories in the file list: (1) empty directories created by filter/exclude rules and (2) empty directories created by transfer rules. Or perhaps (2) doesn't really exist, but the sending rsync simply never gets around to sending the files that it says should be in those directories, and so the receiving rsync does all that directory creation work but the promised files never arrive to fill them.

> There is not currently a way to include/exclude files based on size in rsync.

That is most awkward, given that --min-size sure sounds like it behaves this way. It is an annoyingly fine distinction to say that "exclude" and "avoid transferring" are two different kinds of operations when it comes to rsync pruning empty directories. This needs to be made much clearer in the man page. I offer these slightly reworded paragraphs:

-m, --prune-empty-dirs
    This option tells the receiving rsync to get rid of empty directories from the file-list, including nested directories that have no non-directory children. This is useful for avoiding the creation of a bunch of useless directories when the sending rsync is recursively scanning a hierarchy of files using include/exclude/filter rules. It does not prevent the creation of empty directories that result from the use of transfer rules such as --max-size or --min-size, since transfer rules do not affect the file-list.

--max-size=SIZE
    This transfer rule tells rsync to avoid transferring any file that is larger than the specified SIZE. Unlike a filter/exclude rule, it does not affect the file-list, so --prune-empty-dirs will not work with this option.

--min-size=SIZE
    This transfer rule tells rsync to avoid transferring any file that is smaller than the specified SIZE, which can help in not transferring small, junk files. Unlike a filter/exclude rule, it does not affect the file-list, so --prune-empty-dirs will not work with this option.

Thanks for keeping rsync alive and kicking!

--
| Ian! D. Allen - idal...@idallen.ca - Ottawa, Ontario, Canada
| Home Page: http://idallen.com/ Contact Improv: http://contactimprov.ca/
| College professor (Open Source / Linux) via: http://teaching.idallen.com/
| Defend digital freedom: http://eff.org/ and have fun: http://fools.ca/
Re: purge-empty-dirs and max-file-size confusion
> > $ rsync -ai --min-size 10M --prune-empty-dirs /home/idallen/test /tmp/foo
> Have you tried --no-dirs?

Why should I need it? I've explicitly told the receiving side "don't create empty directories" and that should be sufficient. I shouldn't need any other options. (In any case, I just tried --no-dirs and it didn't change the result. I still get piles of empty directories.)

Perhaps the man page lies, and --prune-empty-dirs does not operate on the receiving side at all? In the man page it says in one place "tells the receiving rsync to get rid of empty directories from the file-list" and in another place it says "prune empty directory chains from file-list". The latter sounds like it operates on the source list, not on the receiving list, and if rsync were operating on the source list it would explain the current misbehaviour.

Has nobody ever wondered about this before? I suppose I shall have to Read The Source to find out what is wrong. Please someone enlighten me about what I'm missing, before I start digging around in there...

--
| Ian! D. Allen - idal...@idallen.ca - Ottawa, Ontario, Canada
| Home Page: http://idallen.com/ Contact Improv: http://contactimprov.ca/
| College professor (Open Source / Linux) via: http://teaching.idallen.com/
| Defend digital freedom: http://eff.org/ and have fun: http://fools.ca/
purge-empty-dirs and max-file-size confusion
I want to use --min-size to copy just large files (and their necessary parent directories), but everything I've tried copies *all* the source directories, and creates them empty on the destination even if they don't have any big files in them. I only want the minimal directory hierarchies that contain the big files.

This doesn't work:

    $ rm -rf /tmp/foo
    $ rsync -ai --min-size 10M --prune-empty-dirs /home/idallen/test /tmp/foo
    cd+ test/
    cd+ test/dir1/
    cd+ test/dir2/
    cd+ test/dir3/
    cd+ test/dir4/
    f+ test/dir4/BIGFILE
    cd+ test/dir5/
    f+ test/dir5/BIGFILE
    cd+ test/dir6/
    f+ test/dir6/BIGFILE

Wrong. I don't want all those dir1, dir2, dir3 empty directories. I don't want *any* empty directories, at any level. What am I missing?

--
| Ian! D. Allen - idal...@idallen.ca - Ottawa, Ontario, Canada
| Home Page: http://idallen.com/ Contact Improv: http://contactimprov.ca/
| College professor (Open Source / Linux) via: http://teaching.idallen.com/
| Defend digital freedom: http://eff.org/ and have fun: http://fools.ca/