https://bugzilla.samba.org/show_bug.cgi?id=14109
Bug ID: 14109
Summary: Support Custom Fuzzy Basis Selection Algorithm
Product: rsync
Version: 3.1.3
Hardware: All
OS: All
Status: NEW
Severity: normal
Priority: P5
Component: core
Assignee: wa...@opencoder.net
Reporter: lonnie...@yahoo.com
QA Contact: rsync...@samba.org

The --fuzzy argument does an incredible job of syncing large files when it chooses the correct fuzzy basis. However, the default "fuzzy-basis-destination-file-selection algorithm" is not correct for every situation, so I propose the ability to pass an argument to the fuzzy parameter that specifies which "fuzzy-basis-destination-file-selection algorithm" to use.

I've posted a question detailing my needs here:
https://unix.stackexchange.com/questions/538548/

In short, some of the files in my source folder are 200GB in size. When rsync chooses the correct existing destination file as its "fuzzy basis", the synchronization of these files seems magical in terms of how little data gets transferred over the wire. However, when it chooses the wrong existing destination file as the source file's fuzzy basis, the data transfer can take days.

Look at the filenames in both my source folder and destination folder (below):

# Source folder's new files (from today's on-site backup):
file100-2019_09-01_12am.log
file100-2019_09-01_12am.lzo
file101-2019_09-01_12am.log
file101-2019_09-01_12am.lzo
file102-2019_09-01_12am.log
file102-2019_09-01_12am.lzo

# Destination folder's old files (from yesterday's off-site backup):
file100-2019_08-31_12am.log
file100-2019_08-31_12am.lzo
file101-2019_08-31_12am.log
file101-2019_08-31_12am.lzo
file102-2019_08-31_12am.log
file102-2019_08-31_12am.lzo

In my case, the fuzzy-basis-selection algorithm needs to select the existing destination file that:
1) Has the same file extension as the source file
2) Shares the longest run of identical leading characters with the source file's name

The default algorithm does not meet these requirements.
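
To make the two requirements above concrete, here is a minimal sketch of the selection rule I have in mind. It is written in Python purely for illustration (rsync itself is written in C), and the names `common_prefix_len` and `pick_fuzzy_basis` are hypothetical; nothing here reflects rsync's actual code or any existing option.

```python
import os


def common_prefix_len(a, b):
    """Count how many leading characters the two names share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def pick_fuzzy_basis(source_name, dest_names):
    """Pick a destination file name to use as the fuzzy basis, or None.

    Hypothetical helper (an assumption for illustration, not rsync's
    implementation). Rule 1: keep only destination names with the same
    extension as the source. Rule 2: among those, choose the name that
    shares the longest leading run of characters with the source name.
    """
    src_ext = os.path.splitext(source_name)[1]
    candidates = [d for d in dest_names if os.path.splitext(d)[1] == src_ext]
    if not candidates:
        return None
    # On a tie, max() returns the first candidate in list order.
    return max(candidates, key=lambda d: common_prefix_len(source_name, d))


# Yesterday's off-site files from the listing above.
dest = [
    "file100-2019_08-31_12am.log",
    "file100-2019_08-31_12am.lzo",
    "file101-2019_08-31_12am.log",
    "file101-2019_08-31_12am.lzo",
    "file102-2019_08-31_12am.log",
    "file102-2019_08-31_12am.lzo",
]

print(pick_fuzzy_basis("file101-2019_09-01_12am.lzo", dest))
# prints: file101-2019_08-31_12am.lzo
```

With this rule, each of today's files pairs with yesterday's file of the same number and extension, which is the pairing I need.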

Therefore, I propose the ability to pass an argument that allows the user to specify a non-default fuzzy basis selection algorithm. There should probably be a few common, baked-in ones (added as time goes on) that you can choose from by name. It would be even more flexible if rsync also let the user pass in a file that specifies a custom "fuzzy-basis-destination-file-selection algorithm".

Naturally, if these features are granted, the documentation would also need to be updated to give guidance on specifying these things.

If these things are already implemented and I have somehow overlooked them, would you kindly post an answer to my question here?
https://unix.stackexchange.com/questions/538548/

--
You are receiving this mail because:
You are the QA Contact for the bug.

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html