https://bugzilla.samba.org/show_bug.cgi?id=14109

            Bug ID: 14109
           Summary: Support Custom Fuzzy Basis Selection Algorithm
           Product: rsync
           Version: 3.1.3
          Hardware: All
                OS: All
            Status: NEW
          Severity: normal
          Priority: P5
         Component: core
          Assignee: wa...@opencoder.net
          Reporter: lonnie...@yahoo.com
        QA Contact: rsync...@samba.org

The --fuzzy argument does an incredible job at syncing large files when it
chooses the correct fuzzy basis.

However, the default "fuzzy-basis-destination-file-selection algorithm" is not
correct for every situation, so I propose the ability to pass an argument to
the fuzzy parameter that specifies which
"fuzzy-basis-destination-file-selection algorithm" to use.

I've posted a question detailing my needs here:
https://unix.stackexchange.com/questions/538548/

In short, some of the files in my source-folder are 200GB in size. When rsync
chooses the correct existing-destination-file for its "fuzzy basis", my
synchronization (of these files) seems magical in terms of the data that gets
transferred over the wire.

However, when it chooses the wrong existing-destination-file as the source
file's fuzzy basis, the data transfer can take days.

Look at the filenames in both my source-folder and destination-folder (below):

    # Source Folder's new files (from today's on-site backup):
    file100-2019_09-01_12am.log
    file100-2019_09-01_12am.lzo
    file101-2019_09-01_12am.log
    file101-2019_09-01_12am.lzo
    file102-2019_09-01_12am.log
    file102-2019_09-01_12am.lzo

    # Destination-Folder's old files (from yesterday's off-site backup):
    file100-2019_08-31_12am.log
    file100-2019_08-31_12am.lzo
    file101-2019_08-31_12am.log
    file101-2019_08-31_12am.lzo
    file102-2019_08-31_12am.log
    file102-2019_08-31_12am.lzo

In my case, the fuzzy-basis-selection-algorithm needs to select the existing
destination-file that:

1) Has the same file extension as the source file
2) Begins with the most consecutively identical characters as the source file

The default algorithm does not meet these requirements.
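
To illustrate the two criteria above, here is a minimal Python sketch of the
selection I have in mind. This is not rsync code, and it says nothing about how
rsync's built-in fuzzy matching is implemented; the function name
pick_fuzzy_basis is hypothetical and exists only for this example.

```python
from os.path import commonprefix, splitext


def pick_fuzzy_basis(source_name, dest_names):
    """Hypothetical selector: return the destination filename that has the
    same extension as source_name and shares the longest leading run of
    identical characters with it, or None if no candidate qualifies."""
    src_ext = splitext(source_name)[1]
    best, best_len = None, 0
    for cand in dest_names:
        # Criterion 1: the extension must match exactly.
        if splitext(cand)[1] != src_ext:
            continue
        # Criterion 2: length of the common leading prefix.
        n = len(commonprefix([source_name, cand]))
        # Strictly greater, so the first of several equal matches wins and
        # candidates sharing no leading characters are never chosen.
        if n > best_len:
            best, best_len = cand, n
    return best


dest = [
    "file100-2019_08-31_12am.log",
    "file100-2019_08-31_12am.lzo",
    "file101-2019_08-31_12am.log",
    "file101-2019_08-31_12am.lzo",
    "file102-2019_08-31_12am.log",
    "file102-2019_08-31_12am.lzo",
]
print(pick_fuzzy_basis("file101-2019_09-01_12am.lzo", dest))
```

With the listings above, the source file file101-2019_09-01_12am.lzo would be
paired with file101-2019_08-31_12am.lzo, which is the basis I want.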

Therefore, I propose the ability to pass an argument that allows the user to
specify non-default fuzzy basis selection algorithms.

There should probably be a few common, baked-in ones (added as time goes on)
that you can choose from by name. It would be even more flexible if rsync also
permitted the user to pass a file into the command that specifies a custom
"fuzzy-basis-destination-file-selection algorithm".

Naturally, if these features are granted, the documentation would also need to
be updated to give guidance on specifying these things.

If these things are already implemented, and I have somehow overlooked them,
would you kindly post an answer to my question here?:
https://unix.stackexchange.com/questions/538548/

-- 
You are receiving this mail because:
You are the QA Contact for the bug.

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html