Dear All,

Along with the files that suddenly disappear, we have a bit of a problem with these that do not. Namely, in my test runs I can see a small but stable set of files, that rsync is repeatedly trying to transfer, and then repeatedly fails to updates their times ; and then the story repeats itself.

The situation is illustrated by the log snippet below, where I have changed file names to protect owners privacy. In the tests, I am using rsync 3.1.1 since this is what is presently supported by the OS release maintainers, and I was trying to avoid any new issues if possible. But in my understanding, rsync 3.1.2 would have had the same problem here.

This issue is in many ways similar to bug https://bugzilla.samba.org/show_bug.cgi?id=4977 and the like, and in this particular case is caused by files whose mtime nanoseconds exist on the filesystem and are broken. However, since there is a workaround to bypass a similar issue with symlinks, I believe we can do something about such files as well: with just a few broken timestamps on a large filesystem, it is very desirable to be able to avoid or suppress code 23 ('partial transfer') when all the file data and 99.9% of the timestamps were transferred properly.

After the log snippets below, I will try to explain the problem as I see it, and then suggest two different ways to address it: basically ignore error 22 (EINVAL) of fix nanosecond mtime .

===
...
2017/04/06 00:16:06 [16000] <f..tp..... filename1
2017/04/06 00:16:06 [16000] <f..tp..... filename2
2017/04/06 00:16:06 [16000] rsync: failed to set times on "filename1.ezq12g" (in module): Invalid argument (22) 2017/04/06 00:16:06 [16000] rsync: failed to set times on "filename2.wp5Aju" (in module): Invalid argument (22)
2017/04/06 00:16:09 [16000] <f..t...... filename3
2017/04/06 00:16:09 [16000] rsync: failed to set times on "filename3.8Fa3LO" (in module): Invalid argument (22)
2017/04/06 00:16:09 [16000] .d..t...... dir1/
2017/04/06 00:16:09 [16000] <f..t...... filename4
2017/04/06 00:16:09 [16000] <f.st...... filename5
2017/04/06 00:16:09 [16000] .d..t...... dir2/
2017/04/06 00:16:09 [16000] <f..t...... filename6
2017/04/06 00:16:09 [16000] .d..t...... dir3/
2017/04/06 00:16:09 [16000] <f.st...... dir3/filename7
2017/04/06 00:16:10 [16000] sent 161,021,998 bytes received 1,858,287 bytes 868,694.85 bytes/sec 2017/04/06 00:16:10 [16000] total size is 1,228,368,779,326 speedup is 7,541.54 2017/04/06 00:16:10 [16000] rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1183) [sender=3.1.1]

===

With some extra verbosity, the logs give some insight:

===
2017/04/07 18:44:41 [27597] recv_files(filename1)
2017/04/07 18:44:41 [27597] got file_sum
2017/04/07 18:44:41 [27597] set uid of filename1.ZjS8YL from 0 to *****
2017/04/07 18:44:41 [27597] set gid of filename1.ZjS8YL from 0 to *****
2017/04/07 18:44:41 [27597] set modtime of filename1.ZjS8YL to (370409472) Sun Sep 27 03:31:12 1981 2017/04/07 18:44:41 [27597] rsync: failed to set times on "filename1.ZjS8YL" (in users): Invalid argument (22)
2017/04/07 18:44:41 [27597] renaming filename1.ZjS8YL to filename1
===

Both 'cp -a filename1 somewhere_else' and 'touch -r filename1 something_else' fail as well, and in a bug tracker we can stop here with a WONTFIX.

However, copying times with stat() / utime() from filename1 to its destination fixes all its time transfer problems once and for all, so certainly something can be done here.

Now I will have to apologize for the length of the letter -- and start discussing possible solutions.

(a) First of all, we can _fix_ the time on destination. ( I do not _own_ the files in question; I'm just trying to make sure the data is replicated from one location to another -- ideally, as accurately as possible. )

Debugging the GNU/touch utility code, I can see that for all these files .tv_nsec field of the timespec structure gets bogus values: either negative, or far over 1000000000 (one second), and in my opinion this eventually causes utimensat() / futimens() to fail .

To remind,  struct timespec is defined as follows:

===
           struct timespec {
               time_t tv_sec;        /* seconds */
               long   tv_nsec;       /* nanoseconds */
           };
===

Apart from a terrible idea to define a disk-related data field being dependent on the CPU word size ( both time_t and long have the word size on GCC ), there is an obvious problem here: while tv_sec field, by definition, _can_not_ have a "wrong" value (e.g. negative numbers here may look weird, but they are not, per se, _wrong_) -- tv_nsec field actually can have an unacceptable value.

First of all, it is not expected to be negative: imho on Linux UTIME_OMIT is (-2), that is, we ban negative nanoseconds. Next, although the standard never says so, it is apparently not /expected/ for these values to be greater than one second: for 32-bit longs, this is not a big deal, since a couple of extra seconds may just lie within very common time jitter. For 64-bit longs, however, one could stuff the whole Unix timestamp in that field, so we would probably want to ban that as well for the sake of general sanity.

Now back to rsync: looking at set_modtime() definition in util.c, one can clearly see that rsync developers take a very sane and reasonable approach by declaring mod_nsec as an uint32 -- that is, both unsigned, and having a byte-fixed reasonable size. ( This is imho how tv_nsec field should have been declared! )

Furthermore, rsync 3.1.1 -- see set_file_attrs() in rsync.c -- compares only mtime _seconds_, and takes no attempt to adjust mtimes further if the "classic" .st_mtime values coincide. ( And this is why all the "Invalid argument (22)" problems disappear after we transfer .st_mtime values by an utime() call. )

Therefore, still for rsync 3.1.1 -- first, any bogus negative value for .st_mtim.tv_nsec will be silently converted to a uint32 ; and second, any bogus value which is far over one second will be truncated.

I'll discuss version 3.1.2 in a moment, but it is rather tempting to agree that if we discard bogus nanosecond .st_mtim.tv_nsec values and set them to 0, we will in fact transfer all the same data and copy the timestamps as close as we can, and moreover -- will not attempt to transfer these files again.

This is a proposed patch to do the above: https://gist.github.com/anonymous/1eb789c825a8eaa1bdb4116334ba52da .

Version 3.1.2 takes a slightly more aggressive approach, and tries to compare the nanosecond part of the timestamp as well. However, since --modify-window parameter is still supported, in my understanding this happens only if the file was originally nominated for a transfer, so that the selection is still controlled by cmp_time() from util.c ; in other words, if the destination has the same value of .st_mtime field as the source, and the same contents etc -- no data transfer would be attempted.

One could possibly recompile rsync, telling it to ignore all nanosecond values whatsoever, but this is not exactly what we want here: it is rather desirable to transfer the existing terabytes of data as accurately as possible, omitting just a few files whose times were broken anyway.

At the moment I can not determine what makes these files to appear in the first place, but the fact is that they exist, and although are very small in numbers, can be very annoying.

(b) Another option is to allow rsync to ignore setting mtime failure, and to return RERR_OK irregardless of that.

The downside is that rsync will keep retrying to resend this data over and over again. The plus side, however, is that setting mtimes is a very complex task that can go wrong in a number of different ways on a number of platforms, so it may be desirable to let the user ignore this error in the case if the bulk of the data was transferred correctly.

In that case, we will probably need another rsync option, something like --ignore-missing-mtimes . Since choosing a proper name is a whole another problem, I am adding here a "wrong" patch that does not introduce extra rsync options, but rather extends '--ignore-missing-args' functionality to ignore mtime setting errors as well :

https://gist.github.com/anonymous/e6c95dc424837e09015104b1d5a2a59e

This patch works for both rsync 3.1.1 and 3.1.2, since the addressed code did not change much between the versions.

I sincerely hope that rsync can address the nanosecond mtimes issue in one way or another, since apparently it arises from time to time and when it does, can be very annoying.

Sincerely,
George Fedorov

--

George Fedorov
Senior Systems Specialist
Melbourne School of Engineering
The University of Melbourne, Victoria 3010, Australia
phone: *****
email: *****
http://www.eng.unimelb.edu.au/

-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html

Reply via email to