bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information if interupted; can't recover on future invoctions
Jim Meyering wrote:
Linda A. Walsh wrote:
Hmmm... Dang strange process for bugs... can't submit a bug directly, but can just by emailing it to the email list? ... (bureaucracy!)

Linda Walsh wrote:
This should be filed under bugs, not under support, but it seems that users of the core utils are not allowed to find bugs...convenient.

Thanks for the report. Please do not use savannah's bug or support interfaces for coreutils. We deliberately disabled the former. Now, when you send a message to the bug-coreutils mailing list, it creates a ticket for you. Yours is here: http://bugs.gnu.org/10055 Simply replying to any mail about it adds entries to its log.

But that's not the bug db interface...that's just a log...where is the bug db interface for the bug in the bug database?
bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information if interupted; can't recover on future invoctions
Linda A. Walsh wrote:
... But that's not the bug db interface...that's just a log...where is the bug db interface for the bug in the bug database?

Here's a description of the interface: http://debbugs.gnu.org/
bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information if interupted; can't recover on future invoctions
Original Message
Subject: [sr #107875] BUG cp -u corrupts 'fs'' information if interupted; can't recover on future invoctions
Date: Tue, 15 Nov 2011 17:58:23 +
From: Linda A. Walsh invalid.nore...@gnu.org
To: Linda A. Walsh

URL: http://savannah.gnu.org/support/?107875
Summary: BUG cp -u corrupts 'fs'' information if interupted; can't recover on future invoctions
Project: GNU Core Utilities
Submitted by: law
Submitted on: Tue Nov 15 09:58:22 2011
Category: None
Priority: 5 - Normal
Severity: 3 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Operating System: None

Details:

This should be filed under bugs, not under support, but it seems that users of the core utils are not allowed to find bugs...convenient. No wonder quality metrics are worthless.

Not trying for a sensationalist summary, but you try coming up with a SHORT accurate summary for this. The problem is bad (in the sense of providing false assurance and not being reliable), but not as bad as the summary might sound...

If you copy a bunch of files (or 1 file for that matter, but then it _might_ be more quickly noticed), and the copy is interrupted (most often by control-C, because some param was forgotten, but there could be other causes), a partial file with the current time stamp is left in the target location. The corrupt copy is not removed upon interruption, though it is marked as being current (with a current DT stamp).

This creates a corrupt copy of the file in a collection of files that a subsequent cp -u won't correct. This is a problem, as there is no indication in a collection of how many files are corrupted in this manner...and the sources may have long been deleted.

If interrupted, the cp tool should remove any partials or ensure they are not created to begin with.
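The failure mode described above follows from the update rule itself. Here is a minimal sketch, assuming that `cp -u` skips a file whenever the destination exists and is not older than the source. The helpers `needs_update`, `interrupted_copy` and `demo` are hypothetical names for illustration, not coreutils code.

```python
import os
import tempfile
import time


def needs_update(src: str, dst: str) -> bool:
    """Hypothetical model of the -u rule: copy only if dst is missing or older."""
    if not os.path.exists(dst):
        return True
    return os.stat(src).st_mtime > os.stat(dst).st_mtime


def interrupted_copy(src: str, dst: str, keep_bytes: int) -> None:
    """Simulate a copy killed partway: dst is truncated and stamped 'now'."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        fout.write(fin.read(keep_bytes))


def demo() -> tuple[bool, int, int]:
    """Return (would -u recopy?, source size, destination size)."""
    with tempfile.TemporaryDirectory() as d:
        src = os.path.join(d, "src.bin")
        dst = os.path.join(d, "dst.bin")
        with open(src, "wb") as f:
            f.write(b"x" * 1000)
        # Make the source clearly older than anything written from now on.
        old = time.time() - 3600
        os.utime(src, (old, old))
        interrupted_copy(src, dst, keep_bytes=100)
        return needs_update(src, dst), os.path.getsize(src), os.path.getsize(dst)
```

Under these assumptions the truncated destination is newer than the source, so the modelled rule declines to recopy it even though the sizes differ.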
Possible ways of addressing:

A) catch INT (and other catchable signals), and remove any files that are 'incomplete'.

Besides that, several other steps could be taken to provide increasing protections (some are orthogonal, some dependent):

B) 1) open destination name for write (verifying access) w/ Exclusive Write;
   2) open tmp file for the actual cp operation;
   3) use posix_fallocate (if available) to allocate sufficient space for the copy;
   4) do the copy;
   5) rename tmp over original (closing the original before the rename on systems that don't support separation of names and FDs (Win systems et al.)).

C) reset DT stamps on newly opened files to '0' (~1969/70?) in all non-auto-updated fields, then start the copy... Any future invocations of cp -u could examine the time stamps, and if the non-auto-updated fields appear to be zero, do the copy (and correct the time stamps), with 2 possible exception conditions being noted: (a) if the source file also has '0'd time fields, then check file sizes: if they match, presume 'ok' (a statistical 'guess', possibly warned about with a -verbose option); if sizes don't match, presume not a correct update and do the copy.

D) others?

As this is, it creates a situation of cp being unreliable.

Note, 'rsync' isn't a great substitute either, as I've noted that when I was updating files with 'rsync' (which is always slower on full file copies) with equivalent options, a later usage of cp -uav to copy the files recopied most of the files (all? not sure) that rsync had copied with -aUVHAX (supposedly the same info as cp -au from my understanding). The same was not true for the reverse case (files cp'ed and updated by cp were not updated by rsync), leading me to suspect rsync as not only being significantly slower, but not as thorough in copying over information.
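Steps 2 through 5 of proposal B above (temporary file, preallocation, copy, rename) can be sketched as follows. This is only an illustration under stated assumptions: `copy_via_temp` is a hypothetical helper, step 1 (the exclusive open of the destination) is left out, and ownership, permissions and time stamps are ignored.

```python
import errno
import os
import tempfile


def copy_via_temp(src: str, dst: str, chunk: int = 1 << 16) -> None:
    """Copy src to dst so that dst is either untouched or complete."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    size = os.stat(src).st_size
    # Step 2: the temporary file lives in the destination directory so the
    # final rename stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=dst_dir, prefix=".cp-tmp-")
    try:
        with os.fdopen(fd, "wb") as fout, open(src, "rb") as fin:
            # Step 3: reserve space up front where the platform supports it.
            if size > 0 and hasattr(os, "posix_fallocate"):
                try:
                    os.posix_fallocate(fout.fileno(), 0, size)
                except OSError as e:
                    # Unsupported on this filesystem: carry on without it.
                    # Anything else (for example ENOSPC) aborts the copy.
                    if e.errno not in (errno.EINVAL, errno.EOPNOTSUPP):
                        raise
            # Step 4: copy the data.
            while True:
                buf = fin.read(chunk)
                if not buf:
                    break
                fout.write(buf)
            fout.flush()
            os.fsync(fout.fileno())
        # Step 5: atomically replace the destination.
        os.replace(tmp, dst)
    except BaseException:
        # Interrupted or failed: drop the partial temp file, leave dst alone.
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

With this shape, an interrupt at any point before the rename leaves the old destination and its time stamp intact, so a later `cp -u` would still see it as out of date.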
FWIW, I feel it important to file bugs about tools that are currently the best in their class (and tend to devote my attention to wanting to see them enhanced, even beyond their original scope at times); rsync used to have a very basic feature which put it above cp: it copied extended attrs and ACLs. Now that cp does that, and that cp was about 2-3x faster than rsync for full files...
bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information if interupted; can't recover on future invoctions
Thanks for your thoughtful suggestions. I like many of the ideas and hope that somebody can find the time to code them up. Here are some more-detailed comments.

On 11/15/11 11:07, Linda Walsh wrote:
3). use posix_fallocate (if available) to allocate sufficient space for the copy

This seems like a good idea, independently of the other points. That is, if A and B are regular files, cp A B could use A's size to preallocate B's storage, and it could fail immediately (without trashing B!) if there's not enough storage. I like this.

A) catch INT ( catchable signals), and remove any files that are 'incomplete'

That might cause trouble in other cases. For example, cp A B where B already exists. In this case it's unwise to remove B if interrupted -- people won't expect that. And in general 'cp' has behaved the way that it does for decades, and we need to be careful about changing its default behavior in such a fairly-drastic way.

But we could add an option to 'cp' to have this behavior. Perhaps --remove-destination=signal? That is --remove-destination could have an optional list of names of places where the destination could be removed, where the default is not to remove it, and plain --remove-destination means --remove-destination=before.

B) 1). open destination name for write (verifying accesses) w/ Exclusive Write;

This could be another new option, though (as you write) it's orthogonal to the main point. I would suggest that this option be called --oflag=excl (by analogy with dd's oflag= option). We can add support for the other output flags while we're at it, e.g., --oflag=excl,append,noatime.

2). open tmp file for actual cp operation.
5); rename tmp over original; (closing original before rename on systems that don't support separation of names and FD's (Win systems et al).

Yes, that could be another option. I see (2) and (5) as being the same feature. Perhaps --remove-destination=after?

C) reset DT stamps on newly opened files to '0' (~1969/70?)'

I dunno, this kind of time stamp munging sounds like it'd cause more trouble than it'd cure. It's more natural (and easier to debug failures) if the last-modified time of a file is the time that the file was last modified.
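One way to reconcile proposal A with the objection about an existing B is to unlink the destination on interrupt only when the copy itself created it. The sketch below assumes that policy; `copy_stream` is a hypothetical helper, and neither `--remove-destination=signal` nor `--oflag=excl` exists in cp; both are only proposals in this thread. Python reports SIGINT as `KeyboardInterrupt` by default, which stands in for the signal handler here.

```python
import os
from typing import BinaryIO


def copy_stream(fin: BinaryIO, dst: str, chunk: int = 1 << 16) -> None:
    """Copy fin to dst; on interrupt, unlink dst only if this call created it."""
    created = False
    try:
        # O_EXCL succeeds only when dst did not exist, so we know we own it.
        fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o666)
        created = True
    except FileExistsError:
        # dst was already there: truncate it, as a plain cp would.
        fd = os.open(dst, os.O_WRONLY | os.O_TRUNC)
    try:
        with os.fdopen(fd, "wb") as fout:
            while True:
                buf = fin.read(chunk)
                if not buf:
                    break
                fout.write(buf)
    except KeyboardInterrupt:
        # Interrupted mid-copy: remove our own partial file, but never a
        # destination that existed before we started.
        if created:
            os.unlink(dst)
        raise
```

Note that a pre-existing B is still left truncated after an interrupt in this sketch, which is exactly the case the rest of the thread argues about.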
bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information if interupted; can't recover on future invoctions
On 11/15/11 12:46, Linda A. Walsh wrote:
Better than leaving *doo doo* in a file

Sometimes, but not always. I can think of plausible cases where I'd rather have a partial copy than no copy at all. As an extreme example, if I'm doing 'cp /dev/tty A', I do not want A removed on interrupt even if A has already been truncated and overwritten, as A contains the only copy of the data that I just typed in by hand.

But we could add an option to 'cp' to have this behavior. Perhaps --remove-destination=signal? That is --remove-destination could have an optional list of names of places where the destination could be removed, where the default is not to remove it, and plain --remove-destination means --remove-destination=before.

I think you misunderstood the problem.

Perhaps I did. But could you explain the problem then? For example, how would the proposed cp --remove-destination=signal A B not address the problem?
bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information if interupted; can't recover on future invoctions
Paul Eggert wrote:
A) catch INT ( catchable signals), and remove any files that are 'incomplete'

That might cause trouble in other cases. For example, cp A B where B already exists.

=== Am **only** suggesting this where 'B' has already been opened and truncated by stuff being copied from 'A'... The point is to not leave a 'B' that is *indeterminate*.

In this case it's unwise to remove B if interrupted -- people won't expect that.

-- Better than leaving *doo doo* in a file where they expect something valid.

And in general 'cp' has behaved the way that it does for decades, and we need to be careful about changing its default behavior in such a fairly-drastic way.

It's a bug...Fixing a bug isn't usually considered drastic.

But we could add an option to 'cp' to have this behavior. Perhaps --remove-destination=signal? That is --remove-destination could have an optional list of names of places where the destination could be removed, where the default is not to remove it, and plain --remove-destination means --remove-destination=before.

I think you misunderstood the problem.
bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information if interupted; can't recover on future invoctions
Hmmm... Dang strange process for bugs... can't submit a bug directly, but can just by emailing it to the email list? ... (bureaucracy!)

Linda Walsh wrote:
This should be filed under bugs, not under support, but it seems that users of the core utils are not allowed to find bugs...convenient.
bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information if interupted; can't recover on future invoctions
On 11/15/2011 08:23 PM, Paul Eggert wrote:
Thanks for your thoughtful suggestions. I like many of the ideas and hope that somebody can find the time to code them up. Here are some more-detailed comments.

On 11/15/11 11:07, Linda Walsh wrote:
3). use posix_fallocate (if available) to allocate sufficient space for the copy

This seems like a good idea, independently of the other points. That is, if A and B are regular files, cp A B could use A's size to preallocate B's storage, and it could fail immediately (without trashing B!) if there's not enough storage. I like this.

I'll take a look at this at some stage. I was intending to do it right after the fiemap stuff as it was quite related, but that needed to be bypassed for normal copies. Anyway I'll bump fallocate up my priority list.

A) catch INT ( catchable signals), and remove any files that are 'incomplete'

That might cause trouble in other cases. For example, cp A B where B already exists. In this case it's unwise to remove B if interrupted -- people won't expect that. And in general 'cp' has behaved the way that it does for decades, and we need to be careful about changing its default behavior in such a fairly-drastic way.

But we could add an option to 'cp' to have this behavior. Perhaps --remove-destination=signal? That is --remove-destination could have an optional list of names of places where the destination could be removed, where the default is not to remove it, and plain --remove-destination means --remove-destination=before.

B) 1). open destination name for write (verifying accesses) w/ Exclusive Write;

This could be another new option, though (as you write) it's orthogonal to the main point. I would suggest that this option be called --oflag=excl (by analogy with dd's oflag= option). We can add support for the other output flags while we're at it, e.g., --oflag=excl,append,noatime.

2). open tmp file for actual cp operation.
5); rename tmp over original; (closing original before rename on systems that don't support separation of names and FD's (Win systems et al).

Yes, that could be another option. I see (2) and (5) as being the same feature. Perhaps --remove-destination=after?

There are lots of implementation issues with tmp files, many of which are noted here: http://www.pixelbeat.org/docs/unix_file_replacement.html

C) reset DT stamps on newly opened files to '0' (~1969/70?)'

I dunno, this kind of time stamp munging sounds like it'd cause more trouble than it'd cure. It's more natural (and easier to debug failures) if the last-modified time of a file is the time that the file was last modified.

Not a bad idea and least invasive, but if the Ctrl-C happened between the creat() and utime() you'd get a newer zero length file. Then subsequent `cp -u` would have to treat zero length files specially.

cheers,
Pádraig.
bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information if interupted; can't recover on future invoctions
[Thought I sent out a response to this, but can't find it in my outgoing mail, so... recomposing/sending; sorry for the delay.]

On 11/15/11 12:46, Linda A. Walsh wrote:
Better than leaving *doo doo* in a file

Sometimes, but not always. I can think of plausible cases where I'd rather have a partial copy than no copy at all. As an extreme example, if I'm doing 'cp /dev/tty A', I do not want A removed on interrupt even if A has already been truncated and overwritten, as A contains the only copy of the data that I just typed in by hand.

= Um...yeah, you could try to apply the idea in general, but it might have unforeseen side effects like the one you are demonstrating.

Why don't we focus on the specific problem mentioned, which was using it in the context of the -u flag (and with -a/-r and/or a wildcard), where you expect it to update the contents of 'Dst' with 'Src'. In that case, you get an interrupt, and you end up with a truncated file in Dst that has a DateTime (not even the DT of the src file, but the DT at which the file was opened, or more likely closed) that will guarantee that a correct copy will never get updated over the now destroyed, bogus copy.

Not only that, but weeks later, when you go through your backup dir and wonder why some file 'x' is only 1/10th the size of the rest of the similar backups, your original can be very gone... (not that 1 of the multiple other backups might not sub in, but that's not the point!)... You don't want the partially copied update, which has already destroyed an original, to now leave a turd in place so that no future cp -uav will correct the problem.

Though (I'm sure you'd love to see this in 'cp' (*cough*)), cp could check file sizes and see if the target is smaller and, if so, assume, if the DT's were equal, that the file cp was interrupted...and finish it... Actually that might not be a bad idea...

But we could add an option to 'cp' to have this behavior. Perhaps --remove-destination=signal? That is --remove-destination could have an optional list of names of places where the destination could be removed, where the default is not to remove it, and plain --remove-destination means --remove-destination=before.

I think you misunderstood the problem.

Perhaps I did. But could you explain the problem then? For example, how would the proposed cp --remove-destination=signal A B not address the problem?

Well, if it were the default case, sure, but if default is to trash files, that's bad.
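The size check floated in this message could be layered on top of the time stamp comparison roughly like this. It is a sketch under assumptions: the function names are hypothetical, the time stamp condition is read as "destination not older than source", and a destination that was legitimately edited to be shorter would also be flagged, so this remains a heuristic.

```python
import os


def looks_interrupted(src: str, dst: str) -> bool:
    """Heuristic from the thread: dst is not older than src, yet smaller."""
    s, d = os.stat(src), os.stat(dst)
    return d.st_mtime >= s.st_mtime and d.st_size < s.st_size


def needs_update_with_size_check(src: str, dst: str) -> bool:
    """Modelled -u rule plus the size heuristic for suspected partial copies."""
    if not os.path.exists(dst):
        return True
    if os.stat(src).st_mtime > os.stat(dst).st_mtime:
        return True
    return looks_interrupted(src, dst)
```

A truncated destination with a fresh time stamp would be recopied under this rule, while a full-size destination with the same time stamp would be left alone.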
bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information if interupted; can't recover on future invoctions
On 11/15/11 19:33, Linda A. Walsh wrote:
Why don't we focus on the specific problem mentioned which was using it in the context of the -u flag, (and with -a/-r and/or a wildcard), where you expect it to update contents of 'Dst' with 'Src'.

I'd rather not have a heuristic that says cp removes the destination when interrupted, if you use the -u flag with -a or -r or a wildcard. That'd be a hard rule to remember, and it's probably not the best rule anyway, for somebody's opinion of best. We need a simple rule that's easy to document and to remember, even if it isn't necessarily the best by some other measure.

It'd be OK if cp -a implies the new --remove-destination=signal (or whatever) option. Then you could just use cp -a.

cp could check file sizes and see if the target is smaller and if so.. assume, if the DT's were equal that the file cp was interrupted...and finish it...

I'm still not convinced by the idea about trusting the time stamp on the destination. Every time 'cp' writes to its destination, it will update the destination's time stamp. Sure, 'cp' can use utime immediately afterwards to alter the time stamp, but there's still a window where the destination's time stamp will be 'now'. In general 'cp' must continue to work in that case -- so why should it bother to reset the destination's time stamp after every write?
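The window described here can be observed directly. The small sketch below (function names hypothetical) resets a file's time stamp to the epoch, appends to it, and reports the modification time before and after; on typical POSIX filesystems the write moves the time stamp back to 'now'.

```python
import os
import tempfile


def mtime_before_and_after_write(path: str) -> tuple[float, float]:
    """Zero the time stamps, append data, and return (mtime before, mtime after)."""
    os.utime(path, (0, 0))
    before = os.stat(path).st_mtime
    with open(path, "ab") as f:
        f.write(b"more data")
    after = os.stat(path).st_mtime
    return before, after


def demo() -> tuple[float, float]:
    """Run the check on a scratch file and clean it up afterwards."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"initial")
        path = tmp.name
    try:
        return mtime_before_and_after_write(path)
    finally:
        os.unlink(path)
```

So a zeroed time stamp only survives if it is reapplied after the last write, and an interrupt between that write and the reset leaves the file looking current.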
bug#10055: [sr #107875] BUG cp -u corrupts 'fs'' information if interupted; can't recover on future invoctions
Linda A. Walsh wrote:
Hmmm... Dang strange process for bugs... can't submit a bug directly, but can just by emailing it to the email list? ... (bureaucracy!)

Linda Walsh wrote:
This should be filed under bugs, not under support, but it seems that users of the core utils are not allowed to find bugs...convenient.

Thanks for the report. Please do not use savannah's bug or support interfaces for coreutils. We deliberately disabled the former. Now, when you send a message to the bug-coreutils mailing list, it creates a ticket for you. Yours is here: http://bugs.gnu.org/10055 Simply replying to any mail about it adds entries to its log.