A NOTE has been added to this issue.
======================================================================
https://austingroupbugs.net/view.php?id=1872
======================================================================
Reported By:                steffen
Assigned To:
======================================================================
Project:                    1003.1(2024)/Issue8
Issue ID:                   1872
Category:                   Shell and Utilities
Type:                       Clarification Requested
Severity:                   Editorial
Priority:                   normal
Status:                     New
Name:                       steffen
Organization:
User Reference:
Section:                    find
Page Number:                2946
Line Number:                98444 ff.
Interp Status:              ---
Final Accepted Text:
======================================================================
Date Submitted:             2024-11-07 21:34 UTC
Last Modified:              2024-11-08 08:31 UTC
======================================================================
Summary:                    find: clarify "less safe" statement
======================================================================
----------------------------------------------------------------------
 (0006953) stephane (reporter) - 2024-11-08 08:31
 https://austingroupbugs.net/view.php?id=1872#c6953
----------------------------------------------------------------------
For the record, I was the one who brought that issue up to POSIX in
https://austingroupbugs.net/view.php?id=243#c6093, which resulted in that
text. I also mention it at
https://unix.stackexchange.com/questions/321697/why-is-looping-over-finds-output-bad-practice/321757#321757
("Interrupted output" section) with a real-life example.

See also
https://unix.stackexchange.com/questions/730873/find-print0-xargs-0-cmd-vs-find-exec-cmd/730874#730874
for some historical background on -exec {} + vs xargs -0.

I do vaguely remember mentioning it to the GNU findutils maintainers, but I
may have imagined it, and in any case I don't remember the outcome.

Now, I think it's too late for POSIX to mandate that implementations discard
non-delimited records, as that behaviour is being relied upon. A new option
would have to be introduced. It could be a -D counterpart to -d, where -D '\0'
requires the delimiter when -d '\0' doesn't (-d is currently badly missing
from POSIX xargs for its ability to deal with lines with -d '\n'). Or an
extra -F flag to require Full records.

Generally, GNU utilities do process non-delimited records. For instance, all
their text utilities allow non-delimited lines on input (and some add the
delimiter back on output, some don't). The same applies to those that take a
-z or -0 option to deal with NUL-delimited records instead of lines.

In POSIX, behaviour is unspecified for text utilities if the input doesn't
end in a newline, with the exception of awk. So in any case, regardless of
the behaviour of xargs in this instance, if one does something like:

    find ... -print0 |
      awk -v RS='\0' -v ORS='\0' '{print; print $0".back"}' |
      xargs -r0n2 cp -p --

(not that POSIX allows a NUL record separator for awk yet), the fact that
find was interrupted, if it was, is lost by the time the data reaches xargs.

The NUL-delimited mode of GNU text utilities is often used to process text
files as a whole, as a poor man's "slurp mode". As in:

    sed -z 's/.../.../' file.txt

to have substitutions possibly spanning several lines, or:

    xargs -r0a file.txt printf %b

to expand echo-style escape sequences in the contents of file.txt.

Mandating the delimiter would break those.

In any case, that's not something for POSIX to address. For now, it allows
implementations not to discard non-delimited records and warns about the
safety implication of doing so. It's up to implementations to decide what
they want to do now: either ignore the problem, which in practice rarely
happens and falls in the category of rare pathological cases (like memory/fd
exhaustion or random bit flips from solar flares) where all bets are off
anyway, or address it by breaking backward compatibility or by adding extra
API.

If different implementors agree on that new API, then it can be specified in
POSIX.

Issue History
Date Modified    Username       Field                    Change
======================================================================
2024-11-07 21:34 steffen        New Issue
2024-11-07 21:34 steffen        Name                      => steffen
2024-11-07 21:34 steffen        Section                   => find
2024-11-07 21:34 steffen        Page Number               => 2946
2024-11-07 21:34 steffen        Line Number               => 98444 ff.
2024-11-07 21:38 steffen        Note Added: 0006951
2024-11-08 01:40 steffen        Note Added: 0006952
2024-11-08 01:42 steffen        Note Edited: 0006952
2024-11-08 08:31 stephane       Note Added: 0006953
======================================================================
