A NOTE has been added to this issue.
======================================================================
https://austingroupbugs.net/view.php?id=243
======================================================================
Reported By:                dwheeler
Assigned To:                ajosey
======================================================================
Project:                    1003.1(2008)/Issue 7
Issue ID:                   243
Category:                   Shell and Utilities
Type:                       Enhancement Request
Severity:                   Objection
Priority:                   normal
Status:                     Under Review
Name:                       David A. Wheeler
Organization:               IDA
User Reference:
Section:                    find
Page Number:                2740
Line Number:                89194
Interp Status:              ---
Final Accepted Text:
======================================================================
Date Submitted:             2010-04-29 19:23 UTC
Last Modified:              2023-01-09 16:20 UTC
======================================================================
Summary:                    Add -print0 to "find"
======================================================================
Relationships       ID      Summary
----------------------------------------------------------------------
has duplicate       0000244 Add -0 to xargs
has duplicate       0000245 Add -0 option to shell's "read"
has duplicate       0000903 Please, add find -print0, xargs -0, rea...
======================================================================
----------------------------------------------------------------------
 (0006100) geoffclare (manager) - 2023-01-09 16:20
 https://austingroupbugs.net/view.php?id=243#c6100
----------------------------------------------------------------------
Page and line numbers are for Issue 8 draft 2.1.

On page 2763 line 91806 section find (OPERANDS), change:<blockquote><b>-print</b><blockquote>The primary shall always evaluate as true; it shall cause the current pathname to be written to standard output.</blockquote></blockquote>to:<blockquote><b>-print</b><blockquote>The primary shall always evaluate as true; it shall cause the current pathname to be written to standard output, followed by a <newline>.</blockquote><b>-print0</b><blockquote>The primary shall always evaluate as true; it shall cause the current pathname to be written to standard output, followed by a null byte.</blockquote></blockquote>

On page 2765 line 91869 section find (STDOUT), change:<blockquote>current pathnames to be written</blockquote>to:<blockquote>current pathname to be written</blockquote>

After page 2765 line 91871 section find (STDOUT), add:<blockquote>The <b>-print0</b> primary shall cause the current pathname to be written to standard output, followed by a null byte.</blockquote>

On page 2766 line 91911 section find (EXAMPLES), after:<blockquote>They both write out the entire directory hierarchy from the current directory.</blockquote>append:<blockquote>With this output format, if any pathnames include <newline> characters, it is not possible to tell where each pathname begins and ends. This problem can be avoided by omitting such pathnames:<pre>LC_ALL=POSIX find . -name $'*\n*' -prune -o -print</pre>or by using a sentinel in the pathname that <i>find</i> would never otherwise produce, such as:<pre>find .//. -print</pre>or by using <b>-print0</b> instead of <b>-print</b> and processing the output with a utility that can accept null-terminated pathnames as input, such as <i>xargs</i> with the <b>-0</b> option or <i>read</i> with <b>-d</b> "", for example:<pre>find . -print0 | while IFS= read -rd "" file
do
    # process "$file"
done</pre>It should be noted that using <i>find</i> with <b>-print0</b> to pipe input to <i>xargs</i> <b>-0</b> is less safe than using <i>find</i> with <b>-exec</b> because if <i>find</i> <b>-print0</b> is terminated after it has written a partial pathname, the partial pathname will be processed as if it was a complete pathname.</blockquote>

On page 2769 lines 92033-92037 section find (RATIONALE), delete:<blockquote>Other implementations [...] it would now be reading.</blockquote>

On page 3106 line 105084 section read (SYNOPSIS), change:<blockquote><pre>read [-r] <i>var</i>...</pre></blockquote>to:<blockquote><pre>read [-r] [-d <i>delim</i>] <i>var</i>...</pre></blockquote>

On page 3106 line 105088 section read (DESCRIPTION), change:<blockquote>By default, unless the <b>-r</b> option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of a <newline>. If a <newline> follows the <backslash>, the <i>read</i> utility shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into fields.</blockquote>to:<blockquote>By default, unless the <b>-r</b> option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of either <newline> or the logical line delimiter specified with the <b>-d</b> <i>delim</i> option (if it is used and <i>delim</i> is not <newline>); it is unspecified which.
If this excepted character follows the <backslash>, the <i>read</i> utility shall interpret this as line continuation. The <backslash> and the excepted character shall be removed before splitting the input into fields.</blockquote>

On page 3106 line 105097 section read (DESCRIPTION), change:<blockquote>The terminating <newline> (if any) shall be removed from the input</blockquote>to:<blockquote>The terminating logical line delimiter (if any) shall be removed from the input</blockquote>

After page 3106 line 105115 section read (DESCRIPTION), add:<blockquote>If end-of-file is detected before a terminating logical line delimiter is encountered, the variables specified by the <i>var</i> operands shall be set as described above and the exit status shall be 1.</blockquote>

On page 3106 line 105118 section read (OPTIONS), change:<blockquote>The following option is supported:</blockquote>to:<blockquote>The following options shall be supported: <b>-d</b> <i>delim</i><blockquote>If <i>delim</i> consists of one single-byte character, that byte shall be used as the logical line delimiter. If <i>delim</i> is the null string, the logical line delimiter shall be the null byte. Otherwise, the behavior is unspecified.</blockquote></blockquote>

On page 3107 line 105125 section read (STDIN), change:<blockquote>The standard input shall be a text file.</blockquote>to:<blockquote>If the <b>-d</b> <i>delim</i> option is not specified, or if it is specified and <i>delim</i> consists of one single-byte character, the standard input shall contain zero or more characters and shall not contain any null bytes. If the <b>-d</b> <i>delim</i> option is specified and <i>delim</i> is the null string, the standard input shall contain zero or more bytes (which need not form valid characters).</blockquote>

After page 3108 line 105167 section read (APPLICATION USAGE), add two new paragraphs:<blockquote>The <b>-d</b> <i>delim</i> option enables reading up to an arbitrary single-byte delimiter.
When <i>delim</i> is the null string, the delimiter is the null byte and this allows <i>read</i> to be used to process null-terminated lists of pathnames (as produced by the <i>find</i> <b>-print0</b> primary), with correct handling of pathnames that contain <newline> characters. Note that in order to specify the null string as the delimiter, <b>-d</b> and <i>delim</i> need to be specified as two separate arguments. Implementations differ in their handling of <backslash> for line continuation when <b>-d</b> <i>delim</i> is specified (and <i>delim</i> is not <newline>); some treat <backslash><i>delim</i> (or <backslash><NUL> if <i>delim</i> is the null string) as a line continuation, whereas others still treat <backslash><newline> as a line continuation. Consequently, portable applications need to specify <b>-r</b> whenever they specify <b>-d</b> <i>delim</i> (and <i>delim</i> is not <newline>). When the current locale is not the C or POSIX locale, pathnames can contain bytes that do not form part of a valid character, and therefore portable applications need to ensure that the current locale is the C or POSIX locale when using <i>read</i> with arbitrary pathnames as input. (If <i>IFS</i> is not set to the null string this applies even when using <b>-d</b> "", because the field splitting performed by <i>read</i> is a character-based operation.) 
When reading a pathname it is also inadvisable to use the contents of the first <i>var</i> operand, if non-empty, when the exit status of <i>read</i> is 1, as it is likely to be the result of the command used to generate the list of pathnames (for example, <i>find</i> with <b>-print</b> or <b>-print0</b>) being terminated after it has written a partial pathname, and consequently using it could result in the wrong pathname being processed.</blockquote>

On page 3108 line 105186 section read (RATIONALE), change:<blockquote>Although the standard input is required to be a text file, and therefore will always end with a <newline> (unless it is an empty file), the processing of continuation lines when the <b>-r</b> option is not used can result in the input not ending with a <newline>. This occurs if the last line of the input file ends with a <backslash> <newline>. It is for this reason that "if any" is used in "The terminating <newline> (if any) shall be removed from the input" in the description. It is not a relaxation of the requirement for standard input to be a text file.</blockquote>to:<blockquote>Earlier versions of this standard required the standard input to be a text file, and therefore the results were undefined if the input was not empty and end-of-file was detected before a <newline> character was encountered.
However, all of the most popular shell implementations have been found to have consistent behavior in this case, and so the behavior is now specified and the requirement for standard input to be a text file has been relaxed to allow non-empty input that does not end with a <newline>.</blockquote>

On page 3365 line 114578 section xargs (SYNOPSIS), change:<blockquote>[-E eofstr]</blockquote>to:<blockquote>[-E eofstr|-0]</blockquote>

On page 3365 line 114593 section xargs (DESCRIPTION), change:<blockquote>The application shall ensure that arguments in the standard input are separated by unquoted <blank> characters, unescaped <blank> characters, or <newline> characters. A string of zero or more non-double-quote ('"') characters and non-<newline> characters can be quoted by enclosing them in double-quotes. A string of zero or more non-<apostrophe> ('\'') characters and non-<newline> characters can be quoted by enclosing them in <apostrophe> characters. Any unquoted character can be escaped by preceding it with a <backslash>. The utility named by <i>utility</i> shall be executed one or more times until the end-of-file is reached or the logical end-of file string is found.
The results are unspecified if the utility named by <i>utility</i> attempts to read from its standard input.</blockquote>to:<blockquote>If the <b>-0</b> option is not specified, the application shall ensure that arguments in the standard input are separated by unquoted <blank> characters, unescaped <blank> characters, or <newline> characters, and quoting characters shall be interpreted as follows:
<ul>
<li>A string of zero or more non-double-quote ('"') non-<newline> characters can be quoted by enclosing them in double-quotes.</li>
<li>A string of zero or more non-<apostrophe> ('\'') non-<newline> characters can be quoted by enclosing them in <apostrophe> characters.</li>
<li>Any unquoted character can be escaped by preceding it with a <backslash>.</li>
</ul>
If the <b>-0</b> option is specified, the application shall ensure that arguments in the standard input are separated by null bytes. The utility named by <i>utility</i> shall be executed one or more times until the end-of-file is reached or the logical end-of file string is found. The results are unspecified if the utility named by <i>utility</i> attempts to read from its standard input.</blockquote>

On page 3365 line 114612 section xargs (OPTIONS -E), change:<blockquote>If <b>-E</b> is not specified</blockquote>to:<blockquote>If neither <b>-E</b> nor <b>-0</b> is specified</blockquote>

On page 3365 line 114617 section xargs (OPTIONS -I), change:<blockquote>Insert mode: <i>utility</i> is executed for each logical line from standard input. Arguments in the standard input shall be separated only by unescaped <newline> characters, not by <blank> characters. Any unquoted unescaped <blank> characters at the beginning of each line shall be ignored.</blockquote>to:<blockquote>Insert mode: invoke <i>utility</i> for each argument from standard input.
If <b>-0</b> is not specified, arguments in the standard input shall be separated only by unescaped <newline> characters, not by <blank> characters, and any unquoted unescaped <blank> characters at the beginning of each line shall be ignored.</blockquote>

On page 3366 line 114625 section xargs (OPTIONS -L), change:<blockquote>The <i>utility</i> shall be executed for each non-empty <i>number</i> lines of arguments from standard input. The last invocation of <i>utility</i> shall be with fewer lines of arguments if fewer than <i>number</i> remain. A line is considered to end with the first <newline> unless the last character of the line is an unescaped <blank>; a trailing unescaped <blank> signals continuation to the next non-empty line, inclusive.</blockquote>to:<blockquote>Invoke <i>utility</i> for each set of <i>number</i> arguments from standard input. The last invocation of <i>utility</i> shall be with fewer arguments if fewer than <i>number</i> remain. If the <b>-0</b> option is not specified, each line in the standard input shall be treated as containing one argument except that empty lines shall be ignored and a line ending with a trailing unescaped <blank> shall signal continuation to the next non-empty line, inclusive; such continuation shall result in removal of all trailing unescaped <blank> characters and all <newline> characters that immediately follow them from the argument.</blockquote>

On page 3366 line 114644 section xargs (OPTIONS -s), change:<blockquote>The total number of lines exceeds that specified by the <b>-L</b> option.</blockquote>to:<blockquote>The total number of arguments exceeds that specified by the <b>-L</b> option.</blockquote>

After page 3366 line 114655 section xargs (OPTIONS), add:<blockquote><b>-0</b><blockquote>Use a null byte as the input argument delimiter and do not treat any other input bytes as special.</blockquote>If the mutually exclusive <b>-0</b> and <b>-E</b> <i>eofstr</i> options are both specified, the behavior is
unspecified, except that if <i>eofstr</i> is the null string the behavior shall be the same as if <b>-0</b> was specified without <b>-E</b> <i>eofstr</i>.</blockquote>

On page 3367 line 114664 section xargs (STDIN), change:<blockquote>The standard input shall be a text file. The results are unspecified if an end-of-file condition is detected immediately following an escaped <newline>.</blockquote>to:<blockquote>If the <b>-0</b> option is not specified, the standard input shall be a text file and the results are unspecified if an end-of-file condition is detected immediately following an escaped <newline>. If the <b>-0</b> option is specified, the standard input need not be a text file, and <i>xargs</i> shall process the input as bytes, not characters.</blockquote>

On page 3368 line 114722 section xargs (APPLICATION USAGE), change:<blockquote>Note that since input is parsed as lines, ...</blockquote>to:<blockquote>Note that since (if <b>-0</b> is not specified) input is parsed as lines, ...</blockquote>

On page 3368 line 114726 section xargs (APPLICATION USAGE), change:<blockquote>This can be solved by ...</blockquote>to:<blockquote>This can be solved by using the <b>-print0</b> primary of <i>find</i> together with the <i>xargs</i> <b>-0</b> option, or by ...</blockquote>

Issue History
Date Modified    Username       Field                    Change
======================================================================
2010-04-29 19:23 dwheeler       New Issue
2010-04-29 19:23 dwheeler       Status                   New => Under Review
2010-04-29 19:23 dwheeler       Assigned To              => ajosey
2010-04-29 19:23 dwheeler       Name                     => David A. Wheeler
2010-04-29 19:23 dwheeler       Organization             => IDA
2010-04-29 19:23 dwheeler       Section                  => find
2010-04-29 19:23 dwheeler       Page Number              => 2740
2010-04-29 19:23 dwheeler       Line Number              => 89194
2011-07-06 23:42 Don Cragun     Relationship added       related to 0000244
2011-07-06 23:42 Don Cragun     Relationship added       related to 0000245
2011-07-06 23:54 Don Cragun     Note Added: 0000882
2011-11-16 18:22 dwheeler       Note Added: 0001020
2015-03-12 16:15 Don Cragun     Relationship added       has duplicate 0000903
2022-12-08 15:39 geoffclare     Note Added: 0006091
2022-12-08 15:40 geoffclare     Note Edited: 0006091
2022-12-08 16:21 stephane       Note Added: 0006092
2022-12-08 16:23 stephane       Note Edited: 0006092
2022-12-08 16:32 stephane       Note Added: 0006093
2022-12-08 17:02 stephane       Note Edited: 0006093
2022-12-09 10:22 geoffclare     Note Edited: 0006091
2022-12-09 10:30 geoffclare     Note Edited: 0006091
2022-12-09 10:44 geoffclare     Note Edited: 0006091
2022-12-09 10:50 geoffclare     Note Added: 0006094
2022-12-09 11:21 geoffclare     Note Edited: 0006091
2022-12-09 12:09 stephane       Note Added: 0006095
2023-01-09 16:13 Don Cragun     Relationship replaced    has duplicate 0000244
2023-01-09 16:17 Don Cragun     Relationship replaced    has duplicate 0000245
2023-01-09 16:20 geoffclare     Note Added: 0006100
======================================================================
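The find EXAMPLES wording in note 0006100 above gives several ways to keep a <newline> inside a pathname from being mistaken for a separator. The script below compares them against plain -print in a scratch directory. It is a hedged sketch, not part of the proposed wording, and it rests on these assumptions:

- find already supports -print0 and xargs already supports -0, as the GNU and BSD implementations do.
- `mktemp -d` is available. It is widely available but is not specified in Issue 7.
- The read -d "" loop runs under bash only when bash is installed.

The variable names are illustrative only. The LC_ALL=POSIX -prune variant is left out because its $'...' quoting is not accepted by every sh.

```shell
#!/bin/sh
# Hedged sketch, not part of the proposed wording: count the pathnames in a
# scratch directory using the approaches from the find EXAMPLES text.
# Assumes find supports -print0 and xargs supports -0 (GNU and BSD do) and
# that mktemp -d is available; the variable names are illustrative only.
set -eu

tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT

touch "$tmp/plain"
touch "$tmp/$(printf 'two\nlines')"   # one file whose name contains a <newline>
cd "$tmp"

# -print: one line per pathname, so the embedded <newline> adds a bogus entry.
lines=$(find . -type f -print | wc -l | tr -d ' ')

# Sentinel: ".//." only ever appears at the start of each pathname that
# find writes here, so counting lines that begin with it counts pathnames.
sentinels=$(find .//. -type f -print |
    awk '/^\.\/\/\./ { n++ } END { print n + 0 }')

# -print0 | xargs -0: arguments are split only at null bytes; the inline sh
# prints how many arguments it received (one invocation for two short names).
args=$(find . -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

# read -d "": the loop from the EXAMPLES text, run under bash only if present
# (bash, ksh93 and zsh already accept -d; some sh implementations do not).
# LC_ALL=C follows the read APPLICATION USAGE advice about arbitrary bytes.
reads=n/a
if command -v bash >/dev/null 2>&1; then
    reads=$(find . -type f -print0 | LC_ALL=C bash -c '
        n=0
        while IFS= read -rd "" file; do n=$((n + 1)); done
        echo "$n"')
fi

printf 'print=%s sentinel=%s print0=%s read=%s\n' \
    "$lines" "$sentinels" "$args" "$reads"
```

On a system with GNU findutils and bash, this should print `print=3 sentinel=2 print0=2 read=2`. Only the plain -print count is thrown off by the embedded <newline>. Counting arguments through `sh -c 'echo "$#"'` checks how xargs split its input without printing the names themselves, which would reintroduce the ambiguity.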