A NOTE has been added to this issue.
======================================================================
https://austingroupbugs.net/view.php?id=243
======================================================================
Reported By:                dwheeler
Assigned To:                ajosey
======================================================================
Project:                    1003.1(2008)/Issue 7
Issue ID:                   243
Category:                   Shell and Utilities
Type:                       Enhancement Request
Severity:                   Objection
Priority:                   normal
Status:                     Under Review
Name:                       David A. Wheeler
Organization:               IDA
User Reference:
Section:                    find
Page Number:                2740
Line Number:                89194
Interp Status:              ---
Final Accepted Text:
======================================================================
Date Submitted:             2010-04-29 19:23 UTC
Last Modified:              2022-12-08 15:39 UTC
======================================================================
Summary:                    Add -print0 to "find"
======================================================================
Relationships       ID      Summary
----------------------------------------------------------------------
related to          0000244 Add -0 to xargs
related to          0000245 Add -0 option to shell's "read"
has duplicate       0000903 Please, add find -print0, xargs -0, rea...
======================================================================
----------------------------------------------------------------------
 (0006091) geoffclare (manager) - 2022-12-08 15:39
 https://austingroupbugs.net/view.php?id=243#c6091
----------------------------------------------------------------------
It is looking like the group might decide to add find -print0 and related xargs and read features (for reasons I won't go into here). To minimise the delay to draft 3 should this be decided, here are some suggested wording changes.

Page and line numbers are for Issue 8 draft 2.1.

On page 2763 line 91806 section find (OPERANDS), change:<blockquote><b>-print</b><blockquote>The primary shall always evaluate as true; it shall cause the current pathname to be written to standard output.</blockquote></blockquote>to:<blockquote><b>-print</b><blockquote>The primary shall always evaluate as true; it shall cause the current pathname to be written to standard output, followed by a <newline>.</blockquote><b>-print0</b><blockquote>The primary shall always evaluate as true; it shall cause the current pathname to be written to standard output, followed by a null byte.</blockquote></blockquote>

On page 2765 line 91869 section find (STDOUT), change:<blockquote>current pathnames to be written</blockquote>to:<blockquote>current pathname to be written</blockquote>

After page 2765 line 91871 section find (STDOUT), add:<blockquote>The <b>-print0</b> primary shall cause the current pathname to be written to standard output, followed by a null byte.</blockquote>

On page 2766 line 91911 section find (EXAMPLES), after:<blockquote>They both write out the entire directory hierarchy from the current directory.</blockquote>append:<blockquote>With this output format, if any pathnames include <newline> characters, it is not possible to tell where each pathname begins and ends. This problem can be avoided by omitting such pathnames:<pre>find . ! -name \*$'\n'\* -print</pre>or by using a sentinel in the pathname that <i>find</i> would never otherwise produce, such as:<pre>find .//. -print</pre>or by using <b>-print0</b> instead of <b>-print</b> and processing the output with a utility that can accept null-terminated pathnames as input, such as <i>xargs</i> with the <b>-0</b> option or <i>read</i> with <b>-d</b> "", for example:<pre>find . -print0 | while LC_ALL=POSIX read -d "" -r file
do
    # process "$file"
done</pre></blockquote>

On page 2769 lines 92033-92037 section find (RATIONALE), delete:<blockquote>Other implementations [...] it would now be reading.</blockquote>

On page 3106 line 105084 section read (SYNOPSIS), change:<blockquote><pre>read [-r] <i>var</i>...</pre></blockquote>to:<blockquote><pre>read [-r] [-d <i>delim</i>] <i>var</i>...</pre></blockquote>

On page 3106 line 105088 section read (DESCRIPTION), change:<blockquote>By default, unless the <b>-r</b> option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of a <newline>. If a <newline> follows the <backslash>, the <i>read</i> utility shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into fields.</blockquote>to:<blockquote>By default, unless the <b>-r</b> option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of either <newline> or the logical line delimiter specified with the <b>-d</b> <i>delim</i> option (if it is used and <i>delim</i> is not <newline>); it is unspecified which. If this excepted character follows the <backslash>, the <i>read</i> utility shall interpret this as line continuation.
The <backslash> and the excepted character shall be removed before splitting the input into fields.</blockquote>

On page 3106 line 105097 section read (DESCRIPTION), change:<blockquote>The terminating <newline> (if any) shall be removed from the input</blockquote>to:<blockquote>The terminating logical line delimiter (if any) shall be removed from the input</blockquote>

On page 3106 line 105118 section read (OPTIONS), change:<blockquote>The following option is supported:</blockquote>to:<blockquote>The following options shall be supported: <b>-d</b> <i>delim</i><blockquote>If <i>delim</i> consists of one single-byte character, that byte shall be used as the logical line delimiter. If <i>delim</i> is the null string, the logical line delimiter shall be the null byte. Otherwise, the behavior is unspecified.</blockquote></blockquote>

On page 3107 line 105125 section read (STDIN), change:<blockquote>The standard input shall be a text file.</blockquote>to:<blockquote>If the <b>-d</b> <i>delim</i> option is not specified, or if it is specified and <i>delim</i> is <newline>, the standard input shall be a text file, except that it can contain lines longer than {LINE_MAX}. If the <b>-d</b> <i>delim</i> option is specified and <i>delim</i> consists of one single-byte character other than <newline>, the standard input shall contain zero or more characters, shall not contain any null bytes, and (if not empty) shall end with <i>delim</i>. If the <b>-d</b> <i>delim</i> option is specified and <i>delim</i> is the null string, the standard input shall contain zero or more characters and (if not empty) shall end with a null byte.</blockquote>

After page 3108 line 105167 section read (APPLICATION USAGE), add two new paragraphs:<blockquote>The <b>-d</b> <i>delim</i> option enables reading up to an arbitrary single-byte delimiter.
When <i>delim</i> is the null string, the delimiter is the null byte and this allows <i>read</i> to be used to process null-terminated lists of pathnames (as produced by the <i>find</i> <b>-print0</b> primary), with correct handling of pathnames that contain <newline> characters. Note that in order to specify the null string as the delimiter, <b>-d</b> and <i>delim</i> need to be specified as two separate arguments. Implementations differ in their handling of <backslash> for line continuation when <b>-d</b> <i>delim</i> is specified (and <i>delim</i> is not <newline>); some treat <backslash><i>delim</i> (or <backslash><NUL> if <i>delim</i> is the null string) as a line continuation, whereas others still treat <backslash><newline> as a line continuation. Consequently, portable applications need to specify <b>-r</b> whenever they specify <b>-d</b> <i>delim</i> (and <i>delim</i> is not <newline>). When the current locale is not the C or POSIX locale, pathnames can contain bytes that do not form part of a valid character, and therefore portable applications need to ensure that the current locale is the C or POSIX locale when using <i>read</i> with arbitrary pathnames as input. 
(This applies even when using <b>-d</b> "", because the field splitting performed by <i>read</i> is a character-based operation.)</blockquote>

On page 3108 line 105186 section read (RATIONALE), change:<blockquote>Although the standard input is required to be a text file</blockquote>to:<blockquote>Although the standard input is required to be a text file (without the {LINE_MAX} limit) when the logical line delimiter is <newline></blockquote>

On page 3365 line 114578 section xargs (SYNOPSIS), change:<blockquote>[-E eofstr]</blockquote>to:<blockquote>[-E eofstr|-0]</blockquote>

On page 3365 line 114593 section xargs (DESCRIPTION), change:<blockquote>The application shall ensure that arguments in the standard input are separated by unquoted <blank> characters, unescaped <blank> characters, or <newline> characters. A string of zero or more non-double-quote ('"') characters and non-<newline> characters can be quoted by enclosing them in double-quotes. A string of zero or more non-<apostrophe> ('\'') characters and non-<newline> characters can be quoted by enclosing them in <apostrophe> characters. Any unquoted character can be escaped by preceding it with a <backslash>. The utility named by <i>utility</i> shall be executed one or more times until the end-of-file is reached or the logical end-of-file string is found.
The results are unspecified if the utility named by <i>utility</i> attempts to read from its standard input.</blockquote>to:<blockquote>If the <b>-0</b> option is not specified, the application shall ensure that arguments in the standard input are separated by unquoted <blank> characters, unescaped <blank> characters, or <newline> characters, and quoting characters shall be interpreted as follows:
<ul>
<li>A string of zero or more non-double-quote ('"') non-<newline> characters can be quoted by enclosing them in double-quotes.</li>
<li>A string of zero or more non-<apostrophe> ('\'') non-<newline> characters can be quoted by enclosing them in <apostrophe> characters.</li>
<li>Any unquoted character can be escaped by preceding it with a <backslash>.</li>
</ul>
If the <b>-0</b> option is specified, the application shall ensure that arguments in the standard input are separated by null bytes. The utility named by <i>utility</i> shall be executed one or more times until the end-of-file is reached or the logical end-of-file string is found. The results are unspecified if the utility named by <i>utility</i> attempts to read from its standard input.</blockquote>

On page 3365 line 114612 section xargs (OPTIONS -E), change:<blockquote>If <b>-E</b> is not specified</blockquote>to:<blockquote>If neither <b>-E</b> nor <b>-0</b> is specified</blockquote>

On page 3365 line 114617 section xargs (OPTIONS -I), change:<blockquote>Insert mode: <i>utility</i> is executed for each logical line from standard input. Arguments in the standard input shall be separated only by unescaped <newline> characters, not by <blank> characters. Any unquoted unescaped <blank> characters at the beginning of each line shall be ignored.</blockquote>to:<blockquote>Insert mode: invoke <i>utility</i> for each argument from standard input.
If <b>-0</b> is not specified, arguments in the standard input shall be separated only by unescaped <newline> characters, not by <blank> characters, and any unquoted unescaped <blank> characters at the beginning of each line shall be ignored.</blockquote>

On page 3366 line 114625 section xargs (OPTIONS -L), change:<blockquote>The <i>utility</i> shall be executed for each non-empty <i>number</i> lines of arguments from standard input. The last invocation of <i>utility</i> shall be with fewer lines of arguments if fewer than <i>number</i> remain. A line is considered to end with the first <newline> unless the last character of the line is an unescaped <blank>; a trailing unescaped <blank> signals continuation to the next non-empty line, inclusive.</blockquote>to:<blockquote>Invoke <i>utility</i> for each set of <i>number</i> arguments from standard input. The last invocation of <i>utility</i> shall be with fewer arguments if fewer than <i>number</i> remain. If the <b>-0</b> option is not specified, each line in the standard input shall be treated as containing one argument except that empty lines shall be ignored and a line ending with a trailing unescaped <blank> shall signal continuation to the next non-empty line, inclusive; such continuation shall result in removal of all trailing unescaped <blank> characters and all <newline> characters that immediately follow them from the argument.</blockquote>

On page 3366 line 114644 section xargs (OPTIONS -s), change:<blockquote>The total number of lines exceeds that specified by the <b>-L</b> option.</blockquote>to:<blockquote>The total number of arguments exceeds that specified by the <b>-L</b> option.</blockquote>

After page 3366 line 114655 section xargs (OPTIONS), add:<blockquote>-0<blockquote>Use a null byte as the input argument delimiter and do not treat any other input bytes as special.</blockquote>If the mutually exclusive <b>-0</b> and <b>-E</b> <i>eofstr</i> options are both specified, the behavior is
unspecified, except that if <i>eofstr</i> is the null string the behavior shall be the same as if <b>-0</b> was specified without <b>-E</b> <i>eofstr</i>.</blockquote>

On page 3367 line 114664 section xargs (STDIN), change:<blockquote>The standard input shall be a text file. The results are unspecified if an end-of-file condition is detected immediately following an escaped <newline>.</blockquote>to:<blockquote>If the <b>-0</b> option is not specified, the standard input shall be a text file and the results are unspecified if an end-of-file condition is detected immediately following an escaped <newline>. If the <b>-0</b> option is specified, the standard input need not be a text file, and <i>xargs</i> shall process the input as bytes, not characters.</blockquote>

On page 3368 line 114722 section xargs (APPLICATION USAGE), change:<blockquote>Note that since input is parsed as lines, ...</blockquote>to:<blockquote>Note that since (if <b>-0</b> is not specified) input is parsed as lines, ...</blockquote>

On page 3368 line 114726 section xargs (APPLICATION USAGE), change:<blockquote>This can be solved by ...</blockquote>to:<blockquote>This can be solved by using the <b>-print0</b> primary of <i>find</i> together with the <i>xargs</i> <b>-0</b> option, or by ...</blockquote>

Issue History
Date Modified    Username       Field                    Change
======================================================================
2010-04-29 19:23 dwheeler       New Issue
2010-04-29 19:23 dwheeler       Status                   New => Under Review
2010-04-29 19:23 dwheeler       Assigned To               => ajosey
2010-04-29 19:23 dwheeler       Name                      => David A. Wheeler
2010-04-29 19:23 dwheeler       Organization              => IDA
2010-04-29 19:23 dwheeler       Section                   => find
2010-04-29 19:23 dwheeler       Page Number               => 2740
2010-04-29 19:23 dwheeler       Line Number               => 89194
2011-07-06 23:42 Don Cragun     Relationship added       related to 0000244
2011-07-06 23:42 Don Cragun     Relationship added       related to 0000245
2011-07-06 23:54 Don Cragun     Note Added: 0000882
2011-11-16 18:22 dwheeler       Note Added: 0001020
2015-03-12 16:15 Don Cragun     Relationship added       has duplicate 0000903
2022-12-08 15:39 geoffclare     Note Added: 0006091
======================================================================