A NOTE has been added to this issue. 
====================================================================== 
https://austingroupbugs.net/view.php?id=243 
====================================================================== 
Reported By:                dwheeler
Assigned To:                ajosey
====================================================================== 
Project:                    1003.1(2008)/Issue 7
Issue ID:                   243
Category:                   Shell and Utilities
Type:                       Enhancement Request
Severity:                   Objection
Priority:                   normal
Status:                     Under Review
Name:                       David A. Wheeler 
Organization:               IDA 
User Reference:              
Section:                    find 
Page Number:                2740 
Line Number:                89194 
Interp Status:              --- 
Final Accepted Text:         
====================================================================== 
Date Submitted:             2010-04-29 19:23 UTC
Last Modified:              2022-12-08 15:39 UTC
====================================================================== 
Summary:                    Add -print0 to "find"
======================================================================
Relationships       ID      Summary
----------------------------------------------------------------------
related to          0000244 Add -0 to xargs
related to          0000245 Add -0 option to shell's "read"
has duplicate       0000903 Please, add find -print0, xargs -0, rea...
====================================================================== 

---------------------------------------------------------------------- 
 (0006091) geoffclare (manager) - 2022-12-08 15:39
 https://austingroupbugs.net/view.php?id=243#c6091 
---------------------------------------------------------------------- 
It is looking like the group might decide to add find -print0 and related
xargs and read features (for reasons I won't go into here).

To minimise the delay to draft 3 should this be decided, here are some
suggested wording changes.

Page and line numbers are for Issue 8 draft 2.1.

On page 2763 line 91806 section find (OPERANDS),
change:<blockquote><b>-print</b><blockquote>The primary shall always
evaluate as true; it shall cause the current pathname to be written to
standard output.</blockquote></blockquote>
to:<blockquote><b>-print</b><blockquote>The
primary shall always evaluate as true; it shall cause the current pathname
to be written to standard output, followed by a
<newline>.</blockquote><b>-print0</b><blockquote>The primary shall always
evaluate as true; it shall cause the current pathname to be written to
standard output, followed by a null byte.</blockquote></blockquote>
On page 2765 line 91869 section find (STDOUT), change:<blockquote>current
pathnames to be written</blockquote>to:<blockquote>current pathname to be
written</blockquote>
After page 2765 line 91871 section find (STDOUT), add:<blockquote>The
<b>-print0</b> primary shall cause the current pathname to be written to
standard output, followed by a null byte.</blockquote>
On page 2766 line 91911 section find (EXAMPLES), after:<blockquote>They
both write out the entire directory hierarchy from the current
directory.</blockquote>append:<blockquote>With this output format, if any
pathnames include <newline> characters, it is not possible to tell where
each pathname begins and ends. This problem can be avoided by omitting such
pathnames:<pre>find . ! -name \*$'\n'\* -print</pre>or by using a sentinel
in the pathname that <i>find</i> would never otherwise produce, such
as:<pre>find .//. -print</pre>or by using <b>-print0</b> instead of
<b>-print</b> and processing the output with a utility that can accept
null-terminated pathnames as input, such as <i>xargs</i> with the <b>-0</b>
option or <i>read</i> with <b>-d</b> "", for example:<pre>find . -print0 |
while LC_ALL=POSIX read -d "" -r file
do
    # process "$file"
done</pre></blockquote>
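
As a rough illustration of the problem this example text describes (not part of the proposed wording), the sketch below builds a scratch directory holding one file whose name contains a <newline>, then counts the pathnames seen by a line-based reader and by a null-delimited reader. It assumes a shell that already implements read -d "" (bash, for instance) and a find that already implements -print0. The directory, file, and variable names are invented for the example.

```shell
# Scratch locations; every name here is illustrative only.
tmpdir=$(mktemp -d)
list=$(mktemp)
nl='
'
touch "$tmpdir/plain" "$tmpdir/with${nl}newline"

# Line-based reading: the embedded <newline> splits one pathname in two.
find "$tmpdir" -type f -print > "$list"
line_count=0
while IFS= read -r file
do
    line_count=$((line_count + 1))
done < "$list"

# Null-delimited reading: each pathname arrives intact.
find "$tmpdir" -type f -print0 > "$list"
nul_count=0
while IFS= LC_ALL=POSIX read -d "" -r file
do
    nul_count=$((nul_count + 1))
done < "$list"

rm -rf "$tmpdir" "$list"
```

With two files present, the line-based loop should see three "pathnames" while the null-delimited loop sees two. The output goes through a temporary file rather than a pipeline only so that the counters remain visible in the calling shell.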
On page 2769 line 92033-92037 section find (RATIONALE),
delete:<blockquote>Other implementations [...] it would now be
reading.</blockquote>
On page 3106 line 105084 section read (SYNOPSIS),
change:<blockquote><pre>read [-r]
<i>var</i>...</pre></blockquote>to:<blockquote><pre>read [-r] [-d
<i>delim</i>] <i>var</i>...</pre></blockquote>
On page 3106 line 105088 section read (DESCRIPTION), change:<blockquote>By
default, unless the <b>-r</b> option is specified, <backslash> shall act as
an escape character. An unescaped <backslash> shall preserve the literal
value of the following character, with the exception of a <newline>. If a
<newline> follows the <backslash>, the <i>read</i> utility shall interpret
this as line continuation. The <backslash> and <newline> shall be removed
before splitting the input into fields.</blockquote>to:<blockquote>By
default, unless the <b>-r</b> option is specified, <backslash> shall act as
an escape character. An unescaped <backslash> shall preserve the literal
value of the following character, with the exception of either <newline> or
the logical line delimiter specified with the <b>-d</b> <i>delim</i> option
(if it is used and <i>delim</i> is not <newline>); it is unspecified which.
If this excepted character follows the <backslash>, the <i>read</i> utility
shall interpret this as line continuation. The <backslash> and the excepted
character shall be removed before splitting the input into
fields.</blockquote>
On page 3106 line 105097 section read (DESCRIPTION), change:<blockquote>The
terminating <newline> (if any) shall be removed from the
input</blockquote>to:<blockquote>The terminating logical line delimiter (if
any) shall be removed from the input</blockquote>
On page 3106 line 105118 section read (OPTIONS), change:<blockquote>The
following option is supported:</blockquote>to:<blockquote>The following
options shall be supported:

<b>-d</b> <i>delim</i><blockquote>If <i>delim</i> consists of one
single-byte character, that byte shall be used as the logical line
delimiter. If <i>delim</i> is the null string, the logical line delimiter
shall be the null byte. Otherwise, the behavior is
unspecified.</blockquote></blockquote>
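
For readers unfamiliar with the option, here is a minimal sketch of the single-byte delimiter case, assuming a shell that already supports read -d (such as bash). The input string and variable names are made up. The -r option is given because, under the proposed text, backslash handling together with -d is not fully specified.

```shell
# Input is a list of items, each terminated by ':' (the chosen delimiter).
input=$(mktemp)
printf 'alpha:beta gamma:delta:' > "$input"

items=""
count=0
# IFS= keeps leading and trailing blanks inside each item.
while IFS= read -r -d : item
do
    count=$((count + 1))
    items="$items[$item]"
done < "$input"

rm -f "$input"
```

Note that the input ends with the delimiter, matching the proposed STDIN requirement that non-empty input end with <i>delim</i>.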
On page 3107 line 105125 section read (STDIN), change:<blockquote>The
standard input shall be a text file.</blockquote>to:<blockquote>If the
<b>-d</b> <i>delim</i> option is not specified, or if it is specified and
<i>delim</i> is <newline>, the standard input shall be a text file, except
that it can contain lines longer than {LINE_MAX}.

If the <b>-d</b> <i>delim</i> option is specified and <i>delim</i> consists
of one single-byte character other than <newline>, the standard input shall
contain zero or more characters, shall not contain any null bytes, and (if
not empty) shall end with <i>delim</i>.

If the <b>-d</b> <i>delim</i> option is specified and <i>delim</i> is the
null string, the standard input shall contain zero or more characters and
(if not empty) shall end with a null byte.</blockquote>
After page 3108 line 105167 section read (APPLICATION USAGE), add two new
paragraphs:<blockquote>The <b>-d</b> <i>delim</i> option enables reading up
to an arbitrary single-byte delimiter. When <i>delim</i> is the null
string, the delimiter is the null byte and this allows <i>read</i> to be
used to process null-terminated lists of pathnames (as produced by the
<i>find</i> <b>-print0</b> primary), with correct handling of pathnames
that contain <newline> characters. Note that in order to specify the null
string as the delimiter, <b>-d</b> and <i>delim</i> need to be specified as
two separate arguments. Implementations differ in their handling of
<backslash> for line continuation when <b>-d</b> <i>delim</i> is specified
(and <i>delim</i> is not <newline>); some treat <backslash><i>delim</i> (or
<backslash><NUL> if <i>delim</i> is the null string) as a line
continuation, whereas others still treat <backslash><newline> as a line
continuation. Consequently, portable applications need to specify <b>-r</b>
whenever they specify <b>-d</b> <i>delim</i> (and <i>delim</i> is not
<newline>).

When the current locale is not the C or POSIX locale, pathnames can contain
bytes that do not form part of a valid character, and therefore portable
applications need to ensure that the current locale is the C or POSIX
locale when using <i>read</i> with arbitrary pathnames as input. (This
applies even when using <b>-d</b> "", because the field splitting performed
by <i>read</i> is a character-based operation.)</blockquote>
On page 3108 line 105186 section read (RATIONALE),
change:<blockquote>Although the standard input is required to be a text
file</blockquote>to:<blockquote>Although the standard input is required to
be a text file (without the {LINE_MAX} limit) when the logical line
delimiter is <newline></blockquote>
On page 3365 line 114578 section xargs (SYNOPSIS), change:<blockquote>[-E
eofstr]</blockquote>to:<blockquote>[-E eofstr|-0]</blockquote>
On page 3365 line 114593 section xargs (DESCRIPTION),
change:<blockquote>The application shall ensure that arguments in the
standard input are separated by unquoted <blank> characters, unescaped
<blank> characters, or <newline> characters. A string of zero or more
non-double-quote ('"') characters and non-<newline> characters can be
quoted by enclosing them in double-quotes. A string of zero or more
non-<apostrophe> ('\'') characters and non-<newline> characters can be
quoted by enclosing them in <apostrophe> characters. Any unquoted character
can be escaped by preceding it with a <backslash>. The utility named by
<i>utility</i> shall be executed one or more times until the end-of-file is
reached or the logical end-of-file string is found. The results are
unspecified if the utility named by <i>utility</i> attempts to read from
its standard input.</blockquote>to:<blockquote>If the <b>-0</b> option is
not specified, the application shall ensure that arguments in the standard
input are separated by unquoted <blank> characters, unescaped <blank>
characters, or <newline> characters, and quoting characters shall be
interpreted as follows: <ul>
<li>A string of zero or more non-double-quote ('"') non-<newline>
characters can be quoted by enclosing them in double-quotes.</li>
<li>A string of zero or more non-<apostrophe> ('\'') non-<newline>
characters can be quoted by enclosing them in <apostrophe>
characters.</li>
<li>Any unquoted character can be escaped by preceding it with a
<backslash>.</li> </ul>
If the <b>-0</b> option is specified, the application shall ensure that
arguments in the standard input are separated by null bytes.

The utility named by <i>utility</i> shall be executed one or more times
until the end-of-file is reached or the logical end-of-file string is
found. The results are unspecified if the utility named by <i>utility</i>
attempts to read from its standard input.</blockquote>
On page 3365 line 114612 section xargs (OPTIONS -E), change:<blockquote>If
<b>-E</b> is not specified</blockquote>to:<blockquote>If neither <b>-E</b>
nor <b>-0</b> is specified</blockquote>
On page 3365 line 114617 section xargs (OPTIONS -I),
change:<blockquote>Insert mode: <i>utility</i> is executed for each logical
line from standard input. Arguments in the standard input shall be
separated only by unescaped <newline> characters, not by <blank>
characters. Any unquoted unescaped <blank> characters at the beginning of
each line shall be ignored.</blockquote>to:<blockquote>Insert mode: invoke
<i>utility</i> for each argument from standard input. If <b>-0</b> is not
specified, arguments in the standard input shall be separated only by
unescaped <newline> characters, not by <blank> characters, and any unquoted
unescaped <blank> characters at the beginning of each line shall be
ignored.</blockquote>
On page 3366 line 114625 section xargs (OPTIONS -L), change:<blockquote>The
<i>utility</i> shall be executed for each non-empty <i>number</i> lines of
arguments from standard input. The last invocation of <i>utility</i> shall
be with fewer lines of arguments if fewer than <i>number</i> remain. A line
is considered to end with the first <newline> unless the last character of
the line is an unescaped <blank>; a trailing unescaped <blank> signals
continuation to the next non-empty line,
inclusive.</blockquote>to:<blockquote>Invoke <i>utility</i> for each set of
<i>number</i> arguments from standard input. The last invocation of
<i>utility</i> shall be with fewer arguments if fewer than <i>number</i>
remain. If the <b>-0</b> option is not specified, each line in the standard
input shall be treated as containing one argument except that empty lines
shall be ignored and a line ending with a trailing unescaped <blank> shall
signal continuation to the next non-empty line, inclusive; such
continuation shall result in removal of all trailing unescaped <blank>
characters and all <newline> characters that immediately follow them from
the argument.</blockquote>
On page 3366 line 114644 section xargs (OPTIONS -s), change:<blockquote>The
total number of lines exceeds that specified by the <b>-L</b>
option.</blockquote>to:<blockquote>The total number of arguments exceeds
that specified by the <b>-L</b> option.</blockquote>
After page 3366 line 114655 section xargs (OPTIONS),
add:<blockquote><b>-0</b><blockquote>Use a null byte as the input argument
delimiter and do not treat any other input bytes as special.</blockquote>If
the mutually exclusive <b>-0</b> and <b>-E</b> <i>eofstr</i> options are
both specified, the behavior is unspecified, except that if <i>eofstr</i>
is the null string the behavior shall be the same as if <b>-0</b> was
specified without <b>-E</b> <i>eofstr</i>.</blockquote>
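
To make "do not treat any other input bytes as special" concrete, the following sketch feeds xargs three null-terminated arguments containing an apostrophe, double-quotes, and consecutive blanks. It assumes an xargs that already supports -0 (GNU and BSD implementations do); the sample strings and the variable name are invented.

```shell
# Each argument is followed by a null byte; quotes and blanks are plain data.
out=$(printf '%s\0' "it's" 'say "hi"' 'two  words' |
    xargs -0 printf '<%s>\n')
```

Without -0, the same bytes would be subject to the quoting and blank-separation rules described for the default mode, so the arguments would not pass through unchanged.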
On page 3367 line 114664 section xargs (STDIN), change:<blockquote>The
standard input shall be a text file. The results are unspecified if an
end-of-file condition is detected immediately following an escaped
<newline>.</blockquote>to:<blockquote>If the <b>-0</b> option is not
specified, the standard input shall be a text file and the results are
unspecified if an end-of-file condition is detected immediately following
an escaped <newline>.

If the <b>-0</b> option is specified, the standard input need not be a text
file, and <i>xargs</i> shall process the input as bytes, not
characters.</blockquote>
On page 3368 line 114722 section xargs (APPLICATION USAGE),
change:<blockquote>Note that since input is parsed as lines,
...</blockquote>to:<blockquote>Note that since (if <b>-0</b> is not
specified) input is parsed as lines, ...</blockquote>
On page 3368 line 114726 section xargs (APPLICATION USAGE),
change:<blockquote>This can be solved by
...</blockquote>to:<blockquote>This can be solved by using the
<b>-print0</b> primary of <i>find</i> together with the <i>xargs</i>
<b>-0</b> option, or by ...</blockquote> 
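
A small sketch of the combination named in this application usage text, assuming find and xargs implementations that already provide -print0 and -0. The scratch directory and file names are hypothetical, and the inner sh -c command merely reports how many arguments it received.

```shell
# Scratch directory with a blank and a <newline> in its file names.
tmpdir=$(mktemp -d)
nl='
'
touch "$tmpdir/a b" "$tmpdir/c${nl}d"

# Line-based parsing: blanks and <newline> both split the pathnames.
line_argc=$(find "$tmpdir" -type f -print |
    xargs sh -c 'echo "$#"' sh)

# Null-delimited parsing: one argument per pathname.
nul_argc=$(find "$tmpdir" -type f -print0 |
    xargs -0 sh -c 'echo "$#"' sh)

rm -rf "$tmpdir"
```

With these two files, the line-based pipeline should hand the utility four arguments and the null-delimited pipeline two, provided the scratch directory path itself contains no blanks or quote characters.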

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2010-04-29 19:23 dwheeler       New Issue                                    
2010-04-29 19:23 dwheeler       Status                   New => Under Review 
2010-04-29 19:23 dwheeler       Assigned To               => ajosey          
2010-04-29 19:23 dwheeler       Name                      => David A. Wheeler
2010-04-29 19:23 dwheeler       Organization              => IDA             
2010-04-29 19:23 dwheeler       Section                   => find            
2010-04-29 19:23 dwheeler       Page Number               => 2740            
2010-04-29 19:23 dwheeler       Line Number               => 89194           
2011-07-06 23:42 Don Cragun     Relationship added       related to 0000244  
2011-07-06 23:42 Don Cragun     Relationship added       related to 0000245  
2011-07-06 23:54 Don Cragun     Note Added: 0000882                          
2011-11-16 18:22 dwheeler       Note Added: 0001020                          
2015-03-12 16:15 Don Cragun     Relationship added       has duplicate 0000903
2022-12-08 15:39 geoffclare     Note Added: 0006091                          
======================================================================

