A NOTE has been added to this issue. 
====================================================================== 
https://www.austingroupbugs.net/view.php?id=1942 
====================================================================== 
Reported By:                dwheeler
Assigned To:                ajosey
====================================================================== 
Project:                    1003.1(2024)/Issue8
Issue ID:                   1942
Category:                   Shell and Utilities
Type:                       Enhancement Request
Severity:                   Objection
Priority:                   normal
Status:                     Under Review
Name:                       David A. Wheeler 
Organization:                
User Reference:             diff 
Section:                    diff 
Page Number:                1 
Line Number:                1 
Interp Status:              --- 
Final Accepted Text:         
====================================================================== 
Date Submitted:             2025-08-31 22:16 UTC
Last Modified:              2025-10-05 00:19 UTC
====================================================================== 
Summary:                    Add common options to diff
====================================================================== 

---------------------------------------------------------------------- 
 (0007283) dwheeler (reporter) - 2025-10-05 00:19
 https://www.austingroupbugs.net/view.php?id=1942#c7283 
---------------------------------------------------------------------- 
Here are the questions and my attempted answers, followed by a partially updated
proposal for diff. This is NOT a completed proposal; I just wanted to share my
progress so far. That said, any comments on this work in progress are
appreciated.

My baseline is the POSIX 2024 diff specification visible at:
https://pubs.opengroup.org/onlinepubs/9799919799/utilities/diff.html and the
proposals are here at https://www.austingroupbugs.net/view.php?id=1942

My proposal adds the following options to diff:
<pre>
-a (all files as text),
-B (blank lines ignored), 
-d (aggressive diffing),
-i (ignore case difference),
-I regexp (ignore these lines),
-N (missing is new),
-q (brief report if differ),
-s (report if same),
-T (tabs),
-w (whitespace),
-x pattern (exclude files),
-X file (exclude patternfile).
</pre>

I've learned that the -T option has some fiddly details that are not fully
addressed here (yet).

Regarding the questions raised:

1. If standard input is an input (e.g., "-"), what filename is reported? Is it
"(standard input)" everywhere, or something else? Ideally there would be a
standard answer, and if not, it should be clearly noted that it may vary by
implementation. I'll need to see what implementations do in this case.

The current spec already says for -c or -C, "The pathname written for standard
input is unspecified." It also already says that for -u or -U, "Each <file>
field shall be the pathname of the corresponding file being compared, or the
single character '-' if standard input is being compared." I don't intend to
change that.

So I think the key issue is the new -q and -s options for "-". I determined that
"-" appears to be the answer in implementations.

It's also important to determine if locale matters. I installed the French
locale (fr_FR.UTF-8) on my Debian system to further test the GNU and Busybox
versions, and I saw no differences.

Here is my justification.

I ran this to test -q on Linux (GNU), Busybox, and FreeBSD:

<pre>
seq 1 30 > seq1-30.txt
export LANG=fr_FR.UTF-8
export LC_ALL=fr_FR.UTF-8
seq 1 25 | diff -q - seq1-30.txt 2> /dev/null
</pre>

This produced exactly "Files - and seq1-30.txt differ"
on GNU diff, Busybox diff ("busybox diff..."), and FreeBSD diff -
even in the French locale.

To test -s, I ran:
<pre>
seq 1 30 | diff -s - seq1-30.txt 2> /dev/null
</pre>

This produced exactly "Files - and seq1-30.txt are identical"
on GNU diff, Busybox diff, and FreeBSD diff, even in the French locale.

I did find a Busybox bug. The command
"diff -s seq1-30.txt seq1-30.txt"
produced the expected results on GNU diff and FreeBSD, namely
"Files seq1-30.txt and seq1-30.txt are identical".
However, when the *same* filename is given to Busybox diff it produces NOTHING.
I have no objections to optimizing computations (like noticing
the same file was sent), but I think this is a clear bug in busybox.
"Tell me if the files are the same" should do exactly that.  Busybox has no
problems providing the expected output when given *different* files:

<pre>
$ cp seq1-30.txt seq1-30-dup.txt
$ busybox diff -s seq1-30.txt seq1-30-dup.txt
Files seq1-30.txt and seq1-30-dup.txt are identical
</pre>

I think we should standardize common and expected behavior, and report
a bug to Busybox. Normally you don't compare files to <i>exactly</i> themselves
anyway, so this is probably not a bug anyone has encountered in real life.
It would be <i>possible</i> to make this case implementation-defined,
but I think this is a bug and should be treated as such.
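The same-file case can be probed with a short script. This is only a sketch: the temporary directory and the `same_out` / `same_status` variable names are illustrative, and the printed verdict merely reports which of the two behaviors described above the local diff shows.

```shell
# Sketch: does "diff -s FILE FILE" report that the files are identical?
tmp=$(mktemp -d)
seq 1 30 > "$tmp/seq1-30.txt"

# Run in a subshell so the relative file name matches the examples above.
same_out=$(cd "$tmp" && LC_ALL=C diff -s seq1-30.txt seq1-30.txt)
same_status=$?

if [ -n "$same_out" ]; then
    echo "reported: $same_out"
else
    echo "no output for identical operands (the behavior seen with Busybox)"
fi
rm -rf "$tmp"
```

In either case the exit status should be 0, since the operands do not differ.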

2.  What's the impact of -q and -s? Do they send to standard output? How do they
interact? Are they locale-dependent?

Let me answer the question as I originally understood it, and then reply to the
later explanation, in the hopes that I fully answer the question.

Yes, their outputs are sent to standard out (I checked by redirecting stderr
from multiple implementations).
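One way to repeat that check is to capture each stream separately. This is a minimal sketch, assuming `seq` and `mktemp` are available and forcing `LC_ALL=C`; the variable names (`q_out`, `q_err`, `s_out`, `s_err`) are illustrative only.

```shell
# Sketch: confirm that the -q and -s messages go to stdout, not stderr.
tmp=$(mktemp -d)
seq 1 30 > "$tmp/seq1-30.txt"

# "2>/dev/null" keeps only stdout; "2>&1 >/dev/null" keeps only stderr.
q_out=$(cd "$tmp" && seq 1 25 | LC_ALL=C diff -q - seq1-30.txt 2>/dev/null)
q_err=$(cd "$tmp" && seq 1 25 | LC_ALL=C diff -q - seq1-30.txt 2>&1 >/dev/null)
s_out=$(cd "$tmp" && seq 1 30 | LC_ALL=C diff -s - seq1-30.txt 2>/dev/null)
s_err=$(cd "$tmp" && seq 1 30 | LC_ALL=C diff -s - seq1-30.txt 2>&1 >/dev/null)

printf 'stdout(-q): [%s]\nstderr(-q): [%s]\n' "$q_out" "$q_err"
printf 'stdout(-s): [%s]\nstderr(-s): [%s]\n' "$s_out" "$s_err"
rm -rf "$tmp"
```

If the messages are written to standard output, both stderr captures stay empty.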

No, they're not locale-dependent; the output is the same (presumably the
intention is to aid scripts). I tried this on GNU and FreeBSD with the French
locale fr_FR.UTF-8.

The -q and -s flags interact in the "obvious" way required by their definitions.
Basically, if you use both, you always have an output when comparing 2 files,
and it indicates if they differ or are the same. I don't think that needs
special documentation. I think the problem is that my earlier description wasn't
clear enough, so I rewrote it to (hopefully) be clearer. Here's what they do:

<pre>
$ diff -qs seq1-30.txt seq1-30-dup.txt
Files seq1-30.txt and seq1-30-dup.txt are identical
$ diff -qs seq1-30.txt seq1-25.txt
Files seq1-30.txt and seq1-25.txt differ
</pre>

The "-s" only does anything different when two compared files are considered the
same. The files may have <i>some</i> differences, e.g., per the -B flag, but
what matters is whether or not they're considered different by diff.

I'll clarify that in the options text.
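To illustrate "considered the same", here is a small sketch in which two files differ only by an empty line. The file names and the `plain` / `with_B` variables are made up for the example, and the expected behavior assumes an implementation that supports both -s and -B as described in this note.

```shell
# Sketch: -s reports "identical" once -B makes the only difference ignorable.
tmp=$(mktemp -d)
printf 'one\ntwo\n'   > "$tmp/a.txt"
printf 'one\n\ntwo\n' > "$tmp/b.txt"   # same content plus one empty line

# Without -B the empty line is a reportable difference.
plain=$(cd "$tmp" && LC_ALL=C diff -s a.txt b.txt)
plain_status=$?

# With -B the files have no reportable difference, so -s speaks up.
with_B=$(cd "$tmp" && LC_ALL=C diff -sB a.txt b.txt)
with_B_status=$?

printf 'without -B (status %s):\n%s\n' "$plain_status" "$plain"
printf 'with -B (status %s):\n%s\n' "$with_B_status" "$with_B"
rm -rf "$tmp"
```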

geoffclare later clarified: "Re 0001942:0007257 item 2, the desired action has
statements about what is written to standard output for -q and -s. The point I
made in the meeting was that those details should be in the STDOUT section, not
in the option descriptions. Also, specific English text should only be required
for the POSIX locale."

Sorry I misunderstood. Sure, I'll do that.

3. The "-i" (Ignore Case) should refer to XBD 4.2 - general concepts
(case-insensitive) (spelling?). Review the similar references to use the same
format.

Agreed. Done. I used grep as my template.

4.  Does "-w" ignore space and tab, or ignore whitespace? The initial draft was
inconsistent. There's other whitespace than space or tab. Is it locale-specific?

I tested on GNU, Busybox, and FreeBSD.

The "-w" consistently ignores *only* these whitespace characters:
U+9 (TAB), U+B (VT), U+C (FF), U+D (CR), U+20 (SPACE).
It does not ignore *any* of the other Unicode whitespace characters
(there are 25 in the current spec). I can't say I tried it on all locales,
but I tried it with French (fr_FR.UTF-8) and I
saw no evidence of locale dependence.
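A reduced version of that kind of test might look like the sketch below. It is not the script used for the results above: it runs under `LC_ALL=C` (an assumption made to keep the probe byte-oriented), and it uses NO-BREAK SPACE (U+00A0, UTF-8 bytes `\302\240`) as a single representative of the "other" whitespace characters. The `ignored` and `not_ignored` variables are illustrative.

```shell
# Sketch: which separators does "diff -w" treat as ignorable whitespace?
tmp=$(mktemp -d)
printf 'a b\n' > "$tmp/ref.txt"
ignored=""
not_ignored=""

# printf escapes: tab, vertical tab, form feed, carriage return,
# and NO-BREAK SPACE written as its two UTF-8 bytes in octal.
for esc in '\t' '\v' '\f' '\r' '\302\240'; do
    printf "a${esc}b\n" > "$tmp/probe.txt"
    if LC_ALL=C diff -w "$tmp/ref.txt" "$tmp/probe.txt" >/dev/null 2>&1; then
        ignored="${ignored:+$ignored }$esc"
    else
        not_ignored="${not_ignored:+$not_ignored }$esc"
    fi
done

echo "ignored by -w:     $ignored"
echo "not ignored by -w: $not_ignored"
rm -rf "$tmp"
```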

By itself, "-B" (portably) ignores only *fully* blank lines.

The *combination* of -Bw has an annoying implementation difference:

- GNU diff: Treats lines with only "-w" whitespace as equivalent
  to an empty line (and thus ignored)
- FreeBSD diff: Does NOT treat lines with only "-w" whitespace
  as equivalent to an empty line (and thus they are still compared)

This difference only seems to happen when the options are *combined*,
so I documented the combination as implementation-defined behavior with
the two options identified.

See this test script if you want to investigate this:
http://dwheeler.com/misc/diff-whitespace-test.sh
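For a quick local check that is much smaller than the linked script, the following sketch classifies the -Bw behavior from the exit status alone. The file contents and the `bw_behavior` variable are illustrative; the two labels correspond to the GNU and FreeBSD behaviors listed above.

```shell
# Sketch: does -Bw treat a whitespace-only line as a blank line?
tmp=$(mktemp -d)
printf 'x\ny\n'       > "$tmp/a.txt"
printf 'x\n \t\ny\n'  > "$tmp/b.txt"   # extra line holding only space + tab

LC_ALL=C diff -Bw "$tmp/a.txt" "$tmp/b.txt" >/dev/null 2>&1
case $? in
    0) bw_behavior="whitespace-only lines count as blank" ;;
    1) bw_behavior="whitespace-only lines are still compared" ;;
    *) bw_behavior="diff reported an error" ;;
esac

echo "-Bw: $bw_behavior"
rm -rf "$tmp"
```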

Here are a few additional notes.

The GNU documentation for -q (--brief) says,
"report only when files differ". That text is misleading,
because diff *normally* only reports when files differ,
yet the text implies otherwise.
The FreeBSD documentation is a little clearer: "Just print a line when the
files differ. Does not output a list of changes."

GNU and FreeBSD differ on what -T does when there's no change.
FreeBSD outputs space-tab (no change, then indent).
GNU outputs a tab (merging the 'no change' space and the indent).
To see this you need to use od -c or similar, since visually
they look the same.
I think that could be "implementation-defined" without serious issue.
It's a little messy, but it's also reality, and it really isn't hard to
handle programmatically once you know it can happen.
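The two prefixes can also be told apart without reading od output by eye. The sketch below assumes that "no change" refers to unchanged context lines in unified (-u) output; the file names and the `t_style` variable are illustrative.

```shell
# Sketch: classify what -T writes before an unchanged (context) line.
tmp=$(mktemp -d)
printf 'same\nold\n' > "$tmp/t1.txt"
printf 'same\nnew\n' > "$tmp/t2.txt"
tab=$(printf '\t')

# Pick the context line out of the unified diff; it ends in "same".
ctx=$(LC_ALL=C diff -uT "$tmp/t1.txt" "$tmp/t2.txt" | grep 'same$')

case $ctx in
    " ${tab}same") t_style="space-tab" ;;   # space, then the -T tab
    "${tab}same")  t_style="tab" ;;         # tab replaces the space
    *)             t_style="other" ;;
esac

echo "context-line prefix with -T: $t_style"
# For a byte-level view, pipe the same diff command through "od -c".
rm -rf "$tmp"
```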


* * *

<b>INCOMPLETE proposed changes to POSIX diff specification:</b>

Synopsis Section:

Change from:

<pre>diff [-c|-e|-f|-u|-C n|-U n] [-br] file1 file2</pre>

To:

<pre>
diff [-c|-e|-f|-u|-C n|-U n] [-aBbdiNqrsTw]
     [-I regexp] [-X file] [-x pattern] file1 file2
</pre>

Description Section:

Change "This list should be minimal." to
"This list should be reasonably minimal." because really, that's
all you can hope for.

Options Section:

Add the following options in alphabetical order in addition to
existing options:

<b>-a</b> Treat all files as text. Files that would
otherwise be identified as binary files shall be treated as text files.

<b>-B</b> Ignore lines that are blank. A blank line is a
line that is empty (contains no characters).

<b>-d</b> Use a more aggressive algorithm to minimize the
number of changes in the output. This may require significantly more
time and memory.

<b>-i</b> Compare lines in a case-insensitive manner (using LC_CTYPE);
see XBD 9.2 Regular Expression General Requirements.

<b>-I regexp</b> Ignore lines in both files that match the
Extended Regular Expression regexp. Multiple -I options may be
specified; lines matching any of the patterns shall be ignored.
Perform pattern matching in a case-insensitive manner; see XBD 9.2 Regular
Expression General Requirements.

<b>-N</b> If file1 or file2 is a directory and the other is
not, or if one file is missing during directory comparison, treat the
missing file as an empty file.

<b>-q</b> If files have a reportable difference, output only that they differ
instead of the details about their differences. By default all differences in
files are reported, but options can change this (see -i, -I, -B, and -w).

<b>-s</b> If files are considered the same (do not have a reportable
difference), report that they are the same instead of being silent.

<b>-T</b> Write a tab instead of a space before the line information
about differences (to make tab alignment consistent).

<b>-w</b> Ignore differences in sequences of equivalent whitespace
when comparing lines.
The following characters are treated as equivalent whitespace:
&lt;space&gt; (U+20), &lt;tab&gt; (U+9), vertical tab (U+B), form feed (U+C),
and carriage return (U+D). Any sequence of one or more of these
characters shall be considered equivalent to any other such sequence of
one or more such characters.
Other whitespace characters are not treated as equivalent.

<b>-x pattern</b> During recursive directory comparison,
exclude files and directories whose basename matches the shell pattern
specified by pattern. Multiple -x options may be specified. Pattern
matching follows the rules specified in XBD Pattern Matching Notation.

<b>-X file</b> During recursive directory comparison,
exclude files and directories whose basenames match any pattern in file.
Each line in file shall be treated as a shell pattern following the same
matching rules as -x.

Note: The interaction between -B and -w options when applied together
(-Bw) is implementation-defined. An implementation may or may not consider
lines containing only the whitespace characters of -w
as blank lines when both options are used together.

<b>In the later "STDOUT" section:</b>

BEFORE the subsection "Diff Default Output Format" add this text
and these two subsections:

By default "indent" is a space character; with -T it becomes a tab character.

<b>Diff brief considered different form</b> (added)

If the -q option is specified and the compared files are considered different
(have reportable differences), a diagnostic line is written to standard output
to note that there are differences instead of describing those differences.

In the POSIX locale, the following format is written in this case:

"Files %s and %s differ\n", <filename1>, <filename2>

<b>Diff considered same form</b> (added)

If the -s option is specified and the compared files are considered the same
(have no reportable differences), then instead of no output, a diagnostic line
is written to standard output to note that they are the same.

In the POSIX locale, the following format is written in this case:

"Files %s and %s are identical\n", <filename1>, <filename2>

<b>Diff Default Output Format</b>

Change:
"The default (without -e, -f, -c, -C, -u, or -U options) diff utility output
shall contain lines of these forms:"
to:
"The default (without -e, -f, -q, -c, -C, -u, or -U options) diff utility output
shall contain lines of these forms where there are reportable differences:"

...

TODO: the output section must be modified to handle -T. It shouldn't add much
length, but there will be many small changes.

Again, this is work in progress, not a complete proposal, but I wanted to share
what I've learned so far. 

Issue History 
Date Modified    Username       Field                    Change               
====================================================================== 
2025-08-31 22:16 dwheeler       New Issue                                    
2025-08-31 22:16 dwheeler       Status                   New => Under Review 
2025-08-31 22:16 dwheeler       Assigned To               => ajosey          
2025-09-01 15:08 dwheeler       Note Added: 0007248                          
2025-09-11 15:50 geoffclare     Project                  1003.1(2008)/Issue 7 =>
1003.1(2024)/Issue8
2025-09-12 14:43 dwheeler       Note Added: 0007257                          
2025-09-12 14:44 dwheeler       Note Edited: 0007257                         
2025-09-12 14:45 dwheeler       Note Edited: 0007257                         
2025-09-18 16:00 geoffclare     Note Added: 0007268                          
2025-09-18 16:01 geoffclare     Note Edited: 0007268                         
2025-10-05 00:19 dwheeler       Note Added: 0007283                          
======================================================================

