I slept on this and here is what I think

*they should have observed a unix-on-windows user, even just for a few minutes*
to see that --text default is wrong wrong wrong
*they should have made binary the default and had '*' mark the text mode exception case*
and then the minimal fraction of unix users on windows that generate \r\n
*and* don't want to sum the \r will have to explicitly demand --text
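
to make the \r point concrete, here is a minimal sketch. it assumes a toy
byte-sum standing in for md5 and emulates a text-mode read by dropping the
\r of each \r\n pair. byte_sum and its text_mode parameter are hypothetical
names for illustration, not taken from ast or gnu.

```c
#include <stddef.h>

/*
 * Toy checksum: plain sum of bytes, standing in for md5 (illustration only).
 * With text_mode != 0 the \r of every \r\n pair is skipped, which emulates
 * what a text-mode read hands to the checksum code.
 */
unsigned long byte_sum(const unsigned char *buf, size_t len, int text_mode)
{
    unsigned long sum = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        if (text_mode && buf[i] == '\r' && i + 1 < len && buf[i + 1] == '\n')
            continue; /* the \r never reaches the checksum */
        sum += buf[i];
    }
    return sum;
}
```

for the four bytes a \r \n b the binary sum is 218 and the text sum is 205,
so any binary file that happens to contain a \r\n pair gets a different sum
in text mode.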

I slept on this and here is what I think should happen

(0) ast defaults to --binary for all methods and does O_TEXT only with
explicit --text

(1) anyone who has a file with ' ' or '*' as the first character
*and* calls ast md5sum will be sol

(2) petition gnu coreutils to accept
        
<checksum><one-ascii-space-char><name-not-starting-with-space-or-asterisk>
as being generated in --binary mode

(3) anyone who uses gnu md5sum to generate a checklist and uses something
other than ast or gnu md5sum --check will be sol

(4) change ast -t, --total => -T, --total and -T, --text => -t, --text
for gnu compatibility, and retain the ast --binary default *in all cases,
no isatty crap*

(5) change ast --header to include --text (but never --binary)

(6) change ast cksum --check to recognize either
        <checksum><space><name>
        <checksum><space><gnu-text-or-binary-indicator><name>
in _WINIX (uwin, cygwin) make the --text vs --binary distinction
based on <gnu-text-or-binary-indicator>, otherwise ignore
<gnu-text-or-binary-indicator>

if you notice, --method=md5 and --method=sha* are the only ones where ast 
prints *exactly*
        <checksum><space><name>
so it will be able to faithfully distinguish the ast vs gnu case for --check
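
a minimal sketch of how --check could tell the two line formats in (6) apart.
it assumes a hex checksum followed by exactly one space. the enum and
parse_sum_line are hypothetical names for illustration, not the ast code.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical classification of one checksum list line. */
enum sum_mode { SUM_UNMARKED, SUM_TEXT, SUM_BINARY, SUM_INVALID };

/*
 * Classify a line and point *name at the start of the file name.
 *   <checksum><space><name>          -> SUM_UNMARKED (old ast format)
 *   <checksum><space><space><name>   -> SUM_TEXT     (gnu text indicator)
 *   <checksum><space>*<name>         -> SUM_BINARY   (gnu binary indicator)
 * As item (1) says, an old-format name that itself starts with ' ' or '*'
 * cannot be told apart from the gnu format and is misread here.
 */
enum sum_mode parse_sum_line(const char *line, const char **name)
{
    size_t n = 0;

    while (isxdigit((unsigned char)line[n]))
        n++;
    if (n == 0 || line[n] != ' ')
        return SUM_INVALID;          /* no checksum or no separator */
    n++;                             /* skip the separator space */
    if (line[n] == '\0')
        return SUM_INVALID;          /* nothing after the separator */
    if (line[n] == ' ' || line[n] == '*') {
        if (line[n + 1] == '\0')
            return SUM_INVALID;      /* indicator but no name */
        *name = line + n + 1;
        return line[n] == '*' ? SUM_BINARY : SUM_TEXT;
    }
    *name = line + n;
    return SUM_UNMARKED;
}
```

on _WINIX the caller would open the file with O_TEXT only for SUM_TEXT.
everywhere else the result can be ignored and only *name is used.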

I will consider this concession:

(0)(1)(3)(4)(5)(6)

(7) ast methods that currently list
        <checksum><space><name>
will change to
        <checksum><space><gnu-text-or-binary-indicator><name>

this would result in the '*' almost always being printed
ast will then handle old-ast and new-ast (gnu) formats seamlessly

can the unix user who never touches dos handle seeing the '*' indicator in 
md5sum output?

there are 2 comments below

this is another example where patches don't just exist in a vacuum
the universe of unintended consequences has to be extended to include unix on 
dos and unix on ebcdic

On Wed, 25 Sep 2013 07:21:24 +0200 Roland Mainz wrote:
> On Wed, Sep 25, 2013 at 7:03 AM, Glenn Fowler <[email protected]> wrote:
> > On Wed, 25 Sep 2013 00:39:18 +0200 Roland Mainz wrote:
> >> Attached (as "astksh20130913_md5sum_compat1.diff.txt") is a patch
> >> which fixes an incompatibility between AST md5sum(1)&&co. and GNU
> >> coreutils md5sum(1)&&co.
> >
> >> There are three major differences which caused hiccups for 3rd-party 
> >> scripts:
> >> - GNU coreutils md5sum/sha1sum/sha224sum/sha256sum default to text mode
> >> - GNU coreutils use a " *" before the file name to indicate binary
> >> mode and "  " to indicate text mode... the AST hash utilities used
> >> only a single blank " " instead.
> >> - "-t" means "text mode" for GNU coreutils while AST used this for "total"
> >
> >> * Notes:
> >> - GNU and AST *sum(1) utilities now have identical output and seem to
> >> be 100% compatible with each other
> >> - On platforms which do not implement |O_BINARY| and |O_TEXT| the
> >> change only affects the separator ("  "/" *"(=new) vs. " "(=old)).
> >> Portable applications can use [[:space:]]+ in egrep(1) to make sure
> >> they can match the hashes against both the old and new versions of AST
> >> *sum(1)
> >> - The output *intentionally* changes only for utilities matching the
> >> shell pattern "*@(md5|sha@(1|224|256|384|512))sum". This is done to
> >> maintain compatibility for cksum(1) and sum(1)
> >> - AST does not have a sha224sum(1) utility (yet) ... need to talk to
> >>
> > I'm sorry but making --text the default on a windows system simply does
> > not make sense

> Well... blame Cygwin and "Windows Services For Unix" for that crazy
> idea. But I was looking at an older version of "md5sum" on Linux...
> but it turns out the situation is a bit more complex:
> -- snip --
> void
> usage (int status)
> {
>   if (status != EXIT_SUCCESS)
>     emit_try_help ();
>   else
>     {
>       printf (_("\
> Usage: %s [OPTION]... [FILE]...\n\
> Print or check %s (%d-bit) checksums.\n\
> With no FILE, or when FILE is -, read standard input.\n\
> \n\
> "),
>               program_name,
>               DIGEST_TYPE_STRING,
>               DIGEST_BITS);
>       if (O_BINARY)
>         fputs (_("\
>   -b, --binary         read in binary mode (default unless reading tty stdin)\n\
> "), stdout);
>       else
>         fputs (_("\
>   -b, --binary         read in binary mode\n\
> "), stdout);
>       printf (_("\
>   -c, --check          read %s sums from the FILEs and check them\n"),
>               DIGEST_TYPE_STRING);
>       fputs (_("\
>       --tag            create a BSD-style checksum\n\
> "), stdout);
>       if (O_BINARY)
>         fputs (_("\
>   -t, --text           read in text mode (default if reading tty stdin)\n\
> "), stdout);
>       else
>         fputs (_("\
>   -t, --text           read in text mode (default)\n\
> "), stdout);
>       fputs (_("\
> \n\
> The following three options are useful only when verifying checksums:\n\
>       --quiet          don't print OK for each successfully verified file\n\
>       --status         don't output anything, status code shows success\n\
>   -w, --warn           warn about improperly formatted checksum lines\n\
> \n\
> "), stdout);
>       fputs (_("\
>       --strict         with --check, exit non-zero for any invalid input\n\
> "), stdout);
>       fputs (HELP_OPTION_DESCRIPTION, stdout);
>       fputs (VERSION_OPTION_DESCRIPTION, stdout);
>       printf (_("\
> \n\
> The sums are computed as described in %s.  When checking, the input\n\
> should be a former output of this program.  The default mode is to print\n\
> a line with checksum, a character indicating input mode ('*' for binary,\n\
> space for text), and name for each FILE.\n"),
>               DIGEST_REFERENCE);
>       emit_ancillary_info ();
>     }
>
>   exit (status);
> }
> -- snip --
> So basically the per-platform defaults are governed via the
> availability of |O_BINARY| at build time and whether you're reading
> from a tty stdin.
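
the rule as it reads from that usage() text can be sketched like this. it is
a minimal sketch of the documented defaults only, not the coreutils
implementation. default_binary_mode and default_binary_mode_here are
hypothetical names, and isatty() from <unistd.h> is assumed to be available.

```c
#include <fcntl.h>
#include <unistd.h>

#ifndef O_BINARY
#define O_BINARY 0 /* platform makes no text/binary distinction */
#endif

/*
 * Returns 1 when the documented default is binary, 0 when it is text.
 *   no O_BINARY on the platform -> text is the default
 *   O_BINARY available          -> binary, unless stdin is a tty
 */
int default_binary_mode(int have_o_binary, int stdin_is_tty)
{
    if (!have_o_binary)
        return 0;
    return !stdin_is_tty;
}

/* Same rule, evaluated for the build platform and the current stdin. */
int default_binary_mode_here(void)
{
    return default_binary_mode(O_BINARY != 0, isatty(STDIN_FILENO) != 0);
}
```

keeping the decision in a function of two flags makes the four cases easy to
check without needing a windows build or a real tty.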

> > it renders tgz md5sum verification useless

> Yes and no. Yes, it's not a good idea... but what should we do for
> compatibility on Windows (Cygwin&&SFU) ? On Unix/Linux the
> --text/--binary options are no-ops but we need to be able to produce
> compatible output (e.g. the "  "/" *") and read it back (I forgot
> about that part in my patch).

> > where do you see anywhere "the md5sum --binary value for foo.tgz is 
> > hexhhexhexhex"
> >
> > my guess is that because of this weaseling
> >          Note: There is no difference between binary and text mode option 
> > on GNU system.
> > most gnu weaned users call md5sum with neither --text nor --binary
> >
> > and this note lies anyway -- it *does* make a difference ' ' is printed for 
> > text,
> > '*' is printed for binary
> >
> > and on cygwin guess what -- md5sum defaults to binary

> Erm... see |usage()| function above... are you sure this is correct ?

as opposed to _UWIN, there is little gnu code untouched by _CYGWIN
my guess is there are a few of those _CYGWIN conditionals in the code used to build on cygwin

> > if there's any change it will be for the md5sum-specific output to do the
> > ' ' vs '*' based on text vs binary so on all implementations '*' will be
> > printed by default

> AFAIK that's not necessary - see |usage()| above... there are limits
> to the insanity, governed by whether the platform has |O_BINARY| and
> whether the input is a tty or not.

> ... and please only change the output for utilities which match
> "*@(md5|sha@(1|224|256|384|512))sum" ... otherwise we end-up with a
> lot of trouble for scripts which depend on specific output for
> cksum(1) and sum(1) etc.

> > how many scripts will break with that default?

> A lot of scripts which do md5sum and sha256sum verification choke on
> the "  "/" *" vs. " " difference... we have had that issue at least since
> 2007 when someone from Sun reported the issue in the Sun bugster bug
> database that libcmd "md5sum" can't replace GNU coreutils
> "md5sum"&&co. until this issue has been fixed.

do those scripts use --check to verify the sum?

_______________________________________________
ast-developers mailing list
[email protected]
http://lists.research.att.com/mailman/listinfo/ast-developers
