bug in sha1sum

2008-05-12 Thread Dave Hines
I have just been looking at the man page for sha1sum, and saw the options:

   -b, --binary
          read in binary mode

   -t, --text
          read in text mode (default)

There is no further explanation of what these options mean. I assume
that binary mode means to read the file as it is, and report the
checksum, and that reading in text mode will perform some unspecified
transformation of the file before computing the sum.

This would seem to be a bug. If I type "sha1sum filename" I want
the checksum of the named file, not the checksum of some unspecified
transformation of the file.

To add this as an option may be acceptable, if the transformation is
specified, common and useful, but under no circumstances should giving
a checksum of something other than the file be the default action.

Regards -- Dave Hines.


___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils


Re: bug in sha1sum

2008-05-12 Thread Philip Rowlands

On Mon, 12 May 2008, Dave Hines wrote:


> I have just been looking at the man page for sha1sum, and saw the options:
>
>    -b, --binary
>           read in binary mode
>
>    -t, --text
>           read in text mode (default)
>
> There is no further explanation of what these options mean.


Coreutils manpages tend to be short reference sheets listing the
available options. Further documentation is provided in the "info"
command, as should be mentioned at the end of each manpage.


From the docs:
`-b'
`--binary'
 Treat each input file as binary, by reading it in binary mode and
 outputting a `*' flag.  This is the inverse of `--text'.  On
 systems like GNU that do not distinguish between binary and text
 files, this option merely flags each input file as binary: the MD5
 checksum is unaffected.  This option is the default on systems
 like MS-DOS that distinguish between binary and text files, except
 for reading standard input when standard input is a terminal.

`-t'
`--text'
 Treat each input file as text, by reading it in text mode and
 outputting a ` ' flag.  This is the inverse of `--binary'.  This
 option is the default on systems like GNU that do not distinguish
 between binary and text files.  On other systems, it is the
 default for reading standard input when standard input is a
 terminal.
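
A minimal sketch of what the quoted text implies on a GNU system: both modes should give the same digest, and only the flag character in front of the file name should differ. GNU sha1sum is assumed, and the scratch file from mktemp and the variable names are illustrative only.

```shell
# Scratch file; the path comes from mktemp and is purely illustrative.
f=$(mktemp)
printf 'test\n' > "$f"

# Checksum the same file once in binary mode and once in text mode.
bin_line=$(sha1sum -b "$f")
txt_line=$(sha1sum -t "$f")
printf '%s\n%s\n' "$bin_line" "$txt_line"

# The digest is everything before the first space on the line.
bin_digest=${bin_line%% *}
txt_digest=${txt_line%% *}

if [ "$bin_digest" = "$txt_digest" ]; then
    echo "same digest in both modes"
fi

rm -f "$f"
```

The two printed lines should differ only in the character between the digest and the file name (`*` for binary, a space for text).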


Cheers,
Phil




Re: bug in sha1sum

2008-05-13 Thread Phillip Susi

Philip Rowlands wrote:
> Coreutils manpages tend to be short reference sheets listing the
> available options. Further documentation is provided in the "info"
> command, as should be mentioned at the end of each manpage.
>
>
> From the docs:
> `-b'
> `--binary'
>  Treat each input file as binary, by reading it in binary mode and
>  outputting a `*' flag.  This is the inverse of `--text'.  On
>  systems like GNU that do not distinguish between binary and text
>  files, this option merely flags each input file as binary: the MD5
>  checksum is unaffected.  This option is the default on systems
>  like MS-DOS that distinguish between binary and text files, except
>  for reading standard input when standard input is a terminal.
>
> `-t'
> `--text'
>  Treat each input file as text, by reading it in text mode and
>  outputting a ` ' flag.  This is the inverse of `--binary'.  This
>  option is the default on systems like GNU that do not distinguish
>  between binary and text files.  On other systems, it is the
>  default for reading standard input when standard input is a
>  terminal.


I have to agree with Dave on this then.  It is a severe bug that text 
mode is the default since this means that you will get different results 
for the checksum on MS-DOS/Windows than on a GNU/Linux system.







Re: bug in sha1sum

2008-05-13 Thread Philip Rowlands

On Tue, 13 May 2008, Phillip Susi wrote:


> Philip Rowlands wrote:
>
>> Coreutils manpages tend to be short reference sheets listing the available
>> options. Further documentation is provided in the "info" command, as should
>> be mentioned at the end of each manpage.
>>
>> From the docs:
>> `-b'
>> `--binary'
>>  Treat each input file as binary, by reading it in binary mode and
>>  outputting a `*' flag.  This is the inverse of `--text'.  On
>>  systems like GNU that do not distinguish between binary and text
>>  files, this option merely flags each input file as binary: the MD5
>>  checksum is unaffected.  This option is the default on systems
>>  like MS-DOS that distinguish between binary and text files, except
>>  for reading standard input when standard input is a terminal.
>>
>> `-t'
>> `--text'
>>  Treat each input file as text, by reading it in text mode and
>>  outputting a ` ' flag.  This is the inverse of `--binary'.  This
>>  option is the default on systems like GNU that do not distinguish
>>  between binary and text files.  On other systems, it is the
>>  default for reading standard input when standard input is a
>>  terminal.
>
> I have to agree with Dave on this then.  It is a severe bug that text mode is
> the default since this means that you will get different results for the
> checksum on MS-DOS/Windows than on a GNU/Linux system.


Please re-read the option descriptions. On MS-DOS, the default is 
--binary unless reading from a terminal. You'd practically have to be 
typing text directly into sha1sum to provoke this behaviour; pipes and 
file redirection wouldn't do it. (This does make me wonder why the 
behaviour was provided in the first place, as typing into checksumming 
utilities seems unusual and error-prone.)
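
A rough sketch of the input paths mentioned above, run on a GNU system where text and binary mode do not differ anyway: a file argument, a redirection and a pipe should all yield the same digest. The `[ -t 0 ]` test is the usual shell check for "standard input is a terminal". The scratch file and variable names are assumptions for illustration.

```shell
# Scratch file; the path comes from mktemp and is purely illustrative.
f=$(mktemp)
printf 'test\n' > "$f"

# Same content through three input paths; keep only the digest field.
from_file=$(sha1sum "$f" | cut -d' ' -f1)
from_redirect=$(sha1sum < "$f" | cut -d' ' -f1)
from_pipe=$(printf 'test\n' | sha1sum | cut -d' ' -f1)

printf 'file:     %s\nredirect: %s\npipe:     %s\n' \
    "$from_file" "$from_redirect" "$from_pipe"

# Only an interactive terminal on stdin is the special case described above.
if [ -t 0 ]; then
    echo "stdin is a terminal"
else
    echo "stdin is not a terminal"
fi

rm -f "$f"
```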



Cheers,
Phil




bug#8766: Bug in sha1sum?

2011-05-30 Thread Theo Band
Hi

I'm not sure, but I think I found a bug in sha1sum. It's easy to
reproduce with any file that contains a backslash (\) in the name:
$ echo test > test
$ sha1sum test
4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test
$ mv test 'test\test'
$ sha1sum 'test\test'
\4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test\\test

I expect the file sha1sum to be the same after renaming the file (a
backslash is prepended to the otherwise correct result).

sha1sum --version
sha1sum (GNU coreutils) 5.97
coreutils-5.97-23.el5_6.4

Kind regards,
Theo Band








bug#8766: Bug in sha1sum?

2011-05-30 Thread Alan Curry
Theo Band writes:
> 
> Hi
> 
> I'm not sure, but I think I found a bug in sha1sum. It's easy to
> reproduce with any file that contains a backslash (\) in the name:
> echo test > test
> $ sha1sum test
> 4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test
> $ mv test 'test\test'
> $ sha1sum 'test\test'
> \4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test\\test
> 
> I expect the file sha1sum to be the same after renaming the file (a
> backslash is prepended to the otherwise correct result).

This result violated my expectations too, but it turns out to be a documented
feature:

 For each FILE, `md5sum' outputs the MD5 checksum, a flag indicating
  a binary or text input file, and the file name.  If FILE contains a
  backslash or newline, the line is started with a backslash, and each
  problematic character in the file name is escaped with a backslash,
  making the output unambiguous even in the presence of arbitrary file
  names.  If FILE is omitted or specified as `-', standard input is read.

(the sha*sum utilities all refer back to md5sum's description)

I better go fix all my scripts that rely on /^[0-9a-f]{32} /
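
For scripts in that position, one possible approach is to accept an optional leading backslash when extracting the digest. This is only a sketch: the helper name digest_of, the scratch directory and the file names are hypothetical, and `sed -E` (as supported by GNU sed) is assumed.

```shell
# Scratch directory with one ordinary name and one containing a backslash.
d=$(mktemp -d)
printf 'test\n' > "$d/plain"
printf 'test\n' > "$d"/'back\slash'

# Hypothetical helper: print only the hex digest for one file,
# tolerating the backslash that marks an escaped file name.
digest_of() {
    sha1sum "$1" | sed -E 's/^\\?([0-9a-f]+) .*/\1/'
}

plain_digest=$(digest_of "$d/plain")
escaped_digest=$(digest_of "$d"/'back\slash')

printf '%s\n%s\n' "$plain_digest" "$escaped_digest"

rm -rf "$d"
```

Both lines should show the same digest, since the two files have identical content and only the names differ.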

-- 
Alan Curry





bug#8766: Bug in sha1sum?

2011-05-31 Thread Theo Band
On 05/31/2011 01:03 AM, Alan Curry wrote:
> Theo Band writes:
>> Hi
>>
>> I'm not sure, but I think I found a bug in sha1sum. It's easy to
>> reproduce with any file that contains a backslash (\) in the name:
>> echo test > test
>> $ sha1sum test
>> 4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test
>> $ mv test 'test\test'
>> $ sha1sum 'test\test'
>> \4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test\\test
>>
>> I expect the file sha1sum to be the same after renaming the file (a
>> backslash is prepended to the otherwise correct result).
> This result violated my expectations too, but it turns out to be a documented
> feature:
>
>  For each FILE, `md5sum' outputs the MD5 checksum, a flag indicating
>   a binary or text input file, and the file name.  If FILE contains a
>   backslash or newline, the line is started with a backslash, and each
>   problematic character in the file name is escaped with a backslash,
>   making the output unambiguous even in the presence of arbitrary file
>   names.  If FILE is omitted or specified as `-', standard input is read.
>
> (the sha*sum utilities all refer back to md5sum's description)
>
> I better go fix all my scripts that rely on /^[0-9a-f]{32} /
>
man sha1sum, info sha1sum and sha1sum --help don't show me this info.
Instead I read this:

> The default mode is to print a line with checksum, a character
> indicating type (`*' for binary, ` ' for text), and name for each FILE.

Would that mean the documentation in the coreutils-5.97-23.el5_6.4 is
outdated? If so, is there perhaps an undocumented option that does not
output this backslash?
I make an index of all my files to find duplicates. The backslash
doesn't help.

Theo






bug#8766: Bug in sha1sum?

2011-05-31 Thread Jim Meyering
Theo Band wrote:
> On 05/31/2011 01:03 AM, Alan Curry wrote:
>> Theo Band writes:
>>> Hi
>>>
>>> I'm not sure, but I think I found a bug in sha1sum. It's easy to
>>> reproduce with any file that contains a backslash (\) in the name:
>>> echo test > test
>>> $ sha1sum test
>>> 4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test
>>> $ mv test 'test\test'
>>> $ sha1sum 'test\test'
>>> \4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test\\test
>>>
>>> I expect the file sha1sum to be the same after renaming the file (a
>>> backslash is prepended to the otherwise correct result).
>> This result violated my expectations too, but it turns out to be a documented
>> feature:
>>
>>  For each FILE, `md5sum' outputs the MD5 checksum, a flag indicating
>>   a binary or text input file, and the file name.  If FILE contains a
>>   backslash or newline, the line is started with a backslash, and each
>>   problematic character in the file name is escaped with a backslash,
>>   making the output unambiguous even in the presence of arbitrary file
>>   names.  If FILE is omitted or specified as `-', standard input is read.
>>
>> (the sha*sum utilities all refer back to md5sum's description)
>>
>> I better go fix all my scripts that rely on /^[0-9a-f]{32} /
>>
> man sha1sum, info sha1sum and sha1sum --help don't show me this info.
> Instead I read this:
>
>> The default mode is to print a line with checksum, a character
>> indicating type (`*' for binary, ` ' for text), and name for each FILE.
>
> Would that mean the documentation in the coreutils-5.97-23.el5_6.4 is
> outdated? If so, is there perhaps an undocumented option that does not
> output this backslash?
> I make an index of all my files to find duplicates. The backslash
> doesn't help.

That feature is required to allow checking the hash of any file name
that contains newlines.  There is no option to disable it.
That omission in the documentation was corrected by COREUTILS-6_8-69-g826ff08.
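
To illustrate why the escaping matters, here is a small sketch with a file name containing a newline. Because the name is escaped, the checksum list stays one line per file and can still be verified with -c. GNU sha1sum is assumed, and the scratch directory, file name and variable names are made up for the example.

```shell
# Scratch directory and a file whose name contains a newline.
d=$(mktemp -d)
name=$(printf 'a\nb')
printf 'test\n' > "$d/$name"

# Write the checksum list; the newline in the name is escaped.
sha1sum "$d/$name" > "$d/sums"
cat "$d/sums"

# One line per file, and that line starts with the backslash marker.
line_count=$(wc -l < "$d/sums")
first_char=$(cut -c1 "$d/sums")

# Verification unescapes the name and finds the file again.
sha1sum -c "$d/sums"
check_status=$?

rm -rf "$d"
```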

If you're sure you have no newline-afflicted file name,
you can safely filter out the backslashes with this:

sed 's/^\\//;s/\\\\/\\/g'

E.g.,

$ touch a\\b
$ md5sum a\\b | sed 's/^\\//;s/\\\\/\\/g' | md5sum -c -
a\b: OK