bug in sha1sum
I have just been looking at the man page for sha1sum, and saw the options:

  -b, --binary  read in binary mode
  -t, --text    read in text mode (default)

There is no further explanation of what these options mean. I assume that binary mode means to read the file as it is, and report the checksum, and that reading in text mode will perform some unspecified transformation of the file before computing the sum.

This would seem to be a bug. If I type "sha1sum filename" I want the checksum of the named file, not the checksum of some unspecified transformation of the file. To add this as an option may be acceptable, if the transformation is specified, common and useful, but under no circumstances should giving a checksum of something other than the file be the default action.

Regards
--
Dave Hines.

___
Bug-coreutils mailing list
Bug-coreutils@gnu.org
http://lists.gnu.org/mailman/listinfo/bug-coreutils
Re: bug in sha1sum
On Mon, 12 May 2008, Dave Hines wrote:

> I have just been looking at the man page for sha1sum, and saw the options:
>
>   -b, --binary  read in binary mode
>   -t, --text    read in text mode (default)
>
> There is no further explanation of what these options mean.

Coreutils manpages tend to be short reference sheets listing the available options. Further documentation is provided in the "info" command, as should be mentioned at the end of each manpage.

From the docs:

`-b'
`--binary'
     Treat each input file as binary, by reading it in binary mode and outputting a `*' flag. This is the inverse of `--text'. On systems like GNU that do not distinguish between binary and text files, this option merely flags each input file as binary: the MD5 checksum is unaffected. This option is the default on systems like MS-DOS that distinguish between binary and text files, except for reading standard input when standard input is a terminal.

`-t'
`--text'
     Treat each input file as text, by reading it in text mode and outputting a ` ' flag. This is the inverse of `--binary'. This option is the default on systems like GNU that do not distinguish between binary and text files. On other systems, it is the default for reading standard input when standard input is a terminal.

Cheers,
Phil
Re: bug in sha1sum
Philip Rowlands wrote:

> Coreutils manpages tend to be short reference sheets listing the available options. Further documentation is provided in the "info" command, as should be mentioned at the end of each manpage.
>
> From the docs:
>
> `-b'
> `--binary'
>      Treat each input file as binary, by reading it in binary mode and outputting a `*' flag. This is the inverse of `--text'. On systems like GNU that do not distinguish between binary and text files, this option merely flags each input file as binary: the MD5 checksum is unaffected. This option is the default on systems like MS-DOS that distinguish between binary and text files, except for reading standard input when standard input is a terminal.
>
> `-t'
> `--text'
>      Treat each input file as text, by reading it in text mode and outputting a ` ' flag. This is the inverse of `--binary'. This option is the default on systems like GNU that do not distinguish between binary and text files. On other systems, it is the default for reading standard input when standard input is a terminal.

I have to agree with Dave on this then. It is a severe bug that text mode is the default since this means that you will get different results for the checksum on MS-DOS/Windows than on a GNU/Linux system.
Re: bug in sha1sum
On Tue, 13 May 2008, Phillip Susi wrote:

> Philip Rowlands wrote:
>> Coreutils manpages tend to be short reference sheets listing the available options. Further documentation is provided in the "info" command, as should be mentioned at the end of each manpage.
>>
>> From the docs:
>>
>> `-b'
>> `--binary'
>>      Treat each input file as binary, by reading it in binary mode and outputting a `*' flag. This is the inverse of `--text'. On systems like GNU that do not distinguish between binary and text files, this option merely flags each input file as binary: the MD5 checksum is unaffected. This option is the default on systems like MS-DOS that distinguish between binary and text files, except for reading standard input when standard input is a terminal.
>>
>> `-t'
>> `--text'
>>      Treat each input file as text, by reading it in text mode and outputting a ` ' flag. This is the inverse of `--binary'. This option is the default on systems like GNU that do not distinguish between binary and text files. On other systems, it is the default for reading standard input when standard input is a terminal.
>
> I have to agree with Dave on this then. It is a severe bug that text mode is the default since this means that you will get different results for the checksum on MS-DOS/Windows than on a GNU/Linux system.

Please re-read the option descriptions. On MS-DOS, the default is --binary unless reading from a terminal. You'd practically have to be typing text directly into sha1sum to provoke this behaviour; pipes and file redirection wouldn't do it.

(This does make me wonder why the behaviour was provided in the first place, as typing into checksumming utilities seems unusual and error-prone.)

Cheers,
Phil
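The point made in the quoted documentation can be checked directly. The following is a minimal sketch, assuming GNU coreutils sha1sum on a system that does not distinguish text and binary files (for example GNU/Linux); the file name sample.txt and the scratch directory are illustrative only. On such a system the two modes should differ only in the flag character printed before the file name, not in the digest.

```shell
# Assumption: GNU coreutils sha1sum on GNU/Linux.
# sample.txt is a hypothetical file created in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"
printf 'test\n' > sample.txt

text_line=$(sha1sum --text sample.txt)
binary_line=$(sha1sum --binary sample.txt)

# Show both output lines; only the flag before the name should differ
# (' ' for text, '*' for binary).
printf '%s\n' "$text_line" "$binary_line"

# Keep only the digest field (everything before the first space).
text_sum=${text_line%% *}
binary_sum=${binary_line%% *}

[ "$text_sum" = "$binary_sum" ] && echo "digests match"
```

On a system that does translate line endings in text mode, the two digests could differ for files containing such line endings, which is the case the thread is arguing about.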
bug#8766: Bug in sha1sum?
Hi

I'm not sure, but I think I found a bug in sha1sum. It's easy to reproduce with any file that contains a backslash (\) in the name:

echo test > test
$ sha1sum test
4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test
$ mv test 'test\test'
$ sha1sum 'test\test'
\4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test\\test

I expect the file sha1sum to be the same after renaming the file (a backslash is prepended to the otherwise correct result).

sha1sum --version
sha1sum (GNU coreutils) 5.97

coreutils-5.97-23.el5_6.4

Kind regards,
Theo Band
bug#8766: Bug in sha1sum?
Theo Band writes:
>
> Hi
>
> I'm not sure, but I think I found a bug in sha1sum. It's easy to
> reproduce with any file that contains a backslash (\) in the name:
> echo test > test
> $ sha1sum test
> 4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test
> $ mv test 'test\test'
> $ sha1sum 'test\test'
> \4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test\\test
>
> I expect the file sha1sum to be the same after renaming the file (a
> backslash is prepended to the otherwise correct result).

This result violated my expectations too, but it turns out to be a documented feature:

     For each FILE, `md5sum' outputs the MD5 checksum, a flag indicating
     a binary or text input file, and the file name. If FILE contains a
     backslash or newline, the line is started with a backslash, and each
     problematic character in the file name is escaped with a backslash,
     making the output unambiguous even in the presence of arbitrary file
     names. If FILE is omitted or specified as `-', standard input is read.

(the sha*sum utilities all refer back to md5sum's description)

I better go fix all my scripts that rely on /^[0-9a-f]{32} /

--
Alan Curry
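A script that matches checksum lines with a pattern like the one above has to allow for the optional leading backslash. The following is a rough sketch rather than a definitive parser: the two sample lines are copied from the report in this thread, the pattern assumes 40 hex digits for sha1sum (md5sum would use 32), and the variable names are hypothetical.

```shell
# Two lines in sha1sum's output format, taken from the report above.
plain='4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test'
escaped='\4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test\\test'

# Strict pattern: digest must start the line, so escaped lines are missed.
strict='^[0-9a-f]{40} '

# Tolerant pattern: optional leading backslash, 40 hex digits, a space,
# then the type flag (' ' for text, '*' for binary).
tolerant='^\\?[0-9a-f]{40} [ *]'

strict_count=$(printf '%s\n' "$plain" "$escaped" | grep -cE "$strict")
tolerant_count=$(printf '%s\n' "$plain" "$escaped" | grep -cE "$tolerant")

echo "strict pattern matched:   $strict_count line(s)"
echo "tolerant pattern matched: $tolerant_count line(s)"
```

Note that matching the line is only half the job: when the leading backslash is present, the file name field is escaped as well and needs to be unescaped before use.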
bug#8766: Bug in sha1sum?
On 05/31/2011 01:03 AM, Alan Curry wrote:
> Theo Band writes:
>> Hi
>>
>> I'm not sure, but I think I found a bug in sha1sum. It's easy to
>> reproduce with any file that contains a backslash (\) in the name:
>> echo test > test
>> $ sha1sum test
>> 4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test
>> $ mv test 'test\test'
>> $ sha1sum 'test\test'
>> \4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test\\test
>>
>> I expect the file sha1sum to be the same after renaming the file (a
>> backslash is prepended to the otherwise correct result).
> This result violated my expectations too, but it turns out to be a documented
> feature:
>
>      For each FILE, `md5sum' outputs the MD5 checksum, a flag indicating
>      a binary or text input file, and the file name. If FILE contains a
>      backslash or newline, the line is started with a backslash, and each
>      problematic character in the file name is escaped with a backslash,
>      making the output unambiguous even in the presence of arbitrary file
>      names. If FILE is omitted or specified as `-', standard input is read.
>
> (the sha*sum utilities all refer back to md5sum's description)
>
> I better go fix all my scripts that rely on /^[0-9a-f]{32} /
>

man sha1sum, info sha1sum and sha1sum --help don't show me this info. Instead I read this:

> The default mode is to print a line with checksum, a character indicating type (`*' for binary, ` ' for text), and name for each FILE.

Would that mean the documentation in the coreutils-5.97-23.el5_6.4 is outdated? If so, is there perhaps an undocumented option that does not output this backslash?

I make an index of all my files to find duplicates. The backslash doesn't help.

Theo
bug#8766: Bug in sha1sum?
Theo Band wrote:
> On 05/31/2011 01:03 AM, Alan Curry wrote:
>> Theo Band writes:
>>> Hi
>>>
>>> I'm not sure, but I think I found a bug in sha1sum. It's easy to
>>> reproduce with any file that contains a backslash (\) in the name:
>>> echo test > test
>>> $ sha1sum test
>>> 4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test
>>> $ mv test 'test\test'
>>> $ sha1sum 'test\test'
>>> \4e1243bd22c66e76c2ba9eddc1f91394e57f9f83  test\\test
>>>
>>> I expect the file sha1sum to be the same after renaming the file (a
>>> backslash is prepended to the otherwise correct result).
>> This result violated my expectations too, but it turns out to be a documented
>> feature:
>>
>>      For each FILE, `md5sum' outputs the MD5 checksum, a flag indicating
>>      a binary or text input file, and the file name. If FILE contains a
>>      backslash or newline, the line is started with a backslash, and each
>>      problematic character in the file name is escaped with a backslash,
>>      making the output unambiguous even in the presence of arbitrary file
>>      names. If FILE is omitted or specified as `-', standard input is read.
>>
>> (the sha*sum utilities all refer back to md5sum's description)
>>
>> I better go fix all my scripts that rely on /^[0-9a-f]{32} /
>>
> man sha1sum, info sha1sum and sha1sum --help don't show me this info.
> Instead I read this:
>
>> The default mode is to print a line with checksum, a character
> indicating type (`*' for binary, ` ' for text), and name for each FILE.
>
> Would that mean the documentation in the coreutils-5.97-23.el5_6.4 is
> outdated? If so, is there perhaps an undocumented option that does not
> output this backslash?
> I make an index of all my files to find duplicates. The backslash
> doesn't help.

That feature is required to allow checking the hash of any file name that contains newlines. There is no option to disable it.

That omission in the documentation was corrected by COREUTILS-6_8-69-g826ff08.
If you're sure you have no newline-afflicted file name, you can safely filter out the backslashes with this:

  sed 's/^\\//;s/\\\\/\\/g'

E.g.,

  $ touch a\\b
  $ md5sum a\\b | sed 's/^\\//;s/\\\\/\\/g' | md5sum -c -
  a\b: OK
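For the duplicate-finding use case raised earlier in the thread, only the digest field matters, so it may be enough to drop the optional leading backslash and compare digests. The following is a minimal sketch under that assumption; the scratch directory and the file names (plain, back\slash, unique) are hypothetical, and it assumes GNU coreutils sha1sum.

```shell
# Assumption: GNU coreutils sha1sum; all file names are illustrative.
workdir=$(mktemp -d)
cd "$workdir"
echo test  > plain
echo test  > 'back\slash'   # same content, name contains a backslash
echo other > unique

# Strip the optional leading backslash, keep the 40-digit digest,
# then list digests that occur more than once.
dups=$(sha1sum -- * | sed 's/^\\//' | cut -c1-40 | sort | uniq -d)
echo "$dups"
```

This discards the file names, so mapping a duplicated digest back to its files still requires handling the escaped name field, for instance with the sed filter shown above when no file name contains a newline.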