[issue26488] hashlib command line interface

2016-08-25 Thread Raymond Hettinger

Changes by Raymond Hettinger :


--
resolution:  -> rejected
status: open -> closed

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26488] hashlib command line interface

2016-08-21 Thread Guido van Rossum

Guido van Rossum added the comment:

I prefer not to go down this road. The modules that do this where I use it
are typically Python specific, e.g. pdb or timeit. In the past we sometimes
had little main() functions in modules for testing but I think we have
better ways to test modules these days.

--Guido (mobile)

--




[issue26488] hashlib command line interface

2016-08-21 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Guido, would you care to opine on this?  Every now and then we get a request to 
make command-line utilities out of tools in the standard library.  Whether we 
should or not depends on whether the standard library intends to be primarily a 
library for Python code or whether it is also about providing general purpose 
toolkits that might be helpful in a non-unix environment.

We've had some of these that have met with success (for example, timeit, 
json.tool, and SimpleHTTPServer) and others that were just a waste or were a 
pale shadow of their full featured Unix counterparts (or left to rot in the 
Tools directory).

If we go further down this road, it would be nice for you to lay out the ground 
rules for what kind of command line tools would be acceptable, how stable their 
API would be, and whether they should be separated from the module itself.  Do 
you even want to be in the business of offering command-line APIs that 
duplicate commonly available Unix tools?

--
assignee:  -> gvanrossum
nosy: +gvanrossum




[issue26488] hashlib command line interface

2016-08-21 Thread Raymond Hettinger

Changes by Raymond Hettinger :


--
assignee: rhettinger -> 




[issue26488] hashlib command line interface

2016-08-21 Thread Antonio Valentino

Antonio Valentino added the comment:

Hi all,
please note that to stay compatible with the GNU md5sum utility you should 
print the file name prefixed by a "*" if you read the file in binary mode.

Also, when digests are checked, files to check should be opened in binary or text 
mode according to the presence/absence of the "*" character before the file name.

An explicit error should be raised IMO if some specific mode is not supported 
(e.g. text mode) by the check function.

Also, since the tool supports different hashing algorithms the 
openssl/BSD-style output format could be more appropriate IMO:

MD5 (file01.dat) = 101b455ce70d2e73e1a4d92a3e8c29e1
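
As an illustration of the two output styles discussed above (a minimal sketch, not taken from the patch or from hashsum; the function names `format_digest` and `parse_gnu_line` are hypothetical), the GNU-style line with its "*" binary marker and the BSD-style line could be produced like this:

```python
import hashlib


def format_digest(algorithm, filename, data, style="gnu", binary=True):
    """Return one output line for `data` in GNU or BSD style."""
    digest = hashlib.new(algorithm, data).hexdigest()
    if style == "bsd":
        # BSD/openssl style names the algorithm: "SHA256 (file) = <hex>".
        return "{} ({}) = {}".format(algorithm.upper(), filename, digest)
    # GNU style: "<hex> *file" in binary mode, "<hex>  file" in text mode.
    marker = "*" if binary else " "
    return "{} {}{}".format(digest, marker, filename)


def parse_gnu_line(line):
    """Split a GNU-style line into (digest, is_binary, filename)."""
    digest, rest = line.rstrip("\n").split(" ", 1)
    return digest, rest[0] == "*", rest[1:]
```

A checker could use the `is_binary` flag returned by `parse_gnu_line` to decide how to open the file, or to raise an explicit error for a mode it does not support.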

FYI I wrote the hashsum package [1], which provides a command line tool that is 
intended to be a "Python drop-in replacement for md5sum and co.".

If my understanding is correct you want to keep this patch as simple as 
possible but, if you are interested, I could provide patches to:

* fix the GNU style output: binary ("*") vs text mode
* implement full support for the text mode
* implement full support for BSD-style output format

I could also convert the hashsum.py utility [1] to be integrated directly in 
hashlib.

[1] https://github.com/avalentino/hashsum

--
nosy: +avalentino




[issue26488] hashlib command line interface

2016-08-18 Thread STINNER Victor

STINNER Victor added the comment:

> There are so many existing tools that already do this I don't really see
> why Python needs to become yet another one.  What is the value in doing
> this?

Portability. You described UNIX. There is no such tool by default on
Windows for example (as Aviv said).

--




[issue26488] hashlib command line interface

2016-08-18 Thread Aviv Palivoda

Aviv Palivoda added the comment:

The use case that made me think about this feature was when I was working on a 
Windows PC and needed to calculate an md5 of a file. I agree that in a Unix 
environment there are existing tools, but on Windows you usually don't have them.

--




[issue26488] hashlib command line interface

2016-08-18 Thread Gregory P. Smith

Gregory P. Smith added the comment:

There are so many existing tools that already do this I don't really see why 
Python needs to become yet another one.  What is the value in doing this?

Just use the openssl command:  "openssl sha256 myfile".  Or any of the md5sum, 
sha1sum and the plethora of other commands people also have installed.

Overall the change looks pretty good (I left a couple of comments on the patch), 
and I'm not going to object to it going in.  But I don't know why we're bothering.

--




[issue26488] hashlib command line interface

2016-08-18 Thread Aviv Palivoda

Aviv Palivoda added the comment:

Hi, is there anything more I need to do on this patch? If not, do you think this 
can be added in 3.6?

--




[issue26488] hashlib command line interface

2016-04-02 Thread Aviv Palivoda

Aviv Palivoda added the comment:

Adding new patch after CR changes.

--
Added file: 
http://bugs.python.org/file42355/hashlib-script-mod-md5sum-style-5.patch




[issue26488] hashlib command line interface

2016-04-01 Thread Martin Panter

Martin Panter added the comment:

I left some replies to Rietveld comments (sending review emails seems buggy).

For a chunk size, don’t worry too much about it. I would say keep it large 
enough to limit time spent executing Python code and syscalls, keep it small to 
avoid wasting high speed cache memory, and keep it a power of two to work with 
OS and filesystem buffers.
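
To make the chunk-size advice concrete, here is a minimal sketch of chunked hashing. It is an illustration, not the code from the patch: the name `hash_stream` and the 256 KiB default are assumptions that follow the values discussed in this thread.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # assumed value; a power of two, as suggested above


def hash_stream(stream, algorithm="sha256", chunk_size=CHUNK_SIZE):
    """Hash a binary file object chunk by chunk and return the hex digest."""
    digest = hashlib.new(algorithm)
    # iter() with a sentinel stops when read() returns b"" at end of file.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()
```

Only one chunk is held in memory at a time, so the file size does not affect memory use; the chunk size only trades the number of Python-level calls against buffer size.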

--
dependencies:  -argparse.FileType for '-' doesn't work for a mode of 'rb'




[issue26488] hashlib command line interface

2016-04-01 Thread Aviv Palivoda

Aviv Palivoda added the comment:

Publishing another patch after SilentGhost's and Victor's CR. I also changed the 
block size to 256 KB. Could someone remove the dependency on issue 14156? (I 
don't think I have permission.)

--
Added file: 
http://bugs.python.org/file42347/hashlib-script-mod-md5sum-style-4.patch




[issue26488] hashlib command line interface

2016-04-01 Thread STINNER Victor

STINNER Victor added the comment:

> The blocksize should be fixed and large (perhaps 256kB).

I used strace to check md5sum & sha1sum: they use read() syscalls of 32,768 
bytes.

stat().st_blksize is 4,096 bytes.

I'm not sure that it matters so much to use large reads. But I don't really 
care; I'm also OK with using something large like 256 kB.

Note: The cp command uses read() syscalls of 131,072 bytes.

--




[issue26488] hashlib command line interface

2016-04-01 Thread SilentGhost

SilentGhost added the comment:

Left comments on Rietveld.

--




[issue26488] hashlib command line interface

2016-04-01 Thread Aviv Palivoda

Aviv Palivoda added the comment:

I am adding a new patch with changes from Martin's CR (thanks for the review) and 
support for the "check" option.
I also changed the examples in the documentation to use sha256 instead of md5, as 
Christian asked. I left one example with sha1 so that readers can see that 
other algorithms are supported.
As for the multi-threading feature, I checked on my PC and I never reach 100% 
CPU when calculating a single hash, so I think leaving this feature out is 
better.

--
Added file: 
http://bugs.python.org/file42340/hashlib-script-mod-md5sum-style-3.patch




[issue26488] hashlib command line interface

2016-03-31 Thread Aviv Palivoda

Changes by Aviv Palivoda :


Added file: 
http://bugs.python.org/file42337/hashlib-script-mod-md5sum-style-2.patch




[issue26488] hashlib command line interface

2016-03-31 Thread Aviv Palivoda

Aviv Palivoda added the comment:

Thanks for the review SilentGhost. I am including the patch after the changes 
from your CR comments.

--




[issue26488] hashlib command line interface

2016-03-31 Thread Christian Heimes

Christian Heimes added the comment:

Threading doesn't make much sense here. The runtime of hash computation is 
usually dominated by I/O performance. On a typical consumer computer, SSDs have 
a sequential read performance of 200 to 500 MiB/sec. SHA-512 performance is 
between 100 and 150 MiB/sec. Threading could make parallel computation a bit 
faster, but at the cost of a much more complex implementation. Let's keep it 
simple.
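
Figures like the ones quoted above depend heavily on the machine and on the hashlib build. A rough way to measure pure hashing speed locally, with no disk I/O involved, is sketched below; the helper name `hash_throughput` and its defaults are hypothetical.

```python
import hashlib
import time


def hash_throughput(algorithm="sha512", size_mib=16, repeat=3):
    """Return the best observed hashing speed in MiB/sec for in-memory data."""
    data = bytes(size_mib * 1024 * 1024)  # zero-filled buffer, no disk I/O
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        hashlib.new(algorithm, data).digest()
        best = min(best, time.perf_counter() - start)
    return size_mib / best
```

Comparing the returned number with the sequential read speed of the disk gives a first idea of whether hashing or I/O is the bottleneck on a given system.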

Both Python's hashlib implementation and OpenSSL aren't the best foundation if 
you are aiming for maximum performance. All libraries do a couple of malloc()s 
and memcpy()s, and have additional overhead, too.

And please don't use MD5 in your examples. Go for sha256.

--




[issue26488] hashlib command line interface

2016-03-31 Thread SilentGhost

SilentGhost added the comment:

The mailing system is acting up, so just for the record, I've left comments on 
Rietveld regarding the md5sum-style patch.

--




[issue26488] hashlib command line interface

2016-03-31 Thread Aviv Palivoda

Aviv Palivoda added the comment:

I am adding a new patch with an API compatible with GNU md5sum:

$ python -m hashlib md5 /bin/sh
d985d0ea551c1253c2305140c583d11f /bin/sh

I will soon add the features requested by Victor:
  1) The check option.
  2) Running the hash calculation for different files in different threads.

If anyone would like any other feature that might be helpful, please post it.

If we choose to use this API we should remove the dependency on issue 14156.

--
Added file: 
http://bugs.python.org/file42333/hashlib-script-mod-md5sum-style.patch




[issue26488] hashlib command line interface

2016-03-30 Thread Raymond Hettinger

Raymond Hettinger added the comment:

I concur.  The blocksize should be fixed and large (perhaps 256kB).

--




[issue26488] hashlib command line interface

2016-03-30 Thread Christian Heimes

Christian Heimes added the comment:

I see a potential performance issue here. The block size is a small value, 
usually a couple of kb. With such a small value, the runtime will be dominated 
by Python call overhead and syscalls.

--




[issue26488] hashlib command line interface

2016-03-30 Thread Aviv Palivoda

Aviv Palivoda added the comment:

Sorry for the late response; for some reason I haven't received email notifications 
from the tracker for the past few days.

1) Thanks for the review, SilentGhost. The attached patch includes your CR 
suggestions.

2) Raymond, I have fixed the problem with ctrl+D. I tried writing a test to 
simulate this problem, but I don't seem to be able to simulate the terminal 
behavior on ctrl+D.

3) Removed the block_size option as suggested by Raymond and Victor and using 
os.stat().st_blksize as suggested by Victor.

4) I changed the CLI to support all available algorithms in hashlib. I am not 
sure if this is too many choices to show in the --help message.

5) About removing the use of argparse.FileType: I would prefer resolving issue 
14156, but if you think this is problematic I will make the change.

6) What do you think about changing the API to be more like md5sum?
a) Allowing * in the file name to calculate on multiple files.
b) Adding the check option.
c) printing file name in output.
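
For points 4 and 6, one possible shape of the argument parser is sketched below. This is an assumption for illustration, not the parser from the attached patch: `build_parser` and the option names are hypothetical, and it uses `hashlib.algorithms_guaranteed` rather than every available algorithm, which is one way to keep the --help output short.

```python
import argparse
import hashlib


def build_parser():
    """Build a hypothetical parser; option names are illustrative only."""
    parser = argparse.ArgumentParser(prog="python -m hashlib")
    # algorithms_guaranteed is a small, platform-independent set of names.
    parser.add_argument("algorithm",
                        choices=sorted(hashlib.algorithms_guaranteed))
    # nargs="+" accepts several files, e.g. after the shell expands "*".
    parser.add_argument("files", nargs="+", help="files to hash")
    parser.add_argument("-c", "--check", action="store_true",
                        help="verify digests listed in the given files")
    return parser
```

Note that on Windows the shell does not expand "*" itself, so a real tool would have to decide whether to expand wildcards on its own.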

--
Added file: http://bugs.python.org/file42329/hashlib-script-mod-2.patch




[issue26488] hashlib command line interface

2016-03-29 Thread STINNER Victor

STINNER Victor added the comment:

> I think we should add a dependency in issue 14156 for this issue.

Maybe you can start using strings for filename parameters and open files 
manually, and later consider switching back to FileType once it's fixed?
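
A minimal sketch of that approach, assuming the conventional "-" for standard input (the helper name `open_binary` is hypothetical): the parser keeps the file name as a plain string, and the file is opened only when it is actually needed.

```python
import contextlib
import sys


@contextlib.contextmanager
def open_binary(name):
    """Yield a binary file object for `name`; '-' means standard input."""
    if name == "-":
        # sys.stdin.buffer is the underlying binary stream; leave it open.
        yield sys.stdin.buffer
    else:
        with open(name, "rb") as stream:
            yield stream
```

Because regular files are opened inside a `with` block, they are closed even if hashing fails part-way through.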

--




[issue26488] hashlib command line interface

2016-03-29 Thread STINNER Victor

STINNER Victor added the comment:

About the compatibility with existing tools, I recall a discussion when the 
tarfile module got a CLI. First I expected a clone of the UNIX tar command, but 
it was decided to design a new *simpler* CLI.

---
$ python3 -m tarfile
usage: tarfile.py [-h] [-v] [-l  | -e  [ ...] |
  -c  [ ...] | -t ]

A simple command line interface for tarfile module.

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         Verbose output
  -l , --list 
Show listing of a tarfile
  -e  [ ...], --extract  [ ...]
Extract tarfile into target dir
  -c  [ ...], --create  [ ...]
Create tarfile from sources
  -t , --test 
Test if a tarfile is valid
---


A common trap of the md5sum CLI is that users write "echo string|md5sum", which 
adds a newline to the string. For an unknown reason, my French manual page of the 
md5sum command has a -s STRING/--string=STRING argument, but my actual 
md5sum program does not. Maybe we should consider adding such an option to avoid the trap?
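
The trap is easy to reproduce from Python. This small illustration uses sha256 rather than md5, but the effect is the same for any algorithm:

```python
import hashlib

without_newline = hashlib.sha256(b"string").hexdigest()
# "echo string | ..." feeds the trailing newline into the hash as well.
with_newline = hashlib.sha256(b"string\n").hexdigest()

print(without_newline == with_newline)  # prints False
```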


Do you want to implement a function to compare computed hash to a file which 
contains the expected hash? Check for file integrity, md5sum -c 
FILE/--check=FILE. Example:
--
$ md5sum test_socket_with.patch > check
$ cat check 
cfc1d69e76c827c32af4f28f50714a5e  test_socket_with.patch

$ md5sum -c check
test_socket_with.patch: OK

$ vim test_socket_with.patch 


$ md5sum -c check
test_socket_with.patch: FAILED
md5sum: WARNING: 1 computed checksum did NOT match
--
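
A check mode along these lines could be sketched as follows. This is a simplified illustration under stated assumptions, not the behaviour of md5sum or of the patch: `check_lines` is a hypothetical name, every line is assumed to have the form "<hexdigest>, one space, a mode marker (space or *), <filename>", and each file is read in one go.

```python
import hashlib


def check_lines(lines, algorithm="sha256"):
    """Yield (filename, ok) for each "<hexdigest>  <filename>" line."""
    for line in lines:
        expected, rest = line.rstrip("\n").split(" ", 1)
        filename = rest[1:]  # skip the mode marker (" " or "*")
        with open(filename, "rb") as stream:
            actual = hashlib.new(algorithm, stream.read()).hexdigest()
        yield filename, actual == expected.lower()
```

A caller would print "OK" or "FAILED" per file and count the mismatches for the final warning.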


I worked hard to release the GIL when a hash is computed. It would be super 
cool (a killer feature?) to automatically spawn threads to compute the hash. 
For example, use N threads where N is the number of CPUs (os.cpu_count() or 1). 
Last time I wrote my md5sum.py, it was much faster than the UNIX md5sum tool 
since it used all my CPU cores. You should just ensure that output is written 
in the correct order.
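
A sketch of this idea using concurrent.futures is shown below. It is not the md5sum.py mentioned above; `hash_file` and `hash_files` are hypothetical names. `Executor.map()` returns results in input order, which takes care of the output-ordering requirement.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def hash_file(path, algorithm="sha256", chunk_size=256 * 1024):
    """Return the hex digest of one file, read in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_files(paths, algorithm="sha256"):
    """Hash files in worker threads; results come back in input order."""
    paths = list(paths)
    workers = os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the order of `paths`, regardless of
        # which worker thread finishes first.
        digests = list(pool.map(lambda p: hash_file(p, algorithm), paths))
    return list(zip(paths, digests))
```

Whether this is actually faster depends on how much of the time is spent hashing rather than waiting for the disk.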


Raymond wrote:
> 1) Neither the md5 nor shasum command-line tools offer control over the 
> blocksize.  I suggest that option be dropped from the command-line API giving 
> a nice simplification and usability improvement.

I agree. You should compute it per file using os.stat().st_blksize:

   https://docs.python.org/dev/library/os.html#os.stat_result.st_blksize

The io module uses st_blksize if it is greater than 1, or 8 * 1024 bytes.

(By the way, it looks like shutil.copyfile() doesn't use st_blksize.)
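
A minimal sketch of that per-file block size selection, with the same fallback rule as described for the io module (the helper name `block_size_for` is hypothetical):

```python
import os

DEFAULT_BLOCK_SIZE = 8 * 1024  # fallback mentioned above


def block_size_for(path):
    """Return st_blksize for `path` if it is usable, else 8 KiB."""
    # st_blksize is not available on every platform (for example Windows),
    # hence the getattr() with a default.
    blksize = getattr(os.stat(path), "st_blksize", 0)
    return blksize if blksize > 1 else DEFAULT_BLOCK_SIZE
```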

--
nosy: +haypo




[issue26488] hashlib command line interface

2016-03-29 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Overall, I'm +1 on the idea and think the patch looks good.  There are two 
issues to look at:

1) Neither the md5 nor shasum command-line tools offer control over the 
blocksize.  I suggest that option be dropped from the command-line API giving a 
nice simplification and usability improvement.

2) The stdin option has issues:

$ md5
abc
0bee89b07a248e27c83fc3d5951213c1

$ python3.6 -m hashlib md5
abc
^D
^CTraceback (most recent call last):
  File "/Users/raymond/cpython/Lib/runpy.py", line 184, in _run_module_as_main
    "__main__", mod_spec)
  File "/Users/raymond/cpython/Lib/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/Users/raymond/cpython/Lib/hashlib.py", line 246, in <module>
    main()
  File "/Users/raymond/cpython/Lib/hashlib.py", line 238, in main
    data = args.file.read(args.block_size)
KeyboardInterrupt

--




[issue26488] hashlib command line interface

2016-03-29 Thread SilentGhost

SilentGhost added the comment:

I left some comments on Rietveld yesterday; not sure if you got the e-mail.

--




[issue26488] hashlib command line interface

2016-03-29 Thread Raymond Hettinger

Raymond Hettinger added the comment:

Here is a quick check versus other command-line tools:

$ md5 xml_prolog.diff 
MD5 (xml_prolog.diff) = c30e1fe70651d5a472efe9598db70771
$ python3.6 -m hashlib md5 xml_prolog.diff 
c30e1fe70651d5a472efe9598db70771
$ shasum -a 1 xml_prolog.diff 
db5fd60f3002ed270fe4469c7f343aba515d9b1a  xml_prolog.diff
$ python3.6 -m hashlib sha1 xml_prolog.diff 
db5fd60f3002ed270fe4469c7f343aba515d9b1a
$ shasum -a 224 xml_prolog.diff 
d0d18eb3e4d71269a610308e51e49d8b4650134bf2757dd22d69430b  xml_prolog.diff
$ python3.6 -m hashlib sha224 xml_prolog.diff 
d0d18eb3e4d71269a610308e51e49d8b4650134bf2757dd22d69430b
$ shasum -a 256 xml_prolog.diff 
0bbd834589d5fc9e26e32e5c665de4a2fbd93ea3bb4688ea25ef1139cd152b09  xml_prolog.diff
$ python3.6 -m hashlib sha256 xml_prolog.diff 
0bbd834589d5fc9e26e32e5c665de4a2fbd93ea3bb4688ea25ef1139cd152b09
$ shasum -a 384 xml_prolog.diff 
f627b6c2bdc95e1af00da2a4b5d3897284127d0e820963f60758e34e5cfeb64cb529bfb306789f73c91d58d7594a3f3b  xml_prolog.diff
$ python3.6 -m hashlib sha384 xml_prolog.diff 
f627b6c2bdc95e1af00da2a4b5d3897284127d0e820963f60758e34e5cfeb64cb529bfb306789f73c91d58d7594a3f3b
$ shasum -a 512 xml_prolog.diff 
c93babfef5a25bb569e4fd5c6edbb6d5e1de92044576b68522abcd9d9a356eca6b791df97a08e5c75cd090f00d5f2f73d940b901796b190681d370e5da74e4d8  xml_prolog.diff
$ python3.6 -m hashlib sha512 xml_prolog.diff 
c93babfef5a25bb569e4fd5c6edbb6d5e1de92044576b68522abcd9d9a356eca6b791df97a08e5c75cd090f00d5f2f73d940b901796b190681d370e5da74e4d8

--




[issue26488] hashlib command line interface

2016-03-29 Thread Raymond Hettinger

Changes by Raymond Hettinger :


--
assignee:  -> rhettinger
nosy: +rhettinger




[issue26488] hashlib command line interface

2016-03-28 Thread SilentGhost

Changes by SilentGhost :


--
dependencies: +argparse.FileType for '-' doesn't work for a mode of 'rb'
nosy: +SilentGhost




[issue26488] hashlib command line interface

2016-03-28 Thread Aviv Palivoda

Aviv Palivoda added the comment:

I actually have noticed issue 13824 while working on this issue. The patch I 
uploaded to Issue 14156 actually fixes the problem in issue 13824 in addition 
to the problem with the stdin/stdout.
I think we should add a dependency in issue 14156 for this issue.

--




[issue26488] hashlib command line interface

2016-03-27 Thread Martin Panter

Martin Panter added the comment:

FWIW I am not really comfortable with argparse.FileType; see Issue 13824. How 
does this patch perform when warnings are enabled, or if you specify a file, 
but then cause some other argparse error?

--
nosy: +martin.panter

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue26488] hashlib command line interface

2016-03-11 Thread Terry J. Reedy

Terry J. Reedy added the comment:

This strikes me as a sensible addition.  While I did not review the patch in 
detail, it seems cleanly written and is complete in having a new main() for the 
code, a new doc section, and a new unittest Test class.

--
nosy: +terry.reedy
stage:  -> patch review




[issue26488] hashlib command line interface

2016-03-05 Thread Aviv Palivoda

New submission from Aviv Palivoda:

I suggest adding a command line interface to the hashlib module. A simple 
example of the API I suggest is:
$ python -m hashlib md5 /bin/sh
d985d0ea551c1253c2305140c583d11f

A patch is included.

--
components: Library (Lib)
files: hashlib-script-mod.patch
keywords: patch
messages: 261225
nosy: christian.heimes, gregory.p.smith, palaviv
priority: normal
severity: normal
status: open
title: hashlib command line interface
type: enhancement
versions: Python 3.6
Added file: http://bugs.python.org/file42074/hashlib-script-mod.patch
