New submission from Arnim Rupp <pyt...@rupp.de>:

Problem: hashlib only offers digest() and hexdigest(), but the fastest way to
work with hashes is as integers.

The first thing Loki does after getting the hashes is to convert them to int:

    md5, sha1, sha256 = generateHashes(fileData)
    md5_num = int(md5, 16)
    sha1_num = int(sha1, 16)
    sha256_num = int(sha256, 16)

https://github.com/Neo23x0/Loki/blob/master/loki.py

All of the ~50,000 hashes to compare against are also converted to int after
being read from a file. In my experience the comparison is about twice as fast
as comparing hexdigest strings, because the ints use only about half the memory.

(The use case here is to compare these 50,000 hashes to the hashes of all the 
200,000 files on a system that gets scanned for malicious files.)
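
The lookup described above can be sketched roughly as follows. This is only an illustration, not Loki's actual code: the helper names load_known_hashes() and is_known() are made up for this example, and SHA-256 stands in for whichever algorithm is used.

```python
import hashlib

def load_known_hashes(lines):
    """Parse hex digests (one per line) into a set of ints (hypothetical helper)."""
    return {int(line.strip(), 16) for line in lines if line.strip()}

def is_known(data, known):
    """Return True if the SHA-256 of data is in the set (hypothetical helper)."""
    # Today this needs a hex round trip: bytes -> hex string -> int.
    return int(hashlib.sha256(data).hexdigest(), 16) in known

# Build the reference set once, then test each scanned file against it.
known = load_known_hashes([hashlib.sha256(b"malicious sample").hexdigest()])
print(is_known(b"malicious sample", known))
print(is_known(b"benign file", known))
```

The set is built once from the hash list, so each scanned file costs one hash computation, one conversion to int, and one set lookup.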

Solution: Add decdigest() to hashlib, which returns the hash as an int.
This has two advantages:
1. It saves the time spent converting the hash to hex and back.
2. Having decdigest() in the documentation encourages more programmers to work
with hashes as ints rather than slower strings (where performance matters).

It should be just a few lines of code for each algorithm; I could do the PR.
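
To make the intended behaviour concrete, here is a pure-Python sketch of what decdigest() could return. hashlib has no such method today; the free function below is only a stand-in for the proposal, and it assumes the digest bytes are read as a big-endian unsigned integer so that the result matches int(hexdigest(), 16).

```python
import hashlib

def decdigest(hash_obj):
    """Hypothetical stand-in for the proposed method: digest as a non-negative int."""
    # int.from_bytes reads the raw digest directly, skipping the hex string.
    return int.from_bytes(hash_obj.digest(), "big")

h = hashlib.sha256(b"example")
# Same value as the current hex round trip.
print(decdigest(h) == int(h.hexdigest(), 16))
```

A C implementation would do the equivalent conversion inside each hash module, which is where the time saving would come from.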

static PyObject *
_sha3_shake_128_hexdigest(SHA3object *self, PyObject *arg)
{
    PyObject *return_value = NULL;
    unsigned long length;

    if (!_PyLong_UnsignedLong_Converter(arg, &length)) {
        goto exit;
    }
    return_value = _sha3_shake_128_hexdigest_impl(self, length);

exit:
    return return_value;
}

https://github.com/python/cpython/blob/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Modules/_sha3/clinic/sha3module.c.h

----------
components: Library (Lib)
messages: 385150
nosy: 2d4d
priority: normal
severity: normal
status: open
title: Feature request: Add decdigest() to hashlib
type: performance
versions: Python 3.10

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue42942>
_______________________________________