Re: The review of std.hash package

David Tue, 07 Aug 2012 16:00:51 -0700

Am 07.08.2012 21:53, schrieb Johannes Pfau:

Am Tue, 07 Aug 2012 20:07:07 +0200
schrieb David <d...@dav1d.de>:

Is the API already set in stone?

No, as Jonathan already mentioned reviewing the API is an important
part of the review. (Actually the most important part for this review,
as we define one API for many implementations)

Using .start and .finish, feels like the use of OpenSSL. Why not
making a Hash object not "restartable" (since e.g. MD5.start() sets
`this` just to MD5.init)

I'm not an expert in the hash/digest area. We had incompatible APIs for
CRC32, MD5 and SHA1 and I proposed we should have a unique interface.
As I had some free time and nobody else stepped up I implemented it,
asking for expert help in the newsgroup. This probably also explains
why this API isn't revolutionary, it's inspired by other designs
(tango, .net, ...).

BTW: making it possible to wrap OpenSSL and other C hash libraries was
also a goal. This is one reason why a start() function is necessary.
OK, that was the intro ;-)

start is here as structs can't have default constructors. For all
current hashes it's really mostly useless (except it can be used as a
simple way to reset a hash, which is also why the hash structs don't
have a reset member), but I just don't know if there are hashes which
would require a more complex (runtime) initialization.

Ok this point with the one above makes sense (I implemented my OpenSSLhashing wrapper as a class, initilaization is done in the constructor),it still doesn't feel right, if you have to call .start first. Whatabout a struct-constructor which calls .start internally, so you get abit more of a modern API (imo) and you're still able to implement thesame interface for any kind of wrapper/own implementation/whatever.

Classes don't have a start method as they can do that initialization in
the default constructor. But the default constructor can't be used as a
cheap way to reset a hash, therefore the reset function was added to
the OOP interface.


Ok. (more below)

and making finish private and implementing a
digest and hexdigest property

property is not a good idea, as finish must reset the internal state
for some objects, so you can't access the property repeatedly. It'd
have to be a function.

Well, you could store the result internally, property generates on thefirst call the digest, stores it (that's not too much, 16 byte for amd5) and returns the already stored value on the 2nd call.

which calls finish and returns an
ubyte[] array (or string for hexdigest)


This could be done and is probably a matter of taste. I prefer the free
function in this case, it makes clear that digest() really is a generic
function which can be used with any hash. It's also one function less
which must be implemented by hash writers. I'd like to keep the number
of members required to implement a hash as low as possible.

With digest you mean:http://dl.dropbox.com/u/24218791/d/phobos/std_hash_hash.html#digest ?

You normally always want the hexdigest (you barely need "real" digest),so it's a matter of convenience. Well, thanks to UFCS, you can call itlike a method.

(This is however unrelated to the actual naming of the functions. If
you think 'digest' is not a good name, please let me know)


I think that's a fitting name, but I am not a hashing expert.

, this would also eliminate
the need of the `digest` wrapper (e.g. you can mixin these properties
with template mixins, so no code duplication)


there's actually no code duplication. 'digest' is the only
implementation, sha1Of, md5Of are just aliases.

Yeah, I meant, you would have to implement these properties for eachhash-type and in the end, all properties do the same (calling finish andmaybe setting some internal flags, buffers), so you can put them into atemplate mixin and mixin them for each hash-type.

Also I am not sure if we want to use `hash.put` instead of
`hash.update` (which is used by python and openssl). But that's just
a minor point (e.g. perl uses .add)


I initially called it update, I agree it's a better name. But phobos is
all about ranges, so we need the put member anyway. Providing an
'update' alias just for the name isn't worth it, imho (see above,
keeping the number of hash members low).

I have to agree, an alias would produce more confusion (what two methodswhich do the same?)

Furthermore, I don't like the `sha1Of` and `md5Of` etc. functions,
why not having a static method? e.g. MD5.hexdigest("input"), which
returns the hexdigest and not a MD5 object.


Same reason as above, sha1Of, md5Of are just convenience aliases, it
could be argued you don't need those at all. You can use 'digest'
directly and it will provide exactly the same result. So hash writers
don't have to implement anything to support digest, providing an alias
is optional and just 1 line of code. A static method would have to be
implemented by every hash.


Yes, you would have to implement it for every hash, but that's 3 lines:

static string hexdigest(void[] data) {
        return toHexString(digest!(typeof(this))("yay"));
}

Yes you could use mixins to make that
painless, but I think it's still to much work. (And mixins can't be
documented in DDOC, but that's an unrelated compiler issue)

BTW: it is implemented as a member for the OOP API, as we can just
implement it in the base interface, so the actual implementations don't
have to care about it.

I've seen that, but I am wondering why it's not a static method, thecreation of the Hash Object could be hidden inside. You might say, thiswould allocate a class instance just for a single use, without thepossibility to reuse it. Correct, but I think that's the use of such afunction, if you just need a Hash, nothing else, otherwise you could usethe class "directly", with all its other functionality.

One more thing, `.reset` does the same as `start`? If so, why do both
exist?

See above. It's because start is used in the template/struct API and
structs can't have default constructors, so we need start.

The OOP/class api can use a default constructor. But unlike the start
function it can't work as a reset function, so there's an explicit
reset function.

I am not sure what do think about that. On the one side it's useful ifyou really need that speed, but then I think, is it really worth itresetting the state hidden in a function. If you really want to avoidanother allocation you could use emplace (if I didn't misunderstand theuse of emplace). To quote from the Python Zen:

"Explicit is better than implicit."

(I also find the examples quite confusing, why do you want to
reset the hash onject?

My approach was to document how to use the functions, not as much when
it makes sense to use them. Maybe I should care more about that?

When I saw the documentation/examples, I was confused, why do you wantto reset a hash object, so does it really reset the state? So I had totake a look into the code before I was sure, it surely does reset thewhole thingy (is it called context?).

It's more efficient than allocating a new hash with 'new' (or do you
mean why use reset if finish resets anyway? It's useful if you put some
data into a hash, then decide you don't want the hash (because the
user aborts the operation, network connection is lost,...) but you still
need the hash object later on.) You could just call finish and
disregard the result, but the reset implementation is faster. I agree
there's probably no _strong_ need for reset, but I think it doesn't do
any harm.


"Explicit is better than implicit."

Well, that's it from my side :)


Thanks for your review!


Thanks for writing std.hash, I think lots of people need it.

Re: The review of std.hash package

Reply via email to