Am 07.08.2012 21:53, schrieb Johannes Pfau:
Am Tue, 07 Aug 2012 20:07:07 +0200
schrieb David <d...@dav1d.de>:

Is the API already set in stone?
No, as Jonathan already mentioned reviewing the API is an important
part of the review. (Actually the most important part for this review,
as we define one API for many implementations)

Using .start and .finish, feels like the use of OpenSSL. Why not
making a Hash object not "restartable" (since e.g. MD5.start() sets
`this` just to MD5.init)
I'm not an expert in the hash/digest area. We had incompatible APIs for
CRC32, MD5 and SHA1 and I proposed we should have a unique interface.
As I had some free time and nobody else stepped up I implemented it,
asking for expert help in the newsgroup. This probably also explains
why this API isn't revolutionary, it's inspired by other designs
(tango, .net, ...).

BTW: making it possible to wrap OpenSSL and other C hash libraries was
also a goal. This is one reason why a start() function is necessary.
OK, that was the intro ;-)

start is here as structs can't have default constructors. For all
current hashes it's really mostly useless (except it can be used as a
simple way to reset a hash, which is also why the hash structs don't
have a reset member), but I just don't know if there are hashes which
would require a more complex (runtime) initialization.

Ok this point with the one above makes sense (I implemented my OpenSSL hashing wrapper as a class, initilaization is done in the constructor), it still doesn't feel right, if you have to call .start first. What about a struct-constructor which calls .start internally, so you get a bit more of a modern API (imo) and you're still able to implement the same interface for any kind of wrapper/own implementation/whatever.

Classes don't have a start method as they can do that initialization in
the default constructor. But the default constructor can't be used as a
cheap way to reset a hash, therefore the reset function was added to
the OOP interface.

Ok. (more below)

and making finish private and implementing a
digest and hexdigest property
property is not a good idea, as finish must reset the internal state
for some objects, so you can't access the property repeatedly. It'd
have to be a function.

Well, you could store the result internally, property generates on the first call the digest, stores it (that's not too much, 16 byte for a md5) and returns the already stored value on the 2nd call.

which calls finish and returns an
ubyte[] array (or string for hexdigest)

This could be done and is probably a matter of taste. I prefer the free
function in this case, it makes clear that digest() really is a generic
function which can be used with any hash. It's also one function less
which must be implemented by hash writers. I'd like to keep the number
of members required to implement a hash as low as possible.

With digest you mean: http://dl.dropbox.com/u/24218791/d/phobos/std_hash_hash.html#digest ?

You normally always want the hexdigest (you barely need "real" digest), so it's a matter of convenience. Well, thanks to UFCS, you can call it like a method.

(This is however unrelated to the actual naming of the functions. If
you think 'digest' is not a good name, please let me know)

I think that's a fitting name, but I am not a hashing expert.

, this would also eliminate
the need of the `digest` wrapper (e.g. you can mixin these properties
with template mixins, so no code duplication)

there's actually no code duplication. 'digest' is the only
implementation, sha1Of, md5Of are just aliases.

Yeah, I meant, you would have to implement these properties for each hash-type and in the end, all properties do the same (calling finish and maybe setting some internal flags, buffers), so you can put them into a template mixin and mixin them for each hash-type.

Also I am not sure if we want to use `hash.put` instead of
`hash.update` (which is used by python and openssl). But that's just
a minor point (e.g. perl uses .add)

I initially called it update, I agree it's a better name. But phobos is
all about ranges, so we need the put member anyway. Providing an
'update' alias just for the name isn't worth it, imho (see above,
keeping the number of hash members low).

I have to agree, an alias would produce more confusion (what two methods which do the same?)

Furthermore, I don't like the `sha1Of` and `md5Of` etc. functions,
why not having a static method? e.g. MD5.hexdigest("input"), which
returns the hexdigest and not a MD5 object.

Same reason as above, sha1Of, md5Of are just convenience aliases, it
could be argued you don't need those at all. You can use 'digest'
directly and it will provide exactly the same result. So hash writers
don't have to implement anything to support digest, providing an alias
is optional and just 1 line of code. A static method would have to be
implemented by every hash.

Yes, you would have to implement it for every hash, but that's 3 lines:

static string hexdigest(void[] data) {
        return toHexString(digest!(typeof(this))("yay"));
}

Yes you could use mixins to make that
painless, but I think it's still to much work. (And mixins can't be
documented in DDOC, but that's an unrelated compiler issue)

BTW: it is implemented as a member for the OOP API, as we can just
implement it in the base interface, so the actual implementations don't
have to care about it.

I've seen that, but I am wondering why it's not a static method, the creation of the Hash Object could be hidden inside. You might say, this would allocate a class instance just for a single use, without the possibility to reuse it. Correct, but I think that's the use of such a function, if you just need a Hash, nothing else, otherwise you could use the class "directly", with all its other functionality.

One more thing, `.reset` does the same as `start`? If so, why do both
exist?
See above. It's because start is used in the template/struct API and
structs can't have default constructors, so we need start.

The OOP/class api can use a default constructor. But unlike the start
function it can't work as a reset function, so there's an explicit
reset function.

I am not sure what do think about that. On the one side it's useful if you really need that speed, but then I think, is it really worth it resetting the state hidden in a function. If you really want to avoid another allocation you could use emplace (if I didn't misunderstand the use of emplace). To quote from the Python Zen:
"Explicit is better than implicit."

(I also find the examples quite confusing, why do you want to
reset the hash onject?
My approach was to document how to use the functions, not as much when
it makes sense to use them. Maybe I should care more about that?

When I saw the documentation/examples, I was confused, why do you want to reset a hash object, so does it really reset the state? So I had to take a look into the code before I was sure, it surely does reset the whole thingy (is it called context?).

It's more efficient than allocating a new hash with 'new' (or do you
mean why use reset if finish resets anyway? It's useful if you put some
data into a hash, then decide you don't want the hash (because the
user aborts the operation, network connection is lost,...) but you still
need the hash object later on.) You could just call finish and
disregard the result, but the reset implementation is faster. I agree
there's probably no _strong_ need for reset, but I think it doesn't do
any harm.

"Explicit is better than implicit."

Well, that's it from my side :)

Thanks for your review!

Thanks for writing std.hash, I think lots of people need it.



Reply via email to