All you say is true. It should be noted that using a vector as a key
is inefficient. Similarly, using a String as a key in a map is just
about as inefficient, for the same reason: value-based equals() and
hashCode() have to go over the contents.
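
To make the cost concrete, here is a minimal sketch. It uses a
hypothetical DenseVec class, not Mahout's Vector API. Its equals() and
hashCode() work by value and count the elements they touch. A single
HashMap lookup with an equal but distinct instance visits every element
twice: once in hashCode() on the probe, and once in equals() against
the stored key.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical value-based vector (not Mahout's Vector API) that counts
 * how many elements its equals() and hashCode() touch.
 */
final class DenseVec {
    static long elementsVisited = 0;  // instrumentation for this sketch only

    private final double[] values;

    DenseVec(double... values) {
        this.values = values.clone();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof DenseVec)) return false;
        // Worst case: every element is compared (always so when equal).
        elementsVisited += values.length;
        return Arrays.equals(values, ((DenseVec) o).values);
    }

    @Override
    public int hashCode() {
        // Consistent with equals(): derived from every element.
        elementsVisited += values.length;
        return Arrays.hashCode(values);
    }
}

public class KeyCostDemo {
    public static void main(String[] args) {
        int n = 1_000_000;
        double[] data = new double[n];
        Arrays.fill(data, 1.0);

        Map<DenseVec, String> map = new HashMap<>();
        map.put(new DenseVec(data), "doc-1");

        DenseVec.elementsVisited = 0;
        // Lookup with an equal but distinct instance: hashCode() on the
        // probe, then equals() against the stored key.
        String hit = map.get(new DenseVec(data));
        // prints: doc-1 after visiting 2000000 elements
        System.out.println(hit + " after visiting " + DenseVec.elementsVisited + " elements");
    }
}
```

HashMap keeps each stored key's hash, so only the probe is rehashed.
The cost of every lookup still grows linearly with the vector size.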

Shouldn't it already be obvious, and shouldn't keying by vector already
be considered a poor technique for massive iteration? IMO it is already
assumed that the generalized equals() and hashCode() contracts may be
inefficient, depending on the data they go over. Assuming the contrary
just reflects a naive view of the implications of hashing and the
hashCode() contract. That does happen a lot, I must admit. But it
doesn't change what the hashCode() and equals() contracts are about.
(As Bill Maher puts it, just because many people believe something is
true does not actually make it true.) :)

If vector identity is not mathematical equality but rather reference
identity, then a) it goes against the generalized equals() contract,
and b) what would be a compelling use case for using vectors by
identity in a map, one more compelling than the generalized equals()
contract? Note that these are fundamentally different tasks: if the
generalized equals() contract (i.e. equality by value) is desired, it
cannot simply be obtained from an identity equals contract (the reverse
always works, though).
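
Here is a small illustration of the two tasks. It again uses a
hypothetical value-equal Vec class, not Mahout's API. A HashMap finds
the entry through any equal vector. An IdentityHashMap finds it only
through the very instance that was stored. Value lookup cannot be
recovered from identity semantics. Identity lookup, however, is
available on top of value equality just by picking the map type.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical vector with by-value equals()/hashCode() (not Mahout's API).
final class Vec {
    final double[] values;

    Vec(double... values) { this.values = values.clone(); }

    @Override public boolean equals(Object o) {
        return o instanceof Vec && Arrays.equals(values, ((Vec) o).values);
    }

    @Override public int hashCode() { return Arrays.hashCode(values); }
}

public class EqualityVsIdentity {
    public static void main(String[] args) {
        Vec stored = new Vec(1.0, 2.0, 3.0);
        Vec probe = new Vec(1.0, 2.0, 3.0);  // equal by value, distinct object

        Map<Vec, String> byValue = new HashMap<>();
        Map<Vec, String> byIdentity = new IdentityHashMap<>();
        byValue.put(stored, "centroid-7");
        byIdentity.put(stored, "centroid-7");

        // Value semantics: any equal vector finds the entry.
        System.out.println(byValue.get(probe));      // centroid-7
        // Identity semantics: only the very same instance finds it.
        System.out.println(byIdentity.get(probe));   // null
        System.out.println(byIdentity.get(stored));  // centroid-7
    }
}
```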

Another argument for not breaking the generalized by-value equals()
contract is that, for those people who do realize the deficiency,
there are already identity-based structures that make them consciously
choose to drop equals-by-value: IdentityHashMap. In light of this,
repealing equals-by-value doesn't really provide any new capability.
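
Here is a minimal sketch of that conscious choice. It assumes a
hypothetical CountingVec class that counts calls to its own equals()
and hashCode(). IdentityHashMap hashes with System.identityHashCode()
and compares keys with ==. Neither method is ever invoked, however
large the vector is.

```java
import java.util.Arrays;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical value-equal vector that counts calls to equals()/hashCode().
final class CountingVec {
    static int equalityCalls = 0;

    private final double[] values;

    CountingVec(double... values) { this.values = values.clone(); }

    @Override public boolean equals(Object o) {
        equalityCalls++;
        return o instanceof CountingVec && Arrays.equals(values, ((CountingVec) o).values);
    }

    @Override public int hashCode() {
        equalityCalls++;
        return Arrays.hashCode(values);
    }
}

public class IdentityMapDemo {
    public static void main(String[] args) {
        Map<CountingVec, Integer> docIds = new IdentityHashMap<>();
        CountingVec v = new CountingVec(new double[100_000]);
        docIds.put(v, 42);
        Integer id = docIds.get(v);
        // IdentityHashMap uses System.identityHashCode() and ==, so the
        // vector's own equals()/hashCode() are never called.
        // prints: 42 with 0 equals/hashCode calls
        System.out.println(id + " with " + CountingVec.equalityCalls + " equals/hashCode calls");
    }
}
```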

I vaguely recall there was perhaps a discussion of factoring
mathematical vector equality into a separate method, which I support.
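
Here is one possible shape for such a separate method. It uses a
hypothetical TolVec class and a hypothetical approxEquals(other,
epsilon) method; neither is Mahout's API. equals() stays exact and by
value. Tolerance-based comparison, which addresses the floating point
concern, becomes opt-in.

```java
import java.util.Arrays;

// Hypothetical vector (not Mahout's API): equals() stays exact and by
// value; tolerance-based "mathematical" equality is a separate method.
final class TolVec {
    private final double[] values;

    TolVec(double... values) { this.values = values.clone(); }

    @Override public boolean equals(Object o) {
        return o instanceof TolVec && Arrays.equals(values, ((TolVec) o).values);
    }

    @Override public int hashCode() { return Arrays.hashCode(values); }

    /** True if sizes match and every element pair differs by at most epsilon. */
    boolean approxEquals(TolVec other, double epsilon) {
        if (values.length != other.values.length) return false;
        for (int i = 0; i < values.length; i++) {
            // Written as !(diff <= eps) so that NaN elements count as unequal.
            if (!(Math.abs(values[i] - other.values[i]) <= epsilon)) return false;
        }
        return true;
    }
}

public class SeparateEqualityDemo {
    public static void main(String[] args) {
        TolVec a = new TolVec(0.1 + 0.2, 1.0);
        TolVec b = new TolVec(0.3, 1.0);
        System.out.println(a.equals(b));              // false: 0.1 + 0.2 != 0.3 in binary floating point
        System.out.println(a.approxEquals(b, 1e-9));  // true
    }
}
```

Tolerance-based equality is not transitive: a ~ b and b ~ c do not
imply a ~ c. It therefore could not back a consistent hashCode() in any
case, which is one more reason to keep it out of equals().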

But I don't necessarily see how your proposal cures any of the ills
you've mentioned, or makes the existing identity-based choices any
better. It does, however, introduce a contradiction to the
equals-by-value contract, which is what most people should expect
(even if not all of them do).

Thanks.
-d

On Thu, Feb 23, 2012 at 9:24 AM, Jake Mannix wrote:
> Hey Devs.
>
>  Was prototyping some stuff in Mahout last night, and noticed something
> I'm not sure if we've talked about before: because we have equals() for
> Vector instances return true iff the numeric values of the vectors are
> equal, and we also have a consistent hashCode(), anytime you have
> HashMap<Vector, Anything>, all the typical things you think are O(1) are
> really O(vector.numNonZeroes()).  I tried to look through the codebase and
> see where we hang onto maps with vector keys, and we do it sometimes.
>  Maybe we shouldn't?  Most Vectors have identities (clusterId, documentId,
> topicId, etc...) which we could normalize away... or maybe we should be
> using IdentityHashMap, to ensure you're using strict object identity and
> avoid doing this calculation?  This could be really slow if these are big
> dense vectors, for instance.
>
>  This looks like it could be a really easy place to accidentally add heavy
> complexity to things.  Do we really want people to be checking
> *mathematical* equals() on vectors which have floating point precision?
>
>  -jake
