[protobuf] Re: Generated hashcode() returns different values across JVM instances?

Ben Wright Wed, 11 May 2011 14:33:03 -0700

I think we wrote those replies at the same time : )

You're right, at the cost of some additional hash collisions, the
simplest solution is to simply not include the type / descriptor in
the hash calculation at all.


The best / least-collision solutions with good performance would be
what I wrote in my previous post, but that requires that someone
(presumably a current committer) with sufficient knowledge of the
Descriptor types to have enough time to update the compiler and java
libraries accordingly.

Any input from a committer for this issue?  Seems the simple solution
would take less than an hour to push into the stream and could make it
into the next release.

On May 11, 5:25 pm, Ben Wright <compuware...@gmail.com> wrote:
> Alternatively... instead of putting the onus on the compiler, the
> hashcode could be computed by the JVM at initialization time for the
> Descriptor instance, (which would also help performance of dynamically
> parsed Descriptor instance hashcode calls).
>
> i.e.
>
> private final int computedHashcode;
>
> public Descriptor() {
>    //initialization
>
>   computedHashcode = do_compute_hashCode();
>
> }
>
> public int hashCode() {
>     return computedHashcode;
>
> }
>
> punlic int do_compute_hashCode(){
>   return // compute hashcode
>
> }
>
> This is all talking towards optimum performance implementation... the
> real problem is the need for a hashCode implementation for Descriptor
> based on the actual Descriptor's content...
>
> On May 11, 4:54 pm, Ben Wright <compuware...@gmail.com> wrote:
>
>
>
>
>
>
>
> > Jay:
>
> > Using the class name to generate the hashcode is logically incorrect
> > because the class name can be derived by the options java_package_
> > name and java_outer_classname.
>
> > Additionally (although less likely to matter), separate protocol
> > buffer files can define an identical class names with different
> > protocol buffers.
>
> > Lastly, and most importantly...
>
> > If the same Message is being used with generated code and with dynamic
> > code, the hash code for the descriptor would still be identical if
> > generated from the descriptor instance, whereas the dynamic usage does
> > not have a classname from which to derive a hashcode.  While in your
> > case this should not matter, it does matter for other users of
> > protobuf.  The hashcode function would be better served by being
> > implemented correctly from state data for the descriptor.
> > Additionally, in generated code it seems that this hashcode could be
> > pre-computed by the compiler and Descriptor.hashcode() could return a
> > constant integer - which would be much more efficient than any other
> > method.
>
> > On May 11, 3:02 pm, Jay Booth <jaybo...@gmail.com> wrote:
>
> > > It can be legitimate, especially in the case of Object.hashCode(), but
> > > it's supposed to be in sync with equals() by contract.  As it stands,
> > > two objects which are equal() will produce different hashes, or the
> > > same logical object will produce different hashes across JVMs.  That
> > > breaks the contract..  if the equals() method simply did return (other
> > > == this), then it'd be fine, albeit a little useless.
>
> > > I created an issue and posted a 1-liner patch that would eliminate the
> > > problem by using getClass().getName().hashCode() to incorporate type
> > > information into the hashCode without depending on a Descriptor
> > > object's memory address.
>
> > > On May 11, 12:01 am, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
>
> > > > Hi Jay,
>
> > > > I encountered that before. Unfortunately this is a legitimate thing to
> > > > do, as documented in Object.hashCode()
>
> > > > I have a write-up of the problem and how we wound up solving it (not
> > > > elegant.. suggestions welcome) 
> > > > here:http://squarecog.wordpress.com/2011/02/20/hadoop-requires-stable-hash...
>
> > > > D
>
> > > > On Mon, May 9, 2011 at 8:25 AM, Jay Booth <jaybo...@gmail.com> wrote:
> > > > > I'm testing an on-disk hashtable with Protobufs and noticed that with
> > > > > the java generated hashcode function, it seems to return a different
> > > > > hashcode across JVM invocations for the same logically equivalent
> > > > > object (tested with a single string protobuf, same string for both
> > > > > instances).
>
> > > > > Is this known behavior?  Bit busy right now backporting this to work
> > > > > with String keys instead but I could provide a bit of command line
> > > > > code that demonstrates the issue when I get a chance.
>
> > > > > Glancing at the generated hashcode() function, it looks like the
> > > > > difference comes from etiher getDescriptorForType().hashCode() or
> > > > > getUnknownFields().hashCode(), both of which are incorporated.
>
> > > > > --
> > > > > You received this message because you are subscribed to the Google 
> > > > > Groups "Protocol Buffers" group.
> > > > > To post to this group, send email to protobuf@googlegroups.com.
> > > > > To unsubscribe from this group, send email to 
> > > > > protobuf+unsubscr...@googlegroups.com.
> > > > > For more options, visit this group 
> > > > > athttp://groups.google.com/group/protobuf?hl=en.

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.

[protobuf] Re: Generated hashcode() returns different values across JVM instances?

Reply via email to