I think we wrote those replies at the same time :) You're right: at the cost of some additional hash collisions, the simplest solution is to leave the type/descriptor out of the hash calculation entirely.
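Here is a minimal sketch of what dropping the descriptor from the hash could look like. It uses a hypothetical hand-written `PersonMessage` class, not real generated protobuf code. `hashCode()` mixes in only field values. Because `String.hashCode()` is specified by the JDK, equal messages hash identically on every JVM run. The class, its fields, and the multipliers are illustrative assumptions.

```java
import java.util.Objects;

// Hypothetical hand-written message class for illustration; NOT generated
// protobuf code. hashCode() deliberately omits any descriptor/type term.
public final class ContentOnlyHash {

    static final class PersonMessage {
        private final String name;
        private final int id;

        PersonMessage(String name, int id) {
            this.name = name;
            this.id = id;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof PersonMessage)) return false;
            PersonMessage other = (PersonMessage) o;
            return id == other.id && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() {
            // Only field values feed the hash; no identity-based term is mixed in.
            // The multipliers are arbitrary odd constants.
            int hash = 41;
            hash = 19 * hash + Objects.hashCode(name); // String.hashCode() is specified, so stable across JVMs
            hash = 53 * hash + id;
            return hash;
        }
    }

    public static void main(String[] args) {
        PersonMessage a = new PersonMessage("alice", 7);
        PersonMessage b = new PersonMessage("alice", 7);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // true
    }
}
```

The trade-off is the one mentioned above: two different message types whose field values happen to match can now collide, although `equals()` still tells them apart.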
The best solution (fewest collisions, good performance) would be what I described in my previous post. However, it requires someone with enough knowledge of the Descriptor types, presumably a current committer, to find the time to update the compiler and the Java libraries accordingly. Any input from a committer on this issue? The simple solution seems like it would take less than an hour to push into the stream and could make it into the next release.

On May 11, 5:25 pm, Ben Wright <compuware...@gmail.com> wrote:
> Alternatively... instead of putting the onus on the compiler, the
> hashcode could be computed by the JVM at initialization time for each
> Descriptor instance (which would also help the performance of hashCode()
> calls on dynamically parsed Descriptor instances).
>
> i.e.
>
>     private final int computedHashcode;
>
>     public Descriptor() {
>         // initialization
>
>         computedHashcode = do_compute_hashCode();
>     }
>
>     public int hashCode() {
>         return computedHashcode;
>     }
>
>     public int do_compute_hashCode() {
>         return // compute hashcode
>     }
>
> All of this is aimed at an optimal-performance implementation... the
> real problem is the need for a Descriptor hashCode implementation
> based on the Descriptor's actual content...
>
> On May 11, 4:54 pm, Ben Wright <compuware...@gmail.com> wrote:
> > Jay:
> >
> > Using the class name to generate the hashcode is logically incorrect
> > because the class name is determined by the java_package and
> > java_outer_classname options.
> >
> > Additionally (although this is less likely to matter), separate protocol
> > buffer files can define identical class names for different
> > protocol buffers.
> >
> > Lastly, and most importantly...
> >
> > If the same Message is used with both generated code and dynamic
> > code, a hash code computed from the descriptor instance would still be
> > identical in both cases, whereas the dynamic usage does not have a
> > class name from which to derive a hashcode.
> > While this should not matter in your case, it does matter for other
> > users of protobuf. The hashcode function would be better served by
> > being implemented correctly from the descriptor's state data.
> > Additionally, in generated code it seems that this hashcode could be
> > precomputed by the compiler, and Descriptor.hashCode() could return a
> > constant integer, which would be much more efficient than any other
> > method.
> >
> > On May 11, 3:02 pm, Jay Booth <jaybo...@gmail.com> wrote:
> > > It can be legitimate, especially in the case of Object.hashCode(), but
> > > by contract it's supposed to be in sync with equals(). As it stands,
> > > two objects that are equal() will produce different hashes, or the
> > > same logical object will produce different hashes across JVMs. That
> > > breaks the contract. If the equals() method simply did return (other
> > > == this), then it'd be fine, albeit a little useless.
> > >
> > > I created an issue and posted a one-line patch that would eliminate the
> > > problem by using getClass().getName().hashCode() to incorporate type
> > > information into the hashCode without depending on a Descriptor
> > > object's memory address.
> > >
> > > On May 11, 12:01 am, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> > > > Hi Jay,
> > > >
> > > > I've encountered that before. Unfortunately, this is a legitimate thing
> > > > to do, as documented in Object.hashCode().
> > > >
> > > > I have a write-up of the problem and how we wound up solving it (not
> > > > elegant... suggestions welcome)
> > > > here: http://squarecog.wordpress.com/2011/02/20/hadoop-requires-stable-hash...
> > > >
> > > > D
> > > >
> > > > On Mon, May 9, 2011 at 8:25 AM, Jay Booth <jaybo...@gmail.com> wrote:
> > > > > I'm testing an on-disk hashtable with Protobufs and noticed that
> > > > > the Java generated hashCode function seems to return a different
> > > > > hashcode across JVM invocations for the same logically equivalent
> > > > > object (tested with a single-string protobuf, same string for both
> > > > > instances).
> > > > >
> > > > > Is this known behavior? I'm a bit busy right now backporting this to
> > > > > work with String keys instead, but I could provide a bit of
> > > > > command-line code that demonstrates the issue when I get a chance.
> > > > >
> > > > > Glancing at the generated hashCode() function, it looks like the
> > > > > difference comes from either getDescriptorForType().hashCode() or
> > > > > getUnknownFields().hashCode(), both of which are incorporated.
> > > > >
> > > > > --
> > > > > You received this message because you are subscribed to the Google
> > > > > Groups "Protocol Buffers" group.
> > > > > To post to this group, send email to protobuf@googlegroups.com.
> > > > > To unsubscribe from this group, send email to
> > > > > protobuf+unsubscr...@googlegroups.com.
> > > > > For more options, visit this group
> > > > > at http://groups.google.com/group/protobuf?hl=en.
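The content-based, precomputed Descriptor hash proposed in the quoted May 11, 5:25 pm message might look roughly like the sketch below. `SimpleDescriptor`, its `fullName` and `fieldNames` fields, and the hash formula are all hypothetical; the real `com.google.protobuf.Descriptors` classes carry much more state. The point is that the hash is computed once from content. Two separately built descriptors with the same content therefore agree, for example one from generated code and one parsed dynamically. An identity-based `Object.hashCode()` gives no such guarantee.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical descriptor-like class for illustration only; this is NOT
// the com.google.protobuf.Descriptors API.
public final class CachedDescriptorHash {

    static final class SimpleDescriptor {
        private final String fullName;          // e.g. "example.Person" (hypothetical)
        private final List<String> fieldNames;  // field names in declaration order
        private final int computedHashCode;     // cached once, never recomputed

        SimpleDescriptor(String fullName, List<String> fieldNames) {
            this.fullName = Objects.requireNonNull(fullName);
            this.fieldNames = List.copyOf(fieldNames); // immutable copy, so the cached hash stays valid
            this.computedHashCode = computeHashCode();
        }

        // Private, so it cannot be overridden and observed mid-construction.
        private int computeHashCode() {
            // String.hashCode() and List.hashCode() are fully specified by the JDK,
            // so the result depends only on content, not on memory addresses.
            return 31 * fullName.hashCode() + fieldNames.hashCode();
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof SimpleDescriptor)) return false;
            SimpleDescriptor other = (SimpleDescriptor) o;
            return fullName.equals(other.fullName) && fieldNames.equals(other.fieldNames);
        }

        @Override
        public int hashCode() {
            return computedHashCode; // O(1): just returns the cached value
        }
    }

    public static void main(String[] args) {
        SimpleDescriptor d1 = new SimpleDescriptor("example.Person", List.of("name", "id"));
        SimpleDescriptor d2 = new SimpleDescriptor("example.Person", List.of("name", "id"));
        System.out.println(d1 == d2);                       // false: two distinct instances
        System.out.println(d1.equals(d2));                  // true
        System.out.println(d1.hashCode() == d2.hashCode()); // true
    }
}
```

Caching in the constructor only works because the descriptor is immutable. If its content could change after construction, the cached value would silently go stale.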