[protobuf] Re: Generated hashcode() returns different values across JVM instances?
Well, that's your prerogative, I guess, but why even implement hashcode at all then? Just inherit from object and you're getting effectively the same behavior. Is that what you're intending? On May 16, 10:03 am, Pherl Liu liuj...@google.com wrote: We discussed internally and decided not to make the hashCode() return deterministic result. If you need consistent hashcode in different runs, use toByteString().hashCode(). Quoted from Kenton: Hashing the content of the descriptor would actually be incorrect, because two descriptors with exactly the same content are still considered different types. Descriptors are compared by identity, hence they are hashed by pointer. Removing the descriptor from the calculation would indeed make hashCode() consistent between two runs of the same binary, and probably insignificant runtime cost. Of course, once you do that, you will never be able to introduce non-determinism again because people will depend on it. But there's a much bigger risk. People may actually start depending on hashCode() returning consistent results between two different versions of the binary, or two completely separate binaries that compile in the same protocol, or -- most dangerously -- two different versions of the same protocol (e.g. with fields added or removed). I think it would be very difficult and limiting to make these guarantees, so I would be extremely cautious about this. Certainly, there is no implementation of hashCode() that would be any safer than .toByteString().hashCode(). So, I'd advise steering people to the latter. Note that if unknown fields are present, the results may still be inconsistent. However, there is no reasonable way to implement a hashCode() that is consistent in the presence of unknown fields. On Thu, May 12, 2011 at 5:32 AM, Ben Wright compuware...@gmail.com wrote: I think we wrote those replies at the same time : ) You're right, at the cost of some additional hash collisions, the simplest solution is to simply not include the type / descriptor in the hash calculation at all. The best / least-collision solutions with good performance would be what I wrote in my previous post, but that requires that someone (presumably a current committer) with sufficient knowledge of the Descriptor types to have enough time to update the compiler and java libraries accordingly. Any input from a committer for this issue? Seems the simple solution would take less than an hour to push into the stream and could make it into the next release. On May 11, 5:25 pm, Ben Wright compuware...@gmail.com wrote: Alternatively... instead of putting the onus on the compiler, the hashcode could be computed by the JVM at initialization time for the Descriptor instance, (which would also help performance of dynamically parsed Descriptor instance hashcode calls). i.e. private final int computedHashcode; public Descriptor() { //initialization computedHashcode = do_compute_hashCode(); } public int hashCode() { return computedHashcode; } punlic int do_compute_hashCode(){ return // compute hashcode } This is all talking towards optimum performance implementation... the real problem is the need for a hashCode implementation for Descriptor based on the actual Descriptor's content... On May 11, 4:54 pm, Ben Wright compuware...@gmail.com wrote: Jay: Using the class name to generate the hashcode is logically incorrect because the class name can be derived by the options java_package_ name and java_outer_classname. Additionally (although less likely to matter), separate protocol buffer files can define an identical class names with different protocol buffers. Lastly, and most importantly... If the same Message is being used with generated code and with dynamic code, the hash code for the descriptor would still be identical if generated from the descriptor instance, whereas the dynamic usage does not have a classname from which to derive a hashcode. While in your case this should not matter, it does matter for other users of protobuf. The hashcode function would be better served by being implemented correctly from state data for the descriptor. Additionally, in generated code it seems that this hashcode could be pre-computed by the compiler and Descriptor.hashcode() could return a constant integer - which would be much more efficient than any other method. On May 11, 3:02 pm, Jay Booth jaybo...@gmail.com wrote: It can be legitimate, especially in the case of Object.hashCode(), but it's supposed to be in sync with equals() by contract. As it stands, two objects which are equal() will produce different hashes, or the same logical object will produce different hashes across JVMs. That breaks the contract.. if the equals() method simply did
Re: [protobuf] Re: Generated hashcode() returns different values across JVM instances?
Jumping in late to the thread, and I'm not really a Java person, so I may be misunderstanding something here. But as far as I can tell, you are asking for hashCode() to be a 'consistent' hash across processes. hashCode() as implemented is still useful within a single JVM, allowing you to use protobufs in HashMaps based on content rather than object identity. That was the intended use case. On Wed, May 18, 2011 at 11:48 AM, Jay Booth jaybo...@gmail.com wrote: Well, that's your prerogative, I guess, but why even implement hashcode at all then? Just inherit from object and you're getting effectively the same behavior. Is that what you're intending? On May 16, 10:03 am, Pherl Liu liuj...@google.com wrote: We discussed internally and decided not to make the hashCode() return deterministic result. If you need consistent hashcode in different runs, use toByteString().hashCode(). Quoted from Kenton: Hashing the content of the descriptor would actually be incorrect, because two descriptors with exactly the same content are still considered different types. Descriptors are compared by identity, hence they are hashed by pointer. Removing the descriptor from the calculation would indeed make hashCode() consistent between two runs of the same binary, and probably insignificant runtime cost. Of course, once you do that, you will never be able to introduce non-determinism again because people will depend on it. But there's a much bigger risk. People may actually start depending on hashCode() returning consistent results between two different versions of the binary, or two completely separate binaries that compile in the same protocol, or -- most dangerously -- two different versions of the same protocol (e.g. with fields added or removed). I think it would be very difficult and limiting to make these guarantees, so I would be extremely cautious about this. Certainly, there is no implementation of hashCode() that would be any safer than .toByteString().hashCode(). So, I'd advise steering people to the latter. Note that if unknown fields are present, the results may still be inconsistent. However, there is no reasonable way to implement a hashCode() that is consistent in the presence of unknown fields. On Thu, May 12, 2011 at 5:32 AM, Ben Wright compuware...@gmail.com wrote: I think we wrote those replies at the same time : ) You're right, at the cost of some additional hash collisions, the simplest solution is to simply not include the type / descriptor in the hash calculation at all. The best / least-collision solutions with good performance would be what I wrote in my previous post, but that requires that someone (presumably a current committer) with sufficient knowledge of the Descriptor types to have enough time to update the compiler and java libraries accordingly. Any input from a committer for this issue? Seems the simple solution would take less than an hour to push into the stream and could make it into the next release. On May 11, 5:25 pm, Ben Wright compuware...@gmail.com wrote: Alternatively... instead of putting the onus on the compiler, the hashcode could be computed by the JVM at initialization time for the Descriptor instance, (which would also help performance of dynamically parsed Descriptor instance hashcode calls). i.e. private final int computedHashcode; public Descriptor() { //initialization computedHashcode = do_compute_hashCode(); } public int hashCode() { return computedHashcode; } punlic int do_compute_hashCode(){ return // compute hashcode } This is all talking towards optimum performance implementation... the real problem is the need for a hashCode implementation for Descriptor based on the actual Descriptor's content... On May 11, 4:54 pm, Ben Wright compuware...@gmail.com wrote: Jay: Using the class name to generate the hashcode is logically incorrect because the class name can be derived by the options java_package_ name and java_outer_classname. Additionally (although less likely to matter), separate protocol buffer files can define an identical class names with different protocol buffers. Lastly, and most importantly... If the same Message is being used with generated code and with dynamic code, the hash code for the descriptor would still be identical if generated from the descriptor instance, whereas the dynamic usage does not have a classname from which to derive a hashcode. While in your case this should not matter, it does matter for other users of protobuf. The hashcode function would be better served by being implemented correctly from state data for the descriptor. Additionally, in generated code it seems that this hashcode could
Re: [protobuf] Re: Generated hashcode() returns different values across JVM instances?
Ah, I didn't get this message. How does this break use in a HashMap? Given two protobufs of the same type in the same JVM, with the same contents, does hashCode() not return the same value? On Wed, May 18, 2011 at 11:50 AM, Jay Booth jaybo...@gmail.com wrote: I'd also note that this precludes usage of protobuf objects in HashSets or as keys in HashMaps -- any time someone does that, it will be a (subtle and confusing) bug, as hashcode() isn't aligned with equals(). On May 18, 2:48 pm, Jay Booth jaybo...@gmail.com wrote: Well, that's your prerogative, I guess, but why even implement hashcode at all then? Just inherit from object and you're getting effectively the same behavior. Is that what you're intending? On May 16, 10:03 am, Pherl Liu liuj...@google.com wrote: We discussed internally and decided not to make the hashCode() return deterministic result. If you need consistent hashcode in different runs, use toByteString().hashCode(). Quoted from Kenton: Hashing the content of the descriptor would actually be incorrect, because two descriptors with exactly the same content are still considered different types. Descriptors are compared by identity, hence they are hashed by pointer. Removing the descriptor from the calculation would indeed make hashCode() consistent between two runs of the same binary, and probably insignificant runtime cost. Of course, once you do that, you will never be able to introduce non-determinism again because people will depend on it. But there's a much bigger risk. People may actually start depending on hashCode() returning consistent results between two different versions of the binary, or two completely separate binaries that compile in the same protocol, or -- most dangerously -- two different versions of the same protocol (e.g. with fields added or removed). I think it would be very difficult and limiting to make these guarantees, so I would be extremely cautious about this. Certainly, there is no implementation of hashCode() that would be any safer than .toByteString().hashCode(). So, I'd advise steering people to the latter. Note that if unknown fields are present, the results may still be inconsistent. However, there is no reasonable way to implement a hashCode() that is consistent in the presence of unknown fields. On Thu, May 12, 2011 at 5:32 AM, Ben Wright compuware...@gmail.com wrote: I think we wrote those replies at the same time : ) You're right, at the cost of some additional hash collisions, the simplest solution is to simply not include the type / descriptor in the hash calculation at all. The best / least-collision solutions with good performance would be what I wrote in my previous post, but that requires that someone (presumably a current committer) with sufficient knowledge of the Descriptor types to have enough time to update the compiler and java libraries accordingly. Any input from a committer for this issue? Seems the simple solution would take less than an hour to push into the stream and could make it into the next release. On May 11, 5:25 pm, Ben Wright compuware...@gmail.com wrote: Alternatively... instead of putting the onus on the compiler, the hashcode could be computed by the JVM at initialization time for the Descriptor instance, (which would also help performance of dynamically parsed Descriptor instance hashcode calls). i.e. private final int computedHashcode; public Descriptor() { //initialization computedHashcode = do_compute_hashCode(); } public int hashCode() { return computedHashcode; } punlic int do_compute_hashCode(){ return // compute hashcode } This is all talking towards optimum performance implementation... the real problem is the need for a hashCode implementation for Descriptor based on the actual Descriptor's content... On May 11, 4:54 pm, Ben Wright compuware...@gmail.com wrote: Jay: Using the class name to generate the hashcode is logically incorrect because the class name can be derived by the options java_package_ name and java_outer_classname. Additionally (although less likely to matter), separate protocol buffer files can define an identical class names with different protocol buffers. Lastly, and most importantly... If the same Message is being used with generated code and with dynamic code, the hash code for the descriptor would still be identical if generated from the descriptor instance, whereas the dynamic usage does not have a classname from which to derive a hashcode. While in your case this should not matter, it does matter for other users of protobuf. The hashcode function would be
Re: [protobuf] Re: Generated hashcode() returns different values across JVM instances?
the corner-stone of Hash* containers is: (A.equals(B)) = (A.hashCode() == B.hashCode()) for all A, B. tho it's not explicitly stated it would seem to be implied that is within a single JVM. not sure if the code in question maintains that rule within a JVM (if not that's a big deal). if so that would seem sufficient for all but the most distributed of Hash* containers, such as where a client which is remote from the storage (i.e, in another JVM) determines the bucket (based on hashCode()) to find that the element in question has been placed into another bucket because the hashCode() within the containing JVM has evaluated to another value. that's a pretty far-fetched, but not unimaginable, situation. From: Jason Hsueh jas...@google.com To: Jay Booth jaybo...@gmail.com Cc: Protocol Buffers protobuf@googlegroups.com Sent: Wed, May 18, 2011 12:00:19 PM Subject: Re: [protobuf] Re: Generated hashcode() returns different values across JVM instances? Jumping in late to the thread, and I'm not really a Java person, so I may be misunderstanding something here. But as far as I can tell, you are asking for hashCode() to be a 'consistent' hash across processes. hashCode() as implemented is still useful within a single JVM, allowing you to use protobufs in HashMaps based on content rather than object identity. That was the intended use case. On Wed, May 18, 2011 at 11:48 AM, Jay Booth jaybo...@gmail.com wrote: Well, that's your prerogative, I guess, but why even implement hashcode at all then? Just inherit from object and you're getting effectively the same behavior. Is that what you're intending? On May 16, 10:03 am, Pherl Liu liuj...@google.com wrote: We discussed internally and decided not to make the hashCode() return deterministic result. If you need consistent hashcode in different runs, use toByteString().hashCode(). Quoted from Kenton: Hashing the content of the descriptor would actually be incorrect, because two descriptors with exactly the same content are still considered different types. Descriptors are compared by identity, hence they are hashed by pointer. Removing the descriptor from the calculation would indeed make hashCode() consistent between two runs of the same binary, and probably insignificant runtime cost. Of course, once you do that, you will never be able to introduce non-determinism again because people will depend on it. But there's a much bigger risk. People may actually start depending on hashCode() returning consistent results between two different versions of the binary, or two completely separate binaries that compile in the same protocol, or -- most dangerously -- two different versions of the same protocol (e.g. with fields added or removed). I think it would be very difficult and limiting to make these guarantees, so I would be extremely cautious about this. Certainly, there is no implementation of hashCode() that would be any safer than .toByteString().hashCode(). So, I'd advise steering people to the latter. Note that if unknown fields are present, the results may still be inconsistent. However, there is no reasonable way to implement a hashCode() that is consistent in the presence of unknown fields. On Thu, May 12, 2011 at 5:32 AM, Ben Wright compuware...@gmail.com wrote: I think we wrote those replies at the same time : ) You're right, at the cost of some additional hash collisions, the simplest solution is to simply not include the type / descriptor in the hash calculation at all. The best / least-collision solutions with good performance would be what I wrote in my previous post, but that requires that someone (presumably a current committer) with sufficient knowledge of the Descriptor types to have enough time to update the compiler and java libraries accordingly. Any input from a committer for this issue? Seems the simple solution would take less than an hour to push into the stream and could make it into the next release. On May 11, 5:25 pm, Ben Wright compuware...@gmail.com wrote: Alternatively... instead of putting the onus on the compiler, the hashcode could be computed by the JVM at initialization time for the Descriptor instance, (which would also help performance of dynamically parsed Descriptor instance hashcode calls). i.e. private final int computedHashcode; public Descriptor() { //initialization computedHashcode = do_compute_hashCode(); } public int hashCode() { return computedHashcode; } punlic int do_compute_hashCode(){ return // compute hashcode } This is all talking towards optimum performance implementation... the real problem is the need for a hashCode implementation for Descriptor based on the actual Descriptor's content... On May 11, 4:54 pm, Ben Wright compuware...@gmail.com wrote: Jay: Using the class name to generate
Re: [protobuf] Re: Generated hashcode() returns different values across JVM instances?
On Wed, May 18, 2011 at 12:08 PM, Ron Reynolds tequila...@ymail.com wrote: the corner-stone of Hash* containers is: (A.equals(B)) = (A.hashCode() == B.hashCode()) for all A, B. From what Ive gathered, this invariant is true for protocol buffer's implementation of equals() and hashCode(). The descriptor object identity is included in both methods. -- You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
[protobuf] Re: Generated hashcode() returns different values across JVM instances?
Right, I forgot that the Descriptor object is statically initialized so it'll be consistent within the JVM. I still think that in the context of serializable objects, (i.e. objects intended to be transported between JVMs in some way, shape or form) a consistent hashCode would be useful for a lot of cases (mine included). If it can't be consistent in the presence of unknown fields, perhaps a well-documented caution to that (relatively uncommon?) case would be more useful than completely punting on all cases. On May 18, 3:08 pm, Ron Reynolds tequila...@ymail.com wrote: the corner-stone of Hash* containers is: (A.equals(B)) = (A.hashCode() == B.hashCode()) for all A, B. tho it's not explicitly stated it would seem to be implied that is within a single JVM. not sure if the code in question maintains that rule within a JVM (if not that's a big deal). if so that would seem sufficient for all but the most distributed of Hash* containers, such as where a client which is remote from the storage (i.e, in another JVM) determines the bucket (based on hashCode()) to find that the element in question has been placed into another bucket because the hashCode() within the containing JVM has evaluated to another value. that's a pretty far-fetched, but not unimaginable, situation. From: Jason Hsueh jas...@google.com To: Jay Booth jaybo...@gmail.com Cc: Protocol Buffers protobuf@googlegroups.com Sent: Wed, May 18, 2011 12:00:19 PM Subject: Re: [protobuf] Re: Generated hashcode() returns different values across JVM instances? Jumping in late to the thread, and I'm not really a Java person, so I may be misunderstanding something here. But as far as I can tell, you are asking for hashCode() to be a 'consistent' hash across processes. hashCode() as implemented is still useful within a single JVM, allowing you to use protobufs in HashMaps based on content rather than object identity. That was the intended use case. On Wed, May 18, 2011 at 11:48 AM, Jay Booth jaybo...@gmail.com wrote: Well, that's your prerogative, I guess, but why even implement hashcode at all then? Just inherit from object and you're getting effectively the same behavior. Is that what you're intending? On May 16, 10:03 am, Pherl Liu liuj...@google.com wrote: We discussed internally and decided not to make the hashCode() return deterministic result. If you need consistent hashcode in different runs, use toByteString().hashCode(). Quoted from Kenton: Hashing the content of the descriptor would actually be incorrect, because two descriptors with exactly the same content are still considered different types. Descriptors are compared by identity, hence they are hashed by pointer. Removing the descriptor from the calculation would indeed make hashCode() consistent between two runs of the same binary, and probably insignificant runtime cost. Of course, once you do that, you will never be able to introduce non-determinism again because people will depend on it. But there's a much bigger risk. People may actually start depending on hashCode() returning consistent results between two different versions of the binary, or two completely separate binaries that compile in the same protocol, or -- most dangerously -- two different versions of the same protocol (e.g. with fields added or removed). I think it would be very difficult and limiting to make these guarantees, so I would be extremely cautious about this. Certainly, there is no implementation of hashCode() that would be any safer than .toByteString().hashCode(). So, I'd advise steering people to the latter. Note that if unknown fields are present, the results may still be inconsistent. However, there is no reasonable way to implement a hashCode() that is consistent in the presence of unknown fields. On Thu, May 12, 2011 at 5:32 AM, Ben Wright compuware...@gmail.com wrote: I think we wrote those replies at the same time : ) You're right, at the cost of some additional hash collisions, the simplest solution is to simply not include the type / descriptor in the hash calculation at all. The best / least-collision solutions with good performance would be what I wrote in my previous post, but that requires that someone (presumably a current committer) with sufficient knowledge of the Descriptor types to have enough time to update the compiler and java libraries accordingly. Any input from a committer for this issue? Seems the simple solution would take less than an hour to push into the stream and could make it into the next release. On May 11, 5:25 pm, Ben Wright compuware...@gmail.com wrote: Alternatively... instead of putting the onus on the compiler, the hashcode could be computed by the JVM at initialization time for the Descriptor instance, (which would also help performance of dynamically parsed
Re: [protobuf] Re: Generated hashcode() returns different values across JVM instances?
Well, hashing the serialization as Pherl/Kenton suggested gives you that, and seems cleaner and more direct than making hashCode() of the in-memory object assume that you are doing something across JVMs. (Also, unknown fields are pretty common when you may have a jungle of binaries, which is true here.) On Wed, May 18, 2011 at 12:19 PM, Jay Booth jaybo...@gmail.com wrote: Right, I forgot that the Descriptor object is statically initialized so it'll be consistent within the JVM. I still think that in the context of serializable objects, (i.e. objects intended to be transported between JVMs in some way, shape or form) a consistent hashCode would be useful for a lot of cases (mine included). If it can't be consistent in the presence of unknown fields, perhaps a well-documented caution to that (relatively uncommon?) case would be more useful than completely punting on all cases. On May 18, 3:08 pm, Ron Reynolds tequila...@ymail.com wrote: the corner-stone of Hash* containers is: (A.equals(B)) = (A.hashCode() == B.hashCode()) for all A, B. tho it's not explicitly stated it would seem to be implied that is within a single JVM. not sure if the code in question maintains that rule within a JVM (if not that's a big deal). if so that would seem sufficient for all but the most distributed of Hash* containers, such as where a client which is remote from the storage (i.e, in another JVM) determines the bucket (based on hashCode()) to find that the element in question has been placed into another bucket because the hashCode() within the containing JVM has evaluated to another value. that's a pretty far-fetched, but not unimaginable, situation. From: Jason Hsueh jas...@google.com To: Jay Booth jaybo...@gmail.com Cc: Protocol Buffers protobuf@googlegroups.com Sent: Wed, May 18, 2011 12:00:19 PM Subject: Re: [protobuf] Re: Generated hashcode() returns different values across JVM instances? Jumping in late to the thread, and I'm not really a Java person, so I may be misunderstanding something here. But as far as I can tell, you are asking for hashCode() to be a 'consistent' hash across processes. hashCode() as implemented is still useful within a single JVM, allowing you to use protobufs in HashMaps based on content rather than object identity. That was the intended use case. On Wed, May 18, 2011 at 11:48 AM, Jay Booth jaybo...@gmail.com wrote: Well, that's your prerogative, I guess, but why even implement hashcode at all then? Just inherit from object and you're getting effectively the same behavior. Is that what you're intending? On May 16, 10:03 am, Pherl Liu liuj...@google.com wrote: We discussed internally and decided not to make the hashCode() return deterministic result. If you need consistent hashcode in different runs, use toByteString().hashCode(). Quoted from Kenton: Hashing the content of the descriptor would actually be incorrect, because two descriptors with exactly the same content are still considered different types. Descriptors are compared by identity, hence they are hashed by pointer. Removing the descriptor from the calculation would indeed make hashCode() consistent between two runs of the same binary, and probably insignificant runtime cost. Of course, once you do that, you will never be able to introduce non-determinism again because people will depend on it. But there's a much bigger risk. People may actually start depending on hashCode() returning consistent results between two different versions of the binary, or two completely separate binaries that compile in the same protocol, or -- most dangerously -- two different versions of the same protocol (e.g. with fields added or removed). I think it would be very difficult and limiting to make these guarantees, so I would be extremely cautious about this. Certainly, there is no implementation of hashCode() that would be any safer than .toByteString().hashCode(). So, I'd advise steering people to the latter. Note that if unknown fields are present, the results may still be inconsistent. However, there is no reasonable way to implement a hashCode() that is consistent in the presence of unknown fields. On Thu, May 12, 2011 at 5:32 AM, Ben Wright compuware...@gmail.com wrote: I think we wrote those replies at the same time : ) You're right, at the cost of some additional hash collisions, the simplest solution is to simply not include the type / descriptor in the hash calculation at all. The best / least-collision solutions with good performance would be what I wrote in my previous post, but that requires that someone (presumably a current committer) with sufficient knowledge of the Descriptor types to have enough time to update the compiler and java
Re: [protobuf] Re: Generated hashcode() returns different values across JVM instances?
That definitely seems acceptable. Any reason not to make the default implementation return toByteString().hashCode() and get the best of both worlds? It's a little more expensive probably, but Object.hashCode() is already deceptively expensive to begin with. I personally switched my particular problem to use Strings as keys, since that works for the use case here and is easier for corresponding command-line tools, I just generally see a consistent hashcode as a good idea if supportable. On Wed, May 18, 2011 at 3:33 PM, Jason Hsueh jas...@google.com wrote: Well, hashing the serialization as Pherl/Kenton suggested gives you that, and seems cleaner and more direct than making hashCode() of the in-memory object assume that you are doing something across JVMs. (Also, unknown fields are pretty common when you may have a jungle of binaries, which is true here.) On Wed, May 18, 2011 at 12:19 PM, Jay Booth jaybo...@gmail.com wrote: Right, I forgot that the Descriptor object is statically initialized so it'll be consistent within the JVM. I still think that in the context of serializable objects, (i.e. objects intended to be transported between JVMs in some way, shape or form) a consistent hashCode would be useful for a lot of cases (mine included). If it can't be consistent in the presence of unknown fields, perhaps a well-documented caution to that (relatively uncommon?) case would be more useful than completely punting on all cases. On May 18, 3:08 pm, Ron Reynolds tequila...@ymail.com wrote: the corner-stone of Hash* containers is: (A.equals(B)) = (A.hashCode() == B.hashCode()) for all A, B. tho it's not explicitly stated it would seem to be implied that is within a single JVM. not sure if the code in question maintains that rule within a JVM (if not that's a big deal). if so that would seem sufficient for all but the most distributed of Hash* containers, such as where a client which is remote from the storage (i.e, in another JVM) determines the bucket (based on hashCode()) to find that the element in question has been placed into another bucket because the hashCode() within the containing JVM has evaluated to another value. that's a pretty far-fetched, but not unimaginable, situation. From: Jason Hsueh jas...@google.com To: Jay Booth jaybo...@gmail.com Cc: Protocol Buffers protobuf@googlegroups.com Sent: Wed, May 18, 2011 12:00:19 PM Subject: Re: [protobuf] Re: Generated hashcode() returns different values across JVM instances? Jumping in late to the thread, and I'm not really a Java person, so I may be misunderstanding something here. But as far as I can tell, you are asking for hashCode() to be a 'consistent' hash across processes. hashCode() as implemented is still useful within a single JVM, allowing you to use protobufs in HashMaps based on content rather than object identity. That was the intended use case. On Wed, May 18, 2011 at 11:48 AM, Jay Booth jaybo...@gmail.com wrote: Well, that's your prerogative, I guess, but why even implement hashcode at all then? Just inherit from object and you're getting effectively the same behavior. Is that what you're intending? On May 16, 10:03 am, Pherl Liu liuj...@google.com wrote: We discussed internally and decided not to make the hashCode() return deterministic result. If you need consistent hashcode in different runs, use toByteString().hashCode(). Quoted from Kenton: Hashing the content of the descriptor would actually be incorrect, because two descriptors with exactly the same content are still considered different types. Descriptors are compared by identity, hence they are hashed by pointer. Removing the descriptor from the calculation would indeed make hashCode() consistent between two runs of the same binary, and probably insignificant runtime cost. Of course, once you do that, you will never be able to introduce non-determinism again because people will depend on it. But there's a much bigger risk. People may actually start depending on hashCode() returning consistent results between two different versions of the binary, or two completely separate binaries that compile in the same protocol, or -- most dangerously -- two different versions of the same protocol (e.g. with fields added or removed). I think it would be very difficult and limiting to make these guarantees, so I would be extremely cautious about this. Certainly, there is no implementation of hashCode() that would be any safer than .toByteString().hashCode(). So, I'd advise steering people to the latter. Note that if unknown fields are present, the results may still be inconsistent. However, there is no reasonable way to implement a hashCode() that is consistent in the presence of unknown
Re: [protobuf] Re: Generated hashcode() returns different values across JVM instances?
On Wed, May 18, 2011 at 12:44 PM, Jay Booth jaybo...@gmail.com wrote: That definitely seems acceptable. Any reason not to make the default implementation return toByteString().hashCode() and get the best of both worlds? It's a little more expensive probably, but Object.hashCode() is already deceptively expensive to begin with. I'm not well-enough versed in Java to say with authority, but as far as I can tell, hashCode() is intended for HashMap/HashSet, and assuming that Object.hashCode() provides some consistent hash is incorrect. Yes, protobuf could attempt to provide a consistent hash with toByteString().hashCode(), except we already know that there are caveats to that, so it's not like we can provide any contractual guarantee. Applications like yours that need a stable hash could implement a wrapper object that overrides hashCode() to use this approach, or perhaps simply use a different method altogether to express the stability requirement. I personally switched my particular problem to use Strings as keys, I think you want to avoid using Strings to store the serializations since this is binary data, and String will coerce that to UTF. since that works for the use case here and is easier for corresponding command-line tools, I just generally see a consistent hashcode as a good idea if supportable. On Wed, May 18, 2011 at 3:33 PM, Jason Hsueh jas...@google.com wrote: Well, hashing the serialization as Pherl/Kenton suggested gives you that, and seems cleaner and more direct than making hashCode() of the in-memory object assume that you are doing something across JVMs. (Also, unknown fields are pretty common when you may have a jungle of binaries, which is true here.) On Wed, May 18, 2011 at 12:19 PM, Jay Booth jaybo...@gmail.com wrote: Right, I forgot that the Descriptor object is statically initialized so it'll be consistent within the JVM. I still think that in the context of serializable objects, (i.e. objects intended to be transported between JVMs in some way, shape or form) a consistent hashCode would be useful for a lot of cases (mine included). If it can't be consistent in the presence of unknown fields, perhaps a well-documented caution to that (relatively uncommon?) case would be more useful than completely punting on all cases. On May 18, 3:08 pm, Ron Reynolds tequila...@ymail.com wrote: the corner-stone of Hash* containers is: (A.equals(B)) = (A.hashCode() == B.hashCode()) for all A, B. tho it's not explicitly stated it would seem to be implied that is within a single JVM. not sure if the code in question maintains that rule within a JVM (if not that's a big deal). if so that would seem sufficient for all but the most distributed of Hash* containers, such as where a client which is remote from the storage (i.e, in another JVM) determines the bucket (based on hashCode()) to find that the element in question has been placed into another bucket because the hashCode() within the containing JVM has evaluated to another value. that's a pretty far-fetched, but not unimaginable, situation. From: Jason Hsueh jas...@google.com To: Jay Booth jaybo...@gmail.com Cc: Protocol Buffers protobuf@googlegroups.com Sent: Wed, May 18, 2011 12:00:19 PM Subject: Re: [protobuf] Re: Generated hashcode() returns different values across JVM instances? Jumping in late to the thread, and I'm not really a Java person, so I may be misunderstanding something here. But as far as I can tell, you are asking for hashCode() to be a 'consistent' hash across processes. hashCode() as implemented is still useful within a single JVM, allowing you to use protobufs in HashMaps based on content rather than object identity. That was the intended use case. On Wed, May 18, 2011 at 11:48 AM, Jay Booth jaybo...@gmail.com wrote: Well, that's your prerogative, I guess, but why even implement hashcode at all then? Just inherit from object and you're getting effectively the same behavior. Is that what you're intending? On May 16, 10:03 am, Pherl Liu liuj...@google.com wrote: We discussed internally and decided not to make the hashCode() return deterministic result. If you need consistent hashcode in different runs, use toByteString().hashCode(). Quoted from Kenton: Hashing the content of the descriptor would actually be incorrect, because two descriptors with exactly the same content are still considered different types. Descriptors are compared by identity, hence they are hashed by pointer. Removing the descriptor from the calculation would indeed make hashCode() consistent between two runs of the same binary, and probably insignificant runtime cost. Of course, once you do that, you will never
Re: [protobuf] Re: Generated hashcode() returns different values across JVM instances?
We discussed internally and decided not to make the hashCode() return deterministic result. If you need consistent hashcode in different runs, use toByteString().hashCode(). Quoted from Kenton: Hashing the content of the descriptor would actually be incorrect, because two descriptors with exactly the same content are still considered different types. Descriptors are compared by identity, hence they are hashed by pointer. Removing the descriptor from the calculation would indeed make hashCode() consistent between two runs of the same binary, and probably insignificant runtime cost. Of course, once you do that, you will never be able to introduce non-determinism again because people will depend on it. But there's a much bigger risk. People may actually start depending on hashCode() returning consistent results between two different versions of the binary, or two completely separate binaries that compile in the same protocol, or -- most dangerously -- two different versions of the same protocol (e.g. with fields added or removed). I think it would be very difficult and limiting to make these guarantees, so I would be extremely cautious about this. Certainly, there is no implementation of hashCode() that would be any safer than .toByteString().hashCode(). So, I'd advise steering people to the latter. Note that if unknown fields are present, the results may still be inconsistent. However, there is no reasonable way to implement a hashCode() that is consistent in the presence of unknown fields. On Thu, May 12, 2011 at 5:32 AM, Ben Wright compuware...@gmail.com wrote: I think we wrote those replies at the same time : ) You're right, at the cost of some additional hash collisions, the simplest solution is to simply not include the type / descriptor in the hash calculation at all. The best / least-collision solutions with good performance would be what I wrote in my previous post, but that requires that someone (presumably a current committer) with sufficient knowledge of the Descriptor types to have enough time to update the compiler and java libraries accordingly. Any input from a committer for this issue? Seems the simple solution would take less than an hour to push into the stream and could make it into the next release. On May 11, 5:25 pm, Ben Wright compuware...@gmail.com wrote: Alternatively... instead of putting the onus on the compiler, the hashcode could be computed by the JVM at initialization time for the Descriptor instance, (which would also help performance of dynamically parsed Descriptor instance hashcode calls). i.e. private final int computedHashcode; public Descriptor() { //initialization computedHashcode = do_compute_hashCode(); } public int hashCode() { return computedHashcode; } punlic int do_compute_hashCode(){ return // compute hashcode } This is all talking towards optimum performance implementation... the real problem is the need for a hashCode implementation for Descriptor based on the actual Descriptor's content... On May 11, 4:54 pm, Ben Wright compuware...@gmail.com wrote: Jay: Using the class name to generate the hashcode is logically incorrect because the class name can be derived by the options java_package_ name and java_outer_classname. Additionally (although less likely to matter), separate protocol buffer files can define an identical class names with different protocol buffers. Lastly, and most importantly... If the same Message is being used with generated code and with dynamic code, the hash code for the descriptor would still be identical if generated from the descriptor instance, whereas the dynamic usage does not have a classname from which to derive a hashcode. While in your case this should not matter, it does matter for other users of protobuf. The hashcode function would be better served by being implemented correctly from state data for the descriptor. Additionally, in generated code it seems that this hashcode could be pre-computed by the compiler and Descriptor.hashcode() could return a constant integer - which would be much more efficient than any other method. On May 11, 3:02 pm, Jay Booth jaybo...@gmail.com wrote: It can be legitimate, especially in the case of Object.hashCode(), but it's supposed to be in sync with equals() by contract. As it stands, two objects which are equal() will produce different hashes, or the same logical object will produce different hashes across JVMs. That breaks the contract.. if the equals() method simply did return (other == this), then it'd be fine, albeit a little useless. I created an issue and posted a 1-liner patch that would eliminate the problem by using getClass().getName().hashCode() to incorporate type information into the hashCode without depending on a Descriptor object's memory
[protobuf] Re: Generated hashcode() returns different values across JVM instances?
It can be legitimate, especially in the case of Object.hashCode(), but it's supposed to be in sync with equals() by contract. As it stands, two objects which are equal() will produce different hashes, or the same logical object will produce different hashes across JVMs. That breaks the contract.. if the equals() method simply did return (other == this), then it'd be fine, albeit a little useless. I created an issue and posted a 1-liner patch that would eliminate the problem by using getClass().getName().hashCode() to incorporate type information into the hashCode without depending on a Descriptor object's memory address. On May 11, 12:01 am, Dmitriy Ryaboy dvrya...@gmail.com wrote: Hi Jay, I encountered that before. Unfortunately this is a legitimate thing to do, as documented in Object.hashCode() I have a write-up of the problem and how we wound up solving it (not elegant.. suggestions welcome) here:http://squarecog.wordpress.com/2011/02/20/hadoop-requires-stable-hash... D On Mon, May 9, 2011 at 8:25 AM, Jay Booth jaybo...@gmail.com wrote: I'm testing an on-disk hashtable with Protobufs and noticed that with the java generated hashcode function, it seems to return a different hashcode across JVM invocations for the same logically equivalent object (tested with a single string protobuf, same string for both instances). Is this known behavior? Bit busy right now backporting this to work with String keys instead but I could provide a bit of command line code that demonstrates the issue when I get a chance. Glancing at the generated hashcode() function, it looks like the difference comes from etiher getDescriptorForType().hashCode() or getUnknownFields().hashCode(), both of which are incorporated. -- You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group athttp://groups.google.com/group/protobuf?hl=en. -- You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
[protobuf] Re: Generated hashcode() returns different values across JVM instances?
Jay: Using the class name to generate the hashcode is logically incorrect because the class name can be derived by the options java_package_ name and java_outer_classname. Additionally (although less likely to matter), separate protocol buffer files can define an identical class names with different protocol buffers. Lastly, and most importantly... If the same Message is being used with generated code and with dynamic code, the hash code for the descriptor would still be identical if generated from the descriptor instance, whereas the dynamic usage does not have a classname from which to derive a hashcode. While in your case this should not matter, it does matter for other users of protobuf. The hashcode function would be better served by being implemented correctly from state data for the descriptor. Additionally, in generated code it seems that this hashcode could be pre-computed by the compiler and Descriptor.hashcode() could return a constant integer - which would be much more efficient than any other method. On May 11, 3:02 pm, Jay Booth jaybo...@gmail.com wrote: It can be legitimate, especially in the case of Object.hashCode(), but it's supposed to be in sync with equals() by contract. As it stands, two objects which are equal() will produce different hashes, or the same logical object will produce different hashes across JVMs. That breaks the contract.. if the equals() method simply did return (other == this), then it'd be fine, albeit a little useless. I created an issue and posted a 1-liner patch that would eliminate the problem by using getClass().getName().hashCode() to incorporate type information into the hashCode without depending on a Descriptor object's memory address. On May 11, 12:01 am, Dmitriy Ryaboy dvrya...@gmail.com wrote: Hi Jay, I encountered that before. Unfortunately this is a legitimate thing to do, as documented in Object.hashCode() I have a write-up of the problem and how we wound up solving it (not elegant.. suggestions welcome) here:http://squarecog.wordpress.com/2011/02/20/hadoop-requires-stable-hash... D On Mon, May 9, 2011 at 8:25 AM, Jay Booth jaybo...@gmail.com wrote: I'm testing an on-disk hashtable with Protobufs and noticed that with the java generated hashcode function, it seems to return a different hashcode across JVM invocations for the same logically equivalent object (tested with a single string protobuf, same string for both instances). Is this known behavior? Bit busy right now backporting this to work with String keys instead but I could provide a bit of command line code that demonstrates the issue when I get a chance. Glancing at the generated hashcode() function, it looks like the difference comes from etiher getDescriptorForType().hashCode() or getUnknownFields().hashCode(), both of which are incorporated. -- You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group athttp://groups.google.com/group/protobuf?hl=en. -- You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
[protobuf] Re: Generated hashcode() returns different values across JVM instances?
Well, to sidestep the whole discussion, we could just eliminate types from hashcode completely and work strictly on field hashcodes. The type check is there in equals(), hashmaps will function correctly, and hash collisions are already allowed between inequal objects, they're a fact of life. Return 1; would be legal, it would just be a really bad idea for anything that actually uses the hashcode. I figured getClass().getName() would provide the additional hash spread benefit in the majority of cases and HashMaps and the like could fall back on equals() as they always do upon the rare collision between differently-typed-but-same-fields objects. Descriptor has a bunch of stuff attached to it, which would lead to a potentially complicated/expensive hashCode() implementation and I didn't understand that code well enough to suggest a patch. I also don't know the protobuf compiler well enough to have an opinion on generating a constant per-class as you suggest, I could see it leading to trouble when you regenerate classes that haven't changed unless it's deterministic somehow. Just going with the KISS principle. On May 11, 4:54 pm, Ben Wright compuware...@gmail.com wrote: Jay: Using the class name to generate the hashcode is logically incorrect because the class name can be derived by the options java_package_ name and java_outer_classname. Additionally (although less likely to matter), separate protocol buffer files can define an identical class names with different protocol buffers. Lastly, and most importantly... If the same Message is being used with generated code and with dynamic code, the hash code for the descriptor would still be identical if generated from the descriptor instance, whereas the dynamic usage does not have a classname from which to derive a hashcode. While in your case this should not matter, it does matter for other users of protobuf. The hashcode function would be better served by being implemented correctly from state data for the descriptor. Additionally, in generated code it seems that this hashcode could be pre-computed by the compiler and Descriptor.hashcode() could return a constant integer - which would be much more efficient than any other method. On May 11, 3:02 pm, Jay Booth jaybo...@gmail.com wrote: It can be legitimate, especially in the case of Object.hashCode(), but it's supposed to be in sync with equals() by contract. As it stands, two objects which are equal() will produce different hashes, or the same logical object will produce different hashes across JVMs. That breaks the contract.. if the equals() method simply did return (other == this), then it'd be fine, albeit a little useless. I created an issue and posted a 1-liner patch that would eliminate the problem by using getClass().getName().hashCode() to incorporate type information into the hashCode without depending on a Descriptor object's memory address. On May 11, 12:01 am, Dmitriy Ryaboy dvrya...@gmail.com wrote: Hi Jay, I encountered that before. Unfortunately this is a legitimate thing to do, as documented in Object.hashCode() I have a write-up of the problem and how we wound up solving it (not elegant.. suggestions welcome) here:http://squarecog.wordpress.com/2011/02/20/hadoop-requires-stable-hash... D On Mon, May 9, 2011 at 8:25 AM, Jay Booth jaybo...@gmail.com wrote: I'm testing an on-disk hashtable with Protobufs and noticed that with the java generated hashcode function, it seems to return a different hashcode across JVM invocations for the same logically equivalent object (tested with a single string protobuf, same string for both instances). Is this known behavior? Bit busy right now backporting this to work with String keys instead but I could provide a bit of command line code that demonstrates the issue when I get a chance. Glancing at the generated hashcode() function, it looks like the difference comes from etiher getDescriptorForType().hashCode() or getUnknownFields().hashCode(), both of which are incorporated. -- You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group athttp://groups.google.com/group/protobuf?hl=en. -- You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
[protobuf] Re: Generated hashcode() returns different values across JVM instances?
Alternatively... instead of putting the onus on the compiler, the hashcode could be computed by the JVM at initialization time for the Descriptor instance, (which would also help performance of dynamically parsed Descriptor instance hashcode calls). i.e. private final int computedHashcode; public Descriptor() { //initialization computedHashcode = do_compute_hashCode(); } public int hashCode() { return computedHashcode; } punlic int do_compute_hashCode(){ return // compute hashcode } This is all talking towards optimum performance implementation... the real problem is the need for a hashCode implementation for Descriptor based on the actual Descriptor's content... On May 11, 4:54 pm, Ben Wright compuware...@gmail.com wrote: Jay: Using the class name to generate the hashcode is logically incorrect because the class name can be derived by the options java_package_ name and java_outer_classname. Additionally (although less likely to matter), separate protocol buffer files can define an identical class names with different protocol buffers. Lastly, and most importantly... If the same Message is being used with generated code and with dynamic code, the hash code for the descriptor would still be identical if generated from the descriptor instance, whereas the dynamic usage does not have a classname from which to derive a hashcode. While in your case this should not matter, it does matter for other users of protobuf. The hashcode function would be better served by being implemented correctly from state data for the descriptor. Additionally, in generated code it seems that this hashcode could be pre-computed by the compiler and Descriptor.hashcode() could return a constant integer - which would be much more efficient than any other method. On May 11, 3:02 pm, Jay Booth jaybo...@gmail.com wrote: It can be legitimate, especially in the case of Object.hashCode(), but it's supposed to be in sync with equals() by contract. As it stands, two objects which are equal() will produce different hashes, or the same logical object will produce different hashes across JVMs. That breaks the contract.. if the equals() method simply did return (other == this), then it'd be fine, albeit a little useless. I created an issue and posted a 1-liner patch that would eliminate the problem by using getClass().getName().hashCode() to incorporate type information into the hashCode without depending on a Descriptor object's memory address. On May 11, 12:01 am, Dmitriy Ryaboy dvrya...@gmail.com wrote: Hi Jay, I encountered that before. Unfortunately this is a legitimate thing to do, as documented in Object.hashCode() I have a write-up of the problem and how we wound up solving it (not elegant.. suggestions welcome) here:http://squarecog.wordpress.com/2011/02/20/hadoop-requires-stable-hash... D On Mon, May 9, 2011 at 8:25 AM, Jay Booth jaybo...@gmail.com wrote: I'm testing an on-disk hashtable with Protobufs and noticed that with the java generated hashcode function, it seems to return a different hashcode across JVM invocations for the same logically equivalent object (tested with a single string protobuf, same string for both instances). Is this known behavior? Bit busy right now backporting this to work with String keys instead but I could provide a bit of command line code that demonstrates the issue when I get a chance. Glancing at the generated hashcode() function, it looks like the difference comes from etiher getDescriptorForType().hashCode() or getUnknownFields().hashCode(), both of which are incorporated. -- You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group athttp://groups.google.com/group/protobuf?hl=en. -- You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
[protobuf] Re: Generated hashcode() returns different values across JVM instances?
I think we wrote those replies at the same time : ) You're right, at the cost of some additional hash collisions, the simplest solution is to simply not include the type / descriptor in the hash calculation at all. The best / least-collision solutions with good performance would be what I wrote in my previous post, but that requires that someone (presumably a current committer) with sufficient knowledge of the Descriptor types to have enough time to update the compiler and java libraries accordingly. Any input from a committer for this issue? Seems the simple solution would take less than an hour to push into the stream and could make it into the next release. On May 11, 5:25 pm, Ben Wright compuware...@gmail.com wrote: Alternatively... instead of putting the onus on the compiler, the hashcode could be computed by the JVM at initialization time for the Descriptor instance, (which would also help performance of dynamically parsed Descriptor instance hashcode calls). i.e. private final int computedHashcode; public Descriptor() { //initialization computedHashcode = do_compute_hashCode(); } public int hashCode() { return computedHashcode; } punlic int do_compute_hashCode(){ return // compute hashcode } This is all talking towards optimum performance implementation... the real problem is the need for a hashCode implementation for Descriptor based on the actual Descriptor's content... On May 11, 4:54 pm, Ben Wright compuware...@gmail.com wrote: Jay: Using the class name to generate the hashcode is logically incorrect because the class name can be derived by the options java_package_ name and java_outer_classname. Additionally (although less likely to matter), separate protocol buffer files can define an identical class names with different protocol buffers. Lastly, and most importantly... If the same Message is being used with generated code and with dynamic code, the hash code for the descriptor would still be identical if generated from the descriptor instance, whereas the dynamic usage does not have a classname from which to derive a hashcode. While in your case this should not matter, it does matter for other users of protobuf. The hashcode function would be better served by being implemented correctly from state data for the descriptor. Additionally, in generated code it seems that this hashcode could be pre-computed by the compiler and Descriptor.hashcode() could return a constant integer - which would be much more efficient than any other method. On May 11, 3:02 pm, Jay Booth jaybo...@gmail.com wrote: It can be legitimate, especially in the case of Object.hashCode(), but it's supposed to be in sync with equals() by contract. As it stands, two objects which are equal() will produce different hashes, or the same logical object will produce different hashes across JVMs. That breaks the contract.. if the equals() method simply did return (other == this), then it'd be fine, albeit a little useless. I created an issue and posted a 1-liner patch that would eliminate the problem by using getClass().getName().hashCode() to incorporate type information into the hashCode without depending on a Descriptor object's memory address. On May 11, 12:01 am, Dmitriy Ryaboy dvrya...@gmail.com wrote: Hi Jay, I encountered that before. Unfortunately this is a legitimate thing to do, as documented in Object.hashCode() I have a write-up of the problem and how we wound up solving it (not elegant.. suggestions welcome) here:http://squarecog.wordpress.com/2011/02/20/hadoop-requires-stable-hash... D On Mon, May 9, 2011 at 8:25 AM, Jay Booth jaybo...@gmail.com wrote: I'm testing an on-disk hashtable with Protobufs and noticed that with the java generated hashcode function, it seems to return a different hashcode across JVM invocations for the same logically equivalent object (tested with a single string protobuf, same string for both instances). Is this known behavior? Bit busy right now backporting this to work with String keys instead but I could provide a bit of command line code that demonstrates the issue when I get a chance. Glancing at the generated hashcode() function, it looks like the difference comes from etiher getDescriptorForType().hashCode() or getUnknownFields().hashCode(), both of which are incorporated. -- You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group athttp://groups.google.com/group/protobuf?hl=en. -- You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post