[protobuf] Issue 290 in protobuf: Cannot compile Google Protocol Buffers on z/OS
Status: New
Owner: liuj...@google.com
Labels: Type-Defect Priority-Medium

New issue 290 by simonsto...@gmail.com: Cannot compile Google Protocol Buffers on z/OS
http://code.google.com/p/protobuf/issues/detail?id=290

What steps will reproduce the problem?

1. Build Google Protocol Buffers on a z/OS system as follows:

    _CXX_CXXSUFFIX=cpp ./configure --prefix=/tmp/gpb_64 _CXX_OPTIONS="-Wc,LP64 -Wl,LP64 -Wc,TARGET(zOSV1R11) -Wc,LANGLVL(EXTENDED) -D_OPEN_THREADS -D_XOPEN_SOURCE -D_XOPEN_SOURCE_EXTENDED=1 -D_OPEN_SYS -D_ISOC99_SOURCE"
    make -j20 clean all

What is the expected output? What do you see instead?

I expect the build to pass successfully so I can run the install phase. Instead I see the following output from a unit test:

    libtool: link: c++ -DNDEBUG -o protoc main.o ./.libs/libprotobuf.a ./.libs/libprotoc.a /MVK3/tmp/sstone/protobuf-2.4.1/src/.libs/libprotobuf.a
    oldpwd=`pwd` ( cd . $oldpwd/protoc -I. --cpp_out=$oldpwd google/protobuf/unittest.proto google/protobuf/unittest_empty.proto google/protobuf/unittest_import.proto google/protobuf/unittest_mset.proto google/protobuf/unittest_optimize_for.proto google/protobuf/unittest_embed_optimize_for.proto google/protobuf/unittest_custom_options.proto google/protobuf/unittest_lite.proto google/protobuf/unittest_import_lite.proto google/protobuf/unittest_lite_imports_nonlite.proto google/protobuf/unittest_no_generic_services.proto google/protobuf/compiler/cpp/cpp_test_bad_identifiers.proto )
    libprotobuf ERROR ./google/protobuf/descriptor_database.cc:314] Invalid file descriptor data passed to EncodedDescriptorDatabase::Add().
    libprotobuf FATAL ./google/protobuf/descriptor.cc:862] CHECK failed: generated_database_->Add(encoded_file_descriptor, size):
    CEE5207E The signal SIGABRT was received.
    make[2]: *** [unittest_proto_middleman] Error 131
    make[2]: Leaving directory `/MVK3/tmp/sstone/protobuf-2.4.1/src'
    make[1]: *** [all-recursive] Error 1
    make[1]: Leaving directory `/MVK3/tmp/sstone/protobuf-2.4.1'
    make: *** [all] Error 2

What version of the product are you using? On what operating system?

I am using Google Protocol Buffers 2.4.1. I also tried 2.4.0a and 2.1.0. The system is running z/OS V1R11 with the XL C/C++ V1R11 compilers.

Please provide any additional information below.

I suspect the issue may be caused by an expectation that some of the data *somewhere* will be in ASCII, but it is actually in EBCDIC because of the operating system. I have managed to get this working on AIX (POWER), RHEL (x86, 390) and Windows (x86) systems without any issues.

--
You received this message because you are subscribed to the Google Groups Protocol Buffers group. To post to this group, send email to protobuf@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.
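[Editor's note] The EBCDIC hypothesis is plausible: the generated C++ sources embed the serialized file descriptor as string literals, and the runtime expects those literals to contain the original bytes. A compiler that encodes string literals in EBCDIC (IBM1047 is an assumed code page for z/OS) would hand EncodedDescriptorDatabase::Add() different bytes. A minimal, hedged illustration of the byte-level difference, using Java purely for demonstration:

```java
import java.nio.charset.Charset;

// Illustration only: the same character produces different bytes under
// ASCII vs. EBCDIC encodings, which would corrupt any embedded
// serialized descriptor data that is assumed to be ASCII.
public class EncodingCheck {
    public static byte[] encode(String s, String charsetName) {
        return s.getBytes(Charset.forName(charsetName));
    }
}
```

Under ASCII, 'A' encodes as 0x41; under EBCDIC (IBM1047) it is 0xC1, so byte-for-byte comparisons or parsers expecting ASCII fail.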
[protobuf] Re: Generated hashcode() returns different values across JVM instances?
It can be legitimate, especially in the case of Object.hashCode(), but it's supposed to be in sync with equals() by contract. As it stands, two objects which are equal() will produce different hashes, or the same logical object will produce different hashes across JVMs. That breaks the contract. If the equals() method simply did `return (other == this);`, then it'd be fine, albeit a little useless.

I created an issue and posted a one-line patch that would eliminate the problem by using getClass().getName().hashCode() to incorporate type information into the hashCode without depending on a Descriptor object's memory address.

On May 11, 12:01 am, Dmitriy Ryaboy dvrya...@gmail.com wrote:

> Hi Jay, I encountered that before. Unfortunately this is a legitimate thing to do, as documented in Object.hashCode(). I have a write-up of the problem and how we wound up solving it (not elegant.. suggestions welcome) here: http://squarecog.wordpress.com/2011/02/20/hadoop-requires-stable-hash...
>
> D
>
> On Mon, May 9, 2011 at 8:25 AM, Jay Booth jaybo...@gmail.com wrote:
>
> > I'm testing an on-disk hashtable with Protobufs and noticed that with the Java generated hashCode() function, it seems to return a different hashcode across JVM invocations for the same logically equivalent object (tested with a single string protobuf, same string for both instances). Is this known behavior? Bit busy right now backporting this to work with String keys instead, but I could provide a bit of command-line code that demonstrates the issue when I get a chance. Glancing at the generated hashCode() function, it looks like the difference comes from either getDescriptorForType().hashCode() or getUnknownFields().hashCode(), both of which are incorporated.
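[Editor's note] A sketch of the shape of the proposed fix (not the actual patch; class and method names are hypothetical stand-ins for generated code): mix the class name's String.hashCode(), which is specified by the Java Language Specification and therefore stable across JVM instances, into a field-based hash instead of relying on the Descriptor's identity-based Object.hashCode():

```java
// Hypothetical stand-in for a generated message with two fields.
public class StableHashSketch {
    // Field-by-field hash, as generated code would compute per field.
    public static int fieldHash(String name, int id) {
        int hash = 41;
        hash = (19 * hash) + name.hashCode(); // String.hashCode() is JLS-defined
        hash = (37 * hash) + id;
        return hash;
    }

    // Incorporate type information via the class name, not object identity.
    public static int messageHash(Class<?> messageType, String name, int id) {
        return (19 * messageType.getName().hashCode()) + fieldHash(name, id);
    }
}
```

Because every input to the hash is value-based, the result is identical for logically equal objects in different JVM instances.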
[protobuf] Re: Generated hashcode() returns different values across JVM instances?
Jay: Using the class name to generate the hashcode is logically incorrect because the class name can be changed by the options java_package and java_outer_classname. Additionally (although less likely to matter), separate protocol buffer files can define identical class names for different protocol buffers. Lastly, and most importantly: if the same Message is being used with generated code and with dynamic code, the hash code would only be identical if generated from the descriptor instance, since the dynamic usage does not have a classname from which to derive a hashcode. While in your case this should not matter, it does matter for other users of protobuf.

The hashCode function would be better served by being implemented correctly from the state data of the descriptor. Additionally, in generated code it seems that this hashcode could be pre-computed by the compiler, and Descriptor.hashCode() could return a constant integer, which would be much more efficient than any other method.

On May 11, 3:02 pm, Jay Booth jaybo...@gmail.com wrote:

> It can be legitimate, especially in the case of Object.hashCode(), but it's supposed to be in sync with equals() by contract. [...]
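[Editor's note] A hedged sketch of the suggestion above, computing the hash from the descriptor's own state and caching it. DescriptorSketch and its fields are hypothetical, not the real com.google.protobuf classes; the full name stands in for the descriptor's state:

```java
// Hash derived from descriptor content, computed once and cached, so
// every call returns the same constant and is stable across JVMs.
public final class DescriptorSketch {
    private final String fullName;
    private final int cachedHashCode;

    public DescriptorSketch(String fullName) {
        this.fullName = fullName;
        this.cachedHashCode = 31 * fullName.hashCode() + 17; // computed once
    }

    @Override
    public int hashCode() {
        return cachedHashCode; // constant-time after construction
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof DescriptorSketch
                && ((DescriptorSketch) o).fullName.equals(fullName);
    }
}
```

Two instances built from the same state are equal and share a hash, regardless of which JVM, or which code path (generated or dynamic), created them.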
[protobuf] Re: Generated hashcode() returns different values across JVM instances?
Well, to sidestep the whole discussion, we could just eliminate types from the hashcode completely and work strictly on field hashcodes. The type check is there in equals(), hashmaps will function correctly, and hash collisions are already allowed between unequal objects; they're a fact of life. `return 1;` would be legal, it would just be a really bad idea for anything that actually uses the hashcode.

I figured getClass().getName() would provide the additional hash-spread benefit in the majority of cases, and HashMaps and the like could fall back on equals() as they always do upon the rare collision between differently-typed-but-same-fields objects.

Descriptor has a bunch of stuff attached to it, which would lead to a potentially complicated/expensive hashCode() implementation, and I didn't understand that code well enough to suggest a patch. I also don't know the protobuf compiler well enough to have an opinion on generating a constant per class as you suggest; I could see it leading to trouble when you regenerate classes that haven't changed, unless it's deterministic somehow. Just going with the KISS principle.

On May 11, 4:54 pm, Ben Wright compuware...@gmail.com wrote:

> Jay: Using the class name to generate the hashcode is logically incorrect because the class name can be derived by the options java_package and java_outer_classname. [...]
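[Editor's note] The "keep it simple" alternative can be sketched as follows (names are hypothetical; real generated code would unroll this per field). Only field values feed the hash, and type discrimination is left entirely to equals():

```java
// Field-only hash: equal field values give equal hashes; two messages of
// different types with the same fields collide, which equals() resolves.
public class FieldOnlyHash {
    public static int hash(Object... fieldValues) {
        int h = 41;
        for (Object v : fieldValues) {
            // String, Integer, etc. have JLS-specified value-based hashes.
            h = 37 * h + (v == null ? 0 : v.hashCode());
        }
        return h;
    }
}
```

A HashMap keyed on such messages works correctly: colliding keys of different types are separated by the equals() type check, at the cost of an occasional extra comparison.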
[protobuf] Re: Generated hashcode() returns different values across JVM instances?
Alternatively... instead of putting the onus on the compiler, the hashcode could be computed by the JVM at initialization time for the Descriptor instance (which would also help the performance of hashCode() calls on dynamically parsed Descriptor instances), i.e.:

    private final int computedHashCode;

    public Descriptor() {
        // ...initialization...
        computedHashCode = computeHashCode();
    }

    public int hashCode() {
        return computedHashCode;
    }

    private int computeHashCode() {
        return 0; // compute the hashcode from the Descriptor's content
    }

This is all talking towards the optimum-performance implementation... the real problem is the need for a hashCode() implementation for Descriptor based on the actual Descriptor's content...

On May 11, 4:54 pm, Ben Wright compuware...@gmail.com wrote:

> Jay: Using the class name to generate the hashcode is logically incorrect because the class name can be derived by the options java_package and java_outer_classname. [...]
[protobuf] Re: Generated hashcode() returns different values across JVM instances?
I think we wrote those replies at the same time : ) You're right: at the cost of some additional hash collisions, the simplest solution is to not include the type/descriptor in the hash calculation at all. The best/least-collision solution with good performance would be what I wrote in my previous post, but that requires someone (presumably a current committer) with sufficient knowledge of the Descriptor types, and enough time, to update the compiler and Java libraries accordingly.

Any input from a committer on this issue? It seems the simple solution would take less than an hour to push into the stream and could make it into the next release.

On May 11, 5:25 pm, Ben Wright compuware...@gmail.com wrote:

> Alternatively... instead of putting the onus on the compiler, the hashcode could be computed by the JVM at initialization time for the Descriptor instance, (which would also help performance of dynamically parsed Descriptor instance hashcode calls). [...]
[protobuf] Correct behaviour when encountering an unexpected enum on the wire
I'm doing some code maintenance on my protobuf library, and I have encountered a test that is... confusing me. So before I go crazy (/ crazier)... what should an implementation do if during deserialization it gets an enum it doesn't recognise?

- to explode in sparks?
- to ignore the data?
- to brutally coerce the data to the unexpected value?
- other?

Thanks in advance

Marc
Re: [protobuf] Correct behaviour when encountering an unexpected enum on the wire
Ignore the data (sort of): the unknown value gets treated as an unknown field, leaving the enum field unset. In implementations that support propagation of unknown fields (non-lite C++, Java), the value is added to the UnknownFieldSet.

Implications:

- A sender may set a required enum field to a value that is valid according to its definition of the .proto. If the recipient doesn't know about that value, the message may fail to parse. It's best to avoid required enums.
- By treating it as an unknown field, a message can be sent between two programs that understand a particular enum value through a middleman that doesn't, and there is no loss of data.

On Wed, May 11, 2011 at 2:47 PM, Marc Gravell marc.grav...@gmail.com wrote:

> I'm doing some code maintenance on my protobuf library, and I have encountered a test that is... confusing me. So before I go crazy (/ crazier)... what should an implementation do if during deserialization it gets an enum it doesn't recognise? [...]
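[Editor's note] The rule described above can be simulated with a short, hedged sketch (class and method names are hypothetical, not the real protobuf API): a wire value that doesn't match a known enum number is preserved as an unknown field rather than coerced or treated as a parse error, and the enum field itself stays unset:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simulation of unknown-enum handling during parsing.
public class EnumParseSketch {
    // Enum numbers this (older) reader's .proto definition knows about.
    static final Set<Integer> KNOWN = new HashSet<>(Arrays.asList(1, 2, 3));

    // Returns the recognised value, or null after stashing the value
    // among the unknown fields so it survives re-serialization.
    public static Integer parseEnum(int wireValue, List<Integer> unknownFields) {
        if (KNOWN.contains(wireValue)) {
            return wireValue;           // recognised: set the field
        }
        unknownFields.add(wireValue);   // unrecognised: preserve, don't coerce
        return null;                    // field left unset
    }
}
```

Re-serializing a message together with its preserved unknown fields is what lets a middleman forward enum values it doesn't itself understand without data loss.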