[protobuf] Issue 290 in protobuf: Cannot compile Google Protocol Buffers on z/OS

2011-05-11 Thread protobuf

Status: New
Owner: liuj...@google.com
Labels: Type-Defect Priority-Medium

New issue 290 by simonsto...@gmail.com: Cannot compile Google Protocol  
Buffers on z/OS

http://code.google.com/p/protobuf/issues/detail?id=290

What steps will reproduce the problem?

1. Build Google Protocol Buffers on a z/OS system as follows:

_CXX_CXXSUFFIX=cpp ./configure --prefix=/tmp/gpb_64

_CXX_OPTIONS=-Wc,LP64 -Wl,LP64 -Wc,TARGET(zOSV1R11) -Wc,LANGLVL(EXTENDED)  
-D_OPEN_THREADS -D_XOPEN_SOURCE -D_XOPEN_SOURCE_EXTENDED=1 -D_OPEN_SYS  
-D_ISOC99_SOURCE make -j20 clean all


What is the expected output? What do you see instead?

I expect the build to pass successfully so I can run the install phase.  
Instead I see the following output from a unit test:


libtool: link: c++ -DNDEBUG -o protoc  
main.o  ./.libs/libprotobuf.a ./.libs/libprotoc.a  
/MVK3/tmp/sstone/protobuf-2.4.1/src/.libs/libprotobuf.a
oldpwd=`pwd`  ( cd .  $oldpwd/protoc -I. --cpp_out=$oldpwd  
google/protobuf/unittest.proto google/protobuf/unittest_empty.proto  
google/protobuf/unittest_import.proto google/protobuf/unittest_mset.proto  
google/protobuf/unittest_optimize_for.proto  
google/protobuf/unittest_embed_optimize_for.proto  
google/protobuf/unittest_custom_options.proto  
google/protobuf/unittest_lite.proto  
google/protobuf/unittest_import_lite.proto  
google/protobuf/unittest_lite_imports_nonlite.proto  
google/protobuf/unittest_no_generic_services.proto  
google/protobuf/compiler/cpp/cpp_test_bad_identifiers.proto )
libprotobuf ERROR ./google/protobuf/descriptor_database.cc:314] Invalid  
file descriptor data passed to EncodedDescriptorDatabase::Add().
libprotobuf FATAL ./google/protobuf/descriptor.cc:862] CHECK failed:  
generated_database_->Add(encoded_file_descriptor, size):

CEE5207E The signal SIGABRT was received.
make[2]: *** [unittest_proto_middleman] Error 131
make[2]: Leaving directory `/MVK3/tmp/sstone/protobuf-2.4.1/src'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/MVK3/tmp/sstone/protobuf-2.4.1'
make: *** [all] Error 2

What version of the product are you using? On what operating system?

I am using Google Protocol Buffers 2.4.1. I also tried 2.4.0a and 2.1.0.  
The system is running z/OS V1R11 with the XL C/C++ V1R11 compilers.


Please provide any additional information below.

I suspect the issue may be caused by an expectation that some of the data  
*somewhere* will be in ASCII, but it is actually in EBCDIC because of the  
operating system. I have managed to get this working on AIX (POWER), RHEL  
(x86, 390) and Windows (x86) systems without any issues.
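
If that suspicion is right, the underlying mismatch is easy to show in isolation: the same lowercase letter has a different byte value in EBCDIC than in ASCII. The sketch below is not protobuf code, and `EbcdicSketch` and its methods are hypothetical names. It only assumes the standard EBCDIC layout of the lowercase Latin letters (three separate runs starting at 0x81, 0x91 and 0xA2).

```java
final class EbcdicSketch {
    // EBCDIC code point of a lowercase Latin letter. The letters sit in three
    // non-contiguous runs: a-i from 0x81, j-r from 0x91, s-z from 0xA2.
    static int ebcdicLower(char c) {
        if (c >= 'a' && c <= 'i') return 0x81 + (c - 'a');
        if (c >= 'j' && c <= 'r') return 0x91 + (c - 'j');
        if (c >= 's' && c <= 'z') return 0xA2 + (c - 's');
        throw new IllegalArgumentException("not a lowercase Latin letter: " + c);
    }

    // True when the ASCII value and the EBCDIC value of the letter differ.
    // A Java char for a-z has the same numeric value as its ASCII code.
    static boolean differs(char c) {
        return (int) c != ebcdicLower(c);
    }
}
```

Any binary data that is built from character literals and later read as ASCII bytes would be affected by a difference like this.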


--
You received this message because you are subscribed to the Google Groups Protocol 
Buffers group.
To post to this group, send email to protobuf@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



[protobuf] Re: Generated hashcode() returns different values across JVM instances?

2011-05-11 Thread Jay Booth
It can be legitimate, especially in the case of Object.hashCode(), but
it's supposed to be in sync with equals() by contract.  As it stands,
two objects which are equal() will produce different hashes, or the
same logical object will produce different hashes across JVMs.  That
breaks the contract..  if the equals() method simply did return (other
== this), then it'd be fine, albeit a little useless.

I created an issue and posted a 1-liner patch that would eliminate the
problem by using getClass().getName().hashCode() to incorporate type
information into the hashCode without depending on a Descriptor
object's memory address.
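
A minimal sketch of the distinction, using hypothetical stand-ins (`TypeInfo`, `Msg`) rather than protobuf's real `Descriptor` and generated classes. It uses a type name string in place of getClass().getName(), which is an assumption made to keep the example self-contained.

```java
// Hypothetical stand-in for a descriptor that does not override hashCode(),
// so it inherits the identity-based Object.hashCode().
final class TypeInfo {
    final String fullName;

    TypeInfo(String fullName) {
        this.fullName = fullName;
    }
}

// Hypothetical stand-in for a generated message with a single string field.
final class Msg {
    final TypeInfo type;
    final String value;

    Msg(TypeInfo type, String value) {
        this.type = type;
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Msg)) return false;
        Msg other = (Msg) o;
        return type.fullName.equals(other.type.fullName) && value.equals(other.value);
    }

    // Unstable: mixes in the identity hash of the TypeInfo instance, which is
    // not guaranteed to be the same in another JVM run.
    int identityBasedHash() {
        return 31 * System.identityHashCode(type) + value.hashCode();
    }

    // Stable: uses only String content. String.hashCode() is specified by the
    // Java API, so the result is the same in every JVM.
    @Override
    public int hashCode() {
        return 31 * type.fullName.hashCode() + value.hashCode();
    }
}
```

Two `Msg` objects built from separate `TypeInfo` instances with the same name and value are equal and share a hash code, which is what an on-disk hashtable needs.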

On May 11, 12:01 am, Dmitriy Ryaboy dvrya...@gmail.com wrote:
 Hi Jay,

 I encountered that before. Unfortunately this is a legitimate thing to
 do, as documented in Object.hashCode()

 I have a write-up of the problem and how we wound up solving it (not
 elegant.. suggestions welcome) here:
 http://squarecog.wordpress.com/2011/02/20/hadoop-requires-stable-hash...

 D

 On Mon, May 9, 2011 at 8:25 AM, Jay Booth jaybo...@gmail.com wrote:
  I'm testing an on-disk hashtable with Protobufs and noticed that with
  the java generated hashcode function, it seems to return a different
  hashcode across JVM invocations for the same logically equivalent
  object (tested with a single string protobuf, same string for both
  instances).

  Is this known behavior?  Bit busy right now backporting this to work
  with String keys instead but I could provide a bit of command line
  code that demonstrates the issue when I get a chance.

  Glancing at the generated hashCode() function, it looks like the
  difference comes from either getDescriptorForType().hashCode() or
  getUnknownFields().hashCode(), both of which are incorporated.




[protobuf] Re: Generated hashcode() returns different values across JVM instances?

2011-05-11 Thread Ben Wright
Jay:

Using the class name to generate the hashcode is logically incorrect
because the class name depends on the options java_package and
java_outer_classname.

Additionally (although less likely to matter), separate protocol
buffer files can define identical class names with different
protocol buffers.

Lastly, and most importantly...

If the same Message is being used with generated code and with dynamic
code, the hash code for the descriptor would still be identical if
generated from the descriptor instance, whereas the dynamic usage does
not have a classname from which to derive a hashcode.  While in your
case this should not matter, it does matter for other users of
protobuf.  The hashcode function would be better served by being
implemented correctly from state data for the descriptor.
Additionally, in generated code it seems that this hashcode could be
pre-computed by the compiler and Descriptor.hashCode() could return a
constant integer - which would be much more efficient than any other
method.
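
A rough sketch of a hash "implemented from state data", assuming a heavily simplified descriptor model. `FieldSpec` and `SimpleDescriptor` are hypothetical classes, not protobuf's `Descriptor` API, and the choice of which state to hash (full name plus field name, number and type) is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical, simplified field description (not protobuf's FieldDescriptor).
final class FieldSpec {
    final String name;
    final int number;
    final String type;

    FieldSpec(String name, int number, String type) {
        this.name = name;
        this.number = number;
        this.type = type;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof FieldSpec)) return false;
        FieldSpec f = (FieldSpec) o;
        return number == f.number && name.equals(f.name) && type.equals(f.type);
    }

    @Override
    public int hashCode() {
        // Strings and boxed ints have hash codes specified by the Java API.
        return Objects.hash(name, number, type);
    }
}

// Hypothetical, simplified message description.
final class SimpleDescriptor {
    final String fullName;
    final List<FieldSpec> fields;

    SimpleDescriptor(String fullName, List<FieldSpec> fields) {
        this.fullName = fullName;
        this.fields = new ArrayList<>(fields); // defensive copy
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SimpleDescriptor)) return false;
        SimpleDescriptor d = (SimpleDescriptor) o;
        return fullName.equals(d.fullName) && fields.equals(d.fields);
    }

    @Override
    public int hashCode() {
        // Depends only on content, never on object identity.
        return 31 * fullName.hashCode() + fields.hashCode();
    }
}
```

With this shape, a descriptor built by generated code and one built dynamically from the same definition hash to the same value, because neither a class name nor a memory address is involved.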





[protobuf] Re: Generated hashcode() returns different values across JVM instances?

2011-05-11 Thread Jay Booth
Well, to sidestep the whole discussion, we could just eliminate types
from hashCode() completely and work strictly on field hashcodes.  The
type check is there in equals(), hashmaps will function correctly, and
hash collisions are already allowed between unequal objects; they're a
fact of life.  "return 1;" would be legal; it would just be a really
bad idea for anything that actually uses the hashcode.

I figured getClass().getName() would provide the additional hash
spread benefit in the majority of cases and HashMaps and the like
could fall back on equals() as they always do upon the rare collision
between differently-typed-but-same-fields objects.  Descriptor has a
bunch of stuff attached to it, which would lead to a potentially
complicated/expensive hashCode() implementation and I didn't
understand that code well enough to suggest a patch.  I also don't
know the protobuf compiler well enough to have an opinion on
generating a constant per-class as you suggest, I could see it leading
to trouble when you regenerate classes that haven't changed unless
it's deterministic somehow.  Just going with the KISS principle.
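
To illustrate the point about collisions, here is a small sketch with hypothetical classes (`FieldOnlyMessage`, `Apple`, `Orange`, `CollisionDemo`) that are not protobuf types. The hash uses only the field, equals() still checks the type, and a HashSet keeps the two differently-typed objects apart.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical base class: the hash ignores the type, equals() does not.
abstract class FieldOnlyMessage {
    final String value;

    FieldOnlyMessage(String value) {
        this.value = value;
    }

    @Override
    public final int hashCode() {
        return value.hashCode(); // field content only, no type information
    }

    @Override
    public final boolean equals(Object o) {
        // The type check lives here, so colliding hashes are harmless.
        return o != null && o.getClass() == getClass()
                && ((FieldOnlyMessage) o).value.equals(value);
    }
}

final class Apple extends FieldOnlyMessage {
    Apple(String value) { super(value); }
}

final class Orange extends FieldOnlyMessage {
    Orange(String value) { super(value); }
}

final class CollisionDemo {
    // Number of distinct elements according to hashCode() plus equals().
    static int distinctCount(Object... items) {
        Set<Object> set = new HashSet<>();
        for (Object item : items) {
            set.add(item);
        }
        return set.size();
    }
}
```

An `Apple` and an `Orange` with the same value land in the same bucket but remain two separate entries; the only cost is the extra equals() call on collision.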






[protobuf] Re: Generated hashcode() returns different values across JVM instances?

2011-05-11 Thread Ben Wright
Alternatively... instead of putting the onus on the compiler, the
hashcode could be computed by the JVM at initialization time for the
Descriptor instance, (which would also help performance of dynamically
parsed Descriptor instance hashcode calls).

i.e.

private final int computedHashcode;

public Descriptor() {
  // initialization

  computedHashcode = do_compute_hashCode();
}

public int hashCode() {
  return computedHashcode;
}

public int do_compute_hashCode() {
  return // compute hashcode from the Descriptor's content
}

This is all talking towards optimum performance implementation... the
real problem is the need for a hashCode implementation for Descriptor
based on the actual Descriptor's content...





[protobuf] Re: Generated hashcode() returns different values across JVM instances?

2011-05-11 Thread Ben Wright
I think we wrote those replies at the same time : )

You're right, at the cost of some additional hash collisions, the
simplest solution is to simply not include the type / descriptor in
the hash calculation at all.

The best / least-collision solution with good performance would be
what I wrote in my previous post, but that requires someone
(presumably a current committer) with sufficient knowledge of the
Descriptor types and enough time to update the compiler and Java
libraries accordingly.

Any input from a committer for this issue?  Seems the simple solution
would take less than an hour to push into the stream and could make it
into the next release.


[protobuf] Correct behaviour when encountering an unexpected enum on the wire

2011-05-11 Thread Marc Gravell
I'm doing some code maintenance on my protobuf library, and I have
encountered a test that is... confusing me. So before I go crazy (/
crazier)... what should an implementation do if during deserialization
it gets an enum it doesn't recognise?

- to explode in sparks?
- to ignore the data?
- to brutally coerce the data to the unexpected value?
- other?

Thanks in advance

Marc




Re: [protobuf] Correct behaviour when encountering an unexpected enum on the wire

2011-05-11 Thread Jason Hsueh
Ignore the data (sort of): The unknown value gets treated as an unknown
field, leaving the enum field unset. In implementations that support
propagation of unknown fields (non-lite C++, Java), the value is added to
the UnknownFieldSet.

Implications:
- a sender may set a required enum field to a valid value, according to its
definition of the .proto. If the recipient doesn't know about that value, the
message may fail to parse. It's best to avoid required enums.
- by treating it as an unknown field, a message can be sent between two
programs that understand a particular enum value through a middleman that
doesn't, and there is no loss of data.
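
The behaviour described above can be sketched without the protobuf library. Everything here is hypothetical and simplified: `Color`, `ParsedMessage`, and the representation of the wire data as a list of {fieldNumber, value} pairs are assumptions for illustration, not the real wire format or the real UnknownFieldSet API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical enum as this (older) reader knows it: only two values.
enum Color {
    RED(0), GREEN(1);

    final int number;

    Color(int number) {
        this.number = number;
    }

    // Returns null when the number is not a value this reader knows.
    static Color forNumber(int n) {
        for (Color c : values()) {
            if (c.number == n) return c;
        }
        return null;
    }
}

final class ParsedMessage {
    static final int COLOR_FIELD = 1;

    Color color; // null means "unset"
    final List<int[]> unknownFields = new ArrayList<>(); // {fieldNumber, value}

    // Decodes simplified wire data given as {fieldNumber, value} pairs.
    static ParsedMessage parse(List<int[]> wire) {
        ParsedMessage m = new ParsedMessage();
        for (int[] entry : wire) {
            int field = entry[0];
            int value = entry[1];
            Color known = (field == COLOR_FIELD) ? Color.forNumber(value) : null;
            if (known != null) {
                m.color = known;
            } else {
                // Unrecognised enum value (or field): keep the raw data.
                m.unknownFields.add(new int[] {field, value});
            }
        }
        return m;
    }

    // Re-encodes known fields first, then passes unknown data through.
    List<int[]> serialize() {
        List<int[]> out = new ArrayList<>();
        if (color != null) out.add(new int[] {COLOR_FIELD, color.number});
        out.addAll(unknownFields);
        return out;
    }
}
```

A middleman built this way leaves `color` unset when it sees value 2, yet still forwards {1, 2} unchanged, so a newer receiver that knows the value can read it.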
