Kenton,

I am nearly ready with the Haskell update to protoc [1] that will
support the user-defined options introduced by protocol-buffers 2.0.2
(protoc).

To have a hope of testing my code, I have redesigned my processing to
have hprotoc produce a binary FileDescriptorSet that I could compare
to the output of protoc.  This should also allow hprotoc to consume
the binary FileDescriptorSet output of protoc.

I have a few questions and two bug reports against protoc-2.0.2.  They
all arise from my examining the FileDescriptorSet output with protoc's
decoding:

BUG * The user-defined options have the wrong value for some 32-bit
fields.  protoc stores them as 64-bit values:
unittest_custom_options.proto: optional int32 message_opt1 = 7739036;
unittest_custom_options.proto: option (message_opt1) = -56;
protoc:      7739036: 18446744073709551560
hprotoc:    7739036: 4294967240
A related problem can be seen in the raw output:
unittest_custom_options.proto:
message DummyMessageContainingEnum {
  enum TestEnumType {
    TEST_OPTION_ENUM_TYPE1 = 22;
    TEST_OPTION_ENUM_TYPE2 = -23;
  }
}
protoc:
      2 {
        1: "TEST_OPTION_ENUM_TYPE2"
        2: 18446744073709551593
      }
hprotoc:
      2 {
        1: "TEST_OPTION_ENUM_TYPE2"
        2: 4294967273
      }
The negative enum value reveals that it is stored as a 64-bit
number instead of a 32-bit one.  This makes the already inefficient
negative values about twice as large as they would otherwise be, and
threatens to cause errors when read by other implementations that
expect only 32 bits.
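
To make the numbers above easy to check, here is a minimal Python
sketch.  It is not code from protoc or hprotoc, and the helper names
to_unsigned and varint_len are hypothetical.  It reinterprets -56 and
-23 as two's-complement values at 32 and 64 bits and counts the bytes a
base-128 varint needs for each:

```python
def to_unsigned(value, bits):
    """Reinterpret a signed integer as two's complement at the given width."""
    return value & ((1 << bits) - 1)


def varint_len(unsigned):
    """Count the bytes a base-128 varint needs for a non-negative integer."""
    length = 1
    while unsigned >= 0x80:
        unsigned >>= 7
        length += 1
    return length


for v in (-56, -23):
    u32, u64 = to_unsigned(v, 32), to_unsigned(v, 64)
    # Print the 32-bit and 64-bit views with their varint sizes.
    print(v, u32, varint_len(u32), u64, varint_len(u64))
```

The 32-bit forms take 5 bytes on the wire and the 64-bit forms take 10,
which is the "about twice" mentioned above.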

BUG * The user-defined options from unittest_custom_options.proto have
repetitions in the output from protoc that are not present in
the .proto file.  Not all fields are repeated (apparently just the
fixed-width ones), but this looks dangerous in the presence of
repeated fields.  Example from the raw output of protoc:
  4 {
    1: "CustomOptionMinIntegerValues"
    7 {
      7706090: 0
      7705709: 18446744071562067968
      7705542: 9223372036854775808
      7704880: 0
      7702367: 0
      7701568: 4294967295
      7700863: 18446744073709551615
      7700307: 0x00000000
      7700307: 0x00000000
      7700194: 0x0000000000000000
      7700194: 0x0000000000000000
      7698645: 0x80000000
      7698645: 0x80000000
      7685475: 0x8000000000000000
      7685475: 0x8000000000000000
    }
  }
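
To illustrate why the duplicates matter, here is a small Python sketch
of the usual merge rule for a field number that occurs more than once
on the wire: for an optional scalar the last value wins, while for a
repeated field every occurrence is appended.  The function merge_fields
and its repeated_tags parameter are hypothetical, and 7700307 is
treated as repeated in the second call purely for illustration:

```python
def merge_fields(wire_fields, repeated_tags=frozenset()):
    """Merge (tag, value) pairs in wire order into a dict of field values."""
    merged = {}
    for tag, value in wire_fields:
        if tag in repeated_tags:
            # Repeated field: every occurrence is kept.
            merged.setdefault(tag, []).append(value)
        else:
            # Optional scalar: a later occurrence overwrites an earlier one.
            merged[tag] = value
    return merged


# The duplicated fixed-width entries from the dump above.
wire = [
    (7700307, 0x00000000), (7700307, 0x00000000),
    (7698645, 0x80000000), (7698645, 0x80000000),
]

as_optional = merge_fields(wire)
as_repeated = merge_fields(wire, repeated_tags={7700307})
print(as_optional)
print(as_repeated)
```

With optional fields the duplicate is invisible after parsing.  If the
option were declared repeated, the reader would see two elements where
the .proto file set only one.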


* The default_value of a bytes field and that of a string field are
stored differently.  The bytes value is stored in raw form, at the same
"escaping level" as the .proto file.  The string value is stored after
the escape codes have been interpreted.
** Why, oh why, are they stored with different escape conventions?
** Is this documented anywhere?
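
As an illustration of the two conventions described above, here is a
short Python sketch.  The literal is a made-up example, not one taken
from the unittest files, and Python's unicode_escape codec stands in
only approximately for C-style unescaping:

```python
# Text between the quotes of a hypothetical default in a .proto file.
proto_literal = r"\001\x02abc"

# Convention observed for string fields: escape codes already interpreted.
string_default = proto_literal.encode("ascii").decode("unicode_escape")

# Convention observed for bytes fields: the escaped text is kept as written.
bytes_default = proto_literal

print(len(string_default), repr(string_default))
print(len(bytes_default), repr(bytes_default))
```

A reader of the descriptor therefore has to unescape a bytes default
once more, but must not do so for a string default.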

* The "name" field of the FileDescriptorProto seems to be the file
path passed on the command line or the filepath in the import
statement.
** I have not checked, but if I were on Windows, would the file path
from the command line have \ instead of /?
** Is this documented anywhere?
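
If the name does turn out to be platform dependent, a consumer could
normalise it before comparing descriptors.  This is a minimal Python
sketch.  It assumes, without having verified it against protoc, that a
backslash-separated name can occur, and the function name
normalize_descriptor_name is hypothetical:

```python
from pathlib import PureWindowsPath


def normalize_descriptor_name(name):
    """Return the name with forward slashes, whichever separator it used."""
    # PureWindowsPath accepts both \ and / as separators on any platform.
    return PureWindowsPath(name).as_posix()


print(normalize_descriptor_name("google\\protobuf\\unittest.proto"))
print(normalize_descriptor_name("google/protobuf/unittest.proto"))
```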

Thanks for your attention,
  Chris

[1] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/protocol-buffers