Kenton,

I am nearly ready with the Haskell update to protoc [1] that will support the user-defined options introduced by protocol-buffers 2.0.2 (protoc).
To have a hope of testing my code, I have redesigned my processing so that hprotoc produces a binary FileDescriptorSet that I can compare to the output of protoc. This should also allow hprotoc to consume the binary FileDescriptorSet output of protoc.

I have a few questions and two bug reports against protoc-2.0.2, all of which arise from examining the FileDescriptorSet output with protoc's decoding:

BUG * The user-defined options have the wrong value for some 32-bit values. You store 64-bit values:

  unittest_custom_options.proto: optional int32 message_opt1 = 7739036;
  unittest_custom_options.proto: option (message_opt1) = -56;
  protoc:  7739036: 18446744073709551560
  hprotoc: 7739036: 4294967240

There is another problem, which is seen in the raw output:

  unittest_custom_options.proto:
    message DummyMessageContainingEnum {
      enum TestEnumType {
        TEST_OPTION_ENUM_TYPE1 = 22;
        TEST_OPTION_ENUM_TYPE2 = -23;
      }
    }

  protoc:
    2 {
      1: "TEST_OPTION_ENUM_TYPE2"
      2: 18446744073709551593
    }

  hprotoc:
    2 {
      1: "TEST_OPTION_ENUM_TYPE2"
      2: 4294967273
    }

The negative enum value reveals that this is stored as a 64-bit number instead of a 32-bit one. This makes the already inefficient negative values about twice as large as they would otherwise be, and threatens to cause errors when the data is read by other implementations that expect only 32 bits.

BUG * The user-defined options from unittest_custom.proto have repetitions in the output from protoc that are not present in the .proto file. Not all fields are repeated (apparently just the fixed-width ones), but this looks dangerous in the presence of repeated fields.
Example from the raw output from protoc:

  4 {
    1: "CustomOptionMinIntegerValues"
    7 {
      7706090: 0
      7705709: 18446744071562067968
      7705542: 9223372036854775808
      7704880: 0
      7702367: 0
      7701568: 4294967295
      7700863: 18446744073709551615
      7700307: 0x00000000
      7700307: 0x00000000
      7700194: 0x0000000000000000
      7700194: 0x0000000000000000
      7698645: 0x80000000
      7698645: 0x80000000
      7685475: 0x8000000000000000
      7685475: 0x8000000000000000
    }
  }

* The default_value of bytes and string types are stored differently. The bytes are stored in a raw form at the same "escaping level" as the proto file. A string is stored after the escape codes have been interpreted.
** Why, oh why, are they stored with different escape conventions?
** Is this documented anywhere?

* The "name" field of the FileDescriptorProto seems to be the file path passed on the command line or the file path in the import statement.
** I have not checked, but if I were on Windows, would the file path from the command line have \ instead of / ?
** Is this documented anywhere?

Thanks for your attention,
Chris

[1] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/protocol-buffers
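
The numbers in the first bug report (the 32-bit versus 64-bit values) can be reproduced with a small Python sketch. It assumes only the standard base-128 varint encoding and two's-complement reinterpretation of a negative number. The helper names encode_varint and as_unsigned are made up for illustration and are not part of protoc or hprotoc.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: last byte
            return bytes(out)


def as_unsigned(value: int, bits: int) -> int:
    """Reinterpret a signed integer as an unsigned two's-complement value of the given width."""
    return value & ((1 << bits) - 1)


if __name__ == "__main__":
    # The two negative values quoted in the report.
    for signed in (-56, -23):
        wide = as_unsigned(signed, 64)    # the number shown in the protoc output
        narrow = as_unsigned(signed, 32)  # the number shown in the hprotoc output
        print(signed, wide, len(encode_varint(wide)), narrow, len(encode_varint(narrow)))
```

Under these assumptions the 64-bit reading of a negative value needs a 10-byte varint and the 32-bit reading needs 5 bytes, which matches the "about twice as bad" remark above.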