Re: Bug report and nearing completion hprotoc on par with protocol-buffers 2.0.2
On Mon, Nov 10, 2008 at 9:12 AM, Chris Kuklewicz <[EMAIL PROTECTED]> wrote:

> BUG * The user-defined options have the wrong value for some 32 bit
> value.

Looks like you figured this out. Yeah, the idea is that 32-bit varints and 64-bit varints should always be compatible. So, if you write a 32-bit negative number as a varint, it needs to be sign-extended to 64 bits so that if it is read as a 64-bit varint you still get the correct result. The whole negative varints problem was a mistake made in an early version of protocol buffers that unfortunately we're stuck with now.

> BUG * The user-defined options from unittest_custom.proto have
> repetitions in the output from protoc that are not present in
> the .proto file. Not all fields are repeated (apparently just the
> fixed width ones), but this looks dangerous in the presence of
> repeated fields.

Thanks, I'm looking into this.

> * The default_value of bytes and string types are stored differently.
> The bytes are stored in a raw form at the same "escaping level" as the
> proto file. A string is stored after the escape codes have been
> interpreted.
> ** Why, oh why, are they stored with different escape conventions?

The default_value field of FieldDescriptorProto is a string. Strings can only contain structurally-valid UTF-8 text. So, the default values for other strings can be represented just fine with no escaping, but raw bytes need to be escaped somehow such that they are valid UTF-8. In retrospect, this may not have been the best format.

> ** Is this documented anywhere?

Yes, in the comments in descriptor.proto.

> * The "name" field of the FileDescriptorProto seems to be the file
> path passed on the command line or the filepath in the import
> statement.
> ** I have not checked, but if I were on windows would the file path
> from the command line have \ instead of / ?

It will always be a forward slash. The path is actually not taken from the command line or from import statements.
The path of each file is its location relative to the source tree defined by the --proto_path (or -I) flag. The goal here is to have a canonical name for every file. Note that this also implies that file names cannot contain "." or ".." components and cannot be absolute paths.

> ** Is this documented anywhere?

I guess not as well as it should be. descriptor.proto describes the "name" field as "file name, relative to the root of the source tree", but that's not precise enough for someone trying to write their own implementation. Sorry, my intent was never for people to write their own compiler; I hoped everyone would reuse libprotoc.

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---
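The naming rule described in the reply (a path relative to the --proto_path root, always with forward slashes, never absolute, with no ".." components) can be illustrated with a small sketch. This is not how protoc or libprotoc implements it; `canonical_proto_name` and its parameters are hypothetical names chosen for illustration, and the example paths are made up.

```python
from pathlib import PurePosixPath, PureWindowsPath


def canonical_proto_name(disk_path, proto_root, windows=False):
    """Hypothetical helper: name of disk_path relative to proto_root.

    The result always uses '/' separators, whatever the host platform.
    """
    flavour = PureWindowsPath if windows else PurePosixPath
    # relative_to raises ValueError when disk_path is not under proto_root,
    # so a file outside the source tree never gets a name.
    relative = flavour(disk_path).relative_to(flavour(proto_root))
    # pathlib already drops "." components; ".." must be rejected explicitly.
    if ".." in relative.parts:
        raise ValueError("canonical names cannot contain '..' components")
    return relative.as_posix()
```

With these assumptions, `canonical_proto_name(r"C:\src\protos\foo\bar.proto", r"C:\src\protos", windows=True)` yields `foo/bar.proto`, the same name a POSIX host would produce for the same file under its own root.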
Re: Bug report and nearing completion hprotoc on par with protocol-buffers 2.0.2
Okay, I spoke too soon on the first bug.

From http://code.google.com/apis/protocolbuffers/docs/encoding.html :

"If you use int32 or int64 as the type for a negative number, the resulting varint is always ten bytes long – it is, effectively, treated like a very large unsigned integer."

So it is clear that protoc is always casting 32 bit values to 64 bit values before sending them onto the wire and this is documented. I apologize for not double checking this before posting my previous message.

My decoder handles this by discarding the excess high bits, so my reader code is fine. My writer will need to be changed to match the documentation, since it only writes five bytes for negative 32-bit values.

Cheers,
Chris
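The writer and reader behaviour discussed above can be sketched as follows. This is a minimal illustration of base-128 varints with 64-bit sign extension, not code from protoc or hprotoc; the function names are assumptions made up for this example.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("encode_varint expects an unsigned value")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int32(value):
    """Writer side: sign-extend to 64 bits before encoding."""
    return encode_varint(value & MASK64)


def decode_varint(data):
    """Decode one varint from the start of data as an unsigned integer."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7
    raise ValueError("truncated varint")


def decode_int32(data):
    """Reader side: discard the excess high bits, then reinterpret as signed."""
    low = decode_varint(data) & MASK32
    return low - (1 << 32) if low & 0x80000000 else low
```

Under these definitions `encode_int32(-56)` is ten bytes long and decodes as the unsigned value 18446744073709551560, while encoding only the low 32 bits (4294967240) takes five bytes. `decode_int32` returns -56 for either form, which is why a reader that truncates keeps working even when the writer is wrong.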
Bug report and nearing completion hprotoc on par with protocol-buffers 2.0.2
Kenton,

I am nearly ready with the Haskell update to protoc [1] that will support the user defined options introduced by protocol-buffers 2.0.2 (protoc).

To have a hope of testing my code, I have redesigned my processing to have hprotoc produce a binary FileDescriptorSet that I could compare to the output of protoc. This should also allow hprotoc to consume the binary FileDescriptorSet output of protoc.

I have a few questions and two bug reports against protoc-2.0.2 that all arise from me examining the FileDescriptorSet output with protoc's decoding:

BUG * The user-defined options have the wrong value for some 32 bit value. You store 64 bit values:

    unittest_custom_options.proto: optional int32 message_opt1 = 7739036;
    unittest_custom_options.proto: option (message_opt1) = -56;
    protoc:  7739036: 18446744073709551560
    hprotoc: 7739036: 4294967240

There is another problem which is seen in the raw output:

    unittest_custom_options.proto:
    message DummyMessageContainingEnum {
      enum TestEnumType {
        TEST_OPTION_ENUM_TYPE1 = 22;
        TEST_OPTION_ENUM_TYPE2 = -23;
      }
    }

    protoc:
    2 {
      1: "TEST_OPTION_ENUM_TYPE2"
      2: 18446744073709551593
    }

    hprotoc:
    2 {
      1: "TEST_OPTION_ENUM_TYPE2"
      2: 4294967273
    }

The negative enum value reveals that this is stored as a 64 bits number instead of 32 bits. This obviously makes the inefficient negative values about twice as bad as they would otherwise be, and threatens to cause errors when read into other implementations that only expect 32 bits.

BUG * The user-defined options from unittest_custom.proto have repetitions in the output from protoc that are not present in the .proto file. Not all fields are repeated (apparently just the fixed width ones), but this looks dangerous in the presence of repeated fields.
Example from the raw output from protoc:

    4 {
      1: "CustomOptionMinIntegerValues"
      7 {
        7706090: 0
        7705709: 18446744071562067968
        7705542: 9223372036854775808
        7704880: 0
        7702367: 0
        7701568: 4294967295
        7700863: 18446744073709551615
        7700307: 0x
        7700307: 0x
        7700194: 0x
        7700194: 0x
        7698645: 0x8000
        7698645: 0x8000
        7685475: 0x8000
        7685475: 0x8000
      }
    }

* The default_value of bytes and string types are stored differently. The bytes are stored in a raw form at the same "escaping level" as the proto file. A string is stored after the escape codes have been interpreted.
** Why, oh why, are they stored with different escape conventions?
** Is this documented anywhere?

* The "name" field of the FileDescriptorProto seems to be the file path passed on the command line or the filepath in the import statement.
** I have not checked, but if I were on windows would the file path from the command line have \ instead of / ?
** Is this documented anywhere?

Thanks for your attention,
Chris

[1] http://hackage.haskell.org/cgi-bin/hackage-scripts/package/protocol-buffers
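The large unsigned numbers in the raw output above are easier to compare with the .proto source once they are reinterpreted as two's-complement signed values. The following is a small sketch of that arithmetic; `as_signed` is a hypothetical helper written for this illustration, not part of protoc or hprotoc.

```python
def as_signed(value, bits):
    """Reinterpret the low `bits` bits of an unsigned value as two's complement."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


# Values taken from the report above, read at the width each tool used.
protoc_option = as_signed(18446744073709551560, 64)   # message_opt1 from protoc
hprotoc_option = as_signed(4294967240, 32)            # message_opt1 from hprotoc
protoc_enum = as_signed(18446744073709551593, 64)     # TEST_OPTION_ENUM_TYPE2
int32_minimum = as_signed(18446744071562067968, 64)   # field 7705709
```

Both `protoc_option` and `hprotoc_option` come out as -56, and `protoc_enum` as -23, matching the .proto file. `int32_minimum` is -2147483648, which suggests that entry holds the smallest 32-bit value sign-extended to 64 bits.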