Re: [protobuf] Performance aspect of submessage serialization

Marc Gravell Wed, 15 May 2013 06:26:11 -0700

To be explicit - I didn't say that any would *fail* to parse it - I simply
said that it was not tested and not guaranteed. It is arguably a shame that
the specification is ambiguous - it would be nice if it was explicitly
either permitted or not. An interesting side-effect of this is that it
allows the same data to be encoded in different ways, which might be
problematic - but there is precedent for this: out-of-order fields are
explicitly permitted.
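Both kinds of ambiguity are easy to see at the byte level. The following is a minimal Python sketch (hypothetical helper names, not the API of any protobuf library) that decodes messages made only of varint fields. It assumes a lenient varint reader, which is precisely the behaviour Marc says is not guaranteed across implementations. Three different byte strings decode to the same field values:

```python
# Minimal sketch: a decoder for messages made only of varint fields (wire
# type 0). Helper names are hypothetical, not any protobuf library's API.


def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Read a base-128 varint at pos; return (value, position after it).

    Lenient by assumption: it keeps reading while the continuation bit
    (0x80) is set, so non-minimal (zero-padded) encodings are accepted.
    """
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_varint_fields(data: bytes) -> dict[int, int]:
    """Decode a message containing only varint fields into {field: value}."""
    fields = {}
    pos = 0
    while pos < len(data):
        key, pos = read_varint(data, pos)
        field_number, wire_type = key >> 3, key & 0x07
        if wire_type != 0:
            raise ValueError("this sketch only handles varint fields")
        value, pos = read_varint(data, pos)
        fields[field_number] = value
    return fields


# Field 1 = 5 and field 2 = 150, written in field-number order...
in_order = bytes([0x08, 0x05, 0x10, 0x96, 0x01])
# ...the same two fields written in reverse order...
reversed_order = bytes([0x10, 0x96, 0x01, 0x08, 0x05])
# ...and in order again, but with 5 written as a padded 2-byte varint.
padded = bytes([0x08, 0x85, 0x00, 0x10, 0x96, 0x01])

for encoding in (in_order, reversed_order, padded):
    print(encoding.hex(), decode_varint_fields(encoding) == {1: 5, 2: 150})
```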


I found the old thread:
https://groups.google.com/forum/#!topic/protobuf/Ch29bSo7DXU/discussion

"Variant-length encoding : is it OK to be wasteful?"

(note the typo: a stray "a" in "varint"; sigh... you won't believe how long
it took me to realize it wasn't "variant")

Marc


On 15 May 2013 11:16, <mailto.jo...@gmail.com> wrote:

> Thanks for the input Marc! I cannot find the thread you refer to. Was it
> developers from Google who gave you the answer?
>
> The "groups" alternative would really fix my problem. It is a pity that it
> is deprecated and that the same mechanism cannot be used for submessages!
>
> I hope for a response from Google developers on the first question.
> Prefixing a varint with zero bits would follow the protocol as I understand
> it. I think it is more of a bug in the protoc implementation that it fails
> to parse such a message.
>
> Kind regards,
> Jonas
>
>
>
> On Tuesday, May 14, 2013 6:50:00 PM UTC+2, Marc Gravell wrote:
>
>> I should clarify: when talking about "groups" I should emphasise that
>> Google have marked that feature deprecated. Which is a shame, since I'm
>> still of the opinion that they are at least as good as, probably better
>> than, sub-messages **on the wire** (how they appear in the class model is
>> of far less interest to me, since I don't use the Google API).
>>
>> Marc
>> On 14 May 2013 17:47, "Marc Gravell" <marc.g...@gmail.com> wrote:
>>
>>>  I asked about this a few years ago (feel free to search the archive -
>>> I couldn't find it; I believe I used the term "subnormal forms" for this).
>>> IIRC the answer then was along the lines of "hmmm.... looking at the
>>> current implementation that will probably work, but it isn't guaranteed and
>>> won't be tested on all platforms; we don't recommend it".
>>>
>>> However: I should note that if you want optimal encoding, groups (rather
>>> than length-prefix) might be worth a look - since a group doesn't require
>>> you to know the length in advance.
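To make the wire difference concrete, here is a small Python sketch (hypothetical helper names; the wire-type numbers 2, 3 and 4 are the protobuf encoding's length-delimited, start-group and end-group types) that encodes the same nested message as field 3 of an outer message both ways. The length-delimited form needs the body length before the body is written; the group form only brackets the body with start and end tags.

```python
# Sketch: encode the same nested message (field 1 = 150) as field 3 of an
# outer message, once length-delimited and once as a group.

WIRETYPE_VARINT = 0
WIRETYPE_LENGTH_DELIMITED = 2
WIRETYPE_START_GROUP = 3
WIRETYPE_END_GROUP = 4


def encode_varint(value: int) -> bytes:
    """Minimal base-128 varint encoding of a non-negative int."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        value >>= 7
    out.append(value)                       # final byte, continuation bit clear
    return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    """Field key: (field_number << 3) | wire_type, encoded as a varint."""
    return encode_varint((field_number << 3) | wire_type)


inner = tag(1, WIRETYPE_VARINT) + encode_varint(150)

# Length-delimited: the length must be known before the body is written.
as_submessage = tag(3, WIRETYPE_LENGTH_DELIMITED) + encode_varint(len(inner)) + inner

# Group: start tag, body, end tag; no length is needed up front.
as_group = tag(3, WIRETYPE_START_GROUP) + inner + tag(3, WIRETYPE_END_GROUP)

print(as_submessage.hex())  # 1a03089601
print(as_group.hex())       # 1b0896011c
```

Both forms happen to be 5 bytes here. With a group, a writer can stream the body and emit the end tag afterwards; the cost is an end tag in place of the length prefix.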
>>>
>>> Note that on some implementations the length *is* known in advance, so
>>> they don't have any overhead here.
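Where an implementation knows every length in advance, it is typically because it computes sizes in a separate pass before writing anything. A minimal sketch of that two-pass idea, assuming a hypothetical representation in which a message is a Python dict mapping field numbers to either non-negative ints (varint fields) or nested dicts (submessages):

```python
def varint_size(value: int) -> int:
    """Bytes needed for a minimal varint encoding of a non-negative int."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size


def encode_varint(value: int) -> bytes:
    """Minimal base-128 varint encoding of a non-negative int."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def compute_sizes(msg: dict, sizes: dict) -> int:
    """Pass 1: record the encoded size of msg and every nested message."""
    total = 0
    for field_number, value in msg.items():
        if isinstance(value, dict):
            body = compute_sizes(value, sizes)
            total += varint_size((field_number << 3) | 2) + varint_size(body) + body
        else:
            total += varint_size(field_number << 3) + varint_size(value)
    sizes[id(msg)] = total
    return total


def write_message(msg: dict, sizes: dict, out: bytearray) -> None:
    """Pass 2: write directly into out; every length prefix is already known."""
    for field_number, value in msg.items():
        if isinstance(value, dict):
            out += encode_varint((field_number << 3) | 2)  # wire type 2
            out += encode_varint(sizes[id(value)])
            write_message(value, sizes, out)
        else:
            out += encode_varint(field_number << 3)        # wire type 0
            out += encode_varint(value)


message = {1: 150, 3: {1: 5, 2: 300}}
sizes = {}
total = compute_sizes(message, sizes)
out = bytearray()
write_message(message, sizes, out)
print(total, out.hex())
```

Keying the size cache by `id()` is only a shortcut for this sketch; the point is that the size pass runs once, so the write pass never needs a scratch buffer or a shuffle.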
>>>
>>> Note that 2 bytes *is not enough* to guarantee every scenario, but it is
>>> probably enough to avoid the large majority of shuffles, if whichever
>>> implementation you are using is doing what I suspect it is doing.
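One plausible reading of that suspicion is a "reserve and shuffle" writer: reserve a fixed number of bytes for the length, write the body after the gap, then patch the real length in, moving the body whenever the guess was wrong. The names below are hypothetical, and `RESERVED_BYTES = 2` simply mirrors the 2-byte figure under discussion; this is a sketch, not any particular library's code.

```python
# Hypothetical sketch of a "reserve and shuffle" writer for one
# length-delimited field body (tag omitted for brevity).

RESERVED_BYTES = 2  # guessed width of the length prefix


def encode_varint(value: int) -> bytes:
    """Minimal base-128 varint encoding of a non-negative int."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def write_length_prefixed(out: bytearray, write_body) -> bool:
    """Append a varint length plus body to out; write_body(out) appends the body.

    Returns True if the body had to be moved because the real length prefix
    did not fit the reserved gap exactly.
    """
    prefix_pos = len(out)
    out += bytes(RESERVED_BYTES)      # placeholder gap for the length prefix
    body_start = len(out)
    write_body(out)                   # body goes straight into the output
    prefix = encode_varint(len(out) - body_start)
    # Equal widths: an in-place overwrite. Different widths: slice assignment
    # shifts the whole body left or right, i.e. the "shuffle".
    out[prefix_pos:body_start] = prefix
    return len(prefix) != RESERVED_BYTES


for body_len in (100, 200, 20000):
    buf = bytearray()
    moved = write_length_prefixed(buf, lambda o: o.extend(b"x" * body_len))
    print(body_len, len(buf), moved)
```

With a 2-byte reservation, bodies of 128 to 16383 bytes are written in place. Shorter bodies still force a move because their minimal prefix is 1 byte, and bodies of 16384 bytes or more need a 3-byte prefix, which is why 2 bytes cannot cover every case.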
>>>
>>> Marc
>>> On 14 May 2013 16:35, <mailto...@gmail.com> wrote:
>>>
>>>>  I am trying to understand the performance overhead of serializing
>>>> Google Protocol Buffer messages. One aspect that annoys me a bit is the way
>>>> submessages are handled, with their variable-size length field. It seems
>>>> to be optimized for reducing message size and not for serialization
>>>> performance.
>>>>
>>>> Problem:
>>>> ======
>>>> The size field that precedes a submessage is of type varint.
>>>>
>>>> The number of bytes (octets) needed:
>>>>    submessage size (bytes):                 0-127   128-16383   ...
>>>>    number of bytes for serialized "size":   1       2           ...
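The boundaries in the table follow from each varint byte carrying 7 payload bits plus a continuation bit. A small Python sketch (the helper name is hypothetical) reproduces them:

```python
def varint_size(value: int) -> int:
    """Bytes needed to encode a non-negative int as a minimal varint."""
    size = 1
    while value >= 0x80:  # more than 7 bits left: another byte is needed
        value >>= 7
        size += 1
    return size


for value in (0, 127, 128, 16383, 16384, 2097151, 2097152):
    print(value, varint_size(value))
# Sizes: 0 -> 1, 127 -> 1, 128 -> 2, 16383 -> 2, 16384 -> 3,
#        2097151 -> 3, 2097152 -> 4
```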
>>>>
>>>> Since the serialized size varies, we cannot know the start position of
>>>> the submessage in the stream until the submessage's serialized size is
>>>> known. That is, we have to make a guess and/or use temporary buffers for
>>>> submessage serialization, which hurts serialization performance.
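For comparison, the temporary-buffer approach can be sketched as follows (hypothetical names): serialize the submessage into a scratch buffer first, and only then write the tag, the now-known length, and a copy of the body. The extra allocation and copy are the overhead in question.

```python
def encode_varint(value: int) -> bytes:
    """Minimal base-128 varint encoding of a non-negative int."""
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    out.append(value)
    return bytes(out)


def write_submessage(out: bytearray, field_number: int, write_body) -> None:
    """Append a length-delimited submessage using a scratch buffer."""
    scratch = bytearray()
    write_body(scratch)                             # 1. serialize the body aside
    out += encode_varint((field_number << 3) | 2)   # 2. tag with wire type 2
    out += encode_varint(len(scratch))              # 3. length is now known
    out += scratch                                  # 4. copy the body across


buf = bytearray()
# Field 3 holding a submessage whose only field is field 1 = 150.
write_submessage(buf, 3, lambda body: body.extend(b"\x08\x96\x01"))
print(buf.hex())  # 1a03089601
```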
>>>>
>>>> Question:
>>>> =======
>>>> Now to my question: would it be possible (without violating the standard)
>>>> to force a varint to be 2 bytes long even though the value is less than
>>>> 128? This could be achieved by padding the value with zeros according to
>>>> the following scheme:
>>>>
>>>> value = b0,b1,b2,b3,b4,b5,b6,b7 where b0=0 (i.e. value < 128)
>>>>
>>>> Serialized: 1,b1,b2,b3,b4,b5,b6,b7,0,0,0,0,0,0,0,0
>>>>
>>>> Padding with 7 zero bits (one extra byte) would add to the overall
>>>> message size, but reduce serialization time (for submessages < 16384
>>>> bytes). Does the standard allow such manipulation? I have tried to decode
>>>> such a message using protoc but it reports a failure. However, I have not
>>>> found any description in the Google documentation saying that this is
>>>> not allowed.
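The padding scheme in the question can be written as a small encoder. This Python sketch (hypothetical helper names) forces a varint to a chosen width by setting the continuation bit on every byte except the last and filling the unused high 7-bit groups with zeros. The decoder is lenient by assumption; whether a given protobuf parser accepts such input is exactly what is being asked.

```python
def encode_varint_padded(value: int, width: int) -> bytes:
    """Encode value as a varint occupying exactly `width` bytes.

    Every byte but the last has its continuation bit (0x80) set; unused
    high-order 7-bit groups are zero. Raises if value needs more bits.
    """
    if value < 0 or value >= 1 << (7 * width):
        raise ValueError("value does not fit in the requested width")
    out = bytearray()
    for i in range(width):
        group = (value >> (7 * i)) & 0x7F
        is_last = i == width - 1
        out.append(group if is_last else group | 0x80)
    return bytes(out)


def decode_varint(data: bytes) -> int:
    """Lenient decoder: accepts minimal and zero-padded encodings alike."""
    result = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return result
    raise ValueError("truncated varint")


print(encode_varint_padded(5, 2).hex())    # 8500
print(encode_varint_padded(127, 2).hex())  # ff00
print(encode_varint_padded(300, 2).hex())  # ac02
```

With `width=2` this covers values up to 16383; for values 128 to 16383 the output is identical to the minimal encoding, so only values below 128 produce a non-minimal form.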
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Protocol Buffers" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to protobuf+u...@googlegroups.com.
>>>> To post to this group, send email to prot...@googlegroups.com.
>>>> Visit this group at http://groups.google.com/group/protobuf?hl=en.
>>>> For more options, visit https://groups.google.com/groups/opt_out.



-- 
Regards,

Marc
