Re: [protobuf] Performance aspect of submessage serialization

mailto . jonas Wed, 15 May 2013 14:23:54 -0700

Hi all,

Thanks for all input! I was about to post my serialization code when I 
realized that it contained a bug, rendering a size value that was "correct 
size + 1". I have corrected the bug and now protoc deserializes the message 
without any warnings.


All in all, without the bug, I would not have posted on this forum. 
However, based on your input I still think it would be nice if the 
documentation mentioned this "option", such that we could use it to enhance 
serialization performance. 

Kind regards,
Jonas Hansson



On Wednesday, May 15, 2013 10:49:23 PM UTC+2, Marc Gravell wrote:
>
> > So far I haven't seen such data showing that prefixing a fixed-size 
> length is a performance gain.
>
> That depends on the implementation ;) For implementations that know the 
> lengths of data in advance (or can compute it cheaply by summing things 
> that *do* know their own lengths), it probably isn't in the least bit 
> advantageous. For implementations that **aren't** based on that (for 
> example, they are working against plain vanilla objects, not special 
> code-gen types that have builders that calculate lengths, etc) - I am 
> pretty sure that it would help. I don't have benchmarks to hand - I could 
> invest some time producing them if you really wanted, but frankly it didn't 
> seem top priority after the 2-years-ago email chain.
>
> Note that all of this only applies during serialization; during 
> deserialization, the important question is : is the field expected / 
> wanted. If it *is*, then frankly it is of no huge difference whether you 
> use groups or length-prefix : you are still going to churn through the 
> entire block. The difference comes when it is *not* expected / wanted - it 
> is cheaper to skim-read / skip the data when it is length-prefixed (to 
> "skip" a group, you still need to parse the field-headers inside that 
> group). Of course, in most cases this is a bit of a non-issue, as you are 
> *probably* interested in the data!
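
A rough Python sketch of the skipping difference described above. The helper names `read_varint` and `skip_field` are made up for illustration and do not come from any protobuf library; the wire-type numbers (0 varint, 1 64-bit, 2 length-delimited, 3 start group, 4 end group, 5 32-bit) follow the protobuf encoding documentation. Bounds checking is left out to keep it short.

```python
from typing import Tuple


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Decode a base-128 varint at pos; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def skip_field(buf: bytes, pos: int, wire_type: int, field_number: int) -> int:
    """Skip one field payload; pos points just past the field's tag."""
    if wire_type == 0:                      # varint
        _, pos = read_varint(buf, pos)
        return pos
    if wire_type == 1:                      # fixed 64-bit
        return pos + 8
    if wire_type == 2:                      # length-delimited: a single jump
        length, pos = read_varint(buf, pos)
        return pos + length
    if wire_type == 3:                      # group: walk every inner field header
        while True:
            tag, pos = read_varint(buf, pos)
            inner_field, inner_type = tag >> 3, tag & 7
            if inner_type == 4:             # end-group tag
                if inner_field != field_number:
                    raise ValueError("mismatched end-group tag")
                return pos
            pos = skip_field(buf, pos, inner_type, inner_field)
    if wire_type == 5:                      # fixed 32-bit
        return pos + 4
    raise ValueError("unexpected wire type %d" % wire_type)


# The same inner payload (field 1 = 150) wrapped both ways as field 3.
inner = b"\x08\x96\x01"
as_submessage = b"\x1a\x03" + inner         # tag (3<<3)|2, length 3, payload
as_group = b"\x1b" + inner + b"\x1c"        # start-group tag, payload, end-group tag
```

With the length prefix, the skip is one varint read plus an offset addition regardless of how many fields the submessage holds; with a group, the cost grows with the number of inner fields.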
>
> Marc
>
>
>  
>
> On 15 May 2013 19:07, Feng Xiao <xiao...@google.com> wrote:
>
>>
>>
>>
>> On Wed, May 15, 2013 at 3:16 AM, <mailto...@gmail.com> wrote:
>>
>>> Thanks for the input Marc! I cannot find the thread you refer to. Was it 
>>> developers from Google who gave you the answer?
>>>
>>> The "groups" alternative would really fix my problem. It is a pity that 
>>> it is deprecated and that the same mechanism cannot be used for submessages!
>>>
>>> Hope for a response from Google developers on the first question. 
>>> Prefixing a varint with zero bits would follow the protocol as I understand 
>>> it. I think it is more of a bug in the protoc implementation that it fails 
>>> to parse such a message.
>>>
>> The answer to your question is the same as it was before: it will probably 
>> work, but that is not guaranteed, and we don't have any plan to change it.
>> As to your performance concern, only benchmarks can tell. So far I 
>> haven't seen such data showing that prefixing a fixed-size length is a 
>> performance gain.
>>   
>>
>>>
>>> Kind regards,
>>> Jonas
>>>
>>>
>>>
>>> On Tuesday, May 14, 2013 6:50:00 PM UTC+2, Marc Gravell wrote:
>>>
>>>> I should clarify: when talking about "groups" I should emphasise that 
>>>> Google have marked that feature deprecated. Which is a shame since I'm still 
>>>> of the opinion that they are at least as good, probably better, than 
>>>> sub-messages **on the wire** (how they appear in the class model is of far 
>>>> less interest to me, since I don't use the Google API).
>>>>
>>>> Marc
>>>> On 14 May 2013 17:47, "Marc Gravell" <marc.g...@gmail.com> wrote:
>>>>
>>>>>  I asked about this a few years ago (feel free to search the archive 
>>>>> - I couldn't find it; I believe I used the term "subnormal forms" for 
>>>>> this). IIRC the answer then was along the lines of "hmmm.... looking at 
>>>>> the current implementation that will probably work, but it isn't 
>>>>> guaranteed and won't be tested on all platforms; we don't recommend it".
>>>>>
>>>>> However: I should note that if you want optimal encoding, groups 
>>>>> (rather than length-prefix) might be worth a look - since the group 
>>>>> doesn't demand you know the length.
>>>>>
>>>>> Note that on some implementations the length *is* known in advance, so 
>>>>> they don't have any overhead here.
>>>>>
>>>>> Note that 2 bytes *is not enough* to guarantee every scenario, but it 
>>>>> is probably enough to avoid the large majority of shuffles, if (whichever 
>>>>> implementation you are using) is doing what I suspect it is doing.
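
One way to picture the "reserve two bytes, shuffle only when needed" idea is the Python sketch below. It is a guess at what such an implementation might do, not the code of any real library: `write_submessage`, `encode_varint` and `encode_varint_padded` are hypothetical names. It also relies on the padded two-byte length, which, as noted above, probably works but is not guaranteed to be accepted by every parser.

```python
from typing import Callable


def encode_varint(value: int) -> bytes:
    """Shortest (canonical) base-128 varint encoding of a non-negative int."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)   # more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_varint_padded(value: int, width: int) -> bytes:
    """Encode value in exactly `width` bytes, padding with zero groups."""
    if value < 0 or value >= 1 << (7 * width):
        raise ValueError("value does not fit in %d varint bytes" % width)
    out = bytearray()
    for i in range(width):
        low7 = (value >> (7 * i)) & 0x7F
        out.append(low7 | 0x80 if i < width - 1 else low7)
    return bytes(out)


def write_submessage(out: bytearray, tag: int,
                     write_body: Callable[[bytearray], None]) -> None:
    """Write tag, a reserved 2-byte length slot, then the body; patch the length."""
    out.extend(encode_varint(tag))
    length_pos = len(out)
    out.extend(b"\x00\x00")               # reserve two bytes for the length
    body_start = len(out)
    write_body(out)                       # body is written directly in place
    body_len = len(out) - body_start
    if body_len < 1 << 14:
        # Common case: patch the reserved slot, no bytes move.
        out[length_pos:body_start] = encode_varint_padded(body_len, 2)
    else:
        # Rare case: the length needs 3+ bytes, so the body has to shift.
        out[length_pos:body_start] = encode_varint(body_len)
```

For example, `write_submessage(buf, (3 << 3) | 2, lambda o: o.extend(b"\x08\x96\x01"))` leaves `1a 83 00 08 96 01` in `buf`: the length 3 occupies two bytes instead of one.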
>>>>>
>>>>> Marc
>>>>> On 14 May 2013 16:35, <mailto...@gmail.com> wrote:
>>>>>
>>>>>>  I am trying to understand the performance overhead of serializing 
>>>>>> Google Protocol Buffer messages. One aspect that annoys me a bit is the 
>>>>>> way submessages are handled, with their variable-size length field. It 
>>>>>> seems to be optimized for reducing message size and not for 
>>>>>> serialization performance.
>>>>>>
>>>>>> Problem:
>>>>>> ======
>>>>>> The size field that precedes a submessage is of type varint. 
>>>>>>
>>>>>> The number of bytes (octets) needed:
>>>>>>                  submessage size (bytes): 0-127, 128-16383, ...
>>>>>>    number of bytes for serialized "size":   1   ,     2     , ...
>>>>>>
>>>>>> Since the serialized size varies, we cannot know the start position of 
>>>>>> the submessage in the stream before the submessage serialized size is 
>>>>>> known. I.e. we have to make a guess and/or use temporary buffers for 
>>>>>> submessage serialization, which will affect serialization performance 
>>>>>> negatively.
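
To make the size table concrete, here is a minimal Python sketch of canonical varint encoding. The function names `encode_varint` and `varint_size` are illustrative only; the 7-bits-per-byte layout with a continuation bit is the one described in the protobuf encoding documentation.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a shortest-form base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative values only")
    out = bytearray()
    while True:
        low7 = value & 0x7F          # lowest 7 bits go out first
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(low7)         # last byte has the top bit clear
            return bytes(out)


def varint_size(value: int) -> int:
    """Number of bytes the canonical varint encoding of value occupies."""
    return len(encode_varint(value))


# The length prefix grows by one byte at each power of 128.
for size in (0, 127, 128, 16383, 16384):
    print(size, varint_size(size))
```

Because the prefix width depends on the body length, a serializer that writes front to back cannot know where the body starts until it has been measured or written somewhere else.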
>>>>>>
>>>>>> Question:
>>>>>> =======
>>>>>> Now to my question: would it be possible (without violating the standard) 
>>>>>> to force a varint to be 2 bytes long even though the value is less than 
>>>>>> 128? This could be achieved by prefixing the value with zeros according 
>>>>>> to the following scheme:
>>>>>>
>>>>>> value = b0,b1,b2,b3,b4,b5,b6,b7 where b0=0 (i.e. value < 128)
>>>>>>
>>>>>> Serialized: 1,b1,b2,b3,b4,b5,b6,b7,0,0,0,0,0,0,0,0
>>>>>>
>>>>>> Prefixing with 7 zero bits would add to the overall message size, but 
>>>>>> reduce serialization time (for submessages < 16384 bytes). Does the 
>>>>>> standard allow such manipulation? I have tried to decode such a message 
>>>>>> using protoc, but it reports a failure. However, I have not found any 
>>>>>> description in the Google documentation saying that this is not allowed.
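
The scheme in the question can be sketched in Python as follows. `encode_varint_padded` and `decode_varint` are hypothetical helper names, and the decoder here is a plain textbook one written for this example; whether a given real parser accepts the padded form is exactly the open point of this thread.

```python
from typing import Tuple


def encode_varint_padded(value: int, width: int = 2) -> bytes:
    """Encode value as a varint of exactly `width` bytes (non-canonical if short)."""
    if value < 0 or value >= 1 << (7 * width):
        raise ValueError("value does not fit in %d varint bytes" % width)
    out = bytearray()
    for i in range(width):
        low7 = (value >> (7 * i)) & 0x7F
        if i < width - 1:
            low7 |= 0x80             # continuation bit on all but the last byte
        out.append(low7)
    return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode a varint at pos; return (value, position after it)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7
        if shift >= 70:              # more than 10 bytes cannot be a 64-bit varint
            raise ValueError("varint too long")


padded = encode_varint_padded(5)     # first byte 1,b1..b7; second byte all zero
value, end = decode_varint(padded)
```

Here the value 5 is written as the two bytes `85 00`, and this decoder reads it back as 5 after consuming both bytes, since the extra zero group contributes nothing to the result.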
>>>>>>
>>>>>>  -- 
>>>>>> You received this message because you are subscribed to the Google 
>>>>>> Groups "Protocol Buffers" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>> send an email to protobuf+u...@googlegroups.com.
>>>>>> To post to this group, send email to prot...@googlegroups.com.
>>>>>>
>>>>>> Visit this group at http://groups.google.com/group/protobuf?hl=en.
>>>>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>>>>  
>>>>>>  
>>>>>>
>>>
>>
>>
>
>
>
> -- 
> Regards, 
>
> Marc 
>

