[protobuf] Re: Variant-length encoding : is it OK to be wasteful?

Dave Bailey Wed, 10 Mar 2010 10:14:14 -0800

I imagine there would be some pathological cases where the same memory
would get repeatedly shifted to the right by 1 byte at a time as the
parent message length prefixes get resized.  It seems like the
explicit precomputation of lengths is likely to be more efficient
overall... that said, maybe it would be possible to establish a good
upper bound for messages that satisfy the following restrictions:


1) the message does not have any repeated fields
2) the message does not have any string or bytes fields

In this case, an upper bound on the length can be calculated at code
generation time, and this is the value you could use to determine the
number of bytes you need to set aside for the length prefix.  I don't
know if it would yield any discernible performance benefit, though.
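
For what it's worth, here is a rough sketch of how such a bound might be computed. It is Python for brevity, not generated code from any real protobuf plugin. The schema representation (a list of (field_number, type) pairs, with a nested list standing in for a sub-message) and all function names are made up for illustration. The per-type maxima follow the wire format: a varint is at most 10 bytes, and negative int32/enum values are sign-extended to 10 bytes.

```python
# Worst-case encoded payload size per scalar type (protobuf wire format).
MAX_SCALAR_BYTES = {
    "bool": 1,
    "uint32": 5, "sint32": 5,
    "int32": 10, "enum": 10,   # negative values are sign-extended to 10 bytes
    "int64": 10, "uint64": 10, "sint64": 10,
    "fixed32": 4, "sfixed32": 4, "float": 4,
    "fixed64": 8, "sfixed64": 8, "double": 8,
}

def varint_size(value):
    """Number of bytes in the minimal varint encoding of a non-negative int."""
    size = 1
    while value >= 0x80:
        value >>= 7
        size += 1
    return size

def max_message_size(fields):
    """Upper bound on the encoded size of a message with no repeated,
    string, or bytes fields.

    `fields` is a hypothetical schema description: a list of
    (field_number, type) pairs, where type is either a scalar type name
    or a nested list of the same shape (a sub-message).
    """
    total = 0
    for number, ftype in fields:
        # Tag = (field_number << 3) | wire_type; the low 3 wire-type bits
        # never change the varint length, so they are left as zero here.
        total += varint_size(number << 3)
        if isinstance(ftype, list):
            inner = max_message_size(ftype)
            total += varint_size(inner) + inner  # length prefix + body
        else:
            total += MAX_SCALAR_BYTES[ftype]
    return total

def prefix_bytes_to_reserve(fields):
    """Bytes to set aside for the length prefix of this message."""
    return varint_size(max_message_size(fields))
```

Since unset optional fields only make the message shorter, the real length never exceeds the bound, so the reserved prefix is always wide enough (though it may be wider than strictly needed).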

-dave

On Mar 10, 7:48 am, Marc Gravell <marc.grav...@gmail.com> wrote:
> The variable-length encoding allows multiple representations of the
> same value - for example, 1 could be written as:
>
>  0000 0001
>
> or it could be (I believe):
>
>  1000 0001    1000 0000    0000 0000
>
> Is there anything in the core (or other) implementations that would
> object to this second form?
>
> In particular, the scenario I'm thinking about is where a message is
> known to be pretty deep - for example:
>
>  A
>  > B
>      > C
>         > D
>         > D
>      > C
>         > D
>         > D
>
> At the moment, my code leaves an optimistic single-byte dummy-value
> for the prefix, and then backfills this value when the length is known
> (i.e. after writing this subtree), shuffling the data if needed.
>
> I'm toying with making this voluntarily lossy; for example it might
> (in some cases) decide to leave a longer (2-5) dummy value, and write
> the alternative form if the value turns out to be small (i.e. if
> actually only 4 bytes were written). This would reduce the number of
> times I need to block-copy the data (noting that it might have to copy
> the same data multiple times - potentially every non-root object might
> be more than 127 bytes).
>
> I'm tempted to spike it (to gauge the performance benefit), but I
> don't want to waste my time if this is going to make it incompatible
> with the core implementations (i.e. if they would actively spot this
> and cause an error). And if it *is* valid, would it be possible to
> make it explicit in the encoding spec? (or indeed, if it *isn't* valid
> make it explicit in the encoding spec).
>
> Thanks,
>
> Marc Gravell
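
To make the quoted idea concrete, here is a minimal sketch of the padded ("wasteful") varint form and of the reserve-then-backfill approach, written in Python. The function names and the `reserve` parameter are invented for illustration and are not taken from any protobuf implementation. The decoder below is a plain varint loop that happens to accept the padded form; whether any particular real implementation does is the open question in the quoted message.

```python
def encode_varint(value, width=None):
    """Encode a non-negative int as a varint.

    If `width` is given, pad the encoding to exactly that many bytes by
    setting the continuation bit and appending zero-valued groups.
    """
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more groups follow
        else:
            out.append(byte)
            break
    if width is not None:
        if len(out) > width:
            raise ValueError("value does not fit in reserved width")
        if len(out) < width:
            out[-1] |= 0x80                              # continue past the last real group
            out.extend([0x80] * (width - len(out) - 1))  # empty middle groups
            out.append(0x00)                             # final empty group
    return bytes(out)

def decode_varint(buf, pos=0):
    """Decode a varint starting at `pos`; returns (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def write_with_reserved_prefix(out, write_body, reserve=2):
    """Reserve `reserve` bytes for the length, write the body, then backfill.

    `out` is a bytearray and `write_body` is a callable that appends to it.
    """
    start = len(out)
    out.extend(b"\x00" * reserve)          # dummy prefix
    write_body(out)
    length = len(out) - start - reserve
    if len(encode_varint(length)) <= reserve:
        # Fits: write the (possibly padded) length in place, no copying.
        out[start:start + reserve] = encode_varint(length, width=reserve)
    else:
        # Too small: the slice assignment grows the buffer and shifts the body.
        out[start:start + reserve] = encode_varint(length)
```

With this sketch, `encode_varint(1, width=3)` produces the three-byte form shown in the quoted message, and a 4-byte body written with `reserve=2` gets a two-byte prefix without any block copy.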

