Re: encoding of embedded messages and repeated elements

2009-06-25 Thread Kenton Varda
Yes, "groups" are never going to fully go away.  But we recommend against
using them in new code.

On Thu, Jun 25, 2009 at 9:13 AM, Piotr Findeisen
wrote:

>
> Hi!
>
> On Jun 24, 8:42 pm, Kenton Varda  wrote:
> > The end-tag approach is more efficient than your idea -- it's faster (no
> > need to count elements at all) and it takes no more space (no need to
> write
> > a count, which makes up for the extra space taken by the end tag).
> > But in any case, the encoding is not something we can change at this
> point,
> > since protocol buffers is nothing without backwards-compatibility.
>
> As I read the code of C++ protobuf deserializer I found it supports
> end-tag approach using END_GROUP constant -- or I just misunderstood
> the code and/or this thread?
>
> From my experiments it looks like I can stream messages one by one
> separating them with END_GROUP tag, but -- again from comments in the
> code -- it's deprecated. If "protocol buffers is nothing without
> backwards-compatibility", can I assume that existing and future
> implementation of C++ and also Java/Python deserializers will support
> this approach?
>
> best regards,
> Piotr
>
> >
>

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Re: encoding of embedded messages and repeated elements

2009-06-25 Thread Piotr Findeisen

Hi!

On Jun 24, 8:42 pm, Kenton Varda  wrote:
> The end-tag approach is more efficient than your idea -- it's faster (no
> need to count elements at all) and it takes no more space (no need to write
> a count, which makes up for the extra space taken by the end tag).
> But in any case, the encoding is not something we can change at this point,
> since protocol buffers is nothing without backwards-compatibility.

As I read the code of C++ protobuf deserializer I found it supports
end-tag approach using END_GROUP constant -- or I just misunderstood
the code and/or this thread?

>From my experiments it looks like I can stream messages one by one
separating them with END_GROUP tag, but -- again from comments in the
code -- it's deprecated. If "protocol buffers is nothing without
backwards-compatibility", can I assume that existing and future
implementation of C++ and also Java/Python deserializers will support
this approach?

best regards,
Piotr

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Re: encoding of embedded messages and repeated elements

2009-06-24 Thread Kenton Varda
The end-tag approach is more efficient than your idea -- it's faster (no
need to count elements at all) and it takes no more space (no need to write
a count, which makes up for the extra space taken by the end tag).
But in any case, the encoding is not something we can change at this point,
since protocol buffers is nothing without backwards-compatibility.

And yes, some existing parsers do, in fact, take advantage of the ability to
skip over messages without parsing them, and there are many features that
people are considering implementing (like lazy parsing) which would need
this.

It actually turns out that pre-computing the size of the embedded message
does not take very long compared to actually writing it.

On Wed, Jun 24, 2009 at 12:57 AM, etorri  wrote:

>
>
> Does some existing parser actually implement that skipping feature?
>
> There would not be any need for a end-tag. Let's assume that there
> would be two different tags
>
> 2 - Length_Delimited,  which could contain a packed list of bytes
> (string, memory block) or other types where the parser needs to know
> what is packed inside (no tags)
> 6 -  Group or Element_Delimited - which would be like Length_Delimited
> but have the number of elements that follow that belong to this field
>
> So for an example message where the first field is a group
>
> (1,6),3 - field  numbered 1 of the message, type 6 = Group and 3
> elements that follow belong to this group
>  (1,2),5,"Hello" -field number 1 of the embedded message would be a
> string
>  (3,1),120 - field nr 3 of the embedded message, varint of value 120
>  (4,1),0 - field nr 4 of the embedded message, varint of value 0
> (2,2),5,"World" - field nr 2 of the message
>
> this would be the encoding of the following TheMessage
>
> message Embedded {
>  required string greeting = 1;
>  optional int32 useless = 2;
>  required int32  good = 3;
>  required int32  evil = 4;
> }
>
> message TheMessage {
>  required Embedded e = 1;
>  required string target = 2;
> }
>
> So in this case there would not be need for an end tag. When
> constructing the message it should be relatively easy to count the
> number of embedded elements instead of knowing how much space they
> occupy. This would enable streaming/serializing the elements
> recursively out one by one.
>
>
>
> On Jun 23, 9:07 pm, Kenton Varda  wrote:
> > The advantage of writing the length is that a parser can skip the entire
> > sub-message easily without having to parse its contents.  Otherwise, we
> > would probably use the "group" encoding for sub-messages, where a special
> > end tag marks the end of the message.
> >
>
> >
>

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Re: encoding of embedded messages and repeated elements

2009-06-24 Thread etorri


Does some existing parser actually implement that skipping feature?

There would not be any need for a end-tag. Let's assume that there
would be two different tags

2 - Length_Delimited,  which could contain a packed list of bytes
(string, memory block) or other types where the parser needs to know
what is packed inside (no tags)
6 -  Group or Element_Delimited - which would be like Length_Delimited
but have the number of elements that follow that belong to this field

So for an example message where the first field is a group

(1,6),3 - field  numbered 1 of the message, type 6 = Group and 3
elements that follow belong to this group
  (1,2),5,"Hello" -field number 1 of the embedded message would be a
string
  (3,1),120 - field nr 3 of the embedded message, varint of value 120
  (4,1),0 - field nr 4 of the embedded message, varint of value 0
(2,2),5,"World" - field nr 2 of the message

this would be the encoding of the following TheMessage

message Embedded {
  required string greeting = 1;
  optional int32 useless = 2;
  required int32  good = 3;
  required int32  evil = 4;
}

message TheMessage {
  required Embedded e = 1;
  required string target = 2;
}

So in this case there would not be need for an end tag. When
constructing the message it should be relatively easy to count the
number of embedded elements instead of knowing how much space they
occupy. This would enable streaming/serializing the elements
recursively out one by one.



On Jun 23, 9:07 pm, Kenton Varda  wrote:
> The advantage of writing the length is that a parser can skip the entire
> sub-message easily without having to parse its contents.  Otherwise, we
> would probably use the "group" encoding for sub-messages, where a special
> end tag marks the end of the message.
>

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Re: encoding of embedded messages and repeated elements

2009-06-23 Thread Kenton Varda
The advantage of writing the length is that a parser can skip the entire
sub-message easily without having to parse its contents.  Otherwise, we
would probably use the "group" encoding for sub-messages, where a special
end tag marks the end of the message.

On Tue, Jun 23, 2009 at 9:06 AM, etorri  wrote:

>
>
> Hello,
>
> The "length delimited" encoding basically tells that the following N
> bytes belong to this field. Wouldn't it be easier to instead use the
> number of elements that belong to the embedded message (repeated
> element).
>
> Now (as far as I have understood) the message needs to be built from
> fragments and then collected together as the lengths are not known
> beforehand and it would be expensive to calculate the byte-length of
> the embedded message.
>
> Instead, it would be relatively inexpensive to calculate just the
> number of following elements that make the embedded message before
> starting to encode it.
>
> This would enable streaming of PB or encoding and sending the elements
> right as they are encoded.
>
> Sorry if I misunderstood something. I have just started looking at BP.
> >
>

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Re: encoding of embedded messages and repeated elements

2009-06-23 Thread etorri



ok thanks. it is as it is.

(just looking at the feasibility of implementing the PB in Ada for my
own projects :-)



On Jun 23, 6:17 pm, Alek Storm  wrote:
> Hi etorri,
>
> Embedded messages and strings have the exact same wire format.  When parsing
> a message, it's impossible to know whether you're parsing one or the other,
> and since strings have to be encoded using their length in bytes, we can't
> do something different for embedded messages.
>
> Cheers,
> Alek
>
>
>
> On Tue, Jun 23, 2009 at 9:06 AM, etorri  wrote:
>
> > Hello,
>
> > The "length delimited" encoding basically tells that the following N
> > bytes belong to this field. Wouldn't it be easier to instead use the
> > number of elements that belong to the embedded message (repeated
> > element).
>
> > Now (as far as I have understood) the message needs to be built from
> > fragments and then collected together as the lengths are not known
> > beforehand and it would be expensive to calculate the byte-length of
> > the embedded message.
>
> > Instead, it would be relatively inexpensive to calculate just the
> > number of following elements that make the embedded message before
> > starting to encode it.
>
> > This would enable streaming of PB or encoding and sending the elements
> > right as they are encoded.
>
> > Sorry if I misunderstood something. I have just started looking at BP.
>
> --
> Alek Storm
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



Re: encoding of embedded messages and repeated elements

2009-06-23 Thread Alek Storm
Hi etorri,

Embedded messages and strings have the exact same wire format.  When parsing
a message, it's impossible to know whether you're parsing one or the other,
and since strings have to be encoded using their length in bytes, we can't
do something different for embedded messages.

Cheers,
Alek

On Tue, Jun 23, 2009 at 9:06 AM, etorri  wrote:

>
>
> Hello,
>
> The "length delimited" encoding basically tells that the following N
> bytes belong to this field. Wouldn't it be easier to instead use the
> number of elements that belong to the embedded message (repeated
> element).
>
> Now (as far as I have understood) the message needs to be built from
> fragments and then collected together as the lengths are not known
> beforehand and it would be expensive to calculate the byte-length of
> the embedded message.
>
> Instead, it would be relatively inexpensive to calculate just the
> number of following elements that make the embedded message before
> starting to encode it.
>
> This would enable streaming of PB or encoding and sending the elements
> right as they are encoded.
>
> Sorry if I misunderstood something. I have just started looking at BP.
> >
>


-- 
Alek Storm

--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---



encoding of embedded messages and repeated elements

2009-06-23 Thread etorri


Hello,

The "length delimited" encoding basically tells that the following N
bytes belong to this field. Wouldn't it be easier to instead use the
number of elements that belong to the embedded message (repeated
element).

Now (as far as I have understood) the message needs to be built from
fragments and then collected together as the lengths are not known
beforehand and it would be expensive to calculate the byte-length of
the embedded message.

Instead, it would be relatively inexpensive to calculate just the
number of following elements that make the embedded message before
starting to encode it.

This would enable streaming of PB or encoding and sending the elements
right as they are encoded.

Sorry if I misunderstood something. I have just started looking at BP.
--~--~-~--~~~---~--~~
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to protobuf@googlegroups.com
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en
-~--~~~~--~~--~--~---