Re: [protobuf] Re: WriteDelimited/parseDelimited in python

2010-01-19 Thread Kenton Varda
On Tue, Jan 5, 2010 at 3:57 AM, Graham Cox cox.gra...@gmail.com wrote:

 I was saying the user *could* do that, and that it's currently what I'm
 doing in my server-side code. The reason being, as you said, if you naively
 read from a stream and the message isn't all present then you need to block
 until it is with the way that the Java code works at present. If you are
 using it for client-side code then likely this is not an issue in the
 slightest, but a server that needs to be able to handle many clients at once
 just can not block on one of them...


Well, you should be able to read just the size prefix, then wait until all
the bytes have arrived before attempting to parse.  If the size prefix
itself has not completely arrived, then you would have to retry reading it.
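
To illustrate the "read the size prefix, then wait" approach, here is a minimal Python 3 sketch. It assumes the size prefix is a base-128 varint, as written by the Java writeDelimitedTo(); the helper names try_read_varint and try_extract_message are hypothetical and not part of any protobuf API.

```python
def try_read_varint(buf, pos=0):
    """Return (value, next_pos), or None if the varint has not fully arrived."""
    result = 0
    shift = 0
    while pos < len(buf):
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:          # high bit clear: last byte of the varint
            return result, pos
        shift += 7
        if shift >= 64:
            raise ValueError("varint is too long to be a valid size prefix")
    return None                      # ran out of bytes; retry when more arrive


def try_extract_message(buf):
    """Return (payload, remaining_bytes) once a whole message is buffered.

    Returns None when either the size prefix or the body is still incomplete,
    in which case the caller keeps the buffer and retries after the next read.
    """
    header = try_read_varint(buf)
    if header is None:
        return None
    size, start = header
    end = start + size
    if len(buf) < end:
        return None
    return bytes(buf[start:end]), bytes(buf[end:])
```

The caller never blocks here: a None result just means "not yet", and the payload is only handed to ParseFromString() once it is complete.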


 As to your other alternative, (a), I would suggest that this leaves too
 much of the underlying network protocol bare to the caller. This will make
 it very difficult to change the way that delimiting messages happens in the
 future should such a thing be required. If - for example - it is decided to
 go from having the length prefixed to having a special delimiting sequence
 after the message then it will cause all current calling code to need to be
 changed. It might be that this is considered a low enough level library that
 this is acceptable, but that would be a Google decision...


Realistically, we would never be able to change the encoding like this
without requiring some sort of update to the callers, because otherwise we'd
be breaking all the callers' abilities to talk to code compiled before the
change, which is unacceptable.


 One more alternative would be how the asn1c library works for parsing ASN.1
 streams into objects, which is to be resumable. The decoder reads all the
 data it is given, and tries to build the object from this. If it doesn't
 have enough data yet then it does what it can, remembers where it got to and
 returns back to the user who can then supply more data when it becomes
 available. If the entire message does parse from the data provided then
 return back to the user the amount of data consumed so that they can discard
 this (reading from the stream directly makes this slightly cleaner still).
 At present, the Protobuf libraries (any of them) can not support this method
 of decoding an object, and it is not a trivial change to make it possible to
 do, but it does - IMO - give a much cleaner and easier to use method of use.


I think this would add far too much complication to the system, probably
harming performance and increasing code size and memory usage.

Maybe what we want is a class called DelimitedMessageReader which has a
method Add(bytes).  You read some bytes off the wire, then pass them to
Add().  Add() returns a list of byte strings representing messages that are
now complete.  It may return an empty list if no new messages were
completed.  So you have a loop like:

  while True:
    data = ReadSomeBytes()
    messages = reader.Add(data)
    for message in messages:
      parsed = MyProtobuf()
      parsed.ParseFromString(message)
      HandleMessage(parsed)

If you want to do event-driven I/O, you simply remove the while True: and
instead execute this code each time you detect that bytes are available on
the input.
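
One possible shape for such a reader class is sketched below in Python 3, spelled here as DelimitedMessageReader. This is not an existing protobuf class; it assumes a varint size prefix in front of each message, and the internal names (_buffer, _peek_size) are made up for illustration.

```python
class DelimitedMessageReader:
    """Accumulates raw bytes and yields complete length-prefixed payloads."""

    def __init__(self):
        self._buffer = bytearray()

    def Add(self, data):
        """Append data and return the list of payloads that are now complete."""
        self._buffer.extend(data)
        messages = []
        while True:
            size, start = self._peek_size()
            if size is None or len(self._buffer) < start + size:
                break                      # prefix or body still incomplete
            messages.append(bytes(self._buffer[start:start + size]))
            del self._buffer[:start + size]  # discard the consumed bytes
        return messages

    def _peek_size(self):
        """Return (size, prefix_length), or (None, 0) if the prefix is partial."""
        result = 0
        shift = 0
        for index, byte in enumerate(self._buffer):
            result |= (byte & 0x7F) << shift
            if not byte & 0x80:
                return result, index + 1
            shift += 7
            if shift >= 64:
                raise ValueError("size prefix is not a valid varint")
        return None, 0
```

Because Add() never touches the socket, the same object works in a blocking loop or in an event-driven callback, and the returned byte strings can be passed straight to ParseFromString().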


 --
 Graham Cox

 On Tue, Jan 5, 2010 at 1:32 AM, Kenton Varda ken...@google.com wrote:

 Make sure to reply all so that the group is CC'd.

 So you are saying that the user should read whatever data is on the
 socket, then attempt to parse it, and if it fails, assume that it's because
 there is more data to read?  Seems rather wasteful.  I think what we ideally
 want is either:
 (a) Provide a way for the caller to read the size independently, so that
 they can then make sure to read that many bytes from the input before
 parsing.
 (b) Provide a method that reads from a stream, so that the protobuf
 library can automatically take care of reading all necessary bytes.

 Option (b) is obviously cleaner but has a few problems:
 - We have to choose a particular stream interface to support.  While the
 Python file-like interface is pretty common I'm not sure if it's universal
 for this kind of task.
 - If not all bytes of the message are available yet, we'd have to block.
  This might be fine most of the time, but would be unacceptable for some
 uses.

 Thoughts?

 On Mon, Jan 4, 2010 at 3:09 PM, Graham Cox cox.gra...@gmail.com wrote:

 I'm using it for reading/writing to sockets in my functional tests -
 works well enough there...
 In my Java-side server code, I read from the socket into a byte buffer,
 then deserialize the byte buffer into Protobuf objects, throwing away the
 data that has been deserialized. The python MergeDelimitedFromString
 function also returns the number of bytes that were processed to build up
 the Protobuf object, so the user could easily do the same - read the socket
 onto the end of a buffer, and then while the buffer is successfully
 deserializing into 

Re: [protobuf] Re: WriteDelimited/parseDelimited in python

2010-01-13 Thread Kenton Varda
(I have this on the back burner as I'm kind of swamped, but I do want to get
this submitted at some point, hopefully within a week.)

On Tue, Jan 5, 2010 at 3:57 AM, Graham Cox cox.gra...@gmail.com wrote:

 I was saying the user *could* do that, and that it's currently what I'm
 doing in my server-side code. The reason being, as you said, if you naively
 read from a stream and the message isn't all present then you need to block
 until it is with the way that the Java code works at present. If you are
 using it for client-side code then likely this is not an issue in the
 slightest, but a server that needs to be able to handle many clients at once
 just can not block on one of them...

 As to your other alternative, (a), I would suggest that this leaves too
 much of the underlying network protocol bare to the caller. This will make
 it very difficult to change the way that delimiting messages happens in the
 future should such a thing be required. If - for example - it is decided to
 go from having the length prefixed to having a special delimiting sequence
 after the message then it will cause all current calling code to need to be
 changed. It might be that this is considered a low enough level library that
 this is acceptable, but that would be a Google decision...

 One more alternative would be how the asn1c library works for parsing ASN.1
 streams into objects, which is to be resumable. The decoder reads all the
 data it is given, and tries to build the object from this. If it doesn't
 have enough data yet then it does what it can, remembers where it got to and
 returns back to the user who can then supply more data when it becomes
 available. If the entire message does parse from the data provided then
 return back to the user the amount of data consumed so that they can discard
 this (reading from the stream directly makes this slightly cleaner still).
 At present, the Protobuf libraries (any of them) can not support this method
 of decoding an object, and it is not a trivial change to make it possible to
 do, but it does - IMO - give a much cleaner and easier to use method of use.
 --
 Graham Cox

 On Tue, Jan 5, 2010 at 1:32 AM, Kenton Varda ken...@google.com wrote:

 Make sure to reply all so that the group is CC'd.

 So you are saying that the user should read whatever data is on the
 socket, then attempt to parse it, and if it fails, assume that it's because
 there is more data to read?  Seems rather wasteful.  I think what we ideally
 want is either:
 (a) Provide a way for the caller to read the size independently, so that
 they can then make sure to read that many bytes from the input before
 parsing.
 (b) Provide a method that reads from a stream, so that the protobuf
 library can automatically take care of reading all necessary bytes.

 Option (b) is obviously cleaner but has a few problems:
 - We have to choose a particular stream interface to support.  While the
 Python file-like interface is pretty common I'm not sure if it's universal
 for this kind of task.
 - If not all bytes of the message are available yet, we'd have to block.
  This might be fine most of the time, but would be unacceptable for some
 uses.

 Thoughts?

 On Mon, Jan 4, 2010 at 3:09 PM, Graham Cox cox.gra...@gmail.com wrote:

 I'm using it for reading/writing to sockets in my functional tests -
 works well enough there...
 In my Java-side server code, I read from the socket into a byte buffer,
 then deserialize the byte buffer into Protobuf objects, throwing away the
 data that has been deserialized. The python MergeDelimitedFromString
 function also returns the number of bytes that were processed to build up
 the Protobuf object, so the user could easily do the same - read the socket
 onto the end of a buffer, and then while the buffer is successfully
 deserializing into objects throw away the first x bytes as appropriate...

 Just a thought :)

 On Mon, Jan 4, 2010 at 9:57 PM, Kenton Varda ken...@google.com wrote:

 Hmm, it occurs to me that this currently is not useful for reading from
 a socket or similar stream since the caller has to make sure to read an
 entire message before trying to parse it, but the caller doesn't actually
 know how long the message is (because the code that determines this is
 encapsulated).  Any thoughts on this?

 On Mon, Jan 4, 2010 at 12:11 PM, Kenton Varda ken...@google.com wrote:

 Mostly looks good.  There are some style issues (e.g. lines over 80
 chars) but I can clean those up myself.

 You'll need to sign the contributor license agreement:

 http://code.google.com/legal/individual-cla-v1.0.html -- If you own
 copyright on this change.
 http://code.google.com/legal/corporate-cla-v1.0.html -- If your
 employer does.

 Please let me know after you've done this and then I can submit these.


 On Fri, Jan 1, 2010 at 12:53 PM, Graham cox.gra...@gmail.com wrote:

 On Jan 1, 7:32 am, Kenton Varda ken...@google.com wrote:
  I don't think an equivalent has been added to the Python API.  

Re: [protobuf] Re: WriteDelimited/parseDelimited in python

2010-01-05 Thread Graham Cox
I was saying the user *could* do that, and that it's currently what I'm
doing in my server-side code. The reason being, as you said, if you naively
read from a stream and the message isn't all present then you need to block
until it is with the way that the Java code works at present. If you are
using it for client-side code then likely this is not an issue in the
slightest, but a server that needs to be able to handle many clients at once
just can not block on one of them...

As to your other alternative, (a), I would suggest that this leaves too much
of the underlying network protocol bare to the caller. This will make it
very difficult to change the way that delimiting messages happens in the
future should such a thing be required. If - for example - it is decided to
go from having the length prefixed to having a special delimiting sequence
after the message then it will cause all current calling code to need to be
changed. It might be that this is considered a low enough level library that
this is acceptable, but that would be a Google decision...

One more alternative would be how the asn1c library works for parsing ASN.1
streams into objects, which is to be resumable. The decoder reads all the
data it is given, and tries to build the object from this. If it doesn't
have enough data yet then it does what it can, remembers where it got to and
returns back to the user who can then supply more data when it becomes
available. If the entire message does parse from the data provided then
return back to the user the amount of data consumed so that they can discard
this (reading from the stream directly makes this slightly cleaner still).
At present, the Protobuf libraries (any of them) can not support this method
of decoding an object, and it is not a trivial change to make it possible to
do, but it does - IMO - give a much cleaner and easier to use method of use.
-- 
Graham Cox

On Tue, Jan 5, 2010 at 1:32 AM, Kenton Varda ken...@google.com wrote:

 Make sure to reply all so that the group is CC'd.

 So you are saying that the user should read whatever data is on the socket,
 then attempt to parse it, and if it fails, assume that it's because there is
 more data to read?  Seems rather wasteful.  I think what we ideally want is
 either:
 (a) Provide a way for the caller to read the size independently, so that
 they can then make sure to read that many bytes from the input before
 parsing.
 (b) Provide a method that reads from a stream, so that the protobuf library
 can automatically take care of reading all necessary bytes.

 Option (b) is obviously cleaner but has a few problems:
 - We have to choose a particular stream interface to support.  While the
 Python file-like interface is pretty common I'm not sure if it's universal
 for this kind of task.
 - If not all bytes of the message are available yet, we'd have to block.
  This might be fine most of the time, but would be unacceptable for some
 uses.

 Thoughts?

 On Mon, Jan 4, 2010 at 3:09 PM, Graham Cox cox.gra...@gmail.com wrote:

 I'm using it for reading/writing to sockets in my functional tests - works
 well enough there...
 In my Java-side server code, I read from the socket into a byte buffer,
 then deserialize the byte buffer into Protobuf objects, throwing away the
 data that has been deserialized. The python MergeDelimitedFromString
 function also returns the number of bytes that were processed to build up
 the Protobuf object, so the user could easily do the same - read the socket
 onto the end of a buffer, and then while the buffer is successfully
 deserializing into objects throw away the first x bytes as appropriate...

 Just a thought :)

 On Mon, Jan 4, 2010 at 9:57 PM, Kenton Varda ken...@google.com wrote:

 Hmm, it occurs to me that this currently is not useful for reading from a
 socket or similar stream since the caller has to make sure to read an entire
 message before trying to parse it, but the caller doesn't actually know how
 long the message is (because the code that determines this is encapsulated).
  Any thoughts on this?

 On Mon, Jan 4, 2010 at 12:11 PM, Kenton Varda ken...@google.com wrote:

 Mostly looks good.  There are some style issues (e.g. lines over 80
 chars) but I can clean those up myself.

 You'll need to sign the contributor license agreement:

 http://code.google.com/legal/individual-cla-v1.0.html -- If you own
 copyright on this change.
 http://code.google.com/legal/corporate-cla-v1.0.html -- If your
 employer does.

 Please let me know after you've done this and then I can submit these.


 On Fri, Jan 1, 2010 at 12:53 PM, Graham cox.gra...@gmail.com wrote:

 On Jan 1, 7:32 am, Kenton Varda ken...@google.com wrote:
  I don't think an equivalent has been added to the Python API.  Want to
  write up a patch?

 Well - if you insist... Here's a first run, which seems to work but
 I'm a very long way from being a competent Python programmer, so feel free
 to fix it up some :)

 I can't see how to attach files using 

Re: [protobuf] Re: WriteDelimited/parseDelimited in python

2010-01-04 Thread Kenton Varda
Hmm, it occurs to me that this currently is not useful for reading from a
socket or similar stream since the caller has to make sure to read an entire
message before trying to parse it, but the caller doesn't actually know how
long the message is (because the code that determines this is encapsulated).
 Any thoughts on this?

On Mon, Jan 4, 2010 at 12:11 PM, Kenton Varda ken...@google.com wrote:

 Mostly looks good.  There are some style issues (e.g. lines over 80 chars)
 but I can clean those up myself.

 You'll need to sign the contributor license agreement:

 http://code.google.com/legal/individual-cla-v1.0.html -- If you own
 copyright on this change.
 http://code.google.com/legal/corporate-cla-v1.0.html -- If your employer
 does.

 Please let me know after you've done this and then I can submit these.

 On Fri, Jan 1, 2010 at 12:53 PM, Graham cox.gra...@gmail.com wrote:

 On Jan 1, 7:32 am, Kenton Varda ken...@google.com wrote:
  I don't think an equivalent has been added to the Python API.  Want to
  write up a patch?

 Well - if you insist... Here's a first run, which seems to work but
 I'm a very long way from being a competent Python programmer, so feel free
 to fix it up some :)

 I can't see how to attach files using the google groups interface, so
 I've stuck them on my webspace for now:
 http://grahamcox.co.uk/patches/protobuf/
 There's two patches - one for serializing in a delimited form, and one
 for deserializing from a delimited form.
 --
 Graham Cox

 --

 You received this message because you are subscribed to the Google Groups
 Protocol Buffers group.
 To post to this group, send email to proto...@googlegroups.com.
 To unsubscribe from this group, send email to
 protobuf+unsubscr...@googlegroups.com.
 For more options, visit this group at
 http://groups.google.com/group/protobuf?hl=en.









Re: [protobuf] Re: WriteDelimited/parseDelimited in python

2010-01-04 Thread Kenton Varda
Make sure to reply all so that the group is CC'd.

So you are saying that the user should read whatever data is on the socket,
then attempt to parse it, and if it fails, assume that it's because there is
more data to read?  Seems rather wasteful.  I think what we ideally want is
either:
(a) Provide a way for the caller to read the size independently, so that
they can then make sure to read that many bytes from the input before
parsing.
(b) Provide a method that reads from a stream, so that the protobuf library
can automatically take care of reading all necessary bytes.

Option (b) is obviously cleaner but has a few problems:
- We have to choose a particular stream interface to support.  While the
Python file-like interface is pretty common I'm not sure if it's universal
for this kind of task.
- If not all bytes of the message are available yet, we'd have to block.
 This might be fine most of the time, but would be unacceptable for some
uses.
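
For reference, option (b) could look roughly like the Python 3 sketch below. It assumes a blocking file-like object whose read(n) returns an empty byte string at end of stream, and a varint size prefix; read_exactly and read_delimited are hypothetical names, not functions in the protobuf library.

```python
import io


def read_exactly(stream, count):
    """Block until exactly count bytes have been read from stream."""
    chunks = []
    remaining = count
    while remaining:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("stream ended in the middle of a message")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_delimited(stream):
    """Read one size-prefixed payload; return None at a clean end of stream."""
    size = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            if shift == 0:
                return None          # no more messages
            raise EOFError("stream ended inside the size prefix")
        size |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
        if shift >= 64:
            raise ValueError("size prefix is not a valid varint")
    return read_exactly(stream, size)


# An in-memory stream stands in for a blocking socket file here.
stream = io.BytesIO(b"\x03abc\x02hi")
first = read_delimited(stream)
second = read_delimited(stream)
```

This shows both problems listed above: the code is tied to one stream interface (read(n)), and every read() call may block until the peer sends more data.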

Thoughts?

On Mon, Jan 4, 2010 at 3:09 PM, Graham Cox cox.gra...@gmail.com wrote:

 I'm using it for reading/writing to sockets in my functional tests - works
 well enough there...
 In my Java-side server code, I read from the socket into a byte buffer,
 then deserialize the byte buffer into Protobuf objects, throwing away the
 data that has been deserialized. The python MergeDelimitedFromString
 function also returns the number of bytes that were processed to build up
 the Protobuf object, so the user could easily do the same - read the socket
 onto the end of a buffer, and then while the buffer is successfully
 deserializing into objects throw away the first x bytes as appropriate...

 Just a thought :)

 On Mon, Jan 4, 2010 at 9:57 PM, Kenton Varda ken...@google.com wrote:

 Hmm, it occurs to me that this currently is not useful for reading from a
 socket or similar stream since the caller has to make sure to read an entire
 message before trying to parse it, but the caller doesn't actually know how
 long the message is (because the code that determines this is encapsulated).
  Any thoughts on this?

 On Mon, Jan 4, 2010 at 12:11 PM, Kenton Varda ken...@google.com wrote:

 Mostly looks good.  There are some style issues (e.g. lines over 80
 chars) but I can clean those up myself.

 You'll need to sign the contributor license agreement:

 http://code.google.com/legal/individual-cla-v1.0.html -- If you own
 copyright on this change.
 http://code.google.com/legal/corporate-cla-v1.0.html -- If your employer
 does.

 Please let me know after you've done this and then I can submit these.


 On Fri, Jan 1, 2010 at 12:53 PM, Graham cox.gra...@gmail.com wrote:

 On Jan 1, 7:32 am, Kenton Varda ken...@google.com wrote:
  I don't think an equivalent has been added to the Python API.  Want to
  write up a patch?

 Well - if you insist... Here's a first run, which seems to work but
 I'm a very long way from being a competent Python programmer, so feel free
 to fix it up some :)

 I can't see how to attach files using the google groups interface, so
 I've stuck them on my webspace for now:
 http://grahamcox.co.uk/patches/protobuf/
 There's two patches - one for serializing in a delimited form, and one
 for deserializing from a delimited form.
 --
 Graham Cox












[protobuf] Re: WriteDelimited/parseDelimited in python

2010-01-01 Thread Graham
On Jan 1, 7:32 am, Kenton Varda ken...@google.com wrote:
 I don't think an equivalent has been added to the Python API.  Want to write
 up a patch?

Well - if you insist... Here's a first run, which seems to work but
I'm a very long way from being a competent Python programmer, so feel free
to fix it up some :)

I can't see how to attach files using the google groups interface, so
I've stuck them on my webspace for now: http://grahamcox.co.uk/patches/protobuf/
There's two patches - one for serializing in a delimited form, and one
for deserializing from a delimited form.
--
Graham Cox
