Re: Sending binary pickled data through TCP

2006-10-14 Thread Steve Holden
David Hirschfield wrote:
> Thanks for the great response.
>
> Yeah, by "safe" I mean that it's all happening on an intranet with no
> chance of malicious individuals getting access to the stream of data.
>
> The chunks are arbitrary collections of python objects. I'm wrapping
> them up a little, but I don't know much about the actual formal makeup
> of the data, other than it pickles successfully.
>
> Are there any existing python modules that do the equivalent of pickling
> on arbitrary python data, but do it a lot faster? I wasn't aware of any
> that are as easy to use as pickle, or don't require implementing them
> myself, which is not something I have time for.

Marshal may achieve what you want, but on a more limited range of 
datatypes than pickle.

regards
  Steve


-- 
Steve Holden   +44 150 684 7255  +1 800 494 3119
Holden Web LLC/Ltd  http://www.holdenweb.com
Skype: holdenweb   http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sending binary pickled data through TCP

2006-10-13 Thread Nick Craig-Wood
Paul Rubin wrote:
> As for the network representation, DJB proposes this format:
> http://cr.yp.to/proto/netstrings.txt

Netstrings are cool and you'll find some python implementations if you
search.

But it is basically "number:string,", i.e. "12:hello world!,"
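
A minimal sketch of that framing, assuming Python 3 bytes objects. The helper names encode_netstring and decode_netstring are made up for illustration and do not come from any particular netstring library:

```python
def encode_netstring(payload: bytes) -> bytes:
    """Frame payload as b"<length>:<payload>,"."""
    return str(len(payload)).encode("ascii") + b":" + payload + b","


def decode_netstring(buffer: bytes):
    """Return (payload, rest), or (None, buffer) if the frame is incomplete."""
    colon = buffer.find(b":")
    if colon == -1:
        return None, buffer              # length prefix not fully received yet
    length = int(buffer[:colon].decode("ascii"))  # ValueError if malformed
    end = colon + 1 + length
    if len(buffer) < end + 1:
        return None, buffer              # payload or trailing comma missing
    if buffer[end:end + 1] != b",":
        raise ValueError("netstring is missing its trailing comma")
    return buffer[colon + 1:end], buffer[end + 1:]
```

Because the length comes first, the payload may contain any bytes at all, including ":" and ",", which is what makes this suitable for pickled data.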

Or you could use escaping which is what I usually do.  This has the
advantage that you don't need to know how long the data is in advance.

E.g., these are from a scheme which uses \t to separate arguments and
\r or \n to separate transactions.  These are then escaped in the
actual data using these functions:

import re

def escape(s):
    """Escape backslash, CR, LF and TAB as \\\\, \\r, \\n and \\t."""
    # Backslash must be handled first so later escapes are not doubled.
    s = s.replace("\\", "\\\\")
    s = s.replace("\r", "\\r")
    s = s.replace("\n", "\\n")
    s = s.replace("\t", "\\t")
    return s

# str.maketrans is the Python 3 spelling of Python 2's string.maketrans.
def unescape(s, _unescape_mapping=str.maketrans('tnr', '\t\n\r'),
             _unescape_re=re.compile(r'\\([rnt\\])')):
    """Reverse escape(): \\r, \\n, \\t and \\\\ become CR, LF, TAB and backslash."""
    def _translate(m):
        return m.group(1).translate(_unescape_mapping)
    return _unescape_re.sub(_translate, s)

(These functions have been through the optimisation mill which is why
they may not look immediately like how you might first think of
writing them!)

-- 
Nick Craig-Wood [EMAIL PROTECTED] -- http://www.craig-wood.com/nick


Re: Sending binary pickled data through TCP

2006-10-13 Thread MRAB

David Hirschfield wrote:
> I have a pair of programs which trade python data back and forth by
> pickling up lists of objects on one side (using
> pickle.HIGHEST_PROTOCOL), and sending that data over a TCP socket
> connection to the receiver, who unpickles the data and uses it.
>
> So far this has been working fine, but I now need a way of separating
> multiple chunks of pickled binary data in the stream being sent back and
> forth.
>
> Questions:
>
> Is it safe to do what I'm doing? I didn't think there was anything
> fundamentally wrong with sending binary pickled data, especially in the
> closed, safe environment these programs operate under...but maybe I'm
> making a poor assumption?
>
> I was going to separate the chunks of pickled data with some well-formed
> string, but couldn't that string potentially randomly appear in the
> pickled data? Do I just pick an extremely
> unlikely-to-be-randomly-generated string as the separator? Is there some
> string that will definitely NEVER show up in pickled binary data?
>
> I thought about base64 encoding the data, and then decoding on the
> opposite side (like what xmlrpclib does), but that turns out to be a
> very expensive operation, which I want to avoid, speed is of the essence
> in this situation.
>
> Is there a reliable way to determine the byte count of some pickled
> binary data? Can I rely on len(pickled data) == bytes?

Instead of communicating directly with the TCP socket, you could talk
to it via an object which precedes each chunk with a byte count, and if
you're working with multiple streams of pickled data, then each chunk
could also have an identifier which specifies which stream it belongs
to.
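
One way this could look, as a rough sketch in Python 3. The header layout (a 2-byte stream id followed by a 4-byte length, both in network byte order) and the function names are assumptions chosen for illustration, not something specified in this thread:

```python
import pickle
import socket
import struct

# Hypothetical header: 2-byte stream id, then 4-byte payload length.
HEADER = struct.Struct("!HI")


def send_chunk(sock, stream_id, obj):
    """Pickle obj and send it preceded by its stream id and byte count."""
    payload = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    sock.sendall(HEADER.pack(stream_id, len(payload)) + payload)


def recv_exactly(sock, count):
    """Read exactly count bytes; a single recv() may return fewer."""
    parts = []
    while count:
        data = sock.recv(count)
        if not data:
            raise EOFError("connection closed in the middle of a chunk")
        parts.append(data)
        count -= len(data)
    return b"".join(parts)


def recv_chunk(sock):
    """Return (stream_id, obj) for the next chunk on the socket."""
    stream_id, length = HEADER.unpack(recv_exactly(sock, HEADER.size))
    return stream_id, pickle.loads(recv_exactly(sock, length))
```

The receiver never has to search for a separator, so nothing in the pickled bytes can be mistaken for a chunk boundary.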



Re: Sending binary pickled data through TCP

2006-10-13 Thread David Hirschfield




Thanks for the great response.

Yeah, by "safe" I mean that it's all happening on an intranet with no
chance of malicious individuals getting access to the stream of data.

The chunks are arbitrary collections of python objects. I'm wrapping
them up a little, but I don't know much about the actual formal makeup
of the data, other than it pickles successfully.

Are there any existing python modules that do the equivalent of
pickling on arbitrary python data, but do it a lot faster? I wasn't
aware of any that are as easy to use as pickle, or don't require
implementing them myself, which is not something I have time for.

Thanks again,
-Dave

-- 
Presenting:
mediocre nebula.




Re: Sending binary pickled data through TCP

2006-10-13 Thread Fredrik Lundh
David Hirschfield wrote:

> Are there any existing python modules that do the equivalent of pickling
> on arbitrary python data, but do it a lot faster? I wasn't aware of any
> that are as easy to use as pickle, or don't require implementing them
> myself, which is not something I have time for.

cPickle is faster than pickle.  marshal is faster than cPickle, but only
supports certain core object types.
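
A small illustration of that difference, assuming Python 3 (where cPickle no longer exists as a separate module and pickle uses its C implementation when available). The sample values are made up; datetime.date stands in for any object outside marshal's supported types:

```python
import datetime
import marshal
import pickle

builtin_data = {"name": "chunk", "values": [1, 2.5, None]}
rich_data = datetime.date(2006, 10, 13)

# Plain built-in containers and scalars round-trip through both modules.
marshal_ok = marshal.loads(marshal.dumps(builtin_data)) == builtin_data
pickle_ok = pickle.loads(pickle.dumps(builtin_data)) == builtin_data

# pickle also handles richer objects such as a date...
restored = pickle.loads(pickle.dumps(rich_data))

# ...whereas marshal rejects them with ValueError.
try:
    marshal.dumps(rich_data)
    marshal_error = None
except ValueError as exc:
    marshal_error = exc
```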

/F



Re: Sending binary pickled data through TCP

2006-10-13 Thread Irmen de Jong
David Hirschfield wrote:
> I have a pair of programs which trade python data back and forth by
> pickling up lists of objects on one side (using
> pickle.HIGHEST_PROTOCOL), and sending that data over a TCP socket
> connection to the receiver, who unpickles the data and uses it.
>
> So far this has been working fine, but I now need a way of separating
> multiple chunks of pickled binary data in the stream being sent back and
> forth.
[...]

Save yourself the trouble of implementing some sort of IPC mechanism
over sockets, and give Pyro a swing: http://pyro.sourceforge.net

In Pyro almost all of the nastiness that is usually associated with socket
programming is shielded from you and you'll get much more as well
(a complete pythonic IPC library).

It may be a bit heavy for what you are trying to do but it may
be the right choice to avoid troubles later when your requirements
get more complex and/or you discover problems with your networking code.

Hth,
---Irmen de Jong


Re: Sending binary pickled data through TCP

2006-10-13 Thread David Hirschfield




I've looked at pyro, and it is definitely overkill for what I need.

If I was requiring some kind of persistent state for objects shared
between processes, pyro would be awesome...but I just need to transfer
chunks of complex python data back and forth. No method calls or
keeping state in sync.

I don't find socket code particularly nasty, especially through a
higher-level module like asyncore/asynchat.
-Dave


Re: Sending binary pickled data through TCP

2006-10-13 Thread David Hirschfield




I'm using cPickle already. I need to be able to pickle pretty
arbitrarily complex python data structures, so I can't use marshal.
I'm guessing that cPickle is the best choice, but if someone has a
faster pickling-like module, I'd love to know about it.

-Dave


Sending binary pickled data through TCP

2006-10-12 Thread David Hirschfield
I have a pair of programs which trade python data back and forth by 
pickling up lists of objects on one side (using 
pickle.HIGHEST_PROTOCOL), and sending that data over a TCP socket 
connection to the receiver, who unpickles the data and uses it.

So far this has been working fine, but I now need a way of separating 
multiple chunks of pickled binary data in the stream being sent back and 
forth.

Questions:

Is it safe to do what I'm doing? I didn't think there was anything 
fundamentally wrong with sending binary pickled data, especially in the 
closed, safe environment these programs operate under...but maybe I'm 
making a poor assumption?

I was going to separate the chunks of pickled data with some well-formed 
string, but couldn't that string potentially randomly appear in the 
pickled data? Do I just pick an extremely 
unlikely-to-be-randomly-generated string as the separator? Is there some 
string that will definitely NEVER show up in pickled binary data?

I thought about base64 encoding the data, and then decoding on the 
opposite side (like what xmlrpclib does), but that turns out to be a 
very expensive operation, which I want to avoid, speed is of the essence 
in this situation.

Is there a reliable way to determine the byte count of some pickled 
binary data? Can I rely on len(pickled data) == bytes?

Thanks for all responses,
-David



Re: Sending binary pickled data through TCP

2006-10-12 Thread Steve Holden
David Hirschfield wrote:
> I have a pair of programs which trade python data back and forth by
> pickling up lists of objects on one side (using
> pickle.HIGHEST_PROTOCOL), and sending that data over a TCP socket
> connection to the receiver, who unpickles the data and uses it.
>
> So far this has been working fine, but I now need a way of separating
> multiple chunks of pickled binary data in the stream being sent back and
> forth.
>
> Questions:
>
> Is it safe to do what I'm doing? I didn't think there was anything
> fundamentally wrong with sending binary pickled data, especially in the
> closed, safe environment these programs operate under...but maybe I'm
> making a poor assumption?

If there's no chance of malevolent attackers modifying the data stream 
then you can safely ignore the otherwise dire consequences of unpickling 
arbitrary chunks of data.
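
For the case where the stream cannot be trusted, one common mitigation is to override Unpickler.find_class so that no classes or functions can be looked up. The sketch below assumes Python 3, and the names NoGlobalsUnpickler and restricted_loads are made up for illustration. It only narrows the attack surface and is not a guarantee of safety:

```python
import io
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses every class or function lookup."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            "global %s.%s is forbidden" % (module, name))


def restricted_loads(data):
    """Like pickle.loads(), but only plain built-in data can be loaded."""
    return NoGlobalsUnpickler(io.BytesIO(data)).load()
```

Lists, dicts, strings and numbers still load, while any pickle that refers to a class or function raises pickle.UnpicklingError instead of importing it.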

> I was going to separate the chunks of pickled data with some well-formed
> string, but couldn't that string potentially randomly appear in the
> pickled data? Do I just pick an extremely
> unlikely-to-be-randomly-generated string as the separator? Is there some
> string that will definitely NEVER show up in pickled binary data?

I presumed each chunk was of a known structure. Couldn't you just lead off
with a pickled integer saying how many chunks follow?
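
A sketch of that idea in Python 3, relying on the fact that pickle.load() reads exactly one pickled object from a file-like object and leaves the position just after it. The function names are hypothetical, and io.BytesIO stands in for the connection here; with a real socket, a file object from sock.makefile("rwb") could play the same role:

```python
import io
import pickle


def write_chunks(stream, chunks):
    """Write a pickled count, followed by each pickled chunk."""
    pickle.dump(len(chunks), stream, pickle.HIGHEST_PROTOCOL)
    for chunk in chunks:
        pickle.dump(chunk, stream, pickle.HIGHEST_PROTOCOL)


def read_chunks(stream):
    """Read the count, then exactly that many chunks."""
    count = pickle.load(stream)
    return [pickle.load(stream) for _ in range(count)]


buffer = io.BytesIO()
write_chunks(buffer, [["a", 1], {"b": 2}, (3.0, None)])
buffer.seek(0)
chunks = read_chunks(buffer)
```

No separator string is needed, because each pickle marks its own end.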

> I thought about base64 encoding the data, and then decoding on the
> opposite side (like what xmlrpclib does), but that turns out to be a
> very expensive operation, which I want to avoid, speed is of the essence
> in this situation.

Yes, base64 stuffs three bytes into four (six bits per byte) giving you 
a 33% overhead. Having said that, pickle isn't all that efficient a 
representation because it's designed to be portable. If you are using 
machines of the same type there are almost certainly faster binary 
encodings.
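
The 33% figure can be checked directly. This is a quick sketch in Python 3 with an arbitrary sample payload:

```python
import base64
import pickle

payload = pickle.dumps(list(range(1000)), pickle.HIGHEST_PROTOCOL)
encoded = base64.b64encode(payload)

# Every 3 input bytes become 4 output characters; the last group is padded.
expected_length = 4 * ((len(payload) + 2) // 3)
overhead = len(encoded) / len(payload) - 1  # roughly one third
```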

> Is there a reliable way to determine the byte count of some pickled
> binary data? Can I rely on len(pickled data) == bytes?

Yes, since pickle returns a string of bytes, not a Unicode object.

If bandwidth really is becoming a limitation you might want to consider 
uses of the struct module to represent things more compactly (but this 
may be too difficult if the objects being exchanged are at all complex).
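
As an illustration of the struct approach, here is a sketch in Python 3 for a made-up fixed-layout record (an integer id and two floats). It only works because the layout is known in advance, which is exactly the limitation mentioned above:

```python
import pickle
import struct

# Hypothetical record layout: unsigned 32-bit id plus two 64-bit floats,
# in network byte order with no padding.
RECORD = struct.Struct("!Idd")

record = (42, 1.5, -2.25)
packed = RECORD.pack(*record)
pickled = pickle.dumps(record, pickle.HIGHEST_PROTOCOL)

unpacked = RECORD.unpack(packed)
```

Here the packed form is always RECORD.size bytes, while the pickle carries extra opcodes and framing on top of the raw values.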

regards
  Steve


Re: Sending binary pickled data through TCP

2006-10-12 Thread Paul Rubin
David Hirschfield [EMAIL PROTECTED] writes:
> Is there a reliable way to determine the byte count of some pickled
> binary data? Can I rely on len(pickled data) == bytes?

Huh?  Yes, of course len gives you the length.
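
A short check of this, assuming Python 3, where pickle.dumps() returns a bytes object (in the Python 2 of this thread it returned a byte string, with the same result for len):

```python
import pickle

data = pickle.dumps(["caf\u00e9", 3.14, {"k": (1, 2)}],
                    pickle.HIGHEST_PROTOCOL)

# len() on bytes counts bytes, so this is the size that goes on the wire.
is_bytes = isinstance(data, bytes)
byte_count = len(data)

# Contrast with text, where one character may encode to several bytes.
text = "caf\u00e9"
char_count = len(text)
utf8_count = len(text.encode("utf-8"))
```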

As for the network representation, DJB proposes this format:
http://cr.yp.to/proto/netstrings.txt