Re: Sending binary pickled data through TCP
David Hirschfield wrote:

> Thanks for the great response.
>
> Yeah, by safe I mean that it's all happening on an intranet with no chance of malicious individuals getting access to the stream of data.
>
> The chunks are arbitrary collections of python objects. I'm wrapping them up a little, but I don't know much about the actual formal makeup of the data, other than it pickles successfully.
>
> Are there any existing python modules that do the equivalent of pickling on arbitrary python data, but do it a lot faster? I wasn't aware of any that are as easy to use as pickle, or don't require implementing them myself, which is not something I have time for.

Marshal may achieve what you want, but on a more limited range of datatypes than pickle.

regards
Steve

> Thanks again,
> -Dave
>
> Steve Holden wrote:
> [...]

--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden
--
http://mail.python.org/mailman/listinfo/python-list
Re: Sending binary pickled data through TCP
Paul Rubin wrote:

> As for the network representation, DJB proposes this format:
> http://cr.yp.to/proto/netstrings.txt

Netstrings are cool and you'll find some python implementations if you search. But it is basically number:string followed by a comma, i.e. "12:hello world!,".

Or you could use escaping, which is what I usually do. This has the advantage that you don't need to know how long the data is in advance.

E.g., these are from a scheme which uses \t to separate arguments and \r or \n to separate transactions. These are then escaped in the actual data using these functions:

    import re
    import string

    def escape(s):
        """
        This escapes the string passed in, changing CR, LF, TAB and \\
        into \\r, \\n, \\t and \\\\
        """
        # Backslash goes first so the backslashes added below are not doubled.
        s = s.replace("\\", "\\\\")
        s = s.replace("\r", "\\r")
        s = s.replace("\n", "\\n")
        s = s.replace("\t", "\\t")
        return s

    # string.maketrans is the Python 2 API for building a translation table.
    def unescape(s,
                 _unescape_mapping = string.maketrans('tnr','\t\n\r'),
                 _unescape_re = re.compile(r'\\([(rnt\\)])')):
        """
        This unescapes the string passed in, changing \\r, \\n, \\t and
        \\any_char into CR, LF, TAB and any_char
        """
        def _translate(m):
            # Map the character after the backslash back to its control code.
            return m.group(1).translate(_unescape_mapping)
        return _unescape_re.sub(_translate, s)

(These functions have been through the optimisation mill, which is why they may not look immediately like how you might first think of writing them!)

--
Nick Craig-Wood -- http://www.craig-wood.com/nick
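A bytes-oriented sketch of the same escaping idea for Python 3, offered as an illustration rather than as Nick's code. It assumes a single newline byte as the chunk separator and escapes only backslash and newline; the names `escape_bytes`, `unescape_bytes`, `frame` and `unframe` are hypothetical.

```python
import pickle
import re

SEPARATOR = b"\n"  # assumed chunk separator for this sketch

def escape_bytes(data):
    """Rewrite backslash and newline so the result has no raw newline byte."""
    # Backslash first, so backslashes introduced for newlines are not doubled.
    return data.replace(b"\\", b"\\\\").replace(b"\n", b"\\n")

# A backslash followed by any single byte (DOTALL lets "." match every byte).
_ESCAPED = re.compile(rb"\\(.)", re.DOTALL)

def unescape_bytes(data):
    """Invert escape_bytes: backslash-n becomes newline, backslash-X becomes X."""
    return _ESCAPED.sub(
        lambda m: b"\n" if m.group(1) == b"n" else m.group(1), data)

def frame(objects):
    """Pickle each object, escape it, and terminate it with the separator."""
    return b"".join(
        escape_bytes(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)) + SEPARATOR
        for obj in objects)

def unframe(stream):
    """Split a complete stream on the separator and unpickle each chunk."""
    return [pickle.loads(unescape_bytes(chunk))
            for chunk in stream.split(SEPARATOR) if chunk]
```

Because every newline inside a pickle is rewritten before sending, the separator can only ever appear between chunks, which sidesteps the "unlikely string" question entirely. The cost is one extra pass over the data on each side.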
Re: Sending binary pickled data through TCP
David Hirschfield wrote:

> I have a pair of programs which trade python data back and forth by pickling up lists of objects on one side (using pickle.HIGHEST_PROTOCOL), and sending that data over a TCP socket connection to the receiver, who unpickles the data and uses it.
>
> So far this has been working fine, but I now need a way of separating multiple chunks of pickled binary data in the stream being sent back and forth.
>
> Questions:
>
> Is it safe to do what I'm doing? I didn't think there was anything fundamentally wrong with sending binary pickled data, especially in the closed, safe environment these programs operate under...but maybe I'm making a poor assumption?
>
> I was going to separate the chunks of pickled data with some well-formed string, but couldn't that string potentially randomly appear in the pickled data? Do I just pick an extremely unlikely-to-be-randomly-generated string as the separator? Is there some string that will definitely NEVER show up in pickled binary data?
>
> I thought about base64 encoding the data, and then decoding on the opposite side (like what xmlrpclib does), but that turns out to be a very expensive operation, which I want to avoid, speed is of the essence in this situation.
>
> Is there a reliable way to determine the byte count of some pickled binary data? Can I rely on len(pickled data) == bytes?

Instead of communicating directly with the TCP socket, you could talk to it via an object which precedes each chunk with a byte count. If you're working with multiple streams of pickled data, each chunk could also carry an identifier which specifies which stream it belongs to.
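The byte-count idea in the reply above could look roughly like the following in Python 3. This is a sketch under assumptions, not code from the thread: the header layout (a 2-byte stream identifier followed by a 4-byte big-endian length) and the names `send_chunk`, `recv_exact` and `recv_chunk` are all hypothetical choices.

```python
import pickle
import socket
import struct

# Assumed header: unsigned 16-bit stream id, unsigned 32-bit payload length.
HEADER = struct.Struct("!HI")

def send_chunk(sock, stream_id, obj):
    """Pickle obj and send it prefixed with its stream id and byte count."""
    payload = pickle.dumps(obj, pickle.HIGHEST_PROTOCOL)
    sock.sendall(HEADER.pack(stream_id, len(payload)) + payload)

def recv_exact(sock, count):
    """Read exactly count bytes; recv() alone may return fewer."""
    buf = bytearray()
    while len(buf) < count:
        part = sock.recv(count - len(buf))
        if not part:
            raise ConnectionError("socket closed in the middle of a chunk")
        buf.extend(part)
    return bytes(buf)

def recv_chunk(sock):
    """Read one header and its payload; return (stream_id, object)."""
    stream_id, size = HEADER.unpack(recv_exact(sock, HEADER.size))
    return stream_id, pickle.loads(recv_exact(sock, size))
```

The receiver never has to scan the payload for a separator, so nothing inside the pickle can be mistaken for a boundary. As elsewhere in this thread, unpickling is only reasonable here because the peers are trusted.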
Re: Sending binary pickled data through TCP
Thanks for the great response.

Yeah, by "safe" I mean that it's all happening on an intranet with no chance of malicious individuals getting access to the stream of data.

The chunks are arbitrary collections of python objects. I'm wrapping them up a little, but I don't know much about the actual formal makeup of the data, other than it pickles successfully.

Are there any existing python modules that do the equivalent of pickling on arbitrary python data, but do it a lot faster? I wasn't aware of any that are as easy to use as pickle, or don't require implementing them myself, which is not something I have time for.

Thanks again,
-Dave

Steve Holden wrote:
> [...]
Re: Sending binary pickled data through TCP
David Hirschfield wrote:

> Are there any existing python modules that do the equivalent of pickling on arbitrary python data, but do it a lot faster? I wasn't aware of any that are as easy to use as pickle, or don't require implementing them myself, which is not something I have time for.

cPickle is faster than pickle. marshal is faster than cPickle, but only supports certain code object types.

/F
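To make the difference in supported types concrete, here is a small Python 3 sketch (where cPickle has been folded into pickle). It makes no speed claims; the helper name `marshal_supports` and the sample data are hypothetical.

```python
import datetime
import marshal
import pickle

# Only core built-in types: marshal can serialise this.
plain = {"ids": [1, 2, 3], "name": "chunk", "ratio": 0.5}

# Contains a datetime.date, which marshal does not know how to write.
rich = {"when": datetime.date(2000, 1, 1)}

def marshal_supports(obj):
    """Return True if marshal.dumps accepts obj, False if it refuses."""
    try:
        marshal.dumps(obj)
    except ValueError:  # raised for unsupported types
        return False
    return True

plain_roundtrip = marshal.loads(marshal.dumps(plain))
rich_roundtrip = pickle.loads(pickle.dumps(rich, pickle.HIGHEST_PROTOCOL))
```

For "arbitrary collections of python objects", a check like this on representative data would show quickly whether marshal is an option at all.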
Re: Sending binary pickled data through TCP
David Hirschfield wrote:

> I have a pair of programs which trade python data back and forth by pickling up lists of objects on one side (using pickle.HIGHEST_PROTOCOL), and sending that data over a TCP socket connection to the receiver, who unpickles the data and uses it.
>
> So far this has been working fine, but I now need a way of separating multiple chunks of pickled binary data in the stream being sent back and forth.
> [...]

Save yourself the trouble of implementing some sort of IPC mechanism over sockets, and give Pyro a swing: http://pyro.sourceforge.net

In Pyro almost all of the nastiness that is usually associated with socket programming is shielded from you and you'll get much more as well (a complete pythonic IPC library).

It may be a bit heavy for what you are trying to do, but it may be the right choice to avoid troubles later when your requirements get more complex and/or you discover problems with your networking code.

Hth,
---Irmen de Jong
Re: Sending binary pickled data through TCP
I've looked at pyro, and it is definitely overkill for what I need.

If I was requiring some kind of persistent state for objects shared between processes, pyro would be awesome...but I just need to transfer chunks of complex python data back and forth. No method calls or keeping state in sync.

I don't find socket code particularly nasty, especially through a higher-level module like asyncore/asynchat.

-Dave

Irmen de Jong wrote:
> [...]
Re: Sending binary pickled data through TCP
I'm using cPickle already. I need to be able to pickle pretty arbitrarily complex python data structures, so I can't use marshal.

I'm guessing that cPickle is the best choice, but if someone has a faster pickling-like module, I'd love to know about it.

-Dave

Fredrik Lundh wrote:
> [...]
Re: Sending binary pickled data through TCP
David Hirschfield wrote:

> I have a pair of programs which trade python data back and forth by pickling up lists of objects on one side (using pickle.HIGHEST_PROTOCOL), and sending that data over a TCP socket connection to the receiver, who unpickles the data and uses it.
>
> So far this has been working fine, but I now need a way of separating multiple chunks of pickled binary data in the stream being sent back and forth.
>
> Questions:
>
> Is it safe to do what I'm doing? I didn't think there was anything fundamentally wrong with sending binary pickled data, especially in the closed, safe environment these programs operate under...but maybe I'm making a poor assumption?

If there's no chance of malevolent attackers modifying the data stream then you can safely ignore the otherwise dire consequences of unpickling arbitrary chunks of data.

> I was going to separate the chunks of pickled data with some well-formed string, but couldn't that string potentially randomly appear in the pickled data? Do I just pick an extremely unlikely-to-be-randomly-generated string as the separator? Is there some string that will definitely NEVER show up in pickled binary data?

I presumed each chunk was of a known structure. Couldn't you just lead off with a pickled integer saying how many chunks follow?

> I thought about base64 encoding the data, and then decoding on the opposite side (like what xmlrpclib does), but that turns out to be a very expensive operation, which I want to avoid, speed is of the essence in this situation.

Yes, base64 stuffs three bytes into four (six data bits per encoded byte), giving you a 33% overhead. Having said that, pickle isn't all that efficient a representation because it's designed to be portable. If you are using machines of the same type there are almost certainly faster binary encodings.

> Is there a reliable way to determine the byte count of some pickled binary data? Can I rely on len(pickled data) == bytes?

Yes, since pickle returns a string of bytes, not a Unicode object.

If bandwidth really is becoming a limitation you might want to consider uses of the struct module to represent things more compactly (but this may be too difficult if the objects being exchanged are at all complex).

regards
Steve
--
Steve Holden +44 150 684 7255 +1 800 494 3119
Holden Web LLC/Ltd http://www.holdenweb.com
Skype: holdenweb http://holdenweb.blogspot.com
Recent Ramblings http://del.icio.us/steve.holden
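The size claims in this reply can be checked with a few lines of Python 3. This is only an illustrative sketch with made-up sample data; the variable names are hypothetical.

```python
import base64
import pickle
import struct

# pickle.dumps returns bytes, so len() is the number of bytes on the wire.
payload = pickle.dumps(list(range(1000)), pickle.HIGHEST_PROTOCOL)
byte_count = len(payload)

# base64 emits 4 output bytes per 3 input bytes, padding the last group.
encoded = base64.b64encode(payload)
expected_b64_len = 4 * ((byte_count + 2) // 3)

# struct packs three doubles into exactly 3 * 8 bytes; pickle needs more
# because it also records type and structure information.
values = (1.5, 2.5, 3.5)
packed = struct.pack("!3d", *values)
pickled = pickle.dumps(values, pickle.HIGHEST_PROTOCOL)
```

The struct form is smaller only because both sides already agree on the layout ("!3d" here), which is why it stops being practical once the objects are arbitrary.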
Re: Sending binary pickled data through TCP
David Hirschfield writes:

> Is there a reliable way to determine the byte count of some pickled binary data? Can I rely on len(pickled data) == bytes?

Huh? Yes, of course len gives you the length.

As for the network representation, DJB proposes this format: http://cr.yp.to/proto/netstrings.txt
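A minimal Python 3 sketch of netstring framing (decimal length, colon, payload, trailing comma) applied to pickled chunks. It is not taken from any existing netstring library; the function names `encode_netstring` and `decode_netstrings` are hypothetical.

```python
import pickle

def encode_netstring(payload):
    """Wrap payload bytes as <decimal length>:<payload>,"""
    return str(len(payload)).encode("ascii") + b":" + payload + b","

def decode_netstrings(buffer):
    """Return (complete payloads, leftover bytes still awaiting more data)."""
    chunks = []
    pos = 0
    while True:
        colon = buffer.find(b":", pos)
        if colon == -1:
            break  # length prefix not fully received yet
        size = int(buffer[pos:colon])
        start = colon + 1
        end = start + size
        if len(buffer) < end + 1:
            break  # payload or trailing comma not fully received yet
        if buffer[end:end + 1] != b",":
            raise ValueError("netstring is missing its trailing comma")
        chunks.append(buffer[start:end])
        pos = end + 1
    return chunks, buffer[pos:]
```

A receiver would append each `recv()` result to the leftover bytes and call `decode_netstrings` again, unpickling whatever complete chunks come back.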