I'm not sure if it's more efficient, but there's the struct
module: http://docs.python.org/library/struct.html
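For fixed-width records, something along these lines (the field
widths are invented, just to show the idea):

    import struct

    record = b"Smith     John      NY"   # made-up fixed-width record
    last, first, state = struct.unpack("10s10s2s", record)
    print(last.strip(), first.strip(), state)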

Thanks for your suggestion. I've been experimenting with this
technique, but my initial tests don't show any performance
improvements over using slice() objects to slice a string.
However, I missed the nuance of using 'x' to mark filler bytes
- I'm going to see if this makes a difference (it may, since I'm
skipping over several columns of input that I currently return
as ignored values). With a made-up layout, the difference would
be:
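
    import struct

    record = b"Smith     #####John      "
    # returning the filler as a throwaway value:
    last, _fill, first = struct.unpack("10s5s10s", record)
    # vs. skipping it outright with 'x' pad bytes:
    last, first = struct.unpack("10s5x10s", record)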

I don't expect it will make a great deal of difference -- there's
not much room to improve the process.  Are you actually
experiencing efficiency problems?  I regularly use slice
unpacking (without reaching for the struct module) with no
noteworthy performance impact beyond the cost of scanning the
file and doing the processing on those lines (and these are text
files several hundred megs in size).  When I omit my processing
code and just skim through the file, the difference between
slice-unpacking and not slice-unpacking is in the sub-second range.
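For reference, by slice unpacking I mean something like this
(the boundaries and file name are invented):

    # precompute the slices once, then apply them to each line
    FIELDS = (slice(0, 10), slice(15, 25), slice(25, 27))

    with open("data.txt") as f:          # stand-in file name
        for line in f:
            last, first, state = (line[s].strip() for s in FIELDS)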

<reading your link to the docs ...> wait ... it looks like I can
'compile' struct format strings by using a Struct class instead
of the module's basic unpack() function. This sounds like the
difference between using compiled regular expressions vs.
re-compiling a regular expression on every use. I'll see if
this makes a difference and report back to the list.
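i.e., something along these lines, if I understand the docs right
(the format and file name are placeholders, and I'm assuming every
record is at least rec.size bytes long):

    import struct

    rec = struct.Struct("10s5x10s")      # compiled once, like re.compile()
    with open("data.dat", "rb") as f:    # stand-in file name
        for line in f:
            last, first = rec.unpack(line[:rec.size])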

I don't expect it will... in the code for the struct.py I've got
here in my 2.5 distribution, it maintains an internal cache of
compiled format strings, so unless you have more than
_MAXCACHE=100 format strings, it's not something you really have
to worry about.  (In my main data-processing/ETL app, I can't
envision having more than about 20 format strings if I went that
route.)
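
If you want to measure it for yourself anyway, a quick timeit
sketch (record layout made up) should settle it:

    import struct, timeit

    record = b"Smith     John      "    # made-up 20-byte record
    fmt = "10s10s"
    compiled = struct.Struct(fmt)

    print(timeit.timeit(lambda: struct.unpack(fmt, record), number=100000))
    print(timeit.timeit(lambda: compiled.unpack(record), number=100000))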

-tkc


