I have a file whose lines contain some extraneous chars; this is the basic version of the code to process it:
IDtable = "".join(map(chr, xrange(256)))
for raw_line in file(file_name):
    line = raw_line.translate(IDtable, toRemove)
    ...

A faster alternative:

IDtable = "".join(map(chr, xrange(256)))
text = file(file_name).read().translate(IDtable, toRemove)
for line in text.split("\n"):
    ...

But text.split() requires a lot of memory if the text isn't small. There are probably simpler solutions (solutions with the language as it is now), but one possibility seems to be an:

str.isplit()  or  str.itersplit()  or  str.xsplit()

Like split(), but iterative. (Or even making str.split() itself an iterator for Py3.0, and adding a str.listsplit() to generate lists.)

(At the moment a simple RE can probably do the work of the isplit.)

Bye,
bearophile
--
http://mail.python.org/mailman/listinfo/python-list
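P.S. A minimal sketch of the iterative-split idea (the name isplit is hypothetical, taken from the proposal above; it is just a plain generator built on str.find, and it runs unchanged on Python 2 and 3):

```python
def isplit(text, sep="\n"):
    # Lazily yield the pieces of `text` between occurrences of `sep`,
    # producing the same pieces as text.split(sep) but one at a time,
    # without ever building the whole list.
    start = 0
    while True:
        pos = text.find(sep, start)
        if pos == -1:
            # No more separators: the tail (possibly empty) is the last piece.
            yield text[start:]
            return
        yield text[start:pos]
        start = pos + len(sep)

# Example: iterate instead of materializing the list of lines.
pieces = list(isplit("a\nb\nc"))
```

This only avoids the list of pieces that split() allocates (the text itself is still held in memory, as in the "faster alternative" above), but for a file with many lines that list is the dominant extra cost.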