On 10/31/2011 08:32 PM, Patrick Maupin wrote:
On Mon, Oct 31, 2011 at 4:08 PM, Dave Angel <d...@davea.name> wrote:

Yes.  Actually, you don't even need the split() -- you can pass an
optional deletechars parameter to translate().


On Oct 31, 5:52 pm, Ian Kelly <ian.g.ke...@gmail.com> wrote:

That sounds overly complicated and error-prone.
Not really.

For instance, split() will split on vertical tab,
which is not one of the characters the OP wanted.
That's just the default behavior.  You can explicitly specify the
separator to split on.  But it's probably more efficient to just use
translate with deletechars.
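As a rough sketch of the translate-with-deletechars idea: delete every allowed character and see whether anything is left. The thread does not state the Python version or the exact character set, so this assumes Python 3 bytes (where the parameter is called delete; in Python 2 the equivalent is str.translate(None, deletechars)), and ALLOWED is a purely hypothetical set chosen for illustration.

```python
# Hypothetical allowed set (an assumption, not taken from this message):
# tab, LF, CR, and printable ASCII 32..126. Vertical tab (0x0b) is excluded.
ALLOWED = bytes([9, 10, 13]) + bytes(range(32, 127))


def only_allowed(data: bytes) -> bool:
    """Return True if every byte of data is in ALLOWED."""
    # With table=None no bytes are remapped; the second argument lists the
    # bytes to delete. Whatever survives the deletion is a disallowed byte.
    leftovers = data.translate(None, ALLOWED)
    return not leftovers
```

No split() is involved, so the vertical-tab issue does not come up: a byte is accepted only if it appears in ALLOWED.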

I would probably use a regular expression for this.
I use 'em all the time, but not for stuff this simple.
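For comparison, the regular-expression approach might look like the sketch below. It makes the same assumptions as any other example here: Python 3 bytes input and a hypothetical allowed set (tab, LF, CR, printable ASCII), neither of which is specified in the thread.

```python
import re

# Matches the first byte that is NOT in the assumed allowed set:
# tab, LF, CR, and printable ASCII 0x20..0x7e.
_DISALLOWED = re.compile(rb"[^\t\n\r\x20-\x7e]")


def only_allowed_re(data: bytes) -> bool:
    """Return True if data contains no byte outside the allowed set."""
    # search() stops at the first offending byte, so bad input fails fast.
    return _DISALLOWED.search(data) is None
```

Which of the two is faster is exactly the kind of thing that needs measuring rather than guessing.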

Regards,
Pat
I would claim that a well-written (in C) translate function, without using the delete option, should be much quicker than any Python loop, even if it does copy the data. Incidentally, on the Pentium family, there's a machine instruction for that, to do the whole loop in one instruction (with rep prefix). I don't know whether the library version is implemented that way. And with the delete option, it wouldn't be copying anything if the data is all legal.

As for processing a gig of data, I never said to do it all in one pass. Process it maybe 4k at a time, and quit the first time you encounter a character not in the table.

But I didn't try to post any code, since the OP never specified Python version, nor the encoding of the data. He just said string. And we all know that without measuring, it's all speculation.
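With those unknowns filled in by assumption, the chunked idea could be sketched as follows. This is not code from the thread: it assumes Python 3, a binary stream of bytes, and a hypothetical ALLOWED set; the function name and the 4096-byte default are illustrative only.

```python
import io

# Hypothetical allowed set (an assumption): tab, LF, CR, printable ASCII.
ALLOWED = bytes([9, 10, 13]) + bytes(range(32, 127))


def stream_only_allowed(stream, chunk_size=4096):
    """Read a binary stream in chunks; stop at the first disallowed byte."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            # End of stream reached without finding a disallowed byte.
            return True
        # Delete every allowed byte; anything left over is disallowed.
        if chunk.translate(None, ALLOWED):
            return False


# Usage with an in-memory stream standing in for a large file.
good = io.BytesIO(b"a" * 10000)
bad = io.BytesIO(b"a" * 5000 + b"\x0b" + b"a" * 5000)
print(stream_only_allowed(good), stream_only_allowed(bad))
```

Because the check is per byte, a chunk boundary can never split a match, and a bad byte early in the data ends the scan after one chunk.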

DaveA

