Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?

Dave Angel Mon, 31 Oct 2011 15:11:26 -0700

On 10/31/2011 05:47 PM, Dave Angel wrote:

On 10/31/2011 03:54 PM, pyt...@bdurham.com wrote:

Wondering if there's a fast/efficient built-in way to determine
if a string has non-ASCII chars outside the range ASCII 32-127,
CR, LF, or Tab?


I know I can look at the chars of a string individually and
compare them against a set of legal chars using standard Python
code (and this works fine), but I will be working with some very
large files in the hundreds of GB to several TB range, so I
thought I'd check to see if there was a built-in in C that might
handle this type of check more efficiently.

Does this sound like a use case for cython or pypy?

Thanks,
Malcolm

How about doing a .replace() method call, with all those characters turning into '', and then see if there's anything left?

I was wrong once again. But a simple combination of translate() and split() methods might do it. Here I'm suggesting that the table replace all valid characters with space, so the split() can use its default behavior.
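
A rough sketch of the translate() plus split() idea, assuming Python 3 and data read as bytes. The names VALID, TABLE, has_illegal_bytes and file_has_illegal_bytes are made up for illustration, and the legal set is taken literally from the question (32-127 inclusive, plus Tab, LF and CR). One detail beyond the suggestion above: the default split() also treats vertical tab and form feed as whitespace, so this table maps every illegal byte to "!" rather than leaving it unchanged.

```python
# Bytes treated as legal: ASCII 32-127 plus Tab, LF and CR (per the question).
VALID = bytes(range(32, 128)) + b"\t\n\r"

# 256-entry translation table: each legal byte becomes a space; each illegal
# byte becomes b"!" so split() cannot mistake it for whitespace.
TABLE = bytes(0x20 if b in VALID else 0x21 for b in range(256))


def has_illegal_bytes(chunk: bytes) -> bool:
    """Return True if chunk contains any byte outside VALID."""
    # After translation all legal bytes are spaces, so split() returns an
    # empty list exactly when nothing illegal was present.
    return bool(chunk.translate(TABLE).split())


def file_has_illegal_bytes(path: str, chunk_size: int = 1 << 20) -> bool:
    """Scan a file in fixed-size chunks so it never has to fit in memory."""
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            if has_illegal_bytes(chunk):
                return True
    return False
```

Each byte is judged on its own, so a chunk boundary cannot hide an illegal byte. The 1 MiB chunk size is an arbitrary starting point and would need tuning for files in the sizes mentioned.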


--

DaveA

--
http://mail.python.org/mailman/listinfo/python-list
