On 10/31/11 18:02, Steven D'Aprano wrote:
> # Define legal characters:
> LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
> # everybody forgets about formfeed... \f
> # and are you sure you want to include chr(127) as a text char?
>
> def is_ascii_text(text):
>     for c in text:
>         if c not in LEGAL:
>             return False
>     return True
>
> Algorithmically, that's as efficient as possible: there's no faster way
> of performing the test, although one implementation may be faster or
> slower than another. (PyPy is likely to be faster than CPython, for
> example.)
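The quoted version tests `c not in LEGAL` against a str, so each test is a substring search through LEGAL. Swapping in a frozenset turns each test into a hash lookup. A rough timeit harness for comparing the two on your own interpreter is sketched here. It is not a benchmark: the sample text, the repeat count, and the names `is_text_str` and `is_text_set` are arbitrary assumptions, and absolute numbers will differ between CPython and PyPy.

```python
import timeit

# Same legal set as above: printable ASCII 32..127 plus newline, CR, tab, formfeed.
LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
LEGAL_SET = frozenset(LEGAL)


def is_text_str(text):
    # `c in LEGAL` on a str is a substring search through LEGAL.
    for c in text:
        if c not in LEGAL:
            return False  # bail out at the first illegal character
    return True


def is_text_set(text):
    # `c in LEGAL_SET` is a hash lookup (O(1) on average).
    for c in text:
        if c not in LEGAL_SET:
            return False  # bail out at the first illegal character
    return True


if __name__ == '__main__':
    sample = 'The quick brown fox\tjumps over the lazy dog.\n' * 1000
    for func in (is_text_str, is_text_set):
        seconds = timeit.timeit(lambda: func(sample), number=20)
        print(f'{func.__name__}: {seconds:.4f}s')
```

Both functions keep the early return; only the cost of each membership test differs.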

Additionally, if one has some foreknowledge of the character
distribution, one might be able to tweak your

def is_ascii_text(text):
    legal = frozenset(LEGAL)
    return all(c in legal for c in text)

with a short-circuiting chain of comparisons that might be faster than
the hashing involved in a set lookup (emphasis on the *might*; I'm not
an expert on CPython internals), such as
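
def is_ascii_text(text):
    # Most likely case first: ' ' (0x20) through DEL (0x7f), matching LEGAL.
    # `or` stops at the first comparison that succeeds.
    return all(
        (' ' <= c <= '\x7f') or
        c == '\n' or
        c == '\t' or
        c == '\r' or
        c == '\f'
        for c in text)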
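Hand-written range tests are easy to get subtly wrong, so it can pay to check a comparison chain against the frozenset version before trusting it. The sketch below compares the two per-character predicates over every code point below 0x300. The names `is_legal_char_set`, `is_legal_char_chain` and `mismatches`, and the 0x300 limit, are illustrative choices, not anything from the thread.

```python
# Legal characters: printable ASCII 32..127 plus newline, CR, tab, formfeed.
LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
LEGAL_SET = frozenset(LEGAL)


def is_legal_char_set(c):
    # Reference predicate: a hash lookup in a frozenset.
    return c in LEGAL_SET


def is_legal_char_chain(c):
    # Hand-written predicate: printable ASCII is tested first, so for typical
    # text the `or` chain usually stops after the first comparison.
    return (' ' <= c <= '\x7f') or c == '\n' or c == '\t' or c == '\r' or c == '\f'


def mismatches(limit=0x300):
    # Code points below `limit` on which the two predicates disagree.
    return [hex(n) for n in range(limit)
            if is_legal_char_set(chr(n)) != is_legal_char_chain(chr(n))]


if __name__ == '__main__':
    print(mismatches())  # prints [] when the predicates agree
```

With an upper bound of '\x7a' ('z') instead of '\x7f', mismatches() would report ['0x7b', '0x7c', '0x7d', '0x7e', '0x7f'], i.e. '{', '|', '}', '~' and DEL.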

But Steven's main points are all spot on: (1) use an O(1) lookup;
(2) return at the first sign of trouble; and (3) push the loop into
the C implementation rather than running it as a Python for-loop.
(And the point that "locals are faster in CPython" was news to me.)
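Point (3) can be pushed a step further than all() by letting a single C-level call handle the whole string. Two possible sketches, assuming Python 3.4 or later: `frozenset.issuperset()` accepts any iterable, including a str, and a precompiled regular expression's `fullmatch()` checks the whole string against a character class. The names `LEGAL_RE`, `is_text_superset` and `is_text_regex` are illustrative. Whether either beats all() with a generator depends on the interpreter, the Python version and the input, so it is worth timing them on representative data.

```python
import re

# Legal characters: printable ASCII 32..127 plus newline, CR, tab, formfeed.
LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
LEGAL_SET = frozenset(LEGAL)

# Character class covering the same set: ' '..DEL, then \n \r \t \f.
LEGAL_RE = re.compile(r'[\x20-\x7f\n\r\t\f]*')


def is_text_superset(text):
    # The iteration over `text` happens inside the set implementation.
    return LEGAL_SET.issuperset(text)


def is_text_regex(text):
    # fullmatch succeeds only if every character is in the class.
    return LEGAL_RE.fullmatch(text) is not None


if __name__ == '__main__':
    for s in ('plain text\n', 'caf\u00e9'):
        print(repr(s), is_text_superset(s), is_text_regex(s))
```

Both treat the empty string as valid text, just as all() and the explicit loop do.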
-tkc
--
http://mail.python.org/mailman/listinfo/python-list