Martin v. Löwis wrote:
> M.-A. Lemburg wrote:
>
>> If all you're interested in is the lexical class of the code points
>> in a string, you could use such a codec to map each code point
>> to a code point representing the lexical class.
>
> How can I efficiently implement such a codec? The whole point is doing
> that in pure Python (because if I had to write an extension module,
> I could just as well do the entire lexical analysis in C, without
> any regular expressions).

You can write such a codec in Python, but C will of course be more
efficient. The whole point is that for things that you will likely use
a lot in your application, it is better to have one efficient
implementation than dozens of duplicate re character sets embedded in
compiled re-expressions.

> Any kind of associative/indexed table for this task consumes a lot
> of memory, and takes quite some time to initialize.

Right - which is why an algorithmic approach will always be more
efficient (in terms of speed/memory tradeoff) and these *can* support
surrogates.

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Source  (#1, May 11 2005)
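
For illustration only, here is a rough sketch of the pure-Python variant discussed above: each code point is mapped to a single marker code point for its lexical class, and a small regular expression then runs over the class string instead of the original text. It assumes a modern Python 3 (where strings are sequences of code points, so surrogates need no special handling), and it uses `str.translate` with a lazily filled table rather than a codec registered through the `codecs` module. The names `LexicalClassTable`, `lexical_classes` and `find_identifiers`, as well as the four class markers, are hypothetical and not taken from the thread.

```python
import re
import unicodedata


class LexicalClassTable(dict):
    """Translate table filled on demand: code point -> class marker code point.

    Only code points that actually occur in the input are ever stored,
    so there is no large table to build up front.
    """

    def __missing__(self, codepoint):
        ch = chr(codepoint)
        category = unicodedata.category(ch)
        if ch == "_" or category[0] == "L":
            marker = "L"   # letter or underscore
        elif category == "Nd":
            marker = "D"   # decimal digit
        elif ch.isspace():
            marker = "S"   # whitespace
        else:
            marker = "O"   # anything else
        self[codepoint] = ord(marker)
        return self[codepoint]


_TABLE = LexicalClassTable()


def lexical_classes(text):
    """Return a string of the same length holding one class marker per code point."""
    return text.translate(_TABLE)


# The pattern only has to deal with the four marker characters.
IDENTIFIER = re.compile(r"L[LD]*")


def find_identifiers(text):
    """Find identifier-like runs by matching on the class string."""
    classes = lexical_classes(text)
    # The mapping is one-to-one, so match offsets apply to the original text.
    return [text[m.start():m.end()] for m in IDENTIFIER.finditer(classes)]
```

Because the table is populated lazily, its memory use grows with the number of distinct code points seen rather than with the size of the Unicode range; whether that is fast enough compared to a C implementation would have to be measured.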