Follow-up Comment #2, patch #3803 (project grep):
The full story behind this patch is that grep-2.5.1a does not handle UTF-8
gracefully at all. The basic approach to handling UTF-8 in 2.5.1a is:
* whenever a buffer is parsed, go through the entire buffer deciding how
many bytes make up each character
* use this information when necessary
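A minimal Python sketch of the eager scheme described above, for illustration
only: grep itself is written in C, and the names `char_len` and
`precompute_lengths` are hypothetical. It assumes a simplified UTF-8 rule that
looks only at the lead byte and does not validate continuation bytes.

```python
def char_len(lead: int) -> int:
    # Simplified UTF-8 lead-byte rule (assumption): ASCII and stray
    # continuation bytes count as 1 byte; following bytes are not validated.
    if lead < 0xC0:
        return 1
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    return 4


def precompute_lengths(buf: bytes) -> list[int]:
    """Eager scheme: one full pass over the whole buffer.

    lengths[i] is the byte length of the character starting at offset i,
    or 0 if offset i falls inside a multibyte character.
    """
    lengths = [0] * len(buf)
    i = 0
    while i < len(buf):
        # Clip so a truncated character at the end stays inside the buffer.
        n = min(char_len(buf[i]), len(buf) - i)
        lengths[i] = n
        i += n
    return lengths
```

For "aé€".encode("utf-8") this returns [1, 2, 0, 3, 0, 0]: every byte of the
buffer is visited before any of the lengths are used.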
This patch changes that to:
* when information about how many bytes make up a character is needed, work
it out on demand
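A minimal sketch of the on-demand alternative, under the same simplified
lead-byte assumption; `advance_chars` is a hypothetical helper, not a function
from the patch. It computes character lengths only for the bytes it actually
steps over, starting from a given offset such as a potential match.

```python
def char_len(lead: int) -> int:
    # Simplified UTF-8 lead-byte rule (assumption): ASCII and stray
    # continuation bytes count as 1 byte; following bytes are not validated.
    if lead < 0xC0:
        return 1
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    return 4


def advance_chars(buf: bytes, start: int, count: int) -> int:
    """On-demand scheme: return the byte offset `count` characters after
    `start`, working out each character length only when it is needed."""
    i = start
    for _ in range(count):
        if i >= len(buf):
            break  # ran off the end of the buffer
        # Clip so a truncated character at the end stays inside the buffer.
        i += min(char_len(buf[i]), len(buf) - i)
    return i
```

With buf = "aé€b".encode("utf-8"), advance_chars(buf, 1, 2) returns 6, the
offset of "b", after inspecting just two lead bytes; nothing before offset 1
or after offset 6 is touched.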
On the face of it, this is a small, obvious improvement. It is actually much
better than that, because the original scheme calculated character lengths
several times for each buffer: one full pass for every single potential
match!
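A toy cost model of the difference, counting lead-byte inspections. This is
an illustration under stated assumptions, not grep's code and not the
benchmark results mentioned on the mailing list: it assumes the original
scheme does one full pass per potential match, while the on-demand scheme
inspects only the character at each match start. The function names are
hypothetical.

```python
def char_len(lead: int) -> int:
    # Simplified UTF-8 lead-byte rule (assumption), as in a plain scan.
    if lead < 0xC0:
        return 1
    if lead < 0xE0:
        return 2
    if lead < 0xF0:
        return 3
    return 4


def eager_work(buf: bytes, match_starts: list[int]) -> int:
    """Original scheme: a full length pass over buf per potential match."""
    work = 0
    for _ in match_starts:
        i = 0
        while i < len(buf):
            i += char_len(buf[i])
            work += 1  # one lead byte inspected
    return work


def on_demand_work(buf: bytes, match_starts: list[int]) -> int:
    """Patched scheme: look only at the character at each match start."""
    work = 0
    for start in match_starts:
        if start < len(buf):
            char_len(buf[start])  # the one length actually needed
            work += 1
    return work
```

For a 300-byte ASCII buffer with three potential matches, eager_work returns
900 and on_demand_work returns 3: the eager cost grows with buffer size times
the number of potential matches.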
For a full discussion of this patch and of dfa-optional, including
benchmarking results, see the mailing list.
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/patch/?func=detailitem&item_id=3803>