Follow-up Comment #2, patch #3803 (project grep):

The full story behind this patch is that grep-2.5.1a does not handle UTF-8
gracefully at all.  Its basic approach to UTF-8 is:
 * whenever a buffer is parsed, go through the entire buffer deciding how
many bytes make up each character
 * use this information when necessary

This patch changes that to:
 * when information about how many bytes make up a character is needed, work
it out on demand

On the face of it, this is a small, obvious improvement.  It is actually much
better than that, because the original scheme would calculate character
lengths several times for each buffer: one full pass for every single
potential match!
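
To make the difference concrete, here is a minimal sketch of the two schemes.
It is not the code from grep or from this patch: the function names, the
counter, and the hand-rolled UTF-8 lead-byte decoding are all assumptions
made for illustration, and the on-demand function assumes the caller already
knows that `pos` is the start of a character.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical counter: how many characters have had their length computed.
   It exists only so the two schemes can be compared. */
static size_t chars_classified = 0;

/* Length in bytes of the UTF-8 character whose lead byte is c.
   Invalid lead bytes (stray continuation bytes, 0xF8 and above) are
   treated as one-byte characters in this sketch. */
static size_t utf8_char_len(unsigned char c)
{
    chars_classified++;
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Eager scheme: walk the entire buffer and record each character's length
   at the offset of its lead byte (0 at every other offset).
   Returns a malloc'd table that the caller must free, or NULL on failure. */
static unsigned char *scan_whole_buffer(const char *buf, size_t len)
{
    unsigned char *table = calloc(len ? len : 1, 1);
    if (!table)
        return NULL;
    for (size_t i = 0; i < len; ) {
        size_t n = utf8_char_len((unsigned char)buf[i]);
        if (n > len - i)
            n = len - i;            /* character truncated at buffer end */
        table[i] = (unsigned char)n;
        i += n;
    }
    return table;
}

/* On-demand scheme: compute the length of the one character that starts
   at pos, and nothing else. */
static size_t char_len_at(const char *buf, size_t len, size_t pos)
{
    assert(pos < len);
    size_t n = utf8_char_len((unsigned char)buf[pos]);
    return n > len - pos ? len - pos : n;
}
```

In this sketch, calling scan_whole_buffer once per potential match costs one
length computation per character in the buffer each time, whereas
char_len_at costs a single computation per query.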

For a full discussion of this patch, as well as dfa-optional, including
benchmarking results, see the mailing list.

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/patch/?func=detailitem&item_id=3803>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/


