[bug #28275] Ranges like [a-z] incorrectly match in UTF systems

Makar Sun, 13 Dec 2009 06:07:57 -0800

URL:
  <http://savannah.gnu.org/bugs/?28275>


                 Summary: Ranges like [a-z] incorrectly match in UTF systems
                 Project: grep
            Submitted by: tkzv
            Submitted on: Sun 13 Dec 2009 14:06:05
                Category: None
                Severity: 3 - Normal
              Item Group: None
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any

    _______________________________________________________

Details:

In a UTF-8 locale, when basic or extended regular expressions are selected,
ranges like [a-z] or [а-я] seem to match many more characters than they should.
Simply enumerating all the characters, e.g. [abcdefghijklmnopqrstuvwxyz] or
[абвгдеёжзийклмнопрстуфхцчшщъыьэюя], works fine.
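
For comparison, here is a minimal Python sketch of what such ranges would
contain if they were ordered purely by code point. That ordering is an
assumption about the expected behaviour, not a description of how grep builds
its character sets, and the helper names (range_members, matched_by) are made up
for illustration. Note that by code point ё (U+0451) lies outside а-я
(U+0430 to U+044F), so the enumeration above has one more letter than such a
range would.

```python
import re

LATIN = "abcdefghijklmnopqrstuvwxyz"
CYRILLIC = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"  # enumeration from the report


def range_members(first, last):
    """Characters a code-point-ordered range first-last would contain."""
    return {chr(cp) for cp in range(ord(first), ord(last) + 1)}


def matched_by(pattern, candidates):
    """Subset of candidates that the bracket expression matches as a whole."""
    rx = re.compile(pattern)
    return {ch for ch in candidates if rx.fullmatch(ch)}


# Candidates include characters that neither range should match.
sample = LATIN + LATIN.upper() + CYRILLIC + CYRILLIC.upper() + "0123456789éß"

latin_hits = matched_by("[a-z]", sample)
cyr_hits = matched_by("[а-я]", sample)

# Python's re orders ranges by code point, so the range and the
# enumeration agree for Latin, and differ only by ё for Cyrillic.
print(latin_hits == set(LATIN))
print(set(CYRILLIC) - cyr_hits)
```

Feeding the same sample characters to grep one per line, once with the range
and once with the enumeration, would show which extra characters the range
picks up on an affected system.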

If Perl regular expressions are selected (-P switch), ranges of ASCII-only
characters like [a-z] work correctly, but multibyte characters (in both ranges
and enumerations) are interpreted as several 1-byte characters.
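
The byte-wise symptom can be imitated in Python by compiling the same bracket
expression once as text and once as UTF-8 bytes. This is only a sketch of the
effect; it does not use grep or PCRE, and the variable names are illustrative.
As bytes, [а-я] becomes a class holding the single bytes D0 and 8F plus the
byte range B0-D1, so it can match one byte of an unrelated character.

```python
import re

text = "é"  # U+00E9, not Cyrillic; UTF-8 bytes C3 A9

# Character-wise: the range covers code points U+0430..U+044F only.
char_rx = re.compile("[а-я]")

# Byte-wise: the pattern bytes are '[' D0 B0 '-' D1 8F ']', which parses
# as the class {D0, B0..D1, 8F} over single bytes.
byte_rx = re.compile("[а-я]".encode("utf-8"))

char_match = char_rx.search(text)
byte_match = byte_rx.search(text.encode("utf-8"))

print(char_match)          # no match on whole characters
print(byte_match.group())  # the lead byte of "é" falls inside B0..D1
```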




    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?28275>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/
