URL:
<http://savannah.gnu.org/bugs/?28275>
Summary: Ranges like [a-z] incorrectly match in UTF systems
Project: grep
Submitted by: tkzv
Submitted on: Sun 13 Dec 2009 14:06:05
Category: None
Severity: 3 - Normal
Item Group: None
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
_______________________________________________________
Details:
In a UTF-8 locale, when basic or extended regular expressions are selected,
ranges like [a-z] or [а-я] seem to match many more characters than they should.
Simply enumerating all the characters, e.g. [abcdefghijklmnopqrstuvwxyz] or
[абвгдеёжзийклмнопрстуфхцчшщъыьэюя], works fine.
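
For reference, here is a minimal sketch of what the ranges would match under plain code-point semantics. It uses Python's re module as a stand-in, not grep's matcher, so it only illustrates one possible expectation. The helper name `matched` and the sample characters are made up for this example. Note that ё (U+0451) lies outside the code-point range а-я (U+0430 to U+044F), so the range and the enumeration above differ by that one letter even under this reading.

```python
import re

# Code-point semantics (Python's re module, NOT grep): a range covers
# exactly the characters whose code points lie between its endpoints.
latin_range = re.compile(r"[a-z]")
cyrillic_range = re.compile(r"[а-я]")
cyrillic_enum = re.compile(r"[абвгдеёжзийклмнопрстуфхцчшщъыьэюя]")


def matched(pattern, candidates):
    """Return the candidates that the single-character class matches in full."""
    return [c for c in candidates if pattern.fullmatch(c)]


# Hypothetical sample characters: in-range letters, upper case, and outsiders.
print(matched(latin_range, ["a", "z", "A", "é", "я"]))     # ['a', 'z']
print(matched(cyrillic_range, ["а", "я", "ё", "Я", "z"]))  # ['а', 'я']
print(matched(cyrillic_enum, ["а", "я", "ё", "Я", "z"]))   # ['а', 'я', 'ё']
```

A locale-aware matcher may legitimately order characters by collation rather than by code point, so its output can differ from this sketch.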
If Perl regular expressions are selected (the -P switch), ranges of ASCII-only
characters like [a-z] work correctly, but multibyte characters (in both ranges
and enumerations) are interpreted as several 1-byte characters.
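
The byte-wise interpretation can be sketched as follows, again with Python's re module standing in for the matcher (an assumption for illustration, not a trace of what grep -P does internally). When the UTF-8 encoding of [а-я] is compiled as a byte pattern, the class becomes the byte D0, the byte range B0-D1, and the byte 8F, so it matches single bytes of unrelated characters.

```python
import re

text_class = "[а-я]"
# UTF-8 bytes of the class: b'[\xd0\xb0-\xd1\x8f]'
byte_class = text_class.encode("utf-8")

char_pattern = re.compile(text_class)  # character-wise matching
byte_pattern = re.compile(byte_class)  # byte-wise matching

sample = "é"  # U+00E9, UTF-8 bytes C3 A9; not a Cyrillic letter

# Character-wise: no match, as expected.
print(char_pattern.search(sample))  # None

# Byte-wise: the lead byte C3 falls inside the byte range B0-D1.
hit = byte_pattern.search(sample.encode("utf-8"))
print(hit.group())  # b'\xc3'

# Even a genuine Cyrillic letter is matched as one byte, not one character.
partial = byte_pattern.search("ж".encode("utf-8"))  # ж is D0 B6
print(partial.span())  # (0, 1)
```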