Norihiro Tanaka wrote:
> I found difference between dfa and regex (glibc) treatment of titlecase.

Thanks for bringing this up, but I'm afraid regex appears to be buggy
in this area. The regex code does the match by converting both the
pattern and the text to uppercase, and then matching the uppercased
forms. But this is incorrect for an example like the following, which
uses '\(\)\1' to force grep to use the regex code:

  echo 'ς' | grep -i '\(\)\1σ'

This should output nothing, because terminal sigma is not the same as
lowercase sigma even when case is ignored. But since the uppercase
counterpart of both characters is capital sigma, grep incorrectly
outputs the terminal sigma. The dfa code gets it right.
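
To make the uppercase collision concrete, here is a minimal Python
sketch. It is not the glibc regex code or the dfa code: both helper
names are hypothetical, and the second comparison is only one possible
stricter reading of case-insensitive matching, assumed here purely for
contrast with the uppercase-both approach.

```python
# Illustration only: neither function is taken from glibc regex or dfa.c.
FINAL_SIGMA = "\u03c2"    # ς (terminal sigma)
SMALL_SIGMA = "\u03c3"    # σ
CAPITAL_SIGMA = "\u03a3"  # Σ


def match_via_upper(a, b):
    """Compare two single characters after uppercasing both."""
    return a.upper() == b.upper()


def match_direct_counterpart(a, b):
    """Hypothetical stricter rule (an assumption, not grep's actual logic):
    the characters are equal, or one is the other's direct upper/lower form."""
    return a == b or b in (a.upper(), a.lower()) or a in (b.upper(), b.lower())


print(match_via_upper(FINAL_SIGMA, SMALL_SIGMA))             # True
print(match_direct_counterpart(FINAL_SIGMA, SMALL_SIGMA))    # False
print(match_direct_counterpart(CAPITAL_SIGMA, SMALL_SIGMA))  # True
```

Using Python's built-in Unicode case mappings, uppercasing both
characters reports a match for ς and σ, while the direct-counterpart
rule does not, yet still treats Σ and σ as equal.
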
POSIX is muddy in this area, unfortunately, but I don't see any
interpretation whereby ς and σ should match when case is ignored.