bug#16581: suggested code simplification in dfa.c

arnold Thu, 30 Jan 2014 07:56:15 -0800

Paul Eggert <[email protected]> wrote:

> Aaron Crane wrote:
> > I'd expect U+01C8 LATIN CAPITAL LETTER L WITH SMALL
> > LETTER J ("Lj", roughly) to be U+01C7 LATIN CAPITAL LETTER LJ ("LJ")
> > under towupper(), and U+01C9 LATIN SMALL LETTER LJ ("lj") under
> > towlower().
>
> Ouch, thanks, I hadn't considered that.  So my idea was all wrong.  But 
> this means the current code is all wrong too.  I'll take a look at it. I 
> hope I don't regret picking up this thread....


This seems to be a weird (and very much corner) case: wc != towlower(wc)
and wc != towupper(wc).  It can only be an issue if doing case folding,
and there are only a few spots in the code that deal with case folding
when compiling the dfa.
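
To make the condition concrete, here is a minimal sketch in C. The helper
names differs_from_both_cases and case_variants are hypothetical and do not
come from dfa.c; they only illustrate the test described above. The result
for a character such as U+01C8 depends on the current locale and on the C
library's case tables, so nothing is assumed about it here.

```c
#include <stddef.h>
#include <wctype.h>

/* Hypothetical helper (not in dfa.c): nonzero if WC differs from both its
   lowercase and its uppercase mapping in the current locale.  This is the
   corner case discussed above (e.g. a titlecase digraph).  */
int
differs_from_both_cases (wint_t wc)
{
  return wc != towlower (wc) && wc != towupper (wc);
}

/* Hypothetical helper (not in dfa.c): store WC and any distinct case
   counterparts in OUT and return how many were stored (1 to 3).  Code that
   assumes a character has at most one counterpart would miss the third.  */
size_t
case_variants (wint_t wc, wint_t out[3])
{
  size_t n = 0;
  wint_t lo = towlower (wc);
  wint_t up = towupper (wc);

  out[n++] = wc;
  if (lo != wc)
    out[n++] = lo;
  if (up != wc && up != lo)
    out[n++] = up;
  return n;
}
```

With helpers like these, a case-folding match would add every entry that
case_variants returns instead of picking just one of towlower or towupper.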

I suggest starting with the XOR changes for unibyte locales - they seem
(to me) to be good no matter what. And then separately try to deal with
the multibyte case.

And just to increase the need for Aspirin, any idea how regex handles
this case?  I would not be surprised if the code there also doesn't
catch this.  Wheeeeeeeee!  :-)

Arnold
