Re: [bitc-dev] Unicode RegExp Hell

Matt Rice Tue, 22 Apr 2014 15:07:05 -0700

On Tue, Apr 22, 2014 at 7:15 AM, Jonathan S. Shapiro <[email protected]> wrote:
> On Tue, Apr 22, 2014 at 12:26 AM, Matt Rice <[email protected]> wrote:
>>
>> On Mon, Apr 21, 2014 at 10:51 AM, Jonathan S. Shapiro <[email protected]>
>> wrote:
>>
>> > "[_a-zA-Z][_a-zA-Z0-9]*"). At some point I'll probably extend that to
>> > case-insensitive keyword matching somehow, but for now I don't need
>> > that.
>>
>> I'm not entirely sure what to make of the particularly awkward
>> instances of Unicode case conversion from a lexer's perspective, but
>> it seems unlikely that keywords in particular would admit those
>> specific case conversions.
>
>
> The Unicode case conversion rules actually aren't that bad, but I was
> thinking about something else entirely.


I wasn't actually thinking about the specifics of the Unicode case
conversion rules, but about the problem that lexical analysis works on
words, and that not every case match is necessarily appropriate for a
keyword once you are dealing with letters that have multiple
lower/upper case variations. That said, I do not actually know any of
the languages where this is a problem.

From http://www.unicode.org/faq/casemap_charprop.html:

Q: Does the default case mapping work for every language? What about
the default case folding?

A: The Unicode Standard defines the default case mapping for each
individual character, with each character considered in isolation.
This mapping does not provide for the context in which the character
appears, nor for the language-specific rules that must be applied when
working in natural language text.
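
To make the concern concrete, here is a rough Python sketch of what
per-character default case folding does to keyword matching. The
keyword set and both function names are made up for illustration; this
is not anything from the BitC lexer. It only relies on Python's
built-in str.casefold(), str.lower() and str.isascii() (Python 3.7+).

```python
# Hypothetical keyword set, for illustration only.
KEYWORDS = {"if", "else", "break"}


def is_keyword_folded(word: str) -> bool:
    """Case-insensitive match using default Unicode case folding."""
    return word.casefold() in KEYWORDS


def is_keyword_ascii(word: str) -> bool:
    """Case-insensitive match that only accepts pure-ASCII spellings."""
    return word.isascii() and word.lower() in KEYWORDS


samples = [
    "IF",          # plain ASCII upper case
    "el\u017fe",   # LATIN SMALL LETTER LONG S in place of "s"
    "brea\u212a",  # KELVIN SIGN in place of "k"
    "\u0131f",     # LATIN SMALL LETTER DOTLESS I in place of "i"
    "\u0130f",     # LATIN CAPITAL LETTER I WITH DOT ABOVE in place of "I"
]

for w in samples:
    # ascii() keeps the escapes visible so look-alike letters stand out.
    print(ascii(w), is_keyword_folded(w), is_keyword_ascii(w))
```

With default folding, the long s and Kelvin sign spellings are accepted
as keywords, while the dotted and dotless Turkish i forms are not, even
though upper-casing the dotless form gives "IF". Whether any of that is
what a language designer wants is exactly the open question; the
ASCII-only variant simply sidesteps it.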
_______________________________________________
bitc-dev mailing list
[email protected]
http://www.coyotos.org/mailman/listinfo/bitc-dev
