Re: [go-nuts] regexp syntax and named Unicode character classes

Ian Lance Taylor Tue, 07 Jan 2020 10:36:06 -0800

On Tue, Jan 7, 2020 at 10:22 AM Tom Payne <twpa...@gmail.com> wrote:
>
> tl;dr How should I use named Unicode character classes in regexps?
>
> I'm trying to write a regular expression that matches Go identifiers, which 
> start with a Unicode letter or underscore followed by zero or more Unicode 
> letters, decimal digits, and/or underscores.
>
> Based on the regexp syntax, and the variables in the unicode package which 
> mention the classes "Letter" and "Number, decimal digit", I was expecting to 
> write something like:
>
>   identifierRegexp := regexp.MustCompile(`\A[[\p{Letter}]_][[\p{Letter}][\p{Number, decimal digit}]_]*\z`)
>
> However, this pattern does not compile, giving the error:
>
>   regexp: Compile(`\A[[\p{Letter}]_][[\p{Letter}][\p{Number, decimal digit}]_]*\z`): error parsing regexp: invalid character class range: `\p{Letter}`
>
> Using the short names for the character classes (L for Letter, Nd for Number, 
> decimal digit) does work, however:
>
>   identifierRegexp := regexp.MustCompile(`\A[\pL_][\pL\p{Nd}_]*\z`)
>
> You can play with these regexps on play.golang.org.
>
> Is this simply an oversight that Unicode character classes like "Letter" and 
> "Number, decimal digit" are not available for use in regexps, or should I be 
> using them differently?


The strings you can use with \p are the ones listed in
unicode.Categories and unicode.Scripts.  So use \pL as you do in the
second example.

Ian
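
For readers of the archive, here is a minimal sketch of the working approach
from the thread: the identifier pattern written with the short category names
L and Nd, which are keys of unicode.Categories. The helper name isIdentifier
and the sample inputs are assumptions made for illustration; they do not appear
in the original messages. Note that the pattern only checks the lexical shape,
so it would also accept keywords and the blank identifier.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode"
)

// identifierRegexp uses the short Unicode category names:
// L  = Letter, Nd = Number, decimal digit.
var identifierRegexp = regexp.MustCompile(`\A[\pL_][\pL\p{Nd}_]*\z`)

// isIdentifier is a hypothetical helper (not from the thread) that reports
// whether s has the shape of a Go identifier.
func isIdentifier(s string) bool {
	return identifierRegexp.MatchString(s)
}

func main() {
	// The names accepted after \p are looked up in these tables.
	for _, name := range []string{"L", "Nd"} {
		_, ok := unicode.Categories[name]
		fmt.Printf("unicode.Categories[%q] present: %t\n", name, ok)
	}
	_, ok := unicode.Scripts["Greek"]
	fmt.Printf("unicode.Scripts[%q] present: %t\n", "Greek", ok)

	// Sample inputs chosen for illustration only.
	for _, s := range []string{"foo", "_x1", "αβγ", "1abc", "a-b", ""} {
		fmt.Printf("%q: %t\n", s, isIdentifier(s))
	}
}
```

Script names from unicode.Scripts can be used the same way, for example
\p{Greek} inside a character class.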

