On Tue, Jan 7, 2020 at 10:22 AM Tom Payne <twpa...@gmail.com> wrote:
>
> tl;dr How should I use named Unicode character classes in regexps?
>
> I'm trying to write a regular expression that matches Go identifiers, which
> start with a Unicode letter or underscore followed by zero or more Unicode
> letters, decimal digits, and/or underscores.
>
> Based on the regexp syntax, and the variables in the unicode package which
> mention the classes "Letter" and "Number, decimal digit", I was expecting to
> write something like:
>
>     identiferRegexp := regexp.MustCompile(`\A[[\p{Letter}]_][[\p{Letter}][\p{Number, decimal digit}]_]*\z`)
>
> However, this pattern does not compile, giving the error:
>
>     regexp: Compile(`\A[[\p{Letter}]_][[\p{Letter}][\p{Number, decimal digit}]_]*\z`): error parsing regexp: invalid character class range: `\p{Letter}`
>
> Using the short name for character classes (L for Letter, Nd for Number,
> decimal digit) does work however:
>
>     identiferRegexp := regexp.MustCompile(`\A[\pL_][\pL\p{Nd}_]*\z`)
>
> You can play with these regexps on play.golang.org.
>
> Is this simply an oversight that Unicode character classes like "Letter" and
> "Number, decimal digit" are not available for use in regexps, or should I be
> using them differently?

The strings you can use with \p are the ones listed in unicode.Categories and unicode.Scripts. So use \pL as you do in the second example.

Ian