Hi,
I don't know much about regex; I just need to match characters from various Unicode classes. The PCRE documentation at http://www.jsoftware.com/help/pcre/pcrepattern.html#SEC2, under "Unicode character properties" (about three screens down), says:

"When PCRE is built with Unicode character property support, three additional escape sequences to match character properties are available when UTF-8 mode is selected. They are:

  \p{xx}   a character with the xx property
  \P{xx}   a character without the xx property
  \X       an extended Unicode sequence"

The \p{xx} escape sequence seems to be exactly what I need, but it doesn't seem to work.

What do you mean by using UTF-8 verbatim? At the moment I'm not using any software other than what comes with the standard J distribution. That includes jpcre.dll, and http://www.jsoftware.com/help/user/regex_expressions.htm says: "J uses the PCRE (Perl Compatible Regular Expression) engine through the POSIX regex interface."

So the question is: can rxmatch match a Unicode class of characters, and if so, how?

Alexander

> Date: Fri, 13 Mar 2009 17:06:47 -0700 (PDT)
> From: Oleg Kobchenko <[email protected]>
> Subject: Re: [Jprogramming] regex matching Unicode classes?
> To: Programming forum <[email protected]>
> Message-ID:
> <[email protected]>
> Content-Type: text/plain; charset=us-ascii
>
> I know regex very well but that escape is unfamiliar.
>
> Can you use UTF-8 verbatim?
>
> OSS has compile flags, so it could have different
> features.
>
> Oleg
>
> On Mar 12, 2009, at 0:26, Alexander Mikhailov
> <[email protected]> wrote:
>
> > Hi,
> > I'm trying to construct a regular expression that recognizes a
> > Unicode class of characters.
> >
> > The following command
> >
> >    '\p{Lu}' rxmatch 'bAb'
> >
> > produces the error
> >
> > |pattern error at offset 1 : rxcomp
> > |   (rxerror'') 13!:8[12
> >
> > I expect it to return 1 1 . The command
> > '\d[ab]' rxmatch 'qw1awer' produces 2 2, as expected. Am I doing
> > something wrong?
> >
> > I've checked http://www.jsoftware.com/help/pcre/pcrepattern.html ,
> > and it says, "When PCRE is built with Unicode character property
> > support, three additional escape sequences to match character
> > properties are available..." Does that mean there are different
> > versions of ~tools/regex/jpcre.dll?
> >
> > Thank you,
> >
> > Alexander
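To make the expected `1 1` result of `'\p{Lu}' rxmatch 'bAb'` concrete, here is a minimal sketch in Python (not J, and not using PCRE at all) of what a single \p{xx} escape is documented to match: "a character with the xx property". It scans for the first character whose Unicode general category is the requested one and reports an offset/length pair, like rxmatch does. The function name `first_property_match` is hypothetical; it assumes only general-category properties (such as `Lu`, or a one-letter group such as `L`), not script names such as `\p{Greek}`, and it counts offsets in characters, which may differ from rxmatch's offsets on non-ASCII UTF-8 input.

```python
import unicodedata


def first_property_match(prop, text):
    # Return (offset, length) of the first character in `text` whose
    # Unicode general category is `prop` (e.g. "Lu"), or None if none is found.
    # A one-letter `prop` (e.g. "L") matches any category starting with that
    # letter, similar to \p{L} covering Lu, Ll, Lt, Lm and Lo.
    # Hypothetical helper: offsets are counted in characters, not bytes.
    for offset, ch in enumerate(text):
        cat = unicodedata.category(ch)
        if cat == prop or (len(prop) == 1 and cat[0] == prop):
            return (offset, 1)
    return None


print(first_property_match("Lu", "bAb"))  # prints (1, 1)
print(first_property_match("Lu", "bab"))  # prints None
```

Unlike an ASCII range such as `[A-Z]`, the category test also accepts non-ASCII capitals (for example `Ä`), which is the point of asking for \p{Lu}.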
