Hi,

I don't know much about regex. I just need to match
characters from various Unicode classes. Here -

http://www.jsoftware.com/help/pcre/pcrepattern.html#SEC2

under "Unicode character properties" (about three screens
down), it says:

"When PCRE is built with Unicode character property support,
three additional escape sequences to match character
properties are available when UTF-8 mode is selected. They
are:

  \p{xx}   a character with the xx property
  \P{xx}   a character without the xx property
  \X       an extended Unicode sequence"

It seems the \p{xx} escape sequence would do what I need,
but it doesn't work for me: the pattern is rejected with a
pattern error (see my original message below).
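
To make clear what I expect \p{Lu} to match, here is a rough sketch
written in Python rather than J, purely as an illustration. It assumes
Python's standard unicodedata module, and match_category is a
hypothetical helper name of my own, not anything from J or PCRE. It
returns an (offset, length) pair for the first character in the given
Unicode general category.

```python
import unicodedata


def match_category(text, category):
    """Return (offset, length) of the first character whose Unicode
    general category equals `category` (for example 'Lu' for uppercase
    letters), or (-1, 0) if no character qualifies.

    Offsets are counted in code points here, not in UTF-8 bytes.
    """
    for offset, ch in enumerate(text):
        if unicodedata.category(ch) == category:
            return (offset, 1)
    return (-1, 0)


# Same input as the rxmatch example in my original message.
print(match_category('bAb', 'Lu'))          # (1, 1)

# A non-ASCII uppercase letter is found as well.
print(match_category('привет Мир', 'Lu'))   # (7, 1)
```

This is only a description of the result I am after. A regex engine
working on UTF-8 bytes would presumably report byte offsets instead of
code point offsets.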

What do you mean by using UTF-8 verbatim?

At the moment I'm not using any software other than what comes
with a standard J distribution. That includes jpcre.dll, and
here -

http://www.jsoftware.com/help/user/regex_expressions.htm

it says: "J uses the PCRE (Perl Compatible Regular Expression)
engine through the POSIX regex interface."

So, the question is: can rxmatch match a Unicode class of
characters, and if yes, how?

Alexander

> Date: Fri, 13 Mar 2009 17:06:47 -0700 (PDT)
> From: Oleg Kobchenko <[email protected]>
> Subject: Re: [Jprogramming] regex matching Unicode classes?
> To: Programming forum <[email protected]>
> 
> 
> I know regex very well but that escape is unfamiliar.
> 
> Can you use UTF-8 verbatim?
> 
> OSS has compile flags, so it could have different
> features.
> 
> 
> Oleg
> 
> 
> On Mar 12, 2009, at 0:26, Alexander Mikhailov
> <[email protected]> wrote:
> 
> 
> 
> Hi,
> 
> I'm trying to construct a regular expression which recognizes a
> Unicode class of characters.
> 
> The following command
> 
> '\p{Lu}' rxmatch 'bAb'
> 
> produces the error
> 
> |pattern error at offset 1     : rxcomp
> |   (rxerror'')    13!:8[12
> 
> I expect it should return 1 1. The command
> '\d[ab]' rxmatch 'qw1awer' produces 2 2, as expected. Am I doing
> something wrong?
> 
> I've checked http://www.jsoftware.com/help/pcre/pcrepattern.html ,
> it says, "When PCRE is built with Unicode character property support,
> three additional escape sequences to match character properties are
> available..." Does it mean there are different versions of
> ~tools/regex/jpcre.dll?..
> 
> Thank you,
> 
> Alexander



      
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
