Re: [Jprogramming] regex matching Unicode classes?

Alexander Mikhailov Mon, 16 Mar 2009 13:45:29 -0700


>It would be helpful if you explained what you are trying to do.


I'm trying to parse the lexical part of a grammar. The relevant
part says:

identifier-or-keyword:
identifier-start-character   identifier-part-characters(opt)

identifier-start-character:
letter-character
_ (the underscore character U+005F)

letter-character:
A Unicode character of classes Lu, Ll, Lt, Lm, Lo, or Nl 
A unicode-escape-sequence representing a character of classes
 Lu, Ll, Lt, Lm, Lo, or Nl

combining-character:
A Unicode character of classes Mn or Mc 
A unicode-escape-sequence representing a character of classes
 Mn or Mc

etc. I'm trying to build a lexer for the grammar.
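
To make the rules above concrete, here is a minimal sketch of the
character tests they describe. It is written in Python rather than J,
only because Python's standard unicodedata module exposes the Unicode
general category directly. The function names are made up for
illustration, unicode-escape-sequences are not handled, and
identifier-part-characters is left out because its definition is not
part of the excerpt.

```python
import unicodedata

# Unicode general categories named in the grammar excerpt.
LETTER_CLASSES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Nl"}
COMBINING_CLASSES = {"Mn", "Mc"}


def is_letter_character(ch):
    """letter-character: a character of class Lu, Ll, Lt, Lm, Lo, or Nl."""
    return unicodedata.category(ch) in LETTER_CLASSES


def is_combining_character(ch):
    """combining-character: a character of class Mn or Mc."""
    return unicodedata.category(ch) in COMBINING_CLASSES


def is_identifier_start(ch):
    """identifier-start-character: a letter-character or underscore (U+005F)."""
    return ch == "_" or is_letter_character(ch)
```

A lexer would call is_identifier_start on the first character of a
candidate token and then apply the remaining rules of the grammar to
the characters that follow.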

Sorry, I didn't get what UTF-8 verbatim means. I just got a
bunch of question marks.

Alexander

-----

Date: Sat, 14 Mar 2009 21:10:11 -0700 (PDT)
From: Oleg Kobchenko <[email protected]>
Subject: Re: [Jprogramming] regex matching Unicode classes?
To: Programming forum <[email protected]>
Message-ID: <[email protected]>
Content-Type: text/plain; charset=utf-8


It would be helpful if you explained what you are trying to do.


> What do you mean by using UTF-8 verbatim?

   load 'regex'
   T=: '? ??????? ??? ???????? ??????'  NB. test
   V=: '?????'                          NB. some vowels
   runs=: ;:^:_1@,@(rxmatches rxfrom])  NB. contiguous runs
   ('[^ ',V,']+') runs T
?? ??? ? ? ??? ? ? ? ?? ?



      
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
