I have now got this working:

//////////
include "./src/lib/std/regdef";
open Regdef;
{
  var digit = Charset "9";
  val digits = Rpt (digit, 1, -1);
  var letter = Charset "x";
  var us = Regdef::String "_";
  var id = Seqs (list (
     Alts (list(us,letter)), 
     Rpt( 
       Alts(list(letter,digit,us)),0,-1)));
  println$ render id;
};

// alternate encoding
var digit = #(charset "9" );
var digits = #(digit+);
var letter = #(charset "x");
var us = #("_");
var id = #( (us|letter)(letter|digit|us)*);

println$ render id;
///////////////////////////

~/felix>flx --test=build/release/ --force rt
(?:_|[x])(?:[x]|[9]|_)*
(?:_|[x])(?:[x]|[9]|_)*

This is only preliminary, of course. The example shows two ways to build
regexps combinatorially. The first uses the regdef.flx library module; the
tree encoding is given by this union:

  union regex =
  | Alts of list[regex]
  | Seqs of list[regex]
  | Rpt of regex * int * int
  | Charset of string
  | String of string
  | Group of regex
  | Perl of string
  ;

As you can see, this is just a standard tree structure made by a recursive
union type. The example code is a bit messy because I used a list for Alts and
Seqs, which has to be written list (a1, a2, a3) to make a list from a tuple.
I chickened out of using an array polymorphic in its size, since that would
mean a union with polymorphic constructors (NOT a polymorphic union). A varray
would do the job too, but the notation would still be varray (a1, a2, a3).
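
To make the Rpt bounds concrete, here is a small sketch that uses only the
constructors above. It takes -1 as the upper bound to mean "no upper bound",
which is what the rendered "*" in the output suggests. The (0, 1) encoding for
an optional item is an assumption about the obvious convention, not something
checked against regdef.flx, and the names minus, sign_opt and number are made
up for the illustration.

```felix
include "./src/lib/std/regdef";
open Regdef;

var digit = Charset "9";
var digit_star = Rpt (digit, 0, -1);  // digit* : zero or more (-1 taken as unbounded)
var digit_plus = Rpt (digit, 1, -1);  // digit+ : one or more

// Assumed encoding of an optional item: at least 0, at most 1.
var minus = Regdef::String "-";
var sign_opt = Rpt (minus, 0, 1);

// Lists are built from tuples with list (...), as in the first example.
var number = Seqs (list (sign_opt, digit_plus));
println$ render number;
```

What render prints for the bounded case is not shown here, since that depends
on how regdef.flx handles finite upper bounds.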

The second encoding uses a user-defined domain-specific sublanguage defined in
the grammar:

syntax regexps {
  priority 
    ralt_pri <
    rseq_pri <
    rpostfix_pri <
    ratom_pri
  ;

  satom := "#" lpar sregexp[ralt_pri] rpar =># "_3";

  sregexp[ralt_pri] := sregexp[>ralt_pri] ("|" sregexp[>ralt_pri])+ =>#
    "`(ast_apply ,_sr ( ,(noi 'Alts) (ast_apply ,_sr (,(noi 'list) ,(cons _1 (map second _2))))))"
  ;
  sregexp[rseq_pri] := sregexp[>rseq_pri] sregexp[>ralt_pri]+ =>#
    "`(ast_apply ,_sr ( ,(noi 'Seqs) (ast_apply ,_sr (,(noi 'list) ,(cons _1 _2)))))"
  ;
  sregexp[rpostfix_pri] := sregexp[ratom_pri] "*" =># 
    "`(ast_apply ,_sr ( ,(noi 'Rpt) (,_1,0,-1)))"
  ;
  sregexp[rpostfix_pri] := sregexp[ratom_pri] "+" =>#
    "`(ast_apply ,_sr ( ,(noi 'Rpt) (,_1,1,-1)))"
  ;
  sregexp[rpostfix_pri] := sregexp[ratom_pri] "?" =>#
    "`(ast_apply ,_sr ( ,(noi 'Rpt) (,_1,0,1)))"
  ;
  sregexp[ratom_pri] := "(" sregexp[ralt_pri] ")" =># "_2";
  sregexp[ratom_pri] := "group" "(" sregexp[ralt_pri] ")" =># 
    "`(ast_apply ,_sr ( ,(noi 'Group) ,_3))"
  ;
  sregexp[ratom_pri] := "charset" sstring =># 
    """`(ast_apply ,_sr ( 
       ,( noi 'Charset)
       (ast_literal ,_sr (ast_string ,_2))))
    """
  ;

  sregexp[ratom_pri] := sstring =># 
    """`(ast_apply ,_sr ( 
       ( ast_lookup ( ,(noi 'Regdef) "String" () )  ) 
       (ast_literal ,_sr (ast_string ,_1)))
    ) """
  ;

  sregexp[ratom_pri] := "perl" "(" sexpr ")" =># 
    """`(ast_apply ,_sr ( 
       ,( noi 'Perl)
       ,_3))
    """
  ;
  sregexp[ratom_pri] := sname=># "`(ast_name ,_sr ,_1 ())";
 
}

Yeah, it's a bit tricky :) But as you can see, this just maps the nice grammar
onto the Regdef::regex union constructors. (It's a pity the grammar can't go
IN the regdef.flx file, but it can't at the moment.)

I have a problem though: I wanted to use a "keyword", namely "regexp", instead
of the "#". But it won't fly: I get a parse error. This doesn't make sense to
me, though for sure I'm doing something wrong!

Dypgen is GLR, and the syntax and sample code clearly parse, so changing the
unique "#" to an identifier "regexp" should at worst cause an ambiguity. In
those cases where the regexp syntax conflicts with ordinary Felix expressions,
the bad parse should just be dropped.

Anyhow, the nice thing here is that Felix expressions and "regexps" can be
intertwined.

If you have a Felix expression yielding a string to be considered as a regexp
directly, you just wrap the expression in perl(expr). If you have a Felix
expression yielding an actual regex value, you will be able to write something
like regex(expr): this is not in the grammar yet, but a special case of it is:
a simple variable name.

So basically, the two languages can be mixed together.
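
A minimal sketch of that mixing, using only forms the grammar above accepts
(perl(...), group(...), a bare variable name, and postfix "+"). The variable
names word and tagged and the pattern string are made up for the illustration.
The sequence is kept to two items on purpose.

```felix
include "./src/lib/std/regdef";
open Regdef;

var digit = #(charset "9");

// A Felix string expression, handed over verbatim as a Perl-style regexp.
var word = #(perl("[a-z]+"));

// Existing regex variables are referred to by name inside #( ... ).
var tagged = #(group(word) digit+);
println$ render tagged;
```

The rendered output is left out, since it depends on how render treats the
Perl and Group cases.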


--
john skaller
skal...@users.sourceforge.net

_______________________________________________
Felix-language mailing list
Felix-language@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/felix-language