Current state: this works:

///////////////////
open Regdef;
{
var digit = Charset "9";
val digits = Rpt (digit, 1, -1);
var letter = Charset "x";
var us = Regdef::String "_";
var id = Seqs (list (
   Alts (list(us,letter)), 
   Rpt( 
     Alts(list(letter,digit,us)),0,-1)));
println$ render id;
};

// alternate encoding
var digit = regexp(charset "9" );
regdef digits = digit+;
var letter = regexp(charset "x");
var us = regexp("_");
regdef id = (us|letter)(letter|digit|us)*;

val pl = regexp ( perl ( "(?:_|[x])(?:[x]|[9]|_)*" ));
println$ render id;
println$ render pl;

val dd = regexp ( regex  ( Alts(list(letter,digit,us))));
println$ render dd;


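// compile the rendered pattern, wrapped in one capturing group
var r = RE2 (render (Group id));
var n = r.NumberOfCapturingGroups;
println$ "Captures = "+str n;

var s = "x_9";
// n+1 slots: slot 0 is the whole match, then one per capturing group
var v = _ctor_varray[StringPiece]$ (n+1).ulong, StringPiece "";

// match anchored at the start of s, filling v with the sub-matches
var res = Re2::Match(r,StringPiece s,0, ANCHOR_START, v.stl_begin, v.len.int);
println$ res;
println$ v;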
///////////////////

These are some tests that build regexps in various ways.
The first block uses the regex type constructors directly.
The second uses the syntax extension.
Finally, I check that the result actually works with re2, which it does:


(?:_|[x])(?:[x]|[9]|_)*
(?:_|[x])(?:[x]|[9]|_)*
(?:_|[x])(?:[x]|[9]|_)*
[x]|[9]|_
Captures = 1
true
varray(x_9, x_9)

Obviously this will all be streamlined a bit more. I have two things in mind:

x in regset

will test whether a string is a member of a regular set. A bit more ambitious:

match s with
| regexp "..." => ...
..
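
To make the intended meaning of the membership test concrete, here is a minimal sketch in Python rather than Felix, since the Felix form does not exist yet. Python's `re` module stands in for re2, and the `RegSet` class is a hypothetical name invented for illustration. The pattern is the one rendered for `id` above.

```python
import re


class RegSet:
    """Hypothetical stand-in for a regular set supporting `x in regset`."""

    def __init__(self, pattern: str) -> None:
        self._re = re.compile(pattern)

    def __contains__(self, s: str) -> bool:
        # Membership means the whole string matches, not just a prefix.
        return self._re.fullmatch(s) is not None


# The identifier pattern rendered earlier in this post.
ident = RegSet(r"(?:_|[x])(?:[x]|[9]|_)*")

print("x_9" in ident)  # True
print("9x" in ident)   # False: an identifier cannot start with a digit
```

Note that membership here anchors at both ends, whereas the test above only uses ANCHOR_START.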


The match form will require a change to the compiler itself. There are two
problems here. The first is how match works: currently it uses two functions,
a match checker (does it match?) and a match handler (extract the match
variables and do something with them).

For regexps we don't want to waste time checking and then re-analysing
to extract the sub-group variables. At least I don't think so: a match
without extraction is actually a lot faster than one with it, so perhaps
we do want to do the matching twice.
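
The two shapes being weighed here can be sketched as follows. This is Python, not Felix, and it only illustrates the structure: the `key=value` pattern and all function names are made up for the example, and Python's `re` does not reproduce the speed difference between matching with and without extraction that re2 has.

```python
import re
from typing import Optional, Tuple

PATTERN = re.compile(r"(\w+)=(\d+)")  # hypothetical key=value pattern


def checker(s: str) -> bool:
    # Phase 1: does this case apply? No sub-groups are read here.
    return PATTERN.fullmatch(s) is not None


def handler(s: str) -> Tuple[str, int]:
    # Phase 2: run the match again, this time extracting the sub-groups.
    m = PATTERN.fullmatch(s)
    assert m is not None, "handler is only called after checker succeeds"
    return m.group(1), int(m.group(2))


def match_two_phase(s: str) -> Optional[Tuple[str, int]]:
    # Checker/handler split: the pattern is matched twice on success.
    return handler(s) if checker(s) else None


def match_fused(s: str) -> Optional[Tuple[str, int]]:
    # Alternative: a single pass that both checks and extracts.
    m = PATTERN.fullmatch(s)
    return (m.group(1), int(m.group(2))) if m else None


print(match_two_phase("width=80"))  # ('width', 80)
print(match_fused("width"))         # None
```

Both functions give the same answers; the only question is whether the cheap check up front pays for the repeated match on success.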

The second problem is exactly how to get the sub-groups out.
We could use an array here. We could also have named groups
and a string-indexed, string-valued dictionary.
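
For comparison, this is roughly what the array and dictionary options look like, sketched in Python with its `re` module as a stand-in for re2. The pattern and the group names `name` and `num` are hypothetical. The array follows the same layout as the varray printed above: slot 0 holds the whole match.

```python
import re

# Hypothetical pattern with two named groups.
m = re.fullmatch(r"(?P<name>[a-z_]+)(?P<num>[0-9]+)", "x_9")
assert m is not None

# Array style: index 0 is the whole match, then the groups in order.
as_array = [m.group(0), *m.groups()]

# Dictionary style: group name -> matched text.
as_dict = m.groupdict()

print(as_array)  # ['x_9', 'x_', '9']
print(as_dict)   # {'name': 'x_', 'num': '9'}
```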

Or we could use actual Felix variables (as all the other matches do),
but this is harder to do and is *impossible* if the match case expression
is variable: it can only work if the regexp is a constant literal (obviously,
since we need to know the variable names).
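
The reason a literal is enough is that the group names depend only on the pattern text, never on the string being matched. A small Python sketch of that point follows; `group_names` is a hypothetical helper, and a real implementation would do the equivalent inside the Felix compiler.

```python
import re
from typing import List


def group_names(pattern: str) -> List[str]:
    # Group names are a property of the pattern text alone, so they can
    # be computed before any subject string exists.
    compiled = re.compile(pattern)
    # groupindex maps name -> group number; order the names by number.
    return sorted(compiled.groupindex, key=compiled.groupindex.get)


print(group_names(r"(?P<key>\w+)=(?P<value>\d+)"))  # ['key', 'value']
```

With a pattern that is only known at run time there is nothing to run this on while compiling, which is why the variable-binding form cannot work in that case.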

Anyhow: whilst it is by no means finished, the proof-of-principle integration
is now done.

--
john skaller
skal...@users.sourceforge.net




