regex problems

2014-09-20 Thread seany via Digitalmars-d-learn

consider this:


import std.conv, std.algorithm;
import core.vararg;
import std.stdio, std.regex;

void main()
{
    // The string delimiters were lost in this archived copy. A raw (backtick)
    // string is assumed for haystack so the text stays exactly as posted.
    string haystack = `ID : generateWorld;
Position : { 
 
 {ID : \ absolute ; Coordinate : , NULL OMEGA;}
 
 {ID : \ inclusion ; Coordinate : UNDEF;}
 
 {ID : \ subarc; Coordinate : , NULL OMEGA;  }
  }; ID : ;`;

    // thus, something like *{B}* can not end here,
    // but something like X can start here.

    string needle =
        "(?!(([.\n\r])*(\\{)([.\n\r])*))(ID(\\p{White_Space})*:(\\p{White_Space})*)(?!(([.\n\r])*(\\})([.\n\r])*))";

    // "g" asks for all matches, not only the first one.
    auto r = regex(needle, "g");
    auto m = matchAll(haystack, r);

    foreach (c; m)
        writeln(c.hit);
}


So let us break up needle:

(
?!
  (
([.\n\r])*(\\{)([.\n\r])*
  )
)

Do not match something that contains a *{* as a leading 
match; * here means any character, including \n and \r.


(ID(\\p{White_Space})*:(\\p{White_Space})*)

However, look for the form: ID, a few blank spaces, a colon, 
then more blank spaces.


(?!(([.\n\r])*(\\})([.\n\r])*))

But there must be no *}* as a trailing match.
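
For reference, a small check of how std.regex treats the constructs used in the breakdown above. This is only a sketch: `found` is a helper name made up for this example, and the test strings are invented, not taken from the haystack. As far as I know, a dot inside a character class is a literal dot, `(?!...)` looks at the text after the current position, and looking at the text before it needs `(?<!...)`.

```d
import std.regex;
import std.stdio : writeln;

// Hypothetical helper: true if `pattern` matches anywhere in `text`.
bool found(string text, string pattern)
{
    return !matchFirst(text, regex(pattern)).empty;
}

void main()
{
    // Inside [...] the dot is a literal '.', not "any character".
    writeln(found("a.b", "[.]"));   // true
    writeln(found("abc", "[.]"));   // false

    // (?!...) is a negative lookahead: it tests what FOLLOWS the position.
    writeln(found("ID : x", "ID(?!\\})"));  // true, no '}' right after ID
    writeln(found("ID}", "ID(?!\\})"));     // false

    // (?<!...) is a negative lookbehind: it tests what PRECEDES the position.
    writeln(found("xID", "(?<!\\{)ID"));    // true
    writeln(found("{ID", "(?<!\\{)ID"));    // false
}
```

Neither lookaround form counts nesting, so neither can tell on its own whether a position is inside a balanced pair of braces.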

In haystack, there are two such "ID :" occurrences: one at the 
beginning (ID : generateWorld) and the final one at the end.


However, this returns all 5 IDs as matches.

what am I doing wrong?


Re: regex problems

2014-09-20 Thread anonymous via Digitalmars-d-learn

On Saturday, 20 September 2014 at 15:28:54 UTC, seany wrote:
In haystack, there are two such "ID :" occurrences: one at the 
beginning (ID : generateWorld) and the final one at the end.


However, this returns all 5 IDs as matches.

what am I doing wrong?


Prints

ID :
ID :

for me.

I'd advise against using regular expressions in this way, though.
They are not the proper tool for nested structures. Coming up
with correct(!) regexes is probably harder than the alternatives:
* using a parser generator like Pegged [1] (haven't used it
myself) which supports more powerful grammars than regular
expressions,
* writing a (recursive descent) parser manually.

[1] https://github.com/PhilippeSigaud/Pegged
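
To give an idea of the manual route, here is a minimal sketch in the recursive-descent spirit. It is not a full parser: it only skips balanced `{ ... }` blocks recursively and reports where "ID", optional whitespace and ":" occur outside of them. The names `skipBlock` and `topLevelIds` are made up for this example, the input in `main` is a simplified stand-in for the haystack, and the code assumes balanced braces and no braces inside string literals.

```d
import std.ascii : isAlphaNum, isWhite;
import std.stdio : writeln;

// Hypothetical helper: s[i] is '{'. Returns the index just past the
// matching '}', recursing into nested blocks. Assumes balanced braces;
// on unbalanced input it simply stops at the end of the string.
size_t skipBlock(string s, size_t i)
{
    ++i; // step over '{'
    while (i < s.length && s[i] != '}')
    {
        if (s[i] == '{')
            i = skipBlock(s, i);
        else
            ++i;
    }
    return i < s.length ? i + 1 : i; // step over '}'
}

// Hypothetical helper: offsets of every "ID <whitespace> :" that is not
// inside any braces.
size_t[] topLevelIds(string s)
{
    size_t[] hits;
    size_t i = 0;
    while (i < s.length)
    {
        if (s[i] == '{')
        {
            i = skipBlock(s, i);
            continue;
        }
        // "ID" must start a word, so that e.g. "GRID :" is not reported.
        if (i + 2 <= s.length && s[i .. i + 2] == "ID"
            && (i == 0 || !isAlphaNum(s[i - 1])))
        {
            size_t j = i + 2;
            while (j < s.length && isWhite(s[j]))
                ++j;
            if (j < s.length && s[j] == ':')
                hits ~= i;
        }
        ++i;
    }
    return hits;
}

void main()
{
    // Simplified stand-in input, not the original haystack.
    writeln(topLevelIds("ID : a; { {ID : b;} }; ID : c;")); // prints [0, 23]
}
```

The recursion in `skipBlock` is what a regular expression cannot express: each nested block is consumed as a unit, however deep it goes.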


Re: regex problems

2014-09-20 Thread AsmMan via Digitalmars-d-learn

On Saturday, 20 September 2014 at 15:28:54 UTC, seany wrote:

what am I doing wrong?


Is this string a JSON string? If so, why not use a proper JSON 
parsing library?
As others already mentioned, this kind of data isn't well suited to 
parsing with regex. Write small routines to parse that data 
instead. It isn't any harder than getting it to work with a regexp. 
Seriously.
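
A rough sketch of what such small routines could look like, assuming the data is a list of "key : value" entries terminated by ";" with nested groups in braces. That format is my guess from the posted haystack, not something stated in the thread, and `topLevelFields` and `keyValue` are names invented for this example.

```d
import std.stdio : writeln;
import std.string : indexOf, strip;

// Hypothetical helper: cut `s` at every ';' that is not inside braces.
// Empty entries are dropped.
string[] topLevelFields(string s)
{
    string[] fields;
    int depth = 0;
    size_t start = 0;
    foreach (i, char c; s)
    {
        if (c == '{')
            ++depth;
        else if (c == '}')
            --depth;
        else if (c == ';' && depth == 0)
        {
            auto f = s[start .. i].strip;
            if (f.length)
                fields ~= f;
            start = i + 1;
        }
    }
    // Keep trailing text that has no terminating ';'.
    auto rest = s[start .. $].strip;
    if (rest.length)
        fields ~= rest;
    return fields;
}

// Hypothetical helper: split "key : value" at the first ':'.
// Returns [key, value]; value is empty if there is no ':'.
string[] keyValue(string field)
{
    auto p = field.indexOf(':');
    if (p < 0)
        return [field.strip, ""];
    auto q = cast(size_t) p;
    return [field[0 .. q].strip, field[q + 1 .. $].strip];
}

void main()
{
    // Simplified stand-in input, not the original haystack.
    auto data = "ID : generateWorld; Position : { {ID : a;} {ID : b;} }; ID : c;";
    foreach (f; topLevelFields(data))
    {
        auto kv = keyValue(f);
        writeln(kv[0], " => ", kv[1]);
    }
}
```

A value that starts with "{" could be handed back to the same two routines after removing the outer braces, which gives the nested entries without any regex.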