[REBOL] Regular Expressions Re:(2)

newsletters Tue, 4 Jan 2000 13:21:54 -0800
I'll just reply to both messages so far with just one reply. First to Elan:

> I doubt that very much (but only REBOL Tech can tell you for sure).
> Parse - and even more so the version Carl announced for REBOL/View - is much
> more powerful and at the same time more user friendly.

Yeah, I doubted it too. Regexes seem unnecessary when 'parse is around.
Regexes in Rebol would only be good for two things, as far as I can see:
winning over programmers from Perl and other languages that provide them (or
at least making programmers who are comfortable with regexes more comfortable
in Rebol), and providing an alternate way to do parsing that could probably be
done with 'parse but might be simpler with regexes. Just to use an example
that is common in the Rebol documentation, it's easier to say
/<title>(.*)<\/title>/ than "thru <title> copy title to </title>", although I
admit this is a bad example because it doesn't really show the areas where
regexes simplify things a lot more. I don't feel like coming up with a better
example, but anyone familiar with regexes will probably agree with me. One
area where regexes might be a lot easier than the equivalent Rebol code is
substitution.
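
For reference, here is a minimal sketch of the 'parse side of that comparison.
The sample page string is made up for illustration, and I've added "to end"
so that parse consumes the rest of the input:

```rebol
; Hypothetical sample input, just for illustration
page: {<html><head><title>My Page</title></head></html>}

; Skip past the opening tag, then copy everything up to the closing tag
parse page [thru <title> copy title to </title> to end]

print title   ; prints: My Page
```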

What was that about an improved 'parse for REBOL/View?

> >2. I've heard that 'parse actually supports a superset of regular
> >expressions. Is this true?
>
> IMHO definitely.

That's really cool. I'd like to see some things that 'parse can do that
regexes can't, besides putting the parsing rules in BNF. I'm not
challenging you, it's just a wish. :)
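
One candidate that comes to mind is recursion. As far as I understand it, a
'parse rule can refer to itself, so it can match nested structures such as
balanced parentheses, which a classic regular expression can't do. A rough
sketch (the rule name "balanced" is just something I made up):

```rebol
; A rule that refers to itself: zero or more "(" ... ")" groups,
; each of which may contain further balanced groups
balanced: [any ["(" balanced ")"]]

print parse/all "(()())" balanced   ; true
print parse/all "(()" balanced      ; false
```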

> use [_parse] [
>   _parse: :parse
>   parse: func [string [any-string!] rule [string! block! none!] /regex] [
>     either regex [print "regex"] [_parse string rule]
>   ]
> ] ;- close use
>
> >> parse "abc" none
> == ["abc"]
> >> parse/regex "abc" none
> regex

Awesome, I didn't realize it would be that easy to overload an existing word
like that.

> >'regex could return a block of values for patterns that have parentheses in
> >them, so with
> >
> >string: "Hello, World. This is your captain speaking."
> >regex string "(H.+),.+(c.+) "
> >
> >regex returns ["Hello" "captain"]
>
> What you propose here would be a dialect.

Ok, I'm still not sure exactly what qualifies something to be a dialect. :)
If you could explain I'd appreciate it.

> >Would it be so slow that it would be
> >useless?
>
> No. BTW, it should be a breeze to implement regexs using parse.

Awesome, glad to hear that you think it shouldn't be too slow. There would
be no point in making it if it wouldn't be usable for real tasks.

Now to Eric:

> Funny you should ask. I just posted search-text.r to rebol.org, which is an
> attempt to simulate regular expressions in REBOL. Please have a look at it
> and tell me what you think.
>
> http://www.rebol.org/utility/search-text.r

Wow, I'm impressed. I have to look at the code more, so I apologize in
advance if anything I say is inaccurate. I also want to get my syntax
highlighting file working better in EditPlus so that it's easier to read
when I print it. :) It looks like we have different goals or reasons in mind
for developing regex support for Rebol. My goal with it is to be as "Perl
compatible" as possible. As I stated above, one of my main goals for
developing regex support for Rebol would be to allow people coming from Perl
to just jump right in. Perl-style quantifiers don't work in search, for
example:

>> a: "abcddddefg"
== "abcddddefg"
>> search a ["a"]
== [1 1 "a"]
>> search a ["d"]
== [4 1 "d"]
>> search a ["d+"]
== none

Also, it works differently than in Perl by providing all that extra info about
where the regex matched. I like it, but I'm not sure whether I'd keep it or
not. It doesn't work the same as in Perl, although in this case it doesn't
really matter, because blocks evaluate to true, but in other cases it
wouldn't work how I'd want it to.

>> search a ["(d)"]
== none

For cases like this I'd like to return a block of strings containing the
values that matched. This would replace Perl's $1, $2, etc. This serves as
another example of where the "Perl compatible" support isn't there, since
this doesn't even match.
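
To make that concrete, here is a hand-written 'parse sketch of the result I
have in mind for the "Hello" / "captain" example. It is not a translation of
the regex, just an illustration of collecting matched text into a block; the
words greeting and role are names I picked for this example:

```rebol
string: "Hello, World. This is your captain speaking."

parse/all string [
    copy greeting to ","    ; stands in for (H.+)
    thru "your "
    copy role to " "        ; stands in for (c.+)
    to end
]

; The block a regex function could hand back instead of Perl's $1, $2
probe reduce [greeting role]   ; ["Hello" "captain"]
```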

Oops, sorry about that. All of my responses to you so far have been based on
the first e-mail. You note in your second:

> I use regular expressions in block form, and have tried to make them as
> similar as possible to parse rules. Don't have any text capture yet, except
> for the whole match. That's definitely needed, but I haven't figured out how
> to do it yet.

Yeah, I'm not sure how to do the text capture yet either, or exactly how I'm
going to translate character classes (what are they called in Perl? I'm
talking about the things that appear in brackets) into Rebol bitsets and put
them in the rules, or how I'm going to do non-greedy matching.
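
As a starting point, this is roughly what I picture the hand-written
equivalents looking like. It is only a sketch of the target rules, not of the
translation itself, and the word names are my own. It also shows that "to"
and "thru" stop at the first occurrence, which may help with the non-greedy
case:

```rebol
; Perl's [0-9] and [a-zA-Z] written as bitsets
digit:  charset [#"0" - #"9"]
letter: charset [#"a" - #"z" #"A" - #"Z"]

; Perl's [^0-9] is the complement of the digit bitset
non-digit: complement digit

print parse/all "2000" [some digit]                 ; true
print parse/all "abc123" [some letter some digit]   ; true
print parse/all "abc1" [some non-digit]             ; false

; "to" stops at the first "</b>", much like a non-greedy .*?
parse/all "<b>one</b><b>two</b>" [thru "<b>" copy first-bold to "</b>" to end]
print first-bold   ; one
```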

About the backtracking. I'm not sure how essential backtracking is, because
I think we can simulate a bunch of what backtracking does with things like
"any" and "some" rules for 'parse. I don't know all that much about
backtracking though, so maybe it's a lot more essential than I think. The
thing is that I don't think Perl itself had backtracking until version 5,
and I think there have been regex libraries which have at least been useful
which haven't had full support for backtracking.
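
Here is a small sketch of what I mean, along with one case where the lack of
backtracking shows. As far as I can tell, "some" and "any" take as much as
they can and never give anything back, while alternatives separated by "|"
are retried from the point where the alternation started:

```rebol
a: "abcddddefg"

; "some" covers d+ and "any" covers x* (here matching zero times)
print parse/all a ["abc" some "d" "efg"]           ; true
print parse/all a ["abc" any "x" some "d" "efg"]   ; true

; A regex like /^a+a$/ matches "aaa" by backtracking, but "some" keeps
; all three characters, so the trailing "a" has nothing left to match
print parse/all "aaa" [some "a" "a"]               ; false

; The second alternative is tried again from the start of the input
print parse/all "ab" ["a" "c" | "a" "b"]           ; true
```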

Anyway, thanks for the code suggestions. I haven't done much work on the
thing yet, and I'm still getting comfortable with 'parse itself. I'll keep
y'all posted with developments.

By the way, one of the things I'm having trouble figuring out is the basic
structure of the whole thing. I'm not sure whether I should just split the
whole pattern into separate characters or some larger entities or what.

Let me just leave you with some of the code I've got so far.

regex: make function! [
    string [string!] "The string to run the regex on"
    pattern [string!] "The regex"

    /local alternatives alternative simplematch +match *match
]
[
    simplematch: [parse/case string [to pattern to end]]

    +match: "test"    ; I don't have any quantifiers working yet.
    *match: "test"
    ; /all keeps parse from also splitting the pattern on whitespace
    alternatives: parse/all pattern "|"
    foreach alternative alternatives [
        pattern: alternative
        if do simplematch [return true]
    ]
]

Basically all it is so far is a big dumb "find" that works with
alternatives. I split the whole thing up into alternatives, (with the
degenerate case still working) and try to match on each alternative.
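
Since it really is just a "find" at this point, the same idea could probably
be written with find directly. This is only a sketch, and match-any is a
name I made up for it:

```rebol
match-any: func [
    string  [string!] "The string to search"
    pattern [string!] "Alternatives separated by |"
][
    ; /all splits on "|" only, so spaces inside an alternative survive
    foreach alternative parse/all pattern "|" [
        if find/case string alternative [return true]
    ]
    false
]

print match-any "Hello, World" "foo|World"   ; true
print match-any "Hello, World" "foo|bar"     ; false
```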

Ok, sorry for the huge e-mail. :)

Thanks.

Keith