Re: S5 updated

Rod Adams Fri, 24 Sep 2004 09:37:11 -0700

Edward Peschko wrote:

Well, there are two responses to the "that's not a common thing to want to do":
   1) it's not a common thing to want to do because it's not a useful thing to do.
   2) it's not a common thing to want to do because it's too damn difficult to do.
I'd say that #2 is what holds. *Everybody* has difficulties with regular expressions - about a quarter of my job is simply looking at other people's regex used in data transformations and deciding what small bug is causing them to fail given a certain input.

Running a regular expression in reverse has IMO the best potential for making regexes transparent - you graphically see how they work and what they match. So would this get used? Yes - far more IMO than *other* parts of the language that already are sanctified: continuations, for example.

I have to disagree here. I've also been the head of a major data transformation project that used the P5 RE engine as a workhorse. And yes, REs are a pain to debug. But I really don't think that a RE --> string generator is the solution.

What's wrong with it? Quite simply, it can generate way too many different results, and finding the ones that will give you insight can be challenging at best. I imagine it seeing the first * or + and then generating an infinite number of strings for just that. What logic would you impose to truncate this list in a "standard" way? Given a long list of possibilities, I'd be tempted to write a different RE to scan the output generated from the first RE, and how should I then debug the second one? Also, showing a list of _what_ that RE does match gives no clues as to _why_ it matched it, or moreover, why it _didn't_ match what you wanted. I just don't see it as being useful.
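To put a rough number on that blow-up, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the tuple-based pattern tree, the generate() function and the max_repeat cap are hypothetical, and none of it is part of any Perl engine. It enumerates the strings matched by (a|b)*c when the star is cut off at a fixed repetition count.

```python
from itertools import product

# Hypothetical pattern tree for (a|b)*c, built from nested tuples.
PATTERN = ("seq",
           ("star", ("alt", ("lit", "a"), ("lit", "b"))),
           ("lit", "c"))

def generate(node, max_repeat):
    """List every string the pattern tree matches, with '*' capped at max_repeat."""
    kind = node[0]
    if kind == "lit":
        return [node[1]]
    if kind == "alt":
        out = []
        for branch in node[1:]:
            out.extend(generate(branch, max_repeat))
        return out
    if kind == "seq":
        parts = [generate(child, max_repeat) for child in node[1:]]
        return ["".join(combo) for combo in product(*parts)]
    if kind == "star":
        inner = generate(node[1], max_repeat)
        out = []
        # Zero repetitions, then one, then two, ... up to the arbitrary cap.
        for n in range(max_repeat + 1):
            out.extend("".join(combo) for combo in product(inner, repeat=n))
        return out
    raise ValueError(f"unknown node kind: {kind}")

if __name__ == "__main__":
    for cap in (1, 3, 10):
        print(cap, len(generate(PATTERN, cap)))
```

With a two-way alternation under the star, each extra allowed repetition roughly doubles the list, and the choice of cap is exactly the arbitrary truncation rule questioned above.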

Oh, and I'd disagree with "*Everybody* has difficulties with regular expressions". If, after a month of working on my project, you still had trouble building/modifying RE's, your job future there would be in serious question. But I never had a problem with that. Once people got up to speed, it was mainly just the tedium of dealing with the volume of rules and data that got to people. And for reference, outright nasty RE's were fairly commonplace.

There's simply no way to graphically show regexes now. Even use re 'debug' is terribly cryptic. The best way to deal with them right now is to burn a regex parser into your brain.

Ahh... Now this is the real problem. People need some way to better see how a given RE attacks a given string. I see potential for a standalone program that acts as a "RE analyzer". Inputs would be a RE and a string. Output would be a step by step graph of the internal logic used to match / not match the string. I'd break the RE up into the same pieces the Engine does, then show how that subrule matched char a, then char b, but failed to match c, so it backtracked to a, etc. I envision three columns: partial string matched so far, partial RE used to match it, and whatever flags or comments the engine wishes to make at this juncture.
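A rough idea of what that three-column output could look like, as a sketch in Python. It is not the proposed tool: it uses its own toy backtracking matcher (literals, '.', and a single-character 'x*', anchored at the start of the string) rather than hooks into a real engine, and the names trace_match and print_trace are made up for illustration.

```python
def trace_match(pattern, text):
    """Match a toy regex dialect against text, recording every step taken."""
    steps = []  # rows of (string matched so far, pattern still to match, comment)

    def log(pos, pat, note):
        steps.append((text[:pos], pat, note))

    def match_here(pat, pos):
        if not pat:
            log(pos, pat, "pattern exhausted: match")
            return True
        if len(pat) >= 2 and pat[1] == "*":
            return match_star(pat[0], pat[2:], pos)
        if pos < len(text) and pat[0] in (".", text[pos]):
            log(pos + 1, pat[1:], f"'{pat[0]}' matched '{text[pos]}'")
            return match_here(pat[1:], pos + 1)
        log(pos, pat, f"'{pat[0]}' failed")
        return False

    def match_star(atom, rest, pos):
        # Greedy: take as many characters as possible, then give them back one by one.
        end = pos
        while end < len(text) and atom in (".", text[end]):
            end += 1
        log(end, rest, f"'{atom}*' consumed {end - pos} char(s)")
        while True:
            if match_here(rest, end):
                return True
            if end == pos:
                return False
            end -= 1
            log(end, rest, f"backtrack: '{atom}*' gives back one char")

    return match_here(pattern, 0), steps

def print_trace(steps):
    """Print the three columns: matched so far, remaining pattern, comment."""
    for matched, remaining, note in steps:
        print(f"{matched!r:<12} {remaining!r:<12} {note}")

if __name__ == "__main__":
    ok, steps = trace_match("a*ab", "aaab")
    print_trace(steps)
    print("matched" if ok else "no match")
```

Running it on a*ab against aaab shows the star grabbing all three a's, the following 'a' failing on 'b', and the star giving one character back before the rest of the pattern succeeds. A real analyzer would get the same kind of rows from the engine itself, which is what would make it authoritative.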

The beauty of this solution is, if Perl6 and/or Parrot (not sure which is the better choice) provides a few hooks into the P6RE Engine, it would be absolutely authoritative, and could even handle cases where the rules changed due to module loading. And since I'd expect those hooks to be in there anyways for other reasons (mainly, letting people muck around with how they work), I'd suspect all the "Core support" needed would already be there (I may be wrong).

But it doesn't need to be core. A friendly side project, possibly mentioned in the core documentation as a learning tool, would do the job nicely. And it doesn't need to be discussed on p6i, p6l, or p6c, at least not for a very long time.

Or we could just burn a RE parser into everyone's brain, as you mentioned. That'd also work.

-- Rod Adams
