On Wed 06 Sep, David Corbin wrote:
> Nathan Wiger wrote:
> >
> > > It would be useful (and increasingly more common) to be able to match
> > > qr|<\s*(\w+)([^>]*)>| to qr|<\s*/\1\s*>|, and handle the case where
> > > those can nest as well. Something like
> > >
> > > <list> match this with
> > > <list>
> > > </list> not this but
> > > </list> this.
> >
> > I suspect this is going to need a ?[ and ?] of its own. I've been
> > thinking about this since your email on the subject yesterday, and I
> > don't see how either RFC 145 or this alternative method could support
> > it, since there are two tags - > and </ - which are paired
> > asymmetrically, and neither approach gives any credence to what's
> > contained inside the tag. So <tag> would be matched itself as "< matches
> > >".
>
> Actually, in one of my responses I did outline a syntax which would handle
> this with reasonable ease, I think. If the contents of (?[) is considered
> a pattern, then you can define a matching pattern.
I think it should be a list of patterns rather than a single pattern.
Each pattern in the list is attempted left to right until one matches. I no
longer think it should be a hash, as it needs to be ordered. But using => as
the left/right separator does make it clear.
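
To make the "ordered list, tried left to right" idea concrete, here is a rough sketch in Python's re module, not in any proposed Perl 6 syntax. The names PAIRS and match_balanced are hypothetical, and the stack-based scan is only one assumed way such a construct could behave.

```python
import re

# Hypothetical ordered list of (opener, closer-builder) pairs. It is a list,
# not a dict, because the pairs are tried left to right and the first hit wins.
PAIRS = [
    # Opening tag; the closer must repeat the captured tag name.
    (re.compile(r"<\s*(\w+)([^>]*)>"),
     lambda m: re.compile(r"<\s*/" + re.escape(m.group(1)) + r"\s*>")),
    # Plain parentheses, to show that more than one pair can coexist.
    (re.compile(r"\("), lambda m: re.compile(r"\)")),
]

def match_balanced(text, pos=0):
    """Return the end offset of the balanced construct opening at pos, or None."""
    stack = []  # closers still waiting to be matched, innermost last
    while pos < len(text):
        if stack:
            closed = stack[-1].match(text, pos)
            if closed:
                stack.pop()
                pos = closed.end()
                if not stack:
                    return pos  # outermost construct is now closed
                continue
        for opener, make_closer in PAIRS:  # left to right, first match wins
            opened = opener.match(text, pos)
            if opened:
                stack.append(make_closer(opened))
                pos = opened.end()
                break
        else:
            if not stack:
                return None  # nothing opens at the starting position
            pos += 1  # ordinary character inside the construct
    return None  # ran off the end with something still open
```

With s = "<list> a <list> b </list> c </list> tail", s[:match_balanced(s)] stops after the second </list>, which is the pairing asked for at the top of the thread.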
>
> m:(?['<\w+>' => '</\1>').*(?]):
>
>
> I'll grant you it's not the simplest syntax, but it's a lot simpler than
> using the 5.6 method... :)
Actually, that simple case is already handled by m:<(\w+)>.*</\1>: but I
think this is getting somewhere. This is a rich syntax with lots of
potential uses, not just for HTML.
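
A quick illustration of where the plain backreference version stops being enough, again sketched with Python's re module, whose \1 backreferences behave the same way here. The sample string is made up for the example.

```python
import re

# Made-up sample: one nested <list> followed by a sibling <list>.
s = "<list><list></list></list><list></list>"

greedy = re.compile(r"<(\w+)>.*</\1>", re.S)   # the pattern from the text
lazy = re.compile(r"<(\w+)>.*?</\1>", re.S)    # same, with a non-greedy middle

# Greedy pairs the first opener with the LAST closer, swallowing the sibling.
greedy_hit = greedy.search(s).group(0)

# Non-greedy pairs the first opener with the FIRST closer, which belongs
# to the inner <list>, so the result is unbalanced.
lazy_hit = lazy.search(s).group(0)

print(greedy_hit)
print(lazy_hit)
```

Neither form counts nesting depth, which is the gap the (?[ ... (?] discussion is trying to fill.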
> >
> > What if we added special XML/HTML-parsing ?< and ?> operators?
> > Unfortunately, as Richard notes, ?> is already taken, but I will use it
> > for the examples to make things symmetrical.
> >
> > ?< = opening tag (with name specified)
> > ?> = closing tag (matches based on nesting)
We are running out of (? syntax; we might want to find some other construct
before long. But anyway, XML/HTML is important, but I am not convinced
that what is being covered here really helps. I am working on an RFC
to allow boolean logic (&&, || and !) to apply a number of patterns to
the same substring, to allow easier mining of information out of such
constructs.
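
As a rough idea of what applying several patterns to the same substring could mean, here is a sketch using ordinary Python predicates. The RFC is not written yet, so the helper names (has, AND, OR, NOT) and their behaviour are purely assumptions for illustration, not the proposed syntax.

```python
import re

def has(pattern):
    """Predicate: the pattern is found somewhere in the substring."""
    rx = re.compile(pattern)
    return lambda s: rx.search(s) is not None

def AND(*preds):   # stands in for &&
    return lambda s: all(p(s) for p in preds)

def OR(*preds):    # stands in for ||
    return lambda s: any(p(s) for p in preds)

def NOT(pred):     # stands in for !
    return lambda s: not pred(s)

# Hypothetical query: an <a> tag that has an href but no rel attribute.
plain_link = AND(has(r"^<\s*a\b"), has(r"\bhref\s*="), NOT(has(r"\brel\s*=")))

print(plain_link('<a href="x.html">'))
print(plain_link('<a href="x.html" rel="nofollow">'))
```

Each pattern is tested independently against the same tag text, so the attribute order inside the tag does not matter.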
Richard
--
[EMAIL PROTECTED]