But wouldn't a non-greedy match only ever find the closest closing
tag, and if we can assume valid XML, that will always be the correct
one?
It would still need to be looped, at least once per level of nesting (or simply
with a conditional/while loop), but it doesn't need any markers or anything.
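The nearest-close-tag idea does hold for an *innermost* element, meaning one whose content contains no further `<Emphasis` open tag. With the plain `(.*?)` from the thread, a match that starts at the outer Bold tag would still stop at the underline's close tag, as Rob points out below. So this is only a minimal sketch in Python's `re` (not ColdFusion), and it assumes an addition that is not in the thread: a tempered body, `(?:(?!<Emphasis\b).)*?`, that keeps matches to innermost elements, looped until nothing is left. `convert_innermost_first` is a hypothetical name.

```python
import re

# Hypothetical helper, not from the thread. The opening tag captures the
# first letter of the type (b/i/u in either case). The body is a tempered,
# non-greedy run that may not contain another "<Emphasis", so every match is
# an innermost element and its nearest </Emphasis> is its own close tag.
INNERMOST = re.compile(
    r'<Emphasis type="([biuBIU])[^"]*">((?:(?!<Emphasis\b).)*?)</Emphasis>',
    re.DOTALL,  # let "." cross line breaks inside an element
)


def convert_innermost_first(text: str) -> str:
    """Rewrite <Emphasis> elements innermost-first until none match."""
    while True:
        # Each pass converts every element that is currently innermost,
        # so roughly one nesting level disappears per pass.
        text, count = INNERMOST.subn(r"<\1>\2</\1>", text)
        if count == 0:
            return text
```

For example, `convert_innermost_first('<Emphasis type="Bold">a <Emphasis type="Italic">b</Emphasis> c</Emphasis>')` should give `'<B>a <I>b</I> c</B>'`. An open tag with no close tag is left alone rather than looping forever, because the pass that finds nothing to replace ends the loop.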

On 11/14/06, Rob Wilkerson <[EMAIL PROTECTED]> wrote:
> On 11/14/06, Peter Boughton <[EMAIL PROTECTED]> wrote:
> > I'm guessing your problem is with greedy matching.
> > Your current regex will look like this: (partone).*(parttwo)
> > It should look like this: (partone).*?(parttwo)
> > (note the question mark after the asterisk)
> >
> > Also, if you're doing this in three steps, you'll want to do it in one.
> > Actually, here we go, have a try with this...
> > Search:
> > <Emphasis type="([biuBIU])[^"]*">(.*?)</Emphasis>
> > Replace:
> > <\1>\2</\1>
> >
> > (or <$1>$2</$1> if you're using something that uses dollars rather than
> > backslashes)
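For comparison, this is the same search/replace in Python's `re` module, used here only as an illustration since the thread itself is about ColdFusion. Python's `re.sub` takes the backslash style (`\1`, `\2`) in the replacement. This is a sketch for flat, non-nested elements only, and without `re.DOTALL` the `.` will not match across line breaks. `replace_flat` is a hypothetical name.

```python
import re

# Peter's search pattern, unchanged apart from Python raw-string quoting.
# Group 1: first letter of the type value; group 2: the element's content.
EMPHASIS = re.compile(r'<Emphasis type="([biuBIU])[^"]*">(.*?)</Emphasis>')


def replace_flat(text: str) -> str:
    """One search/replace pass; only safe when Emphasis tags are not nested."""
    # Backslash-style backreferences in the replacement string.
    return EMPHASIS.sub(r"<\1>\2</\1>", text)


print(replace_flat('<Emphasis type="Bold">one</Emphasis> and <Emphasis type="italic">two</Emphasis>'))
# prints: <B>one</B> and <i>two</i>
```

Note that the captured letter keeps its case, so `type="Bold"` becomes `<B>` and `type="italic"` becomes `<i>`.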
> >
> > Not sure if you'll need to wrap that in a while-loop to guarantee
> > everything gets replaced.
> >
> >
> > Hmmm, odd - I just decided to quickly test that, and it only worked
> > without the non-greedy modifier, i.e. this worked:
> > <cfloop condition="#REFind('<Emphasis type="([biuBIU])[^"]*">',X)#">
> >         <cfset X = REReplace(X,'<Emphasis type="([biuBIU])[^"]*">(.*)</Emphasis>',"<\1>\2</\1>","all")/>
> > </cfloop>
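Here is a rough Python translation of that loop. ColdFusion's regex engine may differ in details, so treat this as an illustration only. It handles a single chain of nested tags because each pass pairs the outermost remaining open tag with the *last* close tag, and for a chain that happens to be right. With two sibling elements inside one parent, though, the greedy match pairs the first sibling's open tag with the second sibling's close tag. The name `greedy_loop` and the termination guard are additions for this sketch. Without the guard, the loop would apparently never end on the sibling input, because an open tag remains but nothing more can be replaced.

```python
import re

OPEN = re.compile(r'<Emphasis type="([biuBIU])[^"]*">')
# Greedy body, as in the cfloop test. re.DOTALL lets "." cross line breaks;
# whether ColdFusion's "." does the same is not assumed here.
GREEDY = re.compile(r'<Emphasis type="([biuBIU])[^"]*">(.*)</Emphasis>', re.DOTALL)


def greedy_loop(text: str) -> str:
    """Python translation of the cfloop/REReplace test."""
    while OPEN.search(text):  # same role as the REFind loop condition
        new_text = GREEDY.sub(r"<\1>\2</\1>", text)
        if new_text == text:
            # Added guard (not in the original): an open tag is left but no
            # </Emphasis> follows it, so another pass would change nothing.
            break
        text = new_text
    return text


chain = '<Emphasis type="Bold">a <Emphasis type="Italic">b</Emphasis> c</Emphasis>'
siblings = ('<Emphasis type="Bold"><Emphasis type="Italic">x</Emphasis> y '
            '<Emphasis type="underline">z</Emphasis></Emphasis>')
print(greedy_loop(chain))     # <B>a <I>b</I> c</B>
print(greedy_loop(siblings))  # <B><I>x</Emphasis> y <Emphasis type="underline">z</I></B>
```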
>
> The problem she's going to have is that the regex has no way of
> knowing which </Emphasis> tag is the proper close tag in a nested
> scenario.  Using the original example:
>
> <Book>
> <Emphasis type="Bold">
> sample text
>   <Emphasis type="Italic">
>    sample Text
>         <Emphasis type="underline">
>             sample text
>         </Emphasis>
>   </Emphasis>
> sample text
> </Emphasis>
> </Book>
>
> A greedy match (.*) on <Emphasis type="Bold"> would find the very last
> </Emphasis> which happens to be correct, but a greedy match on
> <Emphasis type="Italic"> would also return that same </Emphasis> (that
> is, the very last one).  Not correct.  On the other hand, a non-greedy
> match (.*?) on <Emphasis type="Italic"> would return the very first
> </Emphasis> tag it finds - in this case the close tag for underline.
> Also not correct.
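Rob's point can be checked directly. The Python sketch below uses his nested example with the sample text shortened to single letters. It captures what a greedy and a non-greedy pattern put between the Italic open tag and the close tag each one picks. The helper name `body_after` is hypothetical.

```python
import re

# Rob's nested example, with the sample text shortened to single letters.
SAMPLE = ('<Emphasis type="Bold">A'
          '<Emphasis type="Italic">B'
          '<Emphasis type="underline">C</Emphasis>'
          '</Emphasis>D'
          '</Emphasis>')


def body_after(open_type: str, greedy: bool) -> str:
    """Return what the regex captures between the given opening tag and
    the </Emphasis> it chooses to pair it with."""
    body = "(.*)" if greedy else "(.*?)"
    pattern = '<Emphasis type="' + re.escape(open_type) + '">' + body + "</Emphasis>"
    return re.search(pattern, SAMPLE, re.DOTALL).group(1)


print(body_after("Italic", greedy=True))   # B<Emphasis type="underline">C</Emphasis></Emphasis>D
print(body_after("Italic", greedy=False))  # B<Emphasis type="underline">C
# The Italic element's real content is: B<Emphasis type="underline">C</Emphasis>
```

Neither quantifier produces the real content. The greedy one runs on to Bold's close tag, and the non-greedy one stops at underline's.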
>
> As far as I know the only way to accomplish this is to loop over
> matches and manually find the proper end tag - regex by itself cannot
> accomplish this particular task.
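A minimal Python sketch of "loop over matches and manually find the proper end tag" could look like this. The regex only locates tags, and a depth counter pairs each open tag with its own close tag. It assumes there are no self-closing `<Emphasis/>` tags, and the helper names `matching_close` and `convert` are hypothetical.

```python
import re

# Any opening or closing Emphasis tag; group 1 is "/" for a closing tag.
# Assumes no self-closing <Emphasis/> tags.
TAG = re.compile(r'<(/?)Emphasis\b[^>]*>')
# Opening tags we want to convert (first letter of the type is b, i or u).
OPEN = re.compile(r'<Emphasis type="([biuBIU])[^"]*">')


def matching_close(text: str, open_start: int) -> tuple[int, int]:
    """Return (start, end) of the </Emphasis> that closes the opening tag
    starting at open_start, by counting nesting depth."""
    depth = 0
    for m in TAG.finditer(text, open_start):
        depth += -1 if m.group(1) else 1
        if depth == 0:
            return m.start(), m.end()
    raise ValueError("unbalanced Emphasis tags")


def convert(text: str) -> str:
    """Convert b/i/u Emphasis elements outermost-first, recursing into
    each element's content and into the text that follows it."""
    m = OPEN.search(text)
    if m is None:
        return text
    close_start, close_end = matching_close(text, m.start())
    tag = m.group(1)
    inner = text[m.end():close_start]
    rest = text[close_end:]
    return text[:m.start()] + f"<{tag}>" + convert(inner) + f"</{tag}>" + convert(rest)
```

For a parent with two sibling children, `convert('<Emphasis type="Bold"><Emphasis type="Italic">x</Emphasis> y <Emphasis type="underline">z</Emphasis></Emphasis>')` should give `'<B><I>x</I> y <u>z</u></B>'`. A real XML parser would be the more robust route if the input can be parsed as XML.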
>
> 

Archive: http://www.houseoffusion.com/groups/RegEx/message.cfm/messageid:988