I've got a real easy one here (in theory). I have some XML files that were generated by a program, but generated imperfectly. There are some naked ampersands that need to be converted to &amp;. I need a regexp that will detect them and change them. Sounds easy enough.

The pattern I want to match is an ampersand that is NOT immediately followed by an entity name (a few characters) and then a semicolon. Any ideas?
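One way to express "an ampersand NOT followed by ..." directly is a zero-width negative lookahead, `(?!...)`. It checks what comes after the `&` without consuming it, so only the ampersand itself is replaced. This is a minimal sketch, assuming an entity name is roughly the ASCII subset of an XML name (a letter, `_` or `:`, then letters, digits, `-`, `.`, `_` or `:`) ending in `;`. The sub name `escape_naked_ampersands` is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: escape every '&' that does not start a named entity.
# Assumption: entity names use only the ASCII subset of XML name characters.
sub escape_naked_ampersands {
    my ($text) = @_;
    # (?!...) is a negative lookahead: it only looks at the following text,
    # so the name and ';' of a real entity are never consumed or changed.
    $text =~ s/&(?![A-Za-z_:][-A-Za-z0-9._:]*;)/&amp;/g;
    return $text;
}

print escape_naked_ampersands('<tag>Blah data &</tag>'), "\n";
print escape_naked_ampersands('<tag>Blah &amp;</tag>'), "\n";
```

The first line comes back as `<tag>Blah data &amp;</tag>` and the second is left unchanged. Because the lookahead does not care how long the name is, there is no need for a fixed limit like five characters.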

This is the best I've come up with so far. It should match an ampersand whose following characters, up to five, are not semicolons. I don't feel that this is a great solution. I'm hoping the community can think of a better one.

$line =~ s/\&[^;]{0,5}/\&amp;/g;

I'm hoping that'll match something like: "<tag>Blah data &</tag>", but NOT match "<tag>Blah &amp;</tag>".
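To see how the five-character attempt behaves on those two sample lines, here it is wrapped in a hypothetical `original_attempt` sub. It is written with `{0,5}`, since Perl releases before 5.34 do not accept an empty lower bound and read `{,5}` as literal text rather than a quantifier. Two problems show up: the whole match (not just the `&`) is replaced, so the characters after the ampersand are swallowed, and `&amp;` still matches because `[^;]{0,5}` happily takes `amp`.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution from the question.
sub original_attempt {
    my ($line) = @_;
    # [^;]{0,5} greedily consumes up to five non-';' characters, and all of
    # them are replaced together with the '&'.
    $line =~ s/\&[^;]{0,5}/\&amp;/g;
    return $line;
}

print original_attempt('<tag>Blah data &</tag>'), "\n";  # <tag>Blah data &amp;>
print original_attempt('<tag>Blah &amp;</tag>'), "\n";    # <tag>Blah &amp;;</tag>
```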

I'm not sure if I'm on the right track here. The pattern also must not match other escaped characters (entities), such as: "<tag>Copyright &copy; 2003</tag>".
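If the files may also contain numeric character references such as `&#169;` or `&#xA9;`, the lookahead needs extra alternatives, otherwise their `&` gets escaped too. A sketch under the same assumptions as before (ASCII-only entity names); `escape_naked_ampersands_all` is a hypothetical name, and `/x` is used so each alternative can carry a comment (which means a literal `#` must be written `\#`). One caveat worth checking: XML predefines only `&amp;`, `&lt;`, `&gt;`, `&quot;` and `&apos;`, so `&copy;` is only well-formed if the files declare it in a DTD.

```perl
use strict;
use warnings;

# Hypothetical helper: escape '&' unless it starts a named entity or a
# decimal/hex character reference.
sub escape_naked_ampersands_all {
    my ($text) = @_;
    $text =~ s/
        &                                # a literal ampersand
        (?!                              # that is NOT followed by
            [A-Za-z_:][-A-Za-z0-9._:]*;  #   a named entity, e.g. &copy;
          | \#[0-9]+;                    #   a decimal reference, e.g. &#169;
          | \#x[0-9A-Fa-f]+;             #   a hex reference, e.g. &#xA9;
        )
    /&amp;/gx;
    return $text;
}

print escape_naked_ampersands_all('<tag>Copyright &copy; 2003</tag>'), "\n";
print escape_naked_ampersands_all('<tag>&#169; &#xA9; & more</tag>'), "\n";
```

The copyright line is left as-is, and in the second line only the bare `&` before `more` becomes `&amp;`.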
