Felipe Almeida Lessa wrote: > On 26 Dec 2006 04:22:38 -0800, placid <[EMAIL PROTECTED]> wrote: > >> So do you want to remove "&" or replace them with "&" ? If you want >> to replace it try the following; > > > I think he wants to replace them, but just the invalid ones. I.e., > > This & this & that > > would become > > This & this & that > > > No, i don't know how to do this efficiently. =/... > I think some kind of regex could do it.
Yes, and the appropriate one is: krefindamp = re.compile(r'&(?!(\w|#)+;)') ... xmlsection = re.sub(krefindamp,'&',xmlsection) This will replace an '&' with '&' if the '&' isn't immediately followed by some combination of letters, numbers, and '#' ending with a ';' Admittedly this would let something like '&xx#2;', which isn't a legal entity, through unmodified. There's still a potential problem with unknown entities in the output XML, but at least they're recognized as entities. John Nagle -- http://mail.python.org/mailman/listinfo/python-list