S. Isaac Dealey wrote:

>>Ok, I've got a little problem here. I'm reading an XML file from a third
>>party and displaying its content. The problem is that the third party
>>is not checking for illegal characters in the XML file. So things like:
>>
>>    <news>He & me are</news>
>>
>>will show up in the damn thing. So I want to replace the special
>>characters, but only those that are outside of the tags. I've probably
>>got to use a regexp for this, but I'm not sure how to do this. I know I
>>can select part of a string with a regexp and replace it with a changed
>>version of that string, but how is that done efficiently, and in one
>>REReplace (I know it can be done, but don't know how).
>>
>>Anyone?
>>
>>Jesse
>
> Unfortunately, while you can use back-references to return a portion of a
> found regular expression back to the replacement, you can't use any kind of
> functions or conditional logic on these back-references, so you'd have to
> replace each character individually... As for actually getting the illegal
> characters, try something like this:
>
>     <cfset illegalchar = REFind(">[^<]*?[^ _-\.[:alnum:]][^<]*?",myxmlpacket)>
>
> This should give you the location of the first illegal character in the
> packet, within the contents of an element, assuming that an illegal
> character is anything other than a space, underscore, hyphen, dot or
> alpha-numeric character... That's probably not a real good definition for
> illegal characters, but it's a starting point. :)
>
> Once you know where that character is, then you can replace it with
> something like <char=#asc(illegalcharacter)#> or whatever the spec. is for
> special characters in your xml dtd. Am I using the terminology correctly?
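For what it's worth, the "replace each character individually" idea can be sketched without a regexp for the outer loop. This is only a rough sketch under some assumptions: escapeElementText is a made-up helper name, it only treats a bare ampersand as illegal, it emits a numeric character reference built with Asc(), and it does not try to handle CDATA sections or comments that contain a ">".

```cfml
<cfscript>
// Hypothetical helper (not from the thread): walks the packet one character
// at a time and escapes bare ampersands found outside of tags.
function escapeElementText(xml) {
    var out = "";
    var inTag = false;
    var ch = "";
    var i = 0;
    for (i = 1; i lte Len(xml); i = i + 1) {
        ch = Mid(xml, i, 1);
        if (ch is "<") {
            inTag = true;
            out = out & ch;
        } else if (ch is ">") {
            inTag = false;
            out = out & ch;
        } else if (not inTag and ch is "&"
                   and REFind("^&(##[0-9]+|[a-zA-Z]+);", Mid(xml, i, 12)) eq 0) {
            // Not the start of an existing entity or character reference,
            // so write it as a numeric reference (## is a literal # in CFML).
            out = out & "&##" & Asc(ch) & ";";
        } else {
            out = out & ch;
        }
    }
    return out;
}

// Sample input taken from the original question.
cleaned = escapeElementText("<news>He & me are</news>");
</cfscript>
```

With the sample input the ampersand should come back as &#38; while the tags are left alone. Widening the test beyond "&" (Isaac's broader character class, for example) would only mean changing the condition in the third branch.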
Ok, I found a solution. It works fine, but could use a bit of optimization, I think. I first check IF the document is valid, and only run the cleanup when it is not, so the impact should not be too high, as they usually DO give valid XML to parse. The solution is this:

    <cfscript>
    // escape everything first
    ct=htmleditformat(cfhttp.filecontent, -1);
    ct=REReplace(ct, "(<[^&>]*)&quot;([^>]*>)", "<\1""\2>", "ALL");
    // turn the escaped angle brackets back into real tags
    ct=Replace(ct, "&lt;", "<", "ALL");
    ct=Replace(ct, "&gt;", ">", "ALL");
    // restore quotes inside tags, one per tag per pass, until nothing changes
    newct="";
    while (not ct is newct){
        newct=ct;
        ct=REReplaceNoCase(ct, "(<[^>&]*)&quot;([^>]*>)", "\1""\2", "ALL");
    }
    // entities that were already valid got double-escaped; undo that
    ct=REReplace(ct, "&amp;([a-zA-Z]*);", "&\1;", "ALL");
    </cfscript>

And it works like a charm :)

Jesse
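The "check first, clean only if needed" flow could look roughly like the sketch below. The assumptions: parseFeed is a made-up name, the validity check is simply whether XmlParse() throws, and the repair step is cut down to bare ampersands only (escape every "&", then undo the ones that already started an entity), so it is not a full replacement for the cleanup code in the post.

```cfml
<cfscript>
// Hypothetical helper (not from the thread): parse the feed untouched when
// possible and fall back to a repair pass only when parsing fails.
function parseFeed(raw) {
    var doc = "";
    var fixed = "";
    try {
        // Fast path: the feed is usually well-formed.
        doc = XmlParse(raw);
    } catch (Any e) {
        // Slow path: escape every ampersand...
        fixed = Replace(raw, "&", "&amp;", "ALL");
        // ...then restore the ones that already began an entity or a
        // numeric character reference (## is a literal # in CFML).
        fixed = REReplace(fixed, "&amp;(##[0-9]+|[a-zA-Z]+);", "&\1;", "ALL");
        doc = XmlParse(fixed);
    }
    return doc;
}

// Sample input taken from the original question; the repaired text that
// gets parsed is <news>He &amp; me are</news>.
doc = parseFeed("<news>He & me are</news>");
</cfscript>
```

If the second XmlParse() also throws, the error simply propagates to the caller, which seems reasonable for a feed that is broken in some other way.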