Hi Nick!

> I'm not sure where and how you're manipulating the DOM but I'd also
> be curious as to how it works with potentially horribly XML unfriendly
> content eg something that has been posted that originated in Microsoft
> Word for example. I just remember in some of the PHP4
> XML based templating engines I played with that they had a tendency
> to choke on the kind of real world content that users put in.

Yes, I was also thinking of Word and the like when implementing the DOM-based approach :/

Initially, I used regex to find all a hrefs and form actions for link replacement. Unfortunately, I'm no Mr. Regex, so that turned out to be quite difficult for me. On top of that, I was afraid that regex would only solve a different part of the problem: it would work on both valid and malformed documents, but it would not cover every form that an a href link or a form action can take.

The current code basically looks like this:

    $responseDoc = new DOMDocument();
    $responseDoc->loadHTML($response);

    // process the form action links
    $formTags = $responseDoc->getElementsByTagName("form");
    foreach ($formTags as $formTag) {
        if ($formTag->hasAttribute("action")) {
            $action = $formTag->getAttribute("action");
            $newAction = $this->_postProcessUrl($action, $previousPortletactionParam);
            $formTag->setAttribute("action", $newAction);
        }
    }

which was really easy to implement. Do you see a chance to improve the parsing part?

Regards,
Stephan
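
P.S. On the malformed-content question: loadHTML() goes through libxml's HTML parser, which tolerates broken markup but reports it as PHP warnings unless libxml's internal error handling is switched on. Below is a minimal sketch of how the parsing part might be hardened, not the actual portlet code. The function name rewriteLinks() and the $rewrite callback are hypothetical stand-ins for the method around _postProcessUrl(), the a href handling is my assumption of how the form loop would extend, and the type declarations assume PHP 7 or later. I have not verified it against real Word output.

```php
<?php
// Hypothetical helper (not the original portlet code): rewrites form
// actions and anchor hrefs in possibly malformed HTML.
// $rewrite stands in for _postProcessUrl(); it receives the old URL
// and returns the new one.
function rewriteLinks(string $html, callable $rewrite): string
{
    // Collect libxml parse problems internally instead of raising
    // PHP warnings for every piece of broken markup.
    $previous = libxml_use_internal_errors(true);

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    // Discard the collected errors and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Tag name => attribute that carries the URL to rewrite.
    $targets = ['form' => 'action', 'a' => 'href'];

    foreach ($targets as $tagName => $attribute) {
        foreach ($doc->getElementsByTagName($tagName) as $element) {
            if ($element->hasAttribute($attribute)) {
                $oldUrl = $element->getAttribute($attribute);
                $element->setAttribute($attribute, $rewrite($oldUrl));
            }
        }
    }

    // Serialize the whole document back to HTML.
    return $doc->saveHTML();
}
```

It could be called as rewriteLinks($response, function ($url) { return '/portal' . $url; }). Note that saveHTML() returns a complete document, so a fragment comes back wrapped in a doctype plus html and body elements, which may need stripping before the portlet output is embedded.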