RE: [PHP] Using DOM textContent Property
Nathan,

Thanks for your help on this. I think I actually need to do this a different way, though. The problem is that I'm not just replacing a text node with a link element. For example, consider this paragraph:

    <p>For information, please contact [EMAIL PROTECTED].</p>

In this case, I want [EMAIL PROTECTED] to be a link, but not the rest of the paragraph. That means that the contents of the <p> element have to be split into three separate nodes: one DOMText node for "For information, please contact ", one DOMElement node for [EMAIL PROTECTED], and one DOMText node for ".".

This seems doable with the DOM model, but complicated. I'm thinking regular expressions might be the way to go again. :\

Tim Gustafson
SOE Webmaster
UC Santa Cruz
[EMAIL PROTECTED]
831-459-5354

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
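The three-way split described above maps fairly directly onto DOMText::splitText(). What follows is only a minimal sketch, not code from this thread: the helper name wrapSubstringInAnchor() and the address user@example.com are made up for illustration, and it assumes ASCII text so that the byte offset from strpos() matches the character offset that splitText() expects.

```php
<?php
/**
 * Hypothetical helper: wrap the first occurrence of $needle inside a text
 * node in an <a href="$href"> element, leaving the text around it alone.
 */
function wrapSubstringInAnchor(DOMText $textNode, string $needle, string $href): bool
{
    $offset = strpos($textNode->data, $needle);
    if ($offset === false) {
        return false; // nothing to wrap
    }

    // First split: $textNode keeps the text before the match,
    // $match starts at the match and becomes the next sibling.
    $match = $textNode->splitText($offset);

    // Second split: $match keeps only the matched text,
    // whatever follows becomes yet another sibling text node.
    $match->splitText(strlen($needle));

    // Swap the middle text node for an anchor, then move the text into it.
    $anchor = $textNode->ownerDocument->createElement('a');
    $anchor->setAttribute('href', $href);
    $match->parentNode->replaceChild($anchor, $match);
    $anchor->appendChild($match);

    return true;
}

// Usage: build <p>...</p> by hand and linkify the address inside it.
$doc = new DOMDocument();
$p = $doc->appendChild($doc->createElement('p'));
$p->appendChild($doc->createTextNode('For information, please contact user@example.com.'));
wrapSubstringInAnchor($p->firstChild, 'user@example.com', 'mailto:user@example.com');
echo $doc->saveHTML($p), PHP_EOL;
```

When the match sits at the very start or end of the text, one of the outer text nodes ends up empty, which is harmless when the document is serialized.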
Re: [PHP] Using DOM textContent Property
On Wed, Sep 10, 2008 at 10:35 AM, Tim Gustafson [EMAIL PROTECTED] wrote:

> Nathan,
>
> Thanks for your help on this. I actually need to do this a different way I think though. The problem is that I'm not just replacing a text entity with a link entity. For example, consider this paragraph:
>
>     <p>For information, please contact [EMAIL PROTECTED].</p>
>
> In this case, I want [EMAIL PROTECTED] to be a link, but not the rest of the paragraph. That means that the p entity has to be split into three separate entities - one DOMText for "For information, please contact ", one DOMEntity node for [EMAIL PROTECTED], and one DOMText node for ".".
>
> This seems doable with the DOM model, but complicated. I'm thinking regular expressions might be the way to go again. :\

so use some regex :D that's the only way i know of to determine whether DOMText nodes contain email address(es) as substrings while retaining one's sanity...

i got it working, again by modifying the code from my original post and dropping in an additional clause that uses a regex to determine whether there is an email address embedded in a DOMText node. it first checks whether the whole text node is an email address, because i think that's a small optimization, but that check could be omitted.

here's the output of the script now (notice i changed the input text to incorporate the new issue):

    [EMAIL PROTECTED] ~/domIterator/initialTests $ php testDom.php
    IN:
    <?xml version="1.0" standalone="yes"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><body>Test<br/><h2><b>[EMAIL PROTECTED]</b></h2><p>text that we dont want to turn into a link.. [EMAIL PROTECTED]</p><a name="bar">stuff inside the link</a>Foo<p>care</p><p>yoyser</p></body></html>

    OUT:
    <?xml version="1.0" standalone="yes"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><body>Test<br/><h2><b><a href="mailto:[EMAIL PROTECTED]">[EMAIL PROTECTED]</a></b></h2><p>text that we dont want to turn into a link.. <a href="mailto:[EMAIL PROTECTED]">[EMAIL PROTECTED]</a></p><a name="bar">stuff inside the link</a>Foo<p>care</p><p>yoyser</p></body></html>

and here is the code; sorry for the lengthy post fellas, i just want to post all of it rather than attempting to illustrate only the segments i've changed,

    <?php
    $doc = new DOMDocument();
    $doc->loadHTML('<html><body>Test<br><h2><b>[EMAIL PROTECTED]</b></h2><p>text that we dont want to turn into a link.. [EMAIL PROTECTED]</p><a name="bar">stuff inside the link</a>Foo<p>care</p><p>yoyser</p></body></html>');

    echo 'IN:' . PHP_EOL . $doc->saveXML() . PHP_EOL;
    findTextNodes($doc->getElementsByTagName('*'), 'convertToLinkIfNecc');
    echo 'OUT: ' . PHP_EOL . $doc->saveXML() . PHP_EOL;

    /**
     * run through a DOMNodeList, looking for text nodes. apply a callback to
     * all such text nodes that are encountered
     */
    function findTextNodes(DOMNodeList $nodesToSearch, $callback) {
        foreach ($nodesToSearch as $curNode) {
            if ($curNode->hasChildNodes())
                foreach ($curNode->childNodes as $curChild)
                    if ($curChild instanceof DOMText)
                        call_user_func($callback, $curNode, $curChild);
        }
    }

    /**
     * determine if a node should be modified, by checking to see if a child is a text node
     * and the text looks like an email address.
     * call a subordinate function to convert the text node into a mailto anchor DOMElement
     */
    function convertToLinkIfNecc(DOMElement $textContainer, DOMText $textNode) {
        if (strtolower($textContainer->nodeName) === 'a') /// per original request dont bother w/ a tags
            return;

        if (filter_var($textNode->wholeText, FILTER_VALIDATE_EMAIL) !== false) {
            convertMailtoToAnchor($textContainer, $textNode);
        } else {
            /// lets see if theres an email buried in this text node
            /// regex taken from: http://www.regular-expressions.info/email.html
            /// preg_match_all so that every address is collected, not just the first
            if (preg_match_all('/[EMAIL PROTECTED],4}\b/i', $textNode->wholeText, $matches) > 0)
                rebuildTextNodeWithEmailAddrs($textContainer, $textNode, $matches[0]);
        }
    }

    /**
     * given a DOMText instance w/ multiple email addresses, construct
     * a new set of nodes that contain the original text along w/ anchors for
     * all the bare email addresses
     */
    function rebuildTextNodeWithEmailAddrs(DOMElement $textContainer, DOMText $textNode, array $emailAddrs) {
        $eltTokens = array();   /// the text and email tokens, in document order
        $nodeOrder = array();   /// parallel array: 't' for text, 'e' for email
        $origText = $textNode->wholeText;
        foreach ($emailAddrs as $curAddr) {
            $startPos = strpos($origText, $curAddr);    // start pos of cur email
            $txtBuff = substr($origText, 0, $startPos); // text in front of the address
            if ($txtBuff !== '') {
                $eltTokens[] = $txtBuff;
                $nodeOrder[] = 't'; // indicate this token is a textNode
            }
            $eltTokens[] = $curAddr;
            $nodeOrder[] = 'e';     // indicate this token is an email addr
            $origText = substr($origText, $startPos + strlen($curAddr));
        }
        /// now that we have the tokens delete the orig DOMText and drop in the
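The message above is cut off in the archive before the step that swaps the original text node for the new nodes. One way that step could be done is sketched below; this is not Nathan's code. It uses preg_split() with PREG_SPLIT_DELIM_CAPTURE and a DOMDocumentFragment instead of the token arrays, the function name linkifyEmailsInTextNode() is hypothetical, and EMAIL_PATTERN_SKETCH is a deliberately simple stand-in pattern, not the regex from the post.

```php
<?php
// Stand-in pattern (an assumption, NOT the regex used in the thread).
// The single capture group matters: it makes preg_split() hand the
// addresses back at the odd indexes of the result.
const EMAIL_PATTERN_SKETCH = '/([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,})/i';

/**
 * Replace a text node with a run of text nodes and mailto anchors.
 * Returns the number of anchors created.
 */
function linkifyEmailsInTextNode(DOMText $textNode, string $pattern): int
{
    $parts = preg_split($pattern, $textNode->data, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false || count($parts) < 2) {
        return 0; // no address found, leave the node untouched
    }

    $doc = $textNode->ownerDocument;
    $fragment = $doc->createDocumentFragment();
    $links = 0;

    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            // odd index: a captured email address
            $anchor = $doc->createElement('a');
            $anchor->setAttribute('href', 'mailto:' . $part);
            $anchor->appendChild($doc->createTextNode($part));
            $fragment->appendChild($anchor);
            $links++;
        } elseif ($part !== '') {
            // even index: plain text between addresses
            $fragment->appendChild($doc->createTextNode($part));
        }
    }

    // Replacing with a fragment inserts the fragment's children in place.
    $textNode->parentNode->replaceChild($fragment, $textNode);
    return $links;
}

// Usage
$doc = new DOMDocument();
$p = $doc->appendChild($doc->createElement('p'));
$p->appendChild($doc->createTextNode('write to a@example.com or b@example.org.'));
linkifyEmailsInTextNode($p->firstChild, EMAIL_PATTERN_SKETCH);
echo $doc->saveHTML($p), PHP_EOL;
```

Building the nodes with createTextNode() rather than passing the text to a constructor also sidesteps escaping problems if the surrounding text contains an ampersand.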
Re: [PHP] Using DOM textContent Property
Hi Nathan,

since you're already talking about iterating over children, I'd like to ask you another question. Basically I was trying to do the same thing as Tim when I ran into some difficulties iterating over DOMElement->childNodes with foreach while manipulating strings inside the nodes, or even replacing a DOMElement/DOMNode/DOMText with another node. Instead, I am currently iterating like this:

    $child = $element->firstChild;
    while ($child !== null) {
        $next_sibling = $child->nextSibling;

        // Do something with child (manipulate, replace, ...)

        // Continue iteration
        $child = $next_sibling;
    }

Is this correct, or is there any better way?

Thank you in advance!
Mario
Re: [PHP] Using DOM textContent Property
On Tue, Sep 9, 2008 at 12:37 AM, Mario Trojan [EMAIL PROTECTED] wrote:

> Hi Nathan,
>
> if you're already speaking of iterating children, i'd like to ask you another question: Basically i was trying to do the same thing as Tim, when i experienced some difficulties iterating over DOMElement->childNodes with foreach and manipulating strings inside the nodes or even replacing DOMElement/DOMNode/DOMText with another node. Instead, i am currently iterating like this:
>
>     $child = $element->firstChild;
>     while ($child != null) {
>         $next_sibling = $child->nextSibling;
>         // Do something with child (manipulate, replace, ...)
>         // Continue iteration
>         $child = $next_sibling;
>     }
>
> Is this correct, or is there any better way?

i found this the other day on the DOMNodeList page on php.net. essentially foreach will implicitly do what you are doing under the hood; actually, it will also recurse into the children, whereas in the example you've shown, you're only iterating over one sub-level of the tree (horizontally across elements at the same level). sometimes it makes sense to drive the iteration yourself as you have shown, but i think the answer to your question is that you must use a reference to the parent to perform manipulations on the dom during iteration, see below (hope it helps :D),

-nathan

*a dot buffa at sns dot it* 29-May-2008 04:28
http://us2.php.net/manual/en/class.domnodelist.php#83513

> I agree with drichter at muvicom dot de. For instance, in order to delete each child node of a particular parent node,
>
>     <?php
>     while ($parentNode->hasChildNodes()) {
>         $domNodeList = $parentNode->childNodes;
>         $parentNode->removeChild($domNodeList->item(0));
>     }
>     ?>
>
> In other words, you have to update the DomNodeList on every iteration. In my opinion, the DomNodeList class is useless.
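Both the nextSibling loop and the php.net comment above work around the same behaviour: a DOMNodeList is live, so removing or replacing nodes while looping over it can make the loop skip entries. Another option is to copy the list into a plain array before touching the tree. The sketch below is only an illustration; removeChildrenNamed() is a hypothetical helper, not part of the DOM extension. As far as I can tell, foreach over childNodes visits direct children only, so this does not descend into grandchildren.

```php
<?php
/**
 * Hypothetical helper: remove every direct child of $parent whose node
 * name is $name, and return how many nodes were removed.
 */
function removeChildrenNamed(DOMNode $parent, string $name): int
{
    $removed = 0;

    // iterator_to_array() takes a snapshot, so the loop is not affected
    // by the live childNodes list shrinking underneath it.
    foreach (iterator_to_array($parent->childNodes) as $child) {
        if ($child->nodeName === $name) {
            $parent->removeChild($child);
            $removed++;
        }
    }

    return $removed;
}

// Usage
$doc = new DOMDocument();
$doc->loadXML('<r><x/><x/><y/><x/><y/></r>');
echo removeChildrenNamed($doc->documentElement, 'x'), PHP_EOL;
```

The nextSibling approach from the question needs no extra array, so it may be preferable for very large child lists; the snapshot is simply harder to get wrong.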
Re: [PHP] Using DOM textContent Property
Nathan Nobbe wrote:

> In my opinion, the DomNodeList class is useless.

agreed; ever tried making a replacement node class that extends it? then you see how useless it is! [yet a vital part of the dom structure]

ot here, but I thought it might be useful for reference. I do loads of xml/dom api work and find that this little iterator is very, very useful. I've trimmed it down, but below you'll find how *I* iterate through the dom, grabbing the important values:

    private function iterateDom( $nodeList ) {
        foreach( $nodeList as $values ) {
            if( $values->nodeType == XML_ELEMENT_NODE ) {
                $nodeName = $values->nodeName;
                if( $values->attributes ) {
                    for( $i = 0; $values->attributes->item($i); $i++ ) {
                        $attributeName  = $values->attributes->item($i)->nodeName;
                        $attributeValue = $values->attributes->item($i)->nodeValue;
                    }
                }
                $values->children = $this->iterateDom( $values->childNodes );
                $tempNode[$nodeName] = $values;
            } elseif( in_array($values->nodeType, array(XML_TEXT_NODE, XML_CDATA_SECTION_NODE)) ) {
                $nodeType = $values->nodeType;
                $nodeData = $values->data;
            } elseif( $values->nodeType === XML_PI_NODE ) {
                $DOMProcessingInstruction = array('target' => $values->target, 'data' => $values->data);
            }
            # otherwise we ignore, as all that's left is DOMComment
        }
    }

might be useful for somebody
RE: [PHP] Using DOM textContent Property
Nathan,

Thanks for the suggestion, but it's still not working for me. Here's my code:

    ===
    $HTML = new DOMDocument();
    @$HTML->loadHTML($text);
    $Elements = $HTML->getElementsByTagName("*");
    for ($X = 0; $X < $Elements->length; $X++) {
        $Element = $Elements->item($X);
        if ($Element->tagName == "a") {
            # SNIP - Do something with A tags here
        } else if ($Element instanceof DOMText) {
            echo $Element->nodeValue;
            exit;
        }
    }
    ===

This loop never executes the instanceof part of the code. If I add:

        } else if ($Element instanceof DOMNode) {
            echo "foo!";
            exit;
        }

Then it echoes "foo!" as expected. It just seems that none of the nodes in the tree are DOMText nodes. In fact, get_class($Element) returns "DOMElement" for every node in the tree.

Tim Gustafson
SOE Webmaster
UC Santa Cruz
[EMAIL PROTECTED]
831-459-5354

From: Nathan Nobbe [mailto:[EMAIL PROTECTED]
Sent: Wednesday, September 03, 2008 11:55 AM
To: Tim Gustafson
Cc: [EMAIL PROTECTED]; php-general@lists.php.net
Subject: Re: [PHP] Using DOM textContent Property

On Wed, Sep 3, 2008 at 10:03 AM, Tim Gustafson [EMAIL PROTECTED] wrote:

> > I think you might be better off using regexp on the text *before* sending it through the DOM parser. Send the user's text through a function that searches for URLs and email addresses, creating proper links as they're found, then use the output from that to move on to your DOM stuff. That way, you need not create new nodes in your nodelist.
>
> I think that's the way I'm going to have to go, but I was really hoping not to. Thanks for the suggestion!

i think i have what youre looking for Tim, take a look at this script output

    [EMAIL PROTECTED] ~ $ php testDom.php
    IN:
    <?xml version="1.0" standalone="yes"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><body>Test<br/><h2>[EMAIL PROTECTED]<a name="bar">stuff inside the link</a>Foo</h2><p>care</p><p>yoyser</p></body></html>

    OUT:
    <?xml version="1.0" standalone="yes"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><body>Test<br/><h2><a href="mailto:[EMAIL PROTECTED]">[EMAIL PROTECTED]</a><a name="bar">stuff inside the link</a>Foo</h2><p>care</p><p>yoyser</p></body></html>

and heres the code using the DOM extension. you may have to tweak it to suit your needs, but currently i think it does the trick ;)

    <?php
    $doc = new DOMDocument();
    $doc->loadHTML('<html><body>Test<br><h2>[EMAIL PROTECTED]<a name="bar">stuff inside the link</a>Foo</h2><p>care</p><p>yoyser</p></body></html>');

    echo 'IN:' . PHP_EOL . $doc->saveXML() . PHP_EOL;
    findTextNodes($doc->getElementsByTagName('*'), 'convertToLinkIfNecc');
    echo 'OUT: ' . PHP_EOL . $doc->saveXML() . PHP_EOL;

    /**
     * run through a DOMNodeList, looking for text nodes. apply a callback to
     * all such text nodes that are encountered
     */
    function findTextNodes(DOMNodeList $nodesToSearch, $callback) {
        foreach ($nodesToSearch as $curNode) {
            if ($curNode->hasChildNodes())
                foreach ($curNode->childNodes as $curChild)
                    if ($curChild instanceof DOMText)
                        #echo "TEXT NODE FOUND: " . $curChild->nodeValue . PHP_EOL;
                        /// todo: allow use of hook here
                        call_user_func($callback, $curNode, $curChild);
        }
    }

    /**
     * determine if a node should be modified, by checking to see if a child is a text node
     * and the text looks like an email address.
     * call a subordinate function to convert the text node into a mailto anchor DOMElement
     */
    function convertToLinkIfNecc(DOMElement $textContainer, DOMText $textNode) {
        if ( (strtolower($textContainer->nodeName) != 'a') &&
             (filter_var($textNode->nodeValue, FILTER_VALIDATE_EMAIL) !== false) ) {
            convertMailtoToAnchor($textContainer, $textNode);
        }
    }

    /**
     * modify a DOMElement that has a DOMText node as a child; create a DOMElement
     * that represents an <a> tag, and set the value and href attribute, so that it
     * acts as a 'mailto' link
     */
    function convertMailtoToAnchor(DOMElement $textContainer, DOMText $textNode) {
        $newNode = new DOMElement('a', $textNode->nodeValue);
        $textContainer->replaceChild($newNode, $textNode);
        $newNode->setAttribute('href', "mailto:{$textNode->nodeValue
[PHP] Using DOM textContent Property
bouncing back to the list so that others may benefit from our work...

On Fri, Sep 5, 2008 at 3:09 PM, Tim Gustafson [EMAIL PROTECTED] wrote:

> Nathan,
>
> Thanks for the suggestion, but it's still not working for me. Here's my code:
>
>     ===
>     $HTML = new DOMDocument();
>     @$HTML->loadHTML($text);
>     $Elements = $HTML->getElementsByTagName("*");
>     for ($X = 0; $X < $Elements->length; $X++) {
>         $Element = $Elements->item($X);
>         if ($Element->tagName == "a") {
>             # SNIP - Do something with A tags here
>         } else if ($Element instanceof DOMText) {
>             echo $Element->nodeValue;
>             exit;
>         }
>     }
>     ===
>
> This loop never executes the instanceof part of the code. If I add:
>
>         } else if ($Element instanceof DOMNode) {
>             echo "foo!";
>             exit;
>         }
>
> Then it echoes "foo!" as expected. It just seems that none of the nodes in the tree are DOMText nodes. In fact, get_class($Element) returns "DOMElement" for every node in the tree.

Tim,

i got your code working with minimal effort by pulling in two of the methods i posted and making some revisions. scope it out (this will produce the same output as my last post, the part after OUT:)

    <?php
    $text = '<html><body>Test<br><h2>[EMAIL PROTECTED]<a name="bar">stuff inside the link</a>Foo</h2><p>care</p><p>yoyser</p></body></html>';
    $HTML = new DOMDocument();
    $HTML->loadHTML($text);
    $Elements = $HTML->getElementsByTagName("*");
    for ($X = 0; $X < $Elements->length; $X++) {
        $Element = $Elements->item($X);
        if ($Element->hasChildNodes())
            foreach ($Element->childNodes as $curChild)
                if ($curChild->nodeName == "a") {
                    # SNIP - Do something with A tags here
                } else if ($curChild instanceof DOMText) {
                    convertToLinkIfNecc($Element, $curChild);
                }
    }
    echo $HTML->saveXML() . PHP_EOL;

    function convertToLinkIfNecc(DOMElement $textContainer, DOMText $textNode) {
        if ( (strtolower($textContainer->nodeName) != 'a') &&
             (filter_var($textNode->nodeValue, FILTER_VALIDATE_EMAIL) !== false) ) {
            convertMailtoToAnchor($textContainer, $textNode);
        }
    }

    function convertMailtoToAnchor(DOMElement $textContainer, DOMText $textNode) {
        $newNode = new DOMElement('a', $textNode->nodeValue);
        $textContainer->replaceChild($newNode, $textNode);
        $newNode->setAttribute('href', "mailto:{$textNode->nodeValue}");
    }
    ?>

so, the problem is that iterating over a tree structure will only show you what's at the first level of the tree. this is why you need to call hasChildNodes() and, if that is true, iterate over childNodes (and really, the code should be doing the same thing there as well: calling hasChildNodes() on each child and iterating over its childNodes).

the code i have shown will work for the html i posted, however it won't work on (x)html where the text nodes we're searching for are deeper in the tree than the second level. i'm sure you can cook up something that will recurse down to the leaves :)

anyway, i'm going to try to hook up a RecursiveDOMDocumentIterator that implements RecursiveIterator, so that it has the convenient foreach support. also, i'll probably try to hook up a Filter variant of this class so that situations like this are trivial. stay tuned :D

-nathan
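For the "recurse down to the leaves" part, a plain recursive function is probably the smallest thing that works. The following is a sketch under my own assumptions rather than the RecursiveDOMDocumentIterator mentioned above: walkTextNodes() is a hypothetical name, and skipping <a> subtrees simply mirrors the "don't bother with a tags" rule from earlier in the thread.

```php
<?php
/**
 * Hypothetical helper: call $callback for every text node below $node,
 * at any depth, but do not descend into <a> elements.
 */
function walkTextNodes(DOMNode $node, callable $callback): void
{
    if ($node instanceof DOMText) {
        $callback($node);
        return;
    }
    if ($node instanceof DOMElement && strtolower($node->nodeName) === 'a') {
        return; // existing links are left alone
    }
    if (!$node->hasChildNodes()) {
        return;
    }

    // Snapshot the children so the callback may replace nodes safely.
    foreach (iterator_to_array($node->childNodes) as $child) {
        walkTextNodes($child, $callback);
    }
}

// Usage: collect the text of every node the walker visits.
$doc = new DOMDocument();
$doc->loadXML('<body><div><p>deep <b>text</b></p></div><a>skip</a></body>');
$found = [];
walkTextNodes($doc, function (DOMText $text) use (&$found) {
    $found[] = $text->data;
});
print_r($found);
```

An XPath query such as //text()[not(ancestor::a)] via DOMXPath would be an alternative to hand-written recursion.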
Re: [PHP] Using DOM textContent Property
Tim Gustafson wrote:

> Hello,
>
> I am writing a filter in PHP that takes some HTML as input and goes through the HTML and adjusts certain tag attributes as needed. So, for example, if a tag is missing the "title" attribute, this filter adds a "title" attribute to the <a> tag. I'm doing this all using PHP 5 and the DOM parsing library, and it's working really well.
>
> The one snafu I'm running in to is dealing with users who will just type an e-mail address into an HTML document without actually making it a link - so, they'll just put [EMAIL PROTECTED] rather than <a href="mailto:[EMAIL PROTECTED]">[EMAIL PROTECTED]</a>. I'd like for these incorrectly entered e-mail addresses to magically change into real clickable links, so I'd like my filter to be able to grab those plain text e-mail addresses and convert them to actual clickable links.
>
> I tried iterating through all the elements on a page using something like this:
>
>     $Elements = $HTML->getElementsByTagName("*");
>     for ($X = 0; $X < $Elements->length; $X++) {
>         ... SNIP ...
>     }

I think you might be better off using regexp on the text *before* sending it through the DOM parser. Send the user's text through a function that searches for URLs and email addresses, creating proper links as they're found, then use the output from that to move on to your DOM stuff. That way, you need not create new nodes in your nodelist.
RE: [PHP] Using DOM textContent Property
> I think you might be better off using regexp on the text *before* sending it through the DOM parser. Send the user's text through a function that searches for URLs and email addresses, creating proper links as they're found, then use the output from that to move on to your DOM stuff. That way, you need not create new nodes in your nodelist.

I think that's the way I'm going to have to go, but I was really hoping not to. Thanks for the suggestion!

Tim Gustafson
SOE Webmaster
UC Santa Cruz
[EMAIL PROTECTED]
831-459-5354
Re: [PHP] Using DOM textContent Property
On Wed, Sep 3, 2008 at 10:03 AM, Tim Gustafson [EMAIL PROTECTED] wrote:

> > I think you might be better off using regexp on the text *before* sending it through the DOM parser. Send the user's text through a function that searches for URLs and email addresses, creating proper links as they're found, then use the output from that to move on to your DOM stuff. That way, you need not create new nodes in your nodelist.
>
> I think that's the way I'm going to have to go, but I was really hoping not to. Thanks for the suggestion!

i think i have what you're looking for Tim, take a look at this script output

    [EMAIL PROTECTED] ~ $ php testDom.php
    IN:
    <?xml version="1.0" standalone="yes"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><body>Test<br/><h2>[EMAIL PROTECTED]<a name="bar">stuff inside the link</a>Foo</h2><p>care</p><p>yoyser</p></body></html>

    OUT:
    <?xml version="1.0" standalone="yes"?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
    <html><body>Test<br/><h2><a href="mailto:[EMAIL PROTECTED]">[EMAIL PROTECTED]</a><a name="bar">stuff inside the link</a>Foo</h2><p>care</p><p>yoyser</p></body></html>

and here's the code using the DOM extension. you may have to tweak it to suit your needs, but currently i think it does the trick ;)

    <?php
    $doc = new DOMDocument();
    $doc->loadHTML('<html><body>Test<br><h2>[EMAIL PROTECTED]<a name="bar">stuff inside the link</a>Foo</h2><p>care</p><p>yoyser</p></body></html>');

    echo 'IN:' . PHP_EOL . $doc->saveXML() . PHP_EOL;
    findTextNodes($doc->getElementsByTagName('*'), 'convertToLinkIfNecc');
    echo 'OUT: ' . PHP_EOL . $doc->saveXML() . PHP_EOL;

    /**
     * run through a DOMNodeList, looking for text nodes. apply a callback to
     * all such text nodes that are encountered
     */
    function findTextNodes(DOMNodeList $nodesToSearch, $callback) {
        foreach ($nodesToSearch as $curNode) {
            if ($curNode->hasChildNodes())
                foreach ($curNode->childNodes as $curChild)
                    if ($curChild instanceof DOMText)
                        #echo "TEXT NODE FOUND: " . $curChild->nodeValue . PHP_EOL;
                        /// todo: allow use of hook here
                        call_user_func($callback, $curNode, $curChild);
        }
    }

    /**
     * determine if a node should be modified, by checking to see if a child is a text node
     * and the text looks like an email address.
     * call a subordinate function to convert the text node into a mailto anchor DOMElement
     */
    function convertToLinkIfNecc(DOMElement $textContainer, DOMText $textNode) {
        if ( (strtolower($textContainer->nodeName) != 'a') &&
             (filter_var($textNode->nodeValue, FILTER_VALIDATE_EMAIL) !== false) ) {
            convertMailtoToAnchor($textContainer, $textNode);
        }
    }

    /**
     * modify a DOMElement that has a DOMText node as a child; create a DOMElement
     * that represents an <a> tag, and set the value and href attribute, so that it
     * acts as a 'mailto' link
     */
    function convertMailtoToAnchor(DOMElement $textContainer, DOMText $textNode) {
        $newNode = new DOMElement('a', $textNode->nodeValue);
        $textContainer->replaceChild($newNode, $textNode);
        $newNode->setAttribute('href', "mailto:{$textNode->nodeValue}");
    }

-nathan
[PHP] Using DOM textContent Property
Hello,

I am writing a filter in PHP that takes some HTML as input and goes through the HTML and adjusts certain tag attributes as needed. So, for example, if a tag is missing the "title" attribute, this filter adds a "title" attribute to the <a> tag. I'm doing this all using PHP 5 and the DOM parsing library, and it's working really well.

The one snafu I'm running into is dealing with users who will just type an e-mail address into an HTML document without actually making it a link - so, they'll just put [EMAIL PROTECTED] rather than <a href="mailto:[EMAIL PROTECTED]">[EMAIL PROTECTED]</a>. I'd like for these incorrectly entered e-mail addresses to magically change into real clickable links, so I'd like my filter to be able to grab those plain text e-mail addresses and convert them to actual clickable links.

I tried iterating through all the elements on a page using something like this:

    $Elements = $HTML->getElementsByTagName("*");
    for ($X = 0; $X < $Elements->length; $X++) {
        ... SNIP ...
    }

And then I tried looking at the textContent property of each node, but it seems that higher-level nodes include all the text of their children nodes (which is what the DOM documents say it should). But there doesn't appear to be any way to know if the textContent you've got is for just one node, or for a whole bunch of nodes.

Is there any way to figure that out, so that I can adjust the textContent property of just the lowest-level nodes, rather than mucking up the higher-level ones?

Tim Gustafson
SOE Webmaster
UC Santa Cruz
[EMAIL PROTECTED]
831-459-5354
Re: [PHP] Using DOM textContent Property
On Tue, Sep 2, 2008 at 3:18 PM, Tim Gustafson [EMAIL PROTECTED] wrote:

> And then I tried looking at the textContent property of each node, but it seems that higher-level nodes include all the text of their children nodes (which is what the DOM documents say it should). But there doesn't appear to be any way to know if the textContent you've got is for just one node, or for a whole bunch of nodes.
>
> Is there any way to figure that out, so that I can adjust the textContent property of just the lowest-level nodes, rather than mucking up the higher-level ones?

if a node has children, then it's not a leaf, so i imagine you could continue to traverse until you reach the leaf that actually has the address needing magical conversion.

also, for a performance increase, if you don't find a match at a high level, you could skip that entire subsection of the tree; no need to go down to a leaf if you know there's no magic needed for the current branch :)

-nathan
RE: [PHP] Using DOM textContent Property
> if a node has children, then its not a leaf, so i imagine you could continue to traverse until you reach the leaf that actually has the address needing magical conversion.

I tried that. $Element->hasChildNodes() returns true for just about everything except tags like <br> and <img> that have no corresponding </br> or </img>, because the content that appears between <p> and </p>, for example, apparently counts as a child node, even though it isn't an HTML tag. So, if you have:

    <p>Foo!</p>

when you look at $Element->hasChildNodes() for the <p> tag, you will get true, and $Element->childNodes->length is equal to 1, even though "Foo!" isn't an HTML tag.

Interestingly though, when you iterate through the tree, you get the <p> tag as one of the elements, but you never get a text-only element that has that <p> as a parentNode. In fact, get_class($Element) always returns "DOMElement", even on the text-only nodes, which I would have expected to be "DOMText" elements...but I guess not.

So I'm wondering why $Element->hasChildNodes() would return true, but iterating through the DOM tree returns no elements that have that $Element as a parentNode.

What's more, looking at $Element->childNodes->length isn't too helpful, because, for example:

    <h2><a name="bar"></a>Foo</h2>

returns two child nodes, neither of which has "Foo" for its textContent.

Tim Gustafson
SOE Webmaster
UC Santa Cruz
[EMAIL PROTECTED]
831-459-5354
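A likely explanation for the behaviour described above is that getElementsByTagName("*") returns element nodes only, so text never shows up in that list, while childNodes contains every kind of child, text included. The small check below is a sketch for verifying that on your own setup; classNames() is a hypothetical helper, and it uses loadXML() on the <h2> fragment so that no extra <html>/<body> wrappers get added.

```php
<?php
/**
 * Hypothetical helper: return the PHP class name of every node in a list.
 */
function classNames(DOMNodeList $nodes): array
{
    $names = [];
    foreach ($nodes as $node) {
        $names[] = get_class($node);
    }
    return $names;
}

$doc = new DOMDocument();
$doc->loadXML('<h2><a name="bar"/>Foo</h2>');
$h2 = $doc->documentElement;

// Elements only: the <h2> and the <a>, no text node.
print_r(classNames($doc->getElementsByTagName('*')));

// Direct children of <h2>: the <a> element and the text node "Foo".
print_r(classNames($h2->childNodes));

// The text node is reachable as a child and carries the text itself.
var_dump($h2->lastChild instanceof DOMText, $h2->lastChild->nodeValue);
```

So when looking for bare text, the check for DOMText has to be made on the children of each element rather than on the items of the element list.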