RE: [PHP] Using DOM textContent Property

2008-09-10 Thread Tim Gustafson
Nathan,

Thanks for your help on this.

I actually need to do this a different way, I think.  The problem is
that I'm not just replacing a text entity with a link entity.  For example,
consider this paragraph:

<p>For information, please contact [EMAIL PROTECTED].</p>

In this case, I want [EMAIL PROTECTED] to be a link, but not the rest of
the paragraph.  That means that the <p> entity has to be split into three
separate entities - one DOMText for "For information, please contact ", one
DOMEntity node for [EMAIL PROTECTED], and one DOMText node for ".".

This seems doable with the DOM model, but complicated.  I'm thinking regular
expressions might be the way to go again.  :\

Tim Gustafson
SOE Webmaster
UC Santa Cruz
[EMAIL PROTECTED]
831-459-5354
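
For what it's worth, the three-way split described above can be sketched roughly as follows. This is only an illustration under some assumptions: linkEmailsInTextNode() is a made-up helper name, the address pattern is a deliberately loose stand-in rather than a real validator, and the sample document is loaded with loadXML() just to keep the output predictable.

```php
<?php
// Hypothetical helper: replace one DOMText node with a run of text nodes and
// <a href="mailto:..."> elements, one anchor per email-like substring.
function linkEmailsInTextNode(DOMText $textNode)
{
    // Loose, illustrative pattern (an assumption, not a full address validator).
    $pattern = '/([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,})/i';

    // With one capture group, odd indexes hold the addresses, even indexes the text around them.
    $parts = preg_split($pattern, $textNode->data, -1, PREG_SPLIT_DELIM_CAPTURE);
    if (count($parts) < 2) {
        return 0; // no address found, leave the node alone
    }

    $doc    = $textNode->ownerDocument;
    $parent = $textNode->parentNode;
    $linked = 0;

    foreach ($parts as $i => $part) {
        if ($part === '') {
            continue; // nothing before/after the address
        }
        if ($i % 2 === 1) {
            $node = $doc->createElement('a');
            $node->setAttribute('href', 'mailto:' . $part);
            $node->appendChild($doc->createTextNode($part));
            $linked++;
        } else {
            $node = $doc->createTextNode($part);
        }
        $parent->insertBefore($node, $textNode);
    }

    $parent->removeChild($textNode); // the original text node is now redundant
    return $linked;
}

$doc = new DOMDocument();
$doc->loadXML('<p>For information, please contact someone@example.com.</p>');
$p = $doc->documentElement;
$count = linkEmailsInTextNode($p->firstChild);
echo $doc->saveXML($p), PHP_EOL;
```

After the call, the <p> should hold three children: a text node, the new <a> element, and a text node containing the trailing period.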


-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



Re: [PHP] Using DOM textContent Property

2008-09-10 Thread Nathan Nobbe
On Wed, Sep 10, 2008 at 10:35 AM, Tim Gustafson [EMAIL PROTECTED] wrote:

> Nathan,
>
> Thanks for your help on this.
>
> I actually need to do this a different way I think though.  The problem is
> that I'm not just replacing a text entity with a link entity.  For example,
> consider this paragraph:
>
> <p>For information, please contact [EMAIL PROTECTED].</p>
>
> In this case, I want [EMAIL PROTECTED] to be a link, but not the rest of
> the paragraph.  That means that the <p> entity has to be split into three
> separate entities - one DOMText for "For information, please contact ", one
> DOMEntity node for [EMAIL PROTECTED], and one DOMText node for ".".
>
> This seems doable with the DOM model, but complicated.  I'm thinking
> regular expressions might be the way to go again.  :\


so use some regex :D  that's the only way i know of to determine whether
DOMText nodes contain email address(es) as substrings while retaining one's
sanity...  i got it working, again by modifying the code from my original
post and dropping in an additional clause that uses a regex to determine
whether there is an email address embedded in a DOMText node.  it checks
whether the whole node is an email address first, because i think that's a
small optimization, but that check could be omitted.  here's the output of
the script now (notice i changed the input text to incorporate the new issue):

[EMAIL PROTECTED] ~/domIterator/initialTests $ php testDom.php
IN:
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>Test<br/><h2><b>[EMAIL PROTECTED]</b></h2><p>text that we
dont want to turn into a link.. [EMAIL PROTECTED]</p><a
name="bar">stuff inside the
link</a>Foo<p>care</p><p>yoyser</p></body></html>

OUT:
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>Test<br/><h2><b><a href="mailto:[EMAIL PROTECTED]">[EMAIL PROTECTED]</a></b></h2><p>text
that we dont want to turn into a link.. <a
href="mailto:[EMAIL PROTECTED]">[EMAIL PROTECTED]</a></p><a
name="bar">stuff inside the
link</a>Foo<p>care</p><p>yoyser</p></body></html>

and here is the code; sorry for the lengthy post fellas, i just want to post
all of it rather than just attempting to illustrate the segments i've
changed,

<?php
$doc = new DOMDocument();
$doc->loadHTML(
    '<html><body>Test<br><h2><b>[EMAIL PROTECTED]</b></h2>'
    . '<p>text that we dont want to turn into a link.. [EMAIL PROTECTED]</p>'
    . '<a name="bar">stuff inside the link</a>Foo<p>care</p><p>yoyser</p></body></html>'
);
echo 'IN:' . PHP_EOL . $doc->saveXML() . PHP_EOL;
findTextNodes($doc->getElementsByTagName('*'), 'convertToLinkIfNecc');
echo 'OUT: ' . PHP_EOL . $doc->saveXML() . PHP_EOL;

/**
 * run through a DOMNodeList, looking for text nodes.  apply a callback to
 * all such text nodes that are encountered
 */
function findTextNodes(DOMNodeList $nodesToSearch, $callback) {
    foreach($nodesToSearch as $curNode) {
        if($curNode->hasChildNodes())
            foreach($curNode->childNodes as $curChild)
                if($curChild instanceof DOMText)
                    call_user_func($callback, $curNode, $curChild);
    }
}

/**
 * determine if a node should be modified, by checking to see if a child is a
 * text node and the text looks like an email address.
 * call a subordinate function to convert the text node into a mailto anchor
 * DOMElement
 */
function convertToLinkIfNecc(DomElement $textContainer, DOMText $textNode) {
    if(strtolower($textContainer->nodeName) === 'a') /// per original request dont bother w/ a tags
        return;
    if(filter_var($textNode->wholeText, FILTER_VALIDATE_EMAIL) !== false) {
        convertMailtoToAnchor($textContainer, $textNode);
    } else { /// lets see if theres an email buried in this text node
        /// regex taken from: http://www.regular-expressions.info/email.html
        preg_match('/[EMAIL PROTECTED],4}\b/i', $textNode->wholeText, $matches);
        if(count($matches) > 0)
            rebuildTextNodeWithEmailAddrs($textContainer, $textNode, $matches);
    }
}

/**
 * given a DOMText instance w/ multiple email addresses, construct
 * a new set of nodes that contain the original text along w/ anchors for
 * all the bare email addresses
 */
function rebuildTextNodeWithEmailAddrs(DomElement $textContainer, DOMText $textNode, array $emailAddrs) {
    $eltTokens = array();
    $nodeOrder = array();
    /// construct array of elements
    $origText = $textNode->wholeText;
    foreach($emailAddrs as $curAddr) {
        $startPos = strpos($origText, $curAddr); // start pos of cur email
        $txtBuff = substr($origText, 0, $startPos); // buffer so we can check if its empty
        if(!empty($txtBuff)) {
            $eltTokens[] = $txtBuff;
            $nodeOrder[] = 't'; // indicate this token is a textNode
        }
        $eltTokens[] = $curAddr;
        $nodeOrder[] = 'e'; // indicate this token is an email addr
        $origText = substr($origText, $startPos + strlen($curAddr));
    }
    /// now that we have the tokens delete the orig DOMText and drop in the

Re: [PHP] Using DOM textContent Property

2008-09-09 Thread Mario Trojan

Hi Nathan,

if you're already speaking of iterating children, i'd like to ask you 
another question:


Basically i was trying to do the same thing as Tim, when i experienced
some difficulties iterating over DOMElement->childNodes with foreach and
manipulating strings inside the nodes or even replacing
DOMElement/DOMNode/DOMText with another node. Instead, i am currently
iterating like this:


$child = $element->firstChild;
while ($child != null) {
    $next_sibling = $child->nextSibling;

    // Do something with child (manipulate, replace, ...)

    // Continue iteration
    $child = $next_sibling;
}

Is this correct, or is there any better way?

Thank you in advance!
Mario





Re: [PHP] Using DOM textContent Property

2008-09-09 Thread Nathan Nobbe
On Tue, Sep 9, 2008 at 12:37 AM, Mario Trojan [EMAIL PROTECTED] wrote:

> Hi Nathan,
>
> if you're already speaking of iterating children, i'd like to ask you
> another question:
>
> Basically i was trying to do the same thing as Tim, when i experienced some
> difficulties iterating over DOMElement->childNodes with foreach and
> manipulating strings inside the nodes or even replacing
> DOMElement/DOMNode/DOMText with another node. Instead, i am currently
> iterating like this:
>
> $child = $element->firstChild;
> while ($child != null) {
>     $next_sibling = $child->nextSibling;
>
>     // Do something with child (manipulate, replace, ...)
>
>     // Continue iteration
>     $child = $next_sibling;
> }
>
> Is this correct, or is there any better way?


i found this the other day on the DOMNodeList page on php.net.

essentially, foreach walks whatever the DOMNodeList contains.  for childNodes
that is the same thing you are doing by hand: one sub-level of the tree
(horizontally across nodes at the same level), with no recursion into the
children, whereas the list from getElementsByTagName('*') already includes
elements from every level.  sometimes it makes sense to drive the iteration
yourself as you have shown, but i think the answer to your question is that
you must use a reference to the parent to perform manipulations to the dom
during iteration, see below (hope it helps :D),

-nathan

*a dot buffa at sns dot it*
29-May-2008 04:28
http://us2.php.net/manual/en/class.domnodelist.php#83513

I agree with drichter at muvicom dot de.

For instance, in order to delete each child node of a particular parent node,

<?php

while ($parentNode->hasChildNodes()){
  $domNodeList = $parentNode->childNodes;
  $parentNode->removeChild($domNodeList->item(0));
}

?>

In other words, you have to update the DomNodeList on every iteration.

In my opinion, the DomNodeList class is useless.
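
One way to sidestep the live-list behaviour described in that comment is to copy the list into a plain array before touching the tree. A minimal sketch, assuming a tiny made-up document and an arbitrary <em> wrapper purely for illustration:

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<div>one<b>two</b>three</div>');
$parent = $doc->documentElement;

// childNodes is a live list, so take a snapshot before changing the tree.
$snapshot = iterator_to_array($parent->childNodes);

foreach ($snapshot as $child) {
    if ($child instanceof DOMText) {
        // Swap each direct text child for an <em> element holding the same text node.
        $em = $doc->createElement('em');
        $parent->replaceChild($em, $child);
        $em->appendChild($child);
    }
}

$result = $doc->saveXML($parent);
echo $result, PHP_EOL;
```

Because the loop runs over the array copy, replacing children does not disturb the iteration. Mario's firstChild/nextSibling loop achieves much the same thing by remembering the next sibling before each change.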


Re: [PHP] Using DOM textContent Property

2008-09-09 Thread Nathan Rixham

Nathan Nobbe wrote:

> In my opinion, the DomNodeList class is useless.



agreed; ever tried making a replacement node class that extends it? then 
you see how useless it is! [yet a vital part of the dom structure]


ot here; but I thought maybe useful for reference; I do loads of xml/dom 
api work and find that this little iterator is very very useful; I've 
trimmed it down but you'll find below how *I* iterate through the dom 
grabbing the important values..


private function iterateDom( $nodeList )
{
  foreach( $nodeList as $values ) {
    if( $values->nodeType == XML_ELEMENT_NODE ) {
      $nodeName = $values->nodeName;
      if( $values->attributes ) {
        for( $i=0;$values->attributes->item($i);$i++ ) {
          $attributeName = $values->attributes->item($i)->nodeName;
          $attributeValue = $values->attributes->item($i)->nodeValue;
        }
      }
      $values->children = $this->iterateDom( $values->childNodes );
      $tempNode[$nodeName] = $values;
    } elseif( in_array($values->nodeType, array(XML_TEXT_NODE, XML_CDATA_SECTION_NODE)) ) {
      $nodeType = $values->nodeType;
      $nodeData = $values->data;
    } elseif( $values->nodeType === XML_PI_NODE ) {
      $DOMProcessingInstruction = array('target' => $values->target, 'data' => $values->data);
    }
    # otherwise we ignore as all that's left is DOMComment
  }
}

might be useful for somebody




RE: [PHP] Using DOM textContent Property

2008-09-05 Thread Tim Gustafson
Nathan,
 
Thanks for the suggestion, but it's still not working for me.  Here's my
code:

===
$HTML = new DOMDocument();
@$HTML->loadHTML($text);
$Elements = $HTML->getElementsByTagName("*");

for ($X = 0; $X < $Elements->length; $X++) {
  $Element = $Elements->item($X);

  if ($Element->tagName == "a") {
    # SNIP - Do something with A tags here
  } else if ($Element instanceof DOMText) {
    echo $Element->nodeValue; exit;
  }
}
===

This loop never executes the instanceof part of the code.  If I add:

  } else if ($Element instanceof DOMNode) {
    echo "foo!"; exit;
  }

Then it echoes "foo!" as expected.  It just seems that none of the nodes in
the tree are DOMText nodes.  In fact, get_class($Element) returns
"DOMElement" for every node in the tree.

Tim Gustafson
SOE Webmaster
UC Santa Cruz
[EMAIL PROTECTED]
831-459-5354
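
The behaviour reported here is consistent with getElementsByTagName() matching element nodes only, so text nodes never show up in its list. As a rough sketch (the sample markup is made up, and loadXML() is used only to keep the example small), an XPath query is one way to reach the text nodes directly:

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<body><p>Foo!</p><h2><a name="bar"></a>Bar</h2></body>');

// Collect the distinct classes returned by getElementsByTagName('*').
$classes = array();
foreach ($doc->getElementsByTagName('*') as $node) {
    $classes[get_class($node)] = true;
}
print_r(array_keys($classes));

// An XPath query can select text nodes at any depth.
$xpath = new DOMXPath($doc);
$textNodes = $xpath->query('//text()');
foreach ($textNodes as $textNode) {
    echo get_class($textNode), ': "', $textNode->nodeValue, '" (parent: ',
        $textNode->parentNode->nodeName, ')', PHP_EOL;
}
```

Each text node still knows its parent through parentNode, which is what a replacement step would need.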



 





[PHP] Using DOM textContent Property

2008-09-05 Thread Nathan Nobbe
bouncing back to the list so that others may benefit from our work...

On Fri, Sep 5, 2008 at 3:09 PM, Tim Gustafson [EMAIL PROTECTED] wrote:

> Nathan,
>
> Thanks for the suggestion, but it's still not working for me.  Here's my
> code:
>
> ===
> $HTML = new DOMDocument();
> @$HTML->loadHTML($text);
> $Elements = $HTML->getElementsByTagName("*");
>
> for ($X = 0; $X < $Elements->length; $X++) {
>   $Element = $Elements->item($X);
>
>   if ($Element->tagName == "a") {
>     # SNIP - Do something with A tags here
>   } else if ($Element instanceof DOMText) {
>     echo $Element->nodeValue; exit;
>   }
> }
> ===
>
> This loop never executes the instanceof part of the code.  If I add:
>
>   } else if ($Element instanceof DOMNode) {
>     echo "foo!"; exit;
>   }
>
> Then it echoes "foo!" as expected.  It just seems that none of the nodes in
> the tree are DOMText nodes.  In fact, get_class($Element) returns
> "DOMElement" for every node in the tree.


Tim,

i got your code working with minimal effort by pulling in two of the methods
i posted and making some revisions.  scope it out,
(this will produce the same output as my last post (the part after OUT:))

<?php
$text = '<html><body>Test<br><h2>[EMAIL PROTECTED]<a name="bar">stuff inside the link</a>Foo</h2>'
    . '<p>care</p><p>yoyser</p></body></html>';
$HTML = new DOMDocument();
$HTML->loadHTML($text);
$Elements = $HTML->getElementsByTagName("*");

for ($X = 0; $X < $Elements->length; $X++) {
  $Element = $Elements->item($X);
  if ($Element->hasChildNodes())
    foreach ($Element->childNodes as $curChild)
      if ($curChild->nodeName == "a") {
        # SNIP - Do something with A tags here
      } else if ($curChild instanceof DOMText) {
        convertToLinkIfNecc($Element, $curChild);
      }
}
echo $HTML->saveXML() . PHP_EOL;


function convertToLinkIfNecc(DomElement $textContainer, DOMText $textNode) {
    if( (strtolower($textContainer->nodeName) != 'a') &&
        (filter_var($textNode->nodeValue, FILTER_VALIDATE_EMAIL) !== false) ) {
        convertMailtoToAnchor($textContainer, $textNode);
    }
}
function convertMailtoToAnchor(DomElement $textContainer, DOMText $textNode)
{
    $newNode = new DomElement('a', $textNode->nodeValue);
    $textContainer->replaceChild($newNode, $textNode);
    $newNode->setAttribute('href', "mailto:{$textNode->nodeValue}");
}
?>

so, the problem is that iterating over a tree structure will only show you
what's at the first level of the tree.  this is why you need to call
hasChildNodes(), and if that is true, iterate across the childNodes property
(and really, the code should be doing the same thing there as well, calling
hasChildNodes() on each child and iterating over its childNodes).
the code i have shown will work for the html i posted, however it won't work
on (x)html where the text nodes we're searching for are deeper in the tree
than the second level.  i'm sure you can cook up something that will recurse
down to the leaves :)
anyway, i'm going to try and hook up a RecursiveDOMDocumentIterator that
implements RecursiveIterator so that it has the convenient foreach support.
also, i'll probably try to hook up a Filter variant of this class so that
situations like this are trivial.

stay tuned :D

-nathan
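
The "recurse down to the leaves" idea could look something like the sketch below. It is not the RecursiveDOMDocumentIterator mentioned above, just a plain recursive function; walkTextNodes() is a hypothetical name, and the callback takes the same (parent, text node) arguments as the callbacks used earlier in the thread.

```php
<?php
// Hypothetical helper: depth-first walk that hands every DOMText node,
// together with its parent, to $callback.
function walkTextNodes(DOMNode $node, $callback)
{
    // Iterate over a copy, since the callback may replace nodes in the live list.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child instanceof DOMText) {
            call_user_func($callback, $node, $child);
        } elseif ($child->hasChildNodes()) {
            walkTextNodes($child, $callback);
        }
    }
}

$doc = new DOMDocument();
$doc->loadXML('<div>top<p>mid<b><i>deep</i></b></p></div>');

$found = array();
walkTextNodes($doc->documentElement, function ($parent, $text) use (&$found) {
    $found[] = $parent->nodeName . ':' . $text->nodeValue;
});
print_r($found);
```

The depth of the text node no longer matters here; the walk reaches "deep" inside <i> the same way it reaches "top" directly under <div>.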


Re: [PHP] Using DOM textContent Property

2008-09-03 Thread php

Tim Gustafson wrote:

> Hello,
>
> I am writing a filter in PHP that takes some HTML as input and goes through
> the HTML and adjusts certain tag attributes as needed.  So, for example, if
> an <a> tag is missing the "title" attribute, this filter adds a "title"
> attribute to the <a> tag.
>
> I'm doing this all using PHP 5 and the DOM parsing library, and it's working
> really well.
>
> The one snafu I'm running in to is dealing with users who will just type an
> e-mail address into an HTML document without actually making it a link - so,
> they'll just put [EMAIL PROTECTED] rather than
> <a href="mailto:[EMAIL PROTECTED]">[EMAIL PROTECTED]</a>.  I'd like for these
> incorrectly entered e-mail addresses to magically change into real clickable
> links, so I'd like my filter to be able to grab those plain text e-mail
> addresses and convert them to actual clickable links.
>
> I tried iterating through all the elements on a page using something like
> this:
>
> $Elements = $HTML->getElementsByTagName("*");
>
> for ($X = 0; $X < $Elements->length; $X++) {
>   ... SNIP ...
> }



I think you might be better off using regexp on the text *before* 
sending it through the DOM parser. Send the user's text through a 
function that searches for URLs and email addresses, creating proper 
links as they're found, then use the output from that to move on to your 
DOM stuff. That way, you need not create new nodes in your nodelist.
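
A rough sketch of that regex pre-pass, for comparison. linkBareEmails() is a hypothetical name and the pattern is a loose illustration, not a complete address matcher:

```php
<?php
// Hypothetical pre-pass: wrap bare email-like strings in mailto links
// before the text is handed to the DOM parser.
function linkBareEmails($text)
{
    // Loose, illustrative pattern (an assumption, not a full validator).
    $pattern = '/\b([A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,})\b/i';
    return preg_replace($pattern, '<a href="mailto:$1">$1</a>', $text);
}

$linked = linkBareEmails('Write to someone@example.com today.');
echo $linked, PHP_EOL;
```

One caveat with this approach: a pattern this simple cannot tell a bare address from one that already sits inside an <a> element or an href attribute, so already-linked addresses would get wrapped a second time. That is the trade-off against doing the work on DOMText nodes.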






RE: [PHP] Using DOM textContent Property

2008-09-03 Thread Tim Gustafson
> I think you might be better off using regexp on the text
> *before* sending it through the DOM parser. Send the
> user's text through a function that searches for URLs
> and email addresses, creating proper links as they're
> found, then use the output from that to move on to your
> DOM stuff. That way, you need not create new nodes in
> your nodelist.

I think that's the way I'm going to have to go, but I was really hoping not
to.  Thanks for the suggestion!

Tim Gustafson
SOE Webmaster
UC Santa Cruz
[EMAIL PROTECTED]
831-459-5354







Re: [PHP] Using DOM textContent Property

2008-09-03 Thread Nathan Nobbe
On Wed, Sep 3, 2008 at 10:03 AM, Tim Gustafson [EMAIL PROTECTED] wrote:

> > I think you might be better off using regexp on the text
> > *before* sending it through the DOM parser. Send the
> > user's text through a function that searches for URLs
> > and email addresses, creating proper links as they're
> > found, then use the output from that to move on to your
> > DOM stuff. That way, you need not create new nodes in
> > your nodelist.
>
> I think that's the way I'm going to have to go, but I was really hoping not
> to.  Thanks for the suggestion!


i think i have what youre looking for Tim, take a look at this script output

[EMAIL PROTECTED] ~ $ php testDom.php
IN:
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>Test<br/><h2>[EMAIL PROTECTED]<a name="bar">stuff inside
the link</a>Foo</h2><p>care</p><p>yoyser</p></body></html>

OUT:
<?xml version="1.0" standalone="yes"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>Test<br/><h2><a href="mailto:[EMAIL PROTECTED]">[EMAIL PROTECTED]</a><a
name="bar">stuff inside the
link</a>Foo</h2><p>care</p><p>yoyser</p></body></html>

and heres the code using the DOM extension
you may have to tweak it to suit your needs, but currently i think it does
the trick ;)

<?php
$doc = new DOMDocument();
$doc->loadHTML(
    '<html><body>Test<br><h2>[EMAIL PROTECTED]<a name="bar">stuff inside the link</a>Foo</h2>'
    . '<p>care</p><p>yoyser</p></body></html>'
);
echo 'IN:' . PHP_EOL . $doc->saveXML() . PHP_EOL;
findTextNodes($doc->getElementsByTagName('*'), 'convertToLinkIfNecc');
echo 'OUT: ' . PHP_EOL . $doc->saveXML() . PHP_EOL;

/**
 * run through a DOMNodeList, looking for text nodes.  apply a callback to
 * all such text nodes that are encountered
 */
function findTextNodes(DOMNodeList $nodesToSearch, $callback) {
    foreach($nodesToSearch as $curNode) {
        if($curNode->hasChildNodes())
            foreach($curNode->childNodes as $curChild)
                if($curChild instanceof DOMText)
                    #echo 'TEXT NODE FOUND: ' . $curChild->nodeValue . PHP_EOL;
                    /// todo: allow use of hook here
                    call_user_func($callback, $curNode, $curChild);
    }
}

/**
 * determine if a node should be modified, by checking to see if a child is a
 * text node and the text looks like an email address.
 * call a subordinate function to convert the text node into a mailto anchor
 * DOMElement
 */
function convertToLinkIfNecc(DomElement $textContainer, DOMText $textNode) {
    if( (strtolower($textContainer->nodeName) != 'a') &&
        (filter_var($textNode->nodeValue, FILTER_VALIDATE_EMAIL) !== false) ) {
        convertMailtoToAnchor($textContainer, $textNode);
    }
}

/**
 * modify a DOMElement that has a DOMText node as a child; create a
 * DOMElement that represents an <a> tag, and set the value and href
 * attribute, so that it acts as a 'mailto' link
 */
function convertMailtoToAnchor(DomElement $textContainer, DOMText $textNode)
{
    $newNode = new DomElement('a', $textNode->nodeValue);
    $textContainer->replaceChild($newNode, $textNode);
    $newNode->setAttribute('href', "mailto:{$textNode->nodeValue}");
}


-nathan


[PHP] Using DOM textContent Property

2008-09-02 Thread Tim Gustafson
Hello,

I am writing a filter in PHP that takes some HTML as input and goes through
the HTML and adjusts certain tag attributes as needed.  So, for example, if
an <a> tag is missing the "title" attribute, this filter adds a "title"
attribute to the <a> tag.

I'm doing this all using PHP 5 and the DOM parsing library, and it's working
really well.

The one snafu I'm running in to is dealing with users who will just type an
e-mail address into an HTML document without actually making it a link - so,
they'll just put [EMAIL PROTECTED] rather than
<a href="mailto:[EMAIL PROTECTED]">[EMAIL PROTECTED]</a>.  I'd like for these
incorrectly entered e-mail addresses to magically change into real clickable
links, so I'd like my filter to be able to grab those plain text e-mail
addresses and convert them to actual clickable links.

I tried iterating through all the elements on a page using something like
this:

$Elements = $HTML->getElementsByTagName("*");

for ($X = 0; $X < $Elements->length; $X++) {
  ... SNIP ...
}

And then I tried looking at the textContent property of each node, but it
seems that higher-level nodes include all the text of their children nodes
(which is what the DOM documents say it should).  But there doesn't appear
to be any way to know if the textContent you've got is for just one node, or
for a whole bunch of nodes.  Is there any way to figure that out, so that I
can adjust the textContent property of just the lowest-level nodes, rather
than mucking up the higher-level ones?

Tim Gustafson
SOE Webmaster
UC Santa Cruz
[EMAIL PROTECTED]
831-459-5354
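
To make the textContent behaviour concrete, here is a small sketch comparing an element's textContent with the text held only in its direct DOMText children. ownText() is a hypothetical helper, and the document is a made-up three-level example:

```php
<?php
// Hypothetical helper: concatenate only the direct DOMText children of an element.
function ownText(DOMElement $element)
{
    $text = '';
    foreach ($element->childNodes as $child) {
        if ($child instanceof DOMText) {
            $text .= $child->data;
        }
    }
    return $text;
}

$doc = new DOMDocument();
$doc->loadXML('<div>a<p>b<span>c</span></p></div>');

// textContent includes descendant text; ownText() stops at the element's own level.
foreach ($doc->getElementsByTagName('*') as $element) {
    printf("%-4s textContent=%-3s ownText=%s\n",
        $element->nodeName, $element->textContent, ownText($element));
}
```

So textContent alone cannot tell you which level a piece of text belongs to, but the DOMText children of each element can.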






Re: [PHP] Using DOM textContent Property

2008-09-02 Thread Nathan Nobbe
On Tue, Sep 2, 2008 at 3:18 PM, Tim Gustafson [EMAIL PROTECTED] wrote:

> And then I tried looking at the textContent property of each node, but it
> seems that higher-level nodes include all the text of their children nodes
> (which is what the DOM documents say it should).  But there doesn't appear
> to be any way to know if the textContent you've got is for just one node,
> or for a whole bunch of nodes.  Is there any way to figure that out, so that I
> can adjust the textContent property of just the lowest-level nodes, rather
> than mucking up the higher-level ones?


if a node has children, then its not a leaf, so i imagine you could continue
to traverse until you reach the leaf that actually has the address needing
magical conversion..

also, for a performance increase, if you dont find a match at a high level,
you could skip that entire sub-section of the tree; no need to go down to a
leaf if you know theres no magic needed for the current branch :)

-nathan


RE: [PHP] Using DOM textContent Property

2008-09-02 Thread Tim Gustafson
> if a node has children, then its not a leaf, so i imagine
> you could continue to traverse until you reach the leaf
> that actually has the address needing magical conversion.

I tried that.  $Element->hasChildNodes() returns true for just about
everything except tags like <br> and <img> that have no corresponding </br>
or </img> because the content that appears between <p> and </p>, for
example, apparently counts as a child node, even though it's not an HTML
tag.  So, if you have:

<p>Foo!</p>

when you look at $Element->hasChildNodes() for the <p> tag, you will get
true, and $Element->childNodes->length is equal to 1, even though "Foo!"
isn't an HTML tag.  Interestingly though, when you iterate through the tree,
you get the <p> tag as one of the elements, but you never get a text-only
element that has that <p> as a parentNode.  In fact, get_class($Element)
always returns "DOMElement", even on the text-only nodes, which I would have
expected to be DOMText elements...but I guess not.  So I'm wondering why
$Element->hasChildNodes() would return true, but iterating through the DOM
tree returns no elements that have that $Element as a parentNode.

What's more, looking at $Element->childNodes->length isn't too helpful,
because, for example:

<h2><a name="bar"></a>Foo</h2>

returns two child nodes, neither of which has "Foo" for its textContent.

Tim Gustafson
SOE Webmaster
UC Santa Cruz
[EMAIL PROTECTED]
831-459-5354
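
A quick way to check what those two child nodes are is to dump their class, name and textContent. This is only a sketch: it uses loadXML() on the bare fragment, which may not match what loadHTML() builds from a full page.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML('<h2><a name="bar"></a>Foo</h2>');
$h2 = $doc->documentElement;

// List each direct child of <h2> with its class, node name and text.
foreach ($h2->childNodes as $child) {
    printf("%s %s \"%s\"\n", get_class($child), $child->nodeName, $child->textContent);
}
```

In this reduced case the second child comes back as a DOMText node named #text whose textContent is "Foo", so the text is reachable through childNodes even though getElementsByTagName() never returns it.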


