ID:               49984
 User updated by:  ppass at hotmail dot fr
 Reported By:      ppass at hotmail dot fr
 Status:           Bogus
 Bug Type:         DOM XML related
 Operating System: Linux ns1 2.6.28.4-rsbac
 PHP Version:      5.2.11
 New Comment:

This is still an open topic for me, since there seems to be no easy
way to implement the libxml2 team's suggestion in PHP (adding the
HTML_PARSE_RECOVER option when creating the parsing context).

Can this be done from PHP, and if so, how?
Please advise; otherwise the issue remains open.


Previous Comments:
------------------------------------------------------------------------

[2009-11-02 17:45:51] ppass at hotmail dot fr

The reply from the libxml2 team is to try adding the HTML_PARSE_RECOVER
option when creating the parsing context.

I have no idea what that means. Does anybody know how this can be done
from PHP code?

------------------------------------------------------------------------

[2009-11-02 13:46:20] ppass at hotmail dot fr

Thank you for the details. I just filed a bug in their system.

------------------------------------------------------------------------

[2009-11-02 06:53:03] ras...@php.net

We didn't write the DOM implementation.  We are simply using libxml2. 
Information on how to file a bug against libxml2 is here:
http://xmlsoft.org/bugs.html

But I suspect they won't consider this a bug.  Their relaxed html
parser isn't a full html parser that knows about embedded script
objects.

This would only be a PHP bug if we are somehow calling libxml2
incorrectly causing this, but it doesn't appear to be the case here.

------------------------------------------------------------------------

[2009-11-02 05:42:09] ppass at hotmail dot fr

Still no reaction to this bug. Maybe my previous title was too
specific. More generally speaking, it means that the DOM model is broken
in PHP whenever a script tag contains other tags in its text.

This is a serious bug that must be corrected asap; otherwise it is not
possible to make reliable use of DOM.

------------------------------------------------------------------------

[2009-10-24 04:27:57] ppass at hotmail dot fr

Description:
------------
The script node's parent is a div.
The script node has the text '</div>' inside its script.

The DOM node returns only partial contents of the script node, as if
the node was mistakenly truncated when reaching the '</div>' text.

Reproduce code:
---------------
    // Markup from the report: a <script> inside a <div>, whose text
    // contains the literal string '</div>'.
    $html = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'
          . '<html><head><meta http-equiv="content-type" '
          . 'content="text/html; charset=utf-8"><title>Title</title></head>'
          . '<body><div><script type="text/javascript" id="script1">'
          . 'function dummy { object.innerHTML="<div>text</div>"; } '
          . 'function dummy2 { alert("hello"); } '
          . '</script> </div> </body> </html>';

    $dom = new DOMDocument('1.0', 'UTF-8');
    @$dom->loadHTML($html);  // @ silences libxml2 parser warnings

    $script_node = $dom->getElementById('script1');
    echo "<![CDATA[$script_node->nodeValue]]>";


Expected result:
----------------
function dummy { object.innerHTML="<div>text</div>"; } function dummy2
{ alert("hello"); } 

I expect to see the whole content of the script node.

Actual result:
--------------
function dummy { object.innerHTML="<div>text

The script node has been truncated.
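
For anyone trying to reproduce this, here is a minimal sketch that runs
the same markup but collects the libxml2 parser messages instead of
silencing them with @, and prints the libxml2 version in use. It only
makes visible what the parser complains about and does not fix the
truncation. Whether the script content comes back truncated is assumed
to depend on the libxml2 build PHP is linked against, so no particular
output is claimed here.

```php
<?php
// Same markup as in the report, split across lines for readability.
$html = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">'
      . '<html><head><meta http-equiv="content-type" '
      . 'content="text/html; charset=utf-8"><title>Title</title></head>'
      . '<body><div><script type="text/javascript" id="script1">'
      . 'function dummy { object.innerHTML="<div>text</div>"; } '
      . 'function dummy2 { alert("hello"); } '
      . '</script> </div> </body> </html>';

// Buffer libxml2 diagnostics so they can be inspected afterwards.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($html);

$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);  // restore the earlier setting

// Each entry is a LibXMLError carrying line, column and message.
foreach ($errors as $error) {
    printf("line %d, col %d: %s\n",
           $error->line, $error->column, trim($error->message));
}

$script_node = $dom->getElementById('script1');
echo 'libxml2 version: ', LIBXML_DOTTED_VERSION, "\n";
echo 'script content:  ', $script_node->nodeValue, "\n";
```

Attaching the printed messages and the version line to the libxml2
report should make it easier to compare results across systems.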



------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=49984&edit=1
