ID:               41980
 User updated by:  borys dot forytarz at gmail dot com
 Reported By:      borys dot forytarz at gmail dot com
-Status:           Feedback
+Status:           Open
 Bug Type:         DOM XML related
 Operating System: Linux
 PHP Version:      5.2.0
 New Comment:

I have checked the encodings of the files.

mb_detect_encoding() reports that they are ASCII-encoded (!?), so I
wrote a simple script to convert them to UTF-8:

$cont = file_get_contents('login.php.tpl');
$f = fopen('login.php.tpl','w');
echo "\n".mb_detect_encoding('login.php.tpl').' > ';
echo mb_detect_encoding('login.php.tpl')."\n";

The output is: ASCII > ASCII (I expected ASCII > UTF-8).

Using iconv() instead of mb_convert_encoding() gives the same result.

What's going on?
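
In the snippet as quoted, mb_detect_encoding() is given the literal
string 'login.php.tpl' rather than $cont, so it reports on the file name,
which is plain ASCII. A minimal sketch of checking string contents
instead follows. The sample strings are hypothetical stand-ins (the real
file is not shown), and it assumes the mbstring extension is loaded:

```php
<?php
// Hypothetical sample data; the real login.php.tpl is not shown in the report.
$fileName = 'login.php.tpl';
$ascii    = 'plain ascii text';
$polish   = "za\xC5\xBC\xC3\xB3\xC5\x82\xC4\x87"; // "zażółć" as UTF-8 bytes

$candidates = array('ASCII', 'UTF-8');

// Detection runs on the string passed in, so a file name only tells you
// about the name. Pass the contents to learn about the file.
$nameDetected   = mb_detect_encoding($fileName, $candidates, true);
$asciiDetected  = mb_detect_encoding($ascii, $candidates, true);
$polishDetected = mb_detect_encoding($polish, $candidates, true);

// ASCII is a subset of UTF-8, so converting pure ASCII leaves the bytes as is.
$converted = mb_convert_encoding($ascii, 'UTF-8', 'ASCII');
$unchanged = ($converted === $ascii);

echo $nameDetected, ' / ', $asciiDetected, ' / ', $polishDetected, "\n";
```

If the contents really are pure ASCII, a conversion to UTF-8 cannot
change them, so seeing "ASCII" both before and after is not by itself a
sign that the conversion failed.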

Previous Comments:

[2007-07-12 20:38:33] [EMAIL PROTECTED]

Please try using this CVS snapshot:
For Windows (zip):

For Windows (installer):


[2007-07-12 19:58:58] borys dot forytarz at gmail dot com

Correction: the inner loop should read:

foreach($content->childNodes as $child) {



[2007-07-12 19:55:58] borys dot forytarz at gmail dot com

Here is an example:

First, the source files (both encoded in UTF-8):

First file (main.tpl):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"

        <title>Some title</title>
        <meta http-equiv="content-type" content="text/html; charset=utf-8" />

Some Polish letters: ę ó ą ś ć ż ź ń
- they are encoded correctly and display correctly.

Second file (contents.tpl):

<h1>some Polish letters, like: ę ó ł ą ś ć
ź ń ż - they are not encoded correctly and do not
display correctly.</h1>

PHP file:
$dom = new DOMDocument('1.0','UTF-8');

$dom2 = new DOMDocument('1.0','UTF-8');

$contents = $dom2->getElementsByTagName('content');
$body = $dom->getElementsByTagName('body')->item(0);

foreach($contents as $content) {
    foreach($content as $child) {
        $imp = $dom->importNode($child,true);


It is roughly like the above. I wrote it from memory because the real
script is huge, but it demonstrates the idea and what goes wrong.
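
For reference, a self-contained sketch of the import described above.
It assumes both inputs are parsed with loadXML(); the inline strings are
hypothetical stand-ins for main.tpl and contents.tpl, and the loading
calls in the real script may differ:

```php
<?php
// Hypothetical stand-ins for main.tpl and contents.tpl (this source is UTF-8).
$mainXml = '<?xml version="1.0" encoding="UTF-8"?>'
    . '<html><head><title>Some title</title></head><body/></html>';
$contentXml = '<content id="something"><h1>ę ó ł ą ś ć ź ń ż</h1></content>';

$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadXML($mainXml);

$dom2 = new DOMDocument('1.0', 'UTF-8');
$dom2->loadXML($contentXml);

$body = $dom->getElementsByTagName('body')->item(0);

foreach ($dom2->getElementsByTagName('content') as $content) {
    foreach ($content->childNodes as $child) {
        // importNode() returns a copy owned by $dom; the copy still has
        // to be appended to become part of the tree.
        $body->appendChild($dom->importNode($child, true));
    }
}

$out = $dom->saveXML();
echo $out;
```

If this variant prints the letters correctly while the real script does
not, the difference is more likely in how the second file is read or
parsed than in importNode() itself.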


[2007-07-12 19:24:45] borys dot forytarz at gmail dot com

There is a problem with DOM and encoding. I have two separate files.
The first is a complete XHTML document (DTD, head, meta, body and more
content) saved in UTF-8; its meta declaration says UTF-8, and the server
sends the output as UTF-8 too. The second is a simple file without any
DTD, head, meta or body, also saved in UTF-8. When I import nodes from
the second file using importNode(), the output contains incorrectly
encoded characters (the ones that came from the second file). This is
strange because, as far as I have read, DOM works in UTF-8 internally,
so there should be no such problem.

What is more, I inspected properties such as actualEncoding, and they
report UTF-8...

If it is not a bug (though I think it is), how can I fix it? I can't add
a DTD, head and body elements to the second file.
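
A small sketch for inspecting what each document reports after loading.
It assumes loadXML() is used; the inline strings are hypothetical
stand-ins for the two files, and the variable names are illustrative
only:

```php
<?php
// Hypothetical stand-ins: one input with an XML declaration, one without.
$withDecl = new DOMDocument('1.0', 'UTF-8');
$withDecl->loadXML('<?xml version="1.0" encoding="UTF-8"?><h1>ę</h1>');

$noDecl = new DOMDocument('1.0', 'UTF-8');
$noDecl->loadXML('<content><h1>ę</h1></content>');

// Inspect the property after loading; it describes the loaded document.
var_dump($withDecl->encoding);
var_dump($noDecl->encoding);

// Text read back from the tree is a UTF-8 string in both cases.
$text = $noDecl->getElementsByTagName('h1')->item(0)->textContent;
var_dump($text === 'ę');

// Setting the property after loading controls how saveXML() serializes.
$noDecl->encoding = 'UTF-8';
$xml = $noDecl->saveXML();
echo $xml;
```

Checking the property on both documents after the load call, rather
than right after construction, shows what the serializer will actually
use.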

Reproduce code:
$this->dom = new DOMDocument('1.0','UTF-8');
$this->dom->encoding = 'UTF-8';

$this->dom->formatOutput = self::$formatOutput;
$this->dom->preserveWhiteSpace = self::$preserveWhiteSpace;


echo $this->dom->saveXML();

The above works well for the complete XHTML file. But when I load an
incomplete file (encoded in UTF-8) and import nodes from that second
document into the first one, the characters are not encoded properly.

I tried converting the whole output with iconv() and
mb_convert_encoding(), but it makes no difference at all.

Expected result:
Properly encoded characters from both the complete XHTML file and the
second, "poor" file. The second file looks like this:

<content id="something">
   <h1>some string</h1>
</content>

Actual result:
Incorrectly encoded characters for everything inside the <content> tag.

