From:             
Operating system: Windows WAMP + LAMP(?)
PHP version:      5.3.2
Package:          DOM XML related
Bug Type:         Bug
Bug description:DOMDocument::load() UTF-8 limitation

Description:
------------
In my tests, DOMDocument::load() only loads UTF-8 encoded files: a file
that contains non-ASCII characters in another encoding (e.g. ISO-8859-1)
is rejected.

Example: 'article.php' contains:

$xmlDoc = new DOMDocument();
$page = 'article.xsl';
$xmlDoc->load($page);
$xmlDoc->load('cours.xml');

Suppose 'article.xsl' contains '... Précédent ...' (i.e. non-ASCII
characters).

If 'article.xsl' is ISO-8859-1 encoded, the following error appears (the
same happens if 'cours.xml' is ISO-8859-1 encoded):

"DOMDocument::load() [domdocument.load]: Input is not proper UTF-8,
indicate encoding ! Bytes: 0xE9 0x62 0x75 0x74 in
file:///C:/wamp/www/xsl2/article.xsl, line: 71 in
C:\wamp\www\xsl2\article.php on line 13"



So both 'cours.xml' and 'article.xsl' have to be UTF-8 encoded.

Of course, $page = utf8_encode($page); is of no use, because utf8_encode()
only operates on the string 'article.xsl' (the file name), not on the
content of the file.
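
To make the difference concrete, here is a minimal sketch. It assumes the
mbstring extension is available and uses an in-memory string as a stand-in
for the content of 'article.xsl'; the variable names other than $page are
made up for this illustration.

```php
<?php
$page = 'article.xsl';

// The file name is pure ASCII, so converting it changes nothing.
$convertedName = mb_convert_encoding($page, 'UTF-8', 'ISO-8859-1');
var_dump($convertedName === $page);   // bool(true)

// What has to be converted is the content. Stand-in for a fragment of
// the file: 'Précédent' as ISO-8859-1 bytes (0xE9 is 'é').
$latin1Content = "Pr\xE9c\xE9dent";
$utf8Content = mb_convert_encoding($latin1Content, 'UTF-8', 'ISO-8859-1');

// In UTF-8 each 'é' becomes the two bytes 0xC3 0xA9.
var_dump(bin2hex($utf8Content));      // string(22) "5072c3a963c3a964656e74"
```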



CONCLUSION: This is not really a bug in the ->load() function.

But it would be very useful to have an additional optional parameter
indicating the encoding of the incoming file:

----- Desired improvement ----------->

Add an optional parameter describing the actual encoding of $file:

DOMDocument::load( string $file [, string $encoding])

For example:

$xmlDoc->load($page, 'iso-8859-1');

The optional $encoding parameter would describe the actual encoding of
$file (if not UTF-8).

----------- END ----------------------
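
Until such a parameter exists, something close to it can be approximated in
user code. The sketch below is only an illustration: loadXmlWithEncoding()
is a hypothetical helper, not part of the DOM extension. It assumes that
mbstring is available and that the file either has no XML declaration or
declares UTF-8 (a declaration naming another encoding would conflict with
the converted bytes).

```php
<?php
// Hypothetical helper emulating the proposed load($file, $encoding).
function loadXmlWithEncoding($file, $encoding = 'UTF-8')
{
    $raw = file_get_contents($file);
    if ($raw === false) {
        return false;
    }

    // Convert the file content (not the file name) to UTF-8.
    $utf8 = mb_convert_encoding($raw, 'UTF-8', $encoding);

    $doc = new DOMDocument();
    // loadXML() parses a string instead of reading a file.
    return $doc->loadXML($utf8) ? $doc : false;
}

// Usage, with the file names from this report:
// $xmlDoc = loadXmlWithEncoding('cours.xml', 'iso-8859-1');
```

Because the document is parsed from a string here, it has no file path of
its own, so relative references inside it may not resolve the way they do
with load().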

Test script:
---------------
[test.php]

<?php
$xmlDoc = new DOMDocument();
$xmlDoc->load("cours.xml");
?>

[cours.xml] (no matter the line encoding; the problem is caused by the
'é' in 'éclair', stored as the single byte 0xE9)

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <chapitre titre="Titre du chapitre 1">
    <partie titre="Titre de la partie 1">
      Texte éclair
    </partie>
  </chapitre>
</root>

(displays):

Warning: DOMDocument::load() [domdocument.load]: Input is not proper UTF-8,
indicate encoding ! Bytes: 0xE9 0x63 0x6C 0x61 in
file:///C:/wamp/www/xsl2/cours.xml, line: 5 in C:\wamp\www\xsl2\test.php on
line 3
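
The "indicate encoding" hint in this warning refers to the XML declaration:
libxml2, the parser behind DOMDocument, decodes a file according to the
encoding named there. Below is a minimal sketch of an ISO-8859-1 file that
declares its real encoding. It uses a temporary file and made-up content in
place of 'cours.xml'.

```php
<?php
// Write a small ISO-8859-1 file whose declaration names its real encoding.
$file = tempnam(sys_get_temp_dir(), 'xml');
$latin1 = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>\n"
        . "<root><partie>Texte \xE9clair</partie></root>";
file_put_contents($file, $latin1);

$xmlDoc = new DOMDocument();
// libxml2 reads the declared encoding and decodes the bytes accordingly.
$loaded = $xmlDoc->load($file);

// DOM hands strings back as UTF-8, so 'é' is now the two bytes 0xC3 0xA9.
$text = $xmlDoc->documentElement->textContent;

unlink($file);
```

In other words, changing the first line of 'cours.xml' to
encoding="ISO-8859-1" should be an alternative to re-saving the file as
UTF-8, provided the bytes really are ISO-8859-1.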


-- 
Edit bug report at http://bugs.php.net/bug.php?id=51325&edit=1