From:             saramaca at libertysurf dot fr
Operating system: Windows XP
PHP version:      5.1.0
PHP Bug Type:     XML related
Bug description:  xml_parse_into_struct() chokes on the UTF-8 BOM

Description:
------------
In PHP 4, xml_parse_into_struct() can parse a UTF-8-encoded XML file with or
without a UTF-8 BOM (\xEF\xBB\xBF). In PHP 5 this is no longer the case: it
raises an error saying the string doesn't contain any XML data ("Empty
document").

Additionally, PHP 5's xml_parse_into_struct() does *NOT* place default
attribute values into the struct. For example, despite the DTD provided,
$content[1]['attributes']['type'] isn't set to "literal" in the actual result
below; compare it with the expected result.

This used to work under PHP 4.1.x and above (though the parser there is based
on expat, AFAIK).

PS: I guess that "manually" stripping this magic number, if present, before
calling the function would yield the expected result. However, I found an
acceptable workaround that seems to work equally well across versions 4 and 5
of PHP:

<?php
...
$parser = xml_parser_create('');
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $encoding);
...
?>

Rather than:

<?php
...
$parser = xml_parser_create($encoding);
...
?>

Reproduce code:
---------------
http://www.diptyque.net/bugs/utf8_bom.php  ; running PHP 4 --> outputs expected result
http://www.diptyque.net/bugs/utf8_bom.phps ; source code

Expected result:
----------------
w/ autodetect -->
Array
(
    [0] => Array
        (
            [tag] => bundle
            [type] => open
            [level] => 1
            [value] =>
        )

    [1] => Array
        (
            [tag] => resource
            [type] => complete
            [level] => 2
            [attributes] => Array
                (
                    [key] => rSeeYou
                    [type] => literal
                )

            [value] => A bient&244;t
        )

    [2] => Array
        (
            [tag] => bundle
            [value] =>
            [type] => cdata
            [level] => 1
        )

    [3] => Array
        (
            [tag] => bundle
            [type] => close
            [level] => 1
        )

)

w/o autodetect -->
Array
(
    [0] => Array
        (
            [tag] => bundle
            [type] => open
            [level] => 1
            [value] =>
        )

    [1] => Array
        (
            [tag] => resource
            [type] => complete
            [level] => 2
            [attributes] => Array
                (
                    [key] => rSeeYou
                    [type] => literal
                )

            [value] => A bient&244;t
        )

    [2] => Array
        (
            [tag] => bundle
            [value] =>
            [type] => cdata
            [level] => 1
        )

    [3] => Array
        (
            [tag] => bundle
            [type] => close
            [level] => 1
        )

)

Actual result:
--------------
w/ autodetect -->
Array
(
    [0] => Array
        (
            [tag] => bundle
            [type] => open
            [level] => 1
            [value] =>
        )

    [1] => Array
        (
            [tag] => resource
            [type] => complete
            [level] => 2
            [attributes] => Array
                (
                    [key] => rSeeYou
                )

            [value] => A bient&244;t
        )

    [2] => Array
        (
            [tag] => bundle
            [value] =>
            [type] => cdata
            [level] => 1
        )

    [3] => Array
        (
            [tag] => bundle
            [type] => close
            [level] => 1
        )

)

w/o autodetect -->
Empty document

--
Edit bug report at http://bugs.php.net/?id=35447&edit=1
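
The manual stripping mentioned in the PS could be sketched roughly as follows. This is only a minimal sketch, assuming the XML document is already in memory as a string and is UTF-8 encoded. The names strip_utf8_bom() and parse_utf8_xml() are hypothetical helpers, not part of the reproduce script, and turning case folding off is an assumption based on the lowercase tag names in the output above.

```php
<?php
// Hypothetical helper (not from the reproduce script): remove a leading
// UTF-8 BOM (\xEF\xBB\xBF) if one is present, otherwise return the input as is.
function strip_utf8_bom($data)
{
    if (strncmp($data, "\xEF\xBB\xBF", 3) === 0) {
        // The cast covers older PHP versions, where substr() returns false
        // when the string consists of the BOM only.
        return (string) substr($data, 3);
    }
    return $data;
}

// Hypothetical wrapper: strip the BOM, then parse with an explicit encoding,
// i.e. the xml_parser_create($encoding) call form from the report.
function parse_utf8_xml($data)
{
    $parser = xml_parser_create('UTF-8');
    // Assumption: case folding is disabled, since the tags in the report's
    // output are lowercase.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    $values = array();
    $index  = array();
    $ok = xml_parse_into_struct($parser, strip_utf8_bom($data), $values, $index);

    // Return the struct on success, false if the parser reported an error.
    return $ok ? $values : false;
}
```

A call such as parse_utf8_xml(file_get_contents($path)) would then hand the parser a string that starts at the XML declaration. Whether this avoids the "Empty document" error on 5.1.0 has not been verified here, and it does nothing about the missing default attribute values.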