From:             saramaca at libertysurf dot fr
Operating system: Windows XP
PHP version:      5.1.0
PHP Bug Type:     XML related
Bug description:  xml_parse_into_struct() chokes on the UTF-8 BOM

Description:
------------
In PHP 4, xml_parse_into_struct() can parse a UTF-8-encoded XML file with
or without a UTF-8 BOM (\xEF\xBB\xBF). In PHP 5 this is no longer the
case: when the BOM is present, it raises an error saying the string
doesn't contain any XML data (Empty document).

Additionally, PHP 5's xml_parse_into_struct() does *NOT* place default
attribute values into the struct. For example, despite the DTD provided,
$content[1]['attributes']['type'] isn't set to "literal" in the actual
result section below; compare it with the expected result. This used to
work under PHP 4.1.x and above (but that parser is based on expat, AFAIK).
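
To make the default-attribute point concrete, here is a minimal sketch
of such a check. The inline DTD and document are assumptions made up
for illustration (the real test file is the one linked under
"Reproduce code"), and $resource / $hasDefault are hypothetical names.

```php
<?php
// Hypothetical test document: the DTD declares a default value
// ("literal") for the "type" attribute of <resource>.
$xml = '<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE bundle [
  <!ELEMENT bundle (resource*)>
  <!ELEMENT resource (#PCDATA)>
  <!ATTLIST resource
    key  CDATA #REQUIRED
    type CDATA "literal">
]>
<bundle>
  <resource key="rSeeYou">See you</resource>
</bundle>';

$parser = xml_parser_create('');
// Keep tag and attribute names in their original case.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_parse_into_struct($parser, $xml, $values);

// Locate the <resource> element and see whether the default was filled in.
$resource = null;
foreach ($values as $node) {
    if ($node['tag'] === 'resource') {
        $resource = $node;
    }
}
$hasDefault = isset($resource['attributes']['type']);

echo $hasDefault ? "default applied\n" : "default missing\n";
```

Which of the two lines gets printed depends on the PHP version and the
underlying XML library, which is the point of this report.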

PS: I guess "manually" stripping the BOM, if present, before calling the
function would yield the expected result. However, I found an acceptable
workaround that seems to work equally well across versions 4 and 5 of
PHP:

<?php
...
$parser = xml_parser_create('');
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, $encoding);
...
?>

Rather than:

<?php
...
$parser = xml_parser_create($encoding);
...
?>
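
For completeness, the "manual" BOM stripping mentioned in the PS could
look roughly like the sketch below. strip_utf8_bom() is a hypothetical
helper name and the sample document is made up for illustration; I have
not verified this against every PHP 4 and PHP 5 release.

```php
<?php
// Hypothetical helper: drop a leading UTF-8 BOM (EF BB BF), if present.
function strip_utf8_bom($data)
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($data, $bom, 3) === 0) {
        // Cast because older PHP versions return false for an empty rest.
        return (string) substr($data, 3);
    }
    return $data;
}

// Made-up sample: a small UTF-8 document preceded by a BOM.
$xml = "\xEF\xBB\xBF"
     . '<?xml version="1.0" encoding="UTF-8"?>'
     . '<bundle><resource key="rSeeYou">x</resource></bundle>';

$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

// Parse the BOM-free string; $ok is 1 on success.
$ok = xml_parse_into_struct($parser, strip_utf8_bom($xml), $values, $index);
```

The helper only touches the first three bytes, so input without a BOM
is passed through unchanged.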

Reproduce code:
---------------
http://www.diptyque.net/bugs/utf8_bom.php
; running PHP 4 --> outputs expected result

http://www.diptyque.net/bugs/utf8_bom.phps
; source code

Expected result:
----------------
w/ autodetect -->
Array
(
    [0] => Array
        (
            [tag] => bundle
            [type] => open
            [level] => 1
            [value] =>

        )

    [1] => Array
        (
            [tag] => resource
            [type] => complete
            [level] => 2
            [attributes] => Array
                (
                    [key] => rSeeYou
                    [type] => literal
                )

            [value] => A bient&#244;t
        )

    [2] => Array
        (
            [tag] => bundle
            [value] =>

            [type] => cdata
            [level] => 1
        )

    [3] => Array
        (
            [tag] => bundle
            [type] => close
            [level] => 1
        )

)
w/o autodetect -->
Array
(
    [0] => Array
        (
            [tag] => bundle
            [type] => open
            [level] => 1
            [value] =>

        )

    [1] => Array
        (
            [tag] => resource
            [type] => complete
            [level] => 2
            [attributes] => Array
                (
                    [key] => rSeeYou
                    [type] => literal
                )

            [value] => A bient&#244;t
        )

    [2] => Array
        (
            [tag] => bundle
            [value] =>

            [type] => cdata
            [level] => 1
        )

    [3] => Array
        (
            [tag] => bundle
            [type] => close
            [level] => 1
        )

)

Actual result:
--------------
w/ autodetect -->
Array
(
    [0] => Array
        (
            [tag] => bundle
            [type] => open
            [level] => 1
            [value] =>

        )

    [1] => Array
        (
            [tag] => resource
            [type] => complete
            [level] => 2
            [attributes] => Array
                (
                    [key] => rSeeYou
                )

            [value] => A bient&#244;t
        )

    [2] => Array
        (
            [tag] => bundle
            [value] =>

            [type] => cdata
            [level] => 1
        )

    [3] => Array
        (
            [tag] => bundle
            [type] => close
            [level] => 1
        )

)
w/o autodetect -->
Empty document

-- 
Edit bug report at http://bugs.php.net/?id=35447&edit=1