ID: 47502
User updated by: grodny at oneclick dot sk
Reported By: grodny at oneclick dot sk
Status: Open
Bug Type: XML related
Operating System: Windows
PHP Version: 5.3CVS-2009-02-25 (snap)
New Comment:
A possible solution would be to introduce a second, optional argument,
extending the current functionality while keeping backward compatibility.
xml_get_current_byte_index (resource $parser [, int $position=0])
position:
If 0, keep the current behaviour for backward compatibility.
If -1, return the index of the first byte of the node
(for a start element or PI this is the '<'; for a text node it is the
first byte of the $data string passed to the handler, etc.).
If 1, return the index just after the last byte of the node
(for a start element or PI, the byte index after the '>'; for a text
node, the index after the last byte of the $data string passed to the
handler).
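To make the proposed values concrete, the sketch below computes them in
userland for a node whose raw source text is known. It is only an
illustration under stated assumptions: proposed_index() is a hypothetical
helper, not part of ext/xml; it is written for PHP 7 or later; and it
locates the node with strpos(), so it assumes the node's raw text occurs
verbatim in the source. Position 0 is rejected because the current
behaviour cannot be derived from the source text alone.

```php
<?php
// Hypothetical userland illustration of the proposed $position values.
// proposed_index() is NOT an ext/xml function; it only mimics the proposal
// for a node whose raw bytes ($rawNode) occur verbatim in $source.
function proposed_index(string $source, string $rawNode, int $position, int $searchFrom = 0): int
{
    $first = strpos($source, $rawNode, $searchFrom);
    if ($first === false) {
        throw new InvalidArgumentException('node text not found in source');
    }
    if ($position === -1) {
        return $first;                     // index of the first byte of the node
    }
    if ($position === 1) {
        return $first + strlen($rawNode);  // index just after the last byte
    }
    throw new InvalidArgumentException('only -1 and 1 are illustrated here');
}

$source = '<R><N>before</N></R>';
echo proposed_index($source, '<N>', -1), PHP_EOL;     // 3
echo proposed_index($source, '<N>', 1), PHP_EOL;      // 6
echo proposed_index($source, 'before', -1), PHP_EOL;  // 6
echo proposed_index($source, 'before', 1), PHP_EOL;   // 12
```

With these semantics a handler could recover the exact source slice of a
node as substr($source, $start, $end - $start), whatever the node type.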
Previous Comments:
------------------------------------------------------------------------
[2009-02-25 15:56:11] grodny at oneclick dot sk
Description:
------------
The byte index returned by xml_get_current_byte_index(), when called in a
character data handler, points to different locations in the XML source
depending on the character data being parsed.
If the parsed string passed as the second argument to the handler starts
with an ASCII non-whitespace character, the byte index points to the
location before the parsed string.
If the parsed string starts with whitespace or with a non-ASCII
(multi-byte UTF-8) character, it points to the location after the parsed
string.
For consistency with the other handlers, it should return the offset of
the location after the parsed string in all cases.
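The two cases can be told apart mechanically by comparing the reported
index with the bytes of the source around it. A minimal sketch, assuming
PHP 7 or later and that the handler's string occurs verbatim in the source
(no entities or CDATA sections); classify_index() is a hypothetical helper
name, and the offsets 6 and 26 are the ones implied by the actual result
in this report:

```php
<?php
// Hypothetical diagnostic helper (not part of ext/xml): reports whether a
// byte index points before or after the character data chunk in $source.
function classify_index(string $source, string $chunk, int $index): string
{
    $len = strlen($chunk);
    // The chunk starts at the index: the index points BEFORE the string.
    if (substr($source, $index, $len) === $chunk) {
        return 'before';
    }
    // The chunk ends at the index: the index points AFTER the string.
    if ($index >= $len && substr($source, $index - $len, $len) === $chunk) {
        return 'after';
    }
    return 'unknown';
}

// Same document as in the report; "\xC2\xA7" is the UTF-8 section sign.
$xml = '<R><N>before</N><N>' . "\xC2\xA7" . 'after</N><N> after</N>before </R>';

echo classify_index($xml, 'before', 6), PHP_EOL;               // before
echo classify_index($xml, "\xC2\xA7" . 'after', 26), PHP_EOL;  // after
```

If a chunk is immediately followed by an identical chunk, both tests can
match; the sketch then answers 'before', so it is a diagnostic aid rather
than a reliable workaround.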
Reproduce code:
---------------
// Test document: text nodes starting with ASCII, a multi-byte UTF-8
// character, and whitespace.
$xml = '<R><N>before</N><N>'
    .html_entity_decode('§', ENT_COMPAT, 'UTF-8')
    .'after</N><N> after</N>before </R>';

// Character data handler: prints the chunk it received and the part of
// the source that follows the reported byte index.
function cdata ($p, $cdata) {
    global $xml;
    $off = xml_get_current_byte_index($p);
    echo 'CDATA: "',
        htmlentities($cdata, ENT_COMPAT, 'UTF-8'), '"', PHP_EOL,
        'AFTER-INDEX: "',
        htmlentities(substr($xml, $off), ENT_COMPAT, 'UTF-8'), '"',
        PHP_EOL;
}

$p = xml_parser_create('UTF-8');
xml_set_character_data_handler($p, 'cdata');
xml_parse($p, $xml, true);
xml_parser_free($p);
Expected result:
----------------
CDATA: "before"
AFTER-INDEX: "</N><N>§after</N><N> after</N>before </R>"
CDATA: "§after"
AFTER-INDEX: "</N><N> after</N>before </R>"
CDATA: " after"
AFTER-INDEX: "</N>before </R>"
CDATA: "before "
AFTER-INDEX: "</R>"
Actual result:
--------------
CDATA: "before"
AFTER-INDEX: "before</N><N>§after</N><N> after</N>before </R>"
CDATA: "§after"
AFTER-INDEX: "</N><N> after</N>before </R>"
CDATA: " after"
AFTER-INDEX: "</N>before </R>"
CDATA: "before "
AFTER-INDEX: "before </R>"
------------------------------------------------------------------------