Edit report at http://bugs.php.net/bug.php?id=47502&edit=1
 ID:               47502
 Updated by:       fel...@php.net
 Reported by:      grodny at oneclick dot sk
 Summary:          xml_get_current_byte_index inside character data handler
                   returns wrong offset
-Status:           Open
+Status:           Assigned
 Type:             Bug
 Package:          XML related
 Operating System: Windows
 PHP Version:      5.3CVS-2009-02-25 (snap)
-Assigned To:
+Assigned To:      rrichards


Previous Comments:
------------------------------------------------------------------------

[2009-02-26 09:13:07] grodny at oneclick dot sk

A possible solution would be to introduce a second optional argument,
which extends the current functionality while keeping backward
compatibility.

xml_get_current_byte_index (resource $parser [, int $position=0])

position:
  If 0, keep the current behaviour for backward compatibility.
  If -1, return the index of the first byte of the node (for a start
  element or PI this is '<'; for a text node it is the first byte of
  the $data string passed to the handler, etc.)
  If 1, return the index after the last byte of the node (for a start
  element or PI the byte index after '>'; for a text node the index
  after the last byte of the $data string passed to the handler).

------------------------------------------------------------------------

[2009-02-25 15:56:11] grodny at oneclick dot sk

Description:
------------
The byte index returned by xml_get_current_byte_index() when called in
the character data handler points to different locations in the XML
source, depending on the character data being parsed.

If the parsed string passed as the second argument to the handler
starts with an ASCII non-white-space character, the byte index points
to the location before the parsed string. If the parsed string starts
with white space or a UTF-8 (multi-byte) character, it points after the
parsed string.

For consistency with the other handlers, it should return the offset of
the location after the parsed string in all cases.

Reproduce code:
---------------
$xml = '<R><N>before</N><N>'
    .html_entity_decode('§', ENT_COMPAT, 'UTF-8')
    .'after</N><N> after</N>before </R>';

function cdata ($p, $cdata) {
    global $xml;
    $off = xml_get_current_byte_index($p);
    echo 'CDATA: "', htmlentities($cdata, ENT_COMPAT, 'UTF-8'), '"', PHP_EOL,
        'AFTER-INDEX: "', htmlentities(substr($xml, $off), ENT_COMPAT, 'UTF-8'), '"', PHP_EOL;
}

$p = xml_parser_create('UTF-8');
xml_set_character_data_handler($p, 'cdata');
xml_parse($p, $xml, true);
xml_parser_free($p);

Expected result:
----------------
CDATA: "before"
AFTER-INDEX: "</N><N>§after</N><N> after</N>before </R>"
CDATA: "§after"
AFTER-INDEX: "</N><N> after</N>before </R>"
CDATA: " after"
AFTER-INDEX: "</N>before </R>"
CDATA: "before "
AFTER-INDEX: "</R>"

Actual result:
--------------
CDATA: "before"
AFTER-INDEX: "before</N><N>§after</N><N> after</N>before </R>"
CDATA: "§after"
AFTER-INDEX: "</N><N> after</N>before </R>"
CDATA: " after"
AFTER-INDEX: "</N>before </R>"
CDATA: "before "
AFTER-INDEX: "before </R>"

------------------------------------------------------------------------
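
Until the inconsistency is resolved, a userland workaround might normalise the reported index inside the handler. The sketch below is only an illustration and not part of the XML extension: offset_after_cdata() is a hypothetical helper name. It assumes that the whole document is available as a string, and that the character data appears verbatim in the source (no entity references, no CDATA sections). It also assumes that a reported index pointing at the start of the data can be detected by comparing the source bytes at that index with $cdata.

```php
<?php
// Hypothetical helper: return the byte offset just after the character
// data, whichever of the two locations the parser reported.
function offset_after_cdata($xml, $reported, $cdata)
{
    $len = strlen($cdata);

    // If the source at the reported index starts with the data itself,
    // the index points before the data, so skip over it.
    if ($len > 0 && substr($xml, $reported, $len) === $cdata) {
        return $reported + $len;
    }

    // Otherwise assume the index already points after the data.
    return $reported;
}
```

In the cdata() handler from the reproduce code, this would be called as offset_after_cdata($xml, xml_get_current_byte_index($p), $cdata). The check is a heuristic: it can misfire when the same text repeats immediately after itself in the source, so it is not a substitute for a fix in the extension.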