ID:               47502
 User updated by:  grodny at oneclick dot sk
 Reported By:      grodny at oneclick dot sk
 Status:           Open
 Bug Type:         XML related
 Operating System: Windows
 PHP Version:      5.3CVS-2009-02-25 (snap)
 New Comment:

A possible solution would be to introduce a second optional argument,
thus enhancing the current functionality while keeping backward
compatibility.

xml_get_current_byte_index (resource $parser [, int $position=0])

position:
  If 0, keep the current behaviour for backward compatibility.
  If -1, return the index of the first byte of the node
    (for a start element or PI that is the '<'; for a text node it is
    the first byte of the $data string passed to the handler, etc.).
  If 1, return the index after the last byte of the node
    (for a start element or PI, the byte index after '>'; for a text
    node, the index after the last byte of the $data string passed to
    the handler).
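
Until such an argument exists, the two behaviours described in the
original report below can be normalised in userland, at least for
character data. The following is only a sketch: cdata_byte_range() is a
hypothetical helper, not part of ext/xml. It assumes PHP 7.1 or later
and that the handler receives the text exactly as it appears in the
source (no entity expansion or encoding conversion). It returns the
offsets that position -1 and position 1 would give for a text node, or
null when neither interpretation matches. If the same text repeats
back to back in the source, both interpretations can match; in that
case the "before" reading is chosen.

```php
<?php
// Hypothetical helper: given the XML source, the value returned by
// xml_get_current_byte_index() inside a character data handler, and
// the $cdata string passed to that handler, return [start, end] byte
// offsets of the text in the source, or null if it cannot be located.
function cdata_byte_range(string $xml, int $off, string $cdata): ?array
{
    $len = strlen($cdata);

    // Index points before the text: the source at $off starts with $cdata.
    if (substr($xml, $off, $len) === $cdata) {
        return [$off, $off + $len];
    }

    // Index points after the text: the bytes just before $off are $cdata.
    if ($off >= $len && substr($xml, $off - $len, $len) === $cdata) {
        return [$off - $len, $off];
    }

    // Neither matches, e.g. the text came from an entity reference.
    return null;
}
```

Inside the handler this would be called as
cdata_byte_range($xml, xml_get_current_byte_index($p), $cdata), with
the null case checked before the offsets are used.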


Previous Comments:
------------------------------------------------------------------------

[2009-02-25 15:56:11] grodny at oneclick dot sk

Description:
------------
The byte index returned by xml_get_current_byte_index(), when called in
a character data handler, points to different locations in the XML
source depending on the character data being parsed.

If the parsed string passed as the second argument to the handler starts
with an ASCII non-whitespace character, the byte index points to the
location before the parsed string.

If the parsed string starts with whitespace or with a multi-byte UTF-8
character, it points to the location after the parsed string.

To stay consistent with the other handlers, it should return the offset
of the location after the parsed string in all cases.


Reproduce code:
---------------
$xml = '<R><N>before</N><N>'
        .html_entity_decode('&sect;', ENT_COMPAT, 'UTF-8')
        .'after</N><N> after</N>before </R>';

function cdata ($p, $cdata) {
  global $xml;

  $off = xml_get_current_byte_index($p);

  echo 'CDATA: "',
    htmlentities($cdata, ENT_COMPAT, 'UTF-8'), '"', PHP_EOL,
    'AFTER-INDEX: "',
    htmlentities(substr($xml, $off), ENT_COMPAT, 'UTF-8'), '"',
    PHP_EOL;
}

$p = xml_parser_create('UTF-8');
xml_set_character_data_handler($p, 'cdata');
xml_parse($p, $xml, true);
xml_parser_free($p);


Expected result:
----------------
CDATA: "before"
AFTER-INDEX: "</N><N>§after</N><N> after</N>before </R>"
CDATA: "§after"
AFTER-INDEX: "</N><N> after</N>before </R>"
CDATA: " after"
AFTER-INDEX: "</N>before </R>"
CDATA: "before "
AFTER-INDEX: "</R>"


Actual result:
--------------
CDATA: "before"
AFTER-INDEX: "before</N><N>§after</N><N> after</N>before </R>"
CDATA: "§after"
AFTER-INDEX: "</N><N> after</N>before </R>"
CDATA: " after"
AFTER-INDEX: "</N>before </R>"
CDATA: "before "
AFTER-INDEX: "before </R>"


------------------------------------------------------------------------

