Edit report at https://bugs.php.net/bug.php?id=63430&edit=1
ID: 63430
Comment by: mail+php at requinix dot net
Reported by: lussenburg_rm at hotmail dot com
Summary: xml data parsing bug
Status: Open
Type: Bug
Package: XML Reader
Operating System: windows 7
PHP Version: Irrelevant
Block user comment: N
Private report: N
New Comment:
Even if the input is "faulty", example code is still important. For all we
know, it's a complex problem you're triggering because of something subtle in
your code.
I can't reproduce it with:
<?php
$xml = <<<XML
<rss>
<channel>
<title>feed title</title>
<description>feed description</description>
<pubDate>Mon, 29 Oct 2012 13:30:00 +0100</pubDate><item>
<title>item title</title>
<description>item description</description>
<link>itemlink</link>
</item>
</channel>
</rss>
XML;
$reader = new XMLReader();
$reader->xml($xml);
// http://www.php.net/manual/en/class.xmlreader.php#88264
function xml2assoc($xml) { /* body removed for brevity; see the manual note linked above */ }
print_r(xml2assoc($reader));
?>
PHP 5.4.3 and libxml 2.7.7
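For comparing the two inputs, here is a minimal sketch that lists the node sequence XMLReader reports for a document. It is not part of the test above: dumpNodes() is a hypothetical helper and both sample strings are made up. When a line break sits between </pubDate> and <item>, the reader reports one extra whitespace-only node at that spot, so a reading loop that assumes a fixed number of read() calls between elements could behave differently on the two inputs.

```php
<?php
// Hypothetical diagnostic helper: returns one line per node that
// XMLReader::read() visits (numeric node type, depth, node name).
function dumpNodes($xml)
{
    $reader = new XMLReader();
    $reader->xml($xml);

    $lines = array();
    while ($reader->read()) {
        $lines[] = sprintf(
            'type=%d depth=%d name=%s',
            $reader->nodeType,
            $reader->depth,
            $reader->name
        );
    }
    $reader->close();

    return $lines;
}

// Made-up inputs: identical except for the newline before <item>.
$joined = "<channel><pubDate>x</pubDate><item><title>t</title></item></channel>";
$spaced = "<channel><pubDate>x</pubDate>\n<item><title>t</title></item></channel>";

$joinedNodes = dumpNodes($joined);
$spacedNodes = dumpNodes($spaced);

print_r($joinedNodes);
print_r($spacedNodes);
```

Diffing the two printed lists against the output of the failing script should show where the two runs start to diverge.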
Previous Comments:
------------------------------------------------------------------------
[2012-11-03 17:23:12] lussenburg_rm at hotmail dot com
Description:
------------
---
From manual page:
http://www.php.net/xmlreader.read#refsect1-xmlreader.read-description
---
The bug isn't really in the code, so I'm not including any script here; it is
related to the XML input. For example, I'm reading some RSS feeds (note that I
neither compose them nor am responsible for their layout) that look like this:
<rss>
<channel>
<title>feed title</title>
<description>feed description</description>
<pubDate>Mon, 29 Oct 2012 13:30:00 +0100</pubDate>
<item>
<title>item title</title>
<description>item description</description>
<link>http://itemlink</link>
</item>
<item>
<title>item title</title>
<description>item description</description>
<link>http://bla</link>
</item>
...
</channel>
</rss>
Everything was working perfectly fine until I kept getting values from the
first 'item title' and 'item description' in the 'feed title' and 'feed
description' node values. When I examined the XML data, I found that this only
happens when the first <item> tag directly follows the last of the <channel>
child nodes (<title>, <description>, <pubDate>, etc.) without a carriage
return/newline.
To work around this, before passing the data to XMLReader::xml(), I replace all
occurrences of "><item>" with ">\r\n<item>", which works fine, but maybe it
could be resolved so this workaround isn't necessary anymore.
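The workaround described above can be sketched roughly as follows. The helper name separateItemTags() and the sample feed are hypothetical; only the str_replace() arguments and the call to XMLReader::xml() come from the description. Note that this is a plain string replacement, so it would also rewrite a literal "><item>" that happened to appear inside text or CDATA.

```php
<?php
// Hypothetical helper, not from the report: puts a CRLF before every <item>
// tag that directly follows another tag, as the workaround describes.
function separateItemTags($xml)
{
    return str_replace('><item>', ">\r\n<item>", $xml);
}

// Made-up sample feed with <item> glued to the preceding </pubDate>.
$raw = '<rss><channel><title>feed title</title>'
     . '<pubDate>Mon, 29 Oct 2012 13:30:00 +0100</pubDate><item>'
     . '<title>item title</title></item></channel></rss>';

$fixed = separateItemTags($raw);

$reader = new XMLReader();
$reader->xml($fixed);

// Collect element names just to show the patched document still parses.
$names = array();
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT) {
        $names[] = $reader->name;
    }
}
$reader->close();

print_r($names);
```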
------------------------------------------------------------------------