Edit report at https://bugs.php.net/bug.php?id=63430&edit=1

 ID:                 63430
 Updated by:         ras...@php.net
 Reported by:        lussenburg_rm at hotmail dot com
 Summary:            xml data parsing bug
-Status:             Closed
+Status:             Not a bug
 Type:               Bug
 Package:            XML Reader
 Operating System:   windows 7
 PHP Version:        Irrelevant
 Block user comment: N
 Private report:     N


Previous Comments:
------------------------------------------------------------------------
[2012-11-21 11:33:03] lussenburg_rm at hotmail dot com

.

------------------------------------------------------------------------
[2012-11-21 11:32:16] lussenburg_rm at hotmail dot com

That does work indeed, thanks.

I guess I misunderstood the explanation of next(). I didn't expect it
to skip over the opening <tag> of a new element. I thought it would only
skip over all subtrees of the current element, and that the read() at
the top of the loop would start at the <item> element.

Compliments on the 'super fast' reply also!

------------------------------------------------------------------------
[2012-11-20 21:44:29] mail+php at requinix dot net

Hate to burst your bubble but there's a flaw in your code.

The problem occurs when
* There is a node before an <item> with no whitespace (i.e., a #text) in
  between
* Said node has children
* Said node has an entry in $siblings

The last two cause a line of code near the bottom

    if ( $node->hasChildNodes() && ($mode == 1 || $siblings[$node->nodeName]) ) $xml->next();

to fire. next() will skip over the rest of the node and, in lieu of a
subsequent #text, advance to the <item>. But at the top of your loop you
have a read(). That will skip over the <item> tag and into the following
#text (between the <item> and the <title>). You can confirm this by
outputting the node name at the beginning of the loop, before the switch
that would skip over it: <image>, then #text, then <title>.
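A minimal sketch of that check, assuming a made-up feed rather than the NOS feed from the report; the names $feed and $seen are illustrative only and do not appear in the original script. It records every element the loop lands on when next() is called on a node that is directly followed by <item>:

```php
<?php
// Hypothetical feed, not the one from the report: <image> has a child
// and is followed by <item> with no #text node in between.
$feed = '<channel><image><url>u</url></image>'
      . '<item><title>t</title></item></channel>';

$reader = new XMLReader();
$reader->XML($feed);

$seen = array();
while ($reader->read()) {
    // Only record element start tags; skip text and end tags.
    if ($reader->nodeType !== XMLReader::ELEMENT) {
        continue;
    }
    $seen[] = $reader->name;

    if ($reader->name === 'image') {
        // next() skips the <image> subtree and stops on <item> ...
        $reader->next();
        // ... and the read() in the loop condition then steps past it.
    }
}
$reader->close();

echo implode(', ', $seen), "\n";
```

With this input the list should come out as channel, image, title: the <item> start tag is never seen by the loop body, which matches the behaviour described above.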

It works for me if I change the while loop into a do/while:
* $xml->read() before the loop to initialize
* flag=false at the start of the loop
* the aforementioned line sets flag=$xml->next()
* do/while ( flag || $xml->read() )

If you'd like to know more you can email me at this address.

------------------------------------------------------------------------
[2012-11-20 20:30:51] lussenburg_rm at hotmail dot com

Hi there,

This code is for testing purposes, so I could learn how XMLReader()
works before incorporating it in a RssWebfeed class I've written.

In this code the only thing I replace, to work around the bug I got, is
the bit that is commented out in this example.

'nosnieuwsalgemeen.xml' is the file I have saved on my PC so I don't
have to read it from the internet every time. It is the contents of
http://feeds.nos.nl/nosnieuwsalgemeen.
Another example is http://www.nasa.gov/rss/breaking_news.rss, but this
one doesn't trigger the bug.

In the implementation, I need to get the data that comes before the
first <item> into a feed database, which identifies the different feed
IDs with their titles and descriptions. The <item> elements, starting
with the first one I encounter, are records that go into a second
database, which defines the items for a particular feed.

Here's the code:

/*
$find = array ( '<![CDATA[', ']]>', '><item>' );
$repl = array ( '', '', '>\r\n<item>' );
*/
$file = 'nasa_breaking_news.xml';
$cont = file_get_contents($file);
//$cont = str_ireplace($find, $repl, $cont);

$nodes = array (
    'rss' => array( 'version' => 'rss_version' ),
    'guid' => true,
    'link' => true,
    'title' => true,
    'description' => true,
    'pubDate' => true,
    'lastBuildDate' => true,
    'language' => true,
    'image' => true,
    'enclosure' => array( 'url' => 'enclosure', 'type' => 'type', 'width' => 'imgwidth' ),
    'managingEditor' => true,
    'related' => true,
);
$siblings = array (
    'image' => array( 'url' => 'image', 'title' => 'alt', 'link' => 'link', 'description' => 'title' ),
);

$xml = new XMLReader();
if ( $xml ) {
    // "gelukt" is Dutch for "succeeded"
    echo ' <div class="e large">xml = new XMLReader()</div> <div>gelukt</div> <br>';
}
if ( $xml->xml($cont, THIS_CHARSET, LIBXML_NOERROR|LIBXML_NOWARNING) === true ) {
    printf( ' <div class="e large">xml->open()</div> <div>%s</div> <br>', $file );
    echo ' <br>';
    $mode = 0;
    $element = '';
    $itemcount = 0;
    while ( $xml->read() ) {
        if ( $xml->name == 'item' ) {
            switch ( $xml->nodeType ) {
                case XMLReader::ELEMENT:
                    $itemcount++;
                    $mode = 1;
                    break;
                case XMLReader::END_ELEMENT:
                    $mode = 0;
                    break;
            }
        }
        $element = '';
        switch ( $xml->nodeType ) {
            case XMLReader::END_ELEMENT:
            case XMLReader::SIGNIFICANT_WHITESPACE:
            case XMLReader::WHITESPACE:
            case XMLReader::TEXT:
            case XMLReader::CDATA:
                continue 2;
        }
        printf(
            ' <br> <div style="padding-left:%uem;"> <div class="e large">xml->read():</div> <div>xml->name: %s%s</div> <div>xml->nodeType: %d</div> <div>xml->isEmpty: %s</div> <div>xml->hasvalue: %s</div> <div>xml->attr: %s</div> <div>xml->depth: %d</div>',
            $mode+1,
            $xml->name,
            $xml->name=='item' ? sprintf(' (rec#: %u)', $itemcount) : '',
            $xml->nodeType,
            $xml->isEmptyElement ? "yes" : "no",
            $xml->hasValue ? "yes" : "no",
            $xml->hasAttributes ? $xml->attributeCount : "no",
            $xml->depth
        );
        if ( !$nodes[$xml->name] ) {
            echo ' </div>';
            continue;
        }
        switch ( $xml->nodeType ) {
            case XMLReader::ELEMENT:
                $element = $xml->name;
                printf( ' <div%s>', $nodes[$xml->name] ? ' class="grey"' : '' );
                if ( $nodes[$xml->name] === true ) {
                    printf( ' <div>INNER: %s</div>', $xml->readInnerXML() );
                }
                if ( $node = $xml->expand() ) {
                    printf( ' <div>node->name: %s</div>', $node->nodeName );
                    printf( ' <div>node->childs: %s</div>', $node->hasChildNodes() ? "".$node->childNodes->length : "no" );
                    if ( $xml->hasAttributes && $node->attributes !== null ) {
                        echo ' <div>node->attr: ';
                        for ( $i = 0; $i < $xml->attributeCount; $i++ ) {
                            $item = $node->attributes->item($i);
                            if ( $nodes[$xml->name][$item->nodeName] )
                                printf('[%s=%s]', $nodes[$xml->name][$item->nodeName], $item->nodeValue);
                        }
                        echo ' </div>';
                    }
                    if ( $node->hasChildNodes() && $siblings[$node->nodeName] ) {
                        echo '<div>node->items:';
                        for ( $i = 0; $i < $node->childNodes->length - 1; $i++ ) {
                            $item = $node->childNodes->item($i);
                            if ( $item->nodeType == XMLReader::ELEMENT && $siblings[$node->nodeName][$item->nodeName]) {
                                echo '['.$siblings[$node->nodeName][$item->nodeName].'='.$item->nodeValue.']';
                            }
                        }
                        echo '</div>';
                    }
                    if ( $node->hasChildNodes() && ($mode == 1 || $siblings[$node->nodeName]) )
                        $xml->next();
                }
                echo ' </div>';
                break;
        }
        echo ' </div>';
    }
    $ret = $xml->close();
    // "niet gelukt" is Dutch for "not succeeded"
    printf( ' <br> <div class="bordertop"> <div class="e large">xml->close():</div> <div>%sgelukt</div> </div>', $ret===false ? 'niet ' : '' );
}

------------------------------------------------------------------------
[2012-11-07 19:50:22] mail+php at requinix dot net

Even if the input is "faulty", example code is still important. For all
we know it's a complex problem you're triggering because of something
subtle in your code.

I can't reproduce it with

<?php

$xml = <<<XML
<rss>
<channel>
<title>feed title</title>
<description>feed description</description>
<pubDate>Mon, 29 Oct 2012 13:30:00 +0100</pubDate><item>
<title>item title</title>
<description>item description</description>
<link>itemlink</link>
</item>
</channel>
</rss>
XML;

$reader = new XMLReader();
$reader->xml($xml);

// http://www.php.net/manual/en/class.xmlreader.php#88264
function xml2assoc($xml) { removed for brevity }

print_r(xml2assoc($reader));

?>

PHP 5.4.3 and libxml 2.7.7

------------------------------------------------------------------------


The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=63430