Edit report at https://bugs.php.net/bug.php?id=63430&edit=1

 ID:                 63430
 Updated by:         ras...@php.net
 Reported by:        lussenburg_rm at hotmail dot com
 Summary:            xml data parsing bug
-Status:             Closed
+Status:             Not a bug
 Type:               Bug
 Package:            XML Reader
 Operating System:   windows 7
 PHP Version:        Irrelevant
 Block user comment: N
 Private report:     N



Previous Comments:
------------------------------------------------------------------------
[2012-11-21 11:33:03] lussenburg_rm at hotmail dot com

.

------------------------------------------------------------------------
[2012-11-21 11:32:16] lussenburg_rm at hotmail dot com

That does work indeed, thanks. I guess I misunderstood the explanation of 
next(). I didn't expect it to skip over the opening <tag> of a new element. I 
thought it would only skip over all subtrees of the current element, and that 
the read at the top of the loop would start at the <item> element.

Compliments on the 'super fast' reply, too!

------------------------------------------------------------------------
[2012-11-20 21:44:29] mail+php at requinix dot net

Hate to burst your bubble, but there's a flaw in your code. The problem
occurs when:
* There is a node before an <item> with no whitespace (i.e., a #text) in between
* Said node has children
* Said node has an entry in $siblings

The last two cause a line of code near the bottom

if ( $node->hasChildNodes() && ($mode == 1 || $siblings[$node->nodeName]) )
  $xml->next();

to fire. next() skips over the rest of the node and, since no #text follows
it, advances straight to the <item>. But at the top of your loop you have a
read(). That skips over the <item> tag and into the following #text (between
the <item> and the <title>). You can confirm this by outputting the node name
at the beginning of the loop, before the switch that would skip over it:
<image>, then #text, then <title>.
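
The skip described above can be reproduced in isolation. This is only a
minimal sketch, not the reporter's code: the function name
naiveElementNames() and the inline feed are made up for illustration, and the
feed deliberately has no whitespace between </image> and <item>.

```php
<?php
// Hypothetical helper: walks the document the same way as the reported loop,
// i.e. read() at the top and next() after a node whose subtree was handled.
function naiveElementNames(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $names = [];
    while ($reader->read()) {
        // Only start tags are recorded; text and end tags are ignored.
        if ($reader->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        $names[] = $reader->name;
        if ($reader->name === 'image') {
            // next() skips the <image> subtree and lands on <item>;
            // the read() in the loop condition then moves past <item>.
            $reader->next();
        }
    }
    $reader->close();

    return $names;
}

// Illustrative feed: <item> follows </image> with no #text in between.
$feed = '<channel><image><url>u</url></image><item><title>t</title></item></channel>';
echo implode(', ', naiveElementNames($feed)), "\n";
```

With this input the <item> start tag should never show up in the list; only
<channel>, <image> and <title> are recorded.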

It works for me if I change the while loop into a do/while:
* $xml->read() before the loop to initialize
* flag=false at the start of the loop
* the aforementioned line sets flag=$xml->next()
* do/while ( flag || $xml->read() )
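
The four steps above could look roughly like this. It is a sketch under
assumptions, not the exact change made to the reporter's code: the names
fixedElementNames() and $advanced (the "flag") and the inline feed are
illustrative only.

```php
<?php
// Hypothetical helper showing the do/while variant with a flag.
function fixedElementNames(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $names = [];
    // Initialize with one read() before the loop; stop if there is nothing to read.
    if (!$reader->read()) {
        return $names;
    }
    do {
        // Reset the flag at the start of every iteration.
        $advanced = false;
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $names[] = $reader->name;
            if ($reader->name === 'image') {
                // next() already moved the cursor, so remember that and
                // do not call read() again in the loop condition.
                $advanced = $reader->next();
            }
        }
    } while ($advanced || $reader->read());
    $reader->close();

    return $names;
}

// Illustrative feed: <item> follows </image> with no #text in between.
$feed = '<channel><image><url>u</url></image><item><title>t</title></item></channel>';
echo implode(', ', fixedElementNames($feed)), "\n";
```

Because read() is only called when next() has not already advanced the
cursor, the <item> start tag is visited like any other element.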

If you'd like to know more you can email me at this address.

------------------------------------------------------------------------
[2012-11-20 20:30:51] lussenburg_rm at hotmail dot com

Hi there,

This code is for testing purposes so I could learn how XMLReader works before 
incorporating it in an RssWebfeed class I've written.
In this code the only thing I replace, to work around the bug I got, is the bit 
that is commented out in this example. 'nosnieuwsalgemeen.xml' is the file I 
have saved on my PC so I don't have to read it from the internet every time. It 
is the contents of http://feeds.nos.nl/nosnieuwsalgemeen. Another example is 
http://www.nasa.gov/rss/breaking_news.rss, but this one doesn't trigger the bug.
In the implementation, I need to get the data that comes before the first 
<item> into a feed database, which stores each feed's ID, title, and 
description. Everything from the first <item> onward consists of records that 
go into a second database, which defines the items for a particular feed.


Here's the code:


/*
$find = array (
        '<![CDATA[', ']]>', '><item>'
);
$repl = array (
        '',          '',    '>\r\n<item>'
);
*/

$file = 'nasa_breaking_news.xml';

$cont = file_get_contents($file);
//$cont = str_ireplace($find, $repl, $cont);

$nodes = array (
        'rss'            => array( 'version' => 'rss_version' ),
        'guid'           => true,
        'link'           => true,
        'title'          => true,
        'description'    => true,
        'pubDate'        => true,
        'lastBuildDate'  => true,
        'language'       => true,
        'image'          => true,
        'enclosure'      => array( 'url' => 'enclosure', 'type' => 'type', 'width' => 'imgwidth' ),
        'managingEditor' => true,
        'related'        => true,
);

$siblings = array (
        'image' => array( 'url' => 'image', 'title' => 'alt', 'link' => 'link', 'description' => 'title' ),
);

$xml = new XMLReader();

if ( $xml ) {
        echo '
        <div class="e large">xml = new XMLReader()</div>
        <div>gelukt</div>
        <br>';
}

if ( $xml->xml($cont, THIS_CHARSET, LIBXML_NOERROR|LIBXML_NOWARNING) === true ) {
        printf( '
        <div class="e large">xml->open()</div>
        <div>%s</div>
        <br>',
        $file
        );

        echo '
        <br>';

        $mode        = 0;
        $element     = '';
        $itemcount   = 0;

        while ( $xml->read() ) {

                if ( $xml->name == 'item' ) {
                        switch ( $xml->nodeType ) {
                        case XMLReader::ELEMENT:
                                $itemcount++;
                                $mode = 1;
                                break;
                        case XMLReader::END_ELEMENT:
                                $mode = 0;
                                break;
                        }
                }

                $element = '';

                switch ( $xml->nodeType ) {
                case XMLReader::END_ELEMENT:
                case XMLReader::SIGNIFICANT_WHITESPACE:
                case XMLReader::WHITESPACE:
                case XMLReader::TEXT:
                case XMLReader::CDATA:
                        continue 2;
                }

                printf( '
                <br>
                <div style="padding-left:%uem;">
                <div class="e large">xml->read():</div>
                <div>xml->name: %s%s</div>
                <div>xml->nodeType: %d</div>
                <div>xml->isEmpty: %s</div>
                <div>xml->hasvalue: %s</div>
                <div>xml->attr: %s</div>
                <div>xml->depth: %d</div>',
                $mode+1,
                $xml->name,
                $xml->name=='item' ? sprintf(' (rec#: %u)', $itemcount) : '',
                $xml->nodeType,
                $xml->isEmptyElement ? "yes" : "no",
                $xml->hasValue ? "yes" : "no",
                $xml->hasAttributes ? $xml->attributeCount : "no",
                $xml->depth
                );

                if ( !$nodes[$xml->name] ) {
                        echo '
                        </div>';
                        continue;
                }

                switch ( $xml->nodeType ) {
                case XMLReader::ELEMENT:
                        $element = $xml->name;
                        printf( '
                        <div%s>',
                        $nodes[$xml->name] ? ' class="grey"' : ''
                        );
                        if ( $nodes[$xml->name] === true ) {
                                printf( '
                                <div>INNER: %s</div>',
                                $xml->readInnerXML()
                                );
                        }
                        if ( $node = $xml->expand() ) {
                                printf( '
                                <div>node->name: %s</div>',
                                $node->nodeName
                                );
                                printf( '
                                <div>node->childs: %s</div>',
                                $node->hasChildNodes() ? "".$node->childNodes->length : "no"
                                );
                                if ( $xml->hasAttributes && $node->attributes !== null ) {
                                        echo '
                                        <div>node->attr: ';
                                        for ( $i = 0; $i < $xml->attributeCount; $i++ ) {
                                                $item = $node->attributes->item($i);
                                                if ( $nodes[$xml->name][$item->nodeName] ) printf('[%s=%s]', $nodes[$xml->name][$item->nodeName], $item->nodeValue);
                                        }
                                        echo '
                                        </div>';
                                }
                                if ( $node->hasChildNodes() && $siblings[$node->nodeName] ) {
                                        echo '<div>node->items:';
                                        for ( $i = 0; $i < $node->childNodes->length - 1; $i++ ) {
                                                $item = $node->childNodes->item($i);
                                                if ( $item->nodeType == XMLReader::ELEMENT && $siblings[$node->nodeName][$item->nodeName]) {
                                                        echo '['.$siblings[$node->nodeName][$item->nodeName].'='.$item->nodeValue.']';
                                                }
                                        }
                                        echo '</div>';
                                }
                                if ( $node->hasChildNodes() && ($mode == 1 || $siblings[$node->nodeName]) ) $xml->next();
                        }
                        echo '
                        </div>';
                        break;
                }
                echo '
                </div>';
        }

        $ret = $xml->close();

        printf( '
        <br>
        <div class="bordertop">
        <div class="e large">xml->close():</div>
        <div>%sgelukt</div>
        </div>',
        $ret===false ? 'niet ' : ''
        );

}

------------------------------------------------------------------------
[2012-11-07 19:50:22] mail+php at requinix dot net

Even if the input is "faulty", example code is still important. For all we 
know, it's a complex problem you're triggering because of something subtle in 
your code.

I can't reproduce it with

<?php
$xml = <<<XML
<rss>
 <channel>
  <title>feed title</title>
  <description>feed description</description>
  <pubDate>Mon, 29 Oct 2012 13:30:00 +0100</pubDate><item>
    <title>item title</title>
    <description>item description</description>
    <link>itemlink</link>
  </item>
 </channel>
</rss>
XML;

$reader = new XMLReader();
$reader->xml($xml);

// http://www.php.net/manual/en/class.xmlreader.php#88264
function xml2assoc($xml) { removed for brevity }

print_r(xml2assoc($reader));
?>

PHP 5.4.3 and libxml 2.7.7

------------------------------------------------------------------------


The remainder of the comments for this report are too long. To view
the rest of the comments, please view the bug report online at

    https://bugs.php.net/bug.php?id=63430

