Bug #63430 [Opn]: xml data parsing bug

2012-11-21 Thread lussenburg_rm at hotmail dot com
Edit report at https://bugs.php.net/bug.php?id=63430&edit=1

 ID:                 63430
 User updated by:    lussenburg_rm at hotmail dot com
 Reported by:        lussenburg_rm at hotmail dot com
 Summary:            xml data parsing bug
 Status:             Open
 Type:               Bug
 Package:            XML Reader
 Operating System:   windows 7
 PHP Version:        Irrelevant
 Block user comment: N
 Private report:     N

 New Comment:

That does work indeed, thanks. I guess I misunderstood the explanation of
next(). I didn't expect it to skip over the beginning tag of a new element. I
thought it would only skip over all subtrees of the current element, and that
the read() at the top of the loop would start at the item element.

Compliments on the 'super fast' reply, too!
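
For anyone reading along, the interaction between next() and read() described
in this thread can be reproduced with a tiny sketch. The feed fragment below
is made up for illustration (it is not the NOS feed), and it assumes there is
no whitespace between the closing image tag and the item tag:

```php
<?php
// Hypothetical feed fragment: no whitespace between </image> and <item>.
$feed = '<channel><image><url>u</url></image><item><title>t</title></item></channel>';

$xml = new XMLReader();
$xml->XML($feed);

$trace = array();

// Advance to the <image> start tag.
while ($xml->read() && $xml->name !== 'image') {
    // keep reading until the reader sits on <image>
}
$trace[] = $xml->name;   // reader is on <image>

$xml->next();            // skips the <image> subtree ...
$trace[] = $xml->name;   // ... and lands ON the <item> start tag

$xml->read();            // a following read() moves past <item> into its first child
$trace[] = $xml->name;

echo implode(' -> ', $trace), "\n";   // image -> item -> title
```

So next() itself does stop on the item start tag; it is the extra read() at
the top of a while loop that steps over it.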


Previous Comments:

[2012-11-20 21:44:29] mail+php at requinix dot net

Hate to burst your bubble but there's a flaw in your code. The problem occurs
when
* There is a node before an item with no whitespace (i.e., a #text) in between
* Said node has children
* Said node has an entry in $siblings

The last two cause a line of code near the bottom

if ( $node->hasChildNodes() && ($mode == 1 || $siblings[$node->nodeName]) )
  $xml->next();

to fire. next() will skip over the rest of the node and, in lieu of a
subsequent #text, advance to the <item>. But at the top of your loop you have
a read(). That will skip over the tag and into the following #text (between
the <item> and the <title>). You can confirm this by outputting the node name
at the beginning of the loop - before the switch that would skip over it:
image, then #text, then title.

It works for me if I change the while loop into a do/while:
* $xml->read() before the loop to initialize
* flag=false at the start of the loop
* the aforementioned line sets flag=$xml->next()
* do/while ( flag || $xml->read() )
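
The four steps above can be sketched roughly as follows. This is only a
minimal illustration, not the reporter's actual script: the feed string is
made up, and the check on the node name 'image' stands in for the
hasChildNodes()/$siblings condition quoted earlier.

```php
<?php
// Hypothetical feed: <item> follows </image> with no whitespace in between.
$feed = '<rss><channel><title>feed</title>'
      . '<image><url>u</url></image>'
      . '<item><title>first</title></item>'
      . '</channel></rss>';

$xml = new XMLReader();
$xml->XML($feed);

$seen = array();

$xml->read();                       // initialize before the loop
do {
    $flag = false;                  // reset at the start of every pass

    if ($xml->nodeType === XMLReader::ELEMENT) {
        $seen[] = $xml->name;

        // Stand-in for the original hasChildNodes()/$siblings test.
        if ($xml->name === 'image') {
            // next() already moved the cursor, so remember that and
            // do not call read() again in the loop condition.
            $flag = $xml->next();
        }
    }
} while ($flag || $xml->read());

echo implode(', ', $seen), "\n";    // rss, channel, title, image, item, title
```

Because the loop condition only calls read() when next() has not already
advanced the cursor, the item start tag is no longer stepped over.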

If you'd like to know more you can email me at this address.



Bug #63430 [Opn-Csd]: xml data parsing bug

2012-11-21 Thread lussenburg_rm at hotmail dot com

 ID:                 63430
 User updated by:    lussenburg_rm at hotmail dot com
 Reported by:        lussenburg_rm at hotmail dot com
 Summary:            xml data parsing bug
-Status:             Open
+Status:             Closed
 Type:               Bug
 Package:            XML Reader
 Operating System:   windows 7
 PHP Version:        Irrelevant
 Block user comment: N
 Private report:     N

 New Comment:

.



Bug #63430 [Opn]: xml data parsing bug

2012-11-20 Thread lussenburg_rm at hotmail dot com

 ID:                 63430
 User updated by:    lussenburg_rm at hotmail dot com
 Reported by:        lussenburg_rm at hotmail dot com
 Summary:            xml data parsing bug
 Status:             Open
 Type:               Bug
 Package:            XML Reader
 Operating System:   windows 7
 PHP Version:        Irrelevant
 Block user comment: N
 Private report:     N

 New Comment:

Hi there,

This code is for testing purposes so I could learn how XMLReader works before
incorporating it into an RssWebfeed class I've written.
In this code the only thing I replace, to work around the bug I got, is the bit
that is commented out in this example. 'nosnieuwsalgemeen.xml' is the file I
have saved on my PC so I don't have to read it from the internet every time. It
is the contents of http://feeds.nos.nl/nosnieuwsalgemeen. Another example is
http://www.nasa.gov/rss/breaking_news.rss, but this one doesn't give the bug.
In the implementation, I need to get the data that comes before the first
<item> into a feed database which identifies different feed IDs and their
titles and descriptions. When I encounter the first <item>, the records from
there on go into a second database which defines the items for a particular
feed.


Here's the code:


/*
$find = array (
    '<![CDATA[', ']]>', '<item>'
);
$repl = array (
    '',  '', '\r\n<item>'
);
*/

$file = 'nasa_breaking_news.xml';

$cont = file_get_contents($file);
//$cont = str_ireplace($find, $repl, $cont);

$nodes = array (
    'rss'            => array( 'version' => 'rss_version' ),
    'guid'           => true,
    'link'           => true,
    'title'          => true,
    'description'    => true,
    'pubDate'        => true,
    'lastBuildDate'  => true,
    'language'       => true,
    'image'          => true,
    'enclosure'      => array( 'url' => 'enclosure', 'type' => 'type',
                               'width' => 'imgwidth' ),
    'managingEditor' => true,
    'related'        => true,
);

$siblings = array (
    'image' => array( 'url' => 'image', 'title' => 'alt', 'link' => 'link',
                      'description' => 'title' ),
);

$xml = new XMLReader();

// "gelukt" is Dutch for "succeeded"
if ( $xml ) {
    echo '
<div class="e large">xml = new XMLReader()</div>
<div>gelukt</div>
<br>';
}

if ( $xml->xml($cont, THIS_CHARSET, LIBXML_NOERROR|LIBXML_NOWARNING) === true ) {
    printf( '
<div class="e large">xml->open()</div>
<div>%s</div>
<br>',
        $file
    );

    echo '
<br>';

    $mode      = 0;
    $element   = '';
    $itemcount = 0;

    while ( $xml->read() ) {

        if ( $xml->name == 'item' ) {
            switch ( $xml->nodeType ) {
                case XMLReader::ELEMENT:
                    $itemcount++;
                    $mode = 1;
                    break;
                case XMLReader::END_ELEMENT:
                    $mode = 0;
                    break;
            }
        }

        $element = '';

        switch ( $xml->nodeType ) {
            case XMLReader::END_ELEMENT:
            case XMLReader::SIGNIFICANT_WHITESPACE:
            case XMLReader::WHITESPACE:
            case XMLReader::TEXT:
            case XMLReader::CDATA:
                continue 2;
        }

        printf( '
<br>
<div style="padding-left:%uem;">
<div class="e large">xml->read():</div>
<div>xml->name: %s%s</div>
<div>xml->nodeType: %d</div>
<div>xml->isEmpty: %s</div>
<div>xml->hasvalue: %s</div>
<div>xml->attr: %s</div>
<div>xml->depth: %d</div>',
            $mode+1,
            $xml->name,
            $xml->name=='item' ? sprintf(' (rec#: %u)', $itemcount) : '',
            $xml->nodeType,
            $xml->isEmptyElement ? "yes" : "no",
            $xml->hasValue ? "yes" : "no",
            $xml->hasAttributes ? $xml->attributeCount : "no",
            $xml->depth
        );

        if ( !$nodes[$xml->name] ) {
            echo '
</div>';
            continue;
        }

        switch ( $xml->nodeType ) {
            case XMLReader::ELEMENT:
                $element = $xml->name;
                printf( '
<div%s>',
                    $nodes[$xml->name] ? ' class="grey"' : ''
                );
                if ( $nodes[$xml->name] === true ) {
                    printf( '
<div>INNER: %s</div>',
                        $xml->readInnerXML()
                    );
                }
                if ( $node = $xml->expand() ) {
                    printf

[PHP-BUG] Bug #63430 [NEW]: xml data parsing bug

2012-11-03 Thread lussenburg_rm at hotmail dot com
From:             lussenburg_rm at hotmail dot com
Operating system: windows 7
PHP version:      Irrelevant
Package:          XML Reader
Bug Type:         Bug
Bug description:  xml data parsing bug

Description:

---
From manual page:
http://www.php.net/xmlreader.read#refsect1-xmlreader.read-description
---
The bug isn't really in the code, so I'm not including any script here, but it
is related to the XML input. For example, I'm reading some RSS feeds (note
that I neither compose them nor am responsible for their layout) that look
like this:

<rss>
 <channel>
  <title>feed title</title>
  <description>feed description</description>
  <pubDate>Mon, 29 Oct 2012 13:30:00 +0100</pubDate>
  <item>
   <title>item title</title>
   <description>item description</description>
   <link>http://itemlink</link>
  </item>
  <item>
   <title>item title</title>
   <description>item description</description>
   <link>http://bla</link>
  </item>
  ...
 </channel>
</rss>

Everything was working perfectly fine until I kept getting values from the
first 'item title' and 'item description' in the 'feed title' and 'feed
description' node values. When I examined the XML data I found out that it
only happens when the first <item> tag directly follows the last of the
channel nodes (title, description, pubDate etc.) without a carriage
return/newline.
To work around this, before passing the data to XMLReader::xml(), I replace
all occurrences of <item> with \r\n<item>, which works fine, but maybe
it could be resolved so this workaround isn't necessary anymore.
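
A small sketch that appears to reproduce the whitespace dependence, using the
read()/next() pattern identified in the comments on this report. The feed
strings and the helper name seenElements() are made up for illustration, and
the check on 'image' is an assumption standing in for the script's real
condition:

```php
<?php
// Returns the element start tags that a plain while(read()) loop gets to see
// when next() is called on <image> inside the loop body.
function seenElements($feed)
{
    $xml = new XMLReader();
    $xml->XML($feed);

    $seen = array();
    while ($xml->read()) {
        if ($xml->nodeType !== XMLReader::ELEMENT) {
            continue;
        }
        $seen[] = $xml->name;
        if ($xml->name === 'image') {
            $xml->next();   // cursor moves to whatever follows </image>
        }
    }
    $xml->close();

    return $seen;
}

// <item> directly after </image>: next() lands on <item>, read() steps over it.
$compact = '<channel><image><url>u</url></image><item><title>t</title></item></channel>';

// Newline before <item>: next() lands on the whitespace node, read() finds <item>.
$spaced  = "<channel><image><url>u</url></image>\r\n<item><title>t</title></item></channel>";

echo implode(', ', seenElements($compact)), "\n";   // channel, image, title
echo implode(', ', seenElements($spaced)), "\n";    // channel, image, item, title
```

In the compact case the item start tag is never seen, so its title is easily
mistaken for a channel-level title, which would explain why inserting a
newline before the item tag hides the problem.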


