 ID:               50156
 Comment by:       edwin at bitstorm dot org
 Reported By:      edwin at bitstorm dot org
 Status:           Open
 Bug Type:         XML Reader
 Operating System: Ubuntu 9.04
 PHP Version:      5.2SVN-2009-11-12 (SVN)
 New Comment:

Turns out I can use the isEmptyElement property to detect when I am
dealing with an <a/> element. This is a bit unfortunate, because, for
example, SAX (the mother of all XML readers?) does not use this
mechanism and works as I would expect. It's just how libxml seems to
work, so it should probably not be marked as a PHP bug.

This bug can be closed: "not a bug"... :-/

A little parsing example in the documentation might be a very good
idea, though.


Previous Comments:
------------------------------------------------------------------------

[2009-11-12 21:14:17] edwin at bitstorm dot org

The XML parser used is libxml 2.7.3.

------------------------------------------------------------------------

[2009-11-12 21:02:52] edwin at bitstorm dot org

<?php

// Source code for bug #50156

// The following code will output the text below, which
// is not what you expect when you see the xml.
//
// Barcode is not a child of Type, but how can you know?
//
// Titles
// Titles
// Titles - Title
// Titles - Title
// Titles - Title - ID
// Titles - Title - ID
// Titles - Title
// Titles - Title
// Titles - Title - Type
// Titles - Title - Type
// Titles - Title - Type - Barcode
// Titles - Title - Type
// Titles - Title - Type
// Titles - Title
// Titles - Title
// Titles

$xml = "
<Titles>
<Title>
<ID>429</ID>
<Type/>
<Barcode></Barcode>
</Title>
</Titles>
";

$expected = "
Titles
Titles
Titles - Title
Titles - Title
Titles - Title - ID
Titles - Title - ID
Titles - Title
Titles - Title
Titles - Title - Type
Titles - Title
Titles - Title
Titles - Title - Barcode
Titles - Title
Titles - Title
Titles
Titles
";

$reader = new XMLReader();
$reader->xml($xml);

$actual = '';

// Keep a stack of the currently open elements
$stack = array();

while ($reader->read()) {
    switch ($reader->nodeType) {
        case XMLReader::ELEMENT:
            array_push($stack, $reader->name);
            break;
        case XMLReader::END_ELEMENT:
            array_pop($stack);
            break;
    }
    // Record the element path after every node that is read
    $actual .= join(' - ', $stack)."\n";
}

// Clean up and make it OS-agnostic
$expected = preg_replace('/\\r/', '', trim($expected));
$actual = preg_replace('/\\r/', '', trim($actual));

// Print result
echo "<h3>Expected</h3>\n";
echo "<pre>$expected</pre>\n";
echo "<h3>Actual</h3>\n";
echo "<pre>$actual</pre>\n";

// Test it
if ($expected == $actual) {
    echo "<strong>Good</strong>";
} else {
    echo "<strong>Not good</strong>";
}

?>

------------------------------------------------------------------------

[2009-11-12 13:48:14] edwin at bitstorm dot org

Description:
------------
Element <a></a> is reported twice, once for XMLReader::ELEMENT and once
for XMLReader::END_ELEMENT.

Element <a/> is reported once, for XMLReader::ELEMENT. It should be
reported as an XMLReader::END_ELEMENT too, because the end tag is
implicit.

The problem is that now you can't distinguish between <a><b> and
<a/><b>, and that's a bug.

Reproduce code:
---------------
$reader = new XMLReader();
$reader->open($file);

echo "<table>\n";
while ($reader->read()) {
    // One row per node: node type, name, value
    echo "<tr><td>".$reader->nodeType."</td><td>".$reader->name."</td><td>".$reader->value."</td></tr>\n";
}
echo "</table>\n";

Input:

<Titles>
<Title>
<ID>429</ID>
<Type />
<Barcode>
</Barcode>

Expected result:
----------------
1   Titles
14  #text
1   Title
14  #text
1   ID
3   #text    429
15  ID
14  #text
1   Type
14  #text
15  Type
14  #text
1   Barcode
14  #text
15  Barcode
14  #text

Actual result:
--------------
1   Titles
14  #text
1   Title
14  #text
1   ID
3   #text    429
15  ID
14  #text
1   Type
14  #text
1   Barcode
14  #text
15  Barcode
14  #text

------------------------------------------------------------------------

-- 
Edit this bug report at http://bugs.php.net/?id=50156&edit=1
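
For reference, here is a minimal sketch of the isEmptyElement workaround mentioned in the new comment at the top of this report. It is only an illustration, not an official fix: the function name element_paths() is hypothetical, and it assumes that isEmptyElement is true for <Type/> and false for <Barcode></Barcode>, which matches the behaviour described above. When the reader is positioned on an empty element, the element is popped from the stack immediately, since no XMLReader::END_ELEMENT will follow.

```php
<?php
// Hypothetical helper, not part of the XMLReader API: returns one
// " - "-joined element path per node read, closing empty elements
// such as <Type/> by hand.
function element_paths($xml)
{
    $reader = new XMLReader();
    $reader->XML($xml);

    $stack = array();
    $paths = array();

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            array_push($stack, $reader->name);
            $paths[] = implode(' - ', $stack);

            if ($reader->isEmptyElement) {
                // <a/> produces no END_ELEMENT node, so simulate one here.
                array_pop($stack);
                $paths[] = implode(' - ', $stack);
            }
        } else {
            if ($reader->nodeType === XMLReader::END_ELEMENT) {
                array_pop($stack);
            }
            $paths[] = implode(' - ', $stack);
        }
    }

    $reader->close();
    return $paths;
}

$paths = element_paths('<Titles><Title><ID>429</ID><Type/><Barcode></Barcode></Title></Titles>');
echo implode("\n", $paths), "\n";
```

With this check in place, Barcode is reported as a child of Title rather than of Type.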