Re: [PHP] URL restriction on XML file
That's because the character data is split on the borders of the entities, so for

http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1

characterData() will be called 5 times:

http://feeds.example.com/?rid=318045f7e13e0b66
&
cat=48cba686fe041718
&
f=1

The solution is inlined below.

Roger Thomas wrote:

I have a short script to parse my XML file. The parsing produces no errors and all output looks good, EXCEPT that URL links are truncated IF they contain '&amp;'. My XML file looks like this:

--- start of XML ---
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0">
<channel>
<title>Test News .Net - Newspapers on the Net</title>
<copyright>Small News Network.com</copyright>
<link>http://www.example.com/</link>
<description>Continuously updating Example News.</description>
<language>en-us</language>
<pubDate>Tue, 29 Mar 2005 18:01:01 -0600</pubDate>
<lastBuildDate>Tue, 29 Mar 2005 18:01:01 -0600</lastBuildDate>
<ttl>30</ttl>
<item>
<title>Group buys SunGard for US$10.4bil</title>
<link>http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1</link>
<description>NEW YORK: A group of seven private equity investment firms agreed yesterday to buy financial technology company SunGard Data Systems Inc in a deal worth US$10.4bil plus debt, making it the biggest lev...</description>
<source url="http://biz.theexample.com/">The Paper</source>
</item>
<item>
<title>Strong quake hits Indonesia coast</title>
<link>http://feeds.example.com/news/world/quake.html</link>
<description>a &quot;widely destructive tsunami&quot; and the quake was felt as far away as Malaysia.</description>
<source url="http://biz.theexample.com.net/">The Paper</source>
</item>
<item>
<title>Final News</title>
<link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link>
<description>We are going to expect something new this weekend ...</description>
<source url="http://biz.theexample.com/">The Paper</source>
</item>
</channel>
</rss>
--- end of XML ---

For the sake of testing, my script only prints the URL links of the news items above. I got these:

f=1
http://feeds.example.com/news/world/quake.html
cat=somecat

The output for line 1 is truncated to 'f=1' and the output for line 3 is truncated to 'cat=somecat', i.e., the script only took the last parameter of the URL. The output for line 2 is correct since it has NO parameters.

I am not sure what I have done wrong in my script. Is it because the RSS spec says that you cannot have parameters in a URL? Please advise.

-- start of script --
<?php
$file = "test.xml";
$currentTag = "";

function startElement($parser, $name, $attrs) {
    global $currentTag;
    $currentTag = $name;
}

function endElement($parser, $name) {
    global $currentTag, $TITLE, $URL, $start;
    switch ($currentTag) {
        case "ITEM":
            $start = 0;
        case "LINK":
            if ($start == 1)
                #print "<A HREF = \"".$URL."\">$TITLE</A><BR>";
                print $URL."<BR>";
            break;
    }
    $currentTag = "";
    // Reset also other variables:
    $URL = '';
    $TITLE = '';
}

function characterData($parser, $data) {
    global $currentTag, $TITLE, $URL, $start;
    switch ($currentTag) {
        case "ITEM":
            $start = 1;
        case "TITLE":
            $TITLE = $data;
            // append instead: $TITLE .= $data;
            break;
        case "LINK":
            $URL = $data;
            // append instead: $URL .= $data;
            // Warning: entities are decoded at this point, you will receive &, not &amp;
            break;
    }
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

if (!($fp = fopen($file, "r"))) {
    die("Cannot locate XML data file: $file");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}

xml_parser_free($xml_parser);
?>
-- end of script --

TIA.
Roger

--- Sign Up for free Email at http://ureg.home.net.my/ ---

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
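The append-and-reset advice above can be put together as a minimal sketch. It is not the original script: the function name collectLinks(), the closures, and the shortened inline sample feed are assumptions made for illustration. The point it shows is that the character data handler appends to a buffer, and the value is only used once the element closes.

```php
<?php
// Sketch only: collectLinks() and the sample feed are illustrative,
// not part of the script discussed in this thread.

function collectLinks(string $xml): array
{
    $links  = [];     // finished <link> values found inside <item>
    $buffer = '';     // text accumulated for the element currently open
    $inItem = false;  // true between <item> and </item>

    $parser = xml_parser_create();

    xml_set_element_handler(
        $parser,
        // Start handler: element names arrive upper-cased because
        // case folding is enabled by default.
        function ($parser, string $name, array $attrs) use (&$buffer, &$inItem): void {
            if ($name === 'ITEM') {
                $inItem = true;
            }
            $buffer = '';   // fresh buffer for every element
        },
        // End handler: the buffer now holds the complete text.
        function ($parser, string $name) use (&$links, &$buffer, &$inItem): void {
            if ($name === 'LINK' && $inItem) {
                $links[] = trim($buffer);
            }
            if ($name === 'ITEM') {
                $inItem = false;
            }
            $buffer = '';
        }
    );

    xml_set_character_data_handler(
        $parser,
        // May be called several times per element, so append, never assign.
        function ($parser, string $data) use (&$buffer): void {
            $buffer .= $data;
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }

    return $links;
}

// Shortened copy of the feed from the question (nowdoc, so nothing is interpolated).
$feed = <<<'XML'
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0">
<channel>
<link>http://www.example.com/</link>
<item>
<title>Group buys SunGard for US$10.4bil</title>
<link>http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1</link>
</item>
<item>
<title>Final News</title>
<link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link>
</item>
</channel>
</rss>
XML;

foreach (collectLinks($feed) as $link) {
    echo $link, "\n";
}
```

Run against the sample feed, this should print both item URLs in full, with a plain & between the parameters. Because the entities are already decoded at that point, a link that is written into HTML would need to go through htmlspecialchars() again.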
Re: [PHP] URL restriction on XML file
Hi Marek,

Thank you for the solution.

--
Roger
[PHP] URL restriction on XML file
I have a short script to parse my XML file. The parsing produces no errors and all output looks good, EXCEPT that URL links are truncated IF they contain '&amp;'. My XML file looks like this:

--- start of XML ---
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0">
<channel>
<title>Test News .Net - Newspapers on the Net</title>
<copyright>Small News Network.com</copyright>
<link>http://www.example.com/</link>
<description>Continuously updating Example News.</description>
<language>en-us</language>
<pubDate>Tue, 29 Mar 2005 18:01:01 -0600</pubDate>
<lastBuildDate>Tue, 29 Mar 2005 18:01:01 -0600</lastBuildDate>
<ttl>30</ttl>
<item>
<title>Group buys SunGard for US$10.4bil</title>
<link>http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1</link>
<description>NEW YORK: A group of seven private equity investment firms agreed yesterday to buy financial technology company SunGard Data Systems Inc in a deal worth US$10.4bil plus debt, making it the biggest lev...</description>
<source url="http://biz.theexample.com/">The Paper</source>
</item>
<item>
<title>Strong quake hits Indonesia coast</title>
<link>http://feeds.example.com/news/world/quake.html</link>
<description>a &quot;widely destructive tsunami&quot; and the quake was felt as far away as Malaysia.</description>
<source url="http://biz.theexample.com.net/">The Paper</source>
</item>
<item>
<title>Final News</title>
<link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link>
<description>We are going to expect something new this weekend ...</description>
<source url="http://biz.theexample.com/">The Paper</source>
</item>
</channel>
</rss>
--- end of XML ---

For the sake of testing, my script only prints the URL links of the news items above. I got these:

f=1
http://feeds.example.com/news/world/quake.html
cat=somecat

The output for line 1 is truncated to 'f=1' and the output for line 3 is truncated to 'cat=somecat', i.e., the script only took the last parameter of the URL. The output for line 2 is correct since it has NO parameters.

I am not sure what I have done wrong in my script. Is it because the RSS spec says that you cannot have parameters in a URL? Please advise.

-- start of script --
<?php
$file = "test.xml";
$currentTag = "";

function startElement($parser, $name, $attrs) {
    global $currentTag;
    $currentTag = $name;
}

function endElement($parser, $name) {
    global $currentTag, $TITLE, $URL, $start;
    switch ($currentTag) {
        case "ITEM":
            $start = 0;
        case "LINK":
            if ($start == 1)
                #print "<A HREF = \"".$URL."\">$TITLE</A><BR>";
                print $URL."<BR>";
            break;
    }
    $currentTag = "";
}

function characterData($parser, $data) {
    global $currentTag, $TITLE, $URL, $start;
    switch ($currentTag) {
        case "ITEM":
            $start = 1;
        case "TITLE":
            $TITLE = $data;
            break;
        case "LINK":
            $URL = $data;
            break;
    }
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

if (!($fp = fopen($file, "r"))) {
    die("Cannot locate XML data file: $file");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}

xml_parser_free($xml_parser);
?>
-- end of script --

TIA.
Roger