Re: [PHP] URL restriction on XML file

2005-03-30 Thread Marek Kilimajer
That's because the character data is split on the borders of the
entities, so for

http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1

characterData() will be called 5 times:

http://feeds.example.com/?rid=318045f7e13e0b66
&
cat=48cba686fe041718
&
f=1
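
To see the splitting directly, a minimal sketch along these lines may help. It records every chunk passed to the character data handler for one <link> element. The $chunks array and the inline $xml string are assumptions made for illustration, and the exact number of chunks can differ depending on the XML library PHP was built against, so only the joined result should be relied on.

```php
<?php
// Collect every chunk handed to the character data handler.
$chunks = [];

$parser = xml_parser_create();
xml_set_character_data_handler($parser, function ($p, $data) use (&$chunks) {
    $chunks[] = $data;   // one entry per handler call
});

// Hypothetical one-element document with two &amp; entities in the text.
$xml = '<link>http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1</link>';
xml_parse($parser, $xml, true);

// Show the individual pieces, then the URL rebuilt by joining them.
print_r($chunks);
$url = implode('', $chunks);
echo $url, "\n";
```

Joining the pieces gives back the complete URL with the entities already decoded to a plain "&", which is why appending inside the handler is enough.
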

The solution is inlined below.

Roger Thomas wrote:
> I have a short script to parse my XML file. The parsing produces no errors and
> all output looks good EXCEPT that URL links are truncated IF they contain the
> '&amp;' entity.
>
> My XML file looks like this:
> --- start of XML ---
> <?xml version="1.0" encoding="iso-8859-1"?>
> <rss version="2.0">
> <channel>
> <title>Test News .Net - Newspapers on the Net</title>
> <copyright>Small News Network.com</copyright>
> <link>http://www.example.com/</link>
> <description>Continuously updating Example News.</description>
> <language>en-us</language>
> <pubDate>Tue, 29 Mar 2005 18:01:01 -0600</pubDate>
> <lastBuildDate>Tue, 29 Mar 2005 18:01:01 -0600</lastBuildDate>
> <ttl>30</ttl>
> <item>
> <title>Group buys SunGard for US$10.4bil</title>
> <link>http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1</link>
> <description>NEW YORK: A group of seven private equity investment firms agreed
> yesterday to buy financial technology company SunGard Data Systems Inc in a deal worth
> US$10.4bil plus debt, making it the biggest lev...</description>
> <source url="http://biz.theexample.com/">The Paper</source>
> </item>
> <item>
> <title>Strong quake hits Indonesia coast</title>
> <link>http://feeds.example.com/news/world/quake.html</link>
> <description>a &quot;widely destructive tsunami&quot; and the quake was felt as far
> away as Malaysia.</description>
> <source url="http://biz.theexample.com.net/">The Paper</source>
> </item>
> <item>
> <title>Final News</title>
> <link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link>
> <description>We are going to expect something new this weekend ...</description>
> <source url="http://biz.theexample.com/">The Paper</source>
> </item>
> </channel>
> </rss>
> --- end of XML ---
>
> For the sake of testing, my script only prints out the URL links for the news
> items above. I got these:
> f=1
> http://feeds.example.com/news/world/quake.html
> cat=somecat
>
> The output for line 1 is truncated to 'f=1' and the output for line 3 is
> truncated to 'cat=somecat', i.e. the script only took the last parameter of the
> URL. The output for line 2 is correct since it has NO parameters.
>
> I am not sure what I have done wrong in my script. Is it because the RSS spec
> says that you cannot have parameters in a URL? Please advise.
>
> -- start of script --
> <?php
> $file = "test.xml";
> $currentTag = "";
>
> function startElement($parser, $name, $attrs) {
>     global $currentTag;
>     $currentTag = $name;
> }
>
> function endElement($parser, $name) {
>     global $currentTag, $TITLE, $URL, $start;
>
>     switch ($currentTag) {
>         case "ITEM":
>             $start = 0;
>         case "LINK":
>             if ($start == 1)
>                 #print "<A HREF = \"".$URL."\">$TITLE</A><BR>";
>                 print $URL."<BR>";
>             break;
>     }
>     $currentTag = "";

// Reset also other variables:
$URL = '';
$TITLE = '';

> }
>
> function characterData($parser, $data) {
>     global $currentTag, $TITLE, $URL, $start;
>
>     switch ($currentTag) {
>         case "ITEM":
>             $start = 1;
>         case "TITLE":
>             $TITLE = $data;

// append instead:
$TITLE .= $data;

>             break;
>         case "LINK":
>             $URL = $data;

// append instead:
$URL .= $data;

// Warning: entities are decoded at this point, you will receive &, not
// &amp;

>             break;
>     }
> }
>
> $xml_parser = xml_parser_create();
> xml_set_element_handler($xml_parser, "startElement", "endElement");
> xml_set_character_data_handler($xml_parser, "characterData");
>
> if (!($fp = fopen($file, "r"))) {
>     die("Cannot locate XML data file: $file");
> }
>
> while ($data = fread($fp, 4096)) {
>     if (!xml_parse($xml_parser, $data, feof($fp))) {
>         die(sprintf("XML error: %s at line %d",
>             xml_error_string(xml_get_error_code($xml_parser)),
>             xml_get_current_line_number($xml_parser)));
>     }
> }
>
> xml_parser_free($xml_parser);
> ?>
> -- end of script --
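
Pulling the inline changes together, one possible shape for the handlers is sketched below. It is an illustration under stated assumptions, not a drop-in replacement for the script above: parseItemLinks() and the $state array are hypothetical names, the feed is passed in as a string instead of being read from test.xml, and only <link> elements inside <item> are collected. The buffer is cleared when an element opens, every chunk is appended, and the value is read only when the element closes.

```php
<?php
// Sketch: buffer character data per element so that chunks split at
// entity borders are joined again. Names here are hypothetical.
function parseItemLinks(string $xml): array
{
    $state = ['inItem' => false, 'tag' => '', 'buf' => '', 'links' => []];

    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$state) {
            if ($name === 'ITEM') {
                $state['inItem'] = true;
            }
            $state['tag'] = $name;   // names arrive upper-cased (case folding is on by default)
            $state['buf'] = '';      // fresh buffer for each element
        },
        function ($p, $name) use (&$state) {
            if ($name === 'LINK' && $state['inItem']) {
                $state['links'][] = trim($state['buf']);
            }
            if ($name === 'ITEM') {
                $state['inItem'] = false;
            }
            $state['tag'] = '';
        }
    );
    xml_set_character_data_handler($parser, function ($p, $data) use (&$state) {
        if ($state['tag'] === 'LINK') {
            $state['buf'] .= $data;  // append, never assign
        }
    });

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }
    return $state['links'];
}

// Cut-down version of the feed from the original message.
$xml = <<<XML
<rss version="2.0"><channel>
<link>http://www.example.com/</link>
<item>
<title>Final News</title>
<link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link>
</item>
</channel></rss>
XML;

$links = parseItemLinks($xml);
foreach ($links as $link) {
    // The handler delivers a literal "&", so re-encode before emitting HTML.
    echo '<a href="', htmlspecialchars($link), '">', htmlspecialchars($link), "</a><br>\n";
}
```

Because the entities are already decoded when the handler sees them, the sketch passes each URL through htmlspecialchars() before writing it into HTML, so the "&" goes back out as "&amp;".
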
>
> TIA.
> Roger
>
> ---
> Sign Up for free Email at http://ureg.home.net.my/
> ---
--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php


Re: [PHP] URL restriction on XML file

2005-03-30 Thread Roger Thomas
Hi Marek,
Thank you for the solution.

--
Roger



[PHP] URL restriction on XML file

2005-03-29 Thread Roger Thomas
I have a short script to parse my XML file. The parsing produces no errors and
all output looks good EXCEPT that URL links are truncated IF they contain the
'&amp;' entity.

My XML file looks like this:
--- start of XML ---
<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0">
<channel>
<title>Test News .Net - Newspapers on the Net</title>
<copyright>Small News Network.com</copyright>
<link>http://www.example.com/</link>
<description>Continuously updating Example News.</description>
<language>en-us</language>
<pubDate>Tue, 29 Mar 2005 18:01:01 -0600</pubDate>
<lastBuildDate>Tue, 29 Mar 2005 18:01:01 -0600</lastBuildDate>
<ttl>30</ttl>
<item>
<title>Group buys SunGard for US$10.4bil</title>
<link>http://feeds.example.com/?rid=318045f7e13e0b66&amp;cat=48cba686fe041718&amp;f=1</link>
<description>NEW YORK: A group of seven private equity investment firms agreed
yesterday to buy financial technology company SunGard Data Systems Inc in a
deal worth US$10.4bil plus debt, making it the biggest lev...</description>
<source url="http://biz.theexample.com/">The Paper</source>
</item>
<item>
<title>Strong quake hits Indonesia coast</title>
<link>http://feeds.example.com/news/world/quake.html</link>
<description>a &quot;widely destructive tsunami&quot; and the quake was felt as
far away as Malaysia.</description>
<source url="http://biz.theexample.com.net/">The Paper</source>
</item>
<item>
<title>Final News</title>
<link>http://feeds.example.com/?id=abcdef&amp;cat=somecat</link>
<description>We are going to expect something new this weekend ...</description>
<source url="http://biz.theexample.com/">The Paper</source>
</item>
</channel>
</rss>
--- end of XML ---

For the sake of testing, my script only prints out the URL links for the news
items above. I got these:
f=1
http://feeds.example.com/news/world/quake.html
cat=somecat

The output for line 1 is truncated to 'f=1' and the output for line 3 is
truncated to 'cat=somecat', i.e. the script only took the last parameter of the
URL. The output for line 2 is correct since it has NO parameters.

I am not sure what I have done wrong in my script. Is it because the RSS spec
says that you cannot have parameters in a URL? Please advise.

-- start of script --
<?php
$file = "test.xml";
$currentTag = "";

function startElement($parser, $name, $attrs) {
    global $currentTag;
    $currentTag = $name;
}

function endElement($parser, $name) {
    global $currentTag, $TITLE, $URL, $start;

    switch ($currentTag) {
        case "ITEM":
            $start = 0;
        case "LINK":
            if ($start == 1)
                #print "<A HREF = \"".$URL."\">$TITLE</A><BR>";
                print $URL."<BR>";
            break;
    }
    $currentTag = "";
}

function characterData($parser, $data) {
    global $currentTag, $TITLE, $URL, $start;

    switch ($currentTag) {
        case "ITEM":
            $start = 1;
        case "TITLE":
            $TITLE = $data;
            break;
        case "LINK":
            $URL = $data;
            break;
    }
}

$xml_parser = xml_parser_create();
xml_set_element_handler($xml_parser, "startElement", "endElement");
xml_set_character_data_handler($xml_parser, "characterData");

if (!($fp = fopen($file, "r"))) {
    die("Cannot locate XML data file: $file");
}

while ($data = fread($fp, 4096)) {
    if (!xml_parse($xml_parser, $data, feof($fp))) {
        die(sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xml_parser)),
            xml_get_current_line_number($xml_parser)));
    }
}

xml_parser_free($xml_parser);

?>
-- end of script --

TIA.
Roger

