ID: 15092
User updated by: [EMAIL PROTECTED]
Reported By: [EMAIL PROTECTED]
Status: Open
Bug Type: XML related
Operating System: Win 2k (all, I guess)
PHP Version: 4.1.0
New Comment:

After some more tests I found that the only literal entities that work
are: &amp;, &lt;, &gt; and &quot;.
*ALL* others (like &nbsp;, &copy; and so on) cause an
XML_ERROR_UNDEFINED_ENTITY error.

The best workaround for this problem is to translate the entities
found in the XML source to their numeric equivalents, e.g. &nbsp; to
&#160; and &copy; to &#169;.
The following function will do the job:

  /**
  * Translate literal entities to their numeric equivalents and vice versa.
  *
  * PHP's XML parser (in V 4.1.0) has problems with entities! The only ones
  * that are recognized are &amp;, &lt;, &gt; and &quot;. *ALL* others (like
  * &nbsp;, &copy; and so on) cause an XML_ERROR_UNDEFINED_ENTITY error.
  * I reported this as a bug at http://bugs.php.net/bug.php?id=15092
  * The workaround is to translate the entities found in the XML source to
  * their numeric equivalents, e.g. &nbsp; to &#160; and &copy; to &#169;.
  *
  * NOTE: The entities &amp;, &lt;, &gt; and &quot; are left 'as is'.
  *
  * @author Sam Blum [EMAIL PROTECTED]
  * @param string $xmlSource The XML string
  * @param bool   $reverse (default=FALSE) Translate numeric entities to
  *                        literal entities.
  * @return string The XML string with translated entities.
  */
  function _translateLiteral2NumericEntities($xmlSource, $reverse = FALSE) {
    // Built once and cached across calls.
    static $literal2NumericEntity;

    if (empty($literal2NumericEntity)) {
      // Maps each character to its named entity, e.g. chr(160) => '&nbsp;'.
      $transTbl = get_html_translation_table(HTML_ENTITIES);
      foreach ($transTbl as $char => $entity) {
        // Skip the entities the XML parser already understands.
        if (strpos('&"<>', $char) !== FALSE) continue;
        // ord() reads one byte, so this assumes single-byte table keys.
        $literal2NumericEntity[$entity] = '&#'.ord($char).';';
      }
    }
    if ($reverse) {
      return strtr($xmlSource, array_flip($literal2NumericEntity));
    } else {
      return strtr($xmlSource, $literal2NumericEntity);
    }
  }
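
As a rough usage sketch of the workaround, the translation runs before the source is handed to the parser. To keep the snippet self-contained it uses a hypothetical two-entry table ($named2Numeric) instead of the full table built by the function above; the variable names are illustrative only.

```php
<?php
// Hypothetical two-entry table for illustration; the function above builds
// the full table from get_html_translation_table(HTML_ENTITIES).
$named2Numeric = array(
    '&nbsp;' => '&#160;',
    '&copy;' => '&#169;',
);

$xmlSource = '<AAA>&copy;&nbsp;2002</AAA>';

// Step 1: replace named entities with numeric character references.
$translated = strtr($xmlSource, $named2Numeric);

// Step 2: parse the translated string instead of the original one.
$parser = xml_parser_create();
$ok = xml_parse($parser, $translated, TRUE);
$errorCode = xml_get_error_code($parser);
```

Because only the entity names are rewritten, the element structure of the source is untouched, and the reverse mapping can restore the names afterwards if needed.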






Previous Comments:
------------------------------------------------------------------------

[2002-01-17 21:03:07] [EMAIL PROTECTED]

PHP's XML parser has problems with the full ISO 8859-1 character set when
entity names are used. E.g. the parser will fail with "undefined
entity" if the XML data you parse contains &nbsp;, &copy; and so on
(there are many more).

Some entities do work, like &lt; &gt; &amp;, as does the alternative
notation using the ISO code number, e.g. non-breaking space ===
&#160;
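
To illustrate the relation between the two notations, here is a small sketch that derives the numeric form from the ISO 8859-1 byte value of a character. The array names ($chars, $refs) are made up for this example, and it assumes single-byte ISO 8859-1 input.

```php
<?php
// ISO 8859-1 bytes for the non-breaking space and the copyright sign.
$chars = array('nbsp' => "\xA0", 'copy' => "\xA9");

$refs = array();
foreach ($chars as $name => $byte) {
    // ord() returns the byte value, which is the ISO 8859-1 code number.
    $refs['&' . $name . ';'] = '&#' . ord($byte) . ';';
}
// $refs now maps '&nbsp;' to '&#160;' and '&copy;' to '&#169;'.
```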

For a full ISO 8859-1 list and its entities see:
http://www.ramsch.org/martin/uni/fmi-hp/iso8859-1.html

Here's a test script you can use to reproduce the error:
<?php
$xmlString[0] = "<AAA>&#160;</AAA>";
$xmlString[1] = "<AAA>&nbsp;</AAA>";

  function startElement($xml_parser, $name, $attrs) {}
  function endElement($xml_parser, $name) {}
  function characterData($xml_parser, $text) {
    echo "Handling character data: '".htmlspecialchars($text)."'<br>";
  }
  
  $xml_parser = xml_parser_create();
  xml_set_element_handler($xml_parser, "startElement", "endElement");
  xml_set_character_data_handler($xml_parser,  "characterData");
  
  // Parse the XML data.
  if (!xml_parse($xml_parser, $xmlString[1], TRUE)) {
    echo "XML error on line " . xml_get_current_line_number($xml_parser) .
         ', column ' . xml_get_current_column_number($xml_parser) .
         '. Reason: ' . xml_error_string(xml_get_error_code($xml_parser));
  }
?>
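
A variation on the script above, as a sketch only: it runs both test strings through a fresh parser and compares the error code with the XML_ERROR_UNDEFINED_ENTITY constant. The helper name tryParse() and the $cases/$results arrays are hypothetical, and what gets reported for the named entity may depend on the PHP version and the underlying XML library.

```php
<?php
// Hypothetical helper: parse one string with a fresh parser and return
// the xml_parse() result together with the error code.
function tryParse($xml) {
    $parser = xml_parser_create();
    $ok = xml_parse($parser, $xml, TRUE);
    $code = xml_get_error_code($parser);
    return array($ok, $code);
}

$cases = array(
    'numeric' => '<AAA>&#160;</AAA>',
    'named'   => '<AAA>&nbsp;</AAA>',
);

$results = array();
foreach ($cases as $label => $xml) {
    list($ok, $code) = tryParse($xml);
    $results[$label] = $ok;
    echo $label . ': ' . ($ok ? 'parsed' : xml_error_string($code));
    if ($code === XML_ERROR_UNDEFINED_ENTITY) {
        echo ' (XML_ERROR_UNDEFINED_ENTITY)';
    }
    echo "\n";
}
```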



------------------------------------------------------------------------




