ID: 15092
Updated by: chregu
Reported By: [EMAIL PROTECTED]
Old Status: Open
Status: Feedback
Bug Type: XML related
Operating System: Win 2k (all I guess)
PHP Version: 4.1.0
New Comment:

This doesn't work because the only entities defined by default are:
<!ENTITY lt     "&#38;#60;"> 
<!ENTITY gt     "&#62;"> 
<!ENTITY amp    "&#38;#38;"> 
<!ENTITY apos   "&#39;"> 
<!ENTITY quot   "&#34;"> 

For the Latin-1 entities to work, you have to declare an external entity
that points to

http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent
(or a local file with the same content)
and then register a handler with xml_set_external_entity_ref_handler().
See the manual for details.

(Set to feedback because I didn't really test it. If someone can verify
that, we can close it.)



Previous Comments:
------------------------------------------------------------------------

[2002-01-22 09:49:09] [EMAIL PROTECTED]

After some more tests I found that the only literal entities that work
are &amp;, &lt;, &gt; and &quot;.
*ALL* others (like &nbsp;, &copy; and so on) cause an
XML_ERROR_UNDEFINED_ENTITY error.

The best workaround for this problem is to translate the entities
found in the XML source to their numeric equivalents, e.g. &nbsp; to
&#160; and &copy; to &#169;.
The following function will do the job:

  /**
  * Translate literal entities to their numeric equivalents and vice versa.
  *
  * PHP's XML parser (in V 4.1.0) has problems with entities! The only ones
  * that are recognized are &amp;, &lt;, &gt; and &quot;. *ALL* others (like
  * &nbsp;, &copy; and so on) cause an XML_ERROR_UNDEFINED_ENTITY error.
  * I reported this as a bug at http://bugs.php.net/bug.php?id=15092
  * The workaround is to translate the entities found in the XML source
  * to their numeric equivalents, e.g. &nbsp; to &#160; and &copy; to &#169;.
  *
  * NOTE: Entities &amp;, &lt;, &gt; and &quot; are left 'as is'.
  *
  * @author Sam Blum [EMAIL PROTECTED]
  * @param string $xmlSource The XML string
  * @param bool   $reverse (default=FALSE) Translate numeric entities to
  *                        literal entities.
  * @return string The XML string with translated entities.
  */
  function _translateLiteral2NumericEntities($xmlSource, $reverse = FALSE) {
    static $literal2NumericEntity;
    
    if (empty($literal2NumericEntity)) {
      $transTbl = get_html_translation_table(HTML_ENTITIES);
      foreach ($transTbl as $char => $entity) {
        if (strpos('&"<>', $char) !== FALSE) continue;
        $literal2NumericEntity[$entity] = '&#'.ord($char).';';
      }
    }
    if ($reverse) {
      return strtr($xmlSource, array_flip($literal2NumericEntity));
    } else {
      return strtr($xmlSource, $literal2NumericEntity);
    }
  }
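
The function above relies on get_html_translation_table() returning
single-byte Latin-1 characters, as it did in PHP 4.1.0. On later PHP
versions the table may default to UTF-8, in which case ord() would only
see the first byte of each character. A minimal sketch of a variant that
asks for the ISO-8859-1 table explicitly is shown below. The function name
namedToNumericEntities is hypothetical, and the third argument to
get_html_translation_table() is assumed to be available (PHP 5.3.4 or
later):

```php
<?php
/**
 * Hypothetical helper: map Latin-1 literal entities to numeric references.
 * Assumes the $encoding argument of get_html_translation_table() exists
 * (PHP 5.3.4 or later).
 */
function namedToNumericEntities($xmlSource, $reverse = false)
{
    static $map = null;

    if ($map === null) {
        $map = array();
        // Request the ISO-8859-1 table so that every key is exactly one byte
        // and ord() yields the character's code number.
        $table = get_html_translation_table(HTML_ENTITIES, ENT_COMPAT, 'ISO-8859-1');
        foreach ($table as $char => $entity) {
            // Leave the entities the XML parser already knows untouched.
            if (strpos('&"<>', $char) !== false) {
                continue;
            }
            $map[$entity] = '&#' . ord($char) . ';';
        }
    }

    // Forward: &nbsp; becomes &#160;. Reverse: &#160; becomes &nbsp;.
    return strtr($xmlSource, $reverse ? array_flip($map) : $map);
}

echo namedToNumericEntities('<AAA>&nbsp;&copy; &amp;</AAA>'), "\n";
```

Run the XML source through the helper before handing it to xml_parse(),
and call it with $reverse set to TRUE if the literal names are needed
again afterwards.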





------------------------------------------------------------------------

[2002-01-17 21:03:07] [EMAIL PROTECTED]

PHP's XML parser has problems with the full ISO-8859-1 character set when
entity names are used. E.g. the parser will fail with "undefined
entity" if the XML data you parse contains &nbsp;, &copy; and so on
(there are many more).

Some entities do work, like &lt;, &gt; and &amp;, as does the alternative
notation using the ISO code number, e.g. non-breaking space ===
&#160;

For a full ISO-8859-1 list and its entities see:
http://www.ramsch.org/martin/uni/fmi-hp/iso8859-1.html

Here's a test script you can use to reproduce the error:
<?php
$xmlString[0] = "<AAA>&#160;</AAA>";
$xmlString[1] = "<AAA>&nbsp;</AAA>";

  function startElement($xml_parser, $name, $attrs) {}
  function endElement($xml_parser, $name) {}
  function characterData($xml_parser, $text) {
    echo "Handling character data: '".htmlspecialchars($text)."'<br>";
  }
  
  $xml_parser = xml_parser_create();
  xml_set_element_handler($xml_parser, "startElement", "endElement");
  xml_set_character_data_handler($xml_parser, "characterData");
  
  // Parse the XML data. $xmlString[1] triggers the error, $xmlString[0] does not.
  if (!xml_parse($xml_parser, $xmlString[1], TRUE)) {
    echo "XML error on line " . xml_get_current_line_number($xml_parser) .
         ', column ' . xml_get_current_column_number($xml_parser) .
         '. Reason: ' . xml_error_string(xml_get_error_code($xml_parser));
  }
?>



------------------------------------------------------------------------



Edit this bug report at http://bugs.php.net/?id=15092&edit=1


-- 
PHP Development Mailing List <http://www.php.net/>