Behrooz Shabani wrote:

> I want to know why we have a preg_replace in the "common_xml_safe_str"
> function, which is in "lib/util.php:573"?

Because certain characters that are valid in UTF-8 are not legal in XML.
For example, control characters (below 0x20) other than tab, CR and LF
are not allowed.
Some parsers were having trouble trying to parse our feeds.
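
To illustrate the kind of filtering involved, here is a minimal sketch in
Python (not the PHP code in lib/util.php). It assumes the XML 1.0
definition of a legal character, and the names ILLEGAL_XML_CHARS and
xml_safe_str are made up for this example.

```python
import re

# Everything outside the XML 1.0 "Char" production: tab, LF, CR,
# U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF are legal.
ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def xml_safe_str(text, replacement="*"):
    """Replace characters that XML 1.0 forbids with a placeholder."""
    return ILLEGAL_XML_CHARS.sub(replacement, text)
```

With this narrower character class, a control character such as 0x00 is
replaced by *, while tab, LF, CR and ZWNJ (U+200C) pass through untouched.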

> It causes problems with some special characters like ZWNJ ( ‌ ). For
> example, look at http://identi.ca/api/statuses/show/2376006.xml , ZWNJ is
> converted to *

Yeah, it shouldn't do that with ZWNJ.  That's a bug.  The function is dumb
and being overly aggressive.  But some other formatting characters should
be stripped or replaced.  ZWNBSP, for example.  I chose to replace bad
chars in feeds with a * instead of stripping them out entirely because
that's what Twitter does (or did at the time).

> Also, I should mention that it would be better to convert UTF-8 characters
> to their equivalent entities by using htmlentities($text, ENT_COMPAT, "UTF-8")

The function does that; it's just more careful.  If an input string
contains an invalid UTF-8 sequence, htmlentities() will simply return
an empty string, so I send the string through iconv() first to make sure
that never happens.
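
The order of operations matters here, so a rough sketch may help. This is
Python rather than the actual PHP, and it makes two assumptions: that the
iconv() step drops invalid byte sequences, and that escaping only the
markup characters is enough for the illustration (htmlentities() converts
many more characters than html.escape() does). The function name
clean_then_escape is hypothetical.

```python
import html

def clean_then_escape(raw: bytes) -> str:
    """Discard invalid UTF-8 first, then escape markup characters."""
    # Assumption: invalid byte sequences are silently dropped, so the
    # escaping step below never sees malformed input.
    text = raw.decode("utf-8", errors="ignore")
    # Escapes &, <, >, and both quote characters.
    return html.escape(text, quote=True)
```

Escaping after cleaning means a single stray byte cannot cause the whole
string to be lost.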

> but the XMLWriter::text method will convert the & character to &amp;, so
> I think the best solution is:
>    1. convert text using htmlentities in common_xml_safe_str
>    2. replace XMLWriter::text with XMLWriter::writeCData in element
> method (lib/xmloutputter.php:133)

I don't think it's going to be that simple.  htmlentities() handles
pretty much everything necessary for creating HTML, but we've got
additional things to consider when making XML feeds.  What we'll
probably have to do is look specifically for characters like ZWNJ and
convert them to their named or numbered entities before sending them
through the sanitizing preg_replace().  Either that or come up with a
much better regexp.
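
A rough sketch of the first option, in Python rather than PHP: convert a
small whitelist of formatting characters to numeric entities, then replace
the remaining control and format characters. The whitelist, the use of
Unicode categories as a stand-in for the existing regexp, and the name
protect_then_sanitize are all assumptions made for the example.

```python
import unicodedata

# Hypothetical whitelist of formatting characters worth preserving.
KEEP_AS_ENTITY = {"\u200c", "\u200d"}  # ZWNJ, ZWJ

def protect_then_sanitize(text, replacement="*"):
    """Emit whitelisted characters as entities, replace other bad ones."""
    out = []
    for ch in text:
        if ch in KEEP_AS_ENTITY:
            # Numeric character reference, e.g. &#x200C; for ZWNJ.
            out.append("&#x{:X};".format(ord(ch)))
        elif ch in "\t\n\r":
            out.append(ch)
        elif unicodedata.category(ch) in ("Cc", "Cf"):
            # Other control (Cc) and format (Cf) characters, ZWNBSP included.
            out.append(replacement)
        else:
            out.append(ch)
    return "".join(out)
```

The result already contains entities, so whatever writes it out would have
to avoid escaping the & a second time.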

> related ticket http://laconi.ca/trac/ticket/1141

Zach


_______________________________________________
Laconica-dev mailing list
[email protected]
http://mail.laconi.ca/mailman/listinfo/laconica-dev
