Behrooz Shabani wrote:
> i mentioned it because actually we don't use htmlentities now! here is
> function:
>
> function common_xml_safe_str($str){
>     $xmlStr = htmlentities(iconv('UTF-8', 'UTF-8//IGNORE', $str),
>                            ENT_NOQUOTES, 'UTF-8');
>
>     // Replace control, formatting, and surrogate characters with '*',
>     // ala Twitter
>     return preg_replace('/[\p{Cc}\p{Cf}\p{Cs}]/u', '*', $str);
> }
>
> $str is passed to preg_replace, which is not converted by
> htmlentities! now if we pass $xmlStr to preg_replace we have a problem
> with XMLWriter::text
Yep, I see it. That's a pretty dumb mistake. :(
> because it will replace entities. for example after passing $xmlStr
> ZWNJ will convert to &zwnj; (like what we expect) but when the result
> passes to XMLWriter::text our character will convert to &amp;zwnj;
> (& => &amp;).
> ----------------------------------------------------------------------
> actually i think a mistake happened here (passing $str to preg_replace
> not $xmlStr) and after fixing it we need to use XMLWriter::writeCData
> instead of XMLWriter::text because it converts & to &amp;
Won't that add zillions of CDATA sections ("<![CDATA[...")? I don't
think we want that. If the text is good to go after going through
htmlentities(), then maybe we need to use XMLWriter::writeRaw() to avoid
double encoding.
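A rough sketch of the double encoding, assuming the stock XMLWriter
extension. The element name 't' and the sample string are made up for
illustration and are not from our code:

```php
<?php
// Illustration only: pre-encoded text passed through text() vs. writeRaw().
$encoded = htmlentities("a & b", ENT_NOQUOTES, 'UTF-8'); // "a &amp; b"

// text() escapes '&' again, so the entity gets double-encoded.
$w = new XMLWriter();
$w->openMemory();
$w->startElement('t');   // 't' is a hypothetical element name
$w->text($encoded);
$w->endElement();
$viaText = $w->outputMemory(); // <t>a &amp;amp; b</t>

// writeRaw() emits the string unchanged.
$w = new XMLWriter();
$w->openMemory();
$w->startElement('t');
$w->writeRaw($encoded);
$w->endElement();
$viaRaw = $w->outputMemory();  // <t>a &amp; b</t>
```

One caveat with writeRaw(): XML itself only predefines amp, lt, gt,
apos and quot, so named HTML entities such as &zwnj; would probably
need a DTD or numeric references to stay well-formed.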
BUT now I'm not sure we need to use htmlentities() at all. I think we
can just kill any bad characters before they get to XMLWriter::text().
Maybe make the regexp only take out control codes. (I can't remember now
whether we had problems with other UTF-8 chars or not).
i.e.:
function common_xml_safe_str($str)
{
    // neutralize control codes
    // see: http://www.w3.org/International/questions/qa-controls
    return preg_replace('/[\p{Cc}]/u', '*', $str);
    // or maybe
    // return preg_replace('/[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]/S', '*', $str);
}
Can you help test? We'd need to go back through the old tickets about
feed generation problems due to bad chars in the XML, as well as test
out ZWNJ (for Farsi?), etc.
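A quick check along those lines might look like the sketch below. The
function names and the sample input are hypothetical, chosen so they
don't clash with the real common_xml_safe_str(). It compares the two
candidate regexps on a string containing a control code, a newline and
a ZWNJ:

```php
<?php
// Hypothetical helper: the \p{Cc} variant.
function xml_safe_str_cc($str)
{
    return preg_replace('/[\p{Cc}]/u', '*', $str);
}

// Hypothetical helper: the explicit-range variant, which skips
// tab (0x09), LF (0x0A) and CR (0x0D).
function xml_safe_str_range($str)
{
    return preg_replace('/[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]/S', '*', $str);
}

$zwnj = "\xE2\x80\x8C";            // U+200C ZWNJ, category Cf (not Cc)
$in   = "a\x01b\n" . $zwnj . "c";  // made-up sample input

$outCc    = xml_safe_str_cc($in);    // "a*b*" . ZWNJ . "c"
$outRange = xml_safe_str_range($in); // "a*b\n" . ZWNJ . "c"
```

Both variants leave ZWNJ alone. The difference is that \p{Cc} also
matches tab, LF and CR (and U+007F to U+009F), so newlines inside a
notice would turn into '*', while the explicit range keeps them.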
Zach