andrei          Wed Aug  2 21:51:43 2006 UTC

  Modified files:
    /php-src    unicode-progress.txt
  Log:
  Notes after analyzing remainder of string.c.

http://cvs.php.net/viewvc.cgi/php-src/unicode-progress.txt?r1=1.33&r2=1.34&diff_format=u
Index: php-src/unicode-progress.txt
diff -u php-src/unicode-progress.txt:1.33 php-src/unicode-progress.txt:1.34
--- php-src/unicode-progress.txt:1.33   Wed Aug  2 20:31:51 2006
+++ php-src/unicode-progress.txt        Wed Aug  2 21:51:43 2006
@@ -9,10 +9,140 @@
 -------
     natsort(), natcasesort()
         Params API
-        Either port strnatcmp() to support Unicode or maybe use ICU's numeric collation
+        Either port strnatcmp() to support Unicode or maybe use ICU's
+        numeric collation. Update: can't seem to get the right collation
+        parameters to duplicate strnatcmp() functionality. Conclusion: port
+        to support Unicode.
 
 string.c
 --------
+    addcslashes()
+        Params API. Figure out how to escape characters > 255.
+
+    basename()
+        Create php_u_basename() without mbstring stuff
+
+    chunk_split()
+        Params API, Unicode upgrades. Split on codepoint level.
+
+    count_chars()
+        Params API. Do we really want to go through the whole Unicode table?
+        May need to use hashtable instead of array.
+
+    dirname()
+        Create php_u_dirname()
+
+    hebrev(), hebrevc()
+        Figure out if this is something we can use ICU for, internally.
+
+    localeconv()
+        Params API, update to use *_rt_* API.
+
+    money_format()
+        Just IS_UNICODE support with *_rt_* API.
+
+    nl_langinfo()
+        Params API, otherwise leave alone
+
+    nl2br()
+        Params API, IS_UNICODE support
+
+    pathinfo()
+        Simple upgrade, based on php_u_basename/php_u_dirname
+
+    parse_str()
+        Params API. How do we deal with encoding of the data?
+
+    quotemeta()
+        Params API, IS_UNICODE upgrade
+
+    similar_text()
+        Params API
+
+    sscanf()
+        Params API. Rest - no idea yet.
+
+    str_replace()
+        Params API, IS_UNICODE upgrade
+
+    stri_replace()
+        Params API, IS_UNICODE upgrade. Case-folding should be handled
+        similar to stristr().
+
+    str_rot13()
+        Params API, IS_UNICODE support
+
+    str_shuffle()
+        Params API, IS_UNICODE support
+
+    str_split()
+        IS_UNICODE support, split on codepoint level.
+
+    str_word_count()
+        Params API, IS_UNICODE support, using u_isalpha(), etc.
+
+    strcoll()
+        Params API, upgrade to use Collator if TT == IS_UNICODE, test
+
+    stripcslashes()
+        Params API. Depends on how addcslashes() is implemented.
+
+    stristr()
+        This is the problematic one. There are a few approaches:
+
+        1. Case-fold both need and haystack and then do simple search.
+
+        2. Look at the implementation behind functions like
+           u_strcasecmp() and try to adapt it to a string search. The
+           implementation case-folds both strings incrementally. For
+           a search, one would want to case-fold the pattern beforehand,
+           but not the text in which you are searching.
+
+        3. Take the first character in the pattern and get the set of
+           all characters that have the same case folding (see the
+           UnicodeSet/USet API). Then search in the string for the
+           occurrence of any one of the set items (which include
+           strings!). Then do a case-insensitive comparison, allowing
+           a match that does not end with the end of the text.
+
+        The problematic cases are of course those ß->ss and similar.
+
+        All other approaches bite.
+
+    stripos()
+        Review. Probably needs the same approach as stristr().
+
+    strnatcmp(), strnatcasecmp()
+        Params API. The rest depends on porting of strnatcmp.c
+
+    strripos()
+        Probably needs the same approach as stristr().
+
+    strrchr()
+        Needs update so that it doesn't try to find half of a surrogate
+        pair.
+
+    strrev()
+        Params API
+
+    strtoupper(), strtolower(), strtotitle()
+        Params API
+
+    strtr()
+        Check on Derick's progress.
+
+    substr_compare()
+        IS_UNICODE support, case folding based on the same algorithm as
+        stristr().
+
+    substr_replace()
+        Params API, test
+
+    wordwrap()
+        Upgrade, do wordwrapping on glyph level, maybe use additional
+        whitespace chars instead of just space.
+
+
 
 
 Completed
@@ -157,4 +287,4 @@
     zend_thread_id()
     zend_version()
 
-vim: set et ts=4 sts:
+vim: set et ts=4 sts=4:
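
Editorial illustration, not part of the commit: the stristr() notes list "case-fold both strings, then do a simple search" as approach 1 and name ß->ss as the problematic case. The sketch below shows that approach under stated assumptions. Python's str.casefold() stands in for ICU's full case folding, and the names fold_with_offsets() and stristr_sketch() are hypothetical, not anything in php-src. Because folding can lengthen the text, the sketch keeps a map from folded positions back to original positions so it can return a slice of the original haystack.

```python
def fold_with_offsets(text):
    """Case-fold text one code point at a time.

    Returns the folded string and a list that maps every folded code
    point back to the index of the original code point it came from.
    """
    folded = []
    origin = []
    for i, ch in enumerate(text):
        f = ch.casefold()            # may expand, e.g. "ß" becomes "ss"
        folded.append(f)
        origin.extend([i] * len(f))  # every expanded unit points at i
    return "".join(folded), origin


def stristr_sketch(haystack, needle):
    """Return haystack from the first caseless match of needle, else None."""
    if not needle:
        # Assumption for this sketch: an empty needle matches at offset 0.
        return haystack
    folded_hay, origin = fold_with_offsets(haystack)
    pos = folded_hay.find(needle.casefold())
    if pos < 0:
        return None
    # Translate the folded offset back to an offset in the original text.
    return haystack[origin[pos]:]


if __name__ == "__main__":
    print(stristr_sketch("Die Straße", "STRASSE"))
    print(stristr_sketch("Maße", "se"))
```

The second call is the awkward case: the match for "se" begins in the middle of the folded "ss", so the only original position available is the ß itself and the result starts at a character that matches only partially. A real implementation would have to decide whether to accept or reject such matches, which is presumably part of why the notes call this function the problematic one.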