On Wed, 2007-03-28 at 14:39 -0700, Erick Tryzelaar wrote:
> I've been reading up on unicode, and I don't think our strings are
> unicode safe. Even though we can embed unicode characters in our
> strings, it looks like stl::string (which we use for our strings)
> doesn't like variable length character code points. So, functions like
> find, trim, regmatch, and etc won't do anything sensible with chars
> wider than 1 char (as far as I know).

Ops like trim/strip will simply miss some whitespace; they won't do
the wrong thing, provided they treat chars with the high bit set as
non-space.

> How can we deal with this?

We don't; it needs user libraries, some of which are extremely
expensive because they need tables of 30,000 characters.

> A lot
> of other new languages are moving towards using utf-8 and utf-16 for
> their strings. Should we follow them? We'd have to recode everything
> though. We could use something like this:
>
> http://utfcpp.sourceforge.net/

Licence not visible, but it looks like the BOOST licence, which is cool.

> But it might be better to just write it in felix.

Probably better in C++ because then it is pre-compiled. Felix code
gets compiled every time. Also, considerable expertise is required to
maintain a library, so why not let the project team working on it do
that? We're short of resources.

But I have no idea if that library is useful: our RTL already has
a UTF-8 codec I wrote.

-- 
John Skaller <skaller at users dot sf dot net>
Felix, successor to C++: http://felix.sf.net

_______________________________________________
Felix-language mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/felix-language
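
To make the trim point concrete, here is a minimal C++ sketch, assuming the string holds UTF-8 bytes in a std::string. The names is_ascii_space, trim_ascii and utf8_length are hypothetical helpers made up for illustration; they are not taken from the Felix RTL or from utfcpp. The trim only strips ASCII whitespace and treats every byte with the high bit set as non-space, so it never cuts a multi-byte sequence in half, but it does leave non-ASCII whitespace such as U+00A0 in place.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true only for the six ASCII whitespace bytes.
// Any byte >= 0x80 (high bit set) is reported as non-space.
static bool is_ascii_space(unsigned char c) {
    return c == ' ' || c == '\t' || c == '\n' ||
           c == '\r' || c == '\v' || c == '\f';
}

// Hypothetical helper: byte-wise trim of a UTF-8 encoded std::string.
// It only ever removes ASCII bytes, so multi-byte sequences stay intact.
static std::string trim_ascii(const std::string& s) {
    std::size_t b = 0, e = s.size();
    while (b < e && is_ascii_space(static_cast<unsigned char>(s[b]))) ++b;
    while (e > b && is_ascii_space(static_cast<unsigned char>(s[e - 1]))) --e;
    return s.substr(b, e - b);
}

// Hypothetical helper: count code points by skipping UTF-8 continuation
// bytes (bit pattern 10xxxxxx). Assumes the input is well-formed UTF-8.
static std::size_t utf8_length(const std::string& s) {
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For example, trimming "  caf\xC3\xA9 \t" (the word "café" with surrounding ASCII whitespace) yields the five bytes of "café", for which size() reports 5 while utf8_length reports 4. Trimming "\xC2\xA0x " removes the trailing space but keeps the leading U+00A0, which is the "miss some whitespace" case.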
