Hong Zhang wrote:
>
> I think it will be relative easy to deal with different compiler
> and different operating system. However, ICU does contain some
> C++ code. It will make life much harder, since current Parrot
> only assume ANSI C (even a subset of it).
>
> Hong
>
> > This is rather concerning to me. As I understand it, one of
> > the goals for
> > parrot was to be able to have a usable subset of it which is totally
> > platform-neutral (pure ANSI C). If we start to depend too much on
> > another library which may not share that goal, we could have trouble
> > with the parrot build process (which was supposed to be
> > shipped as parrot bytecode)

I guess it's obvious that I hadn't looked at the target platforms for ICU as closely as I probably should have. C vs. C++ doesn't concern me, as it can always be rewritten, but the lack of support for platforms like OS X does.

Given that, I think an interim solution is in order, consisting of the basic Unicode utilities we'll need, such as Unicode_isdigit(). This can be a simple wrapper around isdigit() for the moment, until I sort out which files we need from the Unicode database, and what support functions/data structures will be required.

Given that we're dedicated to either UTF-16 or UTF-32 for internal string representation (undecided as of yet, and not affected by this proposal), we can get away with creating a simple unicode.{c,h} suite of functions that looks like:

    Parrot_Int Parrot_isDigit(char* glyph);

We can get away with the simplicity here because the character array should already be a valid UTF-{16,32} string, and responsibility for making sure there's a valid glyph at that offset can be safely offloaded to the caller, if not higher up the calling chain. Also, it should be in a separate file because, assuming the final internal representation matches that of the RE engine, the engine can use these utilities as well.

Now, admittedly this is only slightly better thought out than the original proposal, but I think it has a much better chance of being implemented, and in a fairly short amount of time. (He said, knowing full well that there's always one more problem.)

ASCII versions of the functions should be almost trivial, and can be left in there as a compile-time switch should we choose to do an ASCII-only or UTF-8-only version.

In conclusion, this approach feels more workable, and the full UTF-16 implementation details can be rolled out incrementally, rather than in a single mass migration. If this suggestion flies, I'll rewrite strings.pdd and post it in the next few days.

-- 
Jeff <[EMAIL PROTECTED]>
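
A minimal sketch of what the interim Parrot_isDigit() wrapper described above might look like, in plain ANSI C. Several things here are assumptions for illustration only: the typedef for Parrot_Int (the real one lives elsewhere in Parrot), the macro name PARROT_ASCII_ONLY for the compile-time switch, and the stubbed-out non-ASCII branch, which only marks where a lookup against the Unicode database would eventually go.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Assumption: Parrot_Int is some plain integer type. */
typedef long Parrot_Int;

/* Hypothetical compile-time switch; the macro name is invented here. */
#ifndef PARROT_ASCII_ONLY
#define PARROT_ASCII_ONLY 1
#endif

/* Returns 1 if the glyph at the given position is a digit, 0 otherwise.
 * The caller is responsible for pointing at a valid glyph. */
Parrot_Int Parrot_isDigit(char *glyph)
{
    /* Debug-build check only; validity is the caller's job. */
    assert(glyph != NULL);

#if PARROT_ASCII_ONLY
    /* Interim version: defer to the C library. The cast to unsigned char
     * avoids undefined behaviour when plain char is signed. */
    return isdigit((unsigned char)*glyph) != 0;
#else
    /* Placeholder: a lookup driven by the Unicode database would go here
     * once the internal representation (UTF-16 or UTF-32) is settled. */
    (void)glyph;
    return 0;
#endif
}
```

With the switch left at its default, a caller holding a buffer such as "7" would get 1 back, and 0 for a buffer such as "x". Keeping the function in its own unicode.{c,h} pair means the RE engine could call the same entry point without pulling in the rest of the string code.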