Jeff Clites <[EMAIL PROTECTED]> wrote:
> On Apr 10, 2004, at 6:13 AM, Leopold Toetsch wrote:

>> 2) String PBC layout. The internal string type has changed. This
>> currently breaks native_pbc tests (that have strings) as well as some
>> "parrot xx.pbc" tests related to strings.

> These are working for me (which tests are failing for you?)--

$ make testr
Failed Test        Stat Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------------
t/pmc/perlstring.t    3   768    33    3   9.09%  1-3
53 subtests skipped.
Failed 1/89 test scripts, 98.88% okay. 3/1432 subtests failed, 99.79% okay.

I didn't look further yet.

> ... Of
> course, since the internals changed the pbc layout changed also, so the
> native_pbc test files need to be regenerated on the various
> platforms

No problem.

> But, it's correct that there's no backward-compatibility code in place,
> to allow reading old pbc files. Do we want to have that sort of thing
> at this stage?

No, not needed.

>> The layout seems to depend somehow on the supported Unicode levels (or
>> not). So before fixing the PBC issues, I'd just have a statement:
>> parrot_string_t looks such and such or of course as is now.

> Could you rephrase? I'm not understanding what you are saying.

Well, the question is: Is s->representation enough to describe our strings?

> ... The only other wrinkle is that for cases where
> s->representation is 2 or 4, we need to endianness correct when we use
> the bytecode.

Yep.

> This is probably a separate discussion, but we _could_ decide instead
> to represent strings in pbc files always in UTF-8. Advantage: Simpler,
> no endianness correction needed, probably durable to further changes in
> string internals, could isolate s->representation awareness to string.c
> and string_primitives.c. Disadvantages: De-serializing a string from a
> pbc file will always involve a copy, and could result in larger files
> in some cases. I could argue it either way--one's cleaner, the other is
> probably faster.

Strings from PBC constants can't be used directly anyway. We munmap() or
free() the image after loading, so string constants are always copied.
I think using UTF-8 would be best.

>> There is of course still the question: Should we really have ICU in
>> the tree. This needs tracking updates and patching (again) to make it
>> build and so on.

> One consideration is that I may need to patch ICU a few places--there's
> at least one API which they only expose in C++, so I need to wrap it in
> C and it's cleaner to do that as a patch to ICU rather than having C++
> code in the core of parrot.

Can we get the ICU maintainers to integrate that interface?

> JEff

leo
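
P.S. To make the endianness point concrete, here is a rough sketch of the per-string work a loader would need when a constant is stored as 16-bit code units (the s->representation == 2 case). The function name copy_units16 and the file_is_little_endian flag are made up for illustration; this is not Parrot's actual packfile code, only the general technique of decoding with an explicit byte order during the copy.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of Parrot: copy n 16-bit code units from a
 * loaded bytecode image into dst, decoding each unit with the byte order
 * the file was written in. Reading through an unsigned char pointer avoids
 * alignment problems and works the same on any host byte order.
 */
static void
copy_units16(uint16_t *dst, const unsigned char *src, size_t n,
             int file_is_little_endian)
{
    size_t i;

    for (i = 0; i < n; i++) {
        const unsigned char b0 = src[2 * i];
        const unsigned char b1 = src[2 * i + 1];

        /* Assemble the code unit from its two bytes in file order. */
        dst[i] = file_is_little_endian
            ? (uint16_t)(b0 | (b1 << 8))
            : (uint16_t)((b0 << 8) | b1);
    }
}
```

With UTF-8 in the file this step would not exist: UTF-8 is a plain byte sequence, so the copy we do anyway could be a straight memcpy() regardless of which platform wrote the pbc.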