Re: [PATCH] Get string.c to compile in MS VC++
On Apr 11, 2004, at 7:05 PM, Jeff Clites wrote: On Apr 11, 2004, at 4:52 PM, Jonathan Worthington wrote: On Apr 9, 2004, at 3:26 PM, Jonathan Worthington wrote: I'm having a crack at getting the ICU changes building on Win32.

Failed Test     Status  Wstat  Total  Fail  Failed  List of Failed
t\op\integer.t       1    256     39     1   2.56%  4
t\op\number.t       32   8192     38    32  84.21%  1-24, 27, 29, 31, 34-38

Yep, the pattern I think I see is that tests which involve floats (except for 0.0) are failing, which is what would happen if ICU can't find its data. (Because it either ends up thinking that nothing is a digit character, or that they all have digit value zero.) I'll have to add a quick test inside string_init to detect this case, so that we can blow up instead of misbehaving. ...

See if you have a .dat file (or a bunch of individual files) in blib/lib/icu/2.6.1 (relative to your parrot source root). If not, then that's what's going on. Right now, I have that path hard-coded--of course I need to pull that out into a config

I've submitted a patch, [perl #28473], which makes that location configurable (still defaulting to blib/lib/icu/2.6.1), and which will cause parrot to complain if it can't find the necessary data files. That should also cause the build to fail near the end if something is wrong, when it tries to create library/config.fpmc. That doesn't fix your case, but at least it should make such failures clearer, and easier to fix.

JEff
Re: [perl #28393] [PATCH] Tcl pmcs
Will Coleda [EMAIL PROTECTED] wrote:

dyld: ./parrot Undefined symbols: _Parrot_tclobject_morph _Parrot_tclobject_set_pmc

Ah yes. That's ugly. So here we go:

0) The PMCs were pre-ICU. I've adapted them. Should I check it in or send it back to you?

1) Dynamic PMCs need a dynpmc flag on the class definition line. This causes the PMC compiler to add additional code for dynamic loading:

    pmclass TclString extends tclobject dynpmc {

2) Nasty dependencies. I got around that by changing the Makefile like so:

    tclobject$(SO) : tclobject.c
        $(LD) $(LD_SHARED) $(LD_SHARED_FLAGS) $(LDFLAGS) -Wl,-E -o $@ \
            -I../include -I../classes \
            -L../blib/lib -lparrot $<
        $(PERL) -MFile::Copy=cp -e 'cp q|$@|, q|../runtime/parrot/dynext/$@|'
        cd ../runtime/parrot/dynext; ln -sf tclobject.so libtclobject.so

    %$(SO) : %.c
        $(LD) $(LD_SHARED) $(LD_SHARED_FLAGS) $(LDFLAGS) -Wl,-E -o $@ \
            -I../include -I../classes \
            -L../blib/lib -lparrot -L../runtime/parrot/dynext \
            -ltclobject $<
        $(PERL) -MFile::Copy=cp -e 'cp q|$@|, q|../runtime/parrot/dynext/$@|'

(and disabling non-tcl shared classes for now)

That is, for all tcl* but tclobject, libtclobject.so is added as a library. This might also need LD_LIBRARY_PATH to contain runtime/parrot/dynext. I don't know how we do that in a platform-independent way.

It might be simpler to have a utility that copies all tcl*.c together into one file and chains the Parrot_lib_type_load() functions, so that only one shared lib would be loaded that registers all classes. Should be a rather simple script. E.g.

    merge-classes -o tcl-all.c tclobject.c tclstring.c ...

Then compile and loadlib only the tcl-all. This would need a Parrot_lib_tcl-all_load() function that calls _load() for all contained PMCs.

3) For now

    $ cat tcl.pasm
        loadlib P10, "tclobject"
        print "ok 1\n"
        loadlib P11, "tclstring"
        print "ok 2\n"
        new P1, .TclString
        set P1, "ok 3\n"
        set S1, P1
        print S1
        end
    $ parrot tcl.pasm
    ok 1
    ok 2
    ok 3

leo
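For illustration only, the "one shared lib that registers all classes" idea could be sketched in C roughly as follows. Everything here is hypothetical: the `load_fn` type, the stub loaders and the name `Parrot_lib_tcl_all_load` are made up for the sketch (an underscore replaces the hyphen because `tcl-all` is not a valid C identifier), and real Parrot load functions take an interpreter and register PMC classes rather than bumping a counter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-class load function:
 * returns nonzero on success. */
typedef int (*load_fn)(void);

static int classes_registered = 0;

/* Stub loaders; a merge script would collect the real ones. */
static int tclobject_load(void) { classes_registered++; return 1; }
static int tclstring_load(void) { classes_registered++; return 1; }
static int tclint_load(void)    { classes_registered++; return 1; }

/* The parent class goes first so children can refer to it. */
static const load_fn tcl_loaders[] = {
    tclobject_load,
    tclstring_load,
    tclint_load
};

/* Accessor so callers can see how many classes were registered. */
int tcl_classes_registered(void)
{
    return classes_registered;
}

/* Single entry point: call every contained loader in order and
 * stop at the first failure. */
int Parrot_lib_tcl_all_load(void)
{
    size_t i;
    size_t n = sizeof tcl_loaders / sizeof tcl_loaders[0];

    for (i = 0; i < n; i++) {
        assert(tcl_loaders[i] != NULL);
        if (!tcl_loaders[i]())
            return 0;
    }
    return 1;
}
```

With a single entry point like this, only one library has to be found at load time, which would sidestep the LD_LIBRARY_PATH question for the dependent classes.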
Re: [perl #28182] thr-primes.imc segfaults
Will Coleda [EMAIL PROTECTED] wrote: bash-2.05a$ ./parrot examples/assembly/thr-primes.imc SNIP Found prime 401 Found prime 409 Found prime 419 Found prime 421 Segmentation fault Fixed. The interpreter->lo_var_ptr for threads wasn't correctly initialized, so trace_mem_block went into other threads' stacks. leo
Re: [PATCH] Get string.c to compile in MS VC++
snip See if you have a .dat file (or a bunch of individual files) in blib/lib/icu/2.6.1 (relative to your parrot source root). If not, then that's what's going on. Right now, I have that path hard-coded--of course I need to pull that out into a config--but it probably means that either the data files aren't getting created, or just that they are in a different location. Glancing at your icu.pl patch, it may just be missing moving the .dat file (or, maybe creating it too). They are missing, and in fact weren't even being created. Turns out I'd somehow managed to miss a line out in the makefile, namely the one that made the data. D'oh. I added it in, but this gave rise to new problems. The .mak file (in icu/source/data) was missing some paths so some of the tools were not being found. That was easily fixed, and now it gets quite a way through making the data, until it hits a point where it starts giving errors like this:- -- Making Locale Resource Bundle files ..\..\locales\root.txt:39: warning: %Collation could not be constructed from CollationElements - check context! ..\..\locales\root.txt:37: parse error. Stopped parsing with U_INVALID_FORMAT_ERROR couldn't parse the file ..\..\locales\root.txt. Error:U_INVALID_FORMAT_ERROR ..\..\locales\ar.txt:16: warning: %Collation could not be constructed from CollationElements - check context! ..\..\locales\ar.txt:14: parse error. Stopped parsing with U_INVALID_FORMAT_ERROR couldn't parse the file ..\..\locales\ar.txt. Error:U_INVALID_FORMAT_ERROR ..\..\locales\ca.txt:12: warning: %Collation could not be constructed from CollationElements - check context! ..\..\locales\ca.txt:10: parse error. Stopped parsing with U_INVALID_FORMAT_ERROR -- Any ideas? [**] There are 4 or so options for how to package the ICU data--into a dynamic library, into an archive library, as a bunch of separate files, or as a single file. Any should work, but the last two are faster to build, and a bit more flexible. 
Looking at the makefile, I believe it would package the data in each of these ways, then we'd just copy the .dat into the appropriate place. Once I've got it to actually work, I'll see if I can get it down to generating the .dat only. Jonathan
Re: Build problems in i386 linux
FWIW, taking a look at the Debian-specific ICU patches for their testing branch might be worthwhile. They've patched the thing up to get it to build, and since it's the only system I have locally that *doesn't* build ICU, I suspect there's something either Debian-ish going wrong, or more Linux in general. I'll try and get patches to use the system ICU install if there is one done today. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: [perl #28393] [PATCH] Tcl pmcs
Will Coleda [EMAIL PROTECTED] wrote: Attached, find a .tgz that can be exploded in the top level of parrot which creates the abstract pmc tclobject, with children TclString, TclInt, TclFloat, and container pmcs TclList (an array) and TclArray (a hash). Applied now, slightly modified for dynclasses and with string-related adaptions due to API changes. leo
ICU bug some places
According to the IBM website, ICU triggers a GCC bug in the gcc 3.x series--you can't compile with the -O2 optimization setting. -O3 works. I'll put a patch in. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
ICU fixed
I just checked in a patch for the problems building the data files. If the folks having problems could try it out that'd be great. (Works for me locally, but...) -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: ICU fixed
On Monday 12 April 2004 17:01, Dan Sugalski wrote: I just checked in a patch for the problems building the data files. If the folks having problems could try it out that'd be great. (Works for me locally, but...) Seems to work; I now get "All tests successful" on Linux. For details see www.luusa.org/~marcus/parrottest Have fun, Marcus -- :: Marcus Thiesen :: www.thiesen.org :: ICQ#108989768 :: 0x754675F2 :: Let's say the docs present a simplified view of reality... :-) Larry Wall
Plans for string processing
Okay, I've not dug through all the fallout from the ICU checkin, but I can see there's an awful lot. I'll dig through that in a bit, but...

Here's the plan. We've gone over it in the past, but I'm not sure everything's been gathered together, so it's time to do so.

Some declarations:

1) Parrot will *not* require Unicode. Period. Ever. (Well, upon release, at least) We will strongly recommend it, however, and use it if we have it.

2) Parrot *will* support multiple encodings (the bytes->code points stuff), character sets (code points->meaning of a sort), and language-specific overrides of character set behaviour.

3) All string data can be dealt with as either a series of bytes, code points, or characters. (Characters are potentially multiple code points--basically combining character stuff from those standards that do so)

4) We will *not* use ICU for core functions. (string to number or number to string conversions, for example)

5) Parrot will autoconvert strings as needed. If a string can't be converted, parrot will throw an exception. This goes for language, character set, or encoding.

6) There *may* be an overriding set of rules for throwing conversion exceptions. (They may be suppressed on lossy conversions, or required for any conversions)

7) There *may* be an overriding language used for language-specific operations (case folding or sorting).

I know ICU's got all sorts of nifty features, but bluntly we're not going to use most of them. The original split of encoding, character set, and language is one that I want to keep. I know we've lost a good chunk of that with the latest ICU patch, but that's only temporary and the breakage is worth it to get Unicode actually in use. I expect I need to step up to the plate and get an alternate encoding and charset in, so I'll probably take a shot at JIS X 0208:1997 or CNS11643-1992. (Or whatever the current version of those is)

As far as Parrot is concerned, a string is a series of bytes which may, via its encoding, be turned into a series of 32-bit integer code points. Those 32-bit integer code points can be turned, via its character set, into a series of characters where each character is one or more code points. Those characters may be classified and transformed based on the language of the string.

The responsibilities of the three layers are:

Encoding
========

*) Transform stream of bytes to and from a set of 32-bit integers
*) Manages byte buffer (so buffer positioning and manipulation by code point offset is handled here)

Character set
=============

*) Provides default manipulation and comparison behaviour (sorting and case mangling)
*) Provides default character classifications (digit, word char, space, punctuation, whatever)
*) Provides code point and character manipulation. (substring functionality, basically)
*) Provides integrity features (exceptions if a string would be invalid)

Language
========

*) Provides language-sensitive manipulation of characters (case mangling)
*) Provides language-sensitive comparisons
*) Provides language-sensitive character overrides ('ll' treated as a single character, for example, in Spanish if that's still desired)
*) Provides language-sensitive grouping overrides.

Since examples are good, here are a few. They're in an "If we/Then Parrot" format.

IW: Mush together (either concatenate or substr replacement) two strings of different languages but same charset
TP: Checks to see if that's allowed. If not, an exception is thrown. If so, we do the operation. If one string is manipulated the language stays whatever that string was. If a new string is created either the left side wins or the default language is used, depending on the interpreter setting.

IW: Mush together two strings of different charsets
TP: If the two strings can be losslessly converted to one of the two charsets, do so, otherwise transform to Unicode and mush together. If transformation is lossy optionally throw an exception (or warning). Language rules above still apply.

IW: Force a conversion to a different character set
TP: Does it. An exception or warning may be thrown if the conversion is not lossless.

Please note that in most cases parrot deals with string data as *strings* in S registers (or hiding behind PMCs) not as integers in I registers (even though we treat strings as a series of abstract integer code points). This is because even something as simple as "give me character 5" may return a series of code points if character 5 is a combining character set. We may (possibly, but possibly not) get a bit dirtier for the regex code for speed reasons, but we'll see about that.

Also note that some languages, such as perl 6, have a more restricted view of things. That's fine, but we don't really care much as long as everything that they need is provided, so the fact that Larry's mandated the Ux levels is fine, but as they're a (possibly excessively) restricted subset of what we're going to do
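To make the "mush together two strings of different charsets" rule above concrete, here is a minimal C sketch of just the decision step. It is not Parrot code: the `charset_id` enum, the `lossless` table and `pick_concat_charset` are invented for illustration, and the table entries are toy values rather than statements about the real character sets.

```c
#include <assert.h>

/* Hypothetical charset identifiers, for illustration only. */
typedef enum {
    CS_ASCII,
    CS_LATIN1,
    CS_JISX0208,
    CS_UNICODE,
    CS_COUNT
} charset_id;

/* lossless[from][to] is nonzero when every character of `from`
 * is assumed to be representable in `to`. Toy values. */
static const int lossless[CS_COUNT][CS_COUNT] = {
    /* to:          ASCII  LATIN1  JISX0208  UNICODE */
    /* ASCII    */ { 1,     1,      0,        1 },
    /* LATIN1   */ { 0,     1,      0,        1 },
    /* JISX0208 */ { 0,     0,      1,        1 },
    /* UNICODE  */ { 0,     0,      0,        1 }
};

/* Pick the charset for the result of joining a `left` string and
 * a `right` string: prefer one of the two charsets if the other
 * string converts into it without loss, else fall back to Unicode. */
charset_id pick_concat_charset(charset_id left, charset_id right)
{
    assert((int)left >= 0 && (int)left < CS_COUNT);
    assert((int)right >= 0 && (int)right < CS_COUNT);

    if (left == right)
        return left;
    if (lossless[right][left])
        return left;    /* right fits into left's charset */
    if (lossless[left][right])
        return right;   /* left fits into right's charset */
    return CS_UNICODE;
}
```

The sketch checks the left charset first, which is only one possible tie-break; the plan above leaves that choice, and whether a lossy fallback throws, to interpreter settings.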
Re: ICU fixed
At 5:23 PM +0200 4/12/04, Marcus Thiesen wrote: On Monday 12 April 2004 17:01, Dan Sugalski wrote: I just checked in a patch for the problems building the data files. If the folks having problems could try it out that'd be great. (Works for me locally, but...) Seems to work; I now get "All tests successful" on Linux. For details see www.luusa.org/~marcus/parrottest Glad it's working. I see you've a selection of machines there. Any reason to not just drop them into the current tinderbox system? -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: ICU fixed
On Monday 12 April 2004 17:46, Dan Sugalski wrote: I see you've a selection of machines there. Any reason to not just drop them into the current tinderbox system? Most of the machines don't run 24/7 and I didn't really understand the existing system, so I did it my way(TM). I didn't really like what I saw of the old system, and what I did is more or less historically grown, so I'll keep it that way, but I could forward these mails to any location. Have fun, Marcus -- :: Marcus Thiesen :: www.thiesen.org :: ICQ#108989768 :: 0x754675F2 :: There are worse things in life than death. Have you ever spent an evening with an insurance salesman? Woody Allen
Re: [PATCH] Get string.c to compile in MS VC++
On Apr 12, 2004, at 5:33 AM, Jonathan Worthington wrote: snip See if you have a .dat file (or a bunch of individual files) in blib/lib/icu/2.6.1 (relative to your parrot source root). If not, then that's what's going on. Right now, I have that path hard-coded--of course I need to pull that out into a config--but it probably means that either the data files aren't getting created, or just that they are in a different location. Glancing at your icu.pl patch, it may just be missing moving the .dat file (or, maybe creating it too). They are missing, and in fact weren't even being created. Turns out I'd somehow managed to miss a line out in the makefile, namely the one that made the data. D'oh. I added it in, but this gave rise to new problems. The .mak file (in icu/source/data) was missing some paths so some of the tools were not being found. That was easily fixed, and now it gets quite a way through making the data, until it hits a point where it starts giving errors like this:- -- Making Locale Resource Bundle files ..\..\locales\root.txt:39: warning: %Collation could not be constructed from CollationElements - check context! ..\..\locales\root.txt:37: parse error. Stopped parsing with U_INVALID_FORMAT_ERROR couldn't parse the file ..\..\locales\root.txt. Error:U_INVALID_FORMAT_ERROR ..\..\locales\ar.txt:16: warning: %Collation could not be constructed from CollationElements - check context! ..\..\locales\ar.txt:14: parse error. Stopped parsing with U_INVALID_FORMAT_ERROR couldn't parse the file ..\..\locales\ar.txt. Error:U_INVALID_FORMAT_ERROR ..\..\locales\ca.txt:12: warning: %Collation could not be constructed from CollationElements - check context! ..\..\locales\ca.txt:10: parse error. Stopped parsing with U_INVALID_FORMAT_ERROR -- Any ideas? This error was showing up on Linux, and I was able to get it to happen for me by running the genrb tool with a parameter (or env. variable) missing. (Probably, the cause in the Linux case was actually something else.) 
Take a look at my first post in the Build problems in i386 linux thread. One cause of this error is that the 'genrb' tool (built before this point) can't find a data file it needs--the file is icudt26b_ucadata.icu (possibly with a different prefix for you), and is probably in the icu/source/data/out/build directory. On Unix systems, it's located via the ICU_DATA env. variable (which apparently has to end with a slash), which the Makefile in icu/source/data sets up, or it can be passed via a -i argument to 'genrb' (either way, pointing to the directory containing that file). So take a look and see how that tool is being invoked in the build process, and whether a parameter is missing (or pointed to the wrong place). That's my bet for what's going on. (The Linux case was only failing on one of the files, but it sounds like you're failing on all of them, which is the behavior I'd expect if this is the problem.) But, we're getting there! JEff
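Since the note above says ICU_DATA apparently has to end with a slash, a build helper could normalize the directory before exporting it. The C function below is a hypothetical sketch of that one step (the name `with_trailing_slash` is made up, and it handles only the Unix `/` separator); it is not taken from ICU or Parrot.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `dir` into `out`, appending '/' when it is missing.
 * Returns 0 on success, -1 if `dir` is empty or `out` is too small. */
int with_trailing_slash(const char *dir, char *out, size_t out_size)
{
    size_t len;
    int needs_slash;

    assert(dir != NULL && out != NULL);

    len = strlen(dir);
    if (len == 0)
        return -1;      /* refuse to turn "" into "/" */

    needs_slash = (dir[len - 1] != '/');
    if (len + (needs_slash ? 1u : 0u) + 1u > out_size)
        return -1;      /* no room for the slash and the NUL */

    memcpy(out, dir, len);
    if (needs_slash)
        out[len++] = '/';
    out[len] = '\0';
    return 0;
}
```

On a POSIX system the result could then be handed to `setenv("ICU_DATA", buf, 1)` before the data tools are invoked.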
Re: Build problems in i386 linux
On Apr 12, 2004, at 6:37 AM, Dan Sugalski wrote: I'll try and get patches to use the system ICU install if there is one done today. Also take a look at my [perl #28473], which I don't think has made it to the list yet. That's a patch to make the location of ICU's data directory configurable, so it should help with that. JEff
Re: ICU bug some places
Dan, as soon as you put the patch in, say so I can update cvs and re-test. Thanks Alberto Dan Sugalski wrote: According to the IBM website, ICU triggers a GCC bug in the gcc 3.x series--you can't compile with the -O2 optimization setting. -O3 works. I'll put a patch in.
Re: ICU bug some places
At 5:14 PM +0100 4/12/04, Alberto Manuel Brandao Simoes wrote: Dan, as soon as you put the patch in, say so I can update cvs and re-test. It's in. :) Dan Sugalski wrote: According to the IBM website, ICU triggers a GCC bug in the gcc 3.x series--you can't compile with the -O2 optimization setting. -O3 works. I'll put a patch in. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: ICU fixed
ICU_DATA=../data/out/build LD_LIBRARY_PATH=../common:../i18n:../tools/toolutil:../layout:../layoutex:../extra/ustdio:../tools/ctestfw:../data/out:../data:../stubdata/:$LD_LIBRARY_PATH ../tools/genrb/genrb -k -q -p icudt26l -s ../data/locales -d ../data/out/build it_IT_PREEURO.txt ICU_DATA=../data/out/build LD_LIBRARY_PATH=../common:../i18n:../tools/toolutil:../layout:../layoutex:../extra/ustdio:../tools/ctestfw:../data/out:../data:../stubdata/:$LD_LIBRARY_PATH ../tools/genrb/genrb -k -q -p icudt26l -s ../data/locales -d ../data/out/build ja.txt ../data/locales/ja.txt:15: parse error. Stopped parsing with U_INVALID_FORMAT_ERROR couldn't parse the file ja.txt. Error:U_INVALID_FORMAT_ERROR make[1]: *** [../data/out/build/icudt26l_ja.res] Error 3 make[1]: Leaving directory `/home/ambs/Junk/Parrot/parrot/icu/source/data' make: *** [blib/lib/libicuuc.a] Error 2 :-| Maybe the anoncvs is not updated, yet? Alberto Marcus Thiesen wrote: On Monday 12 April 2004 17:01, Dan Sugalski wrote: I just checked in a patch for the problems building the data files. If the folks having problems could try it out that'd be great. (Works for me locally, but...) Seems to work, I get now an all test successfull on Linux, for details see www.luusa.org/~marcus/parrottest Have fun, Marcus
Re: ICU fixed
At 5:21 PM +0100 4/12/04, Alberto Manuel Brandao Simoes wrote: ICU_DATA=../data/out/build LD_LIBRARY_PATH=../common:../i18n:../tools/toolutil:../layout:../layoutex:../extra/ustdio:../tools/ctestfw:../data/out:../data:../stubdata/:$LD_LIBRARY_PATH ../tools/genrb/genrb -k -q -p icudt26l -s ../data/locales -d ../data/out/build it_IT_PREEURO.txt ICU_DATA=../data/out/build LD_LIBRARY_PATH=../common:../i18n:../tools/toolutil:../layout:../layoutex:../extra/ustdio:../tools/ctestfw:../data/out:../data:../stubdata/:$LD_LIBRARY_PATH ../tools/genrb/genrb -k -q -p icudt26l -s ../data/locales -d ../data/out/build ja.txt ../data/locales/ja.txt:15: parse error. Stopped parsing with U_INVALID_FORMAT_ERROR couldn't parse the file ja.txt. Error:U_INVALID_FORMAT_ERROR make[1]: *** [../data/out/build/icudt26l_ja.res] Error 3 make[1]: Leaving directory `/home/ambs/Junk/Parrot/parrot/icu/source/data' make: *** [blib/lib/libicuuc.a] Error 2 :-| Maybe the anoncvs is not updated, yet? Nope -- there's only one CVS. The change is to the configure script for ICU, though, so you may want to do a make realclean first to make sure the changes are picked up. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: ICU fixed
OK, make clean != make realclean :-) Passed that problem. Thanks Alberto Dan Sugalski wrote: At 5:21 PM +0100 4/12/04, Alberto Manuel Brandao Simoes wrote: ICU_DATA=../data/out/build LD_LIBRARY_PATH=../common:../i18n:../tools/toolutil:../layout:../layoutex:../extra/ustdio:../tools/ctestfw:../data/out:../data:../stubdata/:$LD_LIBRARY_PATH ../tools/genrb/genrb -k -q -p icudt26l -s ../data/locales -d ../data/out/build it_IT_PREEURO.txt ICU_DATA=../data/out/build LD_LIBRARY_PATH=../common:../i18n:../tools/toolutil:../layout:../layoutex:../extra/ustdio:../tools/ctestfw:../data/out:../data:../stubdata/:$LD_LIBRARY_PATH ../tools/genrb/genrb -k -q -p icudt26l -s ../data/locales -d ../data/out/build ja.txt ../data/locales/ja.txt:15: parse error. Stopped parsing with U_INVALID_FORMAT_ERROR couldn't parse the file ja.txt. Error:U_INVALID_FORMAT_ERROR make[1]: *** [../data/out/build/icudt26l_ja.res] Error 3 make[1]: Leaving directory `/home/ambs/Junk/Parrot/parrot/icu/source/data' make: *** [blib/lib/libicuuc.a] Error 2 :-| Maybe the anoncvs is not updated, yet? Nope -- there's only one CVS. The change is to the configure script for ICU, though, so you may want to do a make realclean first to make sure the changes are picked up.
Re: Plans for string processing
Just thought I'd mention that I'm in the process of trying to get strings.pod updated to reflect the current state of affairs. Mike
Re: cvs commit: parrot/ops math.ops
Dan Sugalski [EMAIL PROTECTED] wrote:

--- math.ops 23 Mar 2004 07:27:51 - 1.15
+++ math.ops 12 Apr 2004 14:59:12 - 1.16
@@ -601,6 +601,8 @@
 =item B<mul>(out INT, in INT, in INT)
+=item B<mul>(out INT, in INT, in NUM)

Seems to be a bit asymmetric when comparing to other math ops like C<add>, where this variant is missing.

leo
[perl #28494] [PATCH] unescape strings
# New Ticket Created by Leopold Toetsch
# Please include the string: [perl #28494]
# in the subject line of all future correspondence about this issue.
# URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=28494

Attached patch:

* adds a new test file for Unicode-related string tests
* reimplements string_unescape_cstring, which now uses ICU for the work
* fixes a bug in string_compare with equal-length strings

It's also by far more efficient than the old code.

TODO: move it out of string.c, docs.

Jeff, please have a look at it.

leo

--- parrot/MANIFEST Mon Apr 12 15:43:05 2004
+++ parrot-leo/MANIFEST Mon Apr 12 18:41:07 2004
@@ -2596,6 +2596,7 @@
 t/op/rx.t []
 t/op/stacks.t []
 t/op/string.t []
+t/op/stringu.t []
 t/op/time.t []
 t/op/trans.t []
 t/op/types.t []
--- /dev/null Fri Feb 28 14:27:28 2003
+++ parrot-leo/t/op/stringu.t Mon Apr 12 18:40:40 2004
@@ -0,0 +1,57 @@
+#! perl -w
+# Copyright: 2001-2004 The Perl Foundation. All Rights Reserved.
+# $Id$
+
+=head1 NAME
+
+t/op/stringu.t - Unicode String Test
+
+=head1 SYNOPSIS
+
+ % perl -Ilib t/op/stringu.t
+
+=head1 DESCRIPTION
+
+Tests Parrot's unicode string system.
+ +=cut +#' + +use Parrot::Test tests = 4; +use Test::More; + +output_is( 'CODE', OUTPUT, angstrom ); +chr S0, 0x212B +print S0 +print \n +end +CODE +\xe2\x84\xab +OUTPUT + +output_is( 'CODE', OUTPUT, escaped angstrom ); +set S0, \x{212b} +print S0 +print \n +end +CODE +\xe2\x84\xab +OUTPUT + +output_is( 'CODE', OUTPUT, escaped angstrom 2 ); +set S0, aa\x{212b} +print S0 +print \n +end +CODE +aa\xe2\x84\xab +OUTPUT + +output_is( 'CODE', OUTPUT, escaped angstrom 3 ); +set S0, aa\x{212b}-aa +print S0 +print \n +end +CODE +aa\xe2\x84\xab-aa +OUTPUT --- parrot/src/string.c Sun Apr 11 15:16:48 2004 +++ parrot-leo/src/string.c Mon Apr 12 18:40:29 2004 @@ -1626,24 +1626,28 @@ type1 *curr1 = (type1 *)s1-strstart; \ type2 *curr2 = (type2 *)s2-strstart; \ \ -while( (_index++ minlen) (*curr1 == *curr2) ) \ +while( (_index minlen) (*curr1 == *curr2) ) \ { \ ++curr1; \ ++curr2; \ +++_index; \ } \ +if (_index == minlen s1-strlen == s2-strlen) { \ +result = 0; \ +break; \ +} \ +result = *curr1 - *curr2; \ \ -*result = *curr1 - *curr2; \ - \ -if( !*result ) \ +if( !result ) \ { \ if( s1-strlen != s2-strlen ) \ { \ -*result = s1-strlen s2-strlen ? 1 : -1; \ +result = s1-strlen s2-strlen ? 1 : -1; \ } \ } \ else \ { \ -*result = *result 0 ? 1 : -1; \ +result = result 0 ? 1 : -1; \ } \ } while(0) @@ -1691,13 +1695,13 @@ { case enum_stringrep_one: /* could use memcmp in this one case; faster?? */ -COMPARE_STRINGS(Parrot_UInt1, Parrot_UInt1, s1, s2, cmp); +COMPARE_STRINGS(Parrot_UInt1, Parrot_UInt1, s1, s2, cmp); break; case enum_stringrep_two: -COMPARE_STRINGS(Parrot_UInt2, Parrot_UInt2, s1, s2, cmp); +COMPARE_STRINGS(Parrot_UInt2, Parrot_UInt2, s1, s2, cmp); break; case enum_stringrep_four: -COMPARE_STRINGS(Parrot_UInt4, Parrot_UInt4, s1, s2, cmp); +COMPARE_STRINGS(Parrot_UInt4, Parrot_UInt4, s1, s2, cmp); break; default: /* trouble! 
*/ @@ -1731,18 +1735,18 @@ if( smaller-representation == enum_stringrep_two ) { COMPARE_STRINGS(Parrot_UInt4, Parrot_UInt2, -larger, smaller, cmp); +larger, smaller, cmp); } else /* smaller-representation == enum_stringrep_one */ { COMPARE_STRINGS(Parrot_UInt4, Parrot_UInt1, -larger, smaller, cmp); +larger, smaller, cmp); } } else /* larger-representation == enum_stringrep_two, smaller-representation == enum_stringrep_one */ { -COMPARE_STRINGS(Parrot_UInt2, Parrot_UInt1, larger, smaller, cmp); +COMPARE_STRINGS(Parrot_UInt2, Parrot_UInt1, larger, smaller, cmp); } return cmp * multiplier; @@ -3052,7 +3056,69 @@ =cut */ +#if 1 +/* TODO move this out of string.c */ +#include unicode/ustring.h +static UChar +char_at(Parrot_Int4 offs, void* context) +{ +return *((char*)context + offs); +} +STRING * +string_unescape_cstring(struct Parrot_Interp * interpreter, +char *cstring,
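As far as the hunk above shows, the string_compare fix makes the macro stop at the shared length and return 0 for equal strings before dereferencing either pointer again; the old version computed `*curr1 - *curr2` after the loop even when both pointers had already run off the end. The function below is a standalone C sketch of that corrected ordering for one-byte units. The name `compare_units` and its signature are assumptions for illustration, not Parrot's actual macro.

```c
#include <assert.h>
#include <stddef.h>

/* Three-way comparison of two buffers of one-byte units.
 * Returns -1, 0 or 1. Never reads past min(len1, len2). */
int compare_units(const unsigned char *s1, size_t len1,
                  const unsigned char *s2, size_t len2)
{
    size_t minlen = len1 < len2 ? len1 : len2;
    size_t i = 0;

    assert(s1 != NULL || len1 == 0);
    assert(s2 != NULL || len2 == 0);

    /* Advance while both strings agree, testing the index first. */
    while (i < minlen && s1[i] == s2[i])
        i++;

    /* A differing unit inside the shared prefix decides the result. */
    if (i < minlen)
        return s1[i] > s2[i] ? 1 : -1;

    /* Shared prefix is identical: the lengths decide. */
    if (len1 == len2)
        return 0;
    return len1 > len2 ? 1 : -1;
}
```

Testing the index before dereferencing is the important part; the two- and four-byte representations would follow the same shape with wider unit types.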
Re: [perl #28461] [PATCH] Spelling Nit for diagnostic message
On Sun, 2004-04-11 at 11:10, Will Coleda wrote: Another instance of unknow bash-2.05a$ cvs diff src/dynext.c Index: src/dynext.c Thanks, applied. -- c
Re: [perl #28494] [PATCH] unescape strings
That's really funny--I wrote almost exactly the same code w.r.t. string_unescape_cstring last night, and I also always use U+212b for testing any time I need to come up with a readable character outside of the Latin range. Strange coincidences. I'll take a look and see if there is anything significantly different in our implementations, and get back to you. (It's definitely convenient, especially for testing, to have a way to represent arbitrary characters in string literals.) JEff On Apr 12, 2004, at 9:54 AM, Leopold Toetsch (via RT) wrote: # New Ticket Created by Leopold Toetsch # Please include the string: [perl #28494] # in the subject line of all future correspondence about this issue. # URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=28494 Attached patch: * adds a new test file for Unicode-related string tests * reimplements string_unescape_cstring which uses now ICU for the work * fixes a bug in string_compare with equally length strings It's also by far more efficient then the old code. TODO: move it out of string.c, docs. Jeff, please have a look at it. leo --- parrot/MANIFEST Mon Apr 12 15:43:05 2004 +++ parrot-leo/MANIFEST Mon Apr 12 18:41:07 2004 @@ -2596,6 +2596,7 @@ t/op/rx.t [] t/op/stacks.t [] t/op/string.t [] +t/op/stringu.t[] t/op/time.t [] t/op/trans.t [] t/op/types.t [] --- /dev/null Fri Feb 28 14:27:28 2003 +++ parrot-leo/t/op/stringu.t Mon Apr 12 18:40:40 2004 @@ -0,0 +1,57 @@ +#! perl -w +# Copyright: 2001-2004 The Perl Foundation. All Rights Reserved. +# $Id$ + +=head1 NAME + +t/op/stringu.t - Unicode String Test + +=head1 SYNOPSIS + + % perl -Ilib t/op/stringu.t + +=head1 DESCRIPTION + +Tests Parrot's unicode string system. 
+ +=cut +#' + +use Parrot::Test tests = 4; +use Test::More; + +output_is( 'CODE', OUTPUT, angstrom ); +chr S0, 0x212B +print S0 +print \n +end +CODE +\xe2\x84\xab +OUTPUT + +output_is( 'CODE', OUTPUT, escaped angstrom ); +set S0, \x{212b} +print S0 +print \n +end +CODE +\xe2\x84\xab +OUTPUT + +output_is( 'CODE', OUTPUT, escaped angstrom 2 ); +set S0, aa\x{212b} +print S0 +print \n +end +CODE +aa\xe2\x84\xab +OUTPUT + +output_is( 'CODE', OUTPUT, escaped angstrom 3 ); +set S0, aa\x{212b}-aa +print S0 +print \n +end +CODE +aa\xe2\x84\xab-aa +OUTPUT --- parrot/src/string.c Sun Apr 11 15:16:48 2004 +++ parrot-leo/src/string.c Mon Apr 12 18:40:29 2004 @@ -1626,24 +1626,28 @@ type1 *curr1 = (type1 *)s1-strstart; \ type2 *curr2 = (type2 *)s2-strstart; \ \ -while( (_index++ minlen) (*curr1 == *curr2) ) \ +while( (_index minlen) (*curr1 == *curr2) ) \ { \ ++curr1; \ ++curr2; \ +++_index; \ } \ +if (_index == minlen s1-strlen == s2-strlen) { \ +result = 0; \ +break; \ +} \ +result = *curr1 - *curr2; \ \ -*result = *curr1 - *curr2; \ - \ -if( !*result ) \ +if( !result ) \ { \ if( s1-strlen != s2-strlen ) \ { \ -*result = s1-strlen s2-strlen ? 1 : -1; \ +result = s1-strlen s2-strlen ? 1 : -1; \ } \ } \ else \ { \ -*result = *result 0 ? 1 : -1; \ +result = result 0 ? 1 : -1; \ } \ } while(0) @@ -1691,13 +1695,13 @@ { case enum_stringrep_one: /* could use memcmp in this one case; faster?? */ -COMPARE_STRINGS(Parrot_UInt1, Parrot_UInt1, s1, s2, cmp); +COMPARE_STRINGS(Parrot_UInt1, Parrot_UInt1, s1, s2, cmp); break; case enum_stringrep_two: -COMPARE_STRINGS(Parrot_UInt2, Parrot_UInt2, s1, s2, cmp); +COMPARE_STRINGS(Parrot_UInt2, Parrot_UInt2, s1, s2, cmp); break; case enum_stringrep_four: -COMPARE_STRINGS(Parrot_UInt4, Parrot_UInt4, s1, s2, cmp); +COMPARE_STRINGS(Parrot_UInt4, Parrot_UInt4, s1, s2, cmp); break; default: /* trouble! 
*/ @@ -1731,18 +1735,18 @@ if( smaller-representation == enum_stringrep_two ) { COMPARE_STRINGS(Parrot_UInt4, Parrot_UInt2, -larger, smaller, cmp); +larger, smaller, cmp); } else /* smaller-representation == enum_stringrep_one */ { COMPARE_STRINGS(Parrot_UInt4, Parrot_UInt1, -larger, smaller, cmp); +larger, smaller, cmp); } } else /* larger-representation == enum_stringrep_two,
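The expected output in the stringu.t tests quoted above, \xe2\x84\xab, is the UTF-8 encoding of U+212B (ANGSTROM SIGN). For readers who want to check that by hand, here is a small standalone C encoder; `utf8_encode` is an illustrative helper written for this note, not the ICU or Parrot routine the patch relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Encode one Unicode code point as UTF-8 into `out`.
 * Returns the number of bytes written (1-4), or 0 if `cp` is a
 * surrogate or lies beyond U+10FFFF. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    assert(out != NULL);

    if (cp < 0x80UL) {                      /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800UL) {                     /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000UL) {                   /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800UL && cp <= 0xDFFFUL)
            return 0;                       /* surrogates are not encodable */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFFUL) {                 /* 11110xxx + three 10xxxxxx */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For 0x212B the three-byte branch applies: 0x212B >> 12 is 0x2, giving 0xE2; the middle six bits are 0x04, giving 0x84; the low six bits are 0x2B, giving 0xAB.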
Re: cvs commit: parrot/ops math.ops
At 6:02 PM +0200 4/12/04, Leopold Toetsch wrote: Dan Sugalski [EMAIL PROTECTED] wrote:

--- math.ops 23 Mar 2004 07:27:51 - 1.15
+++ math.ops 12 Apr 2004 14:59:12 - 1.16
@@ -601,6 +601,8 @@
 =item B<mul>(out INT, in INT, in INT)
+=item B<mul>(out INT, in INT, in NUM)

Seems to be a bit asymmetric when comparing to other math ops like C<add>, where this variant is missing.

Yeah, it is. I'm still not 100% sure they should be in there, nor that we shouldn't have more of them for the other basic math ops. (That's one of the reasons they aren't in the ops list yet)

-- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Status of the ICU problem...
All tests successful, 84 subtests skipped. Files=100, Tests=1498, 517 wallclock secs (263.66 cusr + 68.59 csys = 332.25 CPU) Cheers :-D Alberto
Strings rationale
I'm going to write up some information on my view of strings, and the rationale behind it, so that there's a clear explanation that we can use for discussion. That will give us something more organized to talk about. It will probably take a day or two for me to get that done. I'll also respond to Dan's concerns, but that will be easier to do once I've spelled out what I'm thinking, so that we can minimize problems due to miscommunication. JEff
Re: Strings rationale
At 10:14 AM -0700 4/12/04, Jeff Clites wrote:
> I'm going to write up some information on my view of strings, and the
> rationale behind it, so that there's a clear explanation that we can
> use for discussion. That will give us something more organized to talk
> about. It will probably take a day or two for me to get that done.

As long as it doesn't essentially read "Because all the cool kids are doing it" or "because it makes my life easier", which are the two common rationales--neither of those is sufficient. :)

I'll hold off editing the interface back to the way it was (and the way I want it) until you've had a chance to make your pitch, though.

--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
ICU Building On Win32 (was Re: [PATCH] Get string.c to compile in MS VC++)
Jeff Clites [EMAIL PROTECTED] wrote:
> On Apr 12, 2004, at 5:33 AM, Jonathan Worthington wrote:
> > <snip>
> > > See if you have a .dat file (or a bunch of individual files) in
> > > blib/lib/icu/2.6.1 (relative to your parrot source root). If not,
> > > then that's what's going on. Right now, I have that path
> > > hard-coded--of course I need to pull that out into a config--but it
> > > probably means that either the data files aren't getting created,
> > > or just that they are in a different location. Glancing at your
> > > icu.pl patch, it may just be missing moving the .dat file (or,
> > > maybe creating it too).
> >
> > They are missing, and in fact weren't even being created. Turns out
> > I'd somehow managed to miss a line out in the makefile, namely the
> > one that made the data. D'oh. I added it in, but this gave rise to
> > new problems. The .mak file (in icu/source/data) was missing some
> > paths so some of the tools were not being found. That was easily
> > fixed, and now it gets quite a way through making the data, until it
> > hits a point where it starts giving errors like this:-
> >
> > --
> > Making Locale Resource Bundle files
> > ..\..\locales\root.txt:39: warning: %Collation could not be constructed from CollationElements - check context!
> > ..\..\locales\root.txt:37: parse error. Stopped parsing with U_INVALID_FORMAT_ERROR
> > couldn't parse the file ..\..\locales\root.txt. Error:U_INVALID_FORMAT_ERROR
> > ..\..\locales\ar.txt:16: warning: %Collation could not be constructed from CollationElements - check context!
> > ..\..\locales\ar.txt:14: parse error. Stopped parsing with U_INVALID_FORMAT_ERROR
> > couldn't parse the file ..\..\locales\ar.txt. Error:U_INVALID_FORMAT_ERROR
> > ..\..\locales\ca.txt:12: warning: %Collation could not be constructed from CollationElements - check context!
> > ..\..\locales\ca.txt:10: parse error. Stopped parsing with U_INVALID_FORMAT_ERROR
> > --
> >
> > Any ideas?
>
> This error was showing up on Linux, and I was able to get it to happen
> for me by running the genrb tool with a parameter (or env. variable)
> missing. (Probably, the cause in the Linux case was actually something
> else.) Take a look at my first post in the "Build problems in i386
> linux" thread.
>
> One cause of this error is that the 'genrb' tool (built before this
> point) can't find a data file it needs--the file is
> icudt26b_ucadata.icu (possibly with a different prefix for you), and
> is probably in the icu/source/data/out/build directory. On Unix
> systems, it's located via the ICU_DATA env. variable (which apparently
> has to end with a slash), which the Makefile in icu/source/data sets
> up, or it can be passed via a -i argument to 'genrb' (either way,
> pointing to the directory containing that file).
>
> So take a look and see how that tool is being invoked in the build
> process, and whether a parameter is missing (or pointed to the wrong
> place). That's my bet for what's going on. (The Linux case was only
> failing on one of the files, but it sounds like you're failing on all
> of them, which is the behavior I'd expect if this is the problem.)

Yup, that was it. There were a few other little issues with the makefile that I had to deal with, but it all appears to be working now. There are only 3 tests failing, and as I remember they were ones that failed before the big ICU patch.

I've attached the patches, and (fingers crossed) this will get Parrot going on Win32 again. Summary of changes:-

* Add source/allinone/all/all.dsp (which was moved to the attic previously).
* Add a modified source/allinone/allinone.dsw (which was moved to the attic previously). Changes are due to the fact that we do not have everything the full ICU tree would have.
* Modify config/gen/icu.pl to write the makefile entries for building ICU on Win32 and ensure .dsp files have proper Win32 line endings (MS VC++ is very fussy about this).
* Modify icu/source/data/makedata.mak to correct a few path issues and remove parts relating to things we don't have on the ICU source tree.

Jonathan

Attachment: win32icu.patch (Description: Binary data)
Attachment: icuwin32missing.patch (Description: Binary data)
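Since the failure above came down to a data directory that genrb could not see, a small pre-flight check can make that kind of problem visible before the tool runs. The following is only a sketch under stated assumptions: demo_classify_icu_data and its status names are hypothetical, it is not part of ICU or of the posted patches, and treating a trailing backslash as acceptable on Win32 is a guess rather than documented ICU behavior. The only fact taken from the thread is that the ICU_DATA value "apparently has to end with a slash".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for the check below. */
enum demo_icu_data_status {
    DEMO_ICU_DATA_OK,        /* non-empty and ends with a separator */
    DEMO_ICU_DATA_UNSET,     /* NULL or empty string */
    DEMO_ICU_DATA_NO_SLASH   /* set, but missing the trailing separator */
};

/* Classify a candidate ICU_DATA value. The caller passes the string,
 * for example the result of getenv("ICU_DATA"), so the function itself
 * has no dependency on the environment. */
static enum demo_icu_data_status demo_classify_icu_data(const char *dir)
{
    size_t len;

    if (dir == NULL || dir[0] == '\0')
        return DEMO_ICU_DATA_UNSET;

    len = strlen(dir);
    /* Assumption: accept '\\' as well as '/', for Win32-style paths. */
    if (dir[len - 1] == '/' || dir[len - 1] == '\\')
        return DEMO_ICU_DATA_OK;

    return DEMO_ICU_DATA_NO_SLASH;
}
```

A build script or a check in the startup code could call this with getenv("ICU_DATA") and print a clear message for the UNSET and NO_SLASH cases instead of letting the later parse errors appear.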
Re: ICU Building On Win32 (was Re: [PATCH] Get string.c to compile in MS VC++)
At 8:46 PM +0100 4/12/04, Jonathan Worthington wrote:
> I've attached the patches, and (fingers crossed) this will get Parrot
> going on Win32 again.

Applied, thanks.

--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
Re: Strings rationale
On Apr 12, 2004, at 10:23 AM, Dan Sugalski wrote:
> At 10:14 AM -0700 4/12/04, Jeff Clites wrote:
> > I'm going to write up some information on my view of strings, and
> > the rationale behind it, so that there's a clear explanation that we
> > can use for discussion. That will give us something more organized
> > to talk about. It will probably take a day or two for me to get that
> > done.
>
> As long as it doesn't essentially read "Because all the cool kids are
> doing it" or "because it makes my life easier", which are the two
> common rationales--neither of those are sufficient. :)

Of course. The argument will be that this model delivers semantics that match the concept that a string is trying to capture, and that it gives developers the tools that they need and want in working with them. The only point in mentioning precedent is to indicate that the pros and cons of such an approach are well-known--that there aren't hidden gotchas.

But before I justify the model, I need to fully explain it. One can't disagree (or agree) with a model, or its goals, until it's clear what those are.

JEff
Re: Plans for string processing
Dan~

I know that you are not technically required to defend your position, but I would like an explanation of one part of this plan.

Dan Sugalski wrote:
> 4) We will *not* use ICU for core functions. (string to number or
> number to string conversions, for example)

Why not? It seems like we would just be reinventing a rather large wheel here.

Matt
Re: [perl #28393] [PATCH] Tcl pmcs
Did the makefile change make it in?

On Monday, April 12, 2004, at 04:57 AM, Leopold Toetsch wrote:
> Will Coleda [EMAIL PROTECTED] wrote:
> > dyld: ./parrot Undefined symbols:
> > _Parrot_tclobject_morph
> > _Parrot_tclobject_set_pmc
>
> Ah yes. That's ugly. So here we go:
>
> 0) The PMCs were pre-ICU. I've adapted them. Should I check it in or
>    send it back to you?
>
> 1) dynamic PMCs need a dynpmc flag on the class definition line. This
>    causes the PMC compiler to add additional code for dynamic loading:
>
>      pmclass TclString extends tclobject dynpmc {
>
> 2) Nasty dependencies. I got around that by changing the Makefile like
>    so:
>
>      tclobject$(SO) : tclobject.c
>          $(LD) $(LD_SHARED) $(LD_SHARED_FLAGS) $(LDFLAGS) -Wl,-E -o $@ \
>              -I../include -I../classes \
>              -L../blib/lib -lparrot $<
>          $(PERL) -MFile::Copy=cp -e 'cp q|$@|, q|../runtime/parrot/dynext/$@|'
>          cd ../runtime/parrot/dynext; ln -sf tclobject.so libtclobject.so
>
>      %$(SO) : %.c
>          $(LD) $(LD_SHARED) $(LD_SHARED_FLAGS) $(LDFLAGS) -Wl,-E -o $@ \
>              -I../include -I../classes \
>              -L../blib/lib -lparrot -L../runtime/parrot/dynext \
>              -ltclobject $<
>          $(PERL) -MFile::Copy=cp -e 'cp q|$@|, q|../runtime/parrot/dynext/$@
>
>    (and disabling non-tcl shared classes for now)
>
>    That is, for all tcl* but tclobject, libtclobject.so is added as a
>    library. This might also need the LD_LIBRARY_PATH to contain
>    runtime/parrot/dynext. I don't know how we do that in a
>    platform-independent way.
>
>    It might be simpler to have a utility that copies all tcl*.c
>    together into one and chains the Parrot_lib_type_load() functions.
>    So only one shared lib would be loaded that registers all classes.
>    Should be a rather simple script. E.g.
>
>      merge-classes -o tcl-all.c tclobject.c tclstring.c ...
>
>    Then compile and loadlib only the tcl-all. This would need a
>    Parrot_lib_tcl-all_load() function that calls _load() for all
>    contained PMCs.
>
> 3) For now
>
>    $ cat tcl.pasm
>        loadlib P10, "tclobject"
>        print "ok 1\n"
>        loadlib P11, "tclstring"
>        print "ok 2\n"
>        new P1, .TclString
>        set P1, "ok 3\n"
>        set S1, P1
>        print S1
>        end
>    $ parrot tcl.pasm
>    ok 1
>    ok 2
>    ok 3
>
> leo

--
Will "Coke" Coleda    will at coleda dot com
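The "chain the load functions" idea quoted above can be sketched as follows. Everything here is hypothetical: the real Parrot_lib_*_load() functions and their signatures are not shown in this thread, so the sketch uses argument-less stand-ins that return 0 on success. Note also that a name like Parrot_lib_tcl-all_load() cannot be a C identifier because of the hyphen, so a merged library would presumably need a name such as tcl_all; that spelling is an assumption, not something the thread settles.

```c
#include <stddef.h>

/* Hypothetical loader signature: returns 0 on success. The real
 * generated loaders would most likely take an interpreter argument. */
typedef int (*demo_load_fn)(void);

/* Records which stand-in loaders have run, for illustration only. */
static int demo_loaded[2];

static int demo_tclobject_load(void) { demo_loaded[0] = 1; return 0; }
static int demo_tclstring_load(void) { demo_loaded[1] = 1; return 0; }

/* The base class comes first so that classes extending it can find it
 * already registered when their own loader runs. */
static const demo_load_fn demo_tcl_loaders[] = {
    demo_tclobject_load,
    demo_tclstring_load
};

/* Single entry point for the merged library: call every contained
 * loader in order and stop at the first failure. */
static int demo_tcl_all_load(void)
{
    size_t i;
    size_t count = sizeof demo_tcl_loaders / sizeof demo_tcl_loaders[0];

    for (i = 0; i < count; ++i) {
        int rc = demo_tcl_loaders[i]();
        if (rc != 0)
            return rc;
    }
    return 0;
}
```

With a layout like this, the merge script only has to concatenate the sources and regenerate the table, and a single loadlib of the merged library would register every class without the inter-library link dependencies described in point 2.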
[perl #28502] [PATCH] dynclasses/README
# New Ticket Created by Will Coleda
# Please include the string: [perl #28502]
# in the subject line of all future correspondence about this issue.
# URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=28502

Here's an updated version of dynclasses/README that sums up recent notes, and PODifies the doc. (It's a big enough change, I just attached the whole file.)

Attachment: README (Description: Binary data)

--
Will "Coke" Coleda    will at coleda dot com
Re: [perl #28502] [PATCH] dynclasses/README
Immediately after I sent this, it occurred to me that I was missing the dynpmc flag that Leo had just mentioned. Re-attachment.

Attachment: README (Description: Binary data)

On Monday, April 12, 2004, at 07:35 PM, Will Coleda (via RT) wrote:
> # New Ticket Created by Will Coleda
> # Please include the string: [perl #28502]
> # in the subject line of all future correspondence about this issue.
> # URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=28502
>
> Here's an updated version of dynclasses/README that sums up recent
> notes, and PODifies the doc. (It's a big enough change, I just
> attached the whole file.)
>
> README
>
> --
> Will "Coke" Coleda    will at coleda dot com

--
Will "Coke" Coleda    will at coleda dot com