Re: [PATCH] Get string.c to compile in MS VC++

2004-04-12 Thread Jeff Clites
On Apr 11, 2004, at 7:05 PM, Jeff Clites wrote:

On Apr 11, 2004, at 4:52 PM, Jonathan Worthington wrote:

On Apr 9, 2004, at 3:26 PM, Jonathan Worthington wrote:

I'm having a crack at getting the ICU changes building on Win32.
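-------------------------------------------------------------------------
Failed Test     Status Wstat Total Fail  Failed  List of Failed
-------------------------------------------------------------------------
t\op\integer.t       1   256    39    1   2.56%  4
t\op\number.t       32  8192    38   32  84.21%  1-24, 27, 29, 31, 34-38
-------------------------------------------------------------------------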
Yep, the pattern I think I see is that tests which involve floats  
(except for 0.0) are failing, which is what would happen if ICU can't  
find its data. (Because, it either ends up thinking that nothing is a
digit character, or that they all have digit value zero.) I'll have to  
add a quick test inside string_init to detect this case, so that we  
can blow up instead of misbehaving.
...
See if you have a .dat file (or a bunch of individual files) in  
blib/lib/icu/2.6.1 (relative to your parrot source root). If not, then  
that's what's going on. Right now, I have that path hard-coded--of  
course I need to pull that out into a config
I've submitted a patch, [perl #28473], which makes that location  
configurable (still defaulting to blib/lib/icu/2.6.1), and which will  
cause parrot to complain if it can't find the necessary data files.  
That should also cause the build to fail near the end if something is  
wrong, when it tries to create library/config.fpmc.

That doesn't fix your case, but at least it should make such failures  
clearer, and easier to fix.
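
For illustration only, the kind of startup check described above could be sketched as follows. This is not the code from [perl #28473]; `digit_value_fn`, `icu_data_looks_sane`, and the three stand-in lookups are hypothetical names. In a real build the callback would presumably wrap ICU's `u_charDigitValue`, which returns -1 for a code point that is not a digit. The sketch catches both symptoms mentioned earlier: nothing being a digit, and every digit having value zero.

```c
#include <stddef.h>

/* Hypothetical callback type: maps a code point to its decimal digit
 * value, or -1 if the code point is not a digit. */
typedef int (*digit_value_fn)(unsigned long code_point);

/* Returns 1 if '0'..'9' map to 0..9, and 0 otherwise. A caller would
 * abort startup on 0 instead of silently misparsing numbers. */
int icu_data_looks_sane(digit_value_fn digit_value)
{
    unsigned long c;

    if (digit_value == NULL)
        return 0;
    for (c = '0'; c <= '9'; ++c) {
        if (digit_value(c) != (int)(c - '0'))
            return 0;
    }
    return 1;
}

/* Stand-in lookups so the check can be exercised without ICU. */
int good_lookup(unsigned long c)
{
    return (c >= '0' && c <= '9') ? (int)(c - '0') : -1;
}

int no_digits_lookup(unsigned long c)   /* "nothing is a digit" */
{
    (void)c;
    return -1;
}

int all_zero_lookup(unsigned long c)    /* "all have digit value zero" */
{
    (void)c;
    return 0;
}
```

Passing the lookup as a function pointer is only there to keep the sketch testable on its own; inside string_init one would call the ICU function directly.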

JEff



Re: [perl #28393] [PATCH] Tcl pmcs

2004-04-12 Thread Leopold Toetsch
Will Coleda [EMAIL PROTECTED] wrote:

 dyld: ./parrot Undefined symbols:
 _Parrot_tclobject_morph
 _Parrot_tclobject_set_pmc

Ah yes. That's ugly. So here we go:

0) The PMCs were pre-ICU. I've adapted them. Should I check it in or
send it back to you?

1) dynamic PMCs need a dynpmc flag on the class definition line. This
causes the PMC compiler to add additional code for dynamic loading:

  pmclass TclString extends tclobject dynpmc {

2) Nasty dependencies. I got around that by changing the Makefile like
so:

tclobject$(SO) : tclobject.c
	$(LD) $(LD_SHARED) $(LD_SHARED_FLAGS) $(LDFLAGS) -Wl,-E -o $@ \
		-I../include -I../classes \
		-L../blib/lib -lparrot $<
	$(PERL) -MFile::Copy=cp -e 'cp q|$@|, q|../runtime/parrot/dynext/$@|'
	cd ../runtime/parrot/dynext; ln -sf tclobject.so libtclobject.so

%$(SO) : %.c
	$(LD) $(LD_SHARED) $(LD_SHARED_FLAGS) $(LDFLAGS) -Wl,-E -o $@ \
		-I../include -I../classes \
		-L../blib/lib -lparrot -L../runtime/parrot/dynext \
		-ltclobject $<
	$(PERL) -MFile::Copy=cp -e 'cp q|$@|, q|../runtime/parrot/dynext/$@|'

(and disabling non-tcl shared classes for now)

That is, for every tcl* class except tclobject itself, libtclobject.so
is added as a library. This might also need LD_LIBRARY_PATH to contain
runtime/parrot/dynext.

I don't know how we do that in a platform-independent way.

It might be simpler to have a utility that copies all tcl*.c together
into one and chains the Parrot_lib_type_load() functions. So only one
shared lib would be loaded that registers all classes. Should be a
rather simple script.

E.g. merge-classes -o tcl-all.c tclobject.c tclstring.c ...

Then compile and loadlib only the tcl-all. This would need a
Parrot_lib_tcl-all_load() function that calls _load() for all contained
PMCs.
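
A rough sketch of what such a chained load function might look like in C is below. Everything in it is hypothetical: the loader signature, the `*_load` names, and the counter that stands in for real class registration. Note also that `Parrot_lib_tcl-all_load` as spelled above is not a valid C identifier because of the hyphen, so the sketch uses `tcl_all` instead.

```c
#include <stddef.h>

/* Hypothetical loader signature; the real one would take the interpreter. */
typedef void (*pmc_load_fn)(void *interpreter);

/* A counter stands in for actual class registration in this sketch. */
int classes_registered = 0;

void tclobject_load(void *interp) { (void)interp; ++classes_registered; }
void tclstring_load(void *interp) { (void)interp; ++classes_registered; }
void tclint_load(void *interp)    { (void)interp; ++classes_registered; }

/* Base class first, so that children can find it when they register. */
const pmc_load_fn tcl_loaders[] = {
    tclobject_load,
    tclstring_load,
    tclint_load
};

/* The single entry point of the merged library: runs every contained
 * loader in order and returns how many were called. */
int tcl_all_load(void *interp)
{
    size_t i;
    size_t n = sizeof tcl_loaders / sizeof tcl_loaders[0];

    for (i = 0; i < n; ++i)
        tcl_loaders[i](interp);
    return (int)n;
}
```

A merge script would then only need to concatenate the sources and emit the table, which keeps the generated part small.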

3) For now
$ cat tcl.pasm
  loadlib P10, "tclobject"
  print "ok 1\n"
  loadlib P11, "tclstring"
  print "ok 2\n"
  new P1, .TclString
  set P1, "ok 3\n"
  set S1, P1
  print S1
  end

$ parrot tcl.pasm
ok 1
ok 2
ok 3

leo


Re: [perl #28182] thr-primes.imc segfaults

2004-04-12 Thread Leopold Toetsch
Will Coleda [EMAIL PROTECTED] wrote:

 bash-2.05a$ ./parrot examples/assembly/thr-primes.imc
 SNIP
 Found prime 401
 Found prime 409
 Found prime 419
 Found prime 421
 Segmentation fault

Fixed. The interpreter->lo_var_ptr for threads wasn't correctly
initialized, so trace_mem_block went into other threads' stacks.

leo


Re: [PATCH] Get string.c to compile in MS VC++

2004-04-12 Thread Jonathan Worthington
 snip
 See if you have a .dat file (or a bunch of individual files) in
 blib/lib/icu/2.6.1 (relative to your parrot source root). If not, then
 that's what's going on. Right now, I have that path hard-coded--of
 course I need to pull that out into a config--but it probably means
 that either the data files aren't getting created, or just that they
 are in a different location. Glancing at your icu.pl patch, it may just
 be missing moving the .dat file (or, maybe creating it too).

They are missing, and in fact weren't even being created.  Turns out I'd
somehow managed to miss a line out in the makefile, namely the one that made
the data.  D'oh.

I added it in, but this gave rise to new problems.  The .mak file (in
icu/source/data) was missing some paths so some of the tools were not being
found.  That was easily fixed, and now it gets quite a way through making
the data, until it hits a point where it starts giving errors like this:-

--
Making Locale Resource Bundle files
..\..\locales\root.txt:39: warning: %Collation could not be constructed from
CollationElements - check context!
..\..\locales\root.txt:37: parse error. Stopped parsing with
U_INVALID_FORMAT_ERROR
couldn't parse the file ..\..\locales\root.txt. Error:U_INVALID_FORMAT_ERROR
..\..\locales\ar.txt:16: warning: %Collation could not be constructed from
CollationElements - check context!
..\..\locales\ar.txt:14: parse error. Stopped parsing with
U_INVALID_FORMAT_ERROR
couldn't parse the file ..\..\locales\ar.txt. Error:U_INVALID_FORMAT_ERROR
..\..\locales\ca.txt:12: warning: %Collation could not be constructed from
CollationElements - check context!
..\..\locales\ca.txt:10: parse error. Stopped parsing with
U_INVALID_FORMAT_ERROR
--

Any ideas?

 [**] There are 4 or so options for how to package the ICU data--into a
 dynamic library, into an archive library, as a bunch of separate files,
 or as a single file. Any should work, but the last two are faster to
 build, and a bit more flexible.

Looking at the makefile, I believe it would package the data in each of
these ways, then we'd just copy the .dat into the appropriate place.  Once
I've got it to actually work, I'll see if I can get it down to generating
the .dat only.

Jonathan




Re: Build problems in i386 linux

2004-04-12 Thread Dan Sugalski
FWIW, taking a look at the Debian-specific ICU patches for their 
testing branch might be worthwhile. They've patched the thing up to 
get it to build, and since it's the only system I have locally that 
*doesn't* build ICU, I suspect there's something either Debian-ish 
going wrong, or more Linux in general.

I'll try and get patches to use the system ICU install if there is 
one done today.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: [perl #28393] [PATCH] Tcl pmcs

2004-04-12 Thread Leopold Toetsch
Will Coleda [EMAIL PROTECTED] wrote:

 Attached, find a .tgz that can be exploded in the top level of parrot
 which creates the abstract pmc tclobject, with children TclString,
 TclInt, TclFloat, and container pmcs TclList (an array) and
 TclArray (a hash).

Applied now, slightly modified for dynclasses and with string-related
adaptions due to API changes.

leo


ICU bug some places

2004-04-12 Thread Dan Sugalski
According to the IBM website, ICU triggers a GCC bug in the gcc 3.x 
series--you can't compile with the -O2 optimization setting. -O3 
works. I'll put a patch in.
--
Dan



ICU fixed

2004-04-12 Thread Dan Sugalski
I just checked in a patch for the problems building the data files. 
If the folks having problems could try it out that'd be great. (Works 
for me locally, but...)
--
Dan



Re: ICU fixed

2004-04-12 Thread Marcus Thiesen
On Monday 12 April 2004 17:01, Dan Sugalski wrote:
 I just checked in a patch for the problems building the data files.
 If the folks having problems could try it out that'd be great. (Works
 for me locally, but...)
Seems to work. I now get "All tests successful" on Linux; for details see
www.luusa.org/~marcus/parrottest

Have fun,
Marcus


-- 
 :: Marcus Thiesen :: www.thiesen.org :: ICQ#108989768 :: 0x754675F2 :: 

Let's say the docs present a simplified view of reality... :-) 
   Larry Wall


Plans for string processing

2004-04-12 Thread Dan Sugalski
Okay, I've not dug through all the fallout from the ICU checkin, but 
I can see there's an awful lot. I'll dig through that in a bit, but...

Here's the plan. We've gone over it in the past, but I'm not sure 
everything's been gathered together, so it's time to do so.

Some declarations:

1) Parrot will *not* require Unicode. Period. Ever. (Well, upon
release, at least) We will strongly recommend it, however, and use it
if we have it.
2) Parrot *will* support multiple encodings (the bytes->code points
stuff), character sets (code points->meaning of a sort), and
language-specific overrides of character set behaviour.
3) All string data can be dealt with as either a series of bytes, 
code points, or characters. (Characters are potentially multiple code 
points--basically combining character stuff from those standards that 
do so)
4) We will *not* use ICU for core functions. (string to number or 
number to string conversions, for example)
5) Parrot will autoconvert strings as needed. If a string can't be 
converted, parrot will throw an exception. This goes for language, 
character set, or encoding.
6) There *may* be an overriding set of rules for throwing conversion
exceptions. (They may be suppressed on lossy conversions, or required
for any conversions)
7) There *may* be an overriding language used for language-specific 
operations (case folding or sorting).

I know ICU's got all sorts of nifty features, but bluntly we're not 
going to use most of them.

The original split of encoding, character set, and language is one 
that I want to keep. I know we've lost a good chunk of that with the 
latest ICU patch, but that's only temporary and the breakage is worth 
it to get Unicode actually in use. I expect I need to step up to the 
plate and get an alternate encoding and charset in, so I'll probably 
take a shot at JIS X 0208:1997 or CNS11643-1992. (Or whatever the 
current version of those is)

As far as Parrot is concerned, a string is a series of bytes which 
may, via its encoding, be turned into a series of 32 bit integer code 
points. Those 32-bit integer code points can be turned, via its 
character set, into a series of characters where each character is 
one or more code points. Those characters may be classified and 
transformed based on the language of the string.
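
To make the first step (bytes to 32-bit code points via the encoding) concrete, here is a minimal sketch for one encoding, UTF-8. It is an illustration and not Parrot's encoding layer; the function name and signature are made up, and it is deliberately simplified in that it does not reject overlong forms or surrogate code points.

```c
#include <stddef.h>
#include <stdint.h>

/* Decodes nbytes of UTF-8 into 32-bit code points.
 * Returns the number of code points written, or -1 if the input is
 * malformed or truncated, or if out has room for fewer than needed. */
long utf8_to_codepoints(const unsigned char *bytes, size_t nbytes,
                        uint32_t *out, size_t out_cap)
{
    size_t i = 0, n = 0;

    while (i < nbytes) {
        unsigned char b = bytes[i];
        uint32_t cp;
        size_t extra, k;

        /* The lead byte tells us how many continuation bytes follow. */
        if (b < 0x80)                { cp = b;        extra = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; extra = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; extra = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; extra = 3; }
        else return -1;              /* stray continuation or bad lead byte */

        if (extra > nbytes - i - 1)
            return -1;               /* sequence runs off the end */

        for (k = 1; k <= extra; ++k) {
            unsigned char c = bytes[i + k];
            if ((c & 0xC0) != 0x80)
                return -1;           /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        if (n >= out_cap)
            return -1;
        out[n++] = cp;
        i += extra + 1;
    }
    return (long)n;
}
```

Fed the five bytes `61 61 E2 84 AB`, this yields three code points, the last being 0x212B. Anything above this level (what those integers mean, how they group into characters) belongs to the character set and language layers.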

The responsibilities of the three layers are:

Encoding
========
*) Transform stream of bytes to and from a set of 32-bit integers
*) Manages byte buffer (so buffer positioning and manipulation by 
code point offset is handled here)

Character set
=============
*) Provides default manipulation and comparison behaviour (sorting 
and case mangling)
*) Provides default character classifications (digit, word char, 
space, punctuation, whatever)
*) Provides code point and character manipulation. (substring 
functionality, basically)
*) Provides integrity features (exceptions if a string would be invalid)

Language
========
*) Provides language-sensitive manipulation of characters (case mangling)
*) Provides language-sensitive comparisons
*) Provides language-sensitive character overrides ('ll' treated as a 
single character, for example, in Spanish if that's still desired)
*) Provides language-sensitive grouping overrides.
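
As a toy illustration of a language-level character override, the sketch below counts "characters" in a string, letting a language supply sequences that count as one character (the 'll' case). The `language_rules` struct, its fields, and the "es" tag are invented for this example, and it works on plain ASCII bytes rather than on code points, so it only shows the shape of the idea.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-language rules: a NULL-terminated list of sequences
 * that the language treats as a single character. */
typedef struct {
    const char *name;
    const char *const *groups;
} language_rules;

/* Counts characters in s. With lang == NULL every byte is one
 * character; otherwise the longest matching group is consumed whole. */
size_t count_characters(const char *s, const language_rules *lang)
{
    size_t count = 0;

    while (*s) {
        size_t step = 1;

        if (lang != NULL && lang->groups != NULL) {
            const char *const *g;
            for (g = lang->groups; *g != NULL; ++g) {
                size_t len = strlen(*g);
                if (len > step && strncmp(s, *g, len) == 0)
                    step = len;
            }
        }
        s += step;
        ++count;
    }
    return count;
}

const char *const spanish_groups[] = { "ll", NULL };
const language_rules spanish = { "es", spanish_groups };
```

With these rules "llama" counts as four characters, and as five when no language is given, which is the sort of difference the language layer is there to own.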

Since examples are good, here are a few. They're in an "If we/Then
Parrot" format.

IW: Mush together (either concatenate or substr replacement) two 
strings of different languages but same charset
TP: Checks to see if that's allowed. If not, an exception is thrown. 
If so, we do the operation. If one string is manipulated the language 
stays whatever that string was. If a new string is created either the 
left side wins or the default language is used, depending on the 
interpreter setting.

IW: Mush together two strings of different charsets
TP: If the two strings can be losslessly converted to one of the two
charsets, do so, otherwise transform to Unicode and mush together. If
transformation is lossy, optionally throw an exception (or warning).
Language rules above still apply.

IW: Force a conversion to a different character set
TP: Does it. An exception or warning may be thrown if the conversion 
is not lossless.
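
The "different charsets" rule above boils down to a small decision, sketched here under loud assumptions: the `charset_id` enum, both function names, and the losslessness table are all made up for the example. The table treats ASCII as a subset of the other sets and pretends that conversion to Unicode never loses anything, which the plan itself does not promise (hence the optional exception or warning).

```c
/* Hypothetical charset identifiers for the sketch. */
typedef enum { CS_ASCII, CS_LATIN1, CS_KOI8R, CS_UNICODE } charset_id;

/* Toy table: 1 if any string in `from` can be represented in `to`
 * without loss. A real implementation would ask the charsets. */
int converts_losslessly(charset_id from, charset_id to)
{
    if (from == to || to == CS_UNICODE)
        return 1;
    if (from == CS_ASCII)
        return 1;
    return 0;
}

/* Picks the charset in which two strings get mushed together:
 * one of the operands' charsets if the other converts into it
 * losslessly, otherwise Unicode as the common fallback. */
charset_id pick_target_charset(charset_id left, charset_id right)
{
    if (converts_losslessly(right, left))
        return left;
    if (converts_losslessly(left, right))
        return right;
    return CS_UNICODE;
}
```

So ASCII plus Latin-1 stays Latin-1, while Latin-1 plus KOI8-R ends up in Unicode. Checking the left side first is an arbitrary choice in this sketch.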

Please note that in most cases parrot deals with string data as
*strings* in S registers (or hiding behind PMCs) not as integers in I
registers (even though we treat strings as a series of abstract
integer code points). This is because even something as simple as
"give me character 5" may return a series of code points if character
5 is a combining character set. We may (possibly, but possibly not)
get a bit dirtier for the regex code for speed reasons, but we'll see
about that.

Also note that some languages, such as perl 6, have a more restricted 
view of things. That's fine, but we don't really care much as long as 
everything that they need is provided, so the fact that Larry's 
mandated the Ux levels is fine, but as they're a (possibly 
excessively) restricted subset of what we're going to do 

Re: ICU fixed

2004-04-12 Thread Dan Sugalski
At 5:23 PM +0200 4/12/04, Marcus Thiesen wrote:
On Monday 12 April 2004 17:01, Dan Sugalski wrote:
 I just checked in a patch for the problems building the data files.
 If the folks having problems could try it out that'd be great. (Works
 for me locally, but...)
Seems to work. I now get "All tests successful" on Linux; for details see
www.luusa.org/~marcus/parrottest
Glad it's working.

I see you've a selection of machines there. Any reason to not just 
drop them into the current tinderbox system?
--
Dan



Re: ICU fixed

2004-04-12 Thread Marcus Thiesen
On Monday 12 April 2004 17:46, Dan Sugalski wrote:
 I see you've a selection of machines there. Any reason to not just
 drop them into the current tinderbox system?

Most of the machines don't run 24/7 and I didn't really understand the
existing system, so I did it my way(TM). I didn't really like what I saw
of the old system, and what I have has more or less grown historically, so
I'll keep it that way, but I could forward these mails to any location.
Have fun,
Marcus

-- 
 :: Marcus Thiesen :: www.thiesen.org :: ICQ#108989768 :: 0x754675F2 :: 

There are worse things in life than death. Have you ever spent an evening with
an insurance salesman
  Woody Allen




Re: [PATCH] Get string.c to compile in MS VC++

2004-04-12 Thread Jeff Clites
On Apr 12, 2004, at 5:33 AM, Jonathan Worthington wrote:

snip
See if you have a .dat file (or a bunch of individual files) in
blib/lib/icu/2.6.1 (relative to your parrot source root). If not, then
that's what's going on. Right now, I have that path hard-coded--of
course I need to pull that out into a config--but it probably means
that either the data files aren't getting created, or just that they
are in a different location. Glancing at your icu.pl patch, it may just
be missing moving the .dat file (or, maybe creating it too).

They are missing, and in fact weren't even being created.  Turns out I'd
somehow managed to miss a line out in the makefile, namely the one that made
the data.  D'oh.

I added it in, but this gave rise to new problems.  The .mak file (in
icu/source/data) was missing some paths so some of the tools were not being
found.  That was easily fixed, and now it gets quite a way through making
the data, until it hits a point where it starts giving errors like this:-

--
Making Locale Resource Bundle files
..\..\locales\root.txt:39: warning: %Collation could not be constructed from
CollationElements - check context!
..\..\locales\root.txt:37: parse error. Stopped parsing with
U_INVALID_FORMAT_ERROR
couldn't parse the file ..\..\locales\root.txt. Error:U_INVALID_FORMAT_ERROR
..\..\locales\ar.txt:16: warning: %Collation could not be constructed from
CollationElements - check context!
..\..\locales\ar.txt:14: parse error. Stopped parsing with
U_INVALID_FORMAT_ERROR
couldn't parse the file ..\..\locales\ar.txt. Error:U_INVALID_FORMAT_ERROR
..\..\locales\ca.txt:12: warning: %Collation could not be constructed from
CollationElements - check context!
..\..\locales\ca.txt:10: parse error. Stopped parsing with
U_INVALID_FORMAT_ERROR
--

Any ideas?
This error was showing up on Linux, and I was able to get it to happen 
for me by running the genrb tool with a parameter (or env. variable) 
missing. (Probably, the cause in the Linux case was actually something 
else.)

Take a look at my first post in the "Build problems in i386 linux"
thread. One cause of this error is that the 'genrb' tool (built before
this point) can't find a data file it needs--the file is 
icudt26b_ucadata.icu (possibly with a different prefix for you), and is 
probably in the icu/source/data/out/build directory. On Unix systems, 
it's located via the ICU_DATA env. variable (which apparently has to 
end with a slash), which the Makefile in icu/source/data sets up, or it 
can be passed via a -i argument to 'genrb' (either way, pointing to 
the directory containing that file). So take a look and see how that 
tool is being invoked in the build process, and whether a parameter is 
missing (or pointed to the wrong place).
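
Since ICU_DATA apparently has to end with a slash, a build helper that sets it could normalise the path first. The function below is only a sketch of that one step; `with_trailing_slash` is a made-up name, it handles only '/' as the separator, and on Win32 the separator and the way the variable is passed to genrb may well differ.

```c
#include <stddef.h>
#include <string.h>

/* Copies dir into buf and appends '/' if dir does not already end in
 * one. Returns 0 on success, or -1 if dir is empty or buf (of size
 * cap, including the terminating NUL) is too small. */
int with_trailing_slash(const char *dir, char *buf, size_t cap)
{
    size_t len = strlen(dir);
    int need_slash;

    if (len == 0)
        return -1;
    need_slash = (dir[len - 1] != '/');
    if (len + (size_t)need_slash + 1 > cap)
        return -1;

    memcpy(buf, dir, len);
    if (need_slash)
        buf[len++] = '/';
    buf[len] = '\0';
    return 0;
}
```

The result would then be exported as ICU_DATA (or handed to genrb via -i) before the tool runs.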

That's my bet for what's going on. (The Linux case was only failing on
one of the files, but it sounds like you're failing on all of them,
which is the behavior I'd expect if this is the problem.)

But, we're getting there!

JEff



Re: Build problems in i386 linux

2004-04-12 Thread Jeff Clites
On Apr 12, 2004, at 6:37 AM, Dan Sugalski wrote:

I'll try and get patches to use the system ICU install if there is one 
done today.
Also take a look at my [perl #28473], which I don't think has made it 
to the list yet. That's a patch to make the location of ICU's data 
directory configurable, so it should help with that.

JEff



Re: ICU bug some places

2004-04-12 Thread Alberto Manuel Brandao Simoes
Dan,
 as soon as you put the patch in, say so I can update cvs and re-test.
 Thanks
Alberto

Dan Sugalski wrote:
According to the IBM website, ICU triggers a GCC bug in the gcc 3.x 
series--you can't compile with the -O2 optimization setting. -O3 works. 
I'll put a patch in.


Re: ICU bug some places

2004-04-12 Thread Dan Sugalski
At 5:14 PM +0100 4/12/04, Alberto Manuel Brandao Simoes wrote:
Dan,
 as soon as you put the patch in, say so I can update cvs and re-test.
It's in. :)

Dan Sugalski wrote:
According to the IBM website, ICU triggers a GCC bug in the gcc 3.x 
series--you can't compile with the -O2 optimization setting. -O3 
works. I'll put a patch in.


--
Dan


Re: ICU fixed

2004-04-12 Thread Alberto Manuel Brandao Simoes
ICU_DATA=../data/out/build 
LD_LIBRARY_PATH=../common:../i18n:../tools/toolutil:../layout:../layoutex:../extra/ustdio:../tools/ctestfw:../data/out:../data:../stubdata/:$LD_LIBRARY_PATH 
 ../tools/genrb/genrb -k -q -p icudt26l -s ../data/locales -d 
../data/out/build it_IT_PREEURO.txt
ICU_DATA=../data/out/build 
LD_LIBRARY_PATH=../common:../i18n:../tools/toolutil:../layout:../layoutex:../extra/ustdio:../tools/ctestfw:../data/out:../data:../stubdata/:$LD_LIBRARY_PATH 
 ../tools/genrb/genrb -k -q -p icudt26l -s ../data/locales -d 
../data/out/build ja.txt
../data/locales/ja.txt:15: parse error. Stopped parsing with 
U_INVALID_FORMAT_ERROR
couldn't parse the file ja.txt. Error:U_INVALID_FORMAT_ERROR
make[1]: *** [../data/out/build/icudt26l_ja.res] Error 3
make[1]: Leaving directory `/home/ambs/Junk/Parrot/parrot/icu/source/data'
make: *** [blib/lib/libicuuc.a] Error 2

:-|
Maybe the anoncvs is not updated, yet?
Alberto

Marcus Thiesen wrote:
On Monday 12 April 2004 17:01, Dan Sugalski wrote:

I just checked in a patch for the problems building the data files.
If the folks having problems could try it out that'd be great. (Works
for me locally, but...)
Seems to work. I now get "All tests successful" on Linux; for details see
www.luusa.org/~marcus/parrottest

Have fun,
Marcus



Re: ICU fixed

2004-04-12 Thread Dan Sugalski
At 5:21 PM +0100 4/12/04, Alberto Manuel Brandao Simoes wrote:
ICU_DATA=../data/out/build 
LD_LIBRARY_PATH=../common:../i18n:../tools/toolutil:../layout:../layoutex:../extra/ustdio:../tools/ctestfw:../data/out:../data:../stubdata/:$LD_LIBRARY_PATH 
../tools/genrb/genrb -k -q -p icudt26l -s ../data/locales -d 
../data/out/build it_IT_PREEURO.txt
ICU_DATA=../data/out/build 
LD_LIBRARY_PATH=../common:../i18n:../tools/toolutil:../layout:../layoutex:../extra/ustdio:../tools/ctestfw:../data/out:../data:../stubdata/:$LD_LIBRARY_PATH 
../tools/genrb/genrb -k -q -p icudt26l -s ../data/locales -d 
../data/out/build ja.txt
../data/locales/ja.txt:15: parse error. Stopped parsing with 
U_INVALID_FORMAT_ERROR
couldn't parse the file ja.txt. Error:U_INVALID_FORMAT_ERROR
make[1]: *** [../data/out/build/icudt26l_ja.res] Error 3
make[1]: Leaving directory `/home/ambs/Junk/Parrot/parrot/icu/source/data'
make: *** [blib/lib/libicuuc.a] Error 2

:-|
Maybe the anoncvs is not updated, yet?
Nope -- there's only one CVS. The change is to the configure script 
for ICU, though, so you may want to do a make realclean first to make 
sure the changes are picked up.
--
Dan



Re: ICU fixed

2004-04-12 Thread Alberto Manuel Brandao Simoes
OK, make clean != make realclean :-)
Passed that problem.
Thanks
Alberto
Dan Sugalski wrote:
At 5:21 PM +0100 4/12/04, Alberto Manuel Brandao Simoes wrote:

ICU_DATA=../data/out/build 
LD_LIBRARY_PATH=../common:../i18n:../tools/toolutil:../layout:../layoutex:../extra/ustdio:../tools/ctestfw:../data/out:../data:../stubdata/:$LD_LIBRARY_PATH 
../tools/genrb/genrb -k -q -p icudt26l -s ../data/locales -d 
../data/out/build it_IT_PREEURO.txt
ICU_DATA=../data/out/build 
LD_LIBRARY_PATH=../common:../i18n:../tools/toolutil:../layout:../layoutex:../extra/ustdio:../tools/ctestfw:../data/out:../data:../stubdata/:$LD_LIBRARY_PATH 
../tools/genrb/genrb -k -q -p icudt26l -s ../data/locales -d 
../data/out/build ja.txt
../data/locales/ja.txt:15: parse error. Stopped parsing with 
U_INVALID_FORMAT_ERROR
couldn't parse the file ja.txt. Error:U_INVALID_FORMAT_ERROR
make[1]: *** [../data/out/build/icudt26l_ja.res] Error 3
make[1]: Leaving directory 
`/home/ambs/Junk/Parrot/parrot/icu/source/data'
make: *** [blib/lib/libicuuc.a] Error 2

:-|
Maybe the anoncvs is not updated, yet?


Nope -- there's only one CVS. The change is to the configure script for 
ICU, though, so you may want to do a make realclean first to make sure 
the changes are picked up.


Re: Plans for string processing

2004-04-12 Thread Michael Scott
Just thought I'd mention that I'm in the process of trying to get 
strings.pod updated to reflect the current state of affairs.

Mike



Re: cvs commit: parrot/ops math.ops

2004-04-12 Thread Leopold Toetsch
Dan Sugalski [EMAIL PROTECTED] wrote:
   --- math.ops 23 Mar 2004 07:27:51 -  1.15
   +++ math.ops 12 Apr 2004 14:59:12 -  1.16
   @@ -601,6 +601,8 @@

=item B<mul>(out INT, in INT, in INT)

   +=item B<mul>(out INT, in INT, in NUM)

Seems to be a bit asymmetric when comparing to other math ops like
C<add>, where this variant is missing.

leo


[perl #28494] [PATCH] unescape strings

2004-04-12 Thread via RT
# New Ticket Created by  Leopold Toetsch 
# Please include the string:  [perl #28494]
# in the subject line of all future correspondence about this issue. 
# URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=28494 


Attached patch:
* adds a new test file for Unicode-related string tests
* reimplements string_unescape_cstring, which now uses ICU for the work
* fixes a bug in string_compare with equal-length strings

It's also far more efficient than the old code.
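
For readers who want the comparison fix without the macro noise, here is a standalone sketch of the same idea. It is not the COMPARE_STRINGS macro from the patch: `compare_units` is a made-up function over plain byte arrays. As far as the diff shows, the point is to stop at the shorter length and, when both strings are exhausted together, return 0 without dereferencing past the end.

```c
#include <stddef.h>

/* Compares two unit sequences. Returns -1, 0, or 1, and never reads
 * at or beyond index min(len1, len2) of either sequence. */
int compare_units(const unsigned char *s1, size_t len1,
                  const unsigned char *s2, size_t len2)
{
    size_t minlen = len1 < len2 ? len1 : len2;
    size_t i = 0;

    /* Advance the index only inside the loop body, so that after the
     * loop i tells us exactly where the scan stopped. */
    while (i < minlen && s1[i] == s2[i])
        ++i;

    if (i < minlen)                  /* found a differing unit */
        return s1[i] > s2[i] ? 1 : -1;
    if (len1 == len2)                /* same length, same content */
        return 0;
    return len1 > len2 ? 1 : -1;     /* common prefix: longer one wins */
}
```

The macro version has to do the same for each pair of unit widths (1, 2, or 4 bytes), which is why it takes the two element types as parameters.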

TODO: move it out of string.c, docs.

Jeff, please have a look at it.

leo
--- parrot/MANIFEST Mon Apr 12 15:43:05 2004
+++ parrot-leo/MANIFEST Mon Apr 12 18:41:07 2004
@@ -2596,6 +2596,7 @@
 t/op/rx.t []
 t/op/stacks.t []
 t/op/string.t []
+t/op/stringu.t[]
 t/op/time.t   []
 t/op/trans.t  []
 t/op/types.t  []
--- /dev/null   Fri Feb 28 14:27:28 2003
+++ parrot-leo/t/op/stringu.t   Mon Apr 12 18:40:40 2004
@@ -0,0 +1,57 @@
+#! perl -w
+# Copyright: 2001-2004 The Perl Foundation.  All Rights Reserved.
+# $Id$
+
+=head1 NAME
+
+t/op/stringu.t - Unicode String Test
+
+=head1 SYNOPSIS
+
+   % perl -Ilib t/op/stringu.t
+
+=head1 DESCRIPTION
+
+Tests Parrot's unicode string system.
+
+=cut
+#'
+
+use Parrot::Test tests => 4;
+use Test::More;
+
+output_is( <<'CODE', <<OUTPUT, "angstrom" );
+    chr S0, 0x212B
+    print S0
+    print "\n"
+    end
+CODE
+\xe2\x84\xab
+OUTPUT
+
+output_is( <<'CODE', <<OUTPUT, "escaped angstrom" );
+    set S0, "\x{212b}"
+    print S0
+    print "\n"
+    end
+CODE
+\xe2\x84\xab
+OUTPUT
+
+output_is( <<'CODE', <<OUTPUT, "escaped angstrom 2" );
+    set S0, "aa\x{212b}"
+    print S0
+    print "\n"
+    end
+CODE
+aa\xe2\x84\xab
+OUTPUT
+
+output_is( <<'CODE', <<OUTPUT, "escaped angstrom 3" );
+    set S0, "aa\x{212b}-aa"
+    print S0
+    print "\n"
+    end
+CODE
+aa\xe2\x84\xab-aa
+OUTPUT
--- parrot/src/string.c Sun Apr 11 15:16:48 2004
+++ parrot-leo/src/string.c Mon Apr 12 18:40:29 2004
@@ -1626,24 +1626,28 @@
     type1 *curr1 = (type1 *)s1->strstart; \
     type2 *curr2 = (type2 *)s2->strstart; \
      \
-    while( (_index++ < minlen) && (*curr1 == *curr2) ) \
+    while( (_index < minlen) && (*curr1 == *curr2) ) \
     { \
         ++curr1; \
         ++curr2; \
+        ++_index; \
     } \
+    if (_index == minlen && s1->strlen == s2->strlen) { \
+        result = 0; \
+        break; \
+    } \
+    result = *curr1 - *curr2; \
      \
-    *result = *curr1 - *curr2; \
-     \
-    if( !*result ) \
+    if( !result ) \
     { \
         if( s1->strlen != s2->strlen ) \
         { \
-            *result = s1->strlen > s2->strlen ? 1 : -1; \
+            result = s1->strlen > s2->strlen ? 1 : -1; \
         } \
     } \
     else \
     { \
-        *result = *result > 0 ? 1 : -1; \
+        result = result > 0 ? 1 : -1; \
     } \
 } while(0)
 
@@ -1691,13 +1695,13 @@
 {
     case enum_stringrep_one:
         /* could use memcmp in this one case; faster?? */
-        COMPARE_STRINGS(Parrot_UInt1, Parrot_UInt1, s1, s2, &cmp);
+        COMPARE_STRINGS(Parrot_UInt1, Parrot_UInt1, s1, s2, cmp);
         break;
     case enum_stringrep_two:
-        COMPARE_STRINGS(Parrot_UInt2, Parrot_UInt2, s1, s2, &cmp);
+        COMPARE_STRINGS(Parrot_UInt2, Parrot_UInt2, s1, s2, cmp);
         break;
     case enum_stringrep_four:
-        COMPARE_STRINGS(Parrot_UInt4, Parrot_UInt4, s1, s2, &cmp);
+        COMPARE_STRINGS(Parrot_UInt4, Parrot_UInt4, s1, s2, cmp);
         break;
     default:
         /* trouble! */
@@ -1731,18 +1735,18 @@
         if( smaller->representation == enum_stringrep_two )
         {
             COMPARE_STRINGS(Parrot_UInt4, Parrot_UInt2,
-                            larger, smaller, &cmp);
+                            larger, smaller, cmp);
         }
         else /* smaller->representation == enum_stringrep_one */
         {
             COMPARE_STRINGS(Parrot_UInt4, Parrot_UInt1,
-                            larger, smaller, &cmp);
+                            larger, smaller, cmp);
         }
     }
     else /* larger->representation == enum_stringrep_two,
             smaller->representation == enum_stringrep_one */
     {
-        COMPARE_STRINGS(Parrot_UInt2, Parrot_UInt1, larger, smaller, &cmp);
+        COMPARE_STRINGS(Parrot_UInt2, Parrot_UInt1, larger, smaller, cmp);
     }
 
 return cmp * multiplier;
@@ -3052,7 +3056,69 @@
 =cut
 
 */
+#if 1
+/* TODO move this out of string.c */
+#include <unicode/ustring.h>
+static UChar
+char_at(Parrot_Int4 offs, void* context)
+{
+    return *((char*)context + offs);
 
+}
+STRING *
+string_unescape_cstring(struct Parrot_Interp * interpreter,
+char *cstring, 

Re: [perl #28461] [PATCH] Spelling Nit for diagnostic message

2004-04-12 Thread chromatic
On Sun, 2004-04-11 at 11:10, Will Coleda wrote:

 Another instance of unknow
 
 bash-2.05a$ cvs diff src/dynext.c
 Index: src/dynext.c

Thanks, applied.

-- c



Re: [perl #28494] [PATCH] unescape strings

2004-04-12 Thread Jeff Clites
That's really funny--I wrote almost exactly the same code w.r.t. 
string_unescape_cstring last night, and I also always use U+212b for 
testing any time I need to come up with a readable character outside of 
the Latin range. Strange coincidences.

I'll take a look and see if there is anything significantly different 
in our implementations, and get back to you. (It's definitely 
convenient, especially for testing, to have a way to represent 
arbitrary characters in string literals.)
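
To show what the `\x{...}` form amounts to, here is a small self-contained sketch that expands such escapes into UTF-8. It is neither of the two implementations discussed in this thread (those go through ICU); both function names are invented, only the `\x{HEX}` escape is handled, and the parsing leans on `strtoul`, which is more lenient than a real lexer should be (it tolerates leading whitespace, for instance).

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Encodes cp as UTF-8 into out (room for 4 bytes). Returns the number
 * of bytes written, or 0 if cp is above U+10FFFF. Surrogates are not
 * rejected in this sketch. */
size_t encode_utf8(unsigned long cp, char *out)
{
    if (cp < 0x80) {
        out[0] = (char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* Copies src to dst, replacing each \x{HEX} with the UTF-8 bytes of
 * that code point. Returns the length written (excluding the NUL),
 * or -1 on a malformed escape or if dst (of size cap) is too small. */
long unescape_hex_braces(const char *src, char *dst, size_t cap)
{
    size_t n = 0;

    if (cap == 0)
        return -1;
    while (*src) {
        if (src[0] == '\\' && src[1] == 'x' && src[2] == '{') {
            char *end;
            char tmp[4];
            size_t len;
            unsigned long cp = strtoul(src + 3, &end, 16);

            if (end == src + 3 || *end != '}')
                return -1;           /* no digits, or no closing brace */
            len = encode_utf8(cp, tmp);
            if (len == 0 || n + len >= cap)
                return -1;
            memcpy(dst + n, tmp, len);
            n += len;
            src = end + 1;
        } else {
            if (n + 1 >= cap)
                return -1;
            dst[n++] = *src++;
        }
    }
    dst[n] = '\0';
    return (long)n;
}
```

Given the input `aa\x{212b}-aa`, this produces the eight bytes `aa`, `E2 84 AB`, `-aa`, which matches the expected output in the stringu.t tests.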

JEff

On Apr 12, 2004, at 9:54 AM, Leopold Toetsch (via RT) wrote:

# New Ticket Created by  Leopold Toetsch
# Please include the string:  [perl #28494]
# in the subject line of all future correspondence about this issue.
# URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=28494 
Attached patch:
* adds a new test file for Unicode-related string tests
* reimplements string_unescape_cstring, which now uses ICU for the work
* fixes a bug in string_compare with equal-length strings
It's also far more efficient than the old code.

TODO: move it out of string.c, docs.

Jeff, please have a look at it.

leo
--- parrot/MANIFEST Mon Apr 12 15:43:05 2004
+++ parrot-leo/MANIFEST Mon Apr 12 18:41:07 2004
@@ -2596,6 +2596,7 @@
 t/op/rx.t []
 t/op/stacks.t []
 t/op/string.t []
+t/op/stringu.t[]
 t/op/time.t   []
 t/op/trans.t  []
 t/op/types.t  []
--- /dev/null   Fri Feb 28 14:27:28 2003
+++ parrot-leo/t/op/stringu.t   Mon Apr 12 18:40:40 2004
@@ -0,0 +1,57 @@
+#! perl -w
+# Copyright: 2001-2004 The Perl Foundation.  All Rights Reserved.
+# $Id$
+
+=head1 NAME
+
+t/op/stringu.t - Unicode String Test
+
+=head1 SYNOPSIS
+
+   % perl -Ilib t/op/stringu.t
+
+=head1 DESCRIPTION
+
+Tests Parrot's unicode string system.
+
+=cut
+#'
+
+use Parrot::Test tests = 4;
+use Test::More;
+
+output_is( 'CODE', OUTPUT, angstrom );
+chr S0, 0x212B
+print S0
+print \n
+end
+CODE
+\xe2\x84\xab
+OUTPUT
+
+output_is( 'CODE', OUTPUT,  escaped angstrom );
+set S0, \x{212b}
+print S0
+print \n
+end
+CODE
+\xe2\x84\xab
+OUTPUT
+
+output_is( 'CODE', OUTPUT,  escaped angstrom 2 );
+set S0, aa\x{212b}
+print S0
+print \n
+end
+CODE
+aa\xe2\x84\xab
+OUTPUT
+
+output_is( 'CODE', OUTPUT,  escaped angstrom 3 );
+set S0, aa\x{212b}-aa
+print S0
+print \n
+end
+CODE
+aa\xe2\x84\xab-aa
+OUTPUT
--- parrot/src/string.c Sun Apr 11 15:16:48 2004
+++ parrot-leo/src/string.c Mon Apr 12 18:40:29 2004
@@ -1626,24 +1626,28 @@
 type1 *curr1 = (type1 *)s1-strstart; \
 type2 *curr2 = (type2 *)s2-strstart; \
  \
-while( (_index++  minlen)  (*curr1 == *curr2) ) \
+while( (_index  minlen)  (*curr1 == *curr2) ) \
 { \
 ++curr1; \
 ++curr2; \
+++_index; \
 } \
+if (_index == minlen  s1-strlen == s2-strlen) { \
+result = 0; \
+break; \
+} \
+result = *curr1 - *curr2; \
  \
-*result = *curr1 - *curr2; \
- \
-if( !*result ) \
+if( !result ) \
 { \
 if( s1-strlen != s2-strlen ) \
 { \
-*result = s1-strlen  s2-strlen ? 1 : -1; \
+result = s1-strlen  s2-strlen ? 1 : -1; \
 } \
 } \
 else \
 { \
-*result = *result  0 ? 1 : -1; \
+result = result  0 ? 1 : -1; \
 } \
 } while(0)
@@ -1691,13 +1695,13 @@
 {
 case enum_stringrep_one:
 /* could use memcmp in this one case; faster?? */
-COMPARE_STRINGS(Parrot_UInt1, Parrot_UInt1, s1, s2, &cmp);
+COMPARE_STRINGS(Parrot_UInt1, Parrot_UInt1, s1, s2, cmp);
 break;
 case enum_stringrep_two:
-COMPARE_STRINGS(Parrot_UInt2, Parrot_UInt2, s1, s2, &cmp);
+COMPARE_STRINGS(Parrot_UInt2, Parrot_UInt2, s1, s2, cmp);
 break;
 case enum_stringrep_four:
-COMPARE_STRINGS(Parrot_UInt4, Parrot_UInt4, s1, s2, &cmp);
+COMPARE_STRINGS(Parrot_UInt4, Parrot_UInt4, s1, s2, cmp);
 break;
 default:
 /* trouble! */
@@ -1731,18 +1735,18 @@
 if( smaller->representation == enum_stringrep_two )
 {
 COMPARE_STRINGS(Parrot_UInt4, Parrot_UInt2,
-larger, smaller, &cmp);
+larger, smaller, cmp);
 }
 else /* smaller->representation == enum_stringrep_one */
 {
 COMPARE_STRINGS(Parrot_UInt4, Parrot_UInt1,
-larger, smaller, &cmp);
+larger, smaller, cmp);
 }
 }
 else /* larger->representation == enum_stringrep_two,
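
The COMPARE_STRINGS hunk above moves the index increment into the loop body, so the index only advances when two units actually match, and returns 0 early when both strings are exhausted together. A function-form sketch of the same idea follows, for the one-byte representation only. The name `compare_units` and its signature are hypothetical and this is not the macro from string.c. Unlike the macro, it settles ties purely by length, so it never reads a unit past the end of the shorter string.

```c
#include <stddef.h>

/* Hypothetical illustration; not Parrot's COMPARE_STRINGS macro.
 * Compares two buffers of one-byte units.
 * Returns -1, 0 or 1, in the manner of a normalized memcmp. */
static int compare_units(const unsigned char *s1, size_t len1,
                         const unsigned char *s2, size_t len2)
{
    size_t minlen = len1 < len2 ? len1 : len2;
    size_t i = 0;

    /* Advance only while both units exist and are equal. */
    while (i < minlen && s1[i] == s2[i])
        ++i;

    /* A mismatch inside the common prefix decides the result. */
    if (i < minlen)
        return s1[i] > s2[i] ? 1 : -1;

    /* Common prefix is identical: the longer string sorts later. */
    if (len1 == len2)
        return 0;
    return len1 > len2 ? 1 : -1;
}
```

With this shape, comparing "abc" with "abc" gives 0 without touching index 3, and "ab" sorts before "abc".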
 

Re: cvs commit: parrot/ops math.ops

2004-04-12 Thread Dan Sugalski
At 6:02 PM +0200 4/12/04, Leopold Toetsch wrote:
> Dan Sugalski [EMAIL PROTECTED] wrote:
> >   --- math.ops 23 Mar 2004 07:27:51 -  1.15
> >   +++ math.ops 12 Apr 2004 14:59:12 -  1.16
> >   @@ -601,6 +601,8 @@
> > =item B<mul>(out INT, in INT, in INT)
> >
> >   +=item B<mul>(out INT, in INT, in NUM)
>
> Seems to be a bit asymmetric when comparing to other math ops like
> C<add>, where this variant is missing.

Yeah, it is. I'm still not 100% sure they should be in there, nor
that we shouldn't have more of them for the other basic math ops.
(That's one of the reasons they aren't in the ops list yet)
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Status of the ICU problem...

2004-04-12 Thread Alberto Manuel Brandao Simoes
All tests successful, 84 subtests skipped.
Files=100, Tests=1498, 517 wallclock secs (263.66 cusr + 68.59 csys = 332.25 CPU)

Cheers :-D
Alberto


Strings rationale

2004-04-12 Thread Jeff Clites
I'm going to write up some information on my view of strings, and the 
rationale behind it, so that there's a clear explanation that we can 
use for discussion. That will give us something more organized to talk 
about. It will probably take a day or two for me to get that done.

I'll also respond to Dan's concerns, but that will be easier to do once 
I've spelled out what I'm thinking, so that we can minimize problems 
due to miscommunication.

JEff



Re: Strings rationale

2004-04-12 Thread Dan Sugalski
At 10:14 AM -0700 4/12/04, Jeff Clites wrote:
> I'm going to write up some information on my view of strings, and
> the rationale behind it, so that there's a clear explanation that we
> can use for discussion. That will give us something more organized
> to talk about. It will probably take a day or two for me to get that
> done.

As long as it doesn't essentially read "Because all the cool kids are
doing it" or "because it makes my life easier", which are the two
common rationales--neither of those are sufficient. :)

I'll hold off editing the interface back to the way it was (and the 
way I want it) until you've had a chance to make your pitch, though.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


ICU Building On Win32 (was Re: [PATCH] Get string.c to compile in MS VC++)

2004-04-12 Thread Jonathan Worthington
Jeff Clites [EMAIL PROTECTED] wrote:
> On Apr 12, 2004, at 5:33 AM, Jonathan Worthington wrote:
>
> > <snip>
> > > See if you have a .dat file (or a bunch of individual files) in
> > > blib/lib/icu/2.6.1 (relative to your parrot source root). If not, then
> > > that's what's going on. Right now, I have that path hard-coded--of
> > > course I need to pull that out into a config--but it probably means
> > > that either the data files aren't getting created, or just that they
> > > are in a different location. Glancing at your icu.pl patch, it may
> > > just be missing moving the .dat file (or, maybe creating it too).
> >
> > They are missing, and in fact weren't even being created.  Turns out
> > I'd somehow managed to miss a line out in the makefile, namely the one
> > that made the data.  D'oh.
> >
> > I added it in, but this gave rise to new problems.  The .mak file (in
> > icu/source/data) was missing some paths so some of the tools were not
> > being found.  That was easily fixed, and now it gets quite a way
> > through making the data, until it hits a point where it starts giving
> > errors like this:-
> >
> > --
> > Making Locale Resource Bundle files
> > ..\..\locales\root.txt:39: warning: %Collation could not be constructed from CollationElements - check context!
> > ..\..\locales\root.txt:37: parse error. Stopped parsing with U_INVALID_FORMAT_ERROR
> > couldn't parse the file ..\..\locales\root.txt. Error:U_INVALID_FORMAT_ERROR
> > ..\..\locales\ar.txt:16: warning: %Collation could not be constructed from CollationElements - check context!
> > ..\..\locales\ar.txt:14: parse error. Stopped parsing with U_INVALID_FORMAT_ERROR
> > couldn't parse the file ..\..\locales\ar.txt. Error:U_INVALID_FORMAT_ERROR
> > ..\..\locales\ca.txt:12: warning: %Collation could not be constructed from CollationElements - check context!
> > ..\..\locales\ca.txt:10: parse error. Stopped parsing with U_INVALID_FORMAT_ERROR
> > --
> >
> > Any ideas?
>
> This error was showing up on Linux, and I was able to get it to happen
> for me by running the genrb tool with a parameter (or env. variable)
> missing. (Probably, the cause in the Linux case was actually something
> else.)
>
> Take a look at my first post in the "Build problems in i386 linux"
> thread. One cause of this error is that the 'genrb' tool (built before
> this point) can't find a data file it needs--the file is
> icudt26b_ucadata.icu (possibly with a different prefix for you), and is
> probably in the icu/source/data/out/build directory. On Unix systems,
> it's located via the ICU_DATA env. variable (which apparently has to
> end with a slash), which the Makefile in icu/source/data sets up, or it
> can be passed via a "-i" argument to 'genrb' (either way, pointing to
> the directory containing that file). So take a look and see how that
> tool is being invoked in the build process, and whether a parameter is
> missing (or pointed to the wrong place).
>
> That's my bet for what's going on. (The Linux case was only failing on
> one of the files, but it sounds like you're failing on all of them,
> which is the behavior I'd expect if this is the problem.)

Yup, that was it.  There were a few other little issues with the makefile
that I had to deal with, but it all appears to be working now.  There are
only 3 tests failing, and as I remember they were ones that failed before
the big ICU patch.

I've attached the patches, and (fingers crossed) this will get Parrot going
on Win32 again.  Summary of changes:-

* Add source/allinone/all/all.dsp (which was moved to the attic previously).
* Add a modified source/allinone/allinone.dsw (which was moved to the attic
previously). Changes are due to the fact that we do not have everything the
full ICU tree would have.
* Modify config/gen/icu.pl to write the makefile entries for building ICU on
Win32 and ensure .dsp files have proper Win32 line endings (MS VC++ is very
fussy about this).
* Modify icu/source/data/makedata.mak to correct a few path issues and
remove parts relating to things we don't have on the ICU source tree.

Jonathan
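
The failure discussed in this message came down to 'genrb' not finding its data file, with the directory supplied through ICU_DATA (which apparently has to end with a slash) or a "-i" argument. A minimal sketch of a pre-flight check for that condition is below. The helper names are hypothetical, this is not part of ICU or of the Parrot build, and it only treats '/' as a slash, which is an assumption taken from the Unix case described above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical pre-flight check; not part of ICU or the Parrot build. */

/* Returns 1 if dir is a non-empty string ending in '/', else 0. */
static int ends_with_slash(const char *dir)
{
    size_t len = dir ? strlen(dir) : 0;
    return len > 0 && dir[len - 1] == '/';
}

/* Returns 1 if the file named dir + file can be opened for reading.
 * Returns 0 if dir lacks the trailing slash, the joined path is too
 * long for the local buffer, or the file cannot be opened. */
static int data_file_present(const char *dir, const char *file)
{
    char path[1024];
    FILE *fp;

    if (!ends_with_slash(dir) || strlen(dir) + strlen(file) >= sizeof path)
        return 0;

    strcpy(path, dir);
    strcat(path, file);

    fp = fopen(path, "rb");
    if (!fp)
        return 0;
    fclose(fp);
    return 1;
}
```

A build step could call something like `data_file_present(getenv("ICU_DATA"), "icudt26b_ucadata.icu")` (with `<stdlib.h>` included) and stop with a clear message when it returns 0, instead of letting every locale file fail with U_INVALID_FORMAT_ERROR.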


win32icu.patch
Description: Binary data


icuwin32missing.patch
Description: Binary data


Re: ICU Building On Win32 (was Re: [PATCH] Get string.c to compile in MS VC++)

2004-04-12 Thread Dan Sugalski
At 8:46 PM +0100 4/12/04, Jonathan Worthington wrote:
> I've attached the patches, and (fingers crossed) this will get Parrot going
> on Win32 again.

Applied, thanks.
--
Dan
--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Strings rationale

2004-04-12 Thread Jeff Clites
On Apr 12, 2004, at 10:23 AM, Dan Sugalski wrote:

> At 10:14 AM -0700 4/12/04, Jeff Clites wrote:
> > I'm going to write up some information on my view of strings, and the
> > rationale behind it, so that there's a clear explanation that we can
> > use for discussion. That will give us something more organized to
> > talk about. It will probably take a day or two for me to get that
> > done.
>
> As long as it doesn't essentially read "Because all the cool kids are
> doing it" or "because it makes my life easier", which are the two
> common rationales--neither of those are sufficient. :)

Of course. The argument will be that this model delivers semantics that
match the concept that a string is trying to capture, and that it gives
developers the tools that they need and want in working with them.

The only point in mentioning precedent is to indicate that the pros and
cons of such an approach are well-known--that there aren't hidden
gotchas.

But before I justify the model, I need to fully explain it. One can't 
disagree (or agree) with a model, or its goals, until it's clear what 
those are.

JEff



Re: Plans for string processing

2004-04-12 Thread Matt Fowles
Dan~

I know that you are not technically required to defend your position, 
but I would like an explanation of one part of this plan.

Dan Sugalski wrote:
> 4) We will *not* use ICU for core functions. (string to number or number
> to string conversions, for example)

Why not?  It seems like we would just be reinventing a rather large
wheel here.

Matt


Re: [perl #28393] [PATCH] Tcl pmcs

2004-04-12 Thread Will Coleda
Did the makefile change make it in?

On Monday, April 12, 2004, at 04:57  AM, Leopold Toetsch wrote:

> Will Coleda [EMAIL PROTECTED] wrote:
>
> > dyld: ./parrot Undefined symbols:
> > _Parrot_tclobject_morph
> > _Parrot_tclobject_set_pmc
>
> Ah yes. That's ugly. So here we go:
>
> 0) The PMCs were pre-ICU. I've adapted them. Should I check it in or
> send it back to you?
>
> 1) dynamic PMCs need a dynpmc flag on the class definition line. This
> causes the PMC compiler to add additional code for dynamic loading:
>
>   pmclass TclString extends tclobject dynpmc {
>
> 2) Nasty dependencies. I got around that by changing the Makefile like
> so:
>
> tclobject$(SO) : tclobject.c
> $(LD) $(LD_SHARED) $(LD_SHARED_FLAGS) $(LDFLAGS) -Wl,-E -o $@ \
> -I../include -I../classes \
> -L../blib/lib -lparrot $<
> $(PERL) -MFile::Copy=cp -e 'cp q|$@|, q|../runtime/parrot/dynext/$@|'
> cd ../runtime/parrot/dynext; ln -sf tclobject.so libtclobject.so
>
> %$(SO) : %.c
> $(LD) $(LD_SHARED) $(LD_SHARED_FLAGS) $(LDFLAGS) -Wl,-E -o $@ \
> -I../include -I../classes \
> -L../blib/lib -lparrot -L../runtime/parrot/dynext \
> -ltclobject $<
> $(PERL) -MFile::Copy=cp -e 'cp q|$@|, q|../runtime/parrot/dynext/$@
>
> (and disabling non-tcl shared classes for now)
>
> That is, for all tcl* but tclobject, libtclobject.so is added as a
> library. This might also need the LD_LIBRARY_PATH to contain
> F<runtime/parrot/dynext>.
> I don't know how we do that platform-independently.
>
> It might be simpler to have a utility that copies all tcl*.c together
> into one and chains the Parrot_lib_type_load() functions. So only one
> shared lib would be loaded that registers all classes. Should be a
> rather simple script.
> E.g. merge-classes -o tcl-all.c tclobject.c tclstring.c ...
>
> Then compile and loadlib only the tcl-all. This would need a
> Parrot_lib_tcl-all_load() function that calls _load() for all contained
> PMCs.
>
> 3) For now
>
> $ cat tcl.pasm
>   loadlib P10, "tclobject"
>   print "ok 1\n"
>   loadlib P11, "tclstring"
>   print "ok 2\n"
>   new P1, .TclString
>   set P1, "ok 3\n"
>   set S1, P1
>   print S1
>   end
> $ parrot tcl.pasm
> ok 1
> ok 2
> ok 3
>
> leo
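
Leo's alternative in point 2, a single merged library whose one load function calls the _load() function of every contained PMC, could look roughly like the sketch below. Everything here is hypothetical: the thread does not show the real Parrot_lib_*_load prototype, so the `pmc_load_fn` signature, the `demo_` stand-ins and the `load_all` name are assumptions made only to show the chaining.

```c
#include <stddef.h>

/* Hypothetical loader signature; the real Parrot prototype is not
 * shown in the thread. A loader returns 0 on success. */
typedef int (*pmc_load_fn)(void *interp);

/* Stand-ins for the per-class load functions a merged file would hold. */
static int demo_calls = 0;

static int demo_load_tclobject(void *interp)
{
    (void)interp;
    ++demo_calls;
    return 0;
}

static int demo_load_tclstring(void *interp)
{
    (void)interp;
    ++demo_calls;
    return 0;
}

/* Order matters: the base class is registered before its subclasses. */
static const pmc_load_fn demo_loaders[] = {
    demo_load_tclobject,
    demo_load_tclstring
};

/* Calls each loader in order. Returns 0 if all succeed, otherwise the
 * 1-based position of the first loader that failed. */
static int load_all(void *interp, const pmc_load_fn *loaders, size_t count)
{
    size_t i;

    for (i = 0; i < count; ++i) {
        if (loaders[i](interp) != 0)
            return (int)(i + 1);
    }
    return 0;
}
```

The table keeps the dependency order explicit in one place, which is the part the Makefile rules above had to express through link flags.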


--
Will "Coke" Coleda    will at coleda dot com



[perl #28502] [PATCH] dynclasses/README

2004-04-12 Thread via RT
# New Ticket Created by  Will Coleda 
# Please include the string:  [perl #28502]
# in the subject line of all future correspondence about this issue. 
# URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=28502 


Here's an updated version of dynclasses/README that sums up recent 
notes, and PODifies the doc.

(It's a big enough change, I just attached the whole file.)



README
Description: Binary data

--
Will "Coke" Coleda    will at coleda dot com

Re: [perl #28502] [PATCH] dynclasses/README

2004-04-12 Thread Will Coleda
Immediately after I sent this, it occurred to me that I was missing the 
dynpmc flag that Leo had just mentioned. Re-attachment.



README
Description: Binary data


On Monday, April 12, 2004, at 07:35  PM, Will Coleda (via RT) wrote:

> # New Ticket Created by  Will Coleda
> # Please include the string:  [perl #28502]
> # in the subject line of all future correspondence about this issue.
> # URL: http://rt.perl.org:80/rt3/Ticket/Display.html?id=28502
>
> Here's an updated version of dynclasses/README that sums up recent
> notes, and PODifies the doc.
>
> (It's a big enough change, I just attached the whole file.)
>
> README
> --
> Will "Coke" Coleda    will at coleda dot com

--
Will "Coke" Coleda    will at coleda dot com