RE: Windows compile problems

2001-10-25 Thread Mattia Barbon

On Wed, 24 Oct 2001, Brent Dax wrote:

Unfortunately, I can't figure out how to utilize it.  Including
windows.h causes a conflict with Parrot's definition of BOOL, including
winbase.h gives me a ton of syntax errors, and putting the declaration

Including a win* header is not supported unless you have already
included windows.h.

Regards
Mattia




RE: Revamping the build system

2001-10-25 Thread Espen Harlinn

Brent Dax:
 What about little inline things?

 AUTO_OP sleep(i|ic) {
   #ifdef WIN32
   Sleep($1*1000);
   #else
   sleep($1);
   #endif
 }

As long as the file compiles on all platforms, I think it's logical to
consider it platform independent :-)

Brent Dax:
 Would you demand that that be put in a separate file?  (As a matter of
 fact it can't be--ops2c.pl isn't equipped for that sort of thing.)
 Where would you draw the line?
Place things that are decidedly platform specific in a separate directory


  # 3. create an initial SIMPLE makefile and a config.h for
  each supported platform/compiler combination

Brent Dax:
 Problem with that is, some platforms don't have make or have
 bad makes.
 Neither nmake nor pmake works well enough on Win32 (and dmake uses a
 different syntax).
Well, the current crop of makefiles for the Win32 platform isn't exactly
simple - and if you try a build using the dmake/Borland C++ Builder 5
combination you will find that some files obviously are out of date.

What I am thinking of is the situation where you don't have a Perl binary
and want to bootstrap the build process. Skip include file dependencies and
just get the makefile to build and link an initial binary capable of
executing a parrot binary for a platform independent make.

 VMS almost always uses mms or mmk (and
 even if they
 had a normal make, I dare you to write a Makefile that will run there
 and on other platforms).  Most Macs don't even have a command line or
 compiler, let alone make.  You won't find such things available on
 handhelds either.
That's why I would like to see a separate initial makefile for each
platform/compiler combination

 Personally, I think we should write a shell script (or equivalent) for
 each platform that simply invokes the compiler to build
 miniperl, and we
 can do whatever we need from there.

That would also be something likely to work, as long as the shell script is
written for a shell shipped with that platform.
In this case I would like to see a separate shell script (or equivalent) for
each compiler/platform combination.


 A safe config.h could look like:

 typedef long INTVAL;
 typedef double FLOATVAL;
 typedef long opcode_t;

 #undef HAS_HEADER_*

 etc.  Something like that ought to work on any platform; if
 necessary we
 can use #ifdefs with OS symbols (#ifdef WIN32, etc.) to figure it out.
 All miniperl does is figure out whatever Configure figures
 out currently
 and builds everything.  (Also, we may want to write it so it
 looks for a
 Perl 5 or Perl 6 that's already installed and hands things off to that
 if possible.)

 When you think about it, how much functionality do we need?
 Do we need
 much of anything OS-dependent besides simple IO, -X operators and
 system() to emulate make?  Do we even need to be as smart as make?  Is
 there really a problem with stupidly rebuilding everything, even if it
 isn't all necessary?
No, I agree


 # I know this isn't hightech, but it works like a charm.
 #
 # 4. write all other build tools in Perl

 Great.  How are we going to do this?  We can't depend on having a
 working Perl around at the beginning of the build process.
A parrot binary is going to be platform independent - right ??
So what we want is tools as parrot binaries, and a miniparrot, possibly
created using a native shell script, capable of executing them

 # 5. use uuids to identify packages, not name, this way my
 # MY::TextModule and
 # your MY::TextModule can be identified as two different
 # packages, OR require
 # that I do something like harlinn::no::MY::TextModule when I name my
 # packages/modules.

 Huh?  Oh, you're talking about namespace conflicts.  I don't think
 there's much we can do about that, except the official list
 on the CPAN
 we already have.

If we would like to create something for other languages besides Perl6 I
think some thought should be given to this.

 # To test for the presence of a particular library and
 # associated include
 # files maintain a list of filenames
 # for each supported platform/compiler combination. Like:
 #
 # ACE: LIB=E:\src\Corba\ACE_wrappers\bin\ace.lib;
 # INCLUDE=E:\src\Corba\ACE_wrappers;E:\src\Corba\ACE_wrappers\TAO
 # TCL: LIB=C:\Tcl\lib\tcl83.lib INCLUDE=C:\Tcl\include
 # DEFINES=WIN32;WINNT=1 // a comment
 # DB2: LIB=C:\SQLLIB\lib\db2api.lib;C:\SQLLIB\lib\db2cli.lib
 # INCLUDE=C:\SQLLIB\include
 #
 # and so on ...
 #
 # or in other words:
 # platform independent package name: LIB=[optional fullpath to
 # library[;optional fullpath to next library]]
 # INCLUDE=[optional fullpath of directory[;optional
 # fullpath to next
 # directory]]
 # DEFINES=NAME1=VALUE1;NAME2=VALUE2  // Comments
 #
 # My point is that the format of this file should be kept
 # really simple and
 # used during the next stage of the build process
 # to generate the final build. If a package is missing from
 # this file, then
 # it's not included in the final build.
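
For illustration, one line of the package list described above could be read with a few lines of C. This is only a sketch: the function name pkg_field is hypothetical, and it assumes fields are separated by blanks, so paths containing spaces would need a different delimiter rule.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy the value of "KEY=" (for example LIB, INCLUDE
 * or DEFINES) from one line of the package list into out.  The value is
 * taken to run up to the next blank or the end of the line.
 * Returns 1 if the key was found, 0 otherwise. */
static int pkg_field(const char *line, const char *key, char *out, size_t outsz)
{
    size_t klen = strlen(key);
    const char *p = line;

    assert(outsz > 0);
    out[0] = '\0';
    while ((p = strstr(p, key)) != NULL) {
        /* Only accept the key at the start of a field. */
        int at_start = (p == line) || p[-1] == ' ' || p[-1] == '\t';
        if (at_start && p[klen] == '=') {
            const char *v = p + klen + 1;
            size_t n = strcspn(v, " \t\r\n");
            if (n >= outsz)
                n = outsz - 1;          /* truncate rather than overflow */
            memcpy(out, v, n);
            out[n] = '\0';
            return 1;
        }
        p += klen;
    }
    return 0;
}
```

Called on the TCL line from the example with the key "INCLUDE", this would yield C:\Tcl\include; a package whose line is absent simply never gets looked up, which matches the "missing means not built" rule.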

 Huh?  I don't get what this is 

Re: Revamping the build system

2001-10-25 Thread Andy Dougherty

In perl.perl6.internals, you wrote:
Brent Dax [EMAIL PROTECTED] writes:

 What about little inline things?

 AUTO_OP sleep(i|ic) {
  #ifdef WIN32
  Sleep($1*1000);
  #else
  sleep($1);
  #endif
 }

This reminds me.  gcc is slowly switching over to writing code like that
as:

if (WIN32) {
Sleep($1*1000);
} else {
sleep($1);
}

or the equivalent thereof instead of using #ifdef.  If you make sure that
the values are defined to be 0 or 1 rather than just defined or not
defined, it's possible to write code like that instead.
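
A minimal sketch of that 0-or-1 style, assuming a hypothetical macro DEMO_ON_WIN32 that the build always defines. To stay compilable everywhere, the example only computes the argument for the native call instead of calling Sleep() or sleep() directly, since the not-taken branch is still parsed and type-checked.

```c
#include <assert.h>

/* Hypothetical configuration macro: always defined, to exactly 0 or 1,
 * so it can be used in an ordinary if () as well as in #if. */
#if defined(_WIN32) || defined(WIN32)
#  define DEMO_ON_WIN32 1
#else
#  define DEMO_ON_WIN32 0
#endif

/* Convert seconds into the unit the native sleep call expects:
 * milliseconds for Win32 Sleep(), seconds for POSIX sleep(). */
static unsigned long demo_sleep_arg(unsigned long seconds)
{
    assert(seconds <= 4000000UL);   /* keep seconds * 1000 within 32 bits */
    if (DEMO_ON_WIN32) {
        return seconds * 1000UL;    /* both branches are compiled and checked */
    } else {
        return seconds;
    }
}
```

Because the condition is a compile-time constant, a compiler is free to drop the dead branch, but it still has to parse and type-check it first.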

If I recall correctly, Plan9's C compiler doesn't do #ifdef at all!
The perl5 source (#ifdef forest) was munged into the second form.

It may not be possible to use this in cases where the not-taken branch may
refer to functions that won't be prototyped on all platforms, depending on
the compiler, but there are at least some places where this technique can
be used, and it's worth watching out for.

Yes, a number of the #ifdef branches in perl5's pp_sys.c would
have this problem (odd structs present on some systems but not others,
for example).  Also the VMS code with $ signs often gives other compilers
heartburn.

(In the case above, I'd probably instead define a sleep function on WIN32
that calls Sleep so that the platform differences are in a separate file,
but there are other examples of things like this that are better suited to
other techniques.)

Yes, that's what perl5 traditionally often tried to do.  (See, for example,
the various defines in unixish.h:  fwrite1, Stat, Fstat, Fflush, Mkdir).
Of course perl5 itself hasn't even always followed that plan . . . .
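
The wrapper approach mentioned above might look roughly like this. The name demo_sleep is hypothetical (it is not an existing Parrot or perl5 function); in a real tree the two variants would probably live in separate per-platform files rather than behind one #ifdef.

```c
#include <assert.h>

#ifdef _WIN32
#  include <windows.h>
#else
#  include <unistd.h>
#endif

/* Hypothetical wrapper: sleep for a whole number of seconds.
 * Returns the number of seconds left unslept (0 on a full sleep). */
static unsigned int demo_sleep(unsigned int seconds)
{
    assert(seconds <= 4000000U);        /* keep the millisecond count in range */
#ifdef _WIN32
    Sleep((DWORD)seconds * 1000U);      /* Win32 Sleep() takes milliseconds */
    return 0;
#else
    return sleep(seconds);              /* POSIX sleep() may return early on a signal */
#endif
}
```

The op body then reduces to a single call, and ops2c.pl never has to see a preprocessor conditional.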

-- 
Andy Dougherty  [EMAIL PROTECTED]
Dept. of Physics
Lafayette College, Easton PA 18042



RE: Revamping the build system

2001-10-25 Thread Brent Dax

Espen Harlinn:
# Brent Dax:
#  What about little inline things?
# 
#  AUTO_OP sleep(i|ic) {
#  #ifdef WIN32
#  Sleep($1*1000);
#  #else
#  sleep($1);
#  #endif
#  }
#
# As long as the file compiles on all platforms, I think it's logical to
# consider it platform independent :-)

AUTO_OP sleep(i|ic) {
#ifdef WIN32
SleepEx($1*1000, NULL);
#endif
#ifdef VMS
#ifdef __VAX
proc_sleep($1);
#else
proc_sleep2($1, NULL);
#endif
#endif
#ifdef MACOS
process_pause($1*100);
#endif

sleep($1);
}

Is that platform-independent?  (No, I'm not saying that's what's needed
to do sleep, just giving an example.  But look through the Perl 5 source
and you'll find things that make this look pretty.)

# Brent Dax:
#  Would you demand that that be put in a separate file?  (As
# a matter of
#  fact it can't be--ops2c.pl isn't equipped for that sort of thing.)
#  Where would you draw the line?
# Place things that are decidedly platform specific in a
# separate directory

Fair enough.

#   # 3. create an initial SIMPLE makefile and a config.h for
#   each supported platform/compiler combination
#
# Brent Dax:
#  Problem with that is, some platforms don't have make or have
#  bad makes.
#  Neither nmake nor pmake works well enough on Win32 (and dmake uses a
#  different syntax).
# Well, the current crop of makefiles for the Win32 platform
# isn't exactly
# simple - and if you try a build using the dmake/Borland C++ Builder 5
# combination you will find that some files obviously are out of date.
#
# What I am thinking of is the situation where you don't have a
# Perl binary
# and want to bootstrap the build process. Skip include file
# dependencies and
# just get the makefile to build and link an initial binary capable of
# executing a parrot binary for a platform independent make.

But once again, we can't depend on make existing.  That's why I'm
suggesting shell scripts--virtually all platforms have something like
them.  Even Macs have AppleScript.

#  VMS almost always uses mms or mmk (and
#  even if they
#  had a normal make, I dare you to write a Makefile that will
# run there
#  and on other platforms).  Most Macs don't even have a
# command line or
#  compiler, let alone make.  You won't find such things available on
#  handhelds either.
# That's why I would like to see a separate initial makefile for each
# platform/compiler combination

That seems like a lot of extra work.  Do we really want to have separate
'install' (for lack of a better name) scripts where all we did was
C<s/$compiler_name_1/$compiler_name_2/g>?

#  Personally, I think we should write a shell script (or
# equivalent) for
#  each platform that simply invokes the compiler to build
#  miniperl, and we
#  can do whatever we need from there.
#
# That would also be something likely to work, as long as the
# shell script is
# written for a shell shipped with that platform.
# In this case I would like to see a separate shell script (or
# equivalent) for
# each compiler/platform combination.

That the script would be written for a shell on that platform is kinda
assumed.  Once again, I think that most compilers' calling semantics are
similar enough that we will often just have to change the name of the
command, so why add extra scripts to maintain?

#  A safe config.h could look like:
# 
#  typedef long INTVAL;
#  typedef double FLOATVAL;
#  typedef long opcode_t;
# 
#  #undef HAS_HEADER_*
# 
#  etc.  Something like that ought to work on any platform; if
#  necessary we
#  can use #ifdefs with OS symbols (#ifdef WIN32, etc.) to
# figure it out.
#  All miniperl does is figure out whatever Configure figures
#  out currently
#  and builds everything.  (Also, we may want to write it so it
#  looks for a
#  Perl 5 or Perl 6 that's already installed and hands things
# off to that
#  if possible.)
# 
#  When you think about it, how much functionality do we need?
#  Do we need
#  much of anything OS-dependent besides simple IO, -X operators and
#  system() to emulate make?  Do we even need to be as smart
# as make?  Is
#  there really a problem with stupidly rebuilding everything,
# even if it
#  isn't all necessary?
# No, I agree

Good, we agree on something.  :^)

#  # I know this isn't hightech, but it works like a charm.
#  #
#  # 4. write all other build tools in Perl
# 
#  Great.  How are we going to do this?  We can't depend on having a
#  working Perl around at the beginning of the build process.
# A parrot binary is going to be platform independent - right ??
# So what we want is tools as parrot binaries, and a
# miniparrot, possibly
# created using a native shell script, capable of executing them

If you mean bytecode, that's true I suppose.  At the very beginning of
the build, all we can depend on is $cc and shell (or equivalent)
scripts.

#  # 5. use 

Re: Windows compile problems

2001-10-25 Thread Andy Dougherty

In perl.perl6.internals, you wrote:
On Wed, 24 Oct 2001, Brent Dax wrote:

Unfortunately, I can't figure out how to utilize it.  Including
windows.h causes a conflict with Parrot's definition of BOOL, including

Then we probably should change Parrot's name of BOOL.  I'd
suggest Bool_t, modeled after perl5's Size_t (and similar types).

Perl5 could actually use Bool_t, so if anyone implements such a test,
back-porting it to perl5 would be appreciated.

-- 
Andy Dougherty  [EMAIL PROTECTED]
Dept. of Physics
Lafayette College, Easton PA 18042



Chr Ord, v0.4

2001-10-25 Thread James Mastros

Hey all.
  This is version 0.4 of my chr and ord patch for parrot.  Included is a
patch, a test file, and an example.

I don't really see any major problems with this version, at least that
aren't implicit in the current Way Of Things with strings.  (That is, native
not being explicitly anything, and the encodings list being static.)

Chr and Ord aren't implemented for utf8 and utf16, only for native and
utf32.  I'd much appreciate it if somebody who knew what they were doing did
this.

The tests are woefully incomplete.

The style of the example is poor.

-=- James Mastros


Index: core.ops
===
RCS file: /home/perlcvs/parrot/core.ops,v
retrieving revision 1.18
diff -u -r1.18 core.ops
--- core.ops 2001/10/24 14:54:54 1.18
+++ core.ops 2001/10/25 13:38:35
@@ -991,6 +991,43 @@
 $1 = string_substr(interpreter, $2, $3, $4, $1);
 }
 
+
+
+=item B<ord>(i, s)
+=item B<ord>(i, sc)
+
+Set $1 to the codepoint of the first character in $2.
+
+=cut
+
+AUTO_OP ord(i, s|sc) {
+  $1 = string_ord($2);
+}
+
+
+
+=item B<chr>(s, i)
+=item B<chr>(s, ic)
+
+Set $1 to a single-character string with the Unicode codepoint $2.
+
+=cut
+
+AUTO_OP chr(s, i|ic) {
+$1 = string_chr(interpreter, $2, enc_utf32, $1);
+}
+
+
+
+=item B<chr>(s, i|ic, i|ic)
+
+Set $1 to a single-character string with the codepoint $2 in the encoding $3.
+
+=cut
+
+AUTO_OP chr(s, i|ic, i|ic) {
+$1 = string_chr(interpreter, $2, $3, $1);
+}
 
 =back
 
Index: string.c
===
RCS file: /home/perlcvs/parrot/string.c,v
retrieving revision 1.15
diff -u -r1.15 string.c
--- string.c 2001/10/22 23:34:47 1.15
+++ string.c 2001/10/25 13:38:35
@@ -168,6 +168,32 @@
 return (ENC_VTABLE(s1)->compare)(s1, s2);
 }
 
+/*=for api string string_ord
+ * get the codepoint of the first char of the string.
+ * (FIXME: Document in docs/strings.pod)
+ */
+INTVAL
+string_ord(STRING* s) {
+   return (ENC_VTABLE(s)->ord)(s);
+}
+
+/*=for api string string_chr
+ * Get a string with the first char having codepoint code, in the encoding 
+ * enc, and store it in d.  Also return d.
+ * Allocate memory for d if necessary.
+ */
+STRING*
+string_chr(struct Parrot_Interp *interpreter, INTVAL code, encoding_t enc, STRING** d) {
+STRING *dest;
+if (!d || !*d) {
+dest = string_make(interpreter, NULL, 0, enc, 0, 0);
+}
+else {
+dest = *d;
+}
+return (ENC_VTABLE(dest)->chr)(code, dest);
+}
+
 /*
  * Local variables:
  * c-indentation-style: bsd
@@ -176,9 +202,4 @@
  * End:
  *
  * vim: expandtab shiftwidth=4:
-*/
-
-
-
-
-
+ */
Index: strnative.c
===
RCS file: /home/perlcvs/parrot/strnative.c,v
retrieving revision 1.19
diff -u -r1.19 strnative.c
--- strnative.c 2001/10/22 23:34:47 1.19
+++ strnative.c 2001/10/25 13:38:35
@@ -105,6 +105,32 @@
 return cmp;
 }
 
+/*=for api string_native string_native_ord
+   returns the value of the first byte of the string.
+ */
+INTVAL
+string_native_ord (STRING* s) {
+   return (INTVAL)*(char *)(s->bufstart);
+}
+
+/*=for api string_native string_native_chr
+   return a string whose first character is given by the INTVAL.
+*/
+STRING*
+string_native_chr (INTVAL code, STRING* dest) {
+   if (dest->encoding->which != enc_native) {
+   /* It is now, matey. */
+   dest->encoding = &(Parrot_string_vtable[enc_native]);
+   }
+
+   string_grow(dest, 1);
+   *(char *)dest->bufstart = (char)code;
+   dest->strlen = 1;
+   dest->bufused = 1;
+
+   return dest;
+}
+
 /*=for api string_native string_native_vtable
return the vtable for the native string
 */
@@ -118,6 +144,8 @@
string_native_chopn,
string_native_substr,
string_native_compare,
+string_native_ord,
+   string_native_chr,
 };
 return sv;
 }
Index: strutf32.c
===
RCS file: /home/perlcvs/parrot/strutf32.c,v
retrieving revision 1.4
diff -u -r1.4 strutf32.c
--- strutf32.c  2001/10/22 23:34:47 1.4
+++ strutf32.c  2001/10/25 13:38:35
@@ -102,6 +102,32 @@
 return cmp;
 }
 
+/*=for api string_native string_utf32_ord
+   returns the value of the first byte of the string.
+ */
+INTVAL
+string_utf32_ord (STRING* s) {
+   return (INTVAL)*(utf32_t *)(s->bufstart);
+}
+
+/*=for api string_utf32 string_utf32_chr
+   return a string whose first character is given by the INTVAL.
+*/
+STRING*
+string_utf32_chr (INTVAL code, STRING* dest) {
+   if (dest->encoding->which != enc_utf32) {
+   /* It is now, matey. */
+   dest->encoding = &(Parrot_string_vtable[enc_utf32]);
+   }
+
+   string_grow(dest, 1);
+   *(utf32_t *)dest->bufstart = (utf32_t)code;
+   dest->strlen = 1;
+   dest->bufused = 4;
+
+   return dest;
+}
+
 /*=for api 

Re: Are threads what we really want ???

2001-10-25 Thread Dan Sugalski

At 02:28 AM 10/25/2001 +0200, Espen Harlinn wrote:
Instead of thinking about multiple threads, one could think about multiple
execution contexts. Each instance of an object must belong to one and only
one execution context. Each execution context has an attached security
context and a security manager.

One actually needs to think about both. Threads and execution contexts 
aren't required to be related. You could have multiple threads in a single 
execution context (though it works badly with high-level languages as we 
found with perl 5's pthread model, but that's a separate issue) or multiple 
execution contexts with a single thread, which is what happens when you 
allow a process to create multiple interpreters.

Parrot will support the single-thread/multiple-interpreter and 
multiple-thread/multiple-interpreter models. (Where there's a 1:1 
relationship between those multiple threads and multiple interpreters)

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Windows compile problems

2001-10-25 Thread Dan Sugalski

At 08:59 AM 10/25/2001 -0400, Andy Dougherty wrote:
In perl.perl6.internals, you wrote:
 On Wed, 24 Oct 2001, Brent Dax wrote:
 
 Unfortunately, I can't figure out how to utilize it.  Including
 windows.h causes a conflict with Parrot's definition of BOOL, including

Then we probably should change Parrot's name of BOOL.  I'd
suggest Bool_t, modeled after perl5's Size_t (and similar types).

Sounds like a good idea.

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




HP-UX 11.00 still not happy

2001-10-25 Thread H . Merijn Brand

l1:/pro/3gl/CPAN/parrot-current 102  make distclean
perl -MExtUtils::Manifest=filecheck -le 'xtUtils::Manifest::Quiet=1;unlink  for
filecheck()'
Undefined subroutine xtUtils::Manifest::Quiet called at -e line 1.
make: *** [distclean] Error 255
l1:/pro/3gl/CPAN/parrot-current 103  rm -f *.o *.a
l1:/pro/3gl/CPAN/parrot-current 104  perl Co
Config_pm.in  Configure.pl
l1:/pro/3gl/CPAN/parrot-current 104  perl Configure.pl --default
:
l1:/pro/3gl/CPAN/parrot-current 105  make test_prog
perl vtable_h.pl
:
cc -DDEBUGGING -Ae -D_HPUX_SOURCE -I/pro/local/include -D_LARGEFILE_SOURCE 
-D_FILE_OFFSET_BITS=64  -I./include   -o stacks.o -c stacks.c
cc: stacks.c, line 105: warning 604: Pointers are not assignment-compatible.
:
cc -DDEBUGGING -Ae -D_HPUX_SOURCE -I/pro/local/include -D_LARGEFILE_SOURCE 
-D_FILE_OFFSET_BITS=64  -I./include   -o vtable_ops.o -c vtable_ops.c
cc: vtable_ops.c, line 37: error 1534: Illegal to use a function pointer as + 
operand where an arithmetic type is required.
cc: vtable_ops.c, line 37: error 1533: Illegal function call.
cc: vtable_ops.c, line 43: error 1534: Illegal to use a function pointer as + 
operand where an arithmetic type is required.
cc: vtable_ops.c, line 43: error 1533: Illegal function call.
cc: vtable_ops.c, line 49: error 1534: Illegal to use a function pointer as + 
operand where an arithmetic type is required.
cc: vtable_ops.c, line 49: error 1533: Illegal function call.
cc: vtable_ops.c, line 55: error 1534: Illegal to use a function pointer as + 
operand where an arithmetic type is required.
cc: vtable_ops.c, line 55: error 1533: Illegal function call.
cc: vtable_ops.c, line 61: error 1534: Illegal to use a function pointer as + 
operand where an arithmetic type is required.
cc: vtable_ops.c, line 61: error 1533: Illegal function call.
cc: vtable_ops.c, line 67: error 1534: Illegal to use a function pointer as + 
operand where an arithmetic type is required.
cc: vtable_ops.c, line 67: error 1533: Illegal function call.
make: *** [vtable_ops.o] Error 1
l1:/pro/3gl/CPAN/parrot-current 106  cat .timestamp
1003950001
Wed Oct 24 19:00:01 2001 UTC

(time of this cvs update)
l1:/pro/3gl/CPAN/parrot-current 107 

-- 
H.Merijn BrandAmsterdam Perl Mongers (http://www.amsterdam.pm.org/)
using perl-5.6.1, 5.7.2  629 on HP-UX 10.20  11.00, AIX 4.2, AIX 4.3,
  WinNT 4, Win2K pro  WinCE 2.11.  Smoking perl CORE: [EMAIL PROTECTED]
http:[EMAIL PROTECTED]/   [EMAIL PROTECTED]
send smoke reports to: [EMAIL PROTECTED], QA: http://qa.perl.org




String rationale

2001-10-25 Thread Dan Sugalski

'Kay, here's the string background info I promised. If things are missing 
or unclear let me know and I'll fix it up until it is.


==Cut here with a very sharp knife===
=head1 TITLE

A parrot string backgrounder

=head1 Overview

Strings, in parrot, are compartmentalized, the same way so much else
in Parrot is compartmentalized. There's no single 'blessed' string
encoding--the closest we come is Unicode, and only as an encoding of
last resort. (Unicode's not a good interchange format, as it loses
information)

=head2 From the Outside

On the outside, the interpreter considers strings to be a sort of
black box. The only bits of the interpreter that much care about the
string data are the regex engine parts, and those only operate on
fixed-sized data.

The interpreter can only peek inside a string if that string is of
fixed length, and the interpreter doesn't actually care about the
character set the data is in. All character sets must provide a way to
transcode to Unicode, and all character encodings must provide a way
to turn their characters into fixed-sized entities. (The size may be
8, 16, or 32 bits as need be for the character set)

Character sets may provide a way to transcode to non-Unicode sets, for
example from EBCDIC to ASCII, but this is optional. If none is
provided a transcoding from one set to another will use Unicode as an
intermediate form, complete with potential data loss.

All character sets must provide the character lists the regular
expression engine needs for the base character classes. (space, word,
and digit characters) This permits the regular expression code to
operate on the contents of a string without needing to know its actual
character set.

=head2 From the Inside

=head2 Technical details

The base string structure looks like:

   struct parrot_string {
     void *bufstart;
     INTVAL buflen;
     INTVAL bufused;
     INTVAL flags;
     INTVAL strlen;
     STRING_VTABLE* encoding;
     INTVAL type;
     INTVAL language;
   };


=head2 Fields

=over 4

=item bufstart

Where the string buffer starts

=item buflen

How big the buffer is

=item bufused

How much of the buffer's used

=item flags

A variety of flags. Low 16 bits reserved to Parrot, the rest are free
for the string encoding library to use

=item strlen

How long the string is in code points. (Note that, for encodings that
are more than 8 bits per code point, or of variable length, this will
B<not> be the same as the buffer used.)

=item encoding

Pointer to the library that handles the string encoding. Encoding is
basically how the stream of bytes pointed to by C<bufstart> can be
turned into a stream of 32-bit codepoints. Examples include UTF-8, Big
5, or Shift JIS. Unicode, Ascii, or EBCDIC are B<not> encodings.first

=item type

What character set or type of data is encoded in the buffer. This
includes things like ASCII, EBCDIC, Unicode, Chinese Traditional,
Chinese Simplified, or Shift-JIS. (And yes, I know the latter's a
combination of type and encoding. I'll update the doc as soon as I can
reasonably separate the two)

=item language

The language the string is in. This is essential for proper sorting,
if a sort function wants to be language-aware. Just an encoding/type
is insufficient for proper sorting--for example knowing a string is
UTF-32/Unicode doesn't tell you how the data should be ordered. This
is especially important for those languages that overlap in the
Unicode code space. Japanese and Chinese, for example, share many of
the Unicode code points but sort those code points differently.

=back
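
To make the difference between strlen and bufused concrete, here is a small sketch using a cut-down stand-in for the structure (the demo_ names are hypothetical, and only four of the fields are kept). Storing a single code point as one 32-bit unit gives a string length of 1 but 4 bytes of buffer used.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef long DEMO_INTVAL;

/* Cut-down, hypothetical stand-in for struct parrot_string. */
struct demo_string {
    void *bufstart;         /* where the string buffer starts      */
    DEMO_INTVAL buflen;     /* how big the buffer is, in bytes     */
    DEMO_INTVAL bufused;    /* how much of the buffer is used      */
    DEMO_INTVAL strlen;     /* length of the string in code points */
};

/* Store one code point as a single 32-bit unit.
 * Returns 0 on success, -1 if the buffer is too small. */
static int demo_set_utf32_char(struct demo_string *s, uint32_t code)
{
    assert(s != NULL && s->bufstart != NULL);
    if (s->buflen < (DEMO_INTVAL)sizeof code)
        return -1;
    memcpy(s->bufstart, &code, sizeof code);
    s->strlen = 1;                              /* one code point...      */
    s->bufused = (DEMO_INTVAL)sizeof code;      /* ...occupying four bytes */
    return 0;
}
```

For a variable-length encoding such as UTF-8 the two numbers diverge in a less regular way, which is why both are stored.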

Libraries for processing character sets and encodings are shareable
libraries, and may be loaded on demand. They are looked up and
referenced by name. An identifying number is given to them at load
time and shouldn't be used outside the currently running
process. (EBCDIC might be character set 3 in one run and set 7 in
another)

The native encoding and character set is I<never> considered a 'real'
encoding or character set. It just specifies what the default is if
nothing else is specified, but when bytecode is frozen to disk the
actual encoding or set name will be used instead.

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: String rationale

2001-10-25 Thread Sam Tregar

On Thu, 25 Oct 2001, Dan Sugalski wrote:

 The only bits of the interpreter that much care about the
 string data are the regex engine parts, and those only operate on
 fixed-sized data.

Care to elaborate?  I thought the mandate from Larry was to have regexes
compile down to a stream of string ops.  Doesn't that mean it should work
regardless of the encoding of the string?

 The interpreter can only peek inside a string if that string is of
 fixed length, and the interpreter doesn't actually care about the
 character set the data is in.

Why is this necessary at all?  Wouldn't it be preferable to have all
access go through the String vtable regardless of the encoding?

 =item encoding

 Pointer to the library that handles the string encoding. Encoding is
 basically how the stream of bytes pointed to by C<bufstart> can be
 turned into a stream of 32-bit codepoints. Examples include UTF-8, Big
 5, or Shift JIS. Unicode, Ascii, or EBCDIC are B<not> encodings.first

.first?

Aside from the above, this was a nice refresher.

-sam




Re: String rationale

2001-10-25 Thread Dan Sugalski

At 12:19 PM 10/25/2001 -0400, Sam Tregar wrote:
On Thu, 25 Oct 2001, Dan Sugalski wrote:

  The only bits of the interpreter that much care about the
  string data are the regex engine parts, and those only operate on
  fixed-sized data.

Care to elaborate?  I thought the mandate from Larry was to have regexes
compile down to a stream of string ops.  Doesn't that mean it should work
regardless of the encoding of the string?

Since the encoding just determines how the abstract code point numbers are 
represented in bytes, I'm OK with requiring strings we process internally 
to be in a fixed-size version.

And regexes will be done with a stream of parrot opcodes, presuming that's 
not too slow. There'll be ops to reference the code point at position X in 
a string and check to see if it's in a list of other code points and 
suchlike things. Basically we'll peek under the covers, but only for 
fixed-length strings.
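
A hedged sketch of what two such primitives could look like in C for fixed-width data. The function names are hypothetical and this is not the planned opcode set; it only shows why fixed width makes "code point at position X" a constant-time lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fetch the code point at index pos from a buffer of fixed-width units.
 * width is the unit size in bytes: 1, 2 or 4. */
static uint32_t demo_codepoint_at(const void *buf, size_t width, size_t pos)
{
    const unsigned char *p = (const unsigned char *)buf + pos * width;
    uint8_t  c8;
    uint16_t c16;
    uint32_t c32;

    switch (width) {
    case 1: memcpy(&c8,  p, 1); return c8;
    case 2: memcpy(&c16, p, 2); return c16;
    case 4: memcpy(&c32, p, 4); return c32;
    default:
        assert(!"unsupported unit width");
        return 0;
    }
}

/* Check whether a code point appears in a list, e.g. a character class. */
static int demo_in_list(uint32_t cp, const uint32_t *list, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++) {
        if (list[i] == cp)
            return 1;
    }
    return 0;
}
```

With a variable-length encoding the first function would have to scan from the start of the buffer, which is the cost the fixed-size requirement avoids.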

  The interpreter can only peek inside a string if that string is of
  fixed length, and the interpreter doesn't actually care about the
  character set the data is in.

Why is this necessary at all?  Wouldn't it be preferable to have all
access go through the String vtable regardless of the encoding?

Speed. We're going to take something of a hit decomposing to ops as it 
is--if we can safely cheat, I'm OK with mandating it to be required. :)

  =item encoding
 
  Pointer to the library that handles the string encoding. Encoding is
  basically how the stream of bytes pointed to by C<bufstart> can be
  turned into a stream of 32-bit codepoints. Examples include UTF-8, Big
  5, or Shift JIS. Unicode, Ascii, or EBCDIC are B<not> encodings.first

.first?

Trailing buffer gook.

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: Windows compile problems

2001-10-25 Thread Russ Allbery

Dan Sugalski [EMAIL PROTECTED] writes:
 At 08:59 AM 10/25/2001 -0400, Andy Dougherty wrote:

 Then we probably should change Parrot's name of BOOL.  I'd
 suggest Bool_t, modeled after perl5's Size_t (and similar types).

 Sounds like a good idea.

IIRC, all types ending in _t are reserved by POSIX and may be used without
warning in later versions of the standard.  (This comes up not
infrequently in some of the groups I read, but I unfortunately don't have
a copy of POSIX to check for myself and be sure.)

-- 
Russ Allbery ([EMAIL PROTECTED]) http://www.eyrie.org/~eagle/



Re: Windows compile problems

2001-10-25 Thread Dan Sugalski

At 12:24 PM 10/25/2001 -0700, Russ Allbery wrote:
Dan Sugalski [EMAIL PROTECTED] writes:
  At 08:59 AM 10/25/2001 -0400, Andy Dougherty wrote:

  Then we probably should change Parrot's name of BOOL.  I'd
  suggest Bool_t, modeled after perl5's Size_t (and similar types).

  Sounds like a good idea.

IIRC, all types ending in _t are reserved by POSIX and may be used without
warning in later versions of the standard.  (This comes up not
infrequently in some of the groups I read, but I unfortunately don't have
a copy of POSIX to check for myself and be sure.)

Ah, good point.

Maybe we should go with _p as a suffix rather than _t. (the p for parrot, 
of course)

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




[PATCH] Exceptions as promised...

2001-10-25 Thread jgoff

The included patch requires a new file t/op/exceptions.t, which tests basic exception 
handling, in this case divide-by-zero.
Patch was generated against latest CVS, but it shouldn't matter -that- much.

-Jeff
[EMAIL PROTECTED]




diff --recursive -C 2 parrot_cvs/MANIFEST parrot/MANIFEST
*** parrot_cvs/MANIFEST Wed Oct 24 07:36:57 2001
--- parrot/MANIFEST Wed Oct 24 07:37:22 2001
***
*** 108,111 
--- 108,112 
  t/op/basic.t
  t/op/bitwise.t
+ t/op/exception.t
  t/op/integer.t
  t/op/number.t
Only in parrot/: Makefile
diff --recursive -C 2 parrot_cvs/Parrot/Assembler.pm parrot/Parrot/Assembler.pm
*** parrot_cvs/Parrot/Assembler.pm  Wed Oct 24 07:36:57 2001
--- parrot/Parrot/Assembler.pm  Wed Oct 24 07:54:22 2001
***
*** 110,114 
  =cut
  
! my(%type_to_suffix)=('I'=>'i',  'N'=>'n',
   'S'=>'s',  'P'=>'p',
   'i'=>'ic', 'n'=>'nc',
--- 110,115 
  =cut
  
! my(%type_to_suffix)=('E'=>'e',
!'I'=>'i',  'N'=>'n',
   'S'=>'s',  'P'=>'p',
   'i'=>'ic', 'n'=>'nc',
***
*** 923,927 
  #
  
! if  (m/^([INPS])\d+$/) {   # a register.
push @arg_t,lc($1);
  } elsif (m/^\[([a-z]+):(\d+)\s*\]$/) { # string constant
--- 924,928 
  #
  
! if  (m/^([EINPS])\d+$/) {  # a register.
push @arg_t,lc($1);
  } elsif (m/^\[([a-z]+):(\d+)\s*\]$/) { # string constant
***
*** 945,949 
#
  
!   my @grep_ops = grep($_ =~ /^$opcode(?:_(?:(?:[ins]c?)|p))+$/, keys(%opcodes));
  
foreach my $op (@grep_ops) {
--- 946,950 
#
  
!   my @grep_ops = grep($_ =~ /^$opcode(?:_(?:(?:[eins]c?)|p))+$/, keys(%opcodes));
  
foreach my $op (@grep_ops) {
***
*** 1056,1059 
--- 1057,1061 
  
  my %rtype_map = (
+   e => "E",
i => "I",
n => "N",
***
*** 1092,1100 
  #
  
! if($rtype eq "I" || $rtype eq "N" || $rtype eq "P" || $rtype eq "S") {
# its a register argument
  
!   $args[$_] =~ s/^[INPS](\d+)$/$1/i
! or error("Expected m/[INPS]\\d+/, but got '$args[$_]'!", $file, $line);
  
error("Register $1 out of range (should be 0-31) in '$opcode'",$file,$line) if $1 < 0 or $1 > 31;
--- 1094,1102 
  #
  
! if($rtype eq "E" || $rtype eq "I" || $rtype eq "N" || $rtype eq "P" || $rtype eq "S") {
    # its a register argument
  
!   $args[$_] =~ s/^[EINPS](\d+)$/$1/i
!     or error("Expected m/[EINPS]\\d+/, but got '$args[$_]'!", $file, $line);
  
    error("Register $1 out of range (should be 0-31) in '$opcode'",$file,$line) if $1 < 0 or $1 > 31;
Only in parrot/Parrot: Config.pm
Only in parrot/Parrot: Types.pm
diff --recursive -C 2 parrot_cvs/Types_pm.in parrot/Types_pm.in
*** parrot_cvs/Types_pm.in  Wed Oct 24 07:36:57 2001
--- parrot/Types_pm.in  Wed Oct 24 07:39:58 2001
***
*** 35,38 
--- 35,39 
  
  my %how_to_pack = (
+ E  => $pack_type{op},
  I  => $pack_type{op},
  i  => $pack_type{op},
Only in parrot/classes: intclass.o
diff --recursive -C 2 parrot_cvs/config_h.in parrot/config_h.in
*** parrot_cvs/config_h.in  Wed Oct 24 07:36:57 2001
--- parrot/config_h.in  Wed Oct 24 07:53:11 2001
***
*** 24,31 
--- 24,33 
  #define FRAMES_PER_PMC_REG_CHUNK FRAMES_PER_CHUNK
  #define FRAMES_PER_NUM_REG_CHUNK FRAMES_PER_CHUNK
+ #define FRAMES_PER_EXC_REG_CHUNK FRAMES_PER_CHUNK
  #define FRAMES_PER_INT_REG_CHUNK FRAMES_PER_CHUNK
  #define FRAMES_PER_STR_REG_CHUNK FRAMES_PER_CHUNK
  
  #define MASK_STACK_CHUNK_LOW_BITS ${stacklow}
+ #define MASK_EXC_CHUNK_LOW_BITS ${intlow}
  #define MASK_INT_CHUNK_LOW_BITS ${intlow}
  #define MASK_NUM_CHUNK_LOW_BITS ${numlow}
diff --recursive -C 2 parrot_cvs/core.ops parrot/core.ops
*** parrot_cvs/core.ops Wed Oct 24 07:36:57 2001
--- parrot/core.ops Wed Oct 24 07:56:30 2001
***
*** 120,123 
--- 120,127 
  
  
+ =item B<set>(e, i)
+ 
+ =item B<set>(i, e)
+ 
  =item B<set>(i, i)
  
***
*** 136,141 
  =cut
  
  
! AUTO_OP set(i, i|ic) {
$1 = $2;
  }
--- 140,148 
  =cut
  
+ AUTO_OP set(e, i) {
+   $1 = $2;
+ }
  
! AUTO_OP set(i, e|i|ic) {
$1 = $2;
  }
***
*** 684,688 
  
  AUTO_OP div(i, i|ic, i|ic) {
!   $1 = $2 / $3;
  }
  
--- 691,701 
  
  AUTO_OP div(i, i|ic, i|ic) {
!   INTVAL z = $3;
! 
!   if(z == 0) {
!     interpreter->exc_reg->registers[0] = 1;
!   } else {
! $1 = $2 / $3;
!   }
  }
  
***
*** 1504,1507 
--- 1517,1522 
  
  
+ =item B<pope>()
+ 
  =item B<popi>()
  
***
*** 1517,1520 
--- 1532,1539 
  =cut
  
+ AUTO_OP pope() {
+   Parrot_pop_e(interpreter);
+ }
+ 
  AUTO_OP popi() {
Parrot_pop_i(interpreter);
***
*** 1536,1539 
--- 1555,1560 
  
  
+ =item 

[PATCHES] Exception idea

2001-10-25 Thread Jeffrey Goff

[Apologies if this is a repeat, but the last message was early Wed. and
hasn't gone through yet]

The promised patches (against Wednesday morning's CVS-latest) are
attached to this message.
[You might need to reverse the first patch, against MANIFEST]

These patches add the following:

a) Exception register stack (E0-E31 for the moment, will trim down to
just E0)
b) div_i_i_ic altered to raise an exception when ic==0 (Which is to say,
sets E0 to 1)
c) New instructions set_e_i, set_i_e, push_e, pop_e
d) New test file t/op/exception.t, updated MANIFEST file

The tests exercise the new instructions and validate that div_i_i_ic
properly raises an exception.

The patches are a -very- crude form of exception handling. The constants
for errors like DIVIDE_BY_ZERO should probably be imported as manifest
constants, but that change would have been beyond the scope of the patch
:) Sample code that catches the divide-by-zero exception is in the
t/op/exception.t test #4, but here's a better explanation (Code uses
instructions that aren't implemented yet):

pushe                                # Save the current exceptions
set I2,5
div I1,I2,0                          # This would ordinarily trigger a coredump. Not now.
eq E0,DIVIDE_BY_ZERO,CATCH_EXCEPTION # Not in the current patch, but easy to add
pope                                 # Restore the exception stack
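
The same idea in C, as a minimal sketch rather than the code from the patch: the Interp layout, op_div, exc_is and the EXC_* names are all assumptions made for illustration. The only detail taken from the patch is that a zero divisor stores 1 in E0 instead of performing the division.

```c
#include <stdint.h>

/* Hypothetical names throughout; only "E0 = 1 on divide-by-zero"
   comes from the patch. */
#define NUM_REGS           32
#define EXC_NONE           0
#define EXC_DIVIDE_BY_ZERO 1

typedef struct {
    int32_t int_reg[NUM_REGS];  /* I0-I31 */
    int32_t exc_reg[NUM_REGS];  /* E0-E31 */
} Interp;

/* div Idst, Ia, Ib: on a zero divisor, record the exception in E0
   and leave the destination register untouched. */
static void op_div(Interp *in, int dst, int a, int b)
{
    if (in->int_reg[b] == 0) {
        in->exc_reg[0] = EXC_DIVIDE_BY_ZERO;
    } else {
        in->int_reg[dst] = in->int_reg[a] / in->int_reg[b];
    }
}

/* eq E0, CODE, LABEL: nonzero when the branch would be taken. */
static int exc_is(const Interp *in, int32_t code)
{
    return in->exc_reg[0] == code;
}
```

A caller would run op_div and then test exc_is(&interp, EXC_DIVIDE_BY_ZERO) before trusting the destination register.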

Rather than implementing a static set of flags in some sort of exception
register, each exception becomes an integer constant that can be tested
against. This leaves plenty of expansion room ((2**31)-1 possible
exceptions, assuming they're all negative) with the slight inconvenience
of not being able to test for a bitwise-or of exception flags. I don't
see this as being a major inconvenience, as most of the time you'll be
testing for a specific exception, at least at the assembler level.

The patch is incomplete, but then, so is the list of instructions that
can raise exceptions. This way we have a mechanism in place to handle
I/O exceptions when they're implemented (And I'm planning to work on
instructions such as open_i_s, read_i, close_i over the weekend). For
instance, an open I0,"foo" instruction (Just an idea, syntax will
likely be very different) would be able to set constants such as
FILE_NOT_FOUND and such.

Since we're in assembler here, I'm not sure if a single instruction
should throw multiple exceptions, and it probably shouldn't -anyway-. In
that case, we could use E1-E31 for the others, but I feel that a single
instruction should throw only one of a limited range of exceptions. For
instance open_i_sc should only throw one of (FILE_NOT_FOUND,
NO_PERMISSION, FILE_READ_ONLY, ...).

I did consider using a bitfield of exceptions, but found it too
limiting. Also, the only benefit I can see of doing this is being able
to test for multiple exceptions at the same time. It isn't worth
limiting the number of flags to 32 or whatever just to be able to handle
this rare case.
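
To make the trade-off concrete, here is a rough C sketch of the two schemes side by side. The numeric values and the FLAG_* and helper names are assumptions; the thread names DIVIDE_BY_ZERO, FILE_NOT_FOUND and NO_PERMISSION but assigns no values beyond E0 = 1.

```c
#include <stdint.h>

/* Integer-code scheme: one value per exception, room for many codes. */
enum exc_code {
    EXC_NONE           = 0,
    EXC_DIVIDE_BY_ZERO = 1,
    EXC_FILE_NOT_FOUND = 2,   /* hypothetical value */
    EXC_NO_PERMISSION  = 3    /* hypothetical value */
};

/* Bitfield scheme: one bit per exception, at most 32 in a 32-bit word. */
#define FLAG_DIVIDE_BY_ZERO (UINT32_C(1) << 0)
#define FLAG_FILE_NOT_FOUND (UINT32_C(1) << 1)
#define FLAG_NO_PERMISSION  (UINT32_C(1) << 2)

/* Integer codes: test one exception by equality. */
static int code_matches(int32_t e0, enum exc_code wanted)
{
    return e0 == (int32_t)wanted;
}

/* Integer codes: "either of two" costs one comparison per code. */
static int code_is_either(int32_t e0, enum exc_code a, enum exc_code b)
{
    return code_matches(e0, a) || code_matches(e0, b);
}

/* Bitfield: any number of flags can be tested with a single AND. */
static int any_flag_set(uint32_t flags, uint32_t mask)
{
    return (flags & mask) != 0;
}
```

The bitfield version wins only in the "test several at once" case, which is the rare case described above.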

As usual, comments, criticisms, and questions more than welcome.

-Jeff
[EMAIL PROTECTED] [EMAIL PROTECTED]

 exception.diff
 exception.t


Re: [PATCHES] Exception idea

2001-10-25 Thread Dan Sugalski

At 10:34 AM 10/25/2001 -0400, Jeffrey Goff wrote:
> pope # Restore the exception stack

I've been thinking about going with an exception stack rather than a set of 
exception registers, but there's something awfully compelling about an 
opcode named pope... :)

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Quick todo list

2001-10-25 Thread Dan Sugalski

Here's a list of what I'm going to try and get done really soon (like in 
the next day or so)

*) Toss that stupid interpreter parameter. Going with thread-local storage 
instead. (And I know this is going to make Win32 unhappy)
*) Split the generic stack into a temp stack and control stack
*) Define parameter passing conventions
*) Define the exception handling mechanism
*) Simple open/read/write/close for files
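
For the first item, a minimal sketch of what dropping the interpreter parameter in favour of thread-local storage could look like. It assumes a C11 compiler (_Thread_local); the Interp stand-in and the function names are hypothetical, and this is not how Parrot itself ended up doing it.

```c
/* Stand-in for the real interpreter structure. */
typedef struct Interp {
    int id;
} Interp;

/* One pointer per thread, so op bodies no longer need an explicit
   interpreter argument. _Thread_local requires C11. */
static _Thread_local Interp *current_interp = 0;

static void set_current_interp(Interp *interp)
{
    current_interp = interp;
}

static Interp *get_current_interp(void)
{
    return current_interp;
}

/* An op body written without the interpreter parameter. */
static int op_interp_id(void)
{
    return get_current_interp()->id;
}
```

Each thread would call set_current_interp once before running ops; compilers without C11 thread-local support would need a platform-specific replacement.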

(Why yes, I do have a lot of good coffee and a tin of caffeinated 
peppermints. Why do you ask? :)

Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Re: String rationale

2001-10-25 Thread Tom Hughes

In message [EMAIL PROTECTED]
  Dan Sugalski [EMAIL PROTECTED] wrote:

> =item type
>
> What the character set or type of data is encoded in the buffer. This
> includes things like ASCII, EBCDIC, Unicode, Chinese Traditional,
> Chinese Simplified, or Shift-JIS. (And yes, I know the latter's a
> combination of type and encoding. I'll update the doc as soon as I can
> reasonably separate the two)

Isn't this going to need to be a vtable pointer like encoding is? Only
some things (like character classification and at least some transcoding
tasks) will be character set based rather than encoding based.

Other than that it looked quite good and I'll probably start looking at
bending the existing code into the new model over the weekend.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/




[PATCH] Making Win32 work

2001-10-25 Thread Brent Dax

With the patch attached, all tests pass on Win32.

Well, except for the fact that classes\intclass.obj gets created as
.\intclass.obj, forcing you to manually copy it to the right place.
Ugh.  And examples\assembly\mops.obj has the same problem.  And there
are 11 warnings in intclass.c that I don't want to bother to fix.
(classes\intclass.c(16) : warning C4716: 'Parrot_int_type' : must
return a value and such.)  Other than that, though, it works fine.

--Brent Dax
[EMAIL PROTECTED]
Configure pumpking for Perl 6

When I take action, I'm not going to fire a $2 million missile at a $10
empty tent and hit a camel in the butt.
--Dubya


--- ..\..\parrot-cvs\parrot\make_vtable_ops.pl  Sun Oct 21 09:47:10 2001
+++ make_vtable_ops.pl  Thu Oct 25 17:00:10 2001
@@ -1,35 +1,52 @@
 use Parrot::Vtable;
 my %vtable = parse_vtable();
 
+print "#define VTABLE_CALL_TYPE(func, type) ((op_func_t)((INTVAL)func + (INTVAL)type))\n\n";
+
 while (<DATA>) {
     next if /^#/ or /^$/;
     my @params = split;
     my $op = $params[1];
     my $vtable_entry = $params[2] || $op;
+
     die "Can't find $vtable_entry in vtable, line $.\n"
         unless exists $vtable{$vtable_entry};
+
     print "AUTO_OP $params[1] (".(join ", ", ("p")x$params[0]).") {\n";
-    print "\t(\$2->vtable->$vtable_entry";
-    print multimethod($vtable_entry);
+
+    print "\t".multimethod($vtable_entry);
+
     if ($params[0] == 3) {
         # Three-address function
-        print ')($2,$3,$1);';
+        print '($2,$3,$1);';
     } elsif ($params[0] == 2) {
         # Unary function
-        print ')($2,$1);';
+        print '($2,$1);';
     }
+
     print "\n}\n";
 }
 
+
 sub multimethod {
-    my $type = $vtable{$_[0]}{meth_type};
-    return "" if $type eq "unique";
-    return '_1 + $3->vtable->num_type' if $type eq "num";
-    return '_1 + $3->vtable->string_type' if $type eq "str";
+    my $vtable_entry=shift;
+    my $type = $vtable{$vtable_entry}{meth_type};
+    my $firstarg="\$2->vtable->$vtable_entry";
+
+    return "(${firstarg})"
+       if $type eq "unique";
+
+    return "VTABLE_CALL_TYPE(${firstarg}_1, \$3->vtable->num_type)"
+       if $type eq "num";
+
+    return "VTABLE_CALL_TYPE(${firstarg}_1, \$3->vtable->string_type)"
+       if $type eq "str";
+
     die "Coding error - undefined type $type\n";
 }
 
 
+
 __DATA__
 # Three-address functions
 3 add
--- ..\..\parrot-cvs\parrot\core.ops  Wed Oct 24 07:54:54 2001
+++ core.ops  Thu Oct 25 14:27:46 2001
@@ -3,8 +3,16 @@
 */
 
 #include <math.h>
-#include <sys/time.h>
 
+#ifdef HAS_HEADER_SYSTIME
+  #include <sys/time.h>
+#else
+  #ifdef WIN32
+    #include <time.h>
+    __declspec(dllimport) void __stdcall Sleep(unsigned long);
+  #endif /* WIN32 */
+#endif /* HAS_HEADER_SYSTIME */
+
 =head1 NAME
 
 core.ops
@@ -95,9 +103,19 @@
 =cut
 
 AUTO_OP time(n) {
+#ifdef HAS_HEADER_SYSTIME
+
   struct timeval t;
   gettimeofday(&t, NULL);
   $1 = (FLOATVAL)t.tv_sec + ((FLOATVAL)t.tv_usec / 100.0);
+
+#else
+
+  /* Win32 doesn't have gettimeofday or sys/time.h, so just use normal time w/o microseconds
+     XXX Is there a Win32 equivalent to gettimeofday? */
+  $1 = (FLOATVAL)time(NULL);
+
+#endif
 }
 
 
@@ -1786,7 +1804,11 @@
 =cut
 
 AUTO_OP sleep(i|ic) {
-  sleep($1);
+  #ifdef WIN32
+Sleep($1*1000);
+  #else
+sleep($1);
+  #endif
 }
 
 ###
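
On the XXX question in the time op: Win32 does offer GetSystemTimeAsFileTime, which reports 100 ns ticks since 1601-01-01 and can be converted to Unix seconds with a fraction. The following is a minimal sketch of that approach, not part of the patch; the function names are made up for illustration, and the sketch keys off _WIN32 rather than the HAS_HEADER_SYSTIME symbol used above. Since tv_usec counts microseconds, the non-Windows branch divides it by 1000000.0.

```c
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/time.h>
#endif

/* Seconds between 1601-01-01 (FILETIME epoch) and 1970-01-01 (Unix epoch). */
#define FILETIME_UNIX_OFFSET_SECS 11644473600.0

/* Convert a count of 100 ns ticks since 1601 into Unix seconds. */
static double filetime_ticks_to_unix(unsigned long long ticks)
{
    return (double)ticks / 1e7 - FILETIME_UNIX_OFFSET_SECS;
}

/* Current time as seconds since the Unix epoch, with a fractional part. */
static double time_with_fraction(void)
{
#ifdef _WIN32
    FILETIME ft;
    ULARGE_INTEGER u;
    GetSystemTimeAsFileTime(&ft);
    u.LowPart  = ft.dwLowDateTime;
    u.HighPart = ft.dwHighDateTime;
    return filetime_ticks_to_unix(u.QuadPart);
#else
    struct timeval t;
    gettimeofday(&t, NULL);
    return (double)t.tv_sec + (double)t.tv_usec / 1000000.0;
#endif
}
```

A double holds this value to roughly microsecond precision, which matches what gettimeofday provides.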



Re: [PATCHES] Exception idea

2001-10-25 Thread Jeff

Yeah, I probably should have named the register stack 'X' or something like
that. At least we're thinking along somewhat compatible lines. I'll be eager to
see your solution...

Dan Sugalski wrote:

> At 10:34 AM 10/25/2001 -0400, Jeffrey Goff wrote:
> > pope # Restore the exception stack
>
> I've been thinking about going with an exception stack rather than a set of
> exception registers, but there's something awfully compelling about an
> opcode named pope... :)
>
> Dan
>
> --it's like this---
> Dan Sugalski  even samurai
> [EMAIL PROTECTED] have teddy bears and even
>   teddy bears get drunk




Re: String rationale

2001-10-25 Thread Dan Sugalski

At 11:59 PM 10/25/2001 +0100, Tom Hughes wrote:
> In message [EMAIL PROTECTED]
>    Dan Sugalski [EMAIL PROTECTED] wrote:
>
> > =item type
> >
> > What the character set or type of data is encoded in the buffer. This
> > includes things like ASCII, EBCDIC, Unicode, Chinese Traditional,
> > Chinese Simplified, or Shift-JIS. (And yes, I know the latter's a
> > combination of type and encoding. I'll update the doc as soon as I can
> > reasonably separate the two)
>
> Isn't this going to need to be a vtable pointer like encoding is?

Yup. I'd intended it to be an index into a table of character set 
functions. Jarkko has convinced me that it's better to have it as a vtable 
pointer, but I haven't had a chance to update the docs yet.
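
A rough C sketch of the vtable-pointer arrangement, under the assumption that each character set supplies its own classification functions. The struct layout and every name here are hypothetical and are not taken from the string docs.

```c
#include <stddef.h>

/* Per-character-set function table (hypothetical layout). */
typedef struct CharsetVtable {
    const char *name;
    int (*is_digit)(unsigned long c);
} CharsetVtable;

static int ascii_is_digit(unsigned long c)
{
    return c >= 0x30 && c <= 0x39;   /* '0'..'9' in ASCII */
}

static int ebcdic_is_digit(unsigned long c)
{
    return c >= 0xF0 && c <= 0xF9;   /* '0'..'9' in EBCDIC */
}

static const CharsetVtable ascii_charset  = { "ASCII",  ascii_is_digit };
static const CharsetVtable ebcdic_charset = { "EBCDIC", ebcdic_is_digit };

typedef struct {
    const unsigned char *buf;
    size_t len;
    const CharsetVtable *type;  /* a pointer, not an index into a table */
} String;

/* Classification goes through the string's own character set. */
static int string_is_all_digits(const String *s)
{
    size_t i;
    for (i = 0; i < s->len; i++) {
        if (!s->type->is_digit(s->buf[i]))
            return 0;
    }
    return s->len > 0;
}
```

With a pointer, a call such as s->type->is_digit(c) needs no lookup in a global table of character sets, and new sets can be added without renumbering indices.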


Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk




Ooops, sorry for that blank log message.

2001-10-25 Thread Brian Wheeler

Darn it, I fat fingered the log message.

This is a fix which changes the way op variants are handled.  The old
method forgot the last variant, so thing(i,i|ic,i|ic) would
generate:
thing(i,i,i)
thing(i,i,ic)
thing(i,ic,i)

but not

thing(i,ic,ic)

The new one does.
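
For reference, the expansion amounts to a cross product over the alternatives of each argument. Below is a small C sketch of that idea, not the Perl code that was committed; expand_variants, its fixed limits and its output format are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 8
#define MAX_ALTS 4
#define MAX_NAME 8
#define OUT_LEN  64

/* Expand a spec such as "i,i|ic,i|ic" into every combination, writing each
   as "name(a,b,c)" into out[]. Returns the number of variants, or -1 if the
   spec or the output exceeds the fixed limits. */
static int expand_variants(const char *name, const char *spec,
                           char out[][OUT_LEN], int max_out)
{
    char alts[MAX_ARGS][MAX_ALTS][MAX_NAME];
    int nalts[MAX_ARGS] = {0};
    int idx[MAX_ARGS] = {0};
    int nargs = 1, count = 0, len = 0, a;
    const char *p;

    /* Split on ',' into arguments and on '|' into alternatives. */
    for (p = spec; ; p++) {
        int cur = nargs - 1;
        if (nalts[cur] >= MAX_ALTS) return -1;
        if (*p == ',' || *p == '|' || *p == '\0') {
            alts[cur][nalts[cur]][len] = '\0';
            nalts[cur]++;
            len = 0;
            if (*p == '\0') break;
            if (*p == ',') {
                if (nargs >= MAX_ARGS) return -1;
                nargs++;
            }
        } else {
            if (len >= MAX_NAME - 1) return -1;
            alts[cur][nalts[cur]][len++] = *p;
        }
    }

    /* Odometer over the alternatives, rightmost argument fastest. The loop
       ends only when the leftmost position overflows, so the final
       combination (every last alternative) is emitted as well. */
    for (;;) {
        size_t used;
        if (count >= max_out) return -1;
        used = (size_t)snprintf(out[count], OUT_LEN, "%s(", name);
        for (a = 0; a < nargs; a++) {
            if (used >= OUT_LEN) return -1;
            used += (size_t)snprintf(out[count] + used, OUT_LEN - used,
                                     "%s%s", a ? "," : "", alts[a][idx[a]]);
        }
        if (used + 1 >= OUT_LEN) return -1;
        strcat(out[count], ")");
        count++;

        for (a = nargs - 1; a >= 0; a--) {
            if (++idx[a] < nalts[a]) break;
            idx[a] = 0;
        }
        if (a < 0) return count;
    }
}
```

Called with "thing" and "i,i|ic,i|ic", this yields the four variants listed above, in the same order, ending with thing(i,ic,ic).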

Brian