Re: [perl #30230] [PATCH] add Fixed(Integer|Float|Boolean|String|PMC)Array classes

2004-06-14 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> At 10:12 PM -0700 6/11/04, Matt Fowles (via RT) wrote:
>>This patch adds the above Fixed*Array classes.  There are basic tests for
>>all of them included too, although more tests never hurt...

> With MANIFEST patch even! Woohoo!

> Applied, thanks.

1) Is there any good reason to start adding malloc(3)-based array classes now?
This leads to code duplication for all the utility vtable entries (like
C). list.c can deal with all types already.

2) What's the difference between *PMCArray and *BooleanArray?

leo


Re: Making PMCs

2004-06-14 Thread Dan Sugalski
At 12:53 PM -0500 6/13/04, Matt Fowles wrote:
>Nicholas~
>
>I will try to answer what I can, based on my current experience
>making those array PMCs.
>
>Nicholas Clark wrote:
>
>>a data pointer
>>which I can use. I am always responsible for freeing anything there(?)
>>and to do this I need to set the active destroy flag(?)
>>This flag is not the same as the high priority DOD system(?)
>>Does the garbage collector ever consider this pointer?
>>Does it ever chase what it points to?
>>
>You are responsible for freeing it by setting the active destroy flag.

Well... no. You're not. If the memory hanging off the data pointer 
was allocated from one of parrot's managed pools (either free memory 
or pmc/buffer header) then you don't have to free it.

You only need to have a destroy function if you've malloc'd memory or 
need to actively tear down something, usually a filehandle or 
connection to a third-party extension or something.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: [perl #30230] [PATCH] add Fixed(Integer|Float|Boolean|String|PMC)Array classes

2004-06-14 Thread Dan Sugalski
At 11:57 AM +0200 6/14/04, Leopold Toetsch wrote:
>Dan Sugalski <[EMAIL PROTECTED]> wrote:
>> At 10:12 PM -0700 6/11/04, Matt Fowles (via RT) wrote:
>>>This patch adds the above Fixed*Array classes.  There are basic tests for
>>>all of them included too, although more tests never hurt...
>
>> With MANIFEST patch even! Woohoo!
>
>> Applied, thanks.
>
>1) Is there any good reason to start adding malloc(3)-based array classes now?
>This leads to code duplication for all the utility vtable entries (like
>C). list.c can deal with all types already.

list.c's pretty inefficient for most array usage. It's good for
mixed-type, sparse, or really big arrays, but for normal arrays it's
overkill. A big wad of memory's just fine there.

>2) What's the difference between *PMCArray and *BooleanArray?

The PMC arrays hold PMCs. The Boolean arrays hold true/false values only.
--
Dan


Re: Making PMCs

2004-06-14 Thread Dan Sugalski
At 11:01 AM +0100 6/13/04, Nicholas Clark wrote:
>I'm trying to work out how to make PMCs. I'm not finding much documentation,
>and I'm not sure what I'm missing. Particularly I'm trying to work out where
>I'm allowed to store data, and what flags I might have to set

I'll write up something more detailed later on today, but for now:

>A basic PMC appears to contain
>
>flags
>    of which 8 are private so I could use

Yes.

>a data pointer
>    which I can use. I am always responsible for freeing anything there(?)

No. Only if you need to take some sort of extraordinary measures.

>    and to do this I need to set the active destroy flag(?)

Again, only with extraordinary measures.

>    This flag is not the same as the high priority DOD system(?)

Nope.

>    Does the garbage collector ever consider this pointer?

Yes, if the right flags are set.

is_PMC_ptr is set if this pointer points to a PMC.
is_buffer_ptr is set if this pointer points to a buffer-like
structure. (Such as a string)

Set them both if the pointer points to a buffer of PMCs.

>    Does it ever chase what it points to?

If the right flags are set (namely the two above) yes.

>a pobj_t union
>    which I can use.
>    Given that the nature of a C union means that the floating point value
>    occupies the same space as the pointer, do I need to set flags
>    depending on whether the pointers point to anything?

If they point to something parrot needs to track (a buffer, string or
PMC) and you want parrot to do it automatically, yes. I'm going to
have to go dig for that, though, as things have changed a bit since I
last looked.
--
Dan



Re: Making PMCs

2004-06-14 Thread Nicholas Clark
On Mon, Jun 14, 2004 at 08:53:10AM -0400, Dan Sugalski wrote:
> At 12:53 PM -0500 6/13/04, Matt Fowles wrote:
> >Nicholas~
> >
> >I will try to answer what I can, based on my current experience 
> >making those array PMCs.
> >
> >
> >Nicholas Clark wrote:
> >
> >>a data pointer
> >>which I can use. I am always responsible for freeing anything 
> >>there(?)
> >>and to do this I need to set the active destroy flag(?)
> >>This flag is not the same as the high priority DOD system(?)
> >>Does the garbage collector ever consider this pointer?
> >>Does it ever chase what it points to?
> >>
> >You are responsible for freeing it by setting the active destroy flag.
> 
> Well... no. You're not. If the memory hanging off the data pointer 
> was allocated from one of parrot's managed pools (either free memory 
> or pmc/buffer header) then you don't have to free it.

There's a memory internals document, but I can't spot any document giving
an API overview on how to allocate memory this way.

The implication of what you're saying is that the data pointer is checked by
the DOD, and any PMC it points directly to isn't dead.

Nicholas Clark


Re: Making PMCs

2004-06-14 Thread Dan Sugalski
At 2:33 PM +0100 6/14/04, Nicholas Clark wrote:
>On Mon, Jun 14, 2004 at 08:53:10AM -0400, Dan Sugalski wrote:
>> At 12:53 PM -0500 6/13/04, Matt Fowles wrote:
>> >Nicholas~
>> >
>> >I will try to answer what I can, based on my current experience
>> >making those array PMCs.
>> >
>> >
>> >Nicholas Clark wrote:
>> >
>> >>a data pointer
>> >>   which I can use. I am always responsible for freeing anything
>> >>   there(?)
>> >>   and to do this I need to set the active destroy flag(?)
>> >>   This flag is not the same as the high priority DOD system(?)
>> >>   Does the garbage collector ever consider this pointer?
>> >>   Does it ever chase what it points to?
>> >>
>> >You are responsible for freeing it by setting the active destroy flag.
>>
>> Well... no. You're not. If the memory hanging off the data pointer
>> was allocated from one of parrot's managed pools (either free memory
>> or pmc/buffer header) then you don't have to free it.
>
>There's a memory internals document, but I can't spot any document giving
>an API overview on how to allocate memory this way.

Yeah, it's all kinda ad-hoc. Needs fixing.

>The implication of what you're saying is that the data pointer is checked by
>the DOD, and any PMC it points directly to isn't dead.

Well... sort of. Checking and cleaning up are two very separate
things here. Parrot may not automatically check (leaving that to your
PMC's custom mark routine) but will automatically clean up. (If
you've not marked in your custom mark routine)

Basically, if the right flags are set, the DOD trace will treat the
pointer as pointing to something it should consider, and
automatically trace into it. If the right flags aren't set it won't,
and your mark routine needs to mark it explicitly.

Regardless of anything else, the DOD sweep will reclaim the 
PMC/String/Buffer/PObj structures if they aren't marked in the mark 
phase, either automatically or by a PMC mark routine, and memory not 
pointed to by a live buffer-ish thing will get reclaimed, so if your 
PMC with custom stuff hanging off the data pointer dies parrot will 
still reclaim its memory and whatnot for you.
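
The mark and sweep behaviour described above can be modelled with a toy
sketch. This is Python rather than Parrot's C, and it is only an
illustration under stated assumptions: the class Obj, the auto_trace
switch (standing in for the is_PMC_ptr/is_buffer_ptr flags), and the
functions children_of, mark and dod_run are all hypothetical names, not
Parrot API.

```python
class Obj:
    """Toy stand-in for a PMC or buffer header. All names are hypothetical."""

    def __init__(self, name, data=None, auto_trace=False, custom_mark=None):
        self.name = name
        self.data = data                # models the PMC's data pointer
        self.auto_trace = auto_trace    # models the is_PMC_ptr/is_buffer_ptr flags
        self.custom_mark = custom_mark  # models a PMC's own mark routine
        self.live = False


def children_of(data):
    """Return the Obj instances reachable directly through a data pointer."""
    if isinstance(data, Obj):
        return [data]
    if isinstance(data, (list, tuple)):
        return [item for item in data if isinstance(item, Obj)]
    return []


def mark(obj):
    """Mark phase: flag obj live, then follow whatever it exposes."""
    if obj.live:
        return
    obj.live = True
    if obj.auto_trace:
        # Flags set: the tracer follows the data pointer by itself.
        kids = children_of(obj.data)
    elif obj.custom_mark is not None:
        # Flags clear: only what the custom mark routine reports is kept.
        kids = obj.custom_mark(obj)
    else:
        kids = []
    for kid in kids:
        mark(kid)


def dod_run(heap, roots):
    """Mark from the roots, then sweep: return (survivors, reclaimed)."""
    for obj in heap:
        obj.live = False
    for root in roots:
        mark(root)
    survivors = [o for o in heap if o.live]
    reclaimed = [o for o in heap if not o.live]
    return survivors, reclaimed


leaf_a, leaf_b, leaf_c = Obj("a"), Obj("b"), Obj("c")
flagged = Obj("flagged", data=leaf_a, auto_trace=True)
custom = Obj("custom", data=[leaf_b], custom_mark=lambda o: children_of(o.data))
silent = Obj("silent", data=leaf_c)  # no flags and no mark routine
heap = [leaf_a, leaf_b, leaf_c, flagged, custom, silent]

survivors, reclaimed = dod_run(heap, roots=[flagged, custom, silent])
print([o.name for o in survivors])
print([o.name for o in reclaimed])
```

In this model the object hanging off "silent" is reclaimed even though
its owner is alive, because nothing marked it; that is the case where a
custom mark routine (or the right flags) is needed.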
--
Dan



Re: Making PMCs

2004-06-14 Thread Leopold Toetsch
Nicholas Clark <[EMAIL PROTECTED]> wrote:
> I'm trying to work out how to make PMCs. I'm not finding much documentation,

I'll create a POD, which hopefully will answer all these questions.

leo


Re: [perl #30245] [PATCH] Resizable*Array pmcs

2004-06-14 Thread Dan Sugalski
At 12:42 AM -0700 6/13/04, Matt Fowles (via RT) wrote:
>This patch adds Resizable*Array pmcs as the counterparts to Fixed*Array
>pmcs.  It does so by inheriting from them, so the Fixed ones are changed
>too.

Applied, thanks.
--
Dan


Re: [perl #30230] [PATCH] add Fixed(Integer|Float|Boolean|String|PMC)Array classes

2004-06-14 Thread Leopold Toetsch
Dan Sugalski wrote:
> At 11:57 AM +0200 6/14/04, Leopold Toetsch wrote:
>> 1) Is there any good reason to start adding malloc(3)-based array classes now?
>> This leads to code duplication for all the utility vtable entries (like
>> C). list.c can deal with all types already.
>
> list.c's pretty inefficient for most array usage. It's good for
> mixed-type, sparse, or really big arrays, but for normal arrays it's
> overkill. A big wad of memory's just fine there.

Well, yes. It depends on the usage of the PMC, which isn't known. What 
about shift/unshift? Are these allowed for fixed sized arrays?

I'd vote for optimizing list.c for the "small usage pattern" and switch 
to a different strategy for big arrays.

Anyway, the patch #30245 Resizable*Array implements these arrays on top
of fixed size. We had that some time ago with Array/PerlArray. It was
around 100 times slower for growing usage like:

  @ar[$_] = $x for (0..$N)

for some big $N.

And it of course duplicates existing classes like IntList, which just 
needs to get renamed.
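
Why one-element-at-a-time growth hurts can be shown with a rough cost
model. The sketch below is Python and purely illustrative: it counts
worst-case element copies, assuming every reallocation moves the whole
block, and it does not measure the Array/PerlArray code mentioned
above. The function name copies_when_growing and both growth policies
are assumptions made for the example.

```python
def copies_when_growing(n, grow):
    """Count element copies while appending n items one at a time.

    grow(capacity, needed) returns the new capacity whenever the
    current block is too small. Each reallocation is assumed to copy
    every element already stored (the worst case for realloc).
    """
    capacity = 0
    copied = 0
    for size in range(1, n + 1):
        if size > capacity:
            copied += size - 1          # move the existing elements
            capacity = grow(capacity, size)
    return copied


n = 1000
exact_fit = copies_when_growing(n, lambda cap, need: need)
doubling = copies_when_growing(n, lambda cap, need: max(need, cap * 2))
print(exact_fit, doubling)
```

Growing to exactly the needed size costs a number of copies quadratic
in n, while doubling the capacity keeps the total linear, which is the
usual reason resizable arrays over-allocate.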

>> 2) What's the difference between *PMCArray and *BooleanArray?
>
> The PMC arrays hold PMCs. The Boolean arrays hold true/false values only.

Then it should really store just one bit instead of a word.
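
Storing one bit per element could look roughly like the following.
This is a Python sketch of the idea only, not the FixedBooleanArray
code from the patch; the class name FixedBitArray and its layout (eight
flags per byte, lowest bit first) are assumptions for illustration.

```python
class FixedBitArray:
    """Fixed-size boolean array that packs eight elements into each byte."""

    def __init__(self, size):
        self.size = size
        self.bits = bytearray((size + 7) // 8)  # round up to whole bytes

    def _check(self, index):
        if not 0 <= index < self.size:
            raise IndexError("index out of bounds for fixed-size array")

    def __len__(self):
        return self.size

    def __getitem__(self, index):
        self._check(index)
        return bool(self.bits[index >> 3] & (1 << (index & 7)))

    def __setitem__(self, index, value):
        self._check(index)
        mask = 1 << (index & 7)
        if value:
            self.bits[index >> 3] |= mask
        else:
            self.bits[index >> 3] &= ~mask & 0xFF


flags = FixedBitArray(10)
flags[3] = True
flags[9] = 1
print(len(flags.bits), flags[3], flags[0])
```

Ten elements fit in two bytes here, at the price of a shift and a mask
on every access.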
leo


Re: [perl #30230] [PATCH] add Fixed(Integer|Float|Boolean|String|PMC)Array classes

2004-06-14 Thread Dan Sugalski
At 4:56 PM +0200 6/14/04, Leopold Toetsch wrote:
>Dan Sugalski wrote:
>>At 11:57 AM +0200 6/14/04, Leopold Toetsch wrote:
>>>1) Is there any good reason to start adding malloc(3)-based array classes now?
>>>This leads to code duplication for all the utility vtable entries (like
>>>C). list.c can deal with all types already.
>>
>>list.c's pretty inefficient for most array usage. It's good for
>>mixed-type, sparse, or really big arrays, but for normal arrays
>>it's overkill. A big wad of memory's just fine there.
>
>Well, yes. It depends on the usage of the PMC, which isn't known.
>What about shift/unshift? Are these allowed for fixed sized arrays?

Given that they change the size of an array... no.

>I'd vote for optimizing list.c for the "small usage pattern" and
>switch to a different strategy for big arrays.

I wouldn't. list.c is designed for a different set of uses than the
common array. Making it handle both common "wad of memory filled with
a single type" arrays and the much-less-common "sparse array of
multiple types" arrays doesn't make much sense.

Better to have the array have the needed smarts to upgrade itself to
the more heavyweight array type if it really needs to.

>Anyway, the patch #30245 Resizable*Array implements these arrays on
>top of fixed size.

So? I'm well aware that the implementation is suboptimal. Hell, the
commit message and the messages on the list make that clear. That
really, *really* doesn't matter for this. The point here is to get
the types in, get their behaviour correct, and nail them down as
guaranteed. How they do their thing is entirely irrelevant to that.

>>>2) What's the difference between *PMCArray and *BooleanArray?
>>
>>The PMC arrays hold PMCs. The Boolean arrays hold true/false values only.
>
>Then it should really store just one bit instead of a word.

So fix it if you want. This is first cut code. There's plenty of time
to optimize it later.
--
Dan



Re: [perl #30252] [PATCH] work on languages/Makefile

2004-06-14 Thread Dan Sugalski
At 5:04 AM -0700 6/13/04, Bernhard Schmalhofer (via RT) wrote:
>I have been looking into languages/Makefile and tried to update and beautify
>it.

Cool. Applied, thanks.
--
Dan


Re: [perl #30245] [PATCH] Resizable*Array pmcs

2004-06-14 Thread Leopold Toetsch
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> At 12:42 AM -0700 6/13/04, Matt Fowles (via RT) wrote:
>>This patch adds Resizable*Array pmcs as the counterparts to Fixed*Array
>>pmcs.  It does so by inheriting from them, so the Fixed ones are changed
>>too.

> Applied, thanks.

- duplicates existing PMCs' functionality:
   IntList <-> ResizableIntegerArray
   FloatvalArray <-> ResizableFloatArray
   PerlArray <-> ResizablePMCArray
   Array <-> FixedPMCArray

- incomplete: push/pop/shift/unshift/splice/freeze/thaw

- It's broken (realloc'd memory isn't always cleared)

- It's 2 times slower for filling arrays with this loop:

i = 0
lp:
ar[i] = 1
inc i
if i < n goto lp

- clone is different (shallow vs deep) - whatever is right

leo


More perl5.005 problems

2004-06-14 Thread Andy Dougherty
For some reason I haven't been able to figure out, perl5.00503 can't seem
to handle the TODO test in t/pmc/object-meths.t.  Here's the result of

perl5.005 t/harness t/pmc/object-meths.t

t/pmc/object-meths..FAILED test 19
Failed 1/21 tests, 95.24% okay
Failed Test      Status Wstat Total Fail  Failed  List of failed
----------------------------------------------------------------
t/pmc/object-me                  21    1   4.76%  19
Failed 1/1 test scripts, 0.00% okay. 1/21 subtests failed, 95.24% okay.

The same command with perl5.6 or 5.8 reports all tests succeed.

Does anybody know how to fix this?

Annoyingly, if I try to see what's going on with t/harness's documented -v
switch, chaos ensues:

perl5.005 t/harness -v t/pmc/object-meths.t

t/pmc/object-meths..# Failed test (t/pmc/object-meths.t at line 62)
#  got: 'debug = 0x0
# Reading /home/doughera/src/parrot/parrot-andy/t/pmc/object-meths_5.pasm
# using optimization '0' (0)
# Starting parse...
# 13 lines compiled.
# Running...
# main
# in meth
# back
# '
# expected: 'main
# in meth
# back
# '

[ etc.]

So my other question is:  Is  t/harness -v actually supposed to work
and do something useful?

(Incidentally, all of this presumes that I've installed a recent
File::Spec into parrot's lib/ directory -- 5.005's File::Spec isn't up to
the task.)

-- 
Andy Dougherty  [EMAIL PROTECTED]


Re: [perl #30245] [PATCH] Resizable*Array pmcs

2004-06-14 Thread Dan Sugalski
At 5:50 PM +0200 6/14/04, Leopold Toetsch wrote:
>Dan Sugalski <[EMAIL PROTECTED]> wrote:
>> At 12:42 AM -0700 6/13/04, Matt Fowles (via RT) wrote:
>>>This patch adds Resizable*Array pmcs as the counterparts to Fixed*Array
>>>pmcs.  It does so by inheriting from them, so the Fixed ones are changed
>>>too.
>
>> Applied, thanks.
>
>- duplicates existing PMCs' functionality:
>   IntList <-> ResizableIntegerArray
>   FloatvalArray <-> ResizableFloatArray
>   PerlArray <-> ResizablePMCArray
>   Array <-> FixedPMCArray

Yup. (Well, except for the PerlArray part) Part of this exercise is
to standardize things and toss the things we no longer need.

>- incomplete: push/pop/shift/unshift/splice/freeze/thaw

Then we need to start a todo list.

>- It's broken (realloc'd memory isn't always cleared)

So fix it, or file a bug report for someone else to fix.

>- It's 2 times slower for filling arrays with this loop:
>
>i = 0
>lp:
>ar[i] = 1
>inc i
>if i < n goto lp

Well, yeah, it's unoptimized. That can be fixed.

>- clone is different (shallow vs deep) - whatever is right

Which needs standardization.
--
Dan


Slices and iterators

2004-06-14 Thread Dan Sugalski
Since we're going to need these, and I'm in a documenting and 
defining mood (yes, I am making a final decision on strings today. 
Whee!) I figure we need to tackle them.

First, slices. Perl's got 'em, Python has them, Ruby, interestingly, 
doesn't. (Sort of) A slice is a subset of elements in an aggregate. 
They don't have to be contiguous, unique, or in any order. As an 
example:

  @foo = ('A', 'B', 'C', 'D');
  @bar = @foo[0,2]; # A slice--elements 0 and 2

@bar is now ('A', 'C'). Or:

  @bar = @foo[0,0,0,0,0,0];

@bar is now ('A',  'A',  'A',  'A',  'A',  'A').

I think for this to work we need to add a slice vtable entry. Not 
because I'm particularly fond of vtable entries as such, but it's a 
pretty fundamental operation. (Python devotes opcodes to it even)

The slice vtable entry should take as its parameter a slice pmc. This 
should be an array of typed from/to values, so we can do something 
like:

  @foo[0..2,4..8,12..];

with three entries in the slice array--one with a from/to of 0/2, one
with 4/8, and one with 12/inf. Typed since these will be used with 
hashes, and we'll need to differentiate between something that should 
be taken as a string and something taken as an integer. (If the range 
ends are PMCs, since they may behave differently depending on which 
way they're read)

This vtable entry should return an iterator, which is why they're 
here--not because I've any particular love of the things, but because 
if someone does:

  @foo = @bar[0..];

on an array that generates data randomly we'll get caught in an
infinite loop, which is generally a bad thing.

Since we're working on iterators, all aggregates should be able to 
generate them, which leads to the iterator vtable entry (since 
everyone wants to iterate over everything).

So, the proposal:
*) We add a slice vtable entry which takes a slice pmc and returns an iterator
*) We add an iterator vtable entry which returns an iterator for the PMC
*) We consider ways to make slices. I can see ops, or I can see basic 
functions. Either is fine, depends on how often the things are used. 
(Ops have less overhead, functions mean fewer ops)

Please, let discussion ensue. We'll decide on the slice creation 
method in a day or two and then just make it all happen.
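
To make the proposal concrete, here is a minimal sketch of a slice
that takes a list of from/to ranges and hands back a lazy iterator. It
is written in Python rather than as a vtable entry, it leaves out the
typed keys mentioned above, and the names slice_iter and Squares are
hypothetical. A "to" of None stands for the open-ended 12.. case.

```python
import itertools


def slice_iter(aggregate, ranges):
    """Lazily yield aggregate[i] for each inclusive (start, stop) range.

    A stop of None means the range is open-ended, so the iterator may
    never finish; callers take only as many elements as they need.
    """
    for start, stop in ranges:
        if stop is None:
            indices = itertools.count(start)
        else:
            indices = range(start, stop + 1)
        for index in indices:
            yield aggregate[index]


class Squares:
    """An unbounded aggregate: element i is computed on demand."""

    def __getitem__(self, index):
        return index * index


foo = ['A', 'B', 'C', 'D']
bar = list(slice_iter(foo, [(0, 0), (2, 2)]))
repeated = list(slice_iter(foo, [(0, 0)] * 6))

lazy = slice_iter(Squares(), [(0, 2), (4, 8), (12, None)])
first_ten = list(itertools.islice(lazy, 10))
print(bar, repeated, first_ten)
```

Because nothing is computed until the consumer asks for it, the
open-ended range does not cause an infinite loop by itself; only an
unbounded consumer would.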
--
Dan



The behaviour of iterators

2004-06-14 Thread Dan Sugalski
Once we decide how to *get* these things (see the previous e-mail) we 
need to decide how they should work. We can fiddle around, but 
honestly the scheme:

1) They act as arrays--if you want the 18th element in the iterator, 
access it directly
2) They have 'next', 'previous', 'first', 'last', and 'reset' methods 
to get the next, previous, first, or last element in the iterator, or 
to reset the iterator to the beginning. Next, last, and reset change 
the internal current element pointer, first and last don't.

Sane? The only downside I can see is one of speed, since method calls 
are a bit costly.
--
Dan



Re: The behaviour of iterators

2004-06-14 Thread Matt Fowles
Dan~
Just a few questions.
Dan Sugalski wrote:
> 2) They have 'next', 'previous', 'first', 'last', and 'reset' methods
> to get the next, previous, first, or last element in the iterator, or
> to reset the iterator to the beginning. Next, last, and reset change
> the internal current element pointer, first and last don't.

Do you mean next, previous, reset?

What about those data structures that can only be iterated in one 
direction easily (such as a singly linked list)?  Should they implement 
previous in the slow and painful way and hope no one calls it?  Should 
they throw an exception?  Might it be worthwhile to have two different 
types of iterators (those that only go one direction and those that go 
both)?

Matt



Re: The behaviour of iterators

2004-06-14 Thread Dan Sugalski
At 1:15 PM -0500 6/14/04, Matt Fowles wrote:
>Dan~
>
>Just a few questions.
>
>Dan Sugalski wrote:
>
>>2) They have 'next', 'previous', 'first', 'last', and 'reset'
>>methods to get the next, previous, first, or last element in the
>>iterator, or to reset the iterator to the beginning. Next, last,
>>and reset change the internal current element pointer, first and
>>last don't.
>
>Do you mean next, previous, reset?

D'oh! Yes.

>What about those data structures that can only be iterated in one
>direction easily (such as a singly linked list)?  Should they
>implement previous in the slow and painful way and hope no one calls
>it?  Should they throw an exception?  Might it be worthwhile to have
>two different types of iterators (those that only go one direction
>and those that go both)?

Exceptions for unimplemented behaviour are just fine. I should've
specified. (I can see defining a basic and extended iterator protocol
for this)
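
A basic versus extended protocol along these lines might be sketched
as follows. This is illustrative Python, not a Parrot design; the class
names ArrayIterator and ForwardOnlyIterator are made up, and using
NotImplementedError for the unsupported 'previous' is just one
reasonable choice of exception.

```python
class ArrayIterator:
    """Extended protocol: indexable, with next/previous/first/last/reset."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = -1                 # before the first element

    def __getitem__(self, index):      # "act as arrays"
        return self._items[index]

    def first(self):                   # does not move the current position
        return self._items[0]

    def last(self):                    # does not move the current position
        return self._items[-1]

    def next(self):
        if self._pos + 1 >= len(self._items):
            raise IndexError("iterator exhausted")
        self._pos += 1
        return self._items[self._pos]

    def previous(self):
        if self._pos <= 0:
            raise IndexError("already at the beginning")
        self._pos -= 1
        return self._items[self._pos]

    def reset(self):
        self._pos = -1


class ForwardOnlyIterator(ArrayIterator):
    """Basic protocol: same interface, but moving backwards is refused."""

    def previous(self):
        raise NotImplementedError("this iterator only moves forward")


it = ArrayIterator("abc")
walk = [it.next(), it.next(), it.previous()]
ends = (it.first(), it.last())
after_ends = it.next()                 # position was untouched by first/last
print(walk, ends, after_ends)
```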
--
Dan



Re: The behaviour of iterators

2004-06-14 Thread Luke Palmer
Dan Sugalski writes:
> Once we decide how to *get* these things (see the previous e-mail) we 
> need to decide how they should work. We can fiddle around, but 
> honestly the scheme:
> 
> 1) They act as arrays--if you want the 18th element in the iterator, 
> access it directly
> 2) They have 'next', 'previous', 'first', 'last', and 'reset' methods 
> to get the next, previous, first, or last element in the iterator, or 
> to reset the iterator to the beginning. Next, last, and reset change 
> the internal current element pointer, first and last don't.

Why not take a page from C++ and call "previous" and "next" C<--> and
C<++>, and then C to get what it points to.  The ops are already
there.  Not sure about "reset" though.

Luke


Re: Slices and iterators

2004-06-14 Thread Luke Palmer
Dan Sugalski writes:
> The slice vtable entry should take as its parameter a slice pmc. This 
> should be an array of typed from/to values, so we can do something 
> like:
> 
> @foo[0..2,4..8,12..];
> 
> with three entries in the slice array--one with a from/to of 0/2, one 
> with 4/8, and one with 12/inf. 

Perl also has:

    @foo[0..12 :by(3)]   # 0,3,6,9,12

PDL has affine slices.

To me, it seems like the best thing to do is to give slice an iterator,
and slice would return an iterator that maps keys to values.  So, doing
C<@bar = @foo[0..2,4..8,12...]> would look something like:

Construct iterator for 0..2, 4..8, 12...
Call @foo->VTABLE_slice(iterator)
Initialize @bar from returned iterator

Iterators have the advantage over arrays since they can be infinite.
With arrays, how do you represent:

@foo[12... :by(3)]

Do we still have multidimensional keys? 

Luke



Re: The behaviour of iterators

2004-06-14 Thread Dan Sugalski
At 1:08 PM -0600 6/14/04, Luke Palmer wrote:
>Dan Sugalski writes:
>> Once we decide how to *get* these things (see the previous e-mail) we
>> need to decide how they should work. We can fiddle around, but
>> honestly the scheme:
>>
>> 1) They act as arrays--if you want the 18th element in the iterator,
>> access it directly
>> 2) They have 'next', 'previous', 'first', 'last', and 'reset' methods
>> to get the next, previous, first, or last element in the iterator, or
>> to reset the iterator to the beginning. Next, last, and reset change
>> the internal current element pointer, first and last don't.
>
>Why not take a page from C++ and call "previous" and "next" C<--> and
>C<++>, and then C to get what it points to.

Because ++ and -- affect the value not the container. (There are days
when I think "C++ does it like..." is the near-perfect argument
against doing it one particular way... :) Next and previous are
actions on the container.
--
Dan



Re: Slices and iterators

2004-06-14 Thread Dan Sugalski
At 1:21 PM -0600 6/14/04, Luke Palmer wrote:
>Dan Sugalski writes:
>> The slice vtable entry should take as its parameter a slice pmc. This
>> should be an array of typed from/to values, so we can do something
>> like:
>>
>> @foo[0..2,4..8,12..];
>>
>> with three entries in the slice array--one with a from/to of 0/2, one
>> with 4/8, and one with 12/inf.
>
>Perl also has:
>
>@foo[0..12 :by(3)]   # 0,3,6,9,12
>
>PDL has affine slices.

Yeah, but at some point you have to draw the line and say "This is as
far as we're going at the low level."

>Iterators have the advantage over arrays since they can be infinite.
>With arrays, how do you represent:
>
>@foo[12... :by(3)]

And that is probably well past it. :)

>Do we still have multidimensional keys?

Yes, we do.
--
Dan


Re: Big nums

2004-06-14 Thread Dan Sugalski
At 8:15 PM +0100 6/14/04, Alex Gough wrote:
> [Sat, Jun 12, 2004 at 11:39:27AM +0200: [EMAIL PROTECTED]
>>
>> | Time for these as well. There's a partial implementation of them in
>> | types/bignum.c. I think it's time to move that to src/ (and the
>> | header file to .h) and get it integrated into parrot.
>>
>> I'm not really sure if types/bignum.c is what we want. There are AFAIK
>> some other math packages around, which are maintained and more complete.
>> GMP comes to my mind.
>
>That's not such a bad plan.  There's still a lot to do before the
>bignum stuff is entirely ready (in terms of functions for the
>standard) (and I'm still too busy right now to get deeply into
>anything).

The only thing that worries me about GMP is the license. It's LGPL,
so we might be able to, but it's tough to tell for sure, and the
explanatory text doesn't help at all.

The only bignum stuff I want in the core is the
basics--extended-precision numbers and basic math. (If we get
transcendentals as a bonus, well... swell) I think I'd as soon just
flesh out what we have now and be done with it.
--
Dan



Re: Big nums

2004-06-14 Thread Dan Sugalski
At 3:40 PM -0400 6/14/04, Dan Sugalski wrote:
>At 8:15 PM +0100 6/14/04, Alex Gough wrote:
>> [Sat, Jun 12, 2004 at 11:39:27AM +0200: [EMAIL PROTECTED]
>>>
>>> | Time for these as well. There's a partial implementation of them in
>>> | types/bignum.c. I think it's time to move that to src/ (and the
>>> | header file to .h) and get it integrated into parrot.
>>>
>>> I'm not really sure if types/bignum.c is what we want. There are AFAIK
>>> some other math packages around, which are maintained and more complete.
>>> GMP comes to my mind.
>>
>>That's not such a bad plan.  There's still a lot to do before the
>>bignum stuff is entirely ready (in terms of functions for the
>>standard) (and I'm still too busy right now to get deeply into
>>anything).
>
>The only thing that worries me about GMP is the license. It's LGPL,
>so we might be able to, but it's tough to tell for sure, and the
>explanatory text doesn't help at all.

But on second reading, the license makes this untenable. If we did
use the GMP library and shipped it with parrot then we'd be obligated
to package the full GMP source code with every binary distribution,
so... no joy there.
--
Dan



Characters/graphemes/freds

2004-06-14 Thread Dan Sugalski
Did we ever come to some consensus of what a "character" (that is a 
sequence of code points which makes up a single atomic thing in a 
language) should be called? I seem to remember grapheme being 
not-quite-correct, but I can't dig up the better answer. (And yes, 
the string doc is being finished. This is all that's left to it)
--
Dan



Strings. Finally.

2004-06-14 Thread Dan Sugalski
The official, 1.0, final version, modulo a more correct name for 
'grapheme', or spelling/grammar errors.

Do please note that whatever objection you may have to this has at 
least three people who disagree differently, and one or more (who 
aren't me) who agree with what you disagree with. Also note that I'm 
not entirely happy with this either.

Consider it an exercise in group coping--we will all deal with it and 
make do. All complaints, *including* mine, shall be summarily binned, 
with extreme prejudice. And yes, this means I won't be complaining 
about Unicode any more.

++Cut Here
Strings, the final design document
==================================

Requirements
============

* Efficiency - The system must do the absolute minimum amount of work
  to get the job done
* Correctness - The job that's done must actually be right
* Upgradeability - This stuff's all going to change again in five years
  so we really don't want to have to do it over again.
* Flexibility - Since, unfortunately, no one way of looking at
  strings is going to be right for everyone

Realities
=========

* There are a lot of different ways of representing text. Many of
  them annoying, some of them wildly incompatible, none of them
  wrong.
* We don't get to make the call what is right or wrong
* Some of the languages we support don't do Unicode, or do Unicode
  and other things (including perl 5 and Ruby)

Desires
=======

* We want to make it easily possible to do the right thing with string
  data
* We want all the troublesome stuff to be as invisible as possible
* We want to make it look like everyone's got what they want without
  actually doing it when we don't have to

With that list in mind, here's parrot's solution. Please note that
the *only* thing up for discussion is a more correct label for
'grapheme'. It is, otherwise, the final external design.

Definitions
===========

BYTE - 8 bits 'o data

CODE POINT - A 32-bit integer that represents a single thing in a
             character set

ENCODING - How code points are mapped to bytes, and vice versa

CHARACTER SET - Contains meta-information about code points. This
                includes both the meaning of individual code points
                (65 is capital A, 776 is a combining diaeresis) as
                well as a set of categorizations of code
                points (alpha, numeric, whitespace, punctuation, and
                so on), and a sorting order.

GRAPHEME - One or more code points which make up a single real
           entity. The "oe" (I'm stuck with ASCII here, that should
           really be an o with two dots over it) in Leo's last name
           is, in the unicode character set, a single character with
           two code points, 111 (lowercase o) and 776 (combining
           diaeresis). Graphemes can *not* be legitimately
           decomposed into individual code points in most cases.
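
The code point numbers in the GRAPHEME entry can be checked with a
short script. This is a Python illustration using the standard
unicodedata module, not anything Parrot-specific; note that Python's
len() counts code points, not graphemes. This particular pair happens
to have a precomposed single-code-point form as well, which is not true
of graphemes in general.

```python
import unicodedata

# "o" followed by a combining diaeresis: one grapheme, two code points.
decomposed = "o\u0308"
code_points = [ord(ch) for ch in decomposed]
mark_name = unicodedata.name(decomposed[1])

# NFC normalization folds the pair into the precomposed letter U+00F6.
composed = unicodedata.normalize("NFC", decomposed)

print(code_points, mark_name, len(decomposed), len(composed))
```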

Important note
==============

This document is completely language-insensitive--that is, there's no
language attached to any particular piece of data. Collation and
casing rules are done based on a single global setting that is
unconditionally applied in all cases. Setting and querying those
rules is beyond the scope of this document.

Conceptually
============

The smallest unit of text that Parrot will process is the string,
something that can be put in an S register. These strings have the
following properties:

*) They have an encoding
*) They have a character set
*) They have a taint status

The above things are independent of the view of the string presented
to bytecode programs--these are metadata elements that describe the
contents of the string as they actually exist, rather than as they
are presented.

Internally parrot is capable of maintaining strings in several
different basic encodings (8-bit, 16-bit, and 32-bit integer, as well
as UTF-8) and may load other encodings on the fly as needed. Parrot is
also capable of maintaining strings in many different character sets
(ASCII, EBCDIC, Unicode, Latin-n, etc) which are also dynamically
loadable. Finally Parrot is capable of maintaining strings in many
different languages, which also may be loaded on the fly.

This is done for maximum efficiency, regardless of the view of the
data presented to the bytecode programs. Conversion to a different
format may be done if needed to properly express the semantics of the
program, but will not be done if not needed.

For example, consider the following:
  use Unicode;
  open FOO, ">foo.txt", :charset(latin-3);
  open BAR, ">bar.txt", :charset(big5);
  $filehandle = 0;
  while (<>) {
    if ($filehandle++) {
      print FOO $_;
    } else {
      print BAR $_;
    }
    $filehandle %= 2;
  }
Relatively simple: the program reads from the input filehandle and
splits the data, line by line, between two output files. The two
output files have different requirements -- FOO gets data in Latin-3,
while BAR gets it in Big5. The "use Unicode;" thing at the top's a
hand-wavey way of a

Re: Big nums

2004-06-14 Thread Alex Gough
 [Sat, Jun 12, 2004 at 11:39:27AM +0200: [EMAIL PROTECTED]
> 
> | Time for these as well. There's a partial implementation of them in 
> | types/bignum.c. I think it's time to move that to src/ (and the 
> | header file to .h) and get it integrated into parrot.
> 
> I'm not really sure if types/bignum.c is what we want. There are AFAIK
> some other math packages around, which are maintained and more complete.
> GMP comes to my mind.

That's not such a bad plan.  There's still a lot to do before the
bignum stuff is entirely ready (in terms of functions for the
standard) (and I'm still too busy right now to get deeply into
anything).

At the same time, I'd caution against putting too much functionality
into the core; the current Perl bignum stuff is probably too broad,
which makes it tricky to look after.  I'd also argue strongly in
favour of a decimal bignum implementation, because that gets you two
birds with one stone (well, a few: bignums, limited/defined precision
bignums, trustworthy rounding, decent numerical exception support,
world peace), and a massive test suite from IBM to let you know
everything's working.
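
For a feel of what a decimal bignum with defined precision, rounding
and exceptions looks like in use, here is a sketch with Python's
standard decimal module, which follows IBM's General Decimal
Arithmetic Specification. It is an illustration only and says nothing
about how types/bignum.c behaves.

```python
from decimal import (Decimal, DivisionByZero, ROUND_HALF_EVEN,
                     ROUND_HALF_UP, localcontext)

with localcontext() as ctx:
    ctx.prec = 50                                # generous, but defined
    big = Decimal(2) ** 100                      # exact at 50 digits
    tenth_sum = Decimal("0.1") + Decimal("0.2")  # exactly 0.3

with localcontext() as ctx:
    ctx.prec = 5
    third = Decimal(1) / Decimal(3)              # 5 significant digits

# The rounding mode is explicit rather than platform-dependent.
half_even = Decimal("2.5").quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)
half_up = Decimal("2.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP)

# Numerical exceptions can be trapped per context.
with localcontext() as ctx:
    ctx.traps[DivisionByZero] = True
    try:
        Decimal(1) / Decimal(0)
        trapped = False
    except DivisionByZero:
        trapped = True
```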

Alex
-- 
It's supposed to be automatic but actually you have to press this button




Re: More perl5.005 problems

2004-06-14 Thread Michael G Schwern
On Mon, Jun 14, 2004 at 12:00:42PM -0400, Andy Dougherty wrote:
> For some reason I haven't been able to figure out, perl5.00503 can't seem
> to handle the TODO test in t/pmc/object-meths.t.  Here's the result of

5.5.3's Test::Harness doesn't know how to handle that style of TODO.
You'll have to make a dependency on T::H 2.x if you want to use TODO.
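
For anyone unfamiliar with the directive in question, here is a rough
sketch in Python of how a harness might classify test output lines
carrying a "# TODO" marker. The function and its understands_todo flag
are invented for illustration; this is not Test::Harness code.

```python
import re

# Matches lines such as "ok 1 - basic" or "not ok 3 # TODO reason".
TAP_LINE = re.compile(
    r"^(?P<status>ok|not ok)\b"
    r"(?:\s+(?P<num>\d+))?"
    r"(?P<rest>.*)$"
)
TODO_DIRECTIVE = re.compile(r"#\s*TODO\b", re.IGNORECASE)


def classify(line, understands_todo=True):
    """Return a coarse verdict for one line of test output."""
    m = TAP_LINE.match(line)
    if m is None:
        return "ignored"            # plan lines, diagnostics, etc.
    passed = m.group("status") == "ok"
    is_todo = (understands_todo
               and TODO_DIRECTIVE.search(m.group("rest")) is not None)
    if is_todo:
        return "todo-unexpected-pass" if passed else "todo-expected-fail"
    return "pass" if passed else "fail"


line = "not ok 3 - new feature # TODO not implemented"
print(classify(line), classify(line, understands_todo=False))
```

With understands_todo switched off the same line is reported as a
plain failure, which is the symptom described above.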


-- 
Michael G Schwern[EMAIL PROTECTED]  http://www.pobox.com/~schwern/
Funny thing about weekends when you're unemployed--they don't mean quite
so much.  'Cept you get to hang out with your workin' friends.
- Primus "Spaghetti Western"


Re: More perl5.005 problems

2004-06-14 Thread Dan Sugalski
At 4:39 PM -0400 6/14/04, Michael G Schwern wrote:
>On Mon, Jun 14, 2004 at 12:00:42PM -0400, Andy Dougherty wrote:
>> For some reason I haven't been able to figure out, perl5.00503 can't seem
>> to handle the TODO test in t/pmc/object-meths.t.  Here's the result of
>
>5.5.3's Test::Harness doesn't know how to handle that style of TODO.
>You'll have to make a dependency on T::H 2.x if you want to use TODO.

Is there another style of TODO that could be used here that would be 
compatible with 5.005_03?
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: More perl5.005 problems

2004-06-14 Thread chromatic
On Mon, 2004-06-14 at 14:26, Dan Sugalski wrote:

> >5.5.3's Test::Harness doesn't know how to handle that style of TODO.
> >You'll have to make a dependency on T::H 2.x if you want to use TODO.

> Is there another style of TODO that could be used here that would be 
> compatible with 5.005_03?

No such beast currently exists.

In fact, I'm surprised he managed to install an acceptably recent
version of Test::Simple on 5.5.3 without upgrading Test::Harness; the
bundle's required Test::Harness 2.03 for a couple of years now.

-- c



Parrot core dumps on FC1?

2004-06-14 Thread Michel Pelletier

Has anyone run into immediate core dumps on Fedora Core 1?  When I run
'make' the interpreter successfully compiles, but dumps core when it
tries to compile parrotlib.imc.

: blib/lib/libparrot.a
c++ -o parrot -Wl,-E  -g  imcc/main.o blib/lib/libparrot.a
blib/lib/libicuuc.a blib/lib/libicudata.a -lnsl -ldl -lm -lcrypt -lutil
-lpthread -lrt
./parrot -o runtime/parrot/include/parrotlib.pbc
runtime/parrot/library/parrotlib.imc
make: *** [runtime/parrot/include/parrotlib.pbc] Segmentation fault

Under gdb:

[EMAIL PROTECTED] parrot]$ gdb parrot
GNU gdb Red Hat Linux (5.3.90-0.20030710.41rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you
are
welcome to change it and/or distribute copies of it under certain
conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for
details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host
libthread_db library "/lib/tls/libthread_db.so.1".

(gdb) r
Starting program: /home/michel/parrot/parrot 
[Thread debugging using libthread_db enabled]
[New Thread -1084659872 (LWP 3839)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1084659872 (LWP 3839)]
doOpenChoice (path=0x827684c "icudt26l", type=0x8274d47 "icu",
name=0x8274d40 "uprops", 
isAcceptable=0x8203104 , context=0x0,
pErrorCode=0xbff5186c) at udata.c:825
825 if(pHeader->dataHeader.magic1==0xda &&
(gdb) 

Is this a known FC1 issue?

-Michel



Some rationale for the mixed encoding scheme

2004-06-14 Thread Dan Sugalski
Since I know this is going to come up, I figure I should pre-empt it 
and be done with it. (Though I should've put this in the string 
document. Ah, well. Hopefully timing isn't everything, or I am *so* 
in trouble...)

Why aren't we converting to Unicode on the edge? Since, after all, 
any Sane Language will do all its string handling in Unicode, right? 
Why leave things the way they are until late?

Simple. Efficiency.

It's no less efficient to defer conversion of string data to Unicode 
(or, heck, from a harder-to-use encoding (UTF-8) to an easier-to-use 
one (UTF-32)) until it's demanded than it is to do it at the edge. But... we 
get the bonus of *not* spending the time to do the encoding shifts 
and charset shifts if we don't need to. Which will happen for folks 
if they, for example, never *do* anything that'd mandate the shift. 
And if they do, well, we do the shift once then switch over the 
string vtable pointers to the new encoding and never have to do so 
again.
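
The "convert once, then switch the vtable pointers" idea can be
sketched roughly as follows. This is Python with made-up names
(LazyString, conversions), standing in for whatever the real string
vtables end up looking like, and it does no bounds checking.

```python
class LazyString:
    """Hypothetical sketch: keep bytes as they arrived, convert on demand."""

    def __init__(self, data: bytes, encoding: str):
        self.data = data
        self.encoding = encoding
        self.conversions = 0   # how many times we paid for a shift
        # Stand-in for the per-string vtable entry: pick the routine
        # that matches the current storage form.
        self._codepoint_at = (self._fixed_width_at
                              if encoding == "utf-32-le"
                              else self._convert_then_index)

    def byte_length(self) -> int:
        # Needs no knowledge of code points, so never forces a shift.
        return len(self.data)

    def codepoint_at(self, index: int) -> int:
        return self._codepoint_at(index)

    def _convert_then_index(self, index: int) -> int:
        # First code point access: convert once, then swap the entry
        # so later calls go straight to the fixed-width routine.
        text = self.data.decode(self.encoding)
        self.data = text.encode("utf-32-le")
        self.encoding = "utf-32-le"
        self.conversions += 1
        self._codepoint_at = self._fixed_width_at
        return self._fixed_width_at(index)

    def _fixed_width_at(self, index: int) -> int:
        start = 4 * index
        return int.from_bytes(self.data[start:start + 4], "little")


s = LazyString("L\u00f6tsch".encode("utf-8"), "utf-8")
```

A program that only ever asks for byte_length() never triggers the
conversion; one that indexes by code point pays for it exactly once.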

And while that may not be an overwhelming win, nor convincing to 
everyone, it also means that folks who want to stick with a single, 
non-Unicode setup (US-ASCII or Latin-1 folks who don't want to shift) 
can do so without incurring a penalty in time, space, or e-mail 
complaining. (And, bluntly, at this point I consider features that 
let people not grumble a big win)

Is it a bit more work for us? Well, a little, but no more so than 
using vtables for PMCs to do stuff, and that's all worked out quite 
nicely, honestly.

I do realize that the Big ICU Patch tossed a lot of the 
infrastructure for this, which broke parrot for folks who can't/won't 
do ICU. (And there are a number of folks shut out of development 
because they can't get ICU going.) That'll be put back over the next 
week or so and ICU factored out to an optional build feature.
--
Dan

--it's like this---
Dan Sugalski  even samurai
[EMAIL PROTECTED] have teddy bears and even
  teddy bears get drunk


Re: Slices and iterators

2004-06-14 Thread Luke Palmer
Dan Sugalski writes:
> At 1:21 PM -0600 6/14/04, Luke Palmer wrote:
> >Dan Sugalski writes:
> >> The slice vtable entry should take as its parameter a slice pmc. This
> >> should be an array of typed from/to values, so we can do something
> >> like:
> >>
> >> @foo[0..2,4..8,12..];
> >>
> >> with three entries in the slice array--one with a from/to of 0/2, one
> >> with 4/8, and one with 12/inf.
> >
> >Perl also has:
> >
> >@foo[0..12 :by(3)]# 0,3,6,9,12
> >
> >PDL has affine slices.
> 
> Yeah, but at some point you have to draw the line and say "This is as 
> far as we're going at the low level."
> 
> >Iterators have the advantage over arrays since they can be infinite.
> >With arrays, how do you represent:
> >
> >@foo[12... :by(3)]
> 
> And that is probably well past it. :)
> 
> >Do we still have multidimensional keys?
> 
> Yes, we do.

Then these are both clear arguments for giving an iterator to slice
rather than an array.  With an array we have no way to represent
@foo[12... :by(3)], and although we have multidimensional keys we have
no way to slice by them.

The former is solved by constructing the lazy iterator for C<12... :by(3)> 
and giving it to @foo's slice.  The latter is solved by creating an
iterator that yields multidimensional keys.
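
To make the comparison concrete, here is a minimal sketch in Python of
both shapes: a slice described as an array of from/to pairs, and a
slice driven by a lazy index iterator. All function names are
invented, and it assumes the indices come out in increasing order.

```python
from itertools import count, takewhile


def expand_ranges(ranges, length):
    """Yield indices for a slice given as (start, stop) pairs.

    stop is inclusive; None stands for "to the end" (the 12/inf case).
    """
    for start, stop in ranges:
        last = length - 1 if stop is None else min(stop, length - 1)
        yield from range(start, last + 1)


def lazy_by(start, step):
    """Infinite index iterator, in the spirit of 12... :by(3)."""
    return count(start, step)


def slice_with(array, index_iter):
    """Consume an index iterator only as far as the array extends."""
    n = len(array)
    return [array[i] for i in takewhile(lambda i: i < n, index_iter)]


foo = list(range(100, 120))   # 20 elements, foo[i] == 100 + i
by_ranges = slice_with(
    foo, expand_ranges([(0, 2), (4, 8), (12, None)], len(foo)))
by_iter = slice_with(foo, lazy_by(12, 3))
```

Note that the range-array form reduces to an iterator anyway once it
is expanded, whereas the infinite stepped case has no finite from/to
representation.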

What advantage does the simple array case give, other than the one fewer
opcode from extracting the iterator?

Luke


Re: The behaviour of iterators

2004-06-14 Thread Luke Palmer
Dan Sugalski writes:
> At 1:08 PM -0600 6/14/04, Luke Palmer wrote:
> >Dan Sugalski writes:
> >> Once we decide how to *get* these things (see the previous e-mail) we
> >> need to decide how they should work. We can fiddle around, but
> >> honestly the scheme:
> >>
> >> 1) They act as arrays--if you want the 18th element in the iterator,
> >> access it directly
> >> 2) They have 'next', 'previous', 'first', 'last', and 'reset' methods
> >> to get the next, previous, first, or last element in the iterator, or
> >> to reset the iterator to the beginning. Next, previous, and reset
> >> change the internal current element pointer; first and last don't.
> >
> >Why not take a page from C++ and call "previous" and "next" C<--> and
> >C<++>, and then C<*> to get what it points to?
> 
> Because ++ and -- affect the value not the container. (There are days 
> when I think "C++ does it like..." is the near-perfect argument 
> against doing it one particular way... :) 

Heh, yeah.

> Next and previous are actions on the container.

Then how, if we have an array of iterators, do we increment an internal
iterator from an external one?  That is:

@foo = (1..5);
@bar = map { @foo.iter($_) } reverse 0..4;  # make an array of iterators

for @bar ¥ 0... -> $x, $c {
    if something($x) {
        @bar[$c].next;
    }
}

Without that awful reference of $c (awful not in a stylistic way, but in
a I-have-to-keep-the-index-around-too-wtf kind of way).

I'm arguing for an iterator to be an I pointer, one that you
have to dereference, for this reason.
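
One way to read that proposal, sketched in Python with a hypothetical
Cursor class: the iterator is an object you advance directly and
dereference explicitly, so no index into the array of iterators is
needed.

```python
class Cursor:
    """Hypothetical iterator that behaves like a pointer into a list."""

    def __init__(self, target, pos=0):
        self.target = target
        self.pos = pos

    def deref(self):
        # Explicit dereference: the element currently pointed at.
        return self.target[self.pos]

    def next(self):
        # Advance the pointer itself, not the element it refers to.
        self.pos += 1
        return self


foo = [1, 2, 3, 4, 5]
# An array of iterators starting at positions 4, 3, 2, 1, 0.
bar = [Cursor(foo, i) for i in reversed(range(5))]

for cur in bar:
    # cur *is* the iterator, so there is no index to carry around.
    if cur.deref() % 2 == 0:      # stand-in for something($x)
        cur.next()

positions = [c.pos for c in bar]
```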

Luke


Re: Strings. Finally.

2004-06-14 Thread Brent 'Dax' Royal-Gordon
Sorry to reply to this, but I feel that this is a request for 
clarifications, not for a change.  :^)

Dan Sugalski wrote:
> Synthesized code points
> =======================
> ...
> becomes two integers, 0x0041 and 0x82A9. (Though it could
> represent them as 16-bit integers, since no character takes three or
> more bytes)

It strikes me that this scheme is not always null-safe (e.g. the 
character 00 11 would be indistinguishable from a bare 11).  Are there 
any encodings this could cause a problem with?

> getbyte          Ix, Sy, Iz
> (u)getcodepoint  Ix, Sy, Iz
> (u)getgrapheme   Sx, Sy, Iz
>
> Get the byte, codepoint, or grapheme requested. Destination is either
> an integer (representing the byte or codepoint) or a string. Sy is the
> source string, Iz is the offset in bytes, code points, or graphemes
> from the beginning of the string.

Since we're going to be shifting around the encoding essentially at 
will, does 'getbyte' make sense on non-binary strings?  (And when we 
have a binary string, is there any difference between 'getbyte', 
'getcodepoint', and 'getgrapheme' at all?)

If so, will 16- and 32-bit encodings have to implement this with a 
forward scan from the start of the string (the way getcodepoint would 
have to be implemented with a variable-width encoding), or do you have 
another trick up your sleeve?
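
To show what is being asked, here is a rough sketch in Python of the
three access levels on one string stored two ways. The function names
echo the proposed ops but are otherwise made up, and the grapheme
clustering is deliberately crude (a base code point plus any combining
marks that follow it).

```python
import unicodedata


def get_byte(data: bytes, offset: int) -> int:
    # Byte offsets index the raw storage directly in any encoding.
    return data[offset]


def get_codepoint_utf32(data: bytes, offset: int) -> int:
    # Fixed width: jump straight to the 4-byte unit.
    start = 4 * offset
    return int.from_bytes(data[start:start + 4], "little")


def get_codepoint_utf8(data: bytes, offset: int) -> int:
    # Variable width: scan forward from the start, counting lead bytes.
    seen = -1
    for pos, byte in enumerate(data):
        if (byte & 0xC0) != 0x80:          # not a continuation byte
            seen += 1
            if seen == offset:
                end = pos + 1
                while end < len(data) and (data[end] & 0xC0) == 0x80:
                    end += 1
                return ord(data[pos:end].decode("utf-8"))
    raise IndexError(offset)


def get_grapheme(text: str, offset: int) -> str:
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch             # attach mark to its base
        else:
            clusters.append(ch)
    return clusters[offset]


name = "Lo\u0308tsch"                      # decomposed o + diaeresis
utf8 = name.encode("utf-8")
utf32 = name.encode("utf-32-le")
```

Offset 2 means three different things here: the first byte of the
combining mark, the combining mark itself, or the letter "t",
depending on which level is used.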

> setbyte          Sx, Iy, Iz
> (u)setcodepoint  Sx, Iy, Iz
> (u)setgrapheme   Sx, Sy, Iz

Likewise.

--
Brent "Dax" Royal-Gordon <[EMAIL PROTECTED]>
Perl and Parrot hacker
Oceania has always been at war with Eastasia.


Re: Event design sketch

2004-06-14 Thread martin
On Tue, 11 May 2004, Uri Guttman wrote:
>   >> Why would alarm need any special opcode when it is just a timer
>   >> with a delay of [abs_time minus NOW]?
>   >> Let the coder handle that and lose the extra opcodes.
>
>   mab> you want to make the latency between getting the abs_time, doing
>   mab> the substract[ion] and actually setting up the time as small as
>   mab> possible
>
> Accuracy of delivery (latency) is silly to worry about in Perl for
> granularities of more than about .05 seconds or so. Building a very fine
> grained accurate real-time system in Perl makes little sense to me.

> so i usually don't worry about who does the delta calculation and the
> slight amount of delay it takes.

Never mind the granularity or latency; there are systems where "time of day"
can be adjusted to take into account clock drift, while "system elapsed
time" is left unaffected. Which you want depends on whether you want to
sleep for a specific time, or wake up at a specific time, and it would be
nice if Parrot didn't rule out making use of that.
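
A minimal sketch of the two kinds of deadline, in Python and with
invented helper names. Python's time.monotonic() plays the part of
"system elapsed time" and time.time() the adjustable time of day; the
clock parameters exist only so the behaviour can be checked with fake
clocks.

```python
import time


def deadline_after(delay_s, monotonic=time.monotonic):
    """Sleep-for-N-seconds style: measured on the elapsed-time clock,
    which clock adjustments do not move."""
    return ("monotonic", monotonic() + delay_s)


def deadline_at(wall_time_s):
    """Wake-at-time-T style: keep the absolute wall-clock target."""
    return ("wall", wall_time_s)


def is_due(deadline, monotonic=time.monotonic, wall=time.time):
    kind, when = deadline
    now = monotonic() if kind == "monotonic" else wall()
    return now >= when
```

If the time of day is stepped forward, a deadline_at() target fires
early relative to elapsed time while a deadline_after() target is
unaffected, so converting one into the other up front loses exactly
the distinction described above.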

-Martin