Re: [perl #30230] [PATCH] add Fixed(Integer|Float|Boolean|String|PMC)Array classes
Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 10:12 PM -0700 6/11/04, Matt Fowles (via RT) wrote: >>This patch adds the above Fixed*Array classes. There are basic tests for >>all of them included too, although more tests never hurt... > With MANIFEST patch even! Woohoo! > Applied, thanks. 1) Is there any good reason to start malloc(3)-based array classes now? This leads to code duplication for all the utility vtable entries (like C). list.c can deal with all types already. 2) What's the difference between *PMCArray and *BooleanArray? leo
Re: Making PMCs
At 12:53 PM -0500 6/13/04, Matt Fowles wrote: Nicholas~ I will try to answer what I can, based on my current experience making those array PMCs. Nicholas Clark wrote: a data pointer which I can use. I am always responsible for freeing anything there(?) and to do this I need to set the active destroy flag(?) This flag is not the same as the high priority DOD system(?) Does the garbage collector ever consider this pointer? Does it ever chase what it points to? You are responsible for freeing it by setting the active destroy flag. Well... no. You're not. If the memory hanging off the data pointer was allocated from one of parrot's managed pools (either free memory or pmc/buffer header) then you don't have to free it. You only need to have a destroy function if you've malloc'd memory or need to actively tear down something, usually a filehandle or connection to a third-party extension or something. -- Dan --it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: [perl #30230] [PATCH] add Fixed(Integer|Float|Boolean|String|PMC)Array classes
At 11:57 AM +0200 6/14/04, Leopold Toetsch wrote: Dan Sugalski <[EMAIL PROTECTED]> wrote: At 10:12 PM -0700 6/11/04, Matt Fowles (via RT) wrote: This patch adds the above Fixed*Array classes. There are basic tests for all of them included too, although more tests never hurt... With MANIFEST patch even! Woohoo! Applied, thanks. 1) Is there any good reason to start malloc(3)-based array classes now? This leads to code duplication for all the utility vtable entries (like C). list.c can deal with all types already. list.c's pretty inefficient for most array usage. It's good for mixed-type, sparse, or really big arrays, but for normal arrays it's overkill. A big wad of memory's just fine there. 2) What's the difference between *PMCArray and *BooleanArray? The PMC arrays hold PMCs. The Boolean arrays hold true/false values only. -- Dan
Re: Making PMCs
At 11:01 AM +0100 6/13/04, Nicholas Clark wrote: I'm trying to work out how to make PMCs. I'm not finding much documentation, and I'm not sure what I'm missing. Particularly I'm trying to work out where I'm allowed to store data, and what flags I might have to set I'll write up something more detailed later on today, but for now: A basic PMC appears to contain flags of which 8 are private so I could use Yes. a data pointer which I can use. I am always responsible for freeing anything there(?) No. Only if you need to take some sort of extraordinary measures. and to do this I need to set the active destroy flag(?) Again, only with extraordinary measures. This flag is not the same as the high priority DOD system(?) Nope. Does the garbage collector ever consider this pointer? Yes, if the right flags are set. is_PMC_ptr is set if this pointer points to a PMC. is_buffer_ptr is set if this pointer points to a buffer-like structure. (Such as a string) Set them both if the pointer points to a buffer of PMCs. Does it ever chase what it points to? If the right flags are set (namely the two above) yes. a pobj_t union which I can use. Given that the nature of a C union means that the floating point value occupies the same space as the pointer, do I need to set flags depending on whether the pointers point to anything? If they point to something parrot needs to track (a buffer, string or PMC) and you want parrot to do it automatically, yes. I'm going to have to go dig for that, though, as things have changed a bit since I last looked. -- Dan
Re: Making PMCs
On Mon, Jun 14, 2004 at 08:53:10AM -0400, Dan Sugalski wrote: > At 12:53 PM -0500 6/13/04, Matt Fowles wrote: > >Nicholas~ > > > >I will try to answer what I can, based on my current experience > >making those array PMCs. > > > > > >Nicholas Clark wrote: > > > >>a data pointer > >>which I can use. I am always responsible for freeing anything > >>there(?) > >>and to do this I need to set the active destroy flag(?) > >>This flag is not the same as the high priority DOD system(?) > >>Does the garbage collector ever consider this pointer? > >>Does it ever chase what it points to? > >> > >You are responsible for freeing it by setting the active destroy flag. > > Well... no. You're not. If the memory hanging off the data pointer > was allocated from one of parrot's managed pools (either free memory > or pmc/buffer header) then you don't have to free it. There's a memory internals document, but I can't spot any document giving an API overview on how to allocate memory this way. The implication of what you're saying is that the data pointer is checked by the DOD, and any PMC it points directly to isn't dead. Nicholas Clark
Re: Making PMCs
At 2:33 PM +0100 6/14/04, Nicholas Clark wrote: On Mon, Jun 14, 2004 at 08:53:10AM -0400, Dan Sugalski wrote: At 12:53 PM -0500 6/13/04, Matt Fowles wrote: >Nicholas~ > >I will try to answer what I can, based on my current experience >making those array PMCs. > > >Nicholas Clark wrote: > >>a data pointer >> which I can use. I am always responsible for freeing anything >> there(?) >> and to do this I need to set the active destroy flag(?) >> This flag is not the same as the high priority DOD system(?) >> Does the garbage collector ever consider this pointer? >> Does it ever chase what it points to? >> >You are responsible for freeing it by setting the active destroy flag. Well... no. You're not. If the memory hanging off the data pointer was allocated from one of parrot's managed pools (either free memory or pmc/buffer header) then you don't have to free it. There's a memory internals document, but I can't spot any document giving an API overview on how to allocate memory this way. Yeah, it's all kinda ad-hoc. Needs fixing. The implication of what you're saying is that the data pointer is checked by the DOD, and any PMC it points directly to isn't dead. Well... sort of. Checking and cleaning up are two very separate things here. Parrot may not automatically check (leaving that to your PMC's custom mark routine) but will automatically clean up. (If you've not marked in your custom mark routine) Basically, if the right flags are set, the DOD trace will treat the pointer as pointing to something it should consider, and automatically trace into it. If the right flags aren't set it won't, and your PMC needs to mark it explicitly.
Regardless of anything else, the DOD sweep will reclaim the PMC/String/Buffer/PObj structures if they aren't marked in the mark phase, either automatically or by a PMC mark routine, and memory not pointed to by a live buffer-ish thing will get reclaimed, so if your PMC with custom stuff hanging off the data pointer dies parrot will still reclaim its memory and whatnot for you. -- Dan
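To make the flag rules above concrete, here is a toy mark-and-sweep model in C. It is only a sketch under stated assumptions. The struct, the flag names and values, and the mark()/sweep() functions are hypothetical stand-ins for the is_PMC_ptr and is_buffer_ptr flags and the custom mark routine Dan describes. They are not Parrot's actual API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits standing in for the is_PMC_ptr / is_buffer_ptr
   flags described above; these are not Parrot's real names or values. */
enum {
    FLAG_IS_PMC_PTR    = 1 << 0, /* data points at one object        */
    FLAG_IS_BUFFER_PTR = 1 << 1, /* data points at a buffer          */
    FLAG_LIVE          = 1 << 2  /* set during the mark phase        */
};

typedef struct ToyPMC ToyPMC;

struct ToyPMC {
    unsigned flags;
    void *data;                        /* the "data pointer"             */
    size_t count;                      /* element count for a buffer     */
    void (*custom_mark)(ToyPMC *self); /* per-type mark routine, or NULL */
};

/* Mark phase: flag an object live and, depending on its flags, trace
   into whatever its data pointer refers to. */
static void mark(ToyPMC *p) {
    if (p == NULL || (p->flags & FLAG_LIVE))
        return;                        /* already visited (handles cycles) */
    p->flags |= FLAG_LIVE;

    const unsigned both = FLAG_IS_PMC_PTR | FLAG_IS_BUFFER_PTR;
    if ((p->flags & both) == both) {
        ToyPMC **items = p->data;      /* both flags: a buffer of objects  */
        for (size_t i = 0; i < p->count; i++)
            mark(items[i]);
    } else if (p->flags & FLAG_IS_PMC_PTR) {
        mark(p->data);                 /* one flag: a single object        */
    } else if (p->custom_mark != NULL) {
        p->custom_mark(p);             /* no flags: the type marks for itself */
    }
    /* A plain buffer (FLAG_IS_BUFFER_PTR alone) would just be marked live;
       this toy model does not track buffers separately. */
}

/* Sweep phase: anything left unmarked is garbage. Returns how many
   objects would be reclaimed and clears live bits for the next cycle. */
static size_t sweep(ToyPMC **all, size_t n) {
    assert(all != NULL || n == 0);
    size_t dead = 0;
    for (size_t i = 0; i < n; i++) {
        if (all[i]->flags & FLAG_LIVE)
            all[i]->flags &= ~(unsigned)FLAG_LIVE;
        else
            dead++;                    /* a real collector frees it here   */
    }
    return dead;
}
```

The split mirrors Dan's point. The flags (or a custom mark routine) decide whether the trace follows the data pointer. Reclaiming unmarked objects happens in the sweep regardless.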
Re: Making PMCs
Nicholas Clark <[EMAIL PROTECTED]> wrote: > I'm trying to work out how to make PMCs. I'm not finding much documentation, I'll create a POD, which hopefully will answer all these questions. leo
Re: [perl #30245] [PATCH] Resizable*Array pmcs
At 12:42 AM -0700 6/13/04, Matt Fowles (via RT) wrote: This patch adds Resizable*Array pmcs as the counterparts to Fixed*Array pmcs. It does so by inheriting from them, so the Fixed ones are changed too. Applied, thanks. -- Dan
Re: [perl #30230] [PATCH] add Fixed(Integer|Float|Boolean|String|PMC)Array classes
Dan Sugalski wrote: At 11:57 AM +0200 6/14/04, Leopold Toetsch wrote: 1) Is there any good reason to start malloc(3)-based array classes now? This leads to code duplication for all the utility vtable entries (like C). list.c can deal with all types already. list.c's pretty inefficient for most array usage. It's good for mixed-type, sparse, or really big arrays, but for normal arrays it's overkill. A big wad of memory's just fine there. Well, yes. It depends on the usage of the PMC, which isn't known. What about shift/unshift? Are these allowed for fixed-size arrays? I'd vote for optimizing list.c for the "small usage pattern" and switching to a different strategy for big arrays. Anyway, the patch #30245 Resizable*Array implements these arrays on top of fixed size. We had that some time ago with Array/PerlArray. It was around 100 times slower for growing usage like: @ar[$_] = $x for (0..$N) for some big $N. And it of course duplicates existing classes like IntList, which just needs to get renamed. 2) What's the difference between *PMCArray and *BooleanArray? The PMC arrays hold PMCs. The Boolean arrays hold true/false values only. Then it should really store just one bit instead of a word. leo
Re: [perl #30230] [PATCH] add Fixed(Integer|Float|Boolean|String|PMC)Array classes
At 4:56 PM +0200 6/14/04, Leopold Toetsch wrote: Dan Sugalski wrote: At 11:57 AM +0200 6/14/04, Leopold Toetsch wrote: 1) Is there any good reason to start malloc(3)-based array classes now? This leads to code duplication for all the utility vtable entries (like C). list.c can deal with all types already. list.c's pretty inefficient for most array usage. It's good for mixed-type, sparse, or really big arrays, but for normal arrays it's overkill. A big wad of memory's just fine there. Well, yes. It depends on the usage of the PMC, which isn't known. What about shift/unshift? Are these allowed for fixed-size arrays? Given that they change the size of an array... no. I'd vote for optimizing list.c for the "small usage pattern" and switching to a different strategy for big arrays. I wouldn't. list.c is designed for a different set of usage than the common array. Making it handle both common "wad of memory filled with a single type" arrays and the much-less-common "sparse array of multiple types" arrays doesn't make much sense. Better to have the array have the needed smarts to upgrade itself to the more heavyweight array type if it really needs to. Anyway, the patch #30245 Resizable*Array implements these arrays on top of fixed size. So? I'm well aware that the implementation is suboptimal. Hell, the commit message and the messages on the list make that clear. That really, *really* doesn't matter for this. The point here is to get the types in, get their behaviour correct, and nail them down as guaranteed. How they do their thing is entirely irrelevant to that. 2) What's the difference between *PMCArray and *BooleanArray? The PMC arrays hold PMCs. The Boolean arrays hold true/false values only. Then it should really store just one bit instead of a word. So fix it if you want. This is first-cut code. There's plenty of time to optimize it later. -- Dan
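Leo's point that a Boolean array should store one bit per element rather than a word can be sketched as a fixed-size bit array. The sketch makes two assumptions: elements are packed CHAR_BIT to a byte, and out-of-range access is a caller bug that assert catches. The BitArray type and the bitarray_* functions are illustrative names, not the actual FixedBooleanArray code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical fixed-size boolean array packing one element per bit. */
typedef struct {
    size_t size;          /* number of boolean elements        */
    unsigned char *bits;  /* ceil(size / CHAR_BIT) bytes       */
} BitArray;

static bool bitarray_init(BitArray *a, size_t size) {
    a->size = size;
    /* calloc zeroes the storage, so every element starts out false */
    a->bits = calloc((size + CHAR_BIT - 1) / CHAR_BIT, 1);
    return a->bits != NULL || size == 0;
}

static void bitarray_set(BitArray *a, size_t i, bool value) {
    assert(i < a->size);                  /* fixed size: no growing */
    unsigned char mask = (unsigned char)(1u << (i % CHAR_BIT));
    if (value)
        a->bits[i / CHAR_BIT] |= mask;
    else
        a->bits[i / CHAR_BIT] &= (unsigned char)~mask;
}

static bool bitarray_get(const BitArray *a, size_t i) {
    assert(i < a->size);
    return (a->bits[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u;
}

static void bitarray_free(BitArray *a) {
    free(a->bits);
    a->bits = NULL;
    a->size = 0;
}
```

With this layout, 100 booleans occupy 13 bytes rather than 100 words. The cost is a shift and a mask on every access.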
Re: [perl #30252] [PATCH] work on languages/Makefile
At 5:04 AM -0700 6/13/04, Bernhard Schmalhofer (via RT) wrote: I have been looking into languages/Makefile and tried to update and beautify it. Cool. Applied, thanks. -- Dan
Re: [perl #30245] [PATCH] Resizable*Array pmcs
Dan Sugalski <[EMAIL PROTECTED]> wrote:
> At 12:42 AM -0700 6/13/04, Matt Fowles (via RT) wrote:
>>This patch adds Resizable*Array pmcs as the counterparts to Fixed*Array
>>pmcs. It does so by inheriting from them, so the Fixed ones are changed
>>too.
> Applied, thanks.

- duplicates existing PMC functionality:
    IntList       <-> ResizableIntegerArray
    FloatvalArray <-> ResizableFloatArray
    PerlArray     <-> ResizablePMCArray
    Array         <-> FixedPMCArray
- incomplete: push/pop/shift/unshift/splice/freeze/thaw
- it's broken (realloced mem isn't always cleared)
- it's 2 times slower for filling arrays with this loop:

      i = 0
    lp:
      ar[i] = 1
      inc i
      if i < n goto lp

- clone is different (shallow vs deep) - whatever is right

leo
More perl5.005 problems
For some reason I haven't been able to figure out, perl5.00503 can't seem to handle the TODO test in t/pmc/object-meths.t. Here's the result of

    perl5.005 t/harness t/pmc/object-meths.t

    t/pmc/object-meths..FAILED test 19
            Failed 1/21 tests, 95.24% okay
    Failed Test      Status Wstat Total Fail  Failed  List of failed
    ------------------------------------------------------------------
    t/pmc/object-me                  21    1   4.76%  19
    Failed 1/1 test scripts, 0.00% okay. 1/21 subtests failed, 95.24% okay.

The same command with perl5.6 or 5.8 reports all tests succeed. Does anybody know how to fix this?

Annoyingly, if I try to see what's going on with t/harness's documented -v switch, chaos ensues:

    perl5.005 t/harness -v t/pmc/object-meths.t
    t/pmc/object-meths..# Failed test (t/pmc/object-meths.t at line 62)
    #          got: 'debug = 0x0
    # Reading /home/doughera/src/parrot/parrot-andy/t/pmc/object-meths_5.pasm
    # using optimization '0' (0)
    # Starting parse...
    # 13 lines compiled.
    # Running...
    # main
    # in meth
    # back
    # '
    #     expected: 'main
    # in meth
    # back
    # '
    [ etc.]

So my other question is: Is t/harness -v actually supposed to work and do something useful?

(Incidentally, all of this presumes that I've installed a recent File::Spec into parrot's lib/ directory -- 5.005's File::Spec isn't up to the task.)

-- Andy Dougherty [EMAIL PROTECTED]
Re: [perl #30245] [PATCH] Resizable*Array pmcs
At 5:50 PM +0200 6/14/04, Leopold Toetsch wrote: Dan Sugalski <[EMAIL PROTECTED]> wrote: At 12:42 AM -0700 6/13/04, Matt Fowles (via RT) wrote: This patch adds Resizable*Array pmcs as the counterparts to Fixed*Array pmcs. It does so by inheriting from them, so the Fixed ones are changed too. Applied, thanks. - duplicates existing PMC functionality: IntList <-> ResizableIntegerArray FloatvalArray <-> ResizableFloatArray PerlArray <-> ResizablePMCArray Array <-> FixedPMCArray Yup. (Well, except for the PerlArray part) Part of this exercise is to standardize things and toss the things we no longer need. - incomplete: push/pop/shift/unshift/splice/freeze/thaw Then we need to start a todo list. - It's broken (realloced mem isn't always cleared) So fix it, or file a bug report for someone else to fix. = It's 2 times slower for filling arrays with this loop: i = 0 lp: ar[i] = 1 inc i if i < n goto lp Well, yeah, it's unoptimized. That can be fixed. - clone is different (shallow vs deep) - whatever is right Which needs standardization. -- Dan
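Two of Leo's complaints can be addressed together: realloc'd memory that isn't always cleared, and slow growth in the ar[i] = 1 fill loop. The usual fix is to grow the buffer geometrically and zero the newly exposed tail. Below is a minimal sketch, assuming a plain C array of longs. IntVec and the intvec_* functions are hypothetical names, not the Resizable*Array implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical resizable integer array; a sketch, not Parrot's code. */
typedef struct {
    long *data;
    size_t size;      /* elements in use       */
    size_t capacity;  /* elements allocated    */
} IntVec;

/* Grow to hold at least `needed` elements. Capacity doubles, so a loop
   storing ar[0], ar[1], ... ar[n-1] triggers only O(log n) reallocs. */
static bool intvec_reserve(IntVec *v, size_t needed) {
    if (needed <= v->capacity)
        return true;
    size_t cap = v->capacity ? v->capacity : 8;
    while (cap < needed)
        cap *= 2;
    long *p = realloc(v->data, cap * sizeof *p);
    if (p == NULL)
        return false;             /* old block is still valid */
    /* realloc leaves the added tail indeterminate, so clear it. */
    memset(p + v->capacity, 0, (cap - v->capacity) * sizeof *p);
    v->data = p;
    v->capacity = cap;
    return true;
}

/* Store at index i, growing the array if i is past the end. */
static bool intvec_set(IntVec *v, size_t i, long value) {
    if (!intvec_reserve(v, i + 1))
        return false;
    v->data[i] = value;
    if (i >= v->size)
        v->size = i + 1;
    return true;
}

static long intvec_get(const IntVec *v, size_t i) {
    assert(i < v->size);
    return v->data[i];
}
```

Doubling keeps the number of realloc calls logarithmic in the final size, so filling n slots one at a time costs amortized O(1) per store. Because of the explicit memset, skipped-over slots always read as zero.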
Slices and iterators
Since we're going to need these, and I'm in a documenting and defining mood (yes, I am making a final decision on strings today. Whee!) I figure we need to tackle them.

First, slices. Perl's got 'em, Python has them, Ruby, interestingly, doesn't. (Sort of) A slice is a subset of elements in an aggregate. They don't have to be contiguous, unique, or in any order. As an example:

    @foo = ('A', 'B', 'C', 'D');
    @bar = @foo[0,2]; # A slice--elements 0 and 2

@bar is now ('A', 'C'). Or:

    @bar = @foo[0,0,0,0,0,0];

@bar is now ('A', 'A', 'A', 'A', 'A', 'A').

I think for this to work we need to add a slice vtable entry. Not because I'm particularly fond of vtable entries as such, but it's a pretty fundamental operation. (Python devotes opcodes to it even)

The slice vtable entry should take as its parameter a slice pmc. This should be an array of typed from/to values, so we can do something like:

    @foo[0..2,4..8,12..];

with three entries in the slice array--one with a from/to of 0/2, one with 4/8, and one with 12/inf. Typed since these will be used with hashes, and we'll need to differentiate between something that should be taken as a string and something taken as an integer. (If the range ends are PMCs, since they may behave differently depending on which way they're read)

This vtable entry should return an iterator, which is why they're here--not because I've any particular love of the things, but because if someone does:

    @foo = @bar[0..];

on an array that generates data randomly we'll get caught in an infinite loop, which is generally a bad thing.

Since we're working on iterators, all aggregates should be able to generate them, which leads to the iterator vtable entry (since everyone wants to iterate over everything).

So, the proposal:

*) We add a slice vtable entry which takes a slice pmc and returns an iterator
*) We add an iterator vtable entry which returns an iterator for the PMC
*) We consider ways to make slices. I can see ops, or I can see basic functions.
Either is fine, depends on how often the things are used. (Ops have less overhead, functions mean fewer ops) Please, let discussion ensue. We'll decide on the slice creation method in a day or two and then just make it all happen. -- Dan
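The slice layout Dan proposes is a list of from/to ranges where one end may be open (12/inf). It pairs naturally with returning an iterator, because an open range can't be materialized into an array. The sketch below makes simplifying assumptions: keys are integers only (the typed string-vs-integer distinction is omitted), and SliceRange, SliceIter and the slice_iter_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical integer-only slice: a list of from/to ranges, where an
   open-ended range (e.g. 12..) has no upper bound. */
typedef struct {
    long from;
    long to;          /* ignored when open_ended is true */
    bool open_ended;
} SliceRange;

typedef struct {
    const SliceRange *ranges;
    size_t nranges;
    size_t range;     /* which range we are in        */
    long next;        /* next index to hand out       */
} SliceIter;

static void slice_iter_init(SliceIter *it, const SliceRange *r, size_t n) {
    assert(r != NULL || n == 0);
    it->ranges = r;
    it->nranges = n;
    it->range = 0;
    it->next = n ? r[0].from : 0;
}

/* Produce the next index lazily. Returns false once the slice is
   exhausted; an open-ended final range never exhausts, which is why the
   slice hands back an iterator rather than a fully built array. */
static bool slice_iter_next(SliceIter *it, long *out) {
    while (it->range < it->nranges) {
        const SliceRange *r = &it->ranges[it->range];
        if (r->open_ended || it->next <= r->to) {
            *out = it->next++;
            return true;
        }
        /* current range done: move to the start of the next one */
        if (++it->range < it->nranges)
            it->next = it->ranges[it->range].from;
    }
    return false;
}
```

Repeated single-element ranges such as {0, 0, false} cover the @foo[0,0,0,0,0,0] case. The iterator preserves duplicates and order because it simply walks the ranges as given.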
The behaviour of iterators
Once we decide how to *get* these things (see the previous e-mail) we need to decide how they should work. We can fiddle around, but honestly the scheme: 1) They act as arrays--if you want the 18th element in the iterator, access it directly 2) They have 'next', 'previous', 'first', 'last', and 'reset' methods to get the next, previous, first, or last element in the iterator, or to reset the iterator to the beginning. Next, last, and reset change the internal current element pointer, first and last don't. Sane? The only downside I can see is one of speed, since method calls are a bit costly. -- Dan
Re: The behaviour of iterators
Dan~ Just a few questions. Dan Sugalski wrote: 2) They have 'next', 'previous', 'first', 'last', and 'reset' methods to get the next, previous, first, or last element in the iterator, or to reset the iterator to the beginning. Next, last, and reset change the internal current element pointer, first and last don't. Do you mean next, previous, reset? What about those data structures that can only be iterated in one direction easily (such as a singly linked list)? Should they implement previous in the slow and painful way and hope no one calls it? Should they throw an exception? Might it be worthwhile to have two different types of iterators (those that only go one direction and those that go both)? Matt
Re: The behaviour of iterators
At 1:15 PM -0500 6/14/04, Matt Fowles wrote: Dan~ Just a few questions. Dan Sugalski wrote: 2) They have 'next', 'previous', 'first', 'last', and 'reset' methods to get the next, previous, first, or last element in the iterator, or to reset the iterator to the beginning. Next, last, and reset change the internal current element pointer, first and last don't. Do you mean next, previous, reset? D'oh! Yes. What about those data structures that can only be iterated in one direction easily (such as a singly linked list)? Should they implement previous in the slow and painful way and hope no one calls it? Should they throw an exception? Might it be worthwhile to have two different types of iterators (those that only go one direction and those that go both)? Exceptions for unimplemented behaviour are just fine. I should've specified. (I can see defining a basic and extended iterator protocol for this) -- Dan
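The basic/extended split Dan mentions can be expressed as an operation table. A forward-only iterator leaves the extended entries empty, and the generic entry points report "unsupported" in place of an exception. This is a sketch only. The status codes, the IterOps table and the iter_* functions are hypothetical, and the storage is a plain array for brevity. Following the correction in this exchange, next/previous/reset move the cursor while first/last don't.

```c
#include <assert.h>
#include <stddef.h>

/* ITER_UNSUPPORTED stands in for the exception Dan suggests throwing
   when an iterator cannot perform an operation. */
typedef enum { ITER_OK, ITER_END, ITER_UNSUPPORTED } IterStatus;

typedef struct Iter Iter;

/* Hypothetical iterator "vtable". A basic, forward-only iterator leaves
   the extended entries NULL. */
typedef struct {
    IterStatus (*next)(Iter *it, int *out);     /* basic: moves cursor    */
    IterStatus (*reset)(Iter *it);              /* basic: moves cursor    */
    IterStatus (*previous)(Iter *it, int *out); /* extended: moves cursor */
    IterStatus (*first)(Iter *it, int *out);    /* extended: no move      */
    IterStatus (*last)(Iter *it, int *out);     /* extended: no move      */
} IterOps;

struct Iter {
    const IterOps *ops;
    const int *items;  /* array-backed storage for this sketch     */
    size_t len;
    size_t pos;        /* index of the element next() returns next */
};

static IterStatus arr_next(Iter *it, int *out) {
    if (it->pos >= it->len)
        return ITER_END;
    *out = it->items[it->pos++];
    return ITER_OK;
}

static IterStatus arr_reset(Iter *it) {
    it->pos = 0;
    return ITER_OK;
}

/* Step back one element and return it, so a following next() yields
   the same element again. */
static IterStatus arr_previous(Iter *it, int *out) {
    if (it->pos == 0)
        return ITER_END;
    *out = it->items[--it->pos];
    return ITER_OK;
}

static IterStatus arr_first(Iter *it, int *out) {
    if (it->len == 0)
        return ITER_END;
    *out = it->items[0];
    return ITER_OK;
}

static IterStatus arr_last(Iter *it, int *out) {
    if (it->len == 0)
        return ITER_END;
    *out = it->items[it->len - 1];
    return ITER_OK;
}

/* Extended protocol: everything is available. */
static const IterOps extended_ops = {arr_next, arr_reset, arr_previous, arr_first, arr_last};
/* Basic protocol: forward only, as a singly linked list would offer. */
static const IterOps basic_ops = {arr_next, arr_reset, NULL, NULL, NULL};

/* Generic entry points; a missing entry reports ITER_UNSUPPORTED. */
static IterStatus iter_next(Iter *it, int *out) {
    assert(it != NULL && it->ops != NULL);
    return it->ops->next(it, out);
}

static IterStatus iter_previous(Iter *it, int *out) {
    return it->ops->previous ? it->ops->previous(it, out) : ITER_UNSUPPORTED;
}

static IterStatus iter_first(Iter *it, int *out) {
    return it->ops->first ? it->ops->first(it, out) : ITER_UNSUPPORTED;
}

static IterStatus iter_last(Iter *it, int *out) {
    return it->ops->last ? it->ops->last(it, out) : ITER_UNSUPPORTED;
}
```

With this shape, a singly linked list only has to supply next and reset. Callers get a clear "unsupported" status instead of a silently slow previous.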
Re: The behaviour of iterators
Dan Sugalski writes: > Once we decide how to *get* these things (see the previous e-mail) we > need to decide how they should work. We can fiddle around, but > honestly the scheme: > > 1) They act as arrays--if you want the 18th element in the iterator, > access it directly > 2) They have 'next', 'previous', 'first', 'last', and 'reset' methods > to get the next, previous, first, or last element in the iterator, or > to reset the iterator to the beginning. Next, last, and reset change > the internal current element pointer, first and last don't. Why not take a page from C++ and call "previous" and "next" C and C, and then C to get what it points to. The ops are already there. Not sure about "reset" though. Luke
Re: Slices and iterators
Dan Sugalski writes:
> The slice vtable entry should take as its parameter a slice pmc. This
> should be an array of typed from/to values, so we can do something
> like:
>
>     @foo[0..2,4..8,12..];
>
> with three entries in the slice array--one with a from/to of 0/2, one
> with 4/8, and one with 12/inf.

Perl also has:

    @foo[0..12 :by(3)]    # 0,3,6,9,12

PDL has affine slices.

To me, it seems like the best thing to do is to give slice an iterator, and slice would return an iterator that maps keys to values. So, doing C<@bar = @foo[0..2,4..8,12...]> would look something like:

    Construct iterator for 0..2, 4..8, 12...
    Call @foo->VTABLE_slice(iterator)
    Initialize @bar from returned iterator

Iterators have the advantage over arrays since they can be infinite. With arrays, how do you represent:

    @foo[12... :by(3)]

Do we still have multidimensional keys?

Luke
Re: The behaviour of iterators
At 1:08 PM -0600 6/14/04, Luke Palmer wrote: Dan Sugalski writes: Once we decide how to *get* these things (see the previous e-mail) we need to decide how they should work. We can fiddle around, but honestly the scheme: 1) They act as arrays--if you want the 18th element in the iterator, access it directly 2) They have 'next', 'previous', 'first', 'last', and 'reset' methods to get the next, previous, first, or last element in the iterator, or to reset the iterator to the beginning. Next, last, and reset change the internal current element pointer, first and last don't. Why not take a page from C++ and call "previous" and "next" C and C, and then C to get what it points to. Because ++ and -- affect the value not the container. (There are days when I think "C++ does it like..." is the near-perfect argument against doing it one particular way... :) Next and previous are actions on the container. -- Dan
Re: Slices and iterators
At 1:21 PM -0600 6/14/04, Luke Palmer wrote: Dan Sugalski writes: The slice vtable entry should take as its parameter a slice pmc. This should be an array of typed from/to values, so we can do something like: @foo[0..2,4..8,12..]; with three entries in the slice array--one with a from/to of 0/2, one with 4/8, and one with 12/inf. Perl also has: @foo[0..12 :by(3)]# 0,3,6,9,12 PDL has affine slices. Yeah, but at some point you have to draw the line and say "This is as far as we're going at the low level." Iterators have the advantage over arrays since they can be infinite. With arrays, how do you represent: @foo[12... :by(3)] And that is probably well past it. :) Do we still have multidimensional keys? Yes, we do. -- Dan
Re: Big nums
At 8:15 PM +0100 6/14/04, Alex Gough wrote: [Sat, Jun 12, 2004 at 11:39:27AM +0200: [EMAIL PROTECTED] | Time for these as well. There's a partial implementation of them in | types/bignum.c. I think it's time to move that to src/ (and the | header file to .h) and get it integrated into parrot. I'm not really sure if types/bignum.c is what we want. There are AFAIK some other math packages around, which are maintained and more complete. GMP comes to my mind. That's not such a bad plan. There's still a lot to do before the bignum stuff is entirely ready (in terms of functions for the standard) (and I'm still too busy right now to get deeply into anything). The only thing that worries me about GMP is the license. It's LGPL, so we might be able to, but it's tough to tell for sure, and the explanatory text doesn't help at all. The only bignum stuff I want in the core is the basics--extended-precision numbers and basic math. (If we get transcendentals as a bonus, well... swell) I think I'd as soon just flesh out what we have now and be done with it. -- Dan
Re: Big nums
At 3:40 PM -0400 6/14/04, Dan Sugalski wrote: At 8:15 PM +0100 6/14/04, Alex Gough wrote: [Sat, Jun 12, 2004 at 11:39:27AM +0200: [EMAIL PROTECTED] | Time for these as well. There's a partial implementation of them in | types/bignum.c. I think it's time to move that to src/ (and the | header file to .h) and get it integrated into parrot. I'm not really sure if types/bignum.c is what we want. There are AFAIK some other math packages around, which are maintained and more complete. GMP comes to my mind. That's not such a bad plan. There's still a lot to do before the bignum stuff is entirely ready (in terms of functions for the standard) (and I'm still too busy right now to get deeply into anything). The only thing that worries me about GMP is the license. It's LGPL, so we might be able to, but it's tough to tell for sure, and the explanatory text doesn't help at all. But on second reading, the license makes this untenable. If we did use the GMP library and shipped it with parrot then we'd be obligated to package the full GMP source code with every binary distribution, so... no joy there. -- Dan
Characters/graphemes/freds
Did we ever come to some consensus of what a "character" (that is a sequence of code points which makes up a single atomic thing in a language) should be called? I seem to remember grapheme being not-quite-correct, but I can't dig up the better answer. (And yes, the string doc is being finished. This is all that's left to it) -- Dan
Strings. Finally.
The official, 1.0, final version, modulo a more correct name for 'grapheme', or spelling/grammar errors.

Do please note that whatever objection you may have to this has at least three people who disagree differently, and one or more (who aren't me) who agree with what you disagree with. Also note that I'm not entirely happy with this either. Consider it an exercise in group coping--we will all deal with it and make do. All complaints, *including* mine, shall be summarily binned, with extreme prejudice. And yes, this means I won't be complaining about Unicode any more.

++Cut Here

Strings, the final design document

Requirements
============

* Efficiency - The system must do the absolute minimum amount of work to get the job done

* Correctness - The job that's done must actually be right

* Upgradeability - This stuff's all going to change again in five years so we really don't want to have to do it over again.

* Flexibility - Since, unfortunately, no one way of looking at strings is going to be right for everyone

Realities
=========

* There are a lot of different ways of representing text. Many of them annoying, some of them wildly incompatible, none of them wrong.

* We don't get to make the call what is right or wrong

* Some of the languages we support don't do Unicode, or do Unicode and other things (including perl 5 and Ruby)

Desires
=======

* We want to make it easily possible to do the right thing with string data

* We want all the troublesome stuff to be as invisible as possible

* We want to make it look like everyone's got what they want without actually doing it when we don't have to

With that list in mind, here's parrot's solution. Please note that the *only* thing up for discussion is a more correct label for 'grapheme'. It is, otherwise, the final external design.
Definitions
===========

BYTE - 8 bits 'o data

CODE POINT - A 32-bit integer that represents a single thing in a character set

ENCODING - How code points are mapped to bytes, and vice versa

CHARACTER SET - Contains meta-information about code points. This includes both the meaning of individual code points (65 is capital A, 776 is a combining diaeresis) as well as a set of categorizations of code points (alpha, numeric, whitespace, punctuation, and so on), and a sorting order.

GRAPHEME - One or more code points which make up a single real entity. The "oe" (I'm stuck with ASCII here, that should really be an o with two dots over it) in Leo's last name is, in the Unicode character set, a single character with two code points, 111 (lowercase o) and 776 (combining diaeresis). Graphemes can *not* be legitimately decomposed into individual code points in most cases.

Important note
==============

This document is completely language-insensitive--that is, there's no language attached to any particular piece of data. Collation and casing rules are done based on a single global setting that is unconditionally applied in all cases. Setting and querying those rules is beyond the scope of this document.

Conceptually
============

The smallest unit of text that Parrot will process is the string, something that can be put in an S register. These strings have the following properties:

*) They have an encoding
*) They have a character set
*) They have a taint status

The above things are independent of the view of the string presented to bytecode programs--these are metadata elements that describe the contents of the string as they actually exist, rather than as they are presented.

Internally parrot is capable of maintaining strings in several different basic encodings (8-bit, 16-bit, and 32-bit integer, as well as UTF-8) and may load other encodings on the fly as needed.
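The CODE POINT / ENCODING / GRAPHEME distinction can be made concrete with Dan's own example: o (111) followed by a combining diaeresis (776). The sketch below encodes code points as UTF-8. It counts graphemes with a deliberately crude rule that treats U+0300 to U+036F (the Combining Diacritical Marks block) as attaching to the preceding code point. Real grapheme segmentation is defined by Unicode Standard Annex #29 and covers far more cases. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ENCODING: map one code point to UTF-8 bytes. Returns the byte count
   (1-4), or 0 for surrogates and values beyond U+10FFFF. */
static size_t utf8_encode(uint32_t cp, unsigned char out[4]) {
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                       /* surrogates are not encodable */
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}

/* GRAPHEME (simplified): a code point in U+0300..U+036F attaches to the
   preceding code point instead of starting a new grapheme. */
static size_t count_graphemes(const uint32_t *cps, size_t n) {
    assert(cps != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int combining = cps[i] >= 0x300 && cps[i] <= 0x36F;
        if (!combining || i == 0)
            count++;
    }
    return count;
}
```

In the example, two code points become three UTF-8 bytes (6F CC 88) but count as one grapheme. This is why the three definitions need separate names.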
Parrot is also capable of maintaining strings in many different character sets (ASCII, EBCDIC, Unicode, Latin-n, etc.) which are also dynamically loadable. Finally Parrot is capable of maintaining strings in many different languages, which also may be loaded on the fly.

This is done for maximum efficiency, regardless of the view of the data presented to the bytecode programs. Conversion to a different format may be done if needed to properly express the semantics of the program, but will not be done if not needed.

For example, consider the following:

    use Unicode;
    open FOO, "foo.txt", :charset(latin-3);
    open BAR, "bar.txt", :charset(big5);
    $filehandle = 0;
    while (<>) {
        if ($filehandle++) {
            print FOO $_;
        } else {
            print BAR $_;
        }
        $filehandle %= 2;
    }

Relatively simple: the program reads from the input filehandle and splits the data, line by line, between two output files. The two output files have different requirements -- FOO gets data in Latin-3, while BAR gets it in Big5. The "use Unicode;" thing at the top's a hand-wavey way of a
Re: Big nums
[Sat, Jun 12, 2004 at 11:39:27AM +0200: [EMAIL PROTECTED] > > | Time for these as well. There's a partial implementation of them in > | types/bignum.c. I think it's time to move that to src/ (and the > | header file to .h) and get it integrated into parrot. > > I'm not really sure if types/bignum.c is what we want. There are AFAIK > some other math packages around, which are maintained and more complete. > GMP comes to my mind. That's not such a bad plan. There's still a lot to do before the bignum stuff is entirely ready (in terms of functions for the standard) (and I'm still too busy right now to get deeply into anything). At the same time, I'd caution against putting too much functionality into the core; the current Perl bignum stuff is probably too broad, which makes it tricky to look after. I'd also argue strongly in favour of a decimal bignum implementation, because that gets you two birds with one stone (well, a few: bignums, limited/defined precision bignums, trustworthy rounding, decent numerical exception support, world peace), and a massive test suite from IBM to let you know everything's working. Alex -- It's supposed to be automatic but actually you have to press this button
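Alex's argument for a decimal bignum can be illustrated with a small sketch. A non-negative decimal value is held as a digit-string coefficient plus an exponent (value = coefficient * 10^exponent), which is the coefficient/exponent model used by IBM's General Decimal Arithmetic specification. Only addition of non-negative values is shown, the function names are hypothetical, and none of this is the code in types/bignum.c. The point is that 0.1 + 0.2 comes out as exactly 0.3, a value binary doubles cannot represent exactly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Add two non-negative decimal integers given as digit strings.
   Returns a malloc'd string (caller frees), or NULL on allocation failure. */
static char *dec_add(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    size_t la = strlen(a), lb = strlen(b);
    size_t n = (la > lb ? la : lb) + 1;        /* room for a final carry */
    char *r = malloc(n + 1);
    if (r == NULL)
        return NULL;
    r[n] = '\0';
    int carry = 0;
    for (size_t i = 0; i < n; i++) {            /* i counts digits from the right */
        int da = i < la ? a[la - 1 - i] - '0' : 0;
        int db = i < lb ? b[lb - 1 - i] - '0' : 0;
        int s = da + db + carry;
        r[n - 1 - i] = (char)('0' + s % 10);
        carry = s / 10;
    }
    if (r[0] == '0' && n > 1)                   /* drop an unused carry digit */
        memmove(r, r + 1, n);                   /* moves the terminator too */
    return r;
}

/* Add (ca x 10^ea) + (cb x 10^eb) exactly: rescale both coefficients to
   the smaller exponent by appending zeros, then add the digit strings. */
static char *dec_add_scaled(const char *ca, int ea, const char *cb, int eb,
                            int *exp_out) {
    int e = ea < eb ? ea : eb;
    size_t pa = (size_t)(ea - e), pb = (size_t)(eb - e);
    size_t la = strlen(ca), lb = strlen(cb);
    char *a = malloc(la + pa + 1), *b = malloc(lb + pb + 1);
    char *r = NULL;
    if (a != NULL && b != NULL) {
        memcpy(a, ca, la);
        memset(a + la, '0', pa);
        a[la + pa] = '\0';
        memcpy(b, cb, lb);
        memset(b + lb, '0', pb);
        b[lb + pb] = '\0';
        r = dec_add(a, b);
        *exp_out = e;
    }
    free(a);
    free(b);
    return r;
}
```

For example, 0.1 + 0.2 is ("1", -1) + ("2", -1), which gives ("3", -1). The digit strings can be arbitrarily long, so the same code also covers the plain bignum case.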
Re: More perl5.005 problems
On Mon, Jun 14, 2004 at 12:00:42PM -0400, Andy Dougherty wrote:
> For some reason I haven't been able to figure out, perl5.00503 can't seem
> to handle the TODO test in t/pmc/object-meths.t. Here's the result of

5.5.3's Test::Harness doesn't know how to handle that style of TODO. You'll have to make a dependency on T::H 2.x if you want to use TODO.

-- 
Michael G Schwern    [EMAIL PROTECTED]    http://www.pobox.com/~schwern/
Funny thing about weekends when you're unemployed--they don't mean quite so much. 'Cept you get to hang out with your workin' friends.
	- Primus "Spaghetti Western"
Re: More perl5.005 problems
At 4:39 PM -0400 6/14/04, Michael G Schwern wrote:
>On Mon, Jun 14, 2004 at 12:00:42PM -0400, Andy Dougherty wrote:
>>For some reason I haven't been able to figure out, perl5.00503 can't seem
>>to handle the TODO test in t/pmc/object-meths.t. Here's the result of
>
>5.5.3's Test::Harness doesn't know how to handle that style of TODO.
>You'll have to make a dependency on T::H 2.x if you want to use TODO.

Is there another style of TODO that could be used here that would be compatible with 5.005_03?

-- 
Dan
--it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: More perl5.005 problems
On Mon, 2004-06-14 at 14:26, Dan Sugalski wrote:
> >5.5.3's Test::Harness doesn't know how to handle that style of TODO.
> >You'll have to make a dependency on T::H 2.x if you want to use TODO.
> Is there another style of TODO that could be used here that would be
> compatible with 5.005_03?

No such beast currently exists. In fact, I'm surprised he managed to install an acceptably recent version of Test::Simple on 5.5.3 without upgrading Test::Harness; the bundle's required Test::Harness 2.03 for a couple of years now.

-- c
Parrot core dumps on FC1?
Has anyone run into immediate core dumps on Fedora Core 1? When I run 'make' the interpreter successfully compiles, but dumps core when it tries to compile parrotlib.imc.

: blib/lib/libparrot.a
c++ -o parrot -Wl,-E -g imcc/main.o blib/lib/libparrot.a blib/lib/libicuuc.a blib/lib/libicudata.a -lnsl -ldl -lm -lcrypt -lutil -lpthread -lrt
./parrot -o runtime/parrot/include/parrotlib.pbc runtime/parrot/library/parrotlib.imc
make: *** [runtime/parrot/include/parrotlib.pbc] Segmentation fault

Under gdb:

[EMAIL PROTECTED] parrot]$ gdb parrot
GNU gdb Red Hat Linux (5.3.90-0.20030710.41rh)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db library "/lib/tls/libthread_db.so.1".
(gdb) r
Starting program: /home/michel/parrot/parrot
[Thread debugging using libthread_db enabled]
[New Thread -1084659872 (LWP 3839)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread -1084659872 (LWP 3839)]
doOpenChoice (path=0x827684c "icudt26l", type=0x8274d47 "icu", name=0x8274d40 "uprops", isAcceptable=0x8203104 , context=0x0, pErrorCode=0xbff5186c) at udata.c:825
825         if(pHeader->dataHeader.magic1==0xda &&
(gdb)

Is this a known FC1 issue?

-Michel
Some rationale for the mixed encoding scheme
Since I know this is going to come up, I figure I should pre-empt it and be done with it. (Though I should've put this in the string document. Ah, well. Hopefully timing isn't everything, or I am *so* in trouble...)

Why aren't we converting to Unicode on the edge? Since, after all, any Sane Language will do all its string handling in Unicode, right? Why leave things the way they are until late?

Simple. Efficiency.

It's no less efficient to defer conversion of string data to Unicode (or, heck, from a harder-to-use encoding like UTF-8 to an easier-to-use one like UTF-32) until it's demanded than it is to do it at the edge. But... we get the bonus of *not* spending the time to do the encoding shifts and charset shifts if we don't need to. That will happen for folks who, for example, never *do* anything that'd mandate the shift. And if they do, well, we do the shift once, then switch over the string vtable pointers to the new encoding and never have to do so again.

And while that may not be an overwhelming win, nor convincing to everyone, it also means that folks who want to stick with a single, non-Unicode setup (US-ASCII or Latin-1 folks who don't want to shift) can do so without incurring a penalty in time, space, or e-mail complaining. (And, bluntly, at this point I consider features that let people not grumble a big win.)

Is it a bit more work for us? Well, a little, but no more so than using vtables for PMCs to do stuff, and that's all worked out quite nicely, honestly.

I do realize that the Big ICU Patch tossed a lot of the infrastructure for this, which broke parrot for folks who can't/won't do ICU. (And there are a number of folks shut out of development because they can't get ICU going.) That'll be put back over the next week or so, and ICU factored out to an optional build feature.

-- 
Dan
--it's like this--- Dan Sugalski even samurai [EMAIL PROTECTED] have teddy bears and even teddy bears get drunk
Re: Slices and iterators
Dan Sugalski writes:
> At 1:21 PM -0600 6/14/04, Luke Palmer wrote:
> >Dan Sugalski writes:
> >> The slice vtable entry should take as its parameter a slice pmc. This
> >> should be an array of typed from/to values, so we can do something
> >> like:
> >>
> >> @foo[0..2,4..8,12..];
> >>
> >> with three entries in the slice array--one with a from/to of 0/2, one
> >> with 4/8, and one with 12/inf.
> >
> >Perl also has:
> >
> >@foo[0..12 :by(3)]   # 0,3,6,9,12
> >
> >PDL has affine slices.
>
> Yeah, but at some point you have to draw the line and say "This is as
> far as we're going at the low level."
>
> >Iterators have the advantage over arrays since they can be infinite.
> >With arrays, how do you represent:
> >
> >@foo[12... :by(3)]
>
> And that is probably well past it. :)
>
> >Do we still have multidimensional keys?
>
> Yes, we do.

Then these are both clear arguments for giving an iterator to slice rather than an array. We have no way to represent @foo[12... :by(3)], so how do we represent it? We have multidimensional keys but no way to slice by them. The former is solved by constructing the lazy iterator for C<12... :by(3)> and giving it to @foo's slice. The latter is solved by creating an iterator that yields multidimensional keys.

What advantage does the simple array case give, other than the one fewer opcode from extracting the iterator?

Luke
Re: The behaviour of iterators
Dan Sugalski writes:
> At 1:08 PM -0600 6/14/04, Luke Palmer wrote:
> >Dan Sugalski writes:
> >> Once we decide how to *get* these things (see the previous e-mail) we
> >> need to decide how they should work. We can fiddle around, but
> >> honestly the scheme:
> >>
> >> 1) They act as arrays--if you want the 18th element in the iterator,
> >> access it directly
> >> 2) They have 'next', 'previous', 'first', 'last', and 'reset' methods
> >> to get the next, previous, first, or last element in the iterator, or
> >> to reset the iterator to the beginning. Next, last, and reset change
> >> the internal current element pointer, first and last don't.
> >
> >Why not take a page from C++ and call "previous" and "next" C and
> >C, and then C to get what it points to.
>
> Because ++ and -- affect the value not the container. (There are days
> when I think "C++ does it like..." is the near-perfect argument
> against doing it one particular way... :)

Heh, yeah.

> Next and previous are actions on the container.

Then how, if we have an array of iterators, do we increment an internal iterator from an external one? That is:

    @foo = (1..5);
    @bar = map { @foo.iter($_) } reverse 0..4;   # make an array of iterators
    for @bar ¥ 0... -> $x, $c {
        if something($x) {
            @bar[$c].next;
        }
    }

Without that awful reference of $c (awful not in a stylistic way, but in an I-have-to-keep-the-index-around-too-wtf kind of way).

I'm arguing for an iterator to be an I pointer, one that you have to dereference, for this reason.

Luke
Re: Strings. Finally.
Sorry to reply to this, but I feel that this is a request for clarifications, not for a change. :^)

Dan Sugalski wrote:
> Synthesized code points
> ===
> ...
> becomes two integers, 0x0041 and 0x82A9. (Though it could represent
> them as 16-bit integers, since no character takes three or more bytes)

It strikes me that this scheme is not always null-safe (e.g. the character 00 11 would be indistinguishable from a bare 11). Are there any encodings this could cause a problem with?

> getbyte Ix, Sy, Iz
> (u)getcodepoint Ix, Sy, Iz
> (u)getgrapheme Sx, Sy, Iz
>
> Get the byte, codepoint, or grapheme requested. Destination is either an integer (representing the byte or codepoint) or a string. Sy is the source string, Iz is the offset in bytes, code points, or graphemes from the beginning of the string.

Since we're going to be shifting around the encoding essentially at will, does 'getbyte' make sense on non-binary strings? (And when we have a binary string, is there any difference between 'getbyte', 'getcodepoint', and 'getgrapheme' at all?) If so, will 16- and 32-bit encodings have to implement this with a forward scan from the start of the string (the way getcodepoint would have to be implemented with a variable-width encoding), or do you have another trick up your sleeve?

> setbyte Sx, Iy, Iz
> (u)setcodepoint Sx, Iy, Iz
> (u)setgrapheme Sx, Sy, Iz

Likewise.

-- 
Brent "Dax" Royal-Gordon <[EMAIL PROTECTED]>
Perl and Parrot hacker

Oceania has always been at war with Eastasia.
Re: Event design sketch
On Tue, 11 May 2004, Uri Guttman wrote:
> >> Why would alarm need any special opcode when it is just a timer
> >> with a delay of [abs_time minus NOW]?
> >> Let the coder handle that and lose the extra opcodes.
>
> mab> you want to make the latency between getting the abs_time, doing
> mab> the substract[ion] and actually setting up the time as small as
> mab> possible
>
> Accuracy of delivery (latency) is silly to worry about in Perl for
> granularities of more than about .05 seconds or so. Building a very fine
> grained accurate real-time system in Perl makes little sense to me.
> so i usually don't worry about who does the delta calculation and the
> slight amount of delay it takes.

Never mind the granularity or latency: there are systems where "time of day" can be adjusted to take into account clock drift, while "system elapsed time" is left unaffected. Which you want depends on whether you want to sleep for a specific time, or wake up at a specific time, and it would be nice if Parrot didn't rule out making use of that.

-Martin