Re: String API
In message <[EMAIL PROTECTED]> Peter Gibbs <[EMAIL PROTECTED]> wrote: > I do not believe that the two existing parameters are orthogonal, > so the number of charset (or whatever) entities would be less than > the cross product. e.g. the existing 2 chartypes x 4 encodings > would really only require 4 charsets. The problem is that there are hundreds of character sets that use a single-byte encoding so you're going to wind up duplicating the encoding-related actions for all those character sets. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: This week's summary
In message <[EMAIL PROTECTED]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 8:11 AM +0100 7/3/03, Alan Burlison wrote: > >Dan Sugalski wrote: > > > >>I'm pretty sure the POSIX docs say that you can't call mutex > >>routines from within interrupt code, which makes sense--the last > >>thing you want is for an interrupt handler to block on a mutex > >>acquisition. > > > >I haven't got a copy, but I'd be surprised if they explicitly > >forbade it - I think however you *do* need to be very careful if you > >need to mix mutexes and signals, so that you don't self-deadlock. > > I don't have a copy handy anymore either, unfortunately, but > Butenhof's pretty clear--none of the Posix thread functions are async > safe. (Section 6.6.6, p234-235 in my copy) I seem to remember someone > scolding me about this, or something like it, ages ago. The POSIX docs (or rather their successor, the SuS docs) can be found online - the current version is at: http://www.opengroup.org/onlinepubs/007904975/toc.htm Specific documentation for the pthread routines is at: http://www.opengroup.org/onlinepubs/007904975/basedefs/pthread.h.html Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
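A minimal sketch of one common way to live with the restriction discussed in this message: the signal handler does nothing except set a flag of type volatile sig_atomic_t, and ordinary (non-handler) code later notices the flag and is then free to take a mutex. All of the demo_ names are hypothetical and are not Parrot functions; only signal() and raise() from standard C are assumed.

```c
#include <signal.h>

/* Flag written by the handler and read by normal code.  volatile
 * sig_atomic_t is the type the C standard permits a handler to write. */
static volatile sig_atomic_t demo_event_pending = 0;

/* Handler: no locking, no allocation, just record that something happened. */
static void demo_on_signal(int signo)
{
    (void)signo;
    demo_event_pending = 1;
}

/* Install the handler for one signal; returns 0 on success, -1 on error. */
int demo_install_handler(int signo)
{
    return signal(signo, demo_on_signal) == SIG_ERR ? -1 : 0;
}

/* Called from ordinary code (for example between ops in a run loop).
 * Returns 1 and clears the flag if a signal arrived, otherwise 0.
 * Any mutex work would happen here, outside the handler. */
int demo_take_pending_event(void)
{
    if (demo_event_pending) {
        demo_event_pending = 0;
        return 1;
    }
    return 0;
}
```

The design choice is simply to move all blocking work out of the handler, so the question of whether a mutex call is safe there never arises.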
Re: core.ops / vtable: keyed_int
In message <[EMAIL PROTECTED]> Leopold Toetsch <[EMAIL PROTECTED]> wrote: > Is there any compelling reason, why the {get,set}__keyed_int > vtable methods are defined to take a KEY* value instead of a plain > INTVAL value? They aren't are they? As far as I can see they are INTVAL* values... > This causes an (IMHO) unneeded test for a NULL key, a stack variable > for the key and prohibits JIT code to pass a processor register > directly, when the integer register is MAPed. If you're asking why they are INTVAL* rather than INTVAL then I had thought the answer was to allow support of multi-level keys, but that doesn't actually seem to be the case. I can only assume that the original reason was to support multi-level keys though. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: Parrot 0.0.9
In message <[EMAIL PROTECTED]> Steve Fink <[EMAIL PROTECTED]> wrote: > * Keyed access > - Another discussion that's gone over my head. Leo has a scheme to > dramatically reduce the number of instructions, at the cost of > requiring a couple of opcodes for keyed accesses; Dan says that > lots of instructions are no big deal and pushing forward with the > status quo is better. > - Either way, the current keyed support isn't complete. I've got a more or less complete patch for dynamic key construction lying around here somewhere... I'll try and dig it out and send it in a while... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [perl #17876] [PTACH] Parrot_snprintf writes 1 char too much
In message <[EMAIL PROTECTED]> Leopold Toetsch (via RT) <[EMAIL PROTECTED]> wrote: > I didn't look, if this is really intended, but I wouldn't like to behave > Parrot_snprintf different then snprintf(3). > > It would also be nice, if we could have a return value, consistent with > glibc 2.1. One slight problem with making Parrot_snprintf consistent with snprintf is that there are at least three different ways that snprintf is implemented on different platforms. It's probably best to do whatever C99 does, which I think is the same as what glibc does, namely to return the amount of space that would be needed to avoid truncation if the result is truncated. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
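The C99 behaviour referred to in this message can be sketched as follows, assuming a C99-conforming library: snprintf returns the number of characters the full result would need (not counting the terminating NUL), even when the output was truncated. truncated_length and format_int are hypothetical helpers written for illustration; they are not part of Parrot and say nothing about how Parrot_snprintf is implemented.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Write text into buf (at most size bytes including the NUL) and return
 * what C99 snprintf reports: the length the untruncated result needs. */
int truncated_length(char *buf, size_t size, const char *text)
{
    assert(text != NULL);
    return snprintf(buf, size, "%s", text);
}

/* Use the return value to size a buffer exactly: a first call with a
 * NULL buffer and size 0 only measures, the second call formats. */
char *format_int(const char *prefix, int value)
{
    assert(prefix != NULL);

    int needed = snprintf(NULL, 0, "%s%d", prefix, value);
    if (needed < 0)
        return NULL;                 /* encoding error */

    char *buf = malloc((size_t)needed + 1);
    if (buf == NULL)
        return NULL;

    snprintf(buf, (size_t)needed + 1, "%s%d", prefix, value);
    return buf;                      /* caller frees */
}
```

A caller can detect truncation by comparing the return value with the buffer size: a result greater than or equal to the size means the output did not fit.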
Re: Eliminate padding warnings
In message <[EMAIL PROTECTED]> Simon Glover <[EMAIL PROTECTED]> wrote: > I've just had a quick look at hash.h (which I should have done in the > first place) and you're quite right. Second attempt at a correct patch > below. > > All tests still pass, but this isn't much comfort, as the fact that the > preceding patch 'worked' suggests that nothing's actually testing this > line of code. I'll see if I can do something to remedy this tomorrow, > unless somebody beats me to it. That looks better, although you can actually get rid of the cast once you do that as pmc_val has the right type. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: Eliminate padding warnings
In message <[EMAIL PROTECTED]> Simon Glover <[EMAIL PROTECTED]> wrote:
> This one happens because entry is a HASH_ENTRY*, but get_pmc_keyed is
> expecting a PMC*. However, by this point in the function, we've already
> verified that entry is actually a PMC*, so it should be safe to add a
> cast, as in the patch below. This shuts the warning up, and all tests
> still pass.

I don't think that works. A HASH_ENTRY is not a PMC. I think you want to pass entry->val.pmc_val if you want the PMC stored in the hash.

> --- classes/perlhash.pmc.old Wed Oct 9 15:59:29 2002
> +++ classes/perlhash.pmc Wed Oct 9 15:59:41 2002
> @@ -189,7 +189,7 @@ pmclass PerlHash {
> if (!nextkey)
> return entry->val.pmc_val;
> return entry->val.pmc_val->vtable->get_pmc_keyed(INTERP,
> - entry, nextkey);
> + (PMC*)entry, nextkey);
>
> }
> internal_exception(OUT_OF_BOUNDS,

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: [perl #17621] [PATCH] intlist cleanup (intlist-3)
In message <[EMAIL PROTECTED]> Leopold Toetsch <[EMAIL PROTECTED]> wrote: > Tom Hughes wrote: > > > Unfortunately it doesn't test clean afterwards: > > dunsmere [~/src/parrot] % perl t/harness t/pmc/intlist.t > > > t/pmc/intlistNOK 3# Failed test (t/pmc/intlist.t at line 149) # > > got: '' > > Ah, dumps core, aha GC, ahaaha junk_list moved. I ran the tests with > my Lea allocator in place, which doesn't suffer from this problem ;-) > > Anyway here is an update, tested with CVS GC. Much better. Applied. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: [perl #17621] [PATCH] intlist cleanup (intlist-3)
In message <[EMAIL PROTECTED]> Leopold Toetsch <[EMAIL PROTECTED]> wrote:
> This patch contains:
> - removal of the initial parameter from intlist_new - no need for it in
> the public interface (thanks Tom)
> - junk_list is now a Buffer, which gets collected
> - some macros to hide this
> - more docs on indexed access/junk_list
> - overall cleanup for junk_list
> - rebuild_junk_list is forced after collection

I applied this to my checkout (with _intlist_new renamed to allocate_chunk because names starting with _ are reserved). Unfortunately it doesn't test clean afterwards:

dunsmere [~/src/parrot] % perl t/harness t/pmc/intlist.t
t/pmc/intlist NOK 3# Failed test (t/pmc/intlist.t at line 149)
# got: ''
# expected: 'ok
# '
t/pmc/intlist NOK 4# Failed test (t/pmc/intlist.t at line 189)
# got: ''
# expected: 'ok 1
# ok 2
# '
# Looks like you failed 2 tests of 4.
t/pmc/intlist dubious
Test returned status 2 (wstat 512, 0x200)
DIED. FAILED tests 3-4
Failed 2/4 tests, 50.00% okay
Failed Test      Stat Wstat Total Fail  Failed  List of Failed
---
t/pmc/intlist.t     2   512     4    2  50.00%  3-4
Failed 1/1 test scripts, 0.00% okay. 2/4 subtests failed, 50.00% okay.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu
Re: [perl #17615] [PATCH] perl6: make --test
In message <[EMAIL PROTECTED]> Leopold Toetsch <[EMAIL PROTECTED]> wrote: > Attached patch fixed the "make --test" problem, reported by Tanton et al. > Actually it was my fault, I forgot about the changed semantics of > running imcc and the usage in TestCompiler.pm Looks good to me, and seems to solve the problem. Applied. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: Status of my patches ...
In message <[EMAIL PROTECTED]> Leopold Toetsch <[EMAIL PROTECTED]> wrote: > No it's not a reset thing. I should have documented it better, though i > thought the word "initial" would tell it ;-) Well I was thinking of it as initial allocation versus reallocation. > The intlist structure is a little bit special, the first chunk - and > after shift/unshift maybe another entry - is the head of the list, which > carries additional information: Before my patch only the length of the > list, and now additionally e.g. the junk_list member. > > This head chunk or "the list" is the parameter for the intlist > functions, in case of shift/unshift you pass an address, because the > head might move. > > The "initial" parameter now constructs such an head entry, is it 0, then > a "normal" entry is allocated. So it's an implementation detail that doesn't need to be exposed outside of the intlist code and therefore probably shouldn't be ;-) Seriously, I'd suggest putting the common code into intlist_new_chunk or something and then have intlist_new call that before doing the other setup. That way you don't need the extra argument. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
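The refactoring suggested in this message might look roughly like the sketch below. It is only an illustration of the shape of the idea: the DemoChunk layout, DEMO_CHUNK_ITEMS and every demo_ function are invented here and do not reflect the real intlist structures. The private helper allocates a plain chunk, and the public constructor calls it and then does the head-only setup, so no boolean "initial" argument leaks into the public interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_CHUNK_ITEMS 16   /* arbitrary capacity chosen for the sketch */

typedef struct DemoChunk {
    long items[DEMO_CHUNK_ITEMS];
    struct DemoChunk *next;
    /* The fields below are only meaningful in the head chunk. */
    int    is_head;
    size_t length;
} DemoChunk;

/* Private helper: allocate an ordinary chunk with no head bookkeeping. */
static DemoChunk *demo_new_chunk(void)
{
    DemoChunk *chunk = malloc(sizeof *chunk);
    if (chunk != NULL) {
        chunk->next = NULL;
        chunk->is_head = 0;
        chunk->length = 0;
    }
    return chunk;
}

/* Public constructor: common allocation first, then the head-only setup. */
DemoChunk *demo_intlist_new(void)
{
    DemoChunk *head = demo_new_chunk();
    if (head != NULL)
        head->is_head = 1;
    return head;
}

/* Internal growth path: append an ordinary chunk to the end of the list. */
DemoChunk *demo_intlist_add_chunk(DemoChunk *head)
{
    assert(head != NULL);
    DemoChunk *chunk = demo_new_chunk();
    if (chunk == NULL)
        return NULL;
    DemoChunk *tail = head;
    while (tail->next != NULL)
        tail = tail->next;
    tail->next = chunk;
    return chunk;
}

/* Release every chunk in the list. */
void demo_intlist_free(DemoChunk *head)
{
    while (head != NULL) {
        DemoChunk *next = head->next;
        free(head);
        head = next;
    }
}
```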
Re: Status of my patches ...
In message <[EMAIL PROTECTED]> Andy Dougherty <[EMAIL PROTECTED]> wrote: > On 26 Sep 2002, Tom Hughes wrote: > > > The problem here is that the rule in the Makefile that causes it to > > rerun Configure.pl if any of the Configure.pl generated files is out > > of date clashes with the recently introduced edit to stop Configure.pl > > updating a file that hasn't actually changed. > > I think that the 'recently-introduced-edit' is wrong. Make's dependency > system requires that the stated commands actually bring something > up-to-date. On the other hand, without that edit it winds up rebuilding everything every time you run Configure, even if it doesn't have to... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: Status of my patches ...
In message <[EMAIL PROTECTED]> Leopold Toetsch <[EMAIL PROTECTED]> wrote: > #17549, 17569 intlist bugfix, speedup, test Applied. One slight query I had was the meaning of the extra parameter added to intlist_new() by this patch. I assume the idea is that you can call it with a value of 0 to reset the intlist? I suspect that it might be better to have a separate intlist_reset or intlist_empty or something to do that rather than a weird boolean parameter like that. Not a major issue though. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: Status of my patches ...
In message <[EMAIL PROTECTED]> Leopold Toetsch <[EMAIL PROTECTED]> wrote: > #17353/17323 test for Parrot_sprintf Applied. I've also updated MANIFEST and the .cvsignore files to try and match something approaching reality. The outstanding question here is anyop.h and anyop.c in languages/imcc as they are not built, and seem to have been removed from the MANIFEST but are still in the repository. Are these now dead? If they are then I'll remove them, otherwise they need to go back into the manifest. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: Status of my patches ...
In message <20020925234547$[EMAIL PROTECTED]> Tanton Gibbs <[EMAIL PROTECTED]> wrote: > > #17517 build system, permanent Configure runs - annoying at least > > I wish someone would commit this one as this does fix a very annoying > problem, especially on cygwin. Applied. The problem here is that the rule in the Makefile that causes it to rerun Configure.pl if any of the Configure.pl generated files is out of date clashes with the recently introduced edit to stop Configure.pl updating a file that hasn't actually changed. As a result the Makefile continues to think it needs to rerun Configure.pl ad infinitum. I'm not quite sure how to resolve this in the long term as there are conflicting goals here, but I've committed the patch for now. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: [perl #17578] [PATCH] imcc 0.0.9.3
In message <[EMAIL PROTECTED]> Andy Dougherty <[EMAIL PROTECTED]> wrote: > What Solaris's qsort ended up doing was walking off the front of the > reglist[] array, effectively trying to sort reglist[-1], which, of > course, didn't exist. (*Why* it did that is a bit of a mystery; I'd have > to look at the detailed implementation of Sun's qsort to say for sure.) > > Here's the patch that fixes it. Applied. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: Status of my patches ...
In message <[EMAIL PROTECTED]> Leopold Toetsch <[EMAIL PROTECTED]> wrote: > #17578 imcc including all fixes sent to the list except todays fix >by Andy. > - actually the 3rd fix summary IIRC I sent in (s. there for a list of >patches, which are obsolete) > - CRUCIAL for non i386 platforms to run perl6 Applied. > #17193 necessary for imcc to write out PBC Applied. Like you I don't like it much but there aren't any other obviously better ways. I missed that when it went through originally, but I don't like your first two suggestions as they have nasty speed and/or memory overhead issues. I don't really understand what you mean by the third one so I can't comment much on that... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: pdd06_pasm, pdd08_keys: _keyed ops
In message <[EMAIL PROTECTED]> Leopold Toetsch <[EMAIL PROTECTED]> wrote: > For _keyed operands I would propose: > > The used keys are not coded into the opcode name (so no 64 variations), > but the opcode-number holds this information. > > add_p_p_p (op #297) > app_p_k_p_p => #297 + (KEY1_FLAGS << 16) > add_p_p_k_p => #297 + (KEY2_FLAGS << 16) > ... > where KEY1_FLAGS are e.g. > _k 0b0001 > _ki 0b0011 > _kic 0b0111 > _kc 0b0101 > KEY2_FLAGS same << 3 ... > > Now the current run loop look's like this: > while (pc) { > DO_OP(pc, interpreter); > } > > #define DO_OP(PC,INTERP) (PC = (INTERP->op_func_table)[*PC])(PC,INTERP)) > > I would change the run loop like so: > >while (pc) { > argp1 = ...pmc_reg.registers[cur_opcode[1]]; > if (*pc & KEY1_MASK) { > key1 = ...pmc_reg.registers[cur_opcode[2]]; /* for p_k */ > argp1 = get_keyed_entry(argp1, key1, flags); > } > ... > PC = (INTERP->op_func_table)[*PC & KEY_MASK]( ... ); > PC += n_keys; /* account for additonal keys in bytecode */ > } > > which would call add_p_p_p(argp1, argp2, argp3). > > "argp" points either directly to the PMC register, or for keys, into > the bucket, where the PMC is stored in the aggregate. Of course, argp > shouldn't move due to GC while processing one op. This may be all fine and dandy for the prederef case but it's going to force the opcode functions to do a lot more work working out where to get the key from in all the other cases. It also prevents you using the optimised _int vtable methods for keyed access using integer keys... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: pdd06_pasm, pdd08_keys: _keyed ops
In message <[EMAIL PROTECTED]> Sean O'Rourke <[EMAIL PROTECTED]> wrote: > Actually, if scratchpads become proper PMC's these ops would be incredibly > useful and common. For example, "@a[0] = %b{1} + $c" might become > > add P0["@a";0], P0["%b";"1"], P0["$c"] > > This is rather speculative, but if many operations will be on lexicals as > opposed to registers/temporaries, such hoariness might be worth it. Except those indexes are key constants which are type kc but Leopold only wants to allow dynamically created keys of type k on the other ops. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: pdd06_pasm, pdd08_keys: _keyed ops
In message <[EMAIL PROTECTED]> Leopold Toetsch <[EMAIL PROTECTED]> wrote: > Tom Hughes wrote: > > > You will still get horrible op explosion for a three argument op as > > even if you assume that all PMCs are keyed, there are four key types > > which, with three operands, gives you 64 ops in total. > > No. We would have > set_p_kc > set_p_ki > set_p_kic ... special shortcut set/get, only one key per op allowed > and > op_p_k_p_k ... unary keyed, all k are KEYs > op_p_k_p_k_p_k ... 3 keys bin op, all k are KEYs So now the assembler has to know that set is special and can have all four sorts of keys while the other ops only support dynamic keys? > All 64 combinations would be a horror. Indeed. > But I really vote for a predereferencing like solution. I didn't really understand that part of your previous message, but I don't see what relevance that has to how the assembler decides on the op name to use (and hence how to encode the arguments). Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: pdd06_pasm, pdd08_keys: _keyed ops
In message <[EMAIL PROTECTED]> Leopold Toetsch <[EMAIL PROTECTED]> wrote: > >>2) What PASM ops should above statement generate: > >>a) add_p_k_p_p_k (i.e. all variations of /p(_k)?/ ) > >>b) add_p_k_p_k_p_k > >> if b) how to create a NULL key and how does it look like in PBC? > >> > > As things stand it would have to be option a, or at least that is > > what the current assembler would generate. There isn't much else it can do > > really - how would it know when to generate a null key for an operand > > and when not to? > > If there is a plain P0 without [], the assembler has to insert a NULL > key instead. In other words we assume all PMC arguments have a key, so you can never have a p in an opcode name without one of k/kc/ki/kic following it? You will still get horrible op explosion for a three argument op as even if you assume that all PMCs are keyed, there are four key types which, with three operands, gives you 64 ops in total. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
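The figure of 64 in this message is just the number of key types raised to the number of keyed operands (4 x 4 x 4). A tiny sketch of that arithmetic, using a hypothetical helper that is not part of the assembler:

```c
/* Number of distinct op variants when every one of `operands` PMC
 * arguments independently takes one of `key_types` kinds of key
 * (k, kc, ki, kic in the thread).  Hypothetical helper for illustration. */
unsigned long count_keyed_variants(unsigned key_types, unsigned operands)
{
    unsigned long variants = 1;
    for (unsigned i = 0; i < operands; i++)
        variants *= key_types;
    return variants;
}
```

With four key types this gives 4 variants for a one-operand op, 16 for two operands and 64 for three, and that is for a single op name.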
Re: pdd06_pasm, pdd08_keys: _keyed ops
In message <[EMAIL PROTECTED]> Leopold Toetsch <[EMAIL PROTECTED]> wrote: > Above docs state, that a gernal parrot op looks like this > > op dest[dkey], src1[skey1], src2[skey2] > > e.g. > > add P0[P1], P2, P3[P4] > > where P1 and P4 are keys and P0 and P3 are aggregates and P2 is a scalar. > > Several questions arise from these pdd's: > > 1) Are above pdd's valid, WRT this 3 key opcodes? As far as I know PDD 08 is up to date, at least if you ignore the lack of detail on how to generate keys dynamically, which I have a patch for that I need to finish off. There was however some discussion as to whether we wanted to limit keyed access to just the set/assign opcodes in order to avoid the explosion of ops that would occur if we supported keyed access directly on every op. > 2) What PASM ops should above statement generate: > a) add_p_k_p_p_k (i.e. all variations of /p(_k)?/ ) > b) add_p_k_p_k_p_k >if b) how to create a NULL key and how does it look like in PBC? As things stand it would have to be option a, or at least that is what the current assembler would generate. There isn't much else it can do really - how would it know when to generate a null key for an operand and when not to? I believe you could encode a key constant with zero components in the byte code if you wanted - the first word of the constant is the component count after all. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: [perl #17026] [PATCH] core.ops including #16838
In message <[EMAIL PROTECTED]> Nicholas Clark <[EMAIL PROTECTED]> wrote: > [IIRC the question at one time was whether a const method in C++ could > change the underlying object, providing the values returned by all public > methods would not change as a result] The answer to which is of course that a const method can change any mutable members of the object ;-) Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: [BUG] strange key behaviour
In message <[EMAIL PROTECTED]> Tom Hughes <[EMAIL PROTECTED]> wrote: > That explains why you are not seeing the last component. You are also > missing the first one for some reason. The most likely cause would be > that you have already used key_next to discard the first component > before you reached this loop but I can't tell for sure from that code > fragment. I've just seen the patch you sent, and I can now explain the other problem - you are passing initializer->data to the function which effectively discards the first element as the data member of a Key PMC is the pointer to the next component. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: [BUG] strange key behaviour
In message <[EMAIL PROTECTED]> Josef Hook <[EMAIL PROTECTED]> wrote: > As you can see above the while loop in multiarray below only iterates > 4 times when it should iterate 6 times ( .MultiArray[2;3;2;1;1;1] ). > The same happens when i define a 3 dim array .MultiArray[2;2;2] > it only iterate 1 time. This means that key_next(interpreter,key) > always loose 2 keyes. > > multiarray.pmc: > > while (key_next(interpreter, key) != NULL) { > printf("init_marray key size is %d\n", key_integer(interpreter, key)); > printf("key_next is %x\n", key_next(interpreter, key)); > size *= key_integer(interpreter, key); > key = key_next(interpreter, key); > > } This loop stops as soon as key_next() becomes NULL which means that you never process the last key component. I would guess that you want to make the first line into this: while (key != NULL) { That explains why you are not seeing the last component. You are also missing the first one for some reason. The most likely cause would be that you have already used key_next to discard the first component before you reached this loop but I can't tell for sure from that code fragment. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
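The difference between the two loop conditions discussed in this message can be sketched with a plain linked list. DemoKey is a hypothetical stand-in for a key chain, and neither function uses the real key_next or key_integer API; the first function deliberately reproduces the shape of the quoted loop, the second the suggested fix.

```c
#include <stddef.h>

/* Hypothetical stand-in for a multi-component key: a singly linked list. */
typedef struct DemoKey {
    long value;
    struct DemoKey *next;
} DemoKey;

/* Same shape as the quoted loop: it tests the NEXT component, so it
 * stops one step early and never multiplies in the last component. */
long product_skipping_last(const DemoKey *key)
{
    long size = 1;
    while (key != NULL && key->next != NULL) {
        size *= key->value;
        key = key->next;
    }
    return size;
}

/* Suggested form: test the current component, so every one is visited. */
long product_of_all(const DemoKey *key)
{
    long size = 1;
    while (key != NULL) {
        size *= key->value;
        key = key->next;
    }
    return size;
}
```

For a three-component key 2;3;2 the first function multiplies only 2 and 3, while the second multiplies all three components.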
Re: Dynamic keys
In message <a05111b04b98fae4c4fe8@[63.120.19.221]> Dan Sugalski <[EMAIL PROTECTED]> wrote:
> Have you taken a look at the proposed ops in PDD6? They may not be
> what we ultimately want to use, but it might be a place to start.
> (And I'd rather avoid generic vtable access to keys if at all
> possible, for speed reasons. They're our internal structures--we can
> screw with them as we need :)

Fine. Those ops are a bit of a mish mash though - even the naming isn't very consistent. Likewise, why is there a chop_key to remove from one end (and which end is "topmost" anyway) but nothing to remove from the other, or to add at either end (beyond resizing the key completely)?

I would suggest that useful operations might include some or all of the following:

- creating a new key (new_key)
- cloning a key (clone_key)
- discard a key (destroy_key)
- get number of key elements (key_size)
- get type of key element (key_type)
- get value of key element (key_value)
- set value of key element (key_set) [deduces type from argument]
- increment value of key element (key_increment)
- decrement value of key element (key_decrement)
- add new element to end of key (key_push)
- remove element from end of key (key_pop)
- add new element to start of key (key_unshift)
- remove element from start of key (key_shift)

As I say, it isn't clear that we need all of these - adding and removing at the end is probably more useful than adding and removing at the start for example.

The other question is what key_type should return - the idea of magic numbers isn't particularly nice but I'm not sure what else we can do there. Assuming that it is even useful for code to be able to determine the type of an element of course - surely code will normally know the type of elements in a key it is manipulating?

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Dynamic keys
The one part of the keyed access puzzle that my last patch did not attempt to address is that of constructing keys dynamically. As things stand you can create a key PMC and you can set the value of that PMC to a given integer, number, string or PMC value. What you can't do is join several key PMCs together to create a multi-level key. I now plan to address that issue. What I propose is that although the key is implemented as a linked list it should appear to act as an array so that push/pop/shift/unshift can be used to add and remove elements at the ends, and indexed access using integers can be used to fetch and set the value of elements in the list. Does anybody have any objections to this, or any better ideas on how to handle this? Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
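As a rough illustration of the proposal in this message, the sketch below keeps a key as a linked list but gives it array-like operations (push, pop, shift, unshift and indexed get). Everything here is hypothetical: the DemoKeyList type and the demo_ function names are invented for the example, the elements are plain integers rather than typed key components, and nothing is taken from the real Parrot key code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct DemoKeyNode {
    long value;
    struct DemoKeyNode *next;
} DemoKeyNode;

typedef struct DemoKeyList {
    DemoKeyNode *head;
    size_t size;
} DemoKeyList;

/* Return the link (pointer-to-pointer) that refers to element `index`;
 * for index == size this is the NULL link at the end of the list. */
static DemoKeyNode **demo_key_link(DemoKeyList *key, size_t index)
{
    DemoKeyNode **link = &key->head;
    while (index-- > 0 && *link != NULL)
        link = &(*link)->next;
    return link;
}

/* Insert before position `index` (0 = start, size = end). 0 on success. */
int demo_key_insert(DemoKeyList *key, size_t index, long value)
{
    assert(key != NULL);
    if (index > key->size)
        return -1;
    DemoKeyNode *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    DemoKeyNode **link = demo_key_link(key, index);
    node->value = value;
    node->next = *link;
    *link = node;
    key->size++;
    return 0;
}

/* Remove element `index`, storing its value in *out. 0 on success. */
int demo_key_remove(DemoKeyList *key, size_t index, long *out)
{
    assert(key != NULL && out != NULL);
    if (index >= key->size)
        return -1;
    DemoKeyNode **link = demo_key_link(key, index);
    DemoKeyNode *node = *link;
    *out = node->value;
    *link = node->next;
    free(node);
    key->size--;
    return 0;
}

/* Indexed read access, as if the key were an array. 0 on success. */
int demo_key_get(DemoKeyList *key, size_t index, long *out)
{
    assert(key != NULL && out != NULL);
    if (index >= key->size)
        return -1;
    *out = (*demo_key_link(key, index))->value;
    return 0;
}

/* The four array-style operations are thin wrappers over insert/remove. */
int demo_key_push(DemoKeyList *key, long value)
{
    return demo_key_insert(key, key->size, value);
}

int demo_key_unshift(DemoKeyList *key, long value)
{
    return demo_key_insert(key, 0, value);
}

int demo_key_pop(DemoKeyList *key, long *out)
{
    if (key->size == 0)
        return -1;
    return demo_key_remove(key, key->size - 1, out);
}

int demo_key_shift(DemoKeyList *key, long *out)
{
    return demo_key_remove(key, 0, out);
}
```

In this sketch operations at the start of the list are constant time while push, pop and indexed access walk the list, which is one cost of presenting a linked list through an array-style interface.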
Re: [perl #16755] imcc requires Parrot_dlopen but HAS_DLOPEN is never defined
In message <20020825181959$[EMAIL PROTECTED]> "Markus Laire" <[EMAIL PROTECTED]> wrote: > I applied this patch locally, but making imcc still ends with error > "cannot find -ldl" > (I quess that means Parrot_dlopen library as Cygwin has no such file) That sounds like a separate bug in the imcc makefile - the main parrot makefile only links against libdl if Configure.pl discovers that perl5 does. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [perl #16755] imcc requires Parrot_dlopen but HAS_DLOPEN is never defined
In message <20020825155505$[EMAIL PROTECTED]> Tom Hughes (via RT) <[EMAIL PROTECTED]> wrote:
> Recent changes to imcc make it require a working Parrot_dlopen but
> unfortunately as things stand it never does work because Configure.pl
> never sets HAS_DLOPEN so Parrot_dlopen is also stubbed out.
>
> There is a second problem in that platform.c only include dlfcn.h if
> a particular symbol is defined but that symbol is not defined until
> parrot.h is included which is after the include of dlfcn.h.

Here's a patch that addresses both those issues and makes imcc work again.

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Index: config/auto/functions.pl
===
RCS file: config/auto/functions.pl
diff -N config/auto/functions.pl
--- /dev/null 1 Jan 1970 00:00:00 -
+++ config/auto/functions.pl 25 Aug 2002 17:42:45 -
@@ -0,0 +1,24 @@
+package Configure::Step;
+
+use strict;
+use vars qw($description @args);
+use Parrot::Configure::Step ':auto';
+use Config;
+
+$description="Probing for C functions...";
+
+@args=qw(miniparrot);
+
+sub runstep {
+my ($miniparrot) = @_;
+
+if ($miniparrot) {
+ return;
+}
+
+for (qw(dlopen)) {
+ Configure::Data->set("f_$_", $Config{"d_$_"});
+}
+}
+
+1;
Index: config/gen/config_h.pl
===
RCS file: /cvs/public/parrot/config/gen/config_h.pl,v
retrieving revision 1.2
diff -u -r1.2 config_h.pl
--- config/gen/config_h.pl 7 Jun 2002 01:12:39 - 1.2
+++ config/gen/config_h.pl 25 Aug 2002 17:42:45 -
@@ -32,6 +32,28 @@
 }
 close HH;
+
+ open(HF, ">include/parrot/has_feature.h") or die "Can't open has_feature.h: $!";
+
+ print HF qq(
+/*
+ ** !!! DO NOT EDIT THIS FILE !!!
+ **
+ ** This file is generated automatically by Configure.pl
+ */
+);
+
+ for(Configure::Data->keys()) {
+next unless /f_(\w+)/;
+if(Configure::Data->get($_)) {
+ print HF "#define HAS_\U$1 1\n"
+}
+else {
+ print HF "#undef HAS_\U$1\n";
+}
+ }
+
+ close HF;
 }
 1;
Index: config/gen/config_h/config_h.in
===
RCS file: /cvs/public/parrot/config/gen/config_h/config_h.in,v
retrieving revision 1.7
diff -u -r1.7 config_h.in
--- config/gen/config_h/config_h.in 18 Aug 2002 03:36:57 - 1.7
+++ config/gen/config_h/config_h.in 25 Aug 2002 17:42:45 -
@@ -100,6 +100,7 @@
 #define FLOATVAL_FMT "${floatvalfmt}"
 #include "parrot/has_header.h"
+#include "parrot/has_feature.h"
 #endif
Index: config/gen/platform/generic.c
===
RCS file: /cvs/public/parrot/config/gen/platform/generic.c,v
retrieving revision 1.5
diff -u -r1.5 generic.c
--- config/gen/platform/generic.c 3 Aug 2002 07:58:58 - 1.5
+++ config/gen/platform/generic.c 25 Aug 2002 17:42:45 -
@@ -2,13 +2,13 @@
 ** platform.c [generic version]
 */
+#include "parrot/parrot.h"
+
 #include
 #include
 #ifdef HAS_HEADER_DLFCN
 # include
 #endif
-
-#include "parrot/parrot.h"
 #define PARROT_DLOPEN_FLAGS RTLD_LAZY
Index: lib/Parrot/Configure/RunSteps.pm
===
RCS file: /cvs/public/parrot/lib/Parrot/Configure/RunSteps.pm,v
retrieving revision 1.8
diff -u -r1.8 RunSteps.pm
--- lib/Parrot/Configure/RunSteps.pm 23 Aug 2002 06:16:06 - 1.8
+++ lib/Parrot/Configure/RunSteps.pm 25 Aug 2002 17:42:46 -
@@ -17,6 +17,7 @@
 inter/pmc.pl
 auto/alignptrs.pl
 auto/headers.pl
+auto/functions.pl
 auto/sizes.pl
 auto/stackdir.pl
 auto/byteorder.pl
Re: [perl #16741] languages/parrot_compiler fixups
In message <20020825070751$[EMAIL PROTECTED]> Mike Lambert (via RT) <[EMAIL PROTECTED]> wrote: > The below patch fixes the languages/parrot_compiler/ code to work again > with the new keyed syntax. It correctly compiles > languages/parrot_compiler/sample.pasm and parrot executes it fine. > > The only change I'm unsure about it is the use of -e"" instead of -e'' to > make activestate perl happy. ie, I'm not sure if it breaks other > platforms. Although this patch makes the compiler work with the new keyed syntax it does not make it compile the new keyed syntax correctly. Not that that is necessarily a reason not to commit it, but I just thought I'd point it out. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [PATCH] and yet another typo-cleaning pass on the pdd
In message <20020825071653$[EMAIL PROTECTED]> Jerome Quelin <[EMAIL PROTECTED]> wrote: > Well, I corrected it because there is both infinitive and third person, > depending on the method described: > Example: >[...] >BIGNUM* shift_bignum(INTERP, PMC* self) > Returns ... >[...] >void subtract_same(INTERP, PMC* self, PMC* value, PMC* dest) > Subtract ... >[...] > > It seems that every mathematical operation (divide, add, subtract, multiply) > use an infinitive, but the other words (returns, does, compares, ...) use the > third person. So, one may think there is a rule there, but then what about > the following: I hadn't realised that the document was inconsistent as I was only looking at the patch. It should certainly be consistent one way or the other, so I have now applied the rest of your patch. > In my patch, I decided to put all verbs to the infinitive. Being a > non-english speaker, I may be wrong, but then we are to put all verbs to the > third person for consistency... Your grasp of English grammar is certainly better than my grasp of French grammar ;-) Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [perl #16274] [PATCH] Keyed access
? basic.pbc
? merged_basic.pasm
Index: basicvar.pasm
===
RCS file: /cvs/public/parrot/languages/BASIC/basicvar.pasm,v
retrieving revision 1.10
diff -u -r1.10 basicvar.pasm
--- basicvar.pasm 20 Jun 2002 00:05:09 - 1.10
+++ basicvar.pasm 21 Aug 2002 07:51:21 -
@@ -166,22 +166,26 @@
 restore I0 # Line number to fetch.
 set I2, I0
 eq I0, -1, CFETCHSTART
-set S0, P22[I0]
+ set S0, I0
+set S0, P22[S0]
 ne S0, "", CFETCHEND
 # Not found. Let's see if this is a +1
 dec I0
-set S0, P22[I0]
+ set S0, I0
+set S0, P22[S0]
 ne S0, "", CFETCHNEXT
 branch CNOTFOUND
 CFETCHNEXT:
-set I1, P23[I0] # Okay, got the line before
+ set S0, I0
+set I1, P23[S0] # Okay, got the line before
 inc I1
 gt I1, I28, COVERFLOW
 set I0, P24[I1] # Next line number is...
 eq I0, 0, COVERFLOW
-set S0, P22[I0] # Fetch it.
+ set S0, I0
+set S0, P22[S0] # Fetch it.
 ne S0, "", CFETCHEND
 branch CNOTFOUND # This is a should-not-happen, I think.
@@ -190,7 +194,8 @@
 gt I6, I28, COVERFLOW
 set I0, P24[I6]
 eq I0, 0, COVERFLOW
-set S0, P22[I0] # Fetch line
+ set S0, I0
+set S0, P22[S0] # Fetch line
 ne S0, "", CFETCHEND
 branch CNOTFOUND # This is a should-not-happen, I think.
@@ -241,7 +246,8 @@
 CLOAD: set I0, 0
 CNEXT: gt I0, I28, CEND
 set I3, P24[I0] # Get the next line
-set S1, P22[I3] # Get the line code itself
+ set S1, I3
+set S1, P22[S1] # Get the line code itself
 inc I0
 eq I3, I1, CNEXT # Skip this, it's being replaced.
 save S1
@@ -271,9 +277,10 @@
 restore S0 # Code line
 set I1, S0 # Line Number
-set P22[I1], S0 # The line itself
+ set S1, I1
+set P22[S1], S0 # The line itself
 inc I28
-set P23[I1], I28 # Index back to array
+set P23[S1], I28 # Index back to array
 set P24[I28], I1
 dec I5
 branch STOREC
Index: instructions.pasm
===
RCS file: /cvs/public/parrot/languages/BASIC/instructions.pasm,v
retrieving revision 1.10
diff -u -r1.10 instructions.pasm
--- instructions.pasm 20 Jun 2002 00:05:09 - 1.10
+++ instructions.pasm 21 Aug 2002 07:51:21 -
@@ -898,7 +898,8 @@
 LIST_ONE_LINE:
- set S0, P22[I2]
+ set S0, I2
+ set S0, P22[S0]
 print S0
 print "\n"
 branch END_LIST
@@ -917,7 +918,8 @@
 DO_I_LIST: set I0, 0
 DOLISTL: gt I0, I28, END_LIST
 set I1, P24[I0] # Get the next line
-set S0, P22[I1] # Get the line code itself
+ set S0, I1
+set S0, P22[S0] # Get the line code itself
 eq I3, -1, LIST_SHOW
 lt I2, I1, LIST_NEXT
 gt I3, I1, LIST_NEXT
Re: [perl #16274] [PATCH] Keyed access
In message <[EMAIL PROTECTED]> Mike Lambert <[EMAIL PROTECTED]> wrote: > Anyways, cd to languages/BASIC, run basic.pl, type "LOAD wumpus", and > watch it die on "Not a string!". It could be that basic is using keys in > weird ways, or it could be that the key patch is borked...I haven't looked > into it enough to determine the true cause here. The problem is that it is trying to index a hash using a number and that is no longer allowed. At some point it is going to be allowed again, but it won't do what the code there wants it to do - it won't stringify the number and then use that as a key. What it will do is some sort of back door reference directly to an entry in the hash table without going through a hash lookup. Dan has said that this is needed for efficient access to scratchpads or something although I'm not quite sure how it is supposed to work given that the hash may decide to rearrange itself if it gets too full. What the BASIC interpreter probably needs to do is stringify those keys itself before trying to look them up. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
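As a rough illustration of the "stringify the keys yourself" workaround, here is a small Python model (not Parrot code). `StringKeyedHash`, `store_line` and `fetch_line` are hypothetical names invented for this sketch; the only things taken from the thread are the "Not a string!" failure and the idea of converting the line number to a string before it is used as a hash key.

```python
class StringKeyedHash:
    """Hypothetical stand-in for a hash that refuses non-string keys."""

    def __init__(self):
        self._slots = {}

    @staticmethod
    def _check(key):
        # Mirrors the failure reported in the thread for integer keys.
        if not isinstance(key, str):
            raise TypeError("Not a string!")
        return key

    def __setitem__(self, key, value):
        self._slots[self._check(key)] = value

    def __getitem__(self, key):
        # Assumption: a missing entry reads back as the empty string,
        # which is what the BASIC code compares against.
        return self._slots.get(self._check(key), "")


def store_line(code, line_number, text):
    # The caller converts the number to a string before indexing.
    code[str(line_number)] = text


def fetch_line(code, line_number):
    return code[str(line_number)]
```

The point of the sketch is only that the conversion happens in the caller, so the hash never has to guess what an integer key is supposed to mean.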
Re: [perl #16274] [PATCH] Keyed access
In message <[EMAIL PROTECTED]> Jeff <[EMAIL PROTECTED]> wrote: > Jeff wrote: > > > It's not quite applying against the current build, however. > > classes/default.pmc was easy to fix, assemble.pl not so simple, core.ops > > and hash.c had other problems. Could I trouble you to fix these so I can > > commit it tonight? I can send you the rejected hunks if you like... > > Sorry about the QA comment, I just had someone point out to me that was > mail server mangling. Many apologies. It -really- looks like a nice > patch, if we can get these few problems ironed out. I have a clean version that's up to date, and as everybody seems to be happy with it I'm going to go ahead and commit it now. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [perl #16274] [PATCH] Keyed access
In message <[EMAIL PROTECTED]> Mike Lambert <[EMAIL PROTECTED]> wrote:

> - assemble.pl:
> shouldn't the code :
> elsif ($_->[0] =~ /^([snpk])c$/) { # String/Num/PMC/Key constant
> include support for "kic" somewhere?

It doesn't need to, as to_bytecode() turns [1] into an ic argument but adds kic to the op name. Much the same thing is done for integer register keys. So when _generate_bytecode() runs the argument type will just appear to be an i or ic.

> the magic numbers in _key_constant, I'm assuming they are supposed to map
> to the constants in key.h ? Perhaps a note mentioning that correspondence
> would be useful. Also, it seems the number usage is broken. You use
> 1,1,1,2,4,7. Shouldn't it be 1,1,1,2,3,5? And shouldn't s/inps/ be
> s/insp/? Or maybe the constants in key.h need rearranging?

Actually they correspond to the PARROT_ARG_XX constants used for encoding op argument types. I should really add a perl version of those constants.

> - dod.c:
> Near the comment, "Mark the key constants as live". Constants shouldn't
> need to be marked live, as constants are prevented from being GC'ed, if
> PMC_constant_FLAG is set. At least, in theory. Did it not work for you?

The reason I did it that way was that I wasn't sure whether a PMC that was marked as constant could ever die before the end of the program, and whether we might need to add and remove constant tables on the fly when we load and unload bits of code.

Given that strings in the constant table are marked as constant I guess that it should be safe to do the same for keys, so I have changed that.

> - core.ops
> Looking at the set functions, shouldn't the "Px[ KEY ] = Bx"
> set of functions have $1 defined as inout instead of out in most
> circumstances?

Possibly. That was copied from the original. I'm not quite sure what difference inout makes to the code that is generated?

> In your definition of the groups of set functions, can you change "Ax =
> Px[ KEY ]" to "Ax = Px[ INTKEY ]" where appropriate?

Done.

> - key.pmc
> the mark() function needs to return a value. Namely, the return value of
> key_mark.

Oops. That only went in yesterday... Now fixed.

> Overall, tho, the patch looks extremely complete. Tracing support,
> disassemble.pl support, debug.c support, etc. You even reduced macro
> usage. Rather impressive. :)

The tracing, disassembly etc was mostly done when I found I needed it to try and find a problem in the other things I'd done ;-)

One other outstanding problem that I remembered last night is that it is allocating memory for the key atom which is attached to the cache.struct_val member on the PMC but which is never freed.

Allocating that small piece of memory as a buffer which can be GCed seems like complete overkill, but short of marking the PMC as having a private GC so it can clean up I hadn't managed to come up with a solution.

What I realised last night however is that there is enough space in the private flags on the PMC for the type information and I can then attach the data directly to the cache and do away with key atoms completely...

I shall get on with that now and post a new patch later.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: Keyed access to PerlArray/PerlHash
In message <a05111b06b97eea0ca9c0@[63.120.19.221]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > Nobody's doing a get_integer on key PMCs--we're peeking directly. > (Integer lookup can also be done via the keyed_int method of the > vtable) At the moment it is using get_integer as I decided to get it all working first (which it largely is now) before optimising it. However it's done, it will still be shared code, as I don't intend to duplicate key decipherment in all the classes that need to do it. > Basically we need to make sure that the hash we're using as a > scratchpad can be looked up by integer index for speed reasons. I > thought we'd split it out into a Hash class the way we'd done with > Array, but I guess not. Are you saying that integer lookup in hashes bypasses the hashing and just knows where to look? If so then that isn't what integer keys were doing - it was converting them to strings and then looking them up in the normal way. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: Keyed access to PerlArray/PerlHash
In message <a05111b10b97e5445dbfe@[63.120.19.221]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > Arrays don't have to support lookup by string keys. They also can > throw an exception. How about numeric keys? Presumably they can also throw an exception as it doesn't make much sense to access the 5.2'th element of an array... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: Keyed access to PerlArray/PerlHash
In message <a05111b10b97e5445dbfe@[63.120.19.221]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > Hash should support integer lookup. PerlHash doesn't have to, and may > throw an exception. I don't follow the logic behind this... Making it work for one but not another is tricky - either get_integer on a Key PMC will work when the key is a string or it won't. In fact there isn't a Hash class at the moment, only a PerlHash. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Keyed access to PerlArray/PerlHash
Is indexing a PerlHash by an integer something that is supposed to be valid? Likewise for indexing a PerlArray by a string? Currently both of these are allowed, but as it stands my keyed access patch breaks this. Obviously indexing either by a PerlScalar will still work as the PerlScalar PMC will handle the type conversion issues when necessary. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: _keyed_int PMC methods
In message <[EMAIL PROTECTED]> Tom Hughes <[EMAIL PROTECTED]> wrote: > Does anybody know why the _keyed_int PMC methods take the key as a pointer > to an INTVAL instead of a straight INTVAL? > > It doesn't seem to make any sense so unless somebody knows of a reason > for it I plan to change it as part of my keyed access patch... I think I may have worked out the answer to this myself - some of the PMC methods which take more than one key rely on a null pointer being used to indicate that there is no key. On a separate point, why do we have set_number but push_float? Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
_keyed_int PMC methods
Does anybody know why the _keyed_int PMC methods take the key as a pointer to an INTVAL instead of a straight INTVAL? It doesn't seem to make any sense so unless somebody knows of a reason for it I plan to change it as part of my keyed access patch... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [perl #16114] [PATCH] faster assembler
In message <[EMAIL PROTECTED]> "Sean O'Rourke" <[EMAIL PROTECTED]> wrote: > - inline and remove _to_keyed and _to_keyed_integer. Those routines are dying in my keyed access cleanup anyway ;-) > - reorder the big elsif to test for /^\[/ once at the top, then only match > against keyed/non-keyed. That whole if/else is being reordered as well ;-) Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: never ending story Keyes
In message Dan Sugalski <[EMAIL PROTECTED]> wrote:

> At 6:58 PM +0100 8/8/02, Tom Hughes wrote:
>
> >Presumably with all keys being PMCs we will just encode the key
> >arguments in the opcode name as a k, and kc for constant keys.
>
> Yep.
>
> >Likewise, the constant keys will presumably be encoded in the byte
> >code much as specified in the PDD and then turned into PMC structures
> >in the constant table when the byte code is loaded.
>
> Yep.

One thing I just realised is that we still have a problem of how to tell what a P register used as a key means - it can either mean that the register contains a key, or that it contains an integer or string that is to be used as a key.

If we're going to say that a P register is always taken to be a key then does that mean that you can't do this:

    set P0, "foo"
    set P2, P1[P0]

Obviously that is manufactured, as you could do it with a constant index or an S register, but in general terms if you have perl indexing an array or hash by a scalar then it is likely to be indexing one PMC by another.

If the above code was banned then you would have to build the key dynamically instead:

    new_key P0
    size_key P0, 1
    ke_set_value P0, 1, "foo"
    set P2, P1[P0]

Or some such, depending on how the key ops wind up working, which is something else we need to think about as the old spec I have here has no way to set any values in the key...

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: PARROT QUESTIONS: Keyed access: PROPOSAL
In message <a05111b09b960bd6a030f@[63.120.19.221]> Dan Sugalski <[EMAIL PROTECTED]> wrote:

> Keys are either constant key structs, constant integers, string
> registers, or integer registers. Encoding shouldn't be any different
> than any other constant or register. Jeff's got an opcode function
> naming scheme--I've not browsed back far enough in the discussion to
> see if it's considered insufficient or not.

The problem isn't really the encoding in the sense of the bytecode encoding - as you say that is no different to any other constant or register. The problem is the opcode names.

I don't know where Jeff's scheme is documented, but none of the information I can find in the docs seems to provide a workable scheme for keyed ops.

Given your description of the valid key types I would suggest that the key arguments be encoded in opcode names as follows:

    constant key struct    k
    constant integer       kic
    string register        ks
    integer register       ki

Also, am I right in believing that when you talk about a string register being used as a key you mean that the register will be assumed to be a pointer to a KEY instead of a pointer to a string, and that this will be used to handle the case where a key has been built dynamically with the key ops?

If I am right about that then it means that it is impossible to write an instruction that does a hash lookup from a string register, as this:

    set S1, P0[S0]

will be assumed to mean that S0 is a key. Instead you would either have to put the string into a PMC first, like this:

    set P1, S0
    set S1, P0[P1]

which could be handled by building a constant key struct containing a single element. The other option would be to build a key dynamically and then use that to do the access.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
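To make the suggested naming scheme concrete, here is a small Python sketch that assembles an op name from its operand kinds. The four key suffixes (k, kic, ks, ki) are the ones proposed in the message above; everything else is an assumption made for illustration: the `op_name` helper, the operand-kind labels, and the subset of plain suffixes (p, i, s, ic, sc), which are inferred from op names quoted elsewhere in this thread. The names it produces are what this reading of the proposal would give, not names known to exist in Parrot.

```python
# Suffixes proposed in the message above for operands used as keys.
KEY_SUFFIX = {
    "key_const": "k",    # constant key struct
    "int_const": "kic",  # constant integer
    "str_reg": "ks",     # string register
    "int_reg": "ki",     # integer register
}

# Assumed suffixes for ordinary (non-key) operands; illustrative subset only.
PLAIN_SUFFIX = {
    "pmc_reg": "p",
    "int_reg": "i",
    "str_reg": "s",
    "int_const": "ic",
    "str_const": "sc",
}


def op_name(base, operands):
    """Build an op name from (kind, is_key) pairs, one pair per operand."""
    parts = [base]
    for kind, is_key in operands:
        table = KEY_SUFFIX if is_key else PLAIN_SUFFIX
        parts.append(table[kind])
    return "_".join(parts)
```

Under these assumptions `set P1[I1], 1234` would map to `op_name("set", [("pmc_reg", False), ("int_reg", True), ("int_const", False)])`, which keeps integer and string keys distinguishable even when both appear in the same instruction.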
Re: [netlabs #757] Problem mixing labels, comments and quote-marks
In message <[EMAIL PROTECTED]> "David M. Lloyd" <[EMAIL PROTECTED]> wrote: > On Sat, 13 Jul 2002, Tom Hughes wrote: > > > Of course... The attached patch should handle that I think... > > This patch is breaking several Solaris 32-bit tests. The following > assembly (from t/pmc/perlarray1.pbc): I've just tried that test on a Solaris 7 machine and it ran fine and produced the correct bytecode. I can't honestly see how that patch could cause it to generate completely the wrong op... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
RE: [PATCH] MANIFEST update
In message <[EMAIL PROTECTED]> Andy Dougherty <[EMAIL PROTECTED]> wrote: > On Wed, 17 Jul 2002, Brent Dax wrote: > > > There should be no Makefile.in's left in the source--they've been tossed > > in favor of config/gen/makefiles. > > Fair enough. I just took what cvs handed me. It was a fresh checkout as > of yesterday, updated this morning. Whoever removes those files from the > repository ought to adjust MANIFEST accordingly. I have removed the files and updated the MANIFEST to reflect that. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: PARROT QUESTIONS: Keyed access
In message <[EMAIL PROTECTED]> Melvin Smith <[EMAIL PROTECTED]> wrote: > At 03:54 PM 7/14/2002 +0100, Tom Hughes wrote: > >I've been trying to make sense of the current status of keyed access > >at all levels, from the assembler through the ops to the vtables and > >it has to be said that the harder I look the more confused I seem to > >become... > > FWIW, I have a large patch from Sean O'Rourke in response to my > request for someone to cleanup the set/set_keyed stuff. I'll commit > it later today, it does clean it up a bit, and removes some of the > older versions of set (3 arg). It at least reduces the noise. I was going to do some work on that request, but I reached the point where I decided there was no point trying to do anything until it was clear what the target was that I was trying to reach... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
PARROT QUESTIONS: Keyed access
I've been trying to make sense of the current status of keyed access at all levels, from the assembler through the ops to the vtables and it has to be said that the harder I look the more confused I seem to become...

It all seems to be a bit of a mess at the moment, and I'd like to have a go at cleaning it up but first of all I need to work out how it is all supposed to work.

It is clear that the encoding currently used by the assembler does not match that specified by PDD 8 as the following examples show:

    Instruction            PDD 8 Encoding    Actual Current Encoding
    set P1["hi"], 1234     set_p_kc_ic       set_keyed_p_sc_ic
    set P1[S1], 1234       set_p_r_ic        set_keyed_p_s_ic
    set P1[1], 1234        set_p_kc_ic       set_keyed_integer_p_ic_ic
    set P1[I1], 1234       set_p_r_ic        set_keyed_integer_p_k_ic
    set P1[S1], P2[S2]     set_p_r_p_r       set_keyed_p_s_p_s
    set P1[I1], P2[S2]     set_p_kc_p_r      set_keyed_keyed_integer_p_i_p_s

Obviously this is complete nonsense. To be honest I suspect that both encodings have problems.

The PDD 8 encoding uses kc and r (why not kc and k?) to encode the keys regardless of their type so the op has no way of knowing what sort of argument it is dealing with.

The currently implemented system distinguishes the operand types OK but tries to differentiate between ops with an integer key and those with other types of keys, which all falls apart when you have a combination of integer and non-integer keys in the same instruction.

Once we get to multi-component keys things just get even worse. If we believe PDD 8 then the syntax should be:

    set P1[I1;I2], I3

But what is currently implemented is this:

    set P1[k;I1;I2], I3

In addition it appears that the current implementation would turn that instruction into this encoding:

    set_keyed_integer_p_k_k_i

Where each component of the key becomes a separate argument, thereby requiring an infinite number of ops to cope with an infinite number of possible key components.

There is a suggestion in PDD 8 that this should be encoded as this:

    set_p_kc_i

With the key constant actually referring to an entry in the constant table that encodes the key.

Moving on from the assembler, I'm not sure how the recent addition of the _keyed_int vtable methods interacts with all this - they appear to be at odds with PDD 8 anyway, which appears to want to avoid the kind of vtable explosion that they promote.

Anyhow, that's probably enough for now... If anybody can enlighten me about how all this is supposed to work then I'll try and knock it all into shape, starting with making sure that PDD 8 is accurate.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
Re: Parrot_open_i_sc_sc
In message <[EMAIL PROTECTED]> Bryan Logan <[EMAIL PROTECTED]> wrote:

> Here's the code I have:
>
> open I0, "test.txt", "<"
> open I1, "testdtxt", "<"
> end
>
> I assemble and load it into pdb and get this:
>
> Parrot Debugger 0.0.1
>
> (pdb) list
> 1 open_i_sc_sc I0,"test.txt<","<"
> 2 open_i_sc_sc I1,"testdtxt","<"
> 3 end

This is a bug in the debugger (and also in the opcode tracing) where it is assuming that constant strings in the byte code are zero terminated when they aren't, and it is therefore overrunning and printing bits of the next string or whatever.

I have just committed a fix.

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/
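The overrun described above can be reproduced in miniature. The following Python sketch uses an invented layout, two string constants stored back to back with a single zero byte somewhere after them; it is not the real bytecode format, and `read_until_nul`, `read_counted` and `POOL` are hypothetical names. It only shows why scanning for a terminator prints "test.txt<" while reading by the stored length does not.

```python
def read_until_nul(pool, start):
    """The faulty approach: treat the constant as a C string and scan for 0."""
    end = pool.find(b"\x00", start)
    return pool[start:] if end == -1 else pool[start:end]


def read_counted(pool, start, length):
    """Read exactly the number of bytes the string says it has."""
    return pool[start:start + length]


# Assumed layout for illustration: "test.txt" immediately followed by "<",
# with no terminator in between.
POOL = b"test.txt" + b"<" + b"\x00"
```

With this layout `read_until_nul(POOL, 0)` runs on into the next constant, whereas `read_counted(POOL, 0, 8)` stops where the string ends.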
Re: [netlabs #757] Problem mixing labels, comments and quote-marks
In message <20020713174114$[EMAIL PROTECTED]> brian wheeler <[EMAIL PROTECTED]> wrote:

> On Sat, 2002-07-13 at 12:32, Tom Hughes wrote:
> > In message <20020703012231$[EMAIL PROTECTED]>
> > Here's a patch that will fix this. I haven't committed it because I'm
> > not sure why the assembler wasn't dropping comments that included quotes
> > so I'm giving people who know more about the assembler than me a chance
> > to comment first...
>
> I believe it wasn't dropping the comments with quotes as a side effect
> of not wanting to break things like:
> print "#"
>
> which breaks with the included patch. I basically had the same patch
> you do, but wasn't able to figure out how to handle the above case *and*
> do the right thing with # prints "a"

Of course... The attached patch should handle that I think...

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Index: assemble.pl
===
RCS file: /cvs/public/parrot/assemble.pl,v
retrieving revision 1.77
diff -u -r1.77 assemble.pl
--- assemble.pl 4 Jul 2002 18:36:17 - 1.77
+++ assemble.pl 13 Jul 2002 17:49:58 -
@@ -430,10 +430,13 @@
 sub _annotate_contents {
 my ($self,$line) = @_;
+ my $str_re = qr(\"(?:[^\\\"]*(?:\\.[^\\\"]*)*)\" |
+ \'(?:[^\\\']*(?:\\.[^\\\']*)*)\'
+ )x;
 $self->{pc}++;
 return if $line=~/^\s*$/ or $line=~/^\s*#/; # Filter out the comments and blank lines
- $line=~s/#[^'"]+$//; # Remove trailing comments
+ $line=~s/^((?:[^'"]+|$str_re)*)#.*$/$1/; # Remove trailing comments
 $line=~s/(^\s+|\s+$)//g; # Remove leading and trailing whitespace
 #
 # Accumulate lines that only have labels until an instruction is found..
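The idea behind the patch, treating quoted strings as single units so that a # inside one is not mistaken for a comment, can be sketched in Python as follows. This is a token-scanning variant written for illustration, not a translation of the Perl substitution: `strip_comment` and the pattern names are hypothetical, and the sketch deliberately stops at the first # found outside a string.

```python
import re

# Double- and single-quoted strings that may contain backslash escapes.
DQ = r'"[^"\\]*(?:\\.[^"\\]*)*"'
SQ = r"'[^'\\]*(?:\\.[^'\\]*)*'"

# A token is a whole quoted string, a comment marker, or any other character.
# Alternation order matters: quoted strings are tried first.
TOKEN = re.compile(DQ + "|" + SQ + "|#|.", re.S)


def strip_comment(line):
    """Drop a trailing # comment without touching # inside quoted strings."""
    kept = []
    for match in TOKEN.finditer(line):
        token = match.group(0)
        if token == "#":
            break  # first # outside a string starts the comment
        kept.append(token)
    return "".join(kept).rstrip()
```

Both cases from the thread come out as expected under this sketch: `print "#"` is left alone, and the comment in `A:# prints "a"` is removed even though it contains quotes.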
Re: [netlabs #758] [PATCH] Fixes for example programs
In message <20020703015823$[EMAIL PROTECTED]> Simon Glover (via RT) <[EMAIL PROTECTED]> wrote: > # New Ticket Created by Simon Glover > # Please include the string: [netlabs #758] > # in the subject line of all future correspondence about this issue. > # http://bugs6.perl.org/rt2/Ticket/Display.html?id=758 > > > > > Fixes to various of the PASM examples in light of recent changes in the > assembler. Applied. Somebody please update the ticket... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [netlabs #757] Problem mixing labels, comments and quote-marks
In message <20020703012231$[EMAIL PROTECTED]> Simon Glover (via RT) <[EMAIL PROTECTED]> wrote:

> This code:
>
> A:# prints "a"
> print "a"
> end
>
> doesn't assemble; the assembler dies with the error message:
>
> Use of uninitialized value in hash element at assemble.pl line 844.
> Couldn't find operator '' on line 1.
>
> If you remove the ""s from the comment, it works fine. Likewise, if
> you put the label, op and comment on the same line, ie:
>
>A: print "a" # prints "a"
> end
>
> then it assembles and runs OK.

Here's a patch that will fix this. I haven't committed it because I'm not sure why the assembler wasn't dropping comments that included quotes so I'm giving people who know more about the assembler than me a chance to comment first...

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Index: assemble.pl
===
RCS file: /cvs/public/parrot/assemble.pl,v
retrieving revision 1.77
diff -u -r1.77 assemble.pl
--- assemble.pl 4 Jul 2002 18:36:17 - 1.77
+++ assemble.pl 13 Jul 2002 17:30:48 -
@@ -433,7 +433,7 @@
 $self->{pc}++;
 return if $line=~/^\s*$/ or $line=~/^\s*#/; # Filter out the comments and blank lines
- $line=~s/#[^'"]+$//; # Remove trailing comments
+ $line=~s/#.*$//; # Remove trailing comments
 $line=~s/(^\s+|\s+$)//g; # Remove leading and trailing whitespace
 #
 # Accumulate lines that only have labels until an instruction is found..
Re: [netlabs #788] [PATCH] Array fixes (and tests)
In message <20020711221132$[EMAIL PROTECTED]> Simon Glover (via RT) <[EMAIL PROTECTED]> wrote: > # New Ticket Created by Simon Glover > # Please include the string: [netlabs #788] > # in the subject line of all future correspondence about this issue. > # http://bugs6.perl.org/rt2/Ticket/Display.html?id=788 > > > > > This patch fixes a number of off-by-one errors in array.pmc, and adds a > few more tests. Applied. Somebody please update the ticket... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [netlabs #790] [PATCH] MANIFEST update
In message <20020712005836$[EMAIL PROTECTED]> Simon Glover (via RT) <[EMAIL PROTECTED]> wrote: > # New Ticket Created by Simon Glover > # Please include the string: [netlabs #790] > # in the subject line of all future correspondence about this issue. > # http://bugs6.perl.org/rt2/Ticket/Display.html?id=790 > > > > > Self-explanatory. Applied. Somebody please update the ticket... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [netlabs #789] [PATCH] Squish some warnings
In message <20020712010920$[EMAIL PROTECTED]> Simon Glover (via RT) <[EMAIL PROTECTED]> wrote: > # New Ticket Created by Simon Glover > # Please include the string: [netlabs #789] > # in the subject line of all future correspondence about this issue. > # http://bugs6.perl.org/rt2/Ticket/Display.html?id=789 > > > > > stack_chunk is now Stack_Chunk... Applied. Somebody update the ticket please... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: Adding the system stack to the root set
In message <[EMAIL PROTECTED]> Nicholas Clark <[EMAIL PROTECTED]> wrote: > On Wed, Jul 10, 2002 at 06:49:06PM -0400, Dan Sugalski wrote: > > Yes, this is an issue for systems with a chunked stack. As far as I > > know that only applies to the various ARM OSes, and for those we'll > > have to have some different system specific code to deal with the > > stack. (Which is fine) > > Sorry, I wasn't clear in my previous reply to your private message. > ARM Linux doesn't use a chunked stack. It's contiguous, and (for example) > the Boehm garbage collector does work on it. I would expect NetBSD ARM > doesn't either. (There is a FreeBSD port to StrongARM, but its mailing > list is very very quiet). So I don't think those two will pose undue > problems. As far as I know all the ARM unixes use a contiguous stack - it's just RISC OS that uses the chunked stack I believe. I believe you can always tell by looking at where sl points and seeing if there is a valid chunk descriptor there and then following its prev pointer to get the previous chunk if there is one. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: Stack performance issue
In message <[EMAIL PROTECTED]> Melvin Smith <[EMAIL PROTECTED]> wrote: > You might want to modify register stacks too. I currently have a > band-aid on it that just doesn't free stack chunks which works in > all but the weirdest cases. I've done that now. I also just realised that the stacks are allocating their chunks directly from the system, which presumably means the GC won't pick them up so they need to be freed directly. I've done that for the register stacks, and I'll do the same for the other stacks unless somebody spots a flaw in my logic and points out that the GC will catch it... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Stack performance issue
There is a performance issue in the stack code, which the attached patch attempts to address.

The problem revolves around what happens when you are close to the boundary between two chunks. When this happens you can find that you are in a loop where something is pushed on the stack, causing a new chunk to be allocated. That item is then popped, causing the new chunk to be discarded, only for it to have to be allocated again on the next iteration of the loop.

This is a well known problem with chunked stacks - it is certainly a known issue on ARM based machines which use the chunked stack variant of the ARM procedure call standard. The solution there is to always keep one chunk in reserve - when you move back out of a chunk you don't free it. Instead you wait until you move back another chunk and then free the chunk after the one that has just emptied.

Even this can go wrong if your loop pushes more than one chunk's worth of data on the stack and then pops it again, but that is far rarer than the general case of pushing one or two items which happens to take it over a chunk boundary.

The attached patch implements this one-behind logic, both for the generic stack and the integer stack. If nobody has any objections then I'll commit it tomorrow sometime.

Some figures from my test programs, running on a K6-200 linux box. The test programs push and pop 65536 times, with the first column being when that loop doesn't cross a chunk boundary and the second being when it does cross a chunk boundary:

                                   No overflow    Overflow
    Integer stack, before patch    0.065505s      16.589480s
    Integer stack, after patch     0.062732s      0.068460s
    Generic stack, before patch    0.161202s      5.475367s
    Generic stack, after patch     0.166938s      0.168390s

Tom

--
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/

Index: rxstacks.c
===
RCS file: /cvs/public/parrot/rxstacks.c,v
retrieving revision 1.5
diff -u -r1.5 rxstacks.c
--- rxstacks.c 17 May 2002 21:38:20 - 1.5
+++ rxstacks.c 30 Jun 2002 17:42:02 -
@@ -46,13 +46,20 @@
 /* Register the new entry */
 if (++chunk->used == STACK_CHUNK_DEPTH) {
-/* Need to add a new chunk */
-IntStack_Chunk new_chunk = mem_allocate_aligned(sizeof(*new_chunk));
-new_chunk->used = 0;
-new_chunk->next = stack;
-new_chunk->prev = chunk;
-chunk->next = new_chunk;
-stack->prev = new_chunk;
+if (chunk->next == stack) {
+/* Need to add a new chunk */
+IntStack_Chunk new_chunk = mem_allocate_aligned(sizeof(*new_chunk));
+new_chunk->used = 0;
+new_chunk->next = stack;
+new_chunk->prev = chunk;
+chunk->next = new_chunk;
+stack->prev = new_chunk;
+}
+else {
+/* Reuse the spare chunk we kept */
+chunk = chunk->next;
+stack->prev = chunk;
+}
 }
 }
@@ -67,11 +74,17 @@
 /* That chunk != stack check is just to allow the empty stack case
  * to fall through to the following exception throwing code. */
-/* Need to pop off the last entry */
-stack->prev = chunk->prev;
-stack->prev->next = stack;
-/* Relying on GC feels dirty... */
-chunk = stack->prev;
+/* If the chunk that has just become empty is not the last chunk
+ * on the stack then we make it the last chunk - the GC will clean
+ * up any chunks that are discarded by this operation. */
+if (chunk->next != stack) {
+chunk->next = stack;
+}
+
+/* Now back to the previous chunk - we'll keep the one we have
+ * just emptied around for now in case we need it again. */
+chunk = chunk->prev;
+stack->prev = chunk;
 }
 /* Quick sanity check */
Index: stacks.c
===
RCS file: /cvs/public/parrot/stacks.c,v
retrieving revision 1.34
diff -u -r1.34 stacks.c
--- stacks.c 25 Jun 2002 23:50:51 - 1.34
+++ stacks.c 30 Jun 2002 17:42:02 -
@@ -208,22 +208,29 @@
 /* Do we need a new chunk? */
 if (chunk->used == STACK_CHUNK_DEPTH) {
-/* Need to add a new chunk */
-Stack_Chunk_t *new_chunk = mem_allocate_aligned(sizeof(Stack_Chunk_t));
-
-new_chunk->used = 0;
-new_chunk->next = stack_base;
-new_chunk->prev = chunk;
-chunk->next = new_chunk;
-stack_base->prev = new_chunk;
-chunk = new_chunk;
-
-/* Need to initialize this pointer before the collector sees it */
-chunk->buffer = NULL;
-chunk->buffer = new_buffer_header(interpreter);
-
-Parrot_allocate(interpreter, chunk->
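The effect of keeping one chunk in reserve can be seen with a toy model. The Python sketch below is not the Parrot implementation; `ChunkedStack`, `thrash` and the tiny `CHUNK_DEPTH` are assumptions chosen so the boundary is easy to hit. It counts chunk allocations for a push/pop loop that straddles a chunk boundary, with and without the spare chunk.

```python
CHUNK_DEPTH = 4  # hypothetical, deliberately small


class ChunkedStack:
    """Toy chunked stack that counts how many chunks it allocates."""

    def __init__(self, keep_spare):
        self.keep_spare = keep_spare
        self.chunks = [[]]    # chunks[top + 1], if present, is the spare
        self.top = 0          # index of the chunk currently in use
        self.allocations = 1  # the initial chunk

    def push(self, value):
        if len(self.chunks[self.top]) == CHUNK_DEPTH:
            if self.top + 1 == len(self.chunks):
                self.chunks.append([])  # no spare available: allocate
                self.allocations += 1
            self.top += 1               # otherwise reuse the spare
        self.chunks[self.top].append(value)

    def pop(self):
        chunk = self.chunks[self.top]
        if not chunk:
            raise IndexError("pop from empty stack")
        value = chunk.pop()
        if not chunk and self.top > 0:
            # Move back out of the emptied chunk. With keep_spare the
            # emptied chunk survives and only chunks beyond it are freed.
            self.top -= 1
            keep = self.top + (2 if self.keep_spare else 1)
            del self.chunks[keep:]
        return value


def thrash(stack, rounds):
    """Fill one chunk exactly, then push/pop across the boundary."""
    for i in range(CHUNK_DEPTH):
        stack.push(i)
    for _ in range(rounds):
        stack.push("x")
        stack.pop()
    return stack.allocations
```

In this model `thrash(ChunkedStack(keep_spare=False), 100)` allocates a fresh chunk on every round, while the `keep_spare=True` variant allocates the second chunk once and then reuses it, which is the behaviour the timing figures above reflect.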
Re: Dynaloading
In message <a05111b2fb92c9ba1ac83@[63.120.19.221]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > The exported name should be the MD5 checksum of a string that > represents the actual routine name we're looking for. This, I think, > should be specified somewhere external to the library, in some sort > of metadata file, I think. (Not sure, I'm waffling here. But we need > this to be unique) Why does it need to be unique if it's not going to be linked against anything? If you're just finding the name with dlsym() or equivalent then you can just use the same name in all the libraries and it won't clash. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: x86 linux memory leak checker (and JIT ideas)
In message <[EMAIL PROTECTED]> Nicholas Clark <[EMAIL PROTECTED]> wrote: > Jarkko mailed this URL to p5p: > > http://developer.kde.org/~sewardj/ > > It describes a free (GPL) memory leak checker for x86 Linux > > 1: This may be of use for parrot hackers Which is why I mentioned it a week or two ago ;-) I also ran it over the test suite and fixed the only bug that it found at that time... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: transcode addition
In message <[EMAIL PROTECTED]> Roman Hunt <[EMAIL PROTECTED]> wrote: > I'm not too sure if this is necessary but it seems logical to get things > into charsets our compilers can handle. Hopefully this is the correct > approach . . . . also this should NULL terminate in the event that the > entire buffer had not yet been filled. This is wrong - you need to worry about the character set as well as the encoding, and at the very least you should compare the encoding to the default encoding for the native charset and not assume that it will always be singlebyte. Your buffer termination code is also wrong - bufused is the end of the string. You are null terminating the buffer, not the string, and the buffer may have extra space. Plus you have created a buffer overrun. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: TODO additions
In message <[EMAIL PROTECTED]> Tom Hughes <[EMAIL PROTECTED]> wrote: > I have developed a patch for this in the form of a new routine > which returns a nul terminated C style string given a parrot > string as argument. It does this by making sure buflen is at > least one greater than bufused and then stuffing a nul in that > byte. > > This isn't a particularly brilliant fix so I'm attaching it here > for comments before I commit it. I haven't seen any major objections to this so I have committed it. It will at least ensure that file opening is stable for the upcoming release. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
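The routine described in the quoted text can be modelled in a few lines of Python. This is a sketch only: `ParrotString` is a hypothetical toy class, the 0xff filler stands in for whatever stale bytes the buffer might hold, and the functions merely borrow the names `string_grow` and `string_to_cstring` from the patch. What it shows is the step named in the message, growing the buffer when buflen equals bufused and then writing the nul at offset bufused.

```python
class ParrotString:
    """Toy model: a buffer of buflen bytes, of which bufused are in use."""

    def __init__(self, data, slack=0):
        # 0xff marks unused buffer space so stale bytes are easy to spot.
        self.buf = bytearray(data) + bytearray(b"\xff" * slack)
        self.bufused = len(data)

    @property
    def buflen(self):
        return len(self.buf)


def string_grow(s, extra):
    """Extend the buffer by `extra` bytes of unused space."""
    s.buf.extend(b"\xff" * extra)


def string_to_cstring(s):
    """Return the string's bytes followed by a single nul terminator."""
    if s.buflen == s.bufused:
        string_grow(s, 1)      # make room for the terminator
    s.buf[s.bufused] = 0       # terminate the string, not the buffer
    return bytes(s.buf[:s.bufused + 1])
```

Note that the terminator goes immediately after the used bytes; a buffer with spare room keeps its size and the remaining slack is left untouched.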
Re: TODO additions
In message <[EMAIL PROTECTED]> Roman Hunt <[EMAIL PROTECTED]> wrote: > why don't we default to null terminating strings of type native? > if "native" is what we get when LANG=C it only seems natural to do so. > else we are forced to use wrapper functions that grow and manipulate > string data any time we need to pass it to standard C functions that > won't accept a string_length parameter, this list unfortunately contains > several syscalls. Well that is what perl 5 does certainly. I thought it had been decided not to do that in perl 6 though due to issues about what it meant to nul terminate in various different character sets. We can't assume that US-ASCII will be native everywhere though as some platforms may use some form of unicode as the native character set (and accept unicode arguments to system calls). It does need some thought though, to determine how best to handle this issue. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: TODO additions
In message <[EMAIL PROTECTED]> Tom Hughes <[EMAIL PROTECTED]> wrote: > Syscall param open(pathname) contains uninitialised or unaddressable byte(s) > at 0x403F1892: __libc_open (__libc_open:31) > by 0x403829C3: _IO_fopen@@GLIBC_2.1 (iofopen.c:67) > by 0x809B287: cg_core (core.ops:138) > by 0x80955E0: runops_fast_core (runops_cores.c:34) > Address 0x4104051D is 3201 bytes inside a block of size 32824 alloc'd > at 0x4003DCC2: malloc (vg_clientmalloc.c:618) > by 0x8092E11: mem_sys_allocate (memory.c:74) > by 0x8098DAD: Parrot_alloc_new_block (resources.c:830) > by 0x8092EC0: mem_setup_allocator (memory.c:108) > > ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) > malloc/free: in use at exit: 249652 bytes in 54 blocks. > malloc/free: 58 allocs, 4 frees, 381692 bytes allocated. > For a detailed leak analysis, rerun with: --leak-check=yes > For counts of detected errors, rerun with: -v > > I haven't attempted to look at this and see what is causing it. I've had a look at it now. The problem is that we are passing s->bufstart to fopen but there is no guarantee that there is a nul byte at the end of the buffer as parrot strings are not nul terminated. I have developed a patch for this in the form of a new routine which returns a nul terminated C style string given a parrot string as argument. It does this by making sure buflen is at least one greater than bufused and then stuffing a nul in that byte. This isn't a particularly brilliant fix so I'm attaching it here for comments before I commit it. Of course we also need to think about encoding/charset issues when passing strings to system calls...
Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/

Index: core.ops
===
RCS file: /home/perlcvs/parrot/core.ops,v
retrieving revision 1.119
diff -u -w -r1.119 core.ops
--- core.ops 3 Apr 2002 23:03:37 - 1.119
+++ core.ops 13 Apr 2002 14:11:11 -
@@ -135,7 +135,7 @@
 =cut
 
 inline op open(out INT, in STR) {
-  $1 = (INTVAL)fopen(($2)->bufstart, "r+");
+  $1 = (INTVAL)fopen(string_to_cstring(interpreter, ($2)), "r+");
   if (!$1) {
     perror("Can't open");
     exit(1);
@@ -145,7 +145,7 @@
 }
 
 inline op open(out INT, in STR, in STR) {
-  $1 = (INTVAL)fopen(($2)->bufstart, ($3)->bufstart);
+  $1 = (INTVAL)fopen(string_to_cstring(interpreter, ($2)), string_to_cstring(interpreter, ($3)));
   goto NEXT();
 }
 
@@ -246,7 +246,7 @@
 op print(in STR) {
   STRING *s = $1;
   if (s && string_length(s)) {
-    printf("%.*s", (int)string_length(s), (char *) s->bufstart);
+    printf("%s", string_to_cstring(interpreter, (s)));
   }
   goto NEXT();
 }
@@ -255,7 +255,7 @@
   PMC *p = $1;
   STRING *s = (p->vtable->get_string(interpreter, p));
   if (s) {
-    printf("%.*s",(int)string_length(s),(char *) s->bufstart);
+    printf("%s", string_to_cstring(interpreter, (s)));
   }
   goto NEXT();
 }
@@ -304,7 +304,7 @@
     default: file = (FILE *)$1;
   }
   if (s && string_length(s)) {
-    fprintf(file, "%.*s",(int)string_length(s),(char *) s->bufstart);
+    fprintf(file, "%s", string_to_cstring(interpreter, (s)));
   }
   goto NEXT();
 }
@@ -323,7 +323,7 @@
     default: file = (FILE *)$1;
   }
   if (s) {
-    fprintf(file, "%.*s",(int)string_length(s),(char *) s->bufstart);
+    fprintf(file, "%s", string_to_cstring(interpreter, (s)));
   }
   goto NEXT();
 }
Index: string.c
===
RCS file: /home/perlcvs/parrot/string.c,v
retrieving revision 1.68
diff -u -w -r1.68 string.c
--- string.c 12 Apr 2002 01:40:28 - 1.68
+++ string.c 13 Apr 2002 14:11:12 -
@@ -802,6 +802,21 @@
                       NULL, 0, NULL);
 }
 
+const char *
+string_to_cstring(struct Parrot_Interp * interpreter, STRING * s)
+{
+    char *cstring;
+
+    if (s->buflen == s->bufused)
+        string_grow(interpreter, s, 1);
+
+    cstring = s->bufstart;
+
+    cstring[s->bufused] = 0;
+
+    return cstring;
+}
+
 /*
  * Local variables:
Index: include/parrot/string_funcs.h
===
RCS file: /home/perlcvs/parrot/include/parrot/string_funcs.h,v
retrieving revision 1.6
diff -u -w -r1.6 string_funcs.h
--- include/parrot/string_funcs.h 22 Mar 2002 04:11:57 - 1.6
+++ include/parrot/string_funcs.h 13 Apr 2002 14:11:12 -
@@ -27,6 +27,7 @@
                       const STRING *, STRING **);
 INTVAL Parrot_string_compare(Parrot, const STRING *, const STRING *);
 Parrot_Bool Parrot_string_bool(const STRING *);
+const char *Parrot_string_cstring(const S
Re: TODO additions
In message <[EMAIL PROTECTED]> Steve Fink <[EMAIL PROTECTED]> wrote: > +Stability > +- > +Purify and other memory badness detectors One thing that may be useful here is valgrind, which can be found at http://developer.kde.org/~sewardj/ and does Purify types things on linux. I just hacked the parrot test suite to run parrot under valgrind and it has only come up with one problem in t/op/hacks1, the details of which are as follows: valgrind-20020329, a memory error detector for x86 GNU/Linux. Copyright (C) 2000-2002, and GNU GPL'd, by Julian Seward. For more details, rerun with: -v Syscall param open(pathname) contains uninitialised or unaddressable byte(s) at 0x403F1892: __libc_open (__libc_open:31) by 0x403829C3: _IO_fopen@@GLIBC_2.1 (iofopen.c:67) by 0x809B287: cg_core (core.ops:138) by 0x80955E0: runops_fast_core (runops_cores.c:34) Address 0x4104051D is 3201 bytes inside a block of size 32824 alloc'd at 0x4003DCC2: malloc (vg_clientmalloc.c:618) by 0x8092E11: mem_sys_allocate (memory.c:74) by 0x8098DAD: Parrot_alloc_new_block (resources.c:830) by 0x8092EC0: mem_setup_allocator (memory.c:108) ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 0 from 0) malloc/free: in use at exit: 249652 bytes in 54 blocks. malloc/free: 58 allocs, 4 frees, 381692 bytes allocated. For a detailed leak analysis, rerun with: --leak-check=yes For counts of detected errors, rerun with: -v I haven't attempted to look at this and see what is causing it. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: I submit for your aproval . . .
In message <a05101503b8da6ead2821@[63.120.19.221]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 6:29 PM -0400 4/10/02, Roman Hunt wrote: > > >also I think > >encoding_lookup() should accept an argument of "native". > > Good point, they should. OTOH, that makes some of this interesting, > since which characters you use for various things depend on the > encoding and charset. We already have string_native_type which points to the CHARTYPE structure for the native character type and that structure includes default_encoding which is the name of the default encoding for the native character type. I guess string_init could also set up string_native_encoding by looking up the name of the default encoding for the native character type. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [COMMIT] Embedding enhancements
In message <[EMAIL PROTECTED]> Nicholas Clark <[EMAIL PROTECTED]> wrote: > On Sat, Feb 16, 2002 at 01:46:56AM -0800, Brent Dax wrote: > > NEW CONVENTIONS FOR DATA EXPOSED TO EMBEDDERS: > > > > -All structs should have a name of the form parrot_system_t. This name > > should never be directly used outside the subsystem in question. > > > > struct parrot_foo_t { > > ... > > }; > > Am I right in thinking that I could paraphrase that statement as > "All structs should trample in ANSI's reserved namespace"? I don't think so... As far as I can find in the standard, only certain type names ending in _t are reserved, namely: [#1] Type names beginning with int or uint and ending with _t may be added to the types defined in the header. Macro names beginning with INT or UINT and ending with _MAX or _MIN, or macro names beginning with PRI or SCN followed by any lower case letter or X may be added to the macros defined in the header. So struct x_t should be fine because that's a structure tag and not a type name. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: Proposal: Naming conventions
In message <20020110201559$[EMAIL PROTECTED]> "Melvin Smith" <[EMAIL PROTECTED]> wrote: > > Foo foo = (Foo) malloc(sizeof(*foo)); > >? Does ANSI allow using sizeof on a variable declared on the > > same line? > > Wouldn't sizeof(Foo) be safer here? At the logical time of the > call *foo points to undefined. Technically it's not a deref but > still looks scary. In C++ it might be confusing if you were to > cast it as: Well sizeof(Foo) and sizeof(*foo) are not actually the same thing at all there because Foo is presumably a typedef for a pointer type so sizeof(Foo) will be the size of a pointer and sizeof(*foo) will be the size of the thing it points to. You're quite right that it isn't technically a deref, as sizeof() is only interested in the static type of the object and is evaluated at compile time (if we ignore VLAs in C99 that is). In general it is safer to use sizeof() on the variable you are working with than on its type, as that way the sizeof() will still work if somebody changes the type of the variable. > // If it were really C++ we would probably be using new() > Foo foo = (FooBar) malloc(sizeof(*foo)); > > What type is *foo then? Should be Foo, but what if FooBar > was of different size, it might not be an obvious bug to someone > that just came along and tweaked your code. The type of *foo is whatever Foo has been typedefed as a pointer to, and FooBar is a red herring. > >If people have visceral objections to typedef'ing pointers, I'm > >fine with dropping that part of the proposal. I'd just like to see > > I've always been uncomfortable with that practice, it's one part of > the whole Win32 world I hate. If you stick with the practice then > you either end up making a new typedef for every level of indirection > or you drop to using * (some typedef), etc. Now if it were C++ and we > were using a smart pointer class I don't mind the practice. 
I will agree that hiding pointers inside typedefs is not a very good idea, if only because it makes it impossible to const qualify the pointer without creating a second parallel typedef. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
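The sizeof and const points can be checked with a small sketch. The struct foo_data layout and the foo_new and foo_set_id helpers are made up for illustration; only the Foo typedef shape comes from the thread:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical payload type, for illustration only. */
struct foo_data {
    int  id;
    char name[60];
};

/* Pointer hidden inside a typedef, as in the proposal under discussion. */
typedef struct foo_data *Foo;

/* sizeof(*foo) is the size of the pointed-to struct. It is computed from
 * the static type of foo, so foo is never actually dereferenced. */
static Foo foo_new(void)
{
    Foo foo = malloc(sizeof(*foo));
    return foo;
}

/* "const Foo" makes the pointer itself const, not the data behind it. */
static void foo_set_id(const Foo foo, int id)
{
    assert(foo != NULL);
    foo->id = id; /* compiles: the struct is still writable */
}
```

Getting a pointer to const data needs something like a second typedef (say, typedef const struct foo_data *ConstFoo), which is the parallel typedef mentioned above.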
Re: TODOs for STRINGs
In message <20020102054642$[EMAIL PROTECTED]> "David & Lisa Jacobs" <[EMAIL PROTECTED]> wrote: > Here is a short list of TODOs that I came up with for STRINGs. First, do > these look good to people? And second, what is the preferred method for > keeping track of these (patch to the TODO file, entries in bug tracking > system, mailing list, etc. > > * Add set ops that are encoding aware (e.g., set S0, "something", "unicode", > "utf-8")? You can already have Unicode constants by prefixing the string with a U character. I seem to recall Dan saying that he didn't want to allow constants in arbitrary encodings but instead would prefer just to have native and unicode. > * Add transcoding ops (this might be a specific case of the previous e.g., > set S0, S1, "unicode", "utf-16") I'm not sure whether this is needed. I think the idea is that in general transcoding will happen at I/O time, presumably by pushing a transcoding module on the I/O stack. > * Move like encoded string comparison into encodings (i.e., the STRING > comparison function gets the strings into the same encoding and then calls > out to the encodings comparison function - This will allow each encoding to > optimize its comparison. The problem with this is that string comparison depends on both the encoding and the character set so in general you can't do this. If the character set was the same for both strings then you could do so though. What I did think about was having a flag on each encoding that specified whether or not comparisons for that encoding could be done using memcmp() when the character sets were the same. That is true for things like the single byte encoding, but probably not for the unicode encodings due to canonicalisation issues. > * Add size of string termination to encodings (i.e., how many 0 bytes) Certainly. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
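A minimal sketch of the per-encoding memcmp() flag idea, assuming hypothetical demo_encoding and demo_str structures (none of these names are in the parrot source). The fast path is taken only when both strings share a character set and an encoding that is marked safe; otherwise the caller has to fall back to a character-by-character comparison:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding descriptor carrying the proposed flag. */
typedef struct {
    const char *name;
    int         memcmp_safe; /* nonzero if byte order matches character order */
} demo_encoding;

/* Hypothetical string: raw bytes plus encoding and character set. */
typedef struct {
    const demo_encoding *encoding;
    int                  chartype; /* made-up character set id */
    const unsigned char *buf;
    size_t               len;      /* length in bytes */
} demo_str;

/* Returns 1 and stores -1/0/1 in *out if the fast path applies, or 0 if
 * the caller must compare character by character instead. */
static int demo_fast_compare(const demo_str *a, const demo_str *b, int *out)
{
    size_t n;
    int    c;

    assert(a != NULL && b != NULL && out != NULL);

    if (a->chartype != b->chartype || a->encoding != b->encoding
        || !a->encoding->memcmp_safe)
        return 0;

    n = a->len < b->len ? a->len : b->len;
    c = memcmp(a->buf, b->buf, n);
    if (c == 0 && a->len != b->len)
        c = a->len < b->len ? -1 : 1; /* the shorter string sorts first */
    *out = (c > 0) - (c < 0);
    return 1;
}
```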
Re: [PATCH] string_transcode
In message <007f01c1930c$9d326220$[EMAIL PROTECTED]> "Peter Gibbs" <[EMAIL PROTECTED]> wrote: > Another correction to string_transcode; this function now seems to work okay > (tested using a dummy 'encode' op added to my local copy of core.ops) Applied, thanks. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: JIT me some speed!
In message <[EMAIL PROTECTED]> Daniel Grunblatt <[EMAIL PROTECTED]> wrote: > On Fri, 21 Dec 2001, Tom Hughes wrote: > > > I suspect it is also rather questionable to call system calls > > directly rather than going via their C library veneers - that is > > even more true when you come to things (like socket calls) which > > are system calls on some machines and functions on others. > > We are not always calling system calls directly, we can use the C library > when ever we need it, check out the .jit syntax. I did have a brief look last night but I must have missed that. No problem on that front then. Incidentally the JIT times are definitely impressive... Times for a 1.33 GHz Athlon are like this: dutton [~/src/parrot] % ./test_parrot ./examples/assembly/mops.pbc Iterations:1 Estimated ops: 2 Elapsed time: 4.806858 M op/s:41.607220 dutton [~/src/parrot] % ./test_parrot -j ./examples/assembly/mops.pbc Iterations:1 Estimated ops: 2 Elapsed time: 0.300258 M op/s:666.093736 dutton [~/src/parrot] % ./examples/assembly/mops Iterations:1 Estimated ops: 2 Elapsed time: 0.324787 M op/s:615.788117 Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: JIT me some speed!
In message <[EMAIL PROTECTED]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > To run a program with the JIT, pass test_parrot the -j flag and watch it > scream. Well, scream if you're on x86 Linux or BSD (I get a speedup on > mops.pbc of 35x) but it's a darned good place to start. It does seem to be quite impressively fast. Faster even than the compiled version of mops on my machine... It looks like it is going to need some work before it can work for other instruction sets though, at least for RISC systems where the operands are typically encoded with the opcode as part of a single word and the range of immediate constants is often restricted. I'm thinking it will need some way of indicating field widths and shifts for the operands and opcode so they can be merged into an instruction word and also some way of handling a constant pool so that arbitrary addresses can be loaded using PC relative loads. I suspect it is also rather questionable to call system calls directly rather than going via their C library veneers - that is even more true when you come to things (like socket calls) which are system calls on some machines and functions on others. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
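The "field widths and shifts" idea for RISC targets might look roughly like the sketch below. The demo_field descriptor and both helpers are hypothetical and are not part of the JIT as posted; the example assumes a 32-bit instruction word and unsigned fields only:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one field in a 32-bit instruction word. */
typedef struct {
    unsigned shift; /* bit position of the field's least significant bit */
    unsigned width; /* field width in bits, 1..31 */
} demo_field;

/* Returns 1 if value fits in the field (treated as unsigned), else 0. */
static int demo_field_fits(demo_field f, uint32_t value)
{
    assert(f.width >= 1 && f.width <= 31 && f.shift + f.width <= 32);
    return value < ((uint32_t)1 << f.width);
}

/* Merge value into word at the position described by f. */
static uint32_t demo_field_insert(uint32_t word, demo_field f, uint32_t value)
{
    uint32_t mask = (((uint32_t)1 << f.width) - 1) << f.shift;

    assert(demo_field_fits(f, value));
    return (word & ~mask) | (value << f.shift);
}
```

With a made-up layout of a 6-bit opcode at bit 26, a 5-bit register at bit 21 and a 16-bit immediate at bit 0, inserting 0x23, 4 and 0x1234 gives the word 0x8C801234. An operand for which demo_field_fits() returns 0 is exactly the case that would have to go through a constant pool and a PC relative load.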
Re: Bytecode portablilty
In message <20011210133529.EYKY11472.femail13.sdc1.sfba.home.com@there> Bryan C. Warnock <[EMAIL PROTECTED]> wrote: > On Monday 10 December 2001 03:06 am, Tom Hughes wrote: > > In message <20011210011601$[EMAIL PROTECTED]> > > > > Actually VAXes have perfectly ordinary endianness - it was PDPs that > > had the middle endian layout. > > Who's got the 16 bittish little endian layout ("21436587")? (Perhaps it's > wrong to categorize that as endianness.) I always believed it to be one or more of the PDP machines - most unix systems call it PDP endian in their header files. That said the jargon file lists the PDP 10 as big endian and the PDP 11 as little endian, and has this to say about the third form: middle-endian adj. Not big-endian or little-endian. Used of perverse byte orders such as 3-4-1-2 or 2-1-4-3, occasionally found in the packed-decimal formats of minicomputer manufacturers who shall remain nameless. Certainly the VAX is a perfectly ordinary little endian system. > > Presumably that's G_Floating that you're converting to/from for > > the VAX rather than D_Floating? > > Yes. Is that going to be a problem? (The sum of programs I've written on > a VAX can be represented with 1 digit. In base 2.) Well VAXC defaults to using D_Floating for doubles but can be made to use G_Floating instead with a switch to the compiler. I'm not sure whether that makes it a problem or not. > I've paper code for converting to and from D_Floating (for general data > migration), but it's range is too restrictive for my liking for floating > point constants inside of bytecode. If this is bumpkis, someone clue me > in, por favor. As you say the exponent is more restricted (it has the same size as in F_Floating which is the single precision format) but the trade off is that the mantissa is larger so you get greater precision at the expense of less range. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
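For reference, the three byte orders being discussed can be written down as a small loader. This is only a sketch with invented names (demo_order, demo_load32); it treats the "PDP" layout as 16-bit words stored most significant word first with the bytes inside each word swapped, which is what most unix headers mean by PDP endian:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical byte order tags, for illustration only. */
typedef enum {
    DEMO_BIG    = 0, /* 0x0A0B0C0D stored as 0A 0B 0C 0D */
    DEMO_LITTLE = 1, /* 0x0A0B0C0D stored as 0D 0C 0B 0A */
    DEMO_PDP    = 2  /* 0x0A0B0C0D stored as 0B 0A 0D 0C */
} demo_order;

/* Assemble a 32-bit value from four bytes stored in the given order. */
static uint32_t demo_load32(const unsigned char b[4], demo_order order)
{
    /* idx[order][k] is the position in b of the k-th most significant byte */
    static const unsigned char idx[3][4] = {
        { 0, 1, 2, 3 }, /* big endian    */
        { 3, 2, 1, 0 }, /* little endian */
        { 1, 0, 3, 2 }  /* PDP style     */
    };
    const unsigned char *p;

    assert(order == DEMO_BIG || order == DEMO_LITTLE || order == DEMO_PDP);
    p = idx[order];
    return ((uint32_t)b[p[0]] << 24) | ((uint32_t)b[p[1]] << 16)
         | ((uint32_t)b[p[2]] << 8)  |  (uint32_t)b[p[3]];
}
```

Because the loader works on bytes rather than casting pointers, it gives the same answer whatever the byte order of the machine running it.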
Re: Bytecode portablilty
In message <20011210011601$[EMAIL PROTECTED]> "Bryan C. Warnock" <[EMAIL PROTECTED]> wrote: > - Endianness. The three major types are Big, Little, and Vaxian. > Supporting these three should handle the majority of cases. Actually VAXes have perfectly ordinary endianness - it was PDPs that had the middle endian layout. > - Floating point representations. The four major types are IEEE(ish), > Vaxian, Cray's CRI, and the IBM/370 hexadecimal format. There are some > minor variations among these, particularly with how much of the > IEEE-754 standard floating point operations adhere to. However, > adherence falls more into Portability Layer Three, and we will solely > address representation. Of course there are also about five variants of floating point format on the VAX although only two are 64 bits in size. Some of those exist (or are emulated) on Alpha as well although that also has IEEE types. > - I've code that currently converts 32, 64, 96, and 128 bit floating > point representations among all but the IBM format (for which I have > the algorithms on paper, but nowhere to test), optimized for both 32 > bit and 64 bit support. Although 96 and 128 bit handling is currently > hardcoded specifically for conversions between long doubles on x86 > machines and 64 bit processors, I've got alpha code for casting among > arbitrary types. (For casting to and from 32 bit floats on machines > that have no such type, for instance.) IEEE semantics are *not* > supported, and are still a matter for discussion. The implementation > of over- and underflow conversion to BigFloat is missing, for obvious > reasons. I'm still trying to come up with a better interface and > implementation, however. Presumably that's G_Floating that you're converting to/from for the VAX rather than D_Floating? Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: Moving string -> number conversions to string libs
In message <[EMAIL PROTECTED]> Bart Lateur <[EMAIL PROTECTED]> wrote: > On Thu, 06 Dec 2001 00:16:34 GMT, Tom Hughes wrote: > > >So far I have added as is_digit() call to the character type layer > >to replace the existing isdigit() calls. > > There seems to be an overlap with the /\d/ character class in regexes. > Can't you use the same test? Can't you use the definition of that > character class, whatever form it may be in? Well presumably the regex code should use the character type of the string it is matching against when processing \d. There isn't any regex code in yet though is there? Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: Moving string -> number conversions to string libs
In message <[EMAIL PROTECTED]> James Mastros <[EMAIL PROTECTED]> wrote: > Right. Unfortunately, after starting on this, I realized that that's the easy > part. Unicode has a fairly-well defined way of figuring out if a character > is a digit (see if its category is Nd (Number/digit), and if so what its > value is (the value of the "decimal" property.) Can it also tell you the base used for digit strings in that character set... Actually I don't know if there are any modern writing systems that don't use base ten but certainly if you were dealing with some ancient scripts that used sexagesimal numbers that might be a problem ;-) > However, there appears to be no good way of determining if something is a > decimal point, a sign indicator, or an E/e (exponent signifier). I suspected there wouldn't be. > The attached patch will let the chartype layer decide if a character is a > digit, and what its value is. The patch seems to be missing though... > Note also that is_digit should now return the value of the digit if it is a > digit, or 42 if it isn't. (I had to use something, and ~0 sometimes wanted > to be (char)~0, and sometimes (INTVAL)~0, so I decided not to use ~0. 0, of > course, can't be used for not-a-digit, since is_digit('0')==0. I was assuming there would be a separate digit_value() routine to avoid that problem. Apart from anything else there will doubtless be many other is_xxx() routines in due course which will be simple boolean tests. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: Moving string -> number conversions to string libs
In message <[EMAIL PROTECTED]> James Mastros <[EMAIL PROTECTED]> wrote: > On Mon, 3 Dec 2001, Tom Hughes wrote: > > It's completely wrong I would have thought - the encoding layer > > cannot know that a given code point is a digit so it can't possibly > > do string to number conversion. > > > > You need to use the encoding layer to fetch each character and > > then the character set layer to determine what digit it represents. > Right. And then you need to apply some unified logic to get from this > vector of digits (and other such symbols) to a value. Indeed, and that logic needs to be in the string layer where it can use both the encoding routines and the character type routines. I have just rearranged things to reflect that. > I'm just having nightmares of subtly different definitions of what a > numeric constant looks like depending on the string encoding, because of > different bits o' code not being quite in sync. Code duplication bad, > code sharing good. Absolutely. That code is now in one place. > (The charset layer should still be involved somewhere, because Unicode > (for ex) has a "digit value" property. This makes, say, Arabic numerals > (which don't look at all like what a normal person calls Arabic numerals, > BTW) work properly. (OTOH, it might also do strange things with ex > Hebrew, where the letters are also numbers (Aleph is also 1, Bet is also > 2, etc.)) So far I have added an is_digit() call to the character type layer to replace the existing isdigit() calls. To do things completely right we need to extend that with calls to get the digit value, check for sign characters etc, rather than assuming ASCIIish like it does now. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
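The division of labour described here (encoding layer fetches characters, character type layer classifies them, string layer holds the conversion logic) can be sketched as follows. The vtable shapes and every demo_ and sb_/ascii_ name are assumptions made for the example, not the real parrot interfaces, and the sketch only handles unsigned base ten integers with no overflow check:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding layer: decode one code point, return bytes used. */
typedef struct {
    size_t (*decode)(const unsigned char *p, unsigned long *cp);
} demo_encoding;

/* Hypothetical character type layer: separate boolean test and value. */
typedef struct {
    int (*is_digit)(unsigned long cp);
    int (*digit_value)(unsigned long cp);
} demo_chartype;

/* A single byte encoding and an ASCII-like character type as examples. */
static size_t sb_decode(const unsigned char *p, unsigned long *cp)
{
    *cp = *p;
    return 1;
}
static int ascii_is_digit(unsigned long cp)    { return cp >= '0' && cp <= '9'; }
static int ascii_digit_value(unsigned long cp) { return (int)(cp - '0'); }

static const demo_encoding demo_singlebyte = { sb_decode };
static const demo_chartype demo_usascii    = { ascii_is_digit, ascii_digit_value };

/* String layer: the only place that knows how digits combine into a value. */
static long demo_to_int(const unsigned char *buf, size_t len,
                        const demo_encoding *enc, const demo_chartype *type)
{
    long   value = 0;
    size_t pos   = 0;

    assert(buf != NULL && enc != NULL && type != NULL);

    while (pos < len) {
        unsigned long cp;
        size_t used = enc->decode(buf + pos, &cp);

        if (!type->is_digit(cp))
            break; /* stop at the first non-digit */
        value = value * 10 + type->digit_value(cp);
        pos += used;
    }
    return value;
}
```

Keeping is_digit() a plain boolean and putting the value in digit_value() avoids needing a magic "not a digit" return value.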
Re: Moving string -> number conversions to string libs
In message <[EMAIL PROTECTED]> Simon Cozens <[EMAIL PROTECTED]> wrote: > On Mon, Dec 03, 2001 at 05:42:15PM +, Alex Gough wrote: > > The string to number conversion stuff should really be done by the > > string encodings... I think this is the right way to get this > > happening, comments? > > Looks like the right way to me. Could you commit it? It's completely wrong I would have thought - the encoding layer cannot know that a given code point is a digit so it can't possibly do string to number conversion. You need to use the encoding layer to fetch each character and then the character set layer to determine what digit it represents. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: Butt-ugliness reduction
In message <[EMAIL PROTECTED]> Michael L Maraist <[EMAIL PROTECTED]> wrote: > inlined c-functions.. Hmm, gcc has some support for this, but what about > other architectures.. For function-inlining to work with GCC, you have to > define the function in the header.. That's definitely not portable. I guess > you're saying that the inlined functions would be the same .c file as it's > being used.. Well, I thought these classes might span multiple files, making > that rather difficult. You only need to define it in the header if it needs to be visible across more than one file - if it is only needed in the file that is implementing the scalar class then it can be put there. In fact many compilers will inline small static functions anyway even without an explicit hint in the source. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [PATCHES] ord(i,s|sc(,i|ic)?) operator committed, fixed bug in concat()
In message <[EMAIL PROTECTED]> Jeff <[EMAIL PROTECTED]> wrote: > string.c - Added string_ord() and a _string_index() helper function to > help making accommodating different encodings easier. Patched concat() > to deal with null strings. I have just committed an amendment to this to make string_index use the encoding routines instead of assuming a single byte encoding. I have also renamed _string_index to string_index as function names that start with an underscore are reserved to implementors by the C standard. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [PATCHES] concat, read, substr, added 'ord' operator, and a SURPRISE
In message <[EMAIL PROTECTED]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 03:35 AM 11/11/2001 -0500, James Mastros wrote: > > >No, it isn't. I'm not sure s->strlen is always guaranteed to be correct; > >string_length(s) is. (I found a case where it was wrong when coding my > >version of ord() once, though that ended up being a problem with my > >version of chr(). The point is that string_length is an API, but the > >contents of the struct are not.) > > We shouldn't cheat--the string length field should be considered a black > box until we need the speed, at which point we play Macro Games and change > string_length into a direct fetch. As far as I know the strlen member should always be correct. I was certainly trying to make sure it was because strings.pod explicitly says that it will be and that it can be used directly instead of calling string_length(). Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [PATCH] Computed goto, super-fast dispatching.
In message <[EMAIL PROTECTED]> Daniel Grunblatt <[EMAIL PROTECTED]> wrote: > Do you want me to give you an account in my linux machine where I have > install gcc 3.0.2 so that you see it? I'm not sure that will achieve anything - it's not that I don't believe you, it's just that I'm not seeing the same thing. I have now tried on a number of other machines, and the results are summarised in the following table:

       Standard                 Computed Gotos
       Interpreted  Compiled    Interpreted      Compiled
A         3.35        33.56       4.63 (+38%)     29.83 (-11%)
B         5.69        85.24      14.08 (+147%)    78.60 (-8%)
C        15.09       314.91      31.83 (+111%)   259.34 (-18%)
D        45.87       774.73      62.37 (+36%)    795.30 (+3%)

Machine A is a 90MHz Pentium running RedHat 7.1 with gcc 2.96
Machine B is a Dual 200MHz Pentium-Pro running RedHat 6.1 with egcs 1.1.2
Machine C is a 733MHz Pentium III running FreeBSD 4.3-STABLE with gcc 2.95.3
Machine D is a 1333MHz Athlon running RedHat 7.1 with gcc 2.96

Clearly the speedup varies significantly between systems with some giving much greater improvements than others. One other thing that I did notice is that there is quite a bit of fluctuation between runs on some of the machines, possibly because we are measuring real time and not CPU time. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: [PATCH] Computed goto, super-fast dispatching.
In message <[EMAIL PROTECTED]> Daniel Grunblatt <[EMAIL PROTECTED]> wrote: > Yeap, I was right, using gcc 3.0.2 you can see the difference: I've just tried it with 3.0.1 and see much the same results as I did with 2.96 I'm afraid. I don't have 3.0.2 to hand without building it from source so I haven't tried that as yet. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [PATCH] Computed goto, super-fast dispatching.
In message <[EMAIL PROTECTED]> Daniel Grunblatt <[EMAIL PROTECTED]> wrote: > All: > Here's a list of the things I've been doing: > > * Added ops2cgc.pl which generates core_cg_ops.c and core_cg_ops.h from > core.ops, and modified Makefile.in to use it. In core_cg_ops.c resides > cg_core which has an array with the addresses of the label of each opcode > and starts the execution "jumping" to the address in array[*cur_opcode]. > > * Modified interpreter.c to include core_cg_ops.h > > * Modified runcore_ops.c to discard the actual dispatching method and call > cg_core, but left everything else untouched so that -b,-p and -t keep > working. > > * Modified pbc2c.pl to use computed goto when handling jump or ret, may be > I can modified this once again not to define the array with the addresses > if it's not going to be used but I don't think that in real life a program > won't use jump or ret, am I right? > > Hope some one find this usefull. I just tried it but I don't seem to be seeing anything like the speedups you are. All the times which follow are for a K6-200 running RedHat 7.2 and compiled -O6 with gcc 2.96. Without patch: gosford [~/src/parrot] % ./test_prog examples/assembly/mops.pbc Iterations:1 Estimated ops: 3 Elapsed time: 37.387179 M op/s:8.024141 gosford [~/src/parrot] % ./examples/assembly/mops Iterations:1 Estimated ops: 3 Elapsed time: 3.503482 M op/s:85.629098 With patch: gosford [~/src/parrot-cg] % ./test_prog examples/assembly/mops.pbc Iterations:1 Estimated ops: 3 Elapsed time: 29.850361 M op/s:10.050130 gosford [~/src/parrot-cg] % ./examples/assembly/mops Iterations:1 Estimated ops: 3 Elapsed time: 4.515596 M op/s:66.436413 So there is a small speed up for the interpreted version, but nothing like the three times speedup you had. The compiled version has actually managed to get slower... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: String rationale
In message <[EMAIL PROTECTED]> Simon Cozens <[EMAIL PROTECTED]> wrote: > As things stand, that won't work, because you're doing a string lookup in one > of the core functions, and you still need some way of registering incoming > stuff. With an enum, you can keep hold of a fake encoding_max, and hand > encoding_max++ to the initialisation function for each encoding. Well there won't be any point in it being an enum rather than an integer unless some of them are going to be preallocated. I'm not sure if the encoding and character types will need to know their own index numbers but if we do then they can be told at initialisation time, yes. I absolutely intend that the current hard coded strings in the core will go away in due course though. When you look up an encoding or character type by name it will first check a hash table or something to see if it is already loaded and if not it will look for it on disk and load it in, allocate it a number, and add it to the hash table for future reference. Hence the current strcmp junk in the lookup functions will go away. In much the same way the byte code will have some sort of table of names which it will look up as it is loaded rather than the current hard coding of name to number mappings in the byte code. So all I need now to make all this work is hash tables and dynamic code loading ;-) Any volunteers... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
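The lookup-then-register behaviour might be sketched like this. It is a stand-in only: a small fixed array with a linear search replaces the hash table, nothing is actually loaded from disk, and demo_encoding_lookup is an invented name:

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_ENCODINGS 16

/* Names registered so far; the array index doubles as the runtime ID. */
static const char *demo_names[DEMO_MAX_ENCODINGS];
static int         demo_count = 0;

/* Return the ID for name, allocating a new one on first use.
 * Returns -1 if the table is full. */
static int demo_encoding_lookup(const char *name)
{
    int i;

    assert(name != NULL);

    /* Already known? (A real version would use a hash table here.) */
    for (i = 0; i < demo_count; i++)
        if (strcmp(demo_names[i], name) == 0)
            return i;

    if (demo_count == DEMO_MAX_ENCODINGS)
        return -1;

    /* A real version would locate and load the encoding at this point. */
    demo_names[demo_count] = name; /* caller must keep name alive */
    return demo_count++;
}
```

The IDs handed out depend on the order in which names are first looked up, so they are only meaningful within one run.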
Re: String rationale
In message <[EMAIL PROTECTED]> Simon Cozens <[EMAIL PROTECTED]> wrote: > On Sat, Oct 27, 2001 at 04:23:48PM +0100, Tom Hughes wrote: > > The encoding_lookup() and chartype_lookup() routines will obviously > > need to load the relevant libraries on the fly when we have support > > for that. > > Could you try rewriting them using an enum, like the vtable stuff and > the original string encoding stuff does? The intention is that when an encoding or character type is loaded it will be allocated a unique ID number that can be used internally to refer to it, but that the number will only be valid for the duration of that instance of parrot rather than being persistent. That's certainly the way Dan described it happening in his rationale which is what my code is based on. Allocating them globally is not possible if we're going to allow people to add arbitrary encodings and character sets - as things stand adding the foo encoding will be as simple as adding foo.so to the encodings directory. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
Re: String rationale
In message <[EMAIL PROTECTED]> Tom Hughes <[EMAIL PROTECTED]> wrote: > In message <[EMAIL PROTECTED]> > Dan Sugalski <[EMAIL PROTECTED]> wrote: > > > At 04:23 PM 10/27/2001 +0100, Tom Hughes wrote: > > > > >Attached is my first pass at this - it's not fully ready yet but > > >is something for people to cast an eye over before I spend lots of > > >time going down the wrong path ;-) > > > > It looks pretty good on first glance. > > I've done a bit more work now, and the latest version is attached. Unless anybody has objections I plan to commit this work shortly... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: String rationale
In message <[EMAIL PROTECTED]> James Mastros <[EMAIL PROTECTED]> wrote: > On Mon, Oct 29, 2001 at 11:20:47PM +0000, Tom Hughes wrote: > > > I suspect that the encode and decode methods in the encoding vtable > > are enough for doing chr/ord aren't they? > > Hmm... come to think of it, yes. chr will always create a utf32-encoded > string with the given charset number (or unicode for the two-arg version), > ord will return the codepoint within the current charset. I hope it will create a string with the given charset number and using the default encoding for that charset. Asking for an ASCII character and getting it UTF-32 encoded would be more than a little bizarre. If I say chr(65,ASCII) then I would expect to get a single byte encoded string... > (This, BTW, means that only encodings that feel like it have to provide > either, but all encodings must be able to convert to utf32.) The way I've written it, any encoding can convert to any encoding at all, because there is no conversion at that level. I just decode a character from the source, transcode it at the character level, and then encode it to the destination. If an encoding cannot handle the full range of character values for a character set then you will get an exception when it tries to encode an out of range character. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu
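The decode / transcode / encode loop described in the message above might look something like the following. This is only a sketch under stated assumptions, not the code from the patch: the struct layout, the names singlebyte, utf32_host, ascii_to_unicode and transcode() are all made up for illustration, only fixed-width encodings are handled, UTF-32 is stored in host byte order, and a -1 return stands in for the exception that would be raised on an out of range character.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t codepoint;

/* Hypothetical encoding vtable, fixed-width encodings only. */
struct encoding {
    size_t    width;                                   /* bytes per character */
    codepoint (*decode)(const unsigned char *src);
    int       (*encode)(unsigned char *dst, codepoint c); /* -1: not representable */
};

static codepoint sb_decode(const unsigned char *src) { return src[0]; }
static int sb_encode(unsigned char *dst, codepoint c)
{
    if (c > 0xFF)
        return -1;                  /* stands in for raising an exception */
    dst[0] = (unsigned char)c;
    return 0;
}

static codepoint u32_decode(const unsigned char *src)
{
    codepoint c;
    memcpy(&c, src, sizeof c);      /* host byte order, assumed for brevity */
    return c;
}
static int u32_encode(unsigned char *dst, codepoint c)
{
    memcpy(dst, &c, sizeof c);
    return 0;
}

const struct encoding singlebyte = { 1, sb_decode, sb_encode };
const struct encoding utf32_host = { 4, u32_decode, u32_encode };

/* Character-level mapping between character sets, e.g. ASCII -> Unicode. */
typedef codepoint (*char_transcoder)(codepoint);
codepoint ascii_to_unicode(codepoint c) { return c; }

/* Decode each character, map it, re-encode it. The encodings never talk to
 * each other directly. Returns characters written, or -1 on encode failure. */
long transcode(const struct encoding *from, const unsigned char *src,
               size_t nchars, const struct encoding *to,
               unsigned char *dst, char_transcoder map)
{
    size_t i;

    for (i = 0; i < nchars; i++) {
        codepoint c = map(from->decode(src + i * from->width));
        if (to->encode(dst + i * to->width, c) != 0)
            return -1;
    }
    return (long)nchars;
}
```

Because the only contact point is the codepoint passed between decode and encode, adding an encoding means writing one decode and one encode routine rather than a converter for every other encoding.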
Re: String rationale
In message <[EMAIL PROTECTED]> James Mastros <[EMAIL PROTECTED]> wrote: > > That leaves the third, which is what I have implemented. When looking to > > transcode from A to B it will first ask A if it can transcode to B and > > if that fails then it will ask B if it can transcode from A. > I propose another variant on this: > If that fails, it asks A to transcode to Unicode, and B to transcode from > Unicode. (Not Unicode to transcode to B; Unicode implements no transcodings.) My code does that, though at a slightly higher level. If you look at string_transcode() you will see that if it can't find a direct mapping it will go via unicode. If C had closures then I'd have buried that down in the chartype_lookup_transcoder() layer, but it doesn't so I couldn't ;-) > > The problem it raises is, who is responsible for transcoding from ASCII to > > Latin-1, and back again? If we're not careful both ends will implement > > both translations and we will have effective duplication. > 1) Neither. Each must support transcoding to and from Unicode. Absolutely. > 2) But either can support converting directly if it wants. The danger is that everybody tries to be clever and support direct conversion to and from as many other character sets as possible, which leads to lots of duplication. > I also think that, for efficiency, we might want a "7-bit chars match ASCII" > flag, since most character sets do, and that means that we don't have to deal > with the overhead for strings that fit in 7 bits. This smells of premature > optimization, though, so somebody just file this away in their heads for > future reference. I have already been thinking about this although it does get more complicated as you have to consider the encoding as well - if you have a single byte encoded ASCII string then transcoding to a single byte encoded Latin-1 string is a no-op, but that may not be true for other encodings if such a thing makes sense for those character types.
> (BTW, for those paying attention, I'm waiting on this discussion for my > chr/ord patch, since I want them in terms of charsets, not encodings.) I suspect that the encode and decode methods in the encoding vtable are enough for doing chr/ord aren't they? Surely chr() is just encoding the argument in the chosen encoding (which can be the default encoding for the char type if you want) and then setting the type and encoding of the resulting string appropriately. Equally ord() is decoding the first character of the string to get a number. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
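To illustrate the "go via unicode when there is no direct mapping" fallback mentioned in this message: since C has no closures, a single function pointer cannot capture the two-step composition, so the caller has to hold both halves and apply them in turn. The sketch below is an assumption about the shape of that logic, not the actual string_transcode(); the struct via_unicode, transcode_char() and the two demo mappings are hypothetical, and substituting '?' for unrepresentable characters is just a placeholder choice.

```c
#include <stddef.h>

typedef unsigned long codepoint;
typedef codepoint (*transcoder)(codepoint);

/* Hypothetical pair of mappings used when no direct transcoder exists. */
struct via_unicode {
    transcoder to_unicode;      /* source chartype -> Unicode */
    transcoder from_unicode;    /* Unicode -> destination chartype */
};

/* Use the direct mapping if there is one, otherwise compose the two halves. */
codepoint transcode_char(codepoint c, transcoder direct,
                         const struct via_unicode *via)
{
    if (direct != NULL)
        return direct(c);
    return via->from_unicode(via->to_unicode(c));
}

/* Demo mappings: Latin-1 is the first 256 Unicode code points. */
codepoint latin1_to_unicode(codepoint c) { return c; }

/* '?' for anything outside ASCII is an assumption made for this sketch. */
codepoint unicode_to_ascii(codepoint c) { return c < 128 ? c : '?'; }
```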
RE: String rationale
In message <[EMAIL PROTECTED]> "Stephen Howard" <[EMAIL PROTECTED]> wrote: > right. I had just keyed in on this from Tom's message: > > "My code currently allows either set to provide the transform on the > grounds that otherwise the unicode module would have to either know > how to convert to everything else or from everything else." > > ...which seemed to posit that Unicode module could be responsible for > all the transcodings to and from its own character set, which seemed > backwards to me. I was only positing it long enough to acknowledge that such a rule was untenable. What it comes down to is that there are three possible rules, namely: 1. Each character set defines transforms from itself to other character sets. 2. Each character set defines transforms to itself from other character sets. 3. Each character set defines transforms both from itself to other character sets and from other character sets to itself. We have established that the first two will not work because of the unicode problem. That leaves the third, which is what I have implemented. When looking to transcode from A to B it will first ask A if it can transcode to B and if that fails then it will ask B if it can transcode from A. That way each character set can manage its own translations both to and from unicode as we require. The problem it raises is, who is responsible for transcoding from ASCII to Latin-1, and back again? If we're not careful both ends will implement both translations and we will have effective duplication. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
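The "ask A, then ask B" lookup from rule 3 can be sketched roughly as below. The struct chartype layout and every name in it are assumptions made for illustration (the real chartype vtable in the patch may look quite different). In the demo, usascii knows how to get to and from unicode, while unicode itself provides no transforms at all, which is the situation the rule is meant to cope with.

```c
#include <stddef.h>
#include <string.h>

typedef unsigned long codepoint;
typedef codepoint (*transcoder)(codepoint);

/* Hypothetical chartype: each set may offer transforms in both directions. */
struct chartype {
    const char *name;
    transcoder (*transcode_to)(const char *to_name);     /* NULL if unknown */
    transcoder (*transcode_from)(const char *from_name); /* NULL if unknown */
};

codepoint ascii_to_unicode(codepoint c) { return c; }
/* '?' for non-ASCII input is a placeholder choice for this sketch. */
codepoint unicode_to_ascii(codepoint c) { return c < 128 ? c : '?'; }

static transcoder ascii_to(const char *to_name)
{
    return strcmp(to_name, "unicode") == 0 ? ascii_to_unicode : NULL;
}
static transcoder ascii_from(const char *from_name)
{
    return strcmp(from_name, "unicode") == 0 ? unicode_to_ascii : NULL;
}
static transcoder no_transcoder(const char *name)
{
    (void)name;
    return NULL;                    /* unicode offers no transforms itself */
}

const struct chartype usascii = { "usascii", ascii_to, ascii_from };
const struct chartype unicode = { "unicode", no_transcoder, no_transcoder };

/* First ask the source set, then fall back to asking the destination set. */
transcoder lookup_transcoder(const struct chartype *from,
                             const struct chartype *to)
{
    transcoder t = from->transcode_to(to->name);
    if (t == NULL)
        t = to->transcode_from(from->name);
    return t;
}
```

Going from unicode to usascii only works because of the second question, which is the case that rules 1 and 2 on their own cannot handle without the unicode module knowing about every other set.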
Re: String rationale
In message <[EMAIL PROTECTED]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > At 04:23 PM 10/27/2001 +0100, Tom Hughes wrote: > > >Attached is my first pass at this - it's not fully ready yet but > >is something for people to cast an eye over before I spend lots of > >time going down the wrong path ;-) > > It looks pretty good on first glance. I've done a bit more work now, and the latest version is attached. This version can do transcoding. The intention is that there will be some sort of cache in chartype_lookup_transcoder to avoid repeating the expensive lookups by name too much. One interesting question is who is responsible for transcoding from character set A to character set B - is it A or B? and how about the other way? My code currently allows either set to provide the transform on the grounds that otherwise the unicode module would have to either know how to convert to everything else or from everything else. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/ # This is a patch for parrot to update it to parrot-ns # # To apply this patch: # STEP 1: Chdir to the source directory. # STEP 2: Run the 'applypatch' program with this patch file as input. # # If you do not have 'applypatch', it is part of the 'makepatch' package # that you can fetch from the Comprehensive Perl Archive Network: # http://www.perl.com/CPAN/authors/Johan_Vromans/makepatch-x.y.tar.gz # In the above URL, 'x' should be 2 or higher. # # To apply this patch without the use of 'applypatch': # STEP 1: Chdir to the source directory. # If you have a decent Bourne-type shell: # STEP 2: Run the shell with this file as input. # If you don't have such a shell, you may need to manually create/delete # the files/directories as shown below. # STEP 3: Run the 'patch' program with this file as input. 
# # These are the commands needed to create/delete files/directories: # mkdir 'chartypes' chmod 0755 'chartypes' mkdir 'encodings' chmod 0755 'encodings' rm -f 'transcode.c' rm -f 'strutf8.c' rm -f 'strutf32.c' rm -f 'strutf16.c' rm -f 'strnative.c' rm -f 'include/parrot/transcode.h' rm -f 'include/parrot/strutf8.h' rm -f 'include/parrot/strutf32.h' rm -f 'include/parrot/strutf16.h' rm -f 'include/parrot/strnative.h' touch 'chartype.c' chmod 0644 'chartype.c' touch 'chartypes/unicode.c' chmod 0644 'chartypes/unicode.c' touch 'chartypes/usascii.c' chmod 0644 'chartypes/usascii.c' touch 'encoding.c' chmod 0644 'encoding.c' touch 'encodings/singlebyte.c' chmod 0644 'encodings/singlebyte.c' touch 'encodings/utf16.c' chmod 0644 'encodings/utf16.c' touch 'encodings/utf32.c' chmod 0644 'encodings/utf32.c' touch 'encodings/utf8.c' chmod 0644 'encodings/utf8.c' touch 'include/parrot/chartype.h' chmod 0644 'include/parrot/chartype.h' touch 'include/parrot/encoding.h' chmod 0644 'include/parrot/encoding.h' # # This command terminates the shell and need not be executed manually. 
exit # End of Preamble Patch data follows diff -c 'parrot/MANIFEST' 'parrot-ns/MANIFEST' Index: ./MANIFEST *** ./MANIFEST Sun Oct 28 17:11:21 2001 --- ./MANIFEST Sun Oct 28 17:11:07 2001 *** *** 1,5 --- 1,8 assemble.pl ChangeLog + chartype.c + chartypes/unicode.c + chartypes/usascii.c classes/genclass.pl classes/intclass.c classes/scalarclass.c *** *** 15,20 --- 18,28 docs/parrotbyte.pod docs/strings.pod docs/vtables.pod + encoding.c + encodings/singlebyte.c + encodings/utf8.c + encodings/utf16.c + encodings/utf32.c examples/assembly/bsr.pasm examples/assembly/call.pasm examples/assembly/euclid.pasm *** *** 30,35 --- 38,45 global_setup.c hints/mswin32.pl hints/vms.pl + include/parrot/chartype.h + include/parrot/encoding.h include/parrot/events.h include/parrot/exceptions.h include/parrot/global_setup.h *** *** 46,56 include/parrot/runops_cores.h include/parrot/stacks.h include/parrot/string.h - include/parrot/strnative.h - include/parrot/strutf16.h - include/parrot/strutf32.h - include/parrot/strutf8.h - include/parrot/transcode.h include/parrot/trace.h include/parrot/unicode.h interpreter.c --- 56,61 *** *** 108,117 runops_cores.c stacks.c string.c - strnative.c - strutf16.c - strutf32.c - strutf8.c test_c.in test_main.c Test/More.pm --- 113,118 *** *** 129,135 t/op/time.t t/op/trans.t trace.c - transcode.c Types_pm.in vtable_h.pl vtable.tbl --- 130,135 diff -c &
Re: Opcode complaints
In message <[EMAIL PROTECTED]> "Brent Dax" <[EMAIL PROTECTED]> wrote: > 4. eq and friends: string variants > One thing that seems to be missing is string and numeric variants on the > comparison ops. While this isn't a problem now, it may be once we get > PMCs. Both string and numeric versions of the comparison ops exist... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: String rationale
In message <[EMAIL PROTECTED]> Tom Hughes <[EMAIL PROTECTED]> wrote: > Attached is my first pass at this - it's not fully ready yet but > is something for people to cast an eye over before I spend lots of > time going down the wrong path ;-) Before anybody else spots it, let me just add what I forgot to mention in my original post, which is that transcoding isn't implemented yet as I'm still thinking about the best way to do it. There is a hook in place ready for it though. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: String rationale
In message <[EMAIL PROTECTED]> Tom Hughes <[EMAIL PROTECTED]> wrote: > Other than that it looked quite good and I'll probably start looking at > bending the existing code into the new model over the weekend. Attached is my first pass at this - it's not fully ready yet but is something for people to cast an eye over before I spend lots of time going down the wrong path ;-) The encoding_lookup() and chartype_lookup() routines will obviously need to load the relevant libraries on the fly when we have support for that. The packfile stuff is just a hack to make it work for now. Presumably we will have to modify the byte code format to record the string types as names or something so we can look them up properly? String comparison is not language sensitive here - as before it just compares based on character values. Other than that I think it's aiming in the right direction and it does pass all the tests... Please correct me if I'm wrong. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/ # This is a patch for parrot to update it to parrot-ns # # To apply this patch: # STEP 1: Chdir to the source directory. # STEP 2: Run the 'applypatch' program with this patch file as input. # # If you do not have 'applypatch', it is part of the 'makepatch' package # that you can fetch from the Comprehensive Perl Archive Network: # http://www.perl.com/CPAN/authors/Johan_Vromans/makepatch-x.y.tar.gz # In the above URL, 'x' should be 2 or higher. # # To apply this patch without the use of 'applypatch': # STEP 1: Chdir to the source directory. # If you have a decent Bourne-type shell: # STEP 2: Run the shell with this file as input. # If you don't have such a shell, you may need to manually create/delete # the files/directories as shown below. # STEP 3: Run the 'patch' program with this file as input. 
# # These are the commands needed to create/delete files/directories: # mkdir 'chartypes' chmod 0755 'chartypes' mkdir 'encodings' chmod 0755 'encodings' rm -f 'transcode.c' rm -f 'strutf8.c' rm -f 'strutf32.c' rm -f 'strutf16.c' rm -f 'strnative.c' rm -f 'include/parrot/transcode.h' rm -f 'include/parrot/strutf8.h' rm -f 'include/parrot/strutf32.h' rm -f 'include/parrot/strutf16.h' rm -f 'include/parrot/strnative.h' touch 'chartype.c' chmod 0644 'chartype.c' touch 'chartypes/unicode.c' chmod 0644 'chartypes/unicode.c' touch 'chartypes/usascii.c' chmod 0644 'chartypes/usascii.c' touch 'encoding.c' chmod 0644 'encoding.c' touch 'encodings/singlebyte.c' chmod 0644 'encodings/singlebyte.c' touch 'encodings/utf16.c' chmod 0644 'encodings/utf16.c' touch 'encodings/utf32.c' chmod 0644 'encodings/utf32.c' touch 'encodings/utf8.c' chmod 0644 'encodings/utf8.c' touch 'include/parrot/chartype.h' chmod 0644 'include/parrot/chartype.h' touch 'include/parrot/encoding.h' chmod 0644 'include/parrot/encoding.h' # # This command terminates the shell and need not be executed manually. 
exit # End of Preamble Patch data follows diff -c 'parrot/MANIFEST' 'parrot-ns/MANIFEST' Index: ./MANIFEST *** ./MANIFEST Wed Oct 24 22:16:51 2001 --- ./MANIFEST Sat Oct 27 14:59:43 2001 *** *** 1,5 --- 1,8 assemble.pl ChangeLog + chartype.c + chartypes/unicode.c + chartypes/usascii.c classes/genclass.pl classes/intclass.c config_h.in *** *** 14,19 --- 17,27 docs/parrotbyte.pod docs/strings.pod docs/vtables.pod + encoding.c + encodings/singlebyte.c + encodings/utf8.c + encodings/utf16.c + encodings/utf32.c examples/assembly/bsr.pasm examples/assembly/call.pasm examples/assembly/euclid.pasm *** *** 29,34 --- 37,44 global_setup.c hints/mswin32.pl hints/vms.pl + include/parrot/chartype.h + include/parrot/encoding.h include/parrot/events.h include/parrot/exceptions.h include/parrot/global_setup.h *** *** 45,55 include/parrot/runops_cores.h include/parrot/stacks.h include/parrot/string.h - include/parrot/strnative.h - include/parrot/strutf16.h - include/parrot/strutf32.h - include/parrot/strutf8.h - include/parrot/transcode.h include/parrot/trace.h include/parrot/unicode.h interpreter.c --- 55,60 *** *** 107,116 runops_cores.c stacks.c string.c - strnative.c - strutf16.c - strutf32.c - strutf8.c test_c.in test_main.c Test/More.pm --- 112,117 *** *** 128,134 t/op/time.t t/op/trans.t trace.c - transcode.c Types_pm.in vtable_h.pl vtable.tbl --- 129,134 diff -c
Re: Ooops, sorry for that blank log message.
In message <[EMAIL PROTECTED]> Brian Wheeler <[EMAIL PROTECTED]> wrote: > Darn it, I fat fingered the log message. > > This is a fix which changes the way op variants are handled. The old > method "forgot" the last variant, so thing(i,i|ic,i|ic) would > generate: > thing(i,i,i) > thing(i,i,ic) > thing(i,ic,i) > > but not > > thing(i,ic,ic) It didn't forget it, it went to some considerable trouble to ignore it on the grounds that such an opcode is pointless as all the operands are constant. I did describe the algorithm used and the logic behind it on the list when I implemented it. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: String rationale
In message <[EMAIL PROTECTED]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > =item type > > What the character set or type of data is encoded in the buffer. This > includes things like ASCII, EBCDIC, Unicode, Chinese Traditional, > Chinese Simplified, or Shift-JIS. (And yes, I know the latter's a > combination of type and encoding. I'll update the doc as soon as I can > reasonably separate the two) Isn't this going to need to be a vtable pointer like encoding is? Only some things (like character classification and at least some transcoding tasks) will be character set based rather than encoding based. Other than that it looked quite good and I'll probably start looking at bending the existing code into the new model over the weekend. Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: Resync your CVS...
In message <[EMAIL PROTECTED]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > On Mon, 22 Oct 2001, Sam Tregar wrote: > > > Fresh checkout won't compile on Redhat Linux 7.1: > > Damn. It compiled cleanly before I checked it in. I'll patch up again and > see what I missed. Probably some odd dependency or timing issue > somewhere. (It's emacs fault! Yeah, that's the ticket! :) I'd already patched it up, so I've just committed my fix... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: [PATCH] Bugfix for push_generic_entry
In message <[EMAIL PROTECTED]> Jason Gloudon <[EMAIL PROTECTED]> wrote: > The "stacktest" patch will fail on the current CVS source, due to a bug in > push_generic_entry. This looks good to me so I have committed it. Thanks for spotting it! Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
Re: PMCs and how the opcode functions will work
In message <[EMAIL PROTECTED]> Simon Cozens <[EMAIL PROTECTED]> wrote: > I've now changed the vtable structure to reflect this, but I'd like someone > to confirm that the "variant" forms of the ops can be addressed the way I > think they can. (ie. structure->base_element + 1 to get "thing after > base_element") Legally speaking they can't as ISO C says that you can't do pointer calculations and comparisons across object boundaries and separate members of a structure are different objects. If you replace this: set_integer_method_t set_integer_1; set_integer_method_t set_integer_2; set_integer_method_t set_integer_3; set_integer_method_t set_integer_4; set_integer_method_t set_integer_5; with this: set_integer_method_t set_integer[5]; then you would be able to, as an array is all one object. Practically speaking I think it will work on every system that I can think of at the moment but who knows what weird things are out there... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/
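For what it's worth, the array form suggested above can be sketched like this. The signature given to set_integer_method_t, the struct name and the two demo methods are all made up for illustration (the real vtable methods take rather more arguments); the point is only that indexing within a single array member is well defined, whereas stepping from one named member to the next is not guaranteed by the standard.

```c
#include <stddef.h>

#define SET_INTEGER_VARIANTS 5

/* Illustrative signature only; the real method type is not shown here. */
typedef int (*set_integer_method_t)(int value);

/* All variants live in one array object, so indexing between them is
 * well-defined pointer arithmetic. */
struct vtable_sketch {
    set_integer_method_t set_integer[SET_INTEGER_VARIANTS];
};

/* Two hypothetical variants, just so there is something to call. */
int set_integer_plain(int value)  { return value; }
int set_integer_plus_1(int value) { return value + 1; }

/* Select a variant by offset from the base slot. The caller must keep
 * VARIANT below SET_INTEGER_VARIANTS. */
int call_set_integer(const struct vtable_sketch *vt, size_t variant, int value)
{
    return vt->set_integer[variant](value);
}
```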
Re: Have I given the big "The Way Strings Should Work" talk?
In message <[EMAIL PROTECTED]> Dan Sugalski <[EMAIL PROTECTED]> wrote: > I've given it a few places, but I don't know that I've sent it to > perl6-internals. If not, or if I should do it again, let me know. I want to > make sure we're all on the same page here. Not that I recall. I thought that was what strings.pod was... Tom -- Tom Hughes ([EMAIL PROTECTED]) http://www.compton.nu/