Re: IMCC temp optimizations...
A good register allocation algorithm will always have problems with big subs; there is nothing we can do about it. I think that what "real compilers" do to solve this problem is implement two different algorithms: one for normal subs, which tries to generate optimal code, and a naive one for very large subs with many virtual registers. That makes compilation much faster, and the execution penalty doesn't hurt too much.

Actually, it's (for me) an open question whether the "good" register allocation should be the default one. Perl (and python and...) users expect blazing compilation times, so maybe we should reserve it for higher -O levels. But then, we won't know how bad our compilation times are until there are "real" programs written in perl6/parrot.

-angel

Leopold Toetsch wrote:
> Dan Sugalski <[EMAIL PROTECTED]> wrote:
> > Dunno what parrot thinks--it's not done yet. grep says 569
> > .locals and 473 temp PMC registers.
>
> I've now enabled some more code that speeds up temp allocation more
> (~2.5 for 2000 temps in a unit). This needs that the usage range is
> limited to 10 lines. If the code from a compiler looks like the
> output from the program below, this works fine.
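For concreteness, the kind of "naive but fast" fallback I have in mind could be something like linear-scan allocation. A toy sketch in Python (purely illustrative: the interval format and function names are invented here, and a real allocator would of course work from actual liveness information):

```python
def linear_scan(intervals, num_regs):
    """Naive linear-scan allocation. `intervals` is a list of
    (name, start, end) live ranges; returns {name: reg or 'spill'}.
    Roughly O(n log n), so it stays fast even for huge subs, at the
    cost of a less optimal assignment than graph coloring."""
    alloc = {}
    active = []                      # (end, name, reg) currently live
    free = list(range(num_regs))
    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        # expire intervals that ended before this one starts
        for iv in [iv for iv in active if iv[0] < start]:
            active.remove(iv)
            free.append(iv[2])
        if free:
            reg = free.pop()
            active.append((end, name, reg))
            alloc[name] = reg
        else:
            alloc[name] = 'spill'    # no second chances, just spill
    return alloc
```

With 2 registers and four overlapping ranges, the third concurrent value spills, and a register is reused once its holder's range expires.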
Re: OT: Will the State of the Onion be posted online?
Monday 14 July 2003 05:56, Robert Spier wrote:
> > Sorry for a slightly off-topic post, but will Larry's State of
> > the Onion be posted online soon?
>
> Yes.
>
> http://dev.perl.org/perl6/talks/
>
> -R

Really? I can't find anything about TPC7 on that page...

-angel
Re: Using imcc as JIT optimizer
> [ you seem to be living some hours ahead in time ]

Yep, sorry about that.

> The problem stays the same: spilling processors to parrot's or
> parrots to array.

Thinking a bit more about it, I now believe that the best way to do it would be:

(1) First, do a register allocation for machine registers, assuming that there are N machine registers and infinite parrot registers.

(2) Second, do a register allocation for parrot registers, using an array as the spill area.

The first step assures us that we generate code that always puts data in the available machine registers, and tries to minimize moves between registers and physical memory. The second step tries to put all the data in parrot registers, and when that is not possible, spills it to the parrot spill area (currently a PerlArray).

For example, code generated by (1) would look like:

  set m3, 1       # m3 is the machine's 3rd register
  add m3, m3, 1
  print m3
  set $I1, m3     # $I1 is a parrot virtual register
  etc...

Then we would do register allocation for the virtual $I1 registers, hoping to be able to put them all in the 32 parrot registers.

I believe this would be the optimal way to do it, because it actually models our priorities: first put all data in physical registers, and otherwise try to do it in parrot registers. This is better than reserving the machine registers for the most used parrot registers (your original proposal) or doing a physical register allocation and assuming that we have an infinite number of parrot registers (my original proposal).

Hope it makes more sense now,

-angel
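A deliberately oversimplified model of the cascade, just to make the two phases concrete (all names invented; this toy ranks values by use count and ignores lifetimes entirely, which a real allocator must not do):

```python
def allocate(use_counts, num_regs, overflow_tag):
    """One allocation pass: the most-used values get a register at
    this level, the rest are tagged for the next level down."""
    ranked = sorted(use_counts, key=use_counts.get, reverse=True)
    placed = {v: ('reg', i) for i, v in enumerate(ranked[:num_regs])}
    placed.update({v: (overflow_tag, v) for v in ranked[num_regs:]})
    return placed

def two_phase(use_counts, machine_regs=3, parrot_regs=32):
    # Phase 1: machine registers first, overflowing into an
    # unlimited pool of parrot virtual registers.
    phase1 = allocate(use_counts, machine_regs, 'virtual')
    virtuals = {v: use_counts[v] for v, p in phase1.items()
                if p[0] == 'virtual'}
    # Phase 2: map those virtuals onto the 32 parrot registers,
    # spilling the remainder to an array (the PerlArray).
    phase2 = allocate(virtuals, parrot_regs, 'array')
    return phase1, phase2
```

With 40 values, 3 machine registers and 32 parrot registers, phase 1 keeps the 3 hottest values in machine registers and phase 2 leaves 5 in the array.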
Re: Using imcc as JIT optimizer
I explained it very badly. The issue is not spilling (at the parrot level). The problem is: if you only pick the highest-priority parrot registers and put them in real registers, you lose opportunities where copying the data once will save you from copying it many times. You are, in some sense, underspilling.

Let's see an example. Imagine you are compiling this imc, to be run on a machine which has 3 registers free (after temporaries):

  set $I1, 1
  add $I1, $I1, 1
  print $I1
  set $I2, 1
  add $I2, $I2, 1
  print $I2
  set $I3, 1
  add $I3, $I3, 1
  print $I3
  set $I4, 1
  add $I4, $I4, 1
  print $I4
  set $I5, 1
  add $I5, $I5, 1
  print $I5
  print $I1
  print $I2
  print $I3
  print $I4
  print $I5

Very silly code indeed, but you get the idea. Since we have only 5 vars, imcc would turn this into:

  set I1, 1
  add I1, I1, 1
  print I1
  set I2, 1
  add I2, I2, 1
  print I2
  set I3, 1
  add I3, I3, 1
  print I3
  set I4, 1
  add I4, I4, 1
  print I4
  set I5, 1
  add I5, I5, 1
  print I5
  print I1
  print I2
  print I3
  print I4
  print I5

Now, assuming you put registers I1-I3 in real registers, what would it take to execute this code in the JIT? It would have to move the values of I4 and I5 between memory and registers a total of 10 times (4 saves and 6 restores, if you assume the JIT is smart). [This particular example could be improved by making the JIT look at whether the same parrot register is going to be used in the next op, but that's not the point.]

But if IMCC knew that there were really only 3 registers in the machine, it would generate:

  set I1, 1
  add I1, I1, 1
  print I1
  set I2, 1
  add I2, I2, 1
  print I2
  set I3, 1
  add I3, I3, 1
  print I3
  fast_save I3, 1
  set I3, 1
  add I3, I3, 1
  print I3
  fast_save I3, 2
  set I3, 1
  add I3, I3, 1
  print I3
  fast_save I3, 3
  print I1
  print I2
  fast_restore I3, 3
  print I3
  fast_restore I3, 2
  print I3
  fast_restore I3, 1
  print I3

When running this code in the JIT, it would only require 6 moves (3 saves, 3 restores): exactly the ones generated by imcc.
In reality this would be even better, because as you have the guarantee of having the data already in real registers, you need fewer temporaries and so have more machine registers free.

> So the final goal could be, to emit these load/stores too, which
> then could be optimized to avoid duplicate loading/storing. Or imcc
> could emit a register move, if in the next instruction the parrot
> register is used again.

Yes, that's the idea: making imcc generate the loads/stores, using the information about how many registers are actually available on the real machine _and_ its own knowledge of the program flow. An even better goal would be to have imcc know how many temporaries every JITted op requires, and use this information during register allocation.

All this is obviously machine dependent: the code generated should only run on the machine it was compiled for. So we should always keep the original imc code in case we copy the pbc file to another machine.

-angel
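The 10-versus-6 accounting above can be checked mechanically. A throwaway script (the op model is simplified to (dests, srcs) pairs, and it assumes a per-op JIT with no cross-op register reuse, as in the example):

```python
def count_spill_moves(ops, in_memory):
    """Count the memory<->register moves a per-op JIT needs for
    variables that are NOT mapped to machine registers: one load
    per memory-resident source, one store per memory-resident
    destination."""
    loads = stores = 0
    for dests, srcs in ops:
        loads += sum(1 for s in srcs if s in in_memory)
        stores += sum(1 for d in dests if d in in_memory)
    return loads, stores

# the silly 5-variable program from the example
prog = []
for v in ['I1', 'I2', 'I3', 'I4', 'I5']:
    prog += [({v}, set()),     # set   v, 1
             ({v}, {v}),       # add   v, v, 1
             (set(), {v})]     # print v
prog += [(set(), {v}) for v in ['I1', 'I2', 'I3', 'I4', 'I5']]

loads, stores = count_spill_moves(prog, {'I4', 'I5'})
```

With I1-I3 mapped, this reports 6 loads and 4 stores, i.e. the 10 moves cited above, against the 3 fast_save / 3 fast_restore pairs imcc emits when it knows that only 3 registers exist.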
Re: Using imcc as JIT optimizer
Saturday 22 February 2003 16:28, Leopold Toetsch wrote:
> Gopal V wrote:
> > If memory serves me right, Leopold Toetsch wrote:
> >
> > Ok .. well I sort of understood that the first N registers will
> > be the ones MAPped? So I thought re-ordering/sorting was the
> > operation performed.
>
> Yep. Register renumbering, so that the top N used (in terms of
> score) registers are I0, I1, ..In-1

With your approach there are three levels of parrot "registers":

- The first N registers, which in JIT will be mapped to physical registers.
- The other 32 - N parrot registers, which will be in memory.
- The "spilled" registers, which are also in memory, but will have to be copied to a parrot register (which may be a memory location or a physical register) before being used.

I believe it would be smarter if we instructed IMCC to generate code that only uses N parrot registers (where N is the number of machine registers available). This way we avoid the risk of having to copy the data twice. This is also interesting because it gives the register allocation algorithm all the information about the actual structure of the machine we are going to run on. I am quite confident that code generated this way would run faster.

We also need to have a better procedure for saving and restoring spilled registers, especially in the case of JIT compilation, where it could be translated to a machine save/restore.

What do you think about it?

-angel
Re: Infant mortality
> > The best partial solution to early finalization is compile-time
> > tracing of possible references by the compiler which can explicitly
> > generate the appropriate DESTROY calls.

What about refcounting + real GC? Refcounting will free most of the objects as soon as possible, and the real GC system makes sure that even cyclic references are eventually freed.

-angel (who is not really paying attention at all)
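The hybrid scheme in miniature (a toy sketch; class and function names are invented, and real GCs track roots and finalizers far more carefully):

```python
class Obj:
    def __init__(self, name):
        self.name, self.rc, self.refs = name, 0, []
        self.alive = True

def link(a, b):
    """a starts referencing b."""
    a.refs.append(b)
    b.rc += 1

def decref(obj):
    """Refcounting frees most objects the moment their last
    reference disappears -- this is the 'as soon as possible' part."""
    obj.rc -= 1
    if obj.rc == 0 and obj.alive:
        obj.alive = False            # DESTROY would run here
        for r in obj.refs:
            decref(r)

def cycle_collect(roots, heap):
    """The backup tracing GC: mark everything reachable from the
    roots, sweep the rest -- catching cycles that refcounting
    alone can never free."""
    marked, stack = set(), list(roots)
    while stack:
        o = stack.pop()
        if o.alive and id(o) not in marked:
            marked.add(id(o))
            stack.extend(o.refs)
    for o in heap:
        if o.alive and id(o) not in marked:
            o.alive = False
```

Dropping the only external reference to a two-object cycle leaves both refcounts at 1, so refcounting never frees them; the tracing pass does.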
Re: branch dump
> Hmm wouldn't the JIT benefit from a pre-knowledge of basic
> blocks and types or some information? ... (I seem to think so
> ...).

I would think so, because if, for example, the JIT wants to do a full register allocation to map parrot registers to machine registers, it would certainly need information about basic blocks. (I am talking about a complete register allocation that would re-do the original register allocation of the imc code with the actual number of registers available on the machine.)

On the other hand, the JIT could certainly regenerate this information from the imc code, which is probably going to be stored somewhere anyway.

-angel
Re: Parrot: maximizing the audience
Dan Sugalski wrote:
> At 7:40 AM -0400 9/4/02, John Porter wrote:
> >
> > Some folks seem to think that sufficient reason exists now.
>
> That's fine. You don't have to convince some folks. You have to
> convince me.

Being ultra-pragmatic, the name change:

- costs nothing
- might make some folks happier

So it's a win-win, isn't it? Even if, of course, it doesn't really matter at all. Although symbols do matter sometimes, especially in politics, which is just what we are talking about.

-angel
Re: core.ops ARGDIR
Hi Leo,

> This should be - from my (imcc) POV - reflected by these IN/OUT
> settings:
>
> op set(in PMC, in INT)
> op set(in PMC, in STR)
> op set(in PMC, in NUM)
> op set(out PMC, in PMC)    # ok, $1 points to $2 now
>
> # P[i] = x
> op set(in PMC, in intkey, in x)
> # P[KEY] = x
> op set(in PMC, in KEY, in x)
> # p[KEY] = P[KEY]
> op set(in PMC, in KEY, in PMC, in KEY)

Shouldn't all these PMCs be "inout"? They depend on the previous value of the PMC, but they also modify it. This probably doesn't affect imcc now, but it might be useful in the future.

-angel
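Why the direction matters: imcc's dataflow passes build def-use information from exactly these declarations. A tiny sketch (the op encoding here is invented for illustration):

```python
def defs_and_uses(ops):
    """Each op is (outs, ins, inouts). An inout argument counts as
    both a write and a read, so later reads of the PMC correctly
    depend on the mutating op, and the value from before it stays
    relevant too."""
    defs, uses = {}, {}
    for i, (outs, ins, inouts) in enumerate(ops):
        for v in list(outs) + list(inouts):
            defs.setdefault(v, []).append(i)
        for v in list(ins) + list(inouts):
            uses.setdefault(v, []).append(i)
    return defs, uses

# op 0: new P        (out P)
# op 1: set P[0], x  (keyed set: reads *and* modifies P -> inout)
# op 2: print P      (in P)
defs, uses = defs_and_uses([(['P'], [], []),
                            ([], ['x'], ['P']),
                            ([], ['P'], [])])
```

Declared as plain "in", op 1 would be recorded only as a use of P, hiding the fact that it changes P's contents from any later optimization.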
Encodings - Questions
Hi,

Now that we've got ICU in, I thought it might be time to revisit the encodings implementation. I am clamorously ignorant of unicode/encoding issues, so please be patient with me. :)

From what I have asked people on IRC, and what's in the list archives, my understanding of how parrot will work with various encodings is:

i) After an IO operation, strings are preserved in their original encoding, whatever it is.

ii) Parrot will try to keep the string in that encoding for as long as possible.

iii) When some operation that requires ICU is requested, parrot will feed the string to ICU, which will convert it to UTF-16 (its internal encoding) and then perform the desired operation.

Please correct me if this is wrong. Now, my questions are:

I. About iii): I can imagine at least three different options for what to do with the converted UTF-16 string:

a) We can discard the UTF-16 version, and recompute the conversion each time. (This is costly, isn't it?)

b) We can replace the original string with the "upgraded" version, so strings will lazily become converted to UTF-16. (This makes sure that the conversion is only done once, but is conversion to UTF-16 always lossless?)

c) We can store the UTF-16 version alongside the original one. (This doubles the memory usage, plus it may be hard to implement.)

Each approach has its pros and cons. Which one is the right one?

II. About ii): What exactly is the point at which we decide to feed the string to ICU, and which operations should we (as parrot developers) implement in our own layer? For example, let's take a relatively simple operation, such as uppercasing a string, and let's assume that the string is in a friendly encoding, such as ISO-8859-1. Even with these assumptions, conversion to uppercase is not straightforward, since it's locale-dependent (or to be more precise, it might be locale-dependent if the user chooses so).
We could, of course, implement all locale-aware operations for each encoding and each locale, but how much work do we want to put into this? So, exactly which string functionalities do we want to implement ourselves on a per-encoding basis, and which ones are we going to forward to ICU?

-angel
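Option b), the lazy upgrade, is easy to model. A toy Python sketch (class and method names are invented; real parrot strings are obviously not Python objects, and real case mapping goes through ICU, not lower()/upper()):

```python
class PString:
    """Keep the original bytes + encoding; the first ICU-style
    operation replaces them with the upgraded UTF-16 form, so the
    conversion cost is paid at most once."""
    def __init__(self, data, encoding):
        self.data, self.encoding = data, encoding

    def upgrade(self):
        if self.encoding != 'utf-16':
            text = self.data.decode(self.encoding)
            self.data = text.encode('utf-16-le')
            self.encoding = 'utf-16'   # original encoding is gone
        return self.data

    def upper(self):
        # any operation that needs ICU triggers the one-time upgrade
        self.upgrade()
        return self.data.decode('utf-16-le').upper()
```

For single-byte encodings like ISO-8859-1 the upgrade itself is lossless; what is lost is the knowledge of which encoding the string arrived in, which matters if we ever need to write it back out unchanged.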
Re: imcc hack for perl6 regexes
> c) imcc becomes a "middle" level language.
>
> I never meant it to be an assembler. In practice
> intermediate languages provide other constructs
> such as aggregate type definition that are not
> available in the assembler.
>
> type i : packed int[32]
> type r : record { foo : int, bar : string }
>
> This is not assembler. This is a medium level
> computer language in my book. You could even
> see things like
>
> .pragma classes_are_hashes
>
> or something that would tell the compiler that
> aggregates should be implemented as standard
> "Perl" hashes in this piece of code, whereas
> others might want "packed" aggregates with
> no runtime introspection capability.

Now, this is interesting. I confess I was only thinking of dynamic perl-like languages, but I think you have some secret plans... But these features don't change the point: infix notation doesn't make the job of the language compiler easier (nor does it make it harder), and it is a duplicate feature once we have direct access to parrot ops.

> But I'm willing to invent and develop another language. And it should be a
> lot richer than just an assembler with a register allocator
> and spiller. :)

Don't forget optimizations. That's where the real fun is... :)

-angel
Re: Off-list discussions, was Re: imcc hack for perl6 regexes
> > Sure, I have no problem with it. At one
> > time someone suggested making a separate
> > list for Parrot compilers, so I took it as
> > a hint that maybe we were spamming.

I am all for the creation of a new list for stuff such as imcc and the perl6 compilers. ([EMAIL PROTECTED]?) That way, people interested in parrot internals (the hard stuff: GC, JIT, ...) don't need to listen to our discussions about bugs in imcc, feature requests, etc. On the other hand, people like me who cannot keep track of all that happens on perl6-internals would have an easier way of focusing on areas where we can actually contribute something.

-angel
Re: imcc hack for perl6 regexes
> I still prefer infix notation to prefix notation for an intermediate
> language. I don't understand why it is so hard to adopt. imcc is supposed
> to be a step closer to higher level languages, which is why I went that
> way.

Hi,

I think we all agree that since parrot can have dynamic oplibs (and core parrot has hundreds of ops), imcc needs some way to express them directly. The idea of having parrot ops included as such, and imcc directives prepended with ".", looks fair to me. So we end up having either:

a) imcc becomes like parrot assembly, but with virtual registers and a few administrative ops (for calls and such) that start with ".", or

b) imcc becomes like parrot assembly, but with virtual registers, a few administrative ops, and some convenient infix notation for stuff like "a = 1" or "b = a + 1".

There isn't much difference between the two proposals, when you think about it. The only benefit of proposal b) is that hand-writing of imc becomes nicer, but I would not expect much of it.

In addition to the learning benefit for developers of parrot-based languages, using parrot ops "as is" is good for us, because we don't need to invent, develop, and document another language. It lets us focus on making imcc do a few tasks well (ahem), such as register allocation and language-neutral optimizations.

From a more conceptual point of view, I would favour imcc being as similar to the assembler as possible because:

- parrot assembler is already very high-level
- imcc probably won't be retargeted to a different platform. If we really wanted to make imcc retargetable, we would need much more than a friendlier syntax; and honestly I am not sure that would be worth it.

To sum up, my vote would be for an assembler-like imcc. Infix notation doesn't add much value in an intermediate language such as imcc. I am not a priori against imcc diverging from the assembly, but I think the only good reason for divergence should be making the task of the language compiler simpler.
About the implementation, I think we will need the following metadata about each op:

i) the opcode and the op name.
ii) the types of its arguments, including in/out/etc.
iii) whether the opcode has side effects or not. This is important for a number of optimizations: constant folding, optimizations that work by moving an instruction to a more convenient point, etc.

Best,

-angel
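A sketch of what such a metadata table might look like, and how the side-effect flag gates constant folding (the record layout and op names here are invented for illustration):

```python
# Hypothetical per-op metadata: opcode, typed+directed arguments,
# and a purity flag.
OPS = {
    'add_i':   dict(opcode=1, side_effects=False,
                    args=[('INT', 'out'), ('INT', 'in'), ('INT', 'in')]),
    'print_i': dict(opcode=2, side_effects=True,
                    args=[('INT', 'in')]),
}

def can_constant_fold(op, arg_values):
    """Fold only pure ops whose 'in' arguments are all known
    constants (None marks an unknown value)."""
    meta = OPS[op]
    if meta['side_effects']:
        return False        # printing, I/O, etc. must stay put
    ins = [v for (t, d), v in zip(meta['args'], arg_values)
           if d == 'in']
    return all(v is not None for v in ins)
```

An optimizer can then replace `add_i $I0, 2, 3` with `set $I0, 5`, while `print_i` is never touched no matter how constant its argument is.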
Re: [perl #15797] [PATCH] Regex speedup (with docs)
Sunday 18 August 2002 00:38, Simon Cozens wrote:
> [EMAIL PROTECTED] (Dan Sugalski) writes:
> > Has someone looked at and maybe committed this?
>
> The reason I asked which pieces of Parrot were prototypes was
> because optimizing the hell out of something that's only a
> prototype is nothing short of intellectual masturbation, and it
> seems nobody actually learnt anything from my YAPC presentation
> after all.

One possible reason would be to learn whether a speed problem is due to the design or the implementation. Sometimes you only find out whether something can be fast enough by trying to optimize the hell out of it.

Could you share with those of us who were not at YAPC what this presentation was about?

-angel
Re: PARROT QUESTIONS: Keyed access: PROPOSAL
Sunday 21 July 2002 21:34, Dan Sugalski wrote:
> No. They are not. You're missing the important case where the data
> structure is inherently and intrinsically multidimensional.
>
>    my int Array @foo : dimensions(3);
>    @foo[1;2;3] = 12;
>
> Or whatever the syntax is in perl to declare a 3D array and access
> the element at (1,2,3).

In my opinion, there are actually two different things to discuss:

- keyed opcodes
- keyed methods on the PMC vtable

You can keep keyed opcodes without actually implementing the keyed methods. Actually, ALL our keyed methods look like:

  ... stuff to get the value of SELF[key] into VALUE ...
  VALUE->vtable->method(...)

Putting this code into the opcode, and removing the vtable methods (except get_keyed_*), shouldn't make it slower at all IMHO (same job being done, in the same way, with the same number (two) of vtable calls). Keyed opcodes can stay in the interest of code density.

> No. Keyed access for all methods stays. You're forgetting one very
> important case, the one where the contents of an aggregate isn't a
> PMC. This scheme would then require the aggregate PMC to construct
> fake PMCs to do operations on. Bad, definitely bad.

Is it really so bad? As long as what is ultimately stored in the aggregate is one of our base types (pmc, int, string, float), the opcode shouldn't need to build the fake PMC. Or are you talking about multi-dimensional PMCs? I don't see why these should need to build fake PMCs: you just call the get_keyed method, and it will do the Right Thing (probably getting the value directly, maybe propagating a portion of the key to an inner PMC).

As I said, there should not be any speed cost at all, and a significant improvement in code size. But this is only an opinion. I am willing to submit a benchmarking patch if there is some interest.

Best,

-àngel
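The "same two vtable calls" argument in miniature (Python stand-ins for PMCs; all class and function names here are invented):

```python
class IntPMC:
    def __init__(self, v): self.v = v
    def add(self, other):                 # plain, unkeyed vtable method
        return IntPMC(self.v + other.v)

class ArrayPMC:
    def __init__(self, items): self.items = items
    def get_keyed(self, key):             # get_keyed_* survives
        return self.items[key]

def op_add_keyed(agg, key, addend):
    """The keyed *opcode*: fetch SELF[key], then call the ordinary
    vtable method -- the same two calls a per-class keyed add
    method performs internally, just hoisted into the opcode, so
    the per-class add_keyed vtable slots can be dropped."""
    value = agg.get_keyed(key)            # vtable call 1
    return value.add(addend)              # vtable call 2
```

Same work, same dispatch count, but only one copy of the fetch-then-call logic instead of one per vtable method per class.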
RE: Three address code, named variables
Hello,

> If people have different opinions on intermediate code generation, I'd
> like to hear them, since I've only done toy compilers in school which
> always used 3-address code or quadruples.

I have been working on a compiler for an intermediate language that I call P-- (obviously inspired by C--) for now. The language offers some low-level abstractions (register allocation for locals, function calls), and the idea is that it could be used by lazy language implementors who don't want to mess with the low-level aspects of Parrot's VM.

Register allocation is implemented using Briggs' algorithm (see http://citeseer.nj.nec.com/briggs92register.html).

I was planning to finish testing it this weekend and send a first version to the list. It's mostly a learning project for myself, but it could even be useful to someone... who knows.

Sincerely,

-angel, who never programmed a compiler before, even in school
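For the curious, the core of a Chaitin/Briggs-style allocator is surprisingly short. A simplified sketch (illustrative only: the real algorithm adds coalescing, spill costs, and rebuilding the graph after spill code is inserted):

```python
def color_graph(interference, k):
    """Color an interference graph with k registers.
    Simplify: repeatedly remove a node with < k live neighbours.
    When stuck, optimistically push a spill candidate anyway
    (Briggs' twist: it may still get a color during select).
    Select: pop nodes back and assign the lowest free color."""
    graph = {n: set(neigh) for n, neigh in interference.items()}
    stack, work = [], set(graph)
    while work:
        cand = next((n for n in work if len(graph[n] & work) < k), None)
        if cand is None:                 # blocked: pick a spill candidate
            cand = max(work, key=lambda n: len(graph[n] & work))
        stack.append(cand)
        work.remove(cand)
    colors = {}
    for n in reversed(stack):
        used = {colors[m] for m in graph[n] if m in colors}
        free = [c for c in range(k) if c not in used]
        colors[n] = free[0] if free else 'spill'
    return colors
```

A triangle of mutually interfering values needs 3 colors, so with k=2 exactly one of them spills, while a chain a-b-c colors fine with 2.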
RE: Globals
Dan wrote:
> Yep, I've seen their plans. It's less an issue for us, at least as
> far as globals are concerned, since we'll be doing that with
> lexicals. (Python not having lexicals, after all) Globals are a bit
> more interesting, since bytecode-loaded modules can't guarantee
> global positions, since there's no way of knowing at runtime what's
> been stuffed into the global tables already.
>
> Anyway, doing this with lexicals, which we can know at compile time,
> will get us these speed boosts, and has been on the list for ages.

Oh, I see. Just two quick questions:

* Will sub definition create globals? I mean, does "sub a {}" create a function PMC and store it into the "a" lexical? Or will there be a separate mechanism for functions?

* Is there any chance that lexicals get mapped to registers, at least in the cases where the block doesn't use any dynamic features (%MY, eval, etc.)?

Thanks.

-angel
RE: [pythonesque] vtable ideas
Simon said:
> > I see. But what's the rationale for having Bigints / Bigfloats be native
> > types instead of PMCs?
>
> They're useful and predictable. And also, you may want to automatically
> promote integers to bigints; you probably don't want to automatically
> promote integers to complex types. :)

Who knows... what about sqrt(-1)? Anyway, I get your point.

> > In other words, in my humble opinion (which is a not very qualified one :)
> > it is a bad idea to put the language-dependent behaviour in the PMCs because
> > PMCs hold user's data, and data should freely travel the language
> > boundaries.
> >
> > The right place to put language-dependent behaviour is in the opcode, which
> > is generated by a language-aware compilation of this particular fragment of
> > code.
>
> I'm afraid I have to disagree very violently with you on this one.

Well, until now your disagreement has been expressed in very gentle terms, so I wouldn't call it violent. :)

Anyway, the whole point I am trying to express is that making "add" have a different behaviour depending on the language that _created_ the object is wrong, because these behaviours are really properties of the perl, python, javascript, ruby, scheme, or java4parrot code that is running this very line of user code. If you put language-dependent behaviour on the PMC, then you are forced to change the type of a PMC each time it crosses a language boundary, or risk nasty things. Maybe that's not a problem, maybe it is (speed, information loss, rich types, code complexity, etc.).

There is another way of leaving the door open to other languages, and that is to keep the set of responsibilities of PMCs a bit smaller, keep the definition of vtable methods clear and unique, and add a per-language layer (with its per-language opcodes) which takes care of implementing javascript-add in terms of standard parrot PMC operations.
Whatever; I understand this is a significant departure from the current state of things, so maybe a wise choice is to maintain the current behaviour of PMCs until the problems I expect become reality, and only change it if they really do.

-angel
RE: [pythonesque] vtable ideas
> These methods are provided so that we can play with "native" types in our
> bytecode then send them to our pmc, as otherwise every constant in our
> code would have to be a PMC, which is a little excessive.
>
> Things like a complex number will be represented by either a PMC class
> (very likely for complex numbers) or a language level overloaded class
> (like a Quantum::Entanglement).

I see. But what's the rationale for having Bigints / Bigfloats be native types instead of PMCs?

> > Additionally, in the specific Parrot case (where we want to execute code of
> > different programming languages) I would say that this is the only way to do
> > the right thing, because the same mathematical operation can require
> > different coercion policies for different languages. (Think of perl, which
>
> Thus we have PerlInts, PythonInts etc... In general these will inherit
> from default.pmc except where they do something different.
>
> > So it looks to me that coercion doesn't belong to the object domain, but to
> > the language domain, (since a number born in perl could easily end up in
> > python) and thus its code shouldn't live in the vtable but in the opcode (or
>
> Passing things between languages will need a well defined type coercion
> mechanism. In general I expect that a PerlString will be turned into
> a string constant then loaded into a PythonString by code that knows what
> it's doing. After all, we need to do something similar when passing
> SVs into C code at the moment, and I'm sure Inline::* will provide not
> only endless amusement but also some useful advice on this.

What about arrays, hashes, etc.? How do we expect arr1 + arr2 to concatenate the two lists when called in python but not in perl? Either we define an intermediate type for everything (which can be tricky for user-defined classes that inherit from core types, and slow if inter-language calls are frequent) or we need some other mechanism to implement the various meanings of operators.
It looks to me that a + cannot be directly translated to pmc->vtable->add, since in a particular language + can mean anything different from (or additional to) mathematical addition. We should separate the definition of vtable methods (which, in my humble opinion, should have a clear, language-independent meaning) from the actual language-dependent interpretation of operators, casting between types, dwimmery, etc.

For instance, we should not have array PMCs evaluate to their length automagically when called in scalar context, but have a separate length method. Then, perl opcodes that require scalar context can call this method if it turns out that their PMC argument is an array. This would allow the core PMC types to be language-independent, and thus they would be able to pass freely between languages. On the other hand, we'll need language-dependent opcodes, but I guess this will happen anyway. The only thing required is for the compiler and the opcodes to be a bit smarter.

In other words, in my humble opinion (which is a not very qualified one :) it is a bad idea to put the language-dependent behaviour in the PMCs, because PMCs hold users' data, and data should travel freely across language boundaries. The right place to put language-dependent behaviour is in the opcode, which is generated by a language-aware compilation of this particular fragment of code.

-angel
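What the separation buys us, in miniature (Python stand-ins; opcode and method names here are invented): the PMC exposes only language-neutral methods, and each language's version of the add opcode interprets + its own way.

```python
class ArrayPMC:
    """Language-neutral array PMC: each method has one clear,
    language-independent meaning."""
    def __init__(self, items): self.items = items
    def concat(self, other):
        return ArrayPMC(self.items + other.items)
    def length(self):
        return len(self.items)

def py_add(a, b):
    """Python's '+': on sequences it means concatenation."""
    return a.concat(b)

def perl_add(a, b):
    """Perl 5's '+': numify both operands; an array in numeric
    context is its length."""
    return a.length() + b.length()
```

The same two arrays can then cross the language boundary untouched; only the opcode that operates on them differs.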
[pythonesque] vtable ideas
Hi all,

I have been checking the Python source code (something I am sure all of you have already done :-) and some differences in the design surprised me so much that I decided to come and ask some questions. I guess that most of you know the python source code well, so please excuse the lengthy mail.

I focused on reviewing the implementation of PythonObjects (Python's PMCs) and found some interesting points that I would like to discuss with this list. These are:

I. Missing methods in parrot

There are many vtable methods in python not present in parrot. Some of the ones I would expect to really be needed are:

* divmod     # division with quotient and remainder
* power
* absolute
* invert
* compare

And probably many others. Is there any reason not to have at least these ones?

They've also got a bunch of methods for things like calling builtin methods, getting and setting object attributes, and so on. But I guess these ones had better stay out until that part of the design is more advanced. BTW, do we have any idea of what user-defined objects will look like? Will methods be object attributes, or just a funny syntax to call functions? Will Parrot allow user-defined classes to inherit from builtin classes (another Python requirement)?

II. The "protocols" concept

In Parrot, there is a single vtable structure for objects. Python, on the other hand, defines a set of protocols that are optionally implemented by objects (Number, Sequence, Mapping, Buffer and Iterator, I think). Objects that implement the Number protocol have a non-null entry in the main vtable called tp_as_number that holds a pointer to yet another vtable with the number-related methods.
This allows easy checking of whether an object can be operated on numerically (just test pmc->vtable->as_number != NULL), but adds yet another indirection level (pmc->vtable->add becomes pmc->vtable->as_number->add). I cannot tell what other design or performance consequences follow from putting such frequently used methods one indirection level deeper, but I would love to hear from those who know more.

III. A different solution for coercion

In Parrot, we currently define a new family of methods for each type of number (or string, btw). So we have:

  void add          (PMC * value, PMC * dest)
  void add_int      (INTVAL value, PMC * dest)
  void add_bigint   (BIGINT value, PMC * dest)
  void add_float    (FLOATVAL value, PMC * dest)
  void add_bigfloat (BIGFLOAT value, PMC * dest)
  void add_same     (PMC * value, PMC * dest)

If we wanted to add a new number or string type (for instance a Complex type), then we would have to create a new family of add_complex, subtract_complex, etc. methods, and implement them in each PMC class. Intuition says that this cannot be the best way to handle this problem.

Python's solution is to define an intermediate function (let's call it Perl_NumberAdd) that implements the coercions between numbers, so the PMC is only responsible for implementing the "add" method with an object of the same type (what is currently add_same). This greatly simplifies the creation of a hierarchy of number types, and localizes the changes to make when a new number type is introduced.

Additionally, in the specific Parrot case (where we want to execute code from different programming languages) I would say that this is the only way to do the right thing, because the same mathematical operation can require different coercion policies for different languages. (Think of perl, which automatically converts strings to numbers when performing an "add", versus python, which would raise a type error.)
So it looks to me that coercion doesn't belong to the object domain, but to the language domain (since a number born in perl could easily end up in python), and thus its code shouldn't live in the vtable but in the opcode (or a function called by it), and we should have two different opcodes for python's add (py_add) and perl's add (perl_add).

To sum up: kill all the exotic versions of add, subtract and friends, and implement coercion in a per-language function that gets called by a per-language version of the opcode.

--

I am honestly not very sure that I make much sense in any of this, and I am more than willing to learn why our way is better than theirs, if that's the case. On the other hand... who knows. If any of these pythonesque influences have some interest for Parrot (I am particularly interested in the third one), I would volunteer to contribute an attempt at a patch for it.

Angel Faus
[EMAIL PROTECTED]
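A toy model of the intermediate-function idea, in the style of CPython's PyNumber_Add (the RANK table and add_same dispatch here are invented for illustration):

```python
# Rank the numeric tower; coercion always promotes to the wider type.
RANK = {'int': 0, 'bigint': 1, 'float': 2, 'complex': 3}

def number_add(a, kind_a, b, kind_b):
    """The single coercion point: resolve the common type once,
    here, then dispatch to that type's add_same. Introducing a new
    numeric type means one RANK entry plus its own add_same entry,
    not a new add_xxx method on every existing class."""
    kind = kind_a if RANK[kind_a] >= RANK[kind_b] else kind_b
    return ADD_SAME[kind](a, b), kind

# what each PMC class would implement: add with an operand of the
# same type, after coercion has already happened
ADD_SAME = {
    'int':     lambda a, b: int(a) + int(b),
    'bigint':  lambda a, b: int(a) + int(b),
    'float':   lambda a, b: float(a) + float(b),
    'complex': lambda a, b: complex(a) + complex(b),
}
```

A per-language wrapper (py_add, perl_add) would sit on top and decide, e.g., whether a string operand numifies or raises a type error before ever reaching number_add.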
RE: parrot rx engine
Ashley Winters wrote:
> First, we set the rx engine to case-insensitive. Why is that bad? It's
> setting a runtime property for what should be compile-time
> unicode-character-kung-fu. Assuming your "CPU" knows the gritty
> details of unicode in the first place just feels wrong, but I digress.

I tend to agree with that. Many run-time options can be turned into compile-time versions of the opcodes, which hopefully will produce a speed increase.

> Once you squash rx_literal and friends, any attempt to benchmark the
> "rx" engine really becomes a benchmark of parrot itself. When you speed
> up parrot, you speed up regular expressions. Voila, no more black box.
> If Parrot is just too damn slow for you, whip out libmylang and do the
> nitty gritty yourself. Since this is mostly a "just don't do it" post,
> no code is actually *required* from me, right? :)

We are already doing so. What you are suggesting, in fact, is to compile regular expressions down to Perl code (and that in turn to Parrot). This will always be slower than generating Parrot directly, because some Perl features prevent the heavy use of certain optimizations (think JIT) that are necessary if we want good regex performance. In other words: with your proposal, a better general-purpose optimizer gives you better regex performance, but it will always remain worse than the current state.

If what you are suggesting is that everything be compiled to general-purpose opcodes (branch, unicode ops, etc.) [which is what follows from your words, but not from your examples], I still believe this to be a performance mistake. It would dramatically reduce the code density, and no matter how fast parrot's dispatch is, this will kill your performance. And using stacks too much (as the usage of exceptions would probably require) will also be too slow (as Brent Dax showed me when we were discussing our two regex opcode designs).

Just my 2 cents (of euros) :)

---
Angel Faus
[EMAIL PROTECTED]
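The runtime-flag-to-compile-time-specialization point, sketched (function names are invented, and real case folding is locale- and unicode-dependent, far hairier than lower()):

```python
def compile_literal(lit, case_insensitive):
    """The regex compiler picks the opcode variant up front, so
    the inner matching loop never tests a case-insensitivity flag:
    each compiled matcher does exactly one kind of comparison."""
    n = len(lit)
    if case_insensitive:
        folded = lit.lower()            # folding done once, at compile time
        return lambda s, pos: s[pos:pos + n].lower() == folded
    return lambda s, pos: s[pos:pos + n] == lit
```

The same trick applies to most runtime rx properties: anchoring, multiline mode, etc. can each select a pre-specialized opcode instead of being re-checked on every character.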
[PATCH] multimethod ops
Simon Cozens: >* Modify make_vtable_ops.pl to produce variant ops which pass in > INTVAL, FLOATVAL and STRING arguments to the multimethods, (Ask > me for more details if you want to do this.) so we can have > > add Px, Px, Ix > and add Px, Px, Ic > and the like working. This patch should do it. I have not included STRING arguments, because I don't know which multimethod should be called (native, unicode or other). Tests are updated too. Sincerely, ---- Angel Faus [EMAIL PROTECTED] Vtable.pm Description: Binary data make_vtable_ops.pl Description: Binary data pmc.t Description: Binary data
RE: PMC Classes Preprocessor
Hi, Currently pmc2c.pl requires that the user writes methods in .pmc files in exactly the same order as in vtable.tbl. That's not a nice thing to do. The version I am attaching hopefully fixes this, and creates a dummy version of the method if the user has not provided one. This allows us to add new methods in vtable.tbl without worrying about old .pmc files. btw, sorry for that silly Text::Balanced dependency. I stupidly assumed it was a standard module without checking. Sincerely, --- Angel Faus [EMAIL PROTECTED] pmc2c.pl Description: Binary data
Re: [Proposal] Regex documentation
> Okay, I've got another argument for you: speed. > That was a killer one. > > C:\Brent\VISUAL~1\PERL6~1\parrot\parrot>rebench.pl > Explicit backtracking: 75 iterations, about 30.5380001068115 > seconds. > Implicit backtracking: 75 iterations, about 655.510995864868 > seconds. > > The numbers there are total time. > Really amazing. More than I could ever have imagined. > > I can provide source for everything on request, including a new core.ops > and some supporting files, the test assembly file, and the benchmark > runner script. > Cool. Could you send it to me? I would like to try to speed up my proposal... Probably there is no way to catch up, but I would like to see it anyway. Meanwhile, I'll shut my mouth and update regex.pod to reflect your proposal. Sincerely, --- Angel Faus [EMAIL PROTECTED]
Psyco - Python specializing compiler
Hi, I have found a reference for a very interesting project related to accelerating Python, with some nice ideas that maybe could be applied to Parrot too. It's called Psyco (standing for Python Specializing Compiler) and works (if I understood it right) by creating specialized versions of Python functions, each optimized for a particular type of parameters. The specialization can happen at compile time, at run time, or (most of the time) it will start at compile time, pause until some run-time value is known, compile a bit more, and so on. It is a quite general trick, and could be used to implement some of the optimizing strategies that have been discussed here (like PMC opcode inlining, constant folding, etc..) without requiring Parrot to know in advance things that it simply cannot know. They have some problems that are very similar to the ones discussed in this mailing list in a previous thread (users can redefine builtins, globals can get rebound somewhere in the wild, etc..); they have not addressed this yet, but suggest having a signal system on the globals hash that fires a backtrack when anyone is playing dangerous games with it. The URL is http://homepages.ulb.ac.be/~arigo/psyco/ ; they have some very nice documents that explain it all in great detail. I wonder whether Parrot could steal some ideas from this.. Just my half a cent. -Angel Faus [EMAIL PROTECTED]
Re: A serious stab at regexes
Hi Brent, > > It just means you have to be more explicit. I consider that a Good > Thing--Perl 5's regular expressions are compact enough to be represented > like: but the internals to support them are an absolute jungle. I'd rather > have a few exposed ops than have a chunk of code like Perl 5's regular > expressions. Besides, being able to /see/ where we jump to on each op > when we fail will help when debugging the RE compiler and such. > I totally agree. It is not about having fewer opcodes or a more compact syntax, but a more maintainable system. I just claim that having a stack where you predeclare the possible continuation paths is simpler than your marks idea. I could be biased here, of course. I would love some comments (especially from someone with experience in this area) about whether we are going the right way here. > How do you plan to support lookahead? Unless I'm mistaken, with my > proposal it goes something like: > > rePushindex > rePushmark > #lookahead code in here > rePopmark > rePopindex > This could be solved by having a new pair of ops (reGetIndex / reSetIndex) that save and restore the current index into an INTVAL register: reGetIndex I1 #lookahead code in here reSetIndex I1 So lookahead would be a special case, but that's fine, because it _is_ a special case. > There are advantages and disadvantages to both proposals. The question > is, which one (or both) is flexible enough to do what we want it to do? > Probably both proposals are good enough, so I would say that we should choose one (not necessarily mine) and go ahead. I would love to see some regex support committed soon (maybe as an oplib) so we can hack on the languages (babyperl and others) and gain some experience about performance and optimizations... Is there any plan about when to commit regular expression support to Parrot? Dan Sugalski has said that he is not interested at all in the design of regex opcodes. Who is going to have a final say on this? 
Just an idea, but could we have someone with experience in perl5 regex internals playing the role of "regex pumpking"? > > I look forward to seeing it. > I am attaching a patch that implements the modifications I suggest to your code. The tests and the compiler are updated too and work fine. (btw, I am not sure if I chose the right options on the diff command. Could someone please tell me in private what's the right way of submitting patches when there are various files?) There is another (totally independent) question that bothers me: should we use the regex flags at compile time to perform aggressive optimizations, or should we rely on runtime checks to get the job done? I'd better explain myself with an example: We have (in these patches) an opcode called reAnything. When called, it tests the RE_single_line flag and acts in consequence. A small optimization could be gained if we had two versions of the op (like reAnythingML [multi-line version] and reAnythingSL [single-line version]) and chose between them at compile time. This can be applied to most flags, and would hopefully result in a speed increase (fewer runtime checks), at the cost of using more opcode numbers. Another example: case-insensitive matches can be implemented as run-time checks on the corresponding flag (that's the way most regex engines do it), or by normalizing the regular expression to lower case at compile time and then converting the string to lower case just at the beginning of the match. (The latter is actually proposed by Jeffrey Friedl in the Mastering Regular Expressions book, claiming that it would be much faster.) How aggressive are we going to be with this kind of optimization? Or are we going to rely on a very smart JIT compiler? 
--- Angel Faus [EMAIL PROTECTED] diff -crN parrot_current/parrot/Parrot/OpsFile.pm parrot_patched/parrot/Parrot/OpsFile.pm *** parrot_current/parrot/Parrot/OpsFile.pm Fri Oct 26 03:00:01 2001 --- parrot_patched/parrot/Parrot/OpsFile.pm Sat Nov 10 16:08:44 2001 *** *** 22,27 --- 22,42 #my %opcode; #my $opcode; + my $backtrack_macro = <stack_base)){ + pop_generic_entry(interpreter, &cur_re->stack_top, &cur_re->index, +STACK_ENTRY_INT); + pop_generic_entry(interpreter, &cur_re->dest_stack_top, &dest, +STACK_ENTRY_DESTINATION); + return dest; + } + else { + return cur_re->onfaildest; + } + } + EOM + # # trim() *** *** 217,235 # $body =~ s/HALT/{{=0}}/mg; ! $body =~ s/RESTART\(\*\)/{{=0,+=$op_size}}/mg; $body =~ s/RESTART\((.*)\)/{{=0,+=$1}}/mg; ! $body =~ s/RETREL\(\*\)/{{+=$op_size}}/mg; $body =~ s/RETREL\((
Re: A serious stab at regexes
Brent Dax : > Okay, this bunch of ops is a serious attempt at regular expressions. I > had a discussion with japhy on this in the Monastery > (http://www.perlmonks.org/index.pl?node_id=122784), and I've come up > with something flexible enough to actually (maybe) work. Attached is a > patch to modify core.ops and add re.h (defines data structures and such) > and t/op/re.t (six tests). All tests, including the new ones, pass. Hi Brent, Since your ops are much more complete and better documented than the ones I sent, I was trying to adapt my previous regex compiler to your ops, but I found what I think might be a limitation of your model. It looks to me that compiling regexes down to the usual opcodes requires a generic backtrack, instead of a $backtrack label for each case. I have been unable to express nested groups or alternation with your model, and I would say that this is because the engine needs some way to save not only the index into the string, but also the point of the regex where it can branch on a backtrack. You solve this in your examples by having a "$backtrack" address for each case, but this looks to me like a bad solution. In particular, I would say that it cannot be applied to complex regular expressions. In my previous experimental patch, there was a way to save the string index _plus_ the "regex index". Writing this with your syntax, it would mean being able to add a parameter to rePushindex that saves the "regex index". 
Your example: RE: reFlags "" reMinlength 4 $advance: rePopindex reAdvance $fail $start: rePushindex reLiteral "f", $advance $findo: literal "o", $findbar rePushindex branch $findo $findbar: reLiteral "bar", $backtrack set I0, 1 #true reFinished $backtrack: rePopindex $advance branch $findbar <<< backtrack needs to know where to branch $fail: set I0, 0 #false reFinished Your example tweaked by me: RE: reFlags "" reOnFail $fail reMinlength 4 $start: rePushindex $advance reLiteral "f" $findo: rePushindex $findbar literal "o" branch $findo $findbar: reLiteral "bar" set I0, 1 #true reFinished $fail: set I0, 0 #false reFinished $advance: reAdvance branch $start So it is not the reLiteral, reAdvance, etc.. ops that need to know where to branch on failure; instead, on failing they always: -pop the last index on the stack and then branch to the last saved destination, -or branch to the address previously set by the reOnFail op if there are no pending indexes. There is no $backtrack label, but the backtracking action is invoked each time a submatch fails. I am not sure that this is the only solution, but it is the one that came to my mind on seeing your proposal, and I find it quite elegant. It is quite possible that nested groups and alternation can be implemented with your model. If that is the case, could you please post an example so I can understand? What do you think about it? -angel
Re: Experimental regex support
quot;; cry "branch $id"; } elsif ($self->quant eq "?" ) { my $nextid = $self->next()->id(); cry $id, "savebr I1, $nextid"; cry $opcode; } else { cry $id, $opcode; } } ## ## each element pasm ## sub BabyRegex::anchor::pasm { my $self = shift; my $type = $self->{TEXT}; print $type; } sub BabyRegex::macro::pasm { die "unimplemented\n"; } sub BabyRegex::oct::explanation { die "unimplemented - too lazy\n"; } sub BabyRegex::hex::explanation { die "unimplemented - too lazy\n"; } sub BabyRegex::utf8hex::explanation { die "unimplemented - too lazy\n"; } sub BabyRegex::ctrl::explanation { die "unimplemented - too lazy\n"; } sub BabyRegex::named::explanation { die "unimplemented - too lazy\n"; } sub BabyRegex::Cchar::explanation { die "unimplemented - too lazy\n"; } sub BabyRegex::any::pasm { my $self = shift; my $l; my $id = $self->id; if ($modes{on} =~ /s/) { $self->cry_atomic ("matchanychar S1, I1, BT"); } else { #$self->cry_atomic ("matchanycharbutnl S1, I1, BT"); #we don't have the opcode anyway $self->cry_atomic ("matchanychar S1, I1, BT"); } } sub BabyRegex::text::pasm { my $self = shift; my $text = $self->text; $text =~ s/\n/\\n/g; $text =~ s/\r/\\r/g; $text =~ s/\t/\\t/g; $text =~ s/\f/\\f/g; $text =~ s/'/\\'/g; my $id = $self->id(); $self->cry_atomic ("matchexactly \"$text\", S1, I1, BT"); } sub BabyRegex::alt::pasm { my $self = shift; my $id = $self->id(); my $endofgroup_id = $self->{GROUP}->{CLOSED}->id; cry("branch $endofgroup_id"); } sub BabyRegex::slash::pasm { die "unimplemented\n"; } sub BabyRegex::class::pasm { die "unimplemented\n"; } sub BabyRegex::group::pasm{ my $self = shift; my $id = $self->id; my $cnt = $self->{COUNT}; my $fs = substr($self->fullstring,1,30); print "\n"; cry $id, "#start of n.c. group $cnt($fs...)"; if ($self->quant eq "*" or $self->quant eq "?") { cry "savebr I1, ". $self->{CLOSED}->next->id(); } foreach my $alt (@{$self->{ALTS}}) { cry "savebr I1, " . 
$alt->next->id(); } } sub BabyRegex::capture::pasm { # We are not capturing anything yet! my $self = shift; my $id = $self->id; my $cnt = $self->{COUNT}; my $fs = substr($self->fullstring,1,30); print "\n"; if ($self->quant eq "*" or $self->quant eq "?") { cry "savebr I1, ". $self->{CLOSED}->next->id(); } cry $id, "#start of group $cnt ($fs...)"; foreach my $alt (@{$self->{ALTS}}) { cry "savebr I1, ". $alt->next->id(); } } sub BabyRegex::close::pasm { my $self = shift; my $id = $self->id; my $cnt = $self->{GROUP}->{COUNT}; cry $id, "#end of group $cnt"; if ($self->{GROUP}->quant eq "*" or $self->{GROUP}->quant eq "+") { cry "savebr I1, " . $self->next->id(); cry "branch " . $self->{GROUP}->id; } print "\n"; } sub BabyRegex::comment::pasm { } sub BabyRegex::whitespace::pasm{ } sub BabyRegex::lookahead::explanation { die "unimplemented\n"; } sub BabyRegex::lookbehind::explanation { die "unimplemented\n"; } sub BabyRegex::code::pasm { die "unimplemented\n"; } sub BabyRegex::later::pasm { die "unimplemented\n"; } sub BabyRegex::conditional::pasm { die "unimplemented\n"; } sub BabyRegex::cut::pasm { die "unimplemented\n"; } sub BabyRegex::flags::pasm{ die "unimplemented\n"; } sub BabyRegex::backref::pasm { die "unimplemented \n"; } 1; __END__ =head1 NAME BabyRegex - compiles a regular expression down to Parrot bytecode =head1 SYNOPSIS use BabyRegex; BabyRegex->new($REx)->pasm; =head1 SEE ALSO The C documentation. =head1 AUTHOR Angel Faus [EMAIL PROTECTED] Based in YAPE::Regex::Explain by Jeff Pinyan ([EMAIL PROTECTED]) =cut use BabyRegex; unless (@ARGV[0] & @ARGV[1]) { print 'usage: perl babyre.pl "pattern" "string"' . "\n"; print 'ex:perl babyre.pl "reg(exp?|ular +expression)?" "regex" > regex.pasm' . "\n"; exit; } $pattern = @ARGV[0]; $string = @ARGV[1]; $c = BabyRegex->new($pattern); $c->pasm($string); INIT:initbrstack
Experimental regex support
Hi all, I have developed some additions that give Parrot a limited amount of support for regular expressions. It all started as a little experiment to find out what the "compile down to low-level ops" thing could mean someday. The patch consists of: * 6 new opcodes: - matchexactly - matchanychar - initbrstack - clearbrstack - backtrack - savebr The first two are the ones that actually implement the matches. initbrstack, clearbrstack, backtrack and savebr are for managing the stack of pending possible matches. They use the integer and destination stacks internally. * A Perl package and script that implement a simple regex compiler (using YAPE::Regex, by the way). The compiler currently outputs a Parrot program that matches the regexp against a predefined string. It could be easily modified to produce something more useful. Currently, the following features are supported: * exact matches * any char (.) * nested groups (do not capture) * alternation * simple quantifiers (*, +, ?) There is a lot of room for improvement, either by implementing features that do not require changes in Parrot (non-greedy quantifiers, anchors, capturing and most regex options can be added right now) or by making the necessary changes in Parrot (locale support is required for macros, etc..). This is not a serious patch, in the sense that there are many things missing, the ones that are supposed to work are not tested enough, and even the ones that work are implemented in a way that is just wrong. I am a rather mediocre programmer, and these are the first lines of code I ever sent to a mailing list, so please be benevolent with me. :) Anyway I thought it would be interesting to share my little experiment. 
Sincerly, ------- Angel Faus [EMAIL PROTECTED] 1814a1815,1882 > > > AUTO_OP matchexactly(sc, s, i, ic){ > > STRING* temp; > > > if (string_length($2) <= $3) { > RETREL($4); > } > > temp = string_substr(interpreter, $2, $3 , string_length($1), NULL); > > if (string_compare(interpreter, $1, temp) != 0 ) { > RETREL($4); > } > else { > $3 = $3 + string_length($1); > } > } > > AUTO_OP matchanychar(s, i, ic) { >if (string_length($1) > $2){ > $2++; > } >else { > RETREL($3); >} > } > > MANUAL_OP backtrack(i){ > opcode_t *dest; > > pop_generic_entry(interpreter, &interpreter->user_stack_top, &($1), STACK_ENTRY_INT); > pop_generic_entry(interpreter, &interpreter->control_stack_top, &dest, STACK_ENTRY_DESTINATION); > > RETABS(dest); > } > > > AUTO_OP savebr(i, ic){ > > push_generic_entry(interpreter, &interpreter->control_stack_top, cur_opcode + cur_opcode[2], STACK_ENTRY_DESTINATION, NULL); > > push_generic_entry(interpreter, &interpreter->user_stack_top, &($1), STACK_ENTRY_INT, NULL); > > } > > AUTO_OP initbrstack(ic) { > INTVAL i; > i = -1; > > push_generic_entry(interpreter, &interpreter->control_stack_top, cur_opcode + cur_opcode[1], STACK_ENTRY_DESTINATION, NULL); > push_generic_entry(interpreter, &interpreter->user_stack_top, &i, STACK_ENTRY_INT, NULL); > > } > > AUTO_OP clearbrstack(i){ > opcode_t *dest; > > while ($1 && $1 >= 0) { > pop_generic_entry(interpreter, &interpreter->control_stack_top, &dest, STACK_ENTRY_DESTINATION); > pop_generic_entry(interpreter, &interpreter->user_stack_top, &($1), STACK_ENTRY_INT); > } > > } > > 1826a1895 > package BabyRegex; use YAPE::Regex 'BabyRegex'; use strict; use vars '$VERSION'; $VERSION = '0.01'; my %modes = ( on => '', off => '' ); sub buildtree { my $self = shift; my $cnt = 0; my ($groupscnt, @groups); my @tree; while (my $node = $self->next) { $node->id($cnt++); $tree[-1]->next($node) if @tree; if ($node->type =~ /capture|group/) { push @groups, $node; $node->{ALTS} = []; $node->{COUNT} = $groupscnt++; } if ($node->type eq 
"alt") { push (@{$groups[-1]->{ALTS}}, $node); my $groupnode = $groups[-1]; $node->{GROUP} = $groupnode; push @{$groupnode->{ALTS}}, $node, } if ($node->type eq "close"){ my $groupnode = pop @groups; $groupnode->{CLOSED} = $node; $node->{GROUP} = $groupnode; for my $alt (@{$groupnode->{ALTS}}) { #Alt nodes get its ID replaced by the Closing node ID, so #that the when its antecessors calls ->next->id it gets the good one. #This is
RE: Revamping the build system
Hi, > > From: Dan Sugalski [mailto:[EMAIL PROTECTED]] > There's nothing really past what make does. The reason for having our own is: > *) Make isn't everywhere (like windows) > *) Make on various platforms has different syntax (VMS, Windows, and Unix > are all different) > *) Not speaking for anyone else, but I find make's gotten rather creaky > a round the edges--after 20+ years presumably we can make things a bit better > *) Having the full power of perl in the build tool should give us a big > boost in utility. (Consider the difference between C macros and Perl source > filters) > *) It'll be really unfamiliar to everyone, which'll stop folks from falling > into old, non-portable habits. If there is going to be a new build tool for perl6, I would suggest using something similar to Ant (http://jakarta.apache.org/ant/). Ant is not suitable for Parrot, of course (it requires Java), but its design is quite good imho. From its webpage: > Ant is different. Instead of a model where it is extended with shell based commands, it is > extended using Java classes. Instead of writing shell commands, the configuration files > are XML based calling out a target tree where various tasks get executed. Each task is run > by an object which implements a particular Task interface. It tries to avoid executing shell commands (which is good if you want to be portable to places like Windows) and instead comes with a predefined set of tasks (remove files, compile, etc..) that can be extended by programming your own Task classes. This article: http://www.onjava.com/pub/a/onjava/2001/02/22/open_source.html does a very good job of giving you a feeling for how it works. In my limited experience, this is something very similar to what we would need for parrot/perl6. Just my half a cent, Angel Faus [EMAIL PROTECTED] vLex.com
RE: Strings db (comments requested)
Hi Grant, Just as a suggestion, I would use the PO format (already used by tools other than gettext, like KDE), so we get all the catalog manager tools for free (like KBabel, which is very nice, by the way). And maybe error code output could be just another target language. So: fprintf(stderr, i18n ("Error #43: Can't stat file\n") ) prints just "ERR43\n" when the language variable is set to, say, "CODES". -- Angel Faus Technical Director vLex.com - [EMAIL PROTECTED]