Re: IMCC temp optimizations...

2004-04-22 Thread Angel Faus

A good register allocation algorithm will always have problems with 
big subs; there is nothing we can do about it.

I think that what real compilers do to solve this problem is 
implement two different algorithms: one for normal subs which tries 
to generate optimal code, and a naive one for very large subs with 
many virtual registers. 

That makes compilation much faster, and the execution penalty doesn't 
hurt too much.
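The two-tier idea can be sketched in a few lines of Python. This is purely my own illustration: the threshold, the function names, and the greedy colorer are all invented, not imcc's actual code.

```python
# Illustrative only: dispatch between a slow, good allocator and a
# fast, naive one based on sub size. Threshold and names are invented.

def naive_allocate(n_virtual, n_physical):
    """O(n) allocation: the first registers win, the rest are spilled."""
    return {v: ('reg', v) if v < n_physical else ('spill', v - n_physical)
            for v in range(n_virtual)}

def coloring_allocate(n_virtual, n_physical, interferences):
    """Greedy graph coloring: better code, but costly on huge subs."""
    colors = {}
    for v in range(n_virtual):
        used = {colors[u] for u in interferences.get(v, ()) if u in colors}
        colors[v] = next(c for c in range(n_virtual + 1) if c not in used)
    return {v: ('reg', c) if c < n_physical else ('spill', c - n_physical)
            for v, c in colors.items()}

def allocate(n_virtual, n_physical, interferences, threshold=500):
    # Big subs get the fast naive pass; normal subs get the good one.
    if n_virtual > threshold:
        return naive_allocate(n_virtual, n_physical)
    return coloring_allocate(n_virtual, n_physical, interferences)
```

The point is only that the dispatch is cheap: the expensive analysis is simply skipped once a sub crosses the size threshold.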

Actually, it's (for me) an open question whether the good register 
allocation should be the default one. Perl (and Python and...) users 
expect blazing compilation times, so maybe we should reserve it for 
higher -O levels.

But then, we won't know how bad our compilation times are until there 
are real programs written in perl6/parrot.

-angel


Leopold Toetsch wrote:
 Dan Sugalski [EMAIL PROTECTED] wrote:
  Dunno what parrot thinks--it's not done yet. grep says 569
  .locals and 473 temp PMC registers.

 I've now enabled some more code that speeds up temp allocation more
 (~2.5 for 2000 temps in a unit). This requires that the usage range
 be limited to 10 lines. If the code from a compiler looks like the
 output from the program below, this works fine.



Re: OT: Will the State of the Onion be posted online?

2003-07-15 Thread Angel Faus
Monday 14 July 2003 05:56, Robert Spier wrote:
  Sorry for a slightly off-topic post, but will Larry's State of
  the Onion be posted online soon?

 Yes.

 http://dev.perl.org/perl6/talks/

 -R

Really? I can't find anything about TPC7 on this page...

-angel



Re: Using imcc as JIT optimizer

2003-02-26 Thread Angel Faus

 [ you seem to be living some hours ahead in time ]

Yep, sorry about that.

 The problem stays the same: spilling processor registers to parrot's,
 or parrot's to the array.


Thinking a bit more about it, I now believe that the best way to do it 
would be:

(1) First, do a register allocation for machine registers, assuming 
that there are N machine registers and infinite parrot registers.

(2) Second, do a register allocation for parrot registers, using an 
array as spill area.

The first step assures us that we generate code that always puts data 
in the available machine registers, and tries to minimize moves 
between registers and physical memory.

The second step tries to put all the data in parrot registers, and 
whatever doesn't fit goes into the parrot spilling area (currently a 
PerlArray).
 
For example, code generated by (1) would look like:

set m3, 1   # m3 is the third machine register 
add m3, m3, 1
print m3

set $I1, m3   # $I1 is a parrot virtual register

etc...

Then we would do register allocation for the virtual $I1 registers, 
hoping to be able to put them all in the 32 parrot registers.

I believe this would be the optimal way to do it, because it actually 
models our priorities: first put all data in physical registers, 
failing that, try to do it in parrot registers.

This is better than reserving the machine registers for the most used 
parrot registers (your original proposal) or doing a physical 
register allocation and assuming that we have an infinite number of 
parrot registers (my original proposal).
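The two passes can be modelled with a toy linear scan over live intervals. This is my own sketch under heavy simplification: real imcc works on a proper interference graph, and every name below is invented.

```python
# Toy two-pass model over (start, end) live intervals, sorted by start.
# Pass 1 fills the machine registers; the overflow becomes "parrot
# virtuals". Pass 2 fills the parrot registers; what still doesn't fit
# goes to the spill array.

def linear_scan(intervals, n_slots):
    """Return (assigned, overflow): interval index -> slot number."""
    active = []                         # (end, slot) pairs still live
    assigned, overflow = {}, []
    for i, (start, end) in enumerate(intervals):
        active = [(e, s) for (e, s) in active if e > start]   # expire dead
        in_use = {s for _, s in active}
        free = [s for s in range(n_slots) if s not in in_use]
        if free:
            assigned[i] = free[0]
            active.append((end, free[0]))
        else:
            overflow.append(i)
    return assigned, overflow

def two_pass_allocate(intervals, n_machine, n_parrot=32):
    # Pass 1: machine registers, infinite parrot registers as backup.
    machine, to_parrot = linear_scan(intervals, n_machine)
    # Pass 2: the overflow competes for parrot registers; the rest
    # ends up in the spill array.
    parrot, to_array = linear_scan([intervals[i] for i in to_parrot], n_parrot)
    return machine, parrot, to_array
```

With five mutually overlapping intervals, three machine registers, and (artificially) one parrot register, two values land in machine registers' overflow, one of those gets the parrot register, and the last goes to the array.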

Hope it makes more sense now,

-angel



Re: Using imcc as JIT optimizer

2003-02-25 Thread Angel Faus
Saturday 22 February 2003 16:28, Leopold Toetsch wrote:
 Gopal V wrote:
  If memory serves me right, Leopold Toetsch wrote:
 
 
  Ok .. well I sort of understood that the first N registers will
  be the ones MAPped ?. So I thought re-ordering/sorting was the
  operation performed.

 Yep. Register renumbering, so that the top N used (in terms of
 score) registers are I0, I1, ..In-1

With your approach there are three levels of parrot registers: 

- The first N registers, which in JIT will be mapped to physical 
registers.

- The other 32 - N parrot registers, which will be in memory.

- The spilled registers, which are also in memory, but will have to 
be copied to a parrot register (which may be a memory location or a 
physical register) before being used.

I believe it would be smarter to instruct IMCC to generate code 
that only uses N parrot registers (where N is the number of machine 
registers available). This way we avoid the risk of having to copy 
the data twice.

This is also interesting because it gives the register allocation 
algorithm all the information about the actual structure of the 
machine we are going to run on. I am quite confident that code 
generated this way would run faster.

We also need to have a better procedure for saving and restoring 
spilled registers, especially in the case of JIT compilation, where it 
could be translated to a machine save/restore.

What do you think about it?

-angel



Re: Using imcc as JIT optimizer

2003-02-25 Thread Angel Faus
I explained it very badly. The issue is not spilling (at the parrot 
level).

The problem is: if you only pick the highest-priority parrot registers 
and put them in real registers, you lose opportunities where 
copying the data once would save you from copying it many times. You 
are, in some sense, underspilling.

Let's see an example. Imagine you are compiling this imc, to be run 
on a machine which has 3 registers free (after temporaries):

set $I1, 1
add $I1, $I1, 1
print $I1

set $I2, 1
add $I2, $I2, 1
print $I2

set $I3, 1
add $I3, $I3, 1
print $I3

set $I4, 1
add $I4, $I4, 1
print $I4

set $I5, 1
add $I5, $I5, 1
print $I5

print $I1
print $I2
print $I3
print $I4
print $I5

Very silly code indeed, but you get the idea.

Since we have only 5 vars, imcc would turn this into:

set I1, 1
add I1, I1, 1
print I1

set I2, 1
add I2, I2, 1
print I2

set I3, 1
add I3, I3, 1
print I3

set I4, 1
add I4, I4, 1
print I4

set I5, 1
add I5, I5, 1
print I5

print I1
print I2
print I3
print I4
print I5

Now, assuming you put registers I1-I3 in real registers, what would it 
take to execute this code in the JIT?

It would have to move the values of I4 and I5 between memory and 
registers a total of 10 times (4 saves and 6 restores, if you assume 
the JIT is smart).

[This particular example could be improved by making the JIT look at 
whether the same parrot register is going to be used in the next op, 
but that's not the point.]

But, if IMCC knew that there were really only 3 registers in the 
machine, it would generate:

set I1, 1
add I1, I1, 1
print I1

set I2, 1
add I2, I2, 1
print I2

set I3, 1
add I3, I3, 1
print I3

fast_save I3, 1

set I3, 1
add I3, I3, 1
print I3

fast_save I3, 2

set I3, 1
add I3, I3, 1
print I3

fast_save I3, 3

print I1
print I2
fast_restore I3, 3
print I3
fast_restore I3, 2
print I3
fast_restore I3, 1
print I3

When running this code in the JIT, it would only require 6 moves (3 
saves, 3 restores): exactly the ones generated by imcc.

In reality this would be even better, because since you have the 
guarantee that the data is already in real registers, you need fewer 
temporaries and so have more machine registers free.

 So the final goal could be, to emit these load/stores too, which
 then could be optimized to avoid duplicate loading/storing. Or imcc
 could emit a register move, if in the next instruction the parrot
 register is used again.

Yes, that's the idea: making imcc generate the loads/stores, using the 
info about how many registers are actually available on the real 
machine _and_ its own knowledge of the program flow.

An even better goal would be to have imcc know how many temporaries 
every JITed op requires, and use this information during register 
allocation.

All this is obviously machine-dependent: the code generated should 
only run on the machine it was compiled for. So we should always keep 
the original imc code in case we copy the pbc file to another 
machine.

-angel




Re: Infant mortality

2003-01-02 Thread Angel Faus

 The best partial solution to early finalization is compile-time
 tracing of possible references by the compiler which can explicitly
 generate the appropriate DESTROY calls.


What about refcounting + real GC?

Refcounting will free most of the objects as soon as possible, and the 
real GC system makes sure that even cyclic references are 
eventually freed...
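The combination can be sketched like this. All names are invented for illustration; this is not parrot's actual DOD/GC machinery, just the shape of the idea: refcounting for prompt finalization, plus a mark-and-sweep pass that catches cycles.

```python
# Illustrative sketch only: refcounting frees acyclic garbage
# immediately; a backup mark-and-sweep catches reference cycles.

class Obj:
    def __init__(self):
        self.refcount = 1         # the creating reference
        self.refs = []            # outgoing references
        self.finalized = False

heap = []

def new_obj():
    o = Obj()
    heap.append(o)
    return o

def link(a, b):
    a.refs.append(b)
    b.refcount += 1

def release(o):
    o.refcount -= 1
    if o.refcount == 0:
        finalize(o)               # freed as soon as possible

def finalize(o):
    if not o.finalized:
        o.finalized = True
        for r in o.refs:
            release(r)

def cycle_collect(roots):
    # Backup collector: finalize anything unreachable from the roots,
    # even if a reference cycle kept its refcount above zero.
    reachable, stack = set(), list(roots)
    while stack:
        o = stack.pop()
        if id(o) not in reachable:
            reachable.add(id(o))
            stack.extend(o.refs)
    for o in heap:
        if id(o) not in reachable:
            finalize(o)
```

Two objects pointing at each other survive their releases (the leak refcounting can't fix), but the sweep finalizes them; a plain object dies the moment its last reference goes away.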

-angel 
(who is not really paying attention at all)



Re: branch dump

2002-11-11 Thread Angel Faus
 Hmm wouldn't the JIT benifit from a pre knowledge of basic
 blocks and types or some information ? ... (I seem to think so
 ...).

I would think so, because if, for example, the JIT wants to do a full 
register allocation to map parrot registers to machine registers, it 
will certainly need information about basic blocks.

(I am talking about a complete register allocation, that would re-do 
the original register allocation of imc with the actual number of 
registers available on the machine.)

On the other hand, the JIT could certainly regenerate this information 
from the imc code, which is probably going to be stored somewhere 
anyway.
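Regenerating basic blocks from a flat op stream is cheap; a sketch (the opcode set and the `(opcode, args)` tuple shape are invented for illustration, with branch targets as plain instruction indices):

```python
# Sketch: recover basic-block leaders from a flat op list.

BRANCH_OPS = {'branch', 'if', 'unless'}      # assumed toy opcode subset

def find_leaders(ops):
    leaders = {0}                            # the entry is always a leader
    for i, (opcode, args) in enumerate(ops):
        if opcode in BRANCH_OPS:
            leaders.add(args[-1])            # the branch target starts a block
            if i + 1 < len(ops):
                leaders.add(i + 1)           # so does the fall-through op
    return sorted(leaders)

def basic_blocks(ops):
    bounds = find_leaders(ops) + [len(ops)]
    return [ops[a:b] for a, b in zip(bounds, bounds[1:])]
```

One linear pass marks the leaders, and the blocks are just the slices between them.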

-angel



Re: core.ops ARGDIR

2002-09-03 Thread Angel Faus

Hi Leo,


 This should be - from my (imcc) POV - reflected by these IN/OUT
 settings:

 op set(in PMC, in INT)
 op set(in PMC, in STR)
 op set(in PMC, in NUM)
 op set(out PMC, in PMC)   # ok, $1 points to $2 now

 # P[i] = x
 op set(in PMC, in intkey, in x)
 # P[KEY] = x
 op set(in PMC, in KEY, in x)
 # p[KEY] = P[KEY]
 op set(in PMC, in KEY, in PMC, in KEY)


Shouldn't all these PMCs be inout? They depend on the previous value 
of the PMC, but they also modify it.

This probably doesn't affect imcc now, but it might be useful in the 
future.

-angel



Encodings - Questions

2002-08-28 Thread Angel Faus

Hi,

Now that we've got ICU in, I thought it might be time to revisit the 
encodings implementation. I am clamorously ignorant of 
unicode/encoding issues, so please be patient with me. :)

From what I have asked people on IRC, and what's in the list archives, 
my understanding of how parrot will work with various encodings is:

i) After an IO operation, strings are preserved in their original 
encoding, whatever it is.

ii) Parrot will try to keep the string in that encoding for as long 
as possible.

iii) When some operation that requires ICU is requested, parrot will 
feed the string to ICU, which will convert it to UTF-16 (its internal 
encoding) and then perform the desired operation.

Please correct me if this is wrong. Now, my questions are:

I. About iii): I can imagine at least three different options about 
what to do with the converted UTF-16 string:

a) We can discard the UTF-16 version, and recompute the conversion 
each time. (This is costly, isn't it?)

b) We can replace the original string with the upgraded version, so 
strings lazily become converted to UTF-16. (This makes sure that 
the conversion is only done once, but is conversion to UTF-16 always 
lossless?)

c) We can store the UTF-16 version alongside the original one. (This 
doubles the memory usage, plus it may be hard to implement.)

Each approach has its pros and cons. Which one is the right one?
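As a concrete sketch of what a lazy hybrid of (b) and (c) could look like: keep the original bytes untouched and cache the UTF-16 form the first time an ICU-style operation needs it. Class and method names here are invented; parrot's string internals are different.

```python
# Sketch: original bytes kept as-is, UTF-16 form cached on first use.

class PString:
    def __init__(self, data, encoding):
        self.data = data                 # original bytes, untouched
        self.encoding = encoding
        self._utf16 = None               # lazily filled cache

    def as_utf16(self):
        if self._utf16 is None:          # convert once, reuse afterwards
            text = self.data.decode(self.encoding)
            self._utf16 = text.encode('utf-16-le')
        return self._utf16

    def upper(self):
        # stand-in for the ICU call: operate on the UTF-16 form
        text = self.as_utf16().decode('utf-16-le').upper()
        return PString(text.encode('utf-16-le'), 'utf-16-le')
```

A latin-1 string pays the conversion cost only on its first ICU operation; later operations reuse the cached UTF-16 form, and the original bytes are never lost.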


II. About ii): What exactly is the point at which we decide to feed 
the string to ICU, and what operations should we (as parrot 
developers) implement in our own layer?

For example, let's take a relatively simple operation, such as 
uppercasing a string, and let's assume that the string is in a 
friendly encoding, such as ISO-8859-1.

Even with these assumptions, conversion to uppercase is not 
straightforward, since it's locale-dependent (or to be more precise, 
it might be locale-dependent if the user chooses so).

We could, of course, implement all locale-aware operations for each 
encoding and each locale, but how much work do we want to put into 
this?

So, exactly which string functionalities do we want to implement 
ourselves on a per-encoding basis, and which ones are we going to 
forward to ICU?

-angel




Re: imcc hack for perl6 regexes

2002-08-21 Thread Angel Faus


 I still prefer infix notation to prefix notation for an intermediate
 language. I don't understand why it is so hard to adopt. imcc is supposed
 to be a step closer to higher level languages, which is why I went that
 way.

Hi,

I think we all agree that since parrot can have dynamic oplibs (and core 
parrot has hundreds of ops), imcc needs some way to express them directly.  
The idea of having parrot ops included as such, and imcc directives 
prefixed with a dot, looks fair to me. 

So we end having:

a) imcc becomes like parrot assembly, but with virtual registers and a few 
administrative ops (for calls and such) that start with a dot

or 

b) imcc becomes like parrot assembly, but with virtual registers, a few 
administrative ops, and some convenient infix notation for stuff like 
a = 1 or b = a + 1

There isn't much difference between the two proposals, when you think about 
it. The only benefit of proposal b) is that hand-writing imc becomes nicer, 
but I would not expect much of it. 

In addition to the learning benefit for developers of parrot-based 
languages, using parrot ops as-is is good for us, because we don't need to 
invent, develop, and document another language. It lets us focus on making 
imcc do a few tasks well (ahem), such as register allocation and 
language-neutral optimizations. 

From a more conceptual point of view, I would favour imcc being as similar 
to the assembler as possible because:

- parrot assembler is already very high-level
- imcc probably won't be retargeted to a different platform. If we really 
wanted to make imcc retargetable, we would need much more than a friendlier 
syntax; and honestly I am not sure that is worth it. 

To sum up, my vote would be for an assembler-like imcc. Infix notation 
doesn't add much value in an intermediate language such as imcc. 

I am not a priori against imcc diverging from assembly. But I think the only 
good reason for divergence should be making the task of the language compiler 
simpler.

About the implementation, I think we will need the following metadata about 
each op:

i) the opcode, and the op name.
ii) the types of the arguments, including in/out/etc.
iii) whether the opcode has side effects or not. This is important for a 
number of optimizations: constant folding, optimizations that work by moving 
an instruction to a more convenient point, etc.

Best,

-angel




Re: Off-list discussions, was Re: imcc hack for perl6 regexes

2002-08-21 Thread Angel Faus


 Sure, I have no problem with it. At one
 time someone suggested making a separate
 list for Parrot compilers, so I took it as
 a hint that maybe we were spamming.


I am all for the creation of a new list for stuff such as imcc, and perl6 
compilers. ([EMAIL PROTECTED]?)

So people interested in parrot internals (the hard stuff: GC, JIT, ...) don't 
need to listen to our discussions about bugs in imcc, feature requests, etc. 

On the other hand, people like me who cannot keep track of all that happens 
on perl6-internals would have an easier way of focusing on areas where we 
can actually contribute something.

-angel




Re: imcc hack for perl6 regexes

2002-08-21 Thread Angel Faus

 c) imcc becomes a middle level language.
 I never meant it to be an assembler. In practice
 intermediate languages provide other constructs
 such as aggregate type definition that are not
 available in the assembler.

 type i : packed int[32]
 type r : record { foo : int, bar : string }

 This is not assembler. This is medium level
 computer language in my book. You could even
 see things like

 .pragma classes_are_hashes

 or something that would tell the compiler that
 aggregates should be implemented as standard
 Perl hashes in this piece of code, whereas
 others might want packed aggregates with
 no runtime introspection capability.


Now, this is interesting. I confess I was only thinking of dynamic perl-like 
languages, but I think you have some secret plans...

But these features don't change the point that infix notation 
doesn't make the job of the language compiler easier (nor does it make it 
harder) and is a duplicate feature once we have direct access to parrot ops.


 But I'm willing to invent and develop another language. And it should be a
 lot richer than
 just an assembler with a register allocator
 and spiller. :)


Don't forget optimizations. That's where the real fun is... :)

-angel




Re: [perl #15797] [PATCH] Regex speedup (with docs)

2002-08-19 Thread Angel Faus

Sunday 18 August 2002 00:38, Simon Cozens wrote:
 [EMAIL PROTECTED] (Dan Sugalski) writes:
  Has someone looked at and maybe committed this?

 The reason I asked which pieces of Parrot were prototypes was
 because optimizing the hell out of something that's only a
 prototype is nothing short of intellectual masturbation, and it
 seems nobody actually learnt anything from my YAPC presentation
 after all.

One possible reason would be to learn whether a speed problem is due 
to the design or the implementation. Sometimes you don't know if 
something can be fast enough until you try to optimize the hell out of it. 

Could you share with those of us who were not at YAPC what this 
presentation was about?

-angel




Re: PARROT QUESTIONS: Keyed access: PROPOSAL

2002-07-22 Thread Angel Faus


Sunday 21 July 2002 21:34, Dan Sugalski wrote:

 No. They are not. You're missing the important case where the data
 structure is inherently and intrinsically multidimensional.

my int Array foo : dimensions(3);
foo[1;2;3] = 12;

 Or whatever the syntax is in perl to declare a 3D array and access
 the element at (1,2,3).


In my opinion, there are actually two different things to discuss:

- keyed opcodes 
- keyed methods on the PMC vtable

You can keep keyed opcodes without actually implementing the keyed 
methods. Actually, ALL our keyed methods look like:

... stuff to get the value of SELF[key] into VALUE ...
VALUE->vtable->method(...)

Putting this code into the opcode, and removing the vtable methods 
(except get_keyed_*), shouldn't make it slower at all IMHO (same job 
being done, in the same way, with the same number (two) of vtable calls). 

Keyed opcodes can stay in the interest of code density. 
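A toy model of what moving the keyed logic into the opcode means (Python standing in for parrot's C; the class and opcode names are invented): one get_keyed vtable call to resolve the key, then one ordinary non-keyed vtable call, two calls either way.

```python
# Toy model: the opcode body resolves SELF[key] first, then uses the
# plain (non-keyed) vtable method on the result.

class PMC:
    def __init__(self, value=None, items=None):
        self.value = value
        self.items = items or {}

    def get_keyed(self, key):            # vtable: get_keyed
        return self.items[key]

    def add(self, other):                # vtable: plain add
        return PMC(self.value + other.value)

def op_add_keyed(dest, src, key, addend):
    val = src.get_keyed(key)             # vtable call 1: resolve SELF[key]
    result = val.add(addend)             # vtable call 2: the real operation
    dest.value = result.value
```

The keyed add_keyed vtable slot disappears, but the opcode still performs exactly two vtable calls, so the claim is that nothing gets slower.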


 No. Keyed access for all methods stays. You're forgetting one very
 important case, the one where the contents of an aggregate isn't a
 PMC. This scheme would then require the aggregate PMC to construct
 fake PMCs to do operations on. Bad, definitely bad.

Is it really so bad? As long as what is ultimately stored in the 
aggregate is one of our base types (pmc, int, string, float), the 
opcode shouldn't need to build the fake PMC.

Or are you talking about multi-dimensional PMCs? I don't see why 
these should need to build fake PMCs: you just call the get_keyed 
method, and it will do the Right Thing (probably getting the value 
directly, maybe propagating a portion of the key to an inner PMC).

As I said, there should be no speed cost at all, and a significant 
improvement in code size. But this is only an opinion. I am willing 
to submit a benchmarking patch if there is some interest.

Best,

-àngel





RE: Three address code, named variables

2002-02-15 Thread Angel Faus

Hello,

If people have different opinions on intermediate code generation, I'd
like to hear them, since I've only done toy compilers in school which
always used 3-address code or quadruples.

I have been working on a compiler for an intermediate language that I call
P-- (obviously inspired by C--) for now. The language offers some low-level
abstractions (register allocation for locals, function calls) and the idea
is that it could be used by lazy language implementors who don't want to
mess with low-level aspects of Parrot's VM.

Register allocation is implemented using Briggs' algorithm (see
http://citeseer.nj.nec.com/briggs92register.html).

I was planning to finish testing it this weekend and send a first version to
the list. It's mostly a learning project for myself, but it could even be
useful to someone... who knows.

Sincerely,

-angel
who never programmed a compiler before, even in school




RE: Globals

2002-02-13 Thread Angel Faus


Dan wrote:
Yep, I've seen their plans. It's less an issue for us, at least as
far as globals are concerned, since we'll be doing that with
lexicals. (Python not having lexicals, after all) Globals are a bit
more interesting, since bytecode-loaded modules can't guarantee
global positions, since there's no way of knowing at runtime what's
been stuffed into the global tables already.

Anyway, doing this with lexicals, which we can know at compile time,
will get us these speed boosts, and has been on the list for ages.


Oh, I see. Just two quick questions:

* will sub definition create globals? I mean, does sub a {} create
a function PMC and store it into the "a" lexical? or will there be
a separate mechanism for functions?


* Is there any chance that lexicals get mapped to registers, at least
in the cases where the block doesn't use any dynamic features (MY%,
eval, etc..)?


Thanks.

-angel




RE: parrot rx engine

2002-01-30 Thread Angel Faus


Ashley Winters wrote:
First, we set the rx engine to case-insensitive. Why is that bad? It's
setting a runtime property for what should be compile-time
unicode-character-kung-fu. Assuming your CPU knows what the gritty
details of unicode in the first place just feels wrong, but I digress.

I tend to agree with that. Many run-time options can be turned into
compile-time versions of the opcodes, which hopefully will produce a speed
increase.

Once you squash rx_literal and friends, any attempt to benchmark the
rx engine really becomes a benchmark of parrot itself. When you speed
up parrot, you speed up regular expressions. Voila, no more black box.
If Parrot is just too damn slow for you, whip out libmylang and do the
nitty gritty yourself. Since this is mostly a just don't do it post,
no code is actually *required* from me, right? :)

We are already doing so. What you are suggesting, in fact, is to compile
regular expressions down to Perl code (and that to Parrot then). This will
always be slower than directly generating Parrot, because some Perl features
prevent the heavy use of some optimizations (think JIT) that are necessary
if we want good regex performance.

In other words: with your proposal, if you have a better general-purpose
optimizer you will get better regex performance, but it will always remain
worse than the current state.

If what you are suggesting is that everything is compiled to general-purpose
opcodes (branch, unicode's, etc.) [which is what is derived from your
words, but not from your examples], I still believe this to be a performance
mistake. It would dramatically reduce the code density,
and no matter how fast parrot dispatch is, this will kill your performance.

And using stacks too much (as the usage of exceptions would probably
require) will also be too slow (as Brent Dax showed me when we
were discussing our two regex opcode designs).

Just my 2 cents (of euros) :)

---
Angel faus
[EMAIL PROTECTED]




[PATCH] multimethod ops

2001-11-27 Thread Angel Faus


Simon Cozens:
* Modify make_vtable_ops.pl to produce variant ops which pass in
  INTVAL, FLOATVAL and STRING arguments to the multimethods, (Ask
  me for more details if you want to do this.) so we can have

   add Px, Px, Ix
  and  add Px, Px, Ic
  and the like working.

This patch should do it. I have not included STRING arguments, because I
don't know which multimethod should be called (native, unicode or other).

Tests are updated too.

Sincerely,


Angel Faus
[EMAIL PROTECTED]



Vtable.pm
Description: Binary data


make_vtable_ops.pl
Description: Binary data


pmc.t
Description: Binary data


RE: PMC Classes Preprocessor

2001-11-26 Thread Angel Faus


Hi,

Currently pmc2c.pl requires that the user write methods in .pmc files in
exactly the same order as in vtable.tbl. That's not a nice thing to do.

The version I am attaching hopefully fixes this, and creates a dummy version
of the method if the user has not provided one.

This allows us to add new methods in vtable.tbl, without worrying about old
.pmc files.

btw, sorry for that silly Text::Balanced dependency. I stupidly assumed it
was a standard module without checking.

Sincerely,

---
Angel Faus
[EMAIL PROTECTED]



pmc2c.pl
Description: Binary data


Re: [Proposal] Regex documentation

2001-11-25 Thread Angel Faus

 Okay, I've got another argument for you: speed.


That was a killer one.


 C:\Brent\VISUAL~1\PERL6~1\parrot\parrotrebench.pl
 Explicit backtracking: 75 iterations, about 30.5380001068115
 seconds.
 Implicit backtracking: 75 iterations, about 655.510995864868
 seconds.

 The numbers there are total time.


Really amazing. More than I could ever have imagined.


 I can provide source for everything on request, including a new core.ops
 and some supporting files, the test assembly file, and the benchmark
 runner script.


Cool. Could you send it to me? I would like to try to speed up my 
proposal... Probably there is no way to catch up, but I would like to see it 
anyway.

Meanwhile, I'll shut my mouth and update regex.pod to reflect your 
proposal.

Sincerely,

---
Angel Faus
[EMAIL PROTECTED]



Psyco - Python specializing compiler

2001-11-20 Thread Angel Faus

Hi,

I have found a reference for a very interesting project related to
accelerating Python with some nice ideas, that maybe could be applied to
Parrot too.

It's called Psyco (standing for Python Specializing Compiler) and works (if
I understood it right) by creating specialized versions of python functions,
each optimized for a particular type of parameters.

The specialization can happen at compile-time or run-time, or (most of the
time) will start at compile-time, stop until some run-time value is known,
compile a bit more, and so on.

It is a quite general trick, and could be used to implement some of the
optimizing strategies that have been discussed here (like PMC opcode
inlining, constant folding, etc.) without requiring Parrot to know in
advance things that it simply cannot know.

They have some problems that are very similar to the ones discussed on this
mailing list in a previous thread (users can redefine builtins, globals can
get rebound somewhere in the wild, etc.); they have not addressed this yet,
but suggest having a signal system on the globals/lexicals hash that fires a
backtrack when anyone is playing dangerous games with it.

The URL is http://homepages.ulb.ac.be/~arigo/psyco/ ; they have some very
nice documents that explain it all in great detail.

I wonder whether Parrot could steal some ideas from this...

Just my half a cent.

-Angel Faus
[EMAIL PROTECTED]




Re: A serious stab at regexes

2001-11-10 Thread Angel Faus

Hi Brent,


 It just means you have to be more explicit.  I consider that a Good
 Thing--Perl 5's regular expressions are compact enough to be represented
 like: but the internals to support them are an absolute jungle.  I'd rather
 have a few exposed ops than have a chunk of code like Perl 5's regular
 expressions.  Besides, being able to /see/ where we jump to on each op
 when we fail will help when debugging the RE compiler and such.


I totally agree. It is not about having fewer opcodes or a more compact 
syntax, but a more maintainable system. I just contend that having a stack 
where you predeclare the possible continuation paths is simpler than your 
marks idea. 

I could be biased here, of course. I would love some comments (especially 
from someone with experience in this area) about whether we are going the 
right way here. 

 How do you plan to support lookahead?  Unless I'm mistaken, with my
 proposal it goes something like:

   rePushindex
   rePushmark
   #lookahead code in here
   rePopmark
   rePopindex


This could be solved by having a new pair of ops (reGetIndex/reSetIndex) 
that save and restore the current index into an INTVAL register:

reGetIndex I1 
#lookahead code in here
reSetIndex I1

So lookahead would be a special case, but that's fine, because it _is_ a 
special case.
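The "stack of predeclared continuation paths" idea can be sketched as a tiny interpreter. The program encoding ('push'/'lit'/'jump'/'done') and everything else here is invented for illustration; the point is only that failure pops a saved (resume point, string index) pair instead of branching to a per-case label.

```python
# Sketch of a generic-backtrack regex machine: each alternative pushes
# (resume_pc, string_index); on failure we pop the pair and retry.

def match(program, text):
    stack, pc, i = [], 0, 0
    while True:
        op = program[pc]
        if op[0] == 'push':              # predeclare a continuation path
            stack.append((op[1], i))
            pc += 1
        elif op[0] == 'lit':
            if text.startswith(op[1], i):
                i += len(op[1])
                pc += 1
            elif stack:
                pc, i = stack.pop()      # backtrack: restore pc AND index
            else:
                return False             # no saved paths left: fail
        elif op[0] == 'jump':
            pc = op[1]
        elif op[0] == 'done':
            return True
```

A program for /ab|ac/ pushes the second alternative's address before trying the first, so the literal ops themselves never need to know where to branch on failure.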

 There are advantages and disadvantages to both proposals.  The question
 is, which one (or both) is flexible enough to do what we want it to do?


Probably both proposals are good enough, so I would say that we should 
choose one (not necessarily mine) and go ahead. I would love to see some 
regex support committed soon (maybe as an oplib) so we can hack on the 
languages (babyperl and others) and gain some experience with performance 
and optimizations...

Is there any plan for when to commit regular expression support to Parrot? 

Dan Sugalski has said that he is not interested at all in the design of 
regex opcodes.  Who is going to have the final say on this? 

Just an idea, but could we have someone with experience in perl5 regex 
internals playing the role of regex pumpking?


 I look forward to seeing it.


I am attaching a patch that implements the modifications I suggest to your 
code. The tests and the compiler are updated too and work fine.

(btw, I am not sure if I chose the right options on the diff command; 
could someone please tell me in private what's the right way of submitting 
patches when there are various files?)

There is another (totally independent) question that bothers me: 

Should we use the regex flags at compile time to perform aggressive 
optimizations, or should we rely on runtime checks to get the job done?

I'd better explain myself with an example:

We have (in these patches) an opcode called reAnything. When called, it 
tests the RE_single_line flag and acts accordingly.

A small optimization could be gained if we had two versions of the op (like 
reAnythingML [multi-line version] and reAnythingSL [single-line version]) 
and chose between them at compile time.

This can be applied to most flags, and would hopefully result in a speed 
increase (fewer runtime checks), at the cost of using more opcode numbers.

Another example: case-insensitive matches can be implemented as run-time 
checks on the corresponding flag (that's the way most regex engines do it), 
or by normalizing the regular expression to lower case at compile time and 
then converting the string to lower case just at the beginning of the match.

(The former is actually proposed by Jeffrey Friedl in the Mastering Regular 
Expressions book, claiming that it would be much faster.)

How aggressive are we going to be with this kind of optimization? Or are we 
going to rely on a very smart JIT compiler?
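The flag-specialization step is essentially a table lookup at regex-compile time. A sketch (the variant names follow the reAnythingSL/reAnythingML idea mentioned above, but the table and function are otherwise invented):

```python
# Sketch: resolve a flag-dependent op into a specialized variant once,
# at compile time, instead of testing the flag on every execution.

VARIANTS = {
    ('reAnything', True):  'reAnythingSL',   # single-line semantics
    ('reAnything', False): 'reAnythingML',   # multi-line semantics
}

def specialize_ops(ops, single_line):
    # Ops with no flag-dependent variant pass through unchanged.
    return [VARIANTS.get((op, single_line), op) for op in ops]
```

Every flag handled this way trades one runtime check per executed op for one extra opcode number, which is exactly the cost/benefit question raised above.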

---
Angel Faus
[EMAIL PROTECTED]




diff -crN parrot_current/parrot/Parrot/OpsFile.pm 
parrot_patched/parrot/Parrot/OpsFile.pm
*** parrot_current/parrot/Parrot/OpsFile.pm Fri Oct 26 03:00:01 2001
--- parrot_patched/parrot/Parrot/OpsFile.pm Sat Nov 10 16:08:44 2001
***
*** 22,27 
--- 22,42 
  #my %opcode;
  #my $opcode;
  
+ my $backtrack_macro = <<EOM;
+ {
+ 
+   opcode_t *dest;
+   if (stack_depth(interpreter, cur_re->stack_base)) {
+ pop_generic_entry(interpreter, &cur_re->stack_top, &cur_re->index, 
+STACK_ENTRY_INT);
+ pop_generic_entry(interpreter, &cur_re->dest_stack_top, &dest, 
+STACK_ENTRY_DESTINATION);
+ return dest;
+   }
+   else {
+ return cur_re->onfaildest;
+   }
+   }
+ }
+ EOM
+ 
  
  #
  # trim()
***
*** 217,235 
#
  
$body =~ s/HALT/{{=0}}/mg;
!   
$body =~ s/RESTART\(\*\)/{{=0,+=$op_size}}/mg;
$body =~ s/RESTART\((.*)\)/{{=0,+=$1}}/mg;
!   
$body =~ s/RETREL\(\*\)/{{+=$op_size}}/mg;
$body =~ s/RETREL\((.*)\)/{{+=$1}}/mg;
!   
$body =~ s/RETABS\((.*)\)/{{=$1}}/mg;
!   
$body =~ s/\$(\d+)/{{\@$1}}/mg

Re: A serious stab at regexes

2001-11-04 Thread Angel Faus

Brent Dax :

 Okay, this bunch of ops is a serious attempt at regular expressions.  I
 had a discussion with japhy on this in the Monastery
 (http://www.perlmonks.org/index.pl?node_id=122784), and I've come up
 with something flexible enough to actually (maybe) work.  Attached is a
 patch to modify core.ops and add re.h (defines data structures and such)
 and t/op/re.t (six tests).  All tests, including the new ones, pass.

Hi Brent,

Since your ops are much more complete and better documented than the ones I 
sent, I was trying to adapt my previous regex compiler to your ops, but I 
found what I think might be a limitation of your model.

It looks to me that for compiling regexps down to usual opcodes there is a 
need for a generic backtrack, instead of a $backtrack label for each case.

I have been unable to express nested groups or alternation with your model, 
and I would say that this is because the engine needs some way to save not 
only the index into the string, but also the point of the regex where it can 
branch on a backtrack.

You solve this in your examples by having a $backtrack address for each 
case, but this looks to me like a bad solution. In particular, I would say 
that it cannot be applied to complex regular expressions.

In my previous experimental patch, there was a way to save the string index 
_plus_ the regex index. Writing this with your syntax, it would mean being 
able to add a parameter to rePushindex that saves the regex index. 

Your example:

RE:
reFlags ""
reMinlength 4
$advance:
rePopindex
reAdvance $fail
$start:
rePushindex
reLiteral "f", $advance
$findo:
reLiteral "o", $findbar
rePushindex
branch $findo
$findbar:
reLiteral "bar", $backtrack
set I0, 1   #true
reFinished
$backtrack:
rePopindex $advance
branch $findbar  <-- backtrack needs to know where to branch
$fail:
set I0, 0   #false
reFinished

Your example tweaked by me:

RE:
reFlags ""
reOnFail $fail
reMinlength 4
$start:
rePushindex $advance
reLiteral "f"
$findo:
rePushindex $findbar
reLiteral "o"
branch $findo
$findbar:
reLiteral "bar"
set I0, 1   #true
reFinished
$fail:
set I0, 0   #false
reFinished
$advance:
reAdvance
branch $start

So it is not the reLiteral, reAdvance, etc. ops that need to know where they 
have to branch on failing; rather, when failing they always:

  - pop the last index on the stack and then branch to the last saved 
destination,
  - or branch to the address previously set by the reOnFail op if there are 
no pending indexes.

There is no $backtrack label, but the backtracking action is invoked each 
time a submatch fails.

I am not sure that this is the only solution, but it is the one that came to 
my mind on seeing your proposal, and I find it quite elegant. 
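[Editorial note: a hypothetical Python rendering of /fo*bar/ under the
generic-backtrack model described above may help. The function name is made
up; the comments map each step to the ops in the tweaked example. Since this
regex has only one resume point ($findbar), a plain index stack suffices.]

```python
def match_fostar_bar(s):
    # Sketch of /fo*bar/: each choice point saves the string index;
    # on failure we pop an index and resume at the saved regex point.
    for start in range(len(s) + 1):       # reAdvance: try each start
        i = start
        if not s.startswith("f", i):      # reLiteral "f"
            continue
        i += 1
        stack = []                        # pending backtrack entries
        while s.startswith("o", i):       # $findo loop
            stack.append(i)               # rePushindex $findbar
            i += 1
        stack.append(i)                   # greedy position tried first
        while stack:                      # $findbar, with backtracking
            i = stack.pop()               # rePopindex, branch $findbar
            if s.startswith("bar", i):
                return True               # reFinished, I0 = 1
    return False                          # reOnFail $fail

print(match_fostar_bar("xxfoobar"))   # True
print(match_fostar_bar("fooba"))      # False
```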

It is quite possible that nested groups and alternation can be implemented 
with your model. If that is the case, could you please post an example so I 
can understand?

What do you think about it?

-angel




Re: Experimental regex support

2001-11-02 Thread Angel Faus
;
}

sub BabyRegex::ctrl::explanation {
  die "unimplemented - too lazy\n";
}

sub BabyRegex::named::explanation {
  die "unimplemented - too lazy\n";
}

sub BabyRegex::Cchar::explanation {
  die "unimplemented - too lazy\n";
}


sub BabyRegex::any::pasm {
  my $self = shift;  
  my $l;
  my $id = $self->id;

  if ($modes{on} =~ /s/) {
    $self->cry_atomic ("matchanychar S1, I1, BT");
  } else {
    #$self->cry_atomic ("matchanycharbutnl S1, I1, BT");
    #we don't have the opcode anyway
    $self->cry_atomic ("matchanychar S1, I1, BT");
  }

}


sub BabyRegex::text::pasm {
  my $self = shift;
  my $text = $self->text;
  
  $text =~ s/\n/\\n/g;
  $text =~ s/\r/\\r/g;
  $text =~ s/\t/\\t/g;
  $text =~ s/\f/\\f/g;
  $text =~ s/'/\\'/g;
  
  my $id = $self->id();

  $self->cry_atomic ("matchexactly \"$text\", S1, I1, BT");
  
}


sub BabyRegex::alt::pasm {
  my $self = shift;
  my $id = $self->id();
  my $endofgroup_id = $self->{GROUP}->{CLOSED}->id;

  cry("branch $endofgroup_id");

}


sub BabyRegex::slash::pasm {  die "unimplemented\n"; }

sub BabyRegex::class::pasm {  die "unimplemented\n"; }


sub BabyRegex::group::pasm{
  my $self = shift;
  
  my $id = $self->id;
  my $cnt = $self->{COUNT};  
  my $fs = substr($self->fullstring,1,30);
  
  print "\n";
  cry $id, "#start of n.c. group $cnt($fs...)";

  if ($self->quant eq "*" or $self->quant eq "?") {
    cry "savebr I1, " . $self->{CLOSED}->next->id(); 
  }
 
  foreach my $alt (@{$self->{ALTS}}) {
    cry "savebr I1, " . $alt->next->id();
  }
  
}


sub BabyRegex::capture::pasm {

  # We are not capturing anything yet! 
  
  my $self = shift;
  my $id = $self->id;
  my $cnt = $self->{COUNT};
  my $fs = substr($self->fullstring,1,30);
  
  print "\n";
  
  if ($self->quant eq "*" or $self->quant eq "?") {
    cry "savebr I1, " . $self->{CLOSED}->next->id(); 
  }

  cry $id, "#start of group $cnt ($fs...)";
  
  foreach my $alt (@{$self->{ALTS}}) {
    cry "savebr I1, " . $alt->next->id();
  }
}


sub BabyRegex::close::pasm {
  my $self = shift;  
  my $id = $self->id;
  my $cnt = $self->{GROUP}->{COUNT};

  cry $id, "#end of group $cnt";
  
  if ($self->{GROUP}->quant eq "*" or $self->{GROUP}->quant eq "+") {
    cry "savebr I1, " . $self->next->id();
    cry "branch " . $self->{GROUP}->id;
  }

  print "\n";
  
}


  
sub BabyRegex::comment::pasm { }

sub BabyRegex::whitespace::pasm{ }


sub BabyRegex::lookahead::explanation { die "unimplemented\n"; }

sub BabyRegex::lookbehind::explanation { die "unimplemented\n"; }

sub BabyRegex::code::pasm {  die "unimplemented\n"; }

sub BabyRegex::later::pasm { die "unimplemented\n"; }

sub BabyRegex::conditional::pasm { die "unimplemented\n"; }

sub BabyRegex::cut::pasm { die "unimplemented\n"; }

sub BabyRegex::flags::pasm{ die "unimplemented\n"; }

sub BabyRegex::backref::pasm { die "unimplemented\n"; }





1;

__END__

=head1 NAME

BabyRegex - compiles a regular expression down to Parrot bytecode

=head1 SYNOPSIS

  use BabyRegex;
  BabyRegex->new($REx)->pasm;

=head1 SEE ALSO

The C<YAPE::Regex> documentation.

=head1 AUTHOR

  Angel Faus
  [EMAIL PROTECTED]
  
  Based on YAPE::Regex::Explain by Jeff Pinyan ([EMAIL PROTECTED])

=cut





use BabyRegex;

unless (@ARGV[0] && @ARGV[1]) {
print 'usage: perl babyre.pl pattern string' . "\n";
print 'ex:perl babyre.pl reg(exp?|ular +expression)? regex > 
regex.pasm' . "\n";  
exit;
}   

$pattern = @ARGV[0];
$string = @ARGV[1];

$c = BabyRegex->new($pattern);
$c->pasm($string);





INIT:initbrstack FAIL
     set I1, 0
     set S1, "regex"

L0:  #start of n.c. group 0(?-imsx:reg(exp?|ular +expressi...)
L1:  matchexactly "reg", S1, I1, BT

     savebr I1, L10
L2:  #start of group 1 (exp?|ular +expression)?...)
     savebr I1, L6
     savebr I1, L6
L3:  matchexactly "ex", S1, I1, BT
L4:  savebr I1, L9
     matchexactly "p", S1, I1, BT
     branch L9
L6:  matchexactly "ular", S1, I1, BT
L7:  matchexactly " ", S1, I1, BT
     savebr I1, L8
     branch L7
L8:  matchexactly "expression", S1, I1, BT
L9:  #end of group 1

L10: #end of group 0

OK:  print "match"
     clearbrstack I1
     end

FAIL:print "fail"
     clearbrstack I1
     end

BT:  backtrack I1



1814a1815,1882
 
 
 AUTO_OP matchexactly(sc, s, i, ic){
   
   STRING* temp;
  
   
   if (string_length($2) <= $3) {
 RETREL($4);
 }   
 
   temp = string_substr(interpreter, $2, $3 , string_length($1), NULL);
 
   if (string_compare(interpreter, $1, temp) != 0 ) {
 RETREL($4);
   }
   else {
 $3 = $3 + string_length($1);
   }
 }  
 
 AUTO_OP matchanychar(s, i, ic) {
    if (string_length($1) > $2){   
   $2++;
   }
else {
   RETREL($3);
}
 }

 MANUAL_OP backtrack(i){
   opcode_t *dest;
 
   pop_generic_entry(interpreter, &interpreter->user_stack_top, &($1), 
STACK_ENTRY_INT);
   pop_generic_entry(interpreter, &interpreter->control_stack_top, &dest, 
STACK_ENTRY_DESTINATION);
 
   RETABS(dest);
 }
 
 
 AUTO_OP

Experimental regex support

2001-11-01 Thread Angel Faus

Hi all,

I have developed some additions that give Parrot a limited
amount of support for regular expressions.

It all started as a little experiment to find out what the 
"compile down to low-level ops" thing could mean 
someday.

The patch consists of:

* 6 new opcodes:

   - matchexactly
   - matchanychar
   - initbrstack
   - clearbrstack
   - backtrack
   - savebr

  The first two are the ones that actually implement the 
  matches.

  initbrstack, clearbrstack, backtrack, savebr are for
  managing the stack of pending possible matches. They
  use internally the integer and destination stack.

* A perl package and script that implement a simple regex
  compiler (using YAPE::Regex by the way).

  The compiler currently outputs a parrot program that
  matches the regexp against a predefined string. It could
  be easily modified to produce something more useful.

Currently, the following features are supported.

* exact matches
* any char (.)
* nested groups (do not capture)
* alternation
* simple quantifiers (*, +, ?)

There is a lot of room for improvement, either by 
implementing features that do not require changes in 
Parrot (non-greedy quantifiers, anchors, capturing
and most of the regex options can be added right now) 
or by making the necessary changes in Parrot 
(support for locales is required for macros, etc.).

This is not a serious patch, in the sense that there 
are many things missing, the ones that are supposed 
to work are not tested enough, and even the ones 
that work are implemented in a way that is just wrong.

I am a rather mediocre programmer, and these are the first 
lines of code I ever sent to a mailing list, so please be 
benevolent with me. :)

Anyway I thought it would be interesting to share my 
little experiment.

Sincerely,

---
Angel Faus
[EMAIL PROTECTED]

1814a1815,1882
 
 
 AUTO_OP matchexactly(sc, s, i, ic){
   
   STRING* temp;
  
   
   if (string_length($2) <= $3) {
 RETREL($4);
 }   
 
   temp = string_substr(interpreter, $2, $3 , string_length($1), NULL);
 
   if (string_compare(interpreter, $1, temp) != 0 ) {
 RETREL($4);
   }
   else {
 $3 = $3 + string_length($1);
   }
 }  
 
 AUTO_OP matchanychar(s, i, ic) {
    if (string_length($1) > $2){   
   $2++;
   }
else {
   RETREL($3);
}
 }

 MANUAL_OP backtrack(i){
   opcode_t *dest;
 
   pop_generic_entry(interpreter, &interpreter->user_stack_top, &($1), STACK_ENTRY_INT);
   pop_generic_entry(interpreter, &interpreter->control_stack_top, &dest, STACK_ENTRY_DESTINATION);
 
   RETABS(dest);
 }
 
 
 AUTO_OP savebr(i, ic){
  
   push_generic_entry(interpreter, &interpreter->control_stack_top, cur_opcode + cur_opcode[2],  STACK_ENTRY_DESTINATION, NULL);
 
   push_generic_entry(interpreter, &interpreter->user_stack_top, &($1),  STACK_ENTRY_INT, NULL);
 
 }
 
 AUTO_OP initbrstack(ic) {
   INTVAL i;
   i = -1;
   
   push_generic_entry(interpreter, &interpreter->control_stack_top, cur_opcode + cur_opcode[1], STACK_ENTRY_DESTINATION, NULL);
   push_generic_entry(interpreter, &interpreter->user_stack_top, &i, STACK_ENTRY_INT, NULL); 
 
 }
 
 AUTO_OP clearbrstack(i){
   opcode_t *dest;
   
   while ($1 && $1 >= 0) {
 	pop_generic_entry(interpreter, &interpreter->control_stack_top, &dest, STACK_ENTRY_DESTINATION);
 	pop_generic_entry(interpreter, &interpreter->user_stack_top, &($1), STACK_ENTRY_INT); 
 	}
 	
 }
 
 
1826a1895
 



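[Editorial note: a minimal Python model of the savebr/backtrack ops above
may make the two-stack discipline easier to follow. The class and method
names are hypothetical, not real Parrot API: savebr pushes a saved string
index and a resume destination across two parallel stacks, backtrack pops
one pair, and initbrstack seeds the stacks with a -1 sentinel plus the
overall FAIL destination.]

```python
class BacktrackStacks:
    """Toy model of initbrstack / savebr / backtrack (names made up)."""

    def __init__(self, fail_dest):
        self.indexes = [-1]          # initbrstack: sentinel index
        self.dests = [fail_dest]     # ...and the overall failure label

    def savebr(self, index, dest):   # savebr I1, LABEL
        self.dests.append(dest)
        self.indexes.append(index)

    def backtrack(self):             # backtrack I1
        return self.indexes.pop(), self.dests.pop()

bt = BacktrackStacks("FAIL")
bt.savebr(3, "L8")                   # one pending alternative
print(bt.backtrack())                # (3, 'L8')    resume there
print(bt.backtrack())                # (-1, 'FAIL') stack exhausted
```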

package BabyRegex;

use YAPE::Regex 'BabyRegex';
use strict;
use vars '$VERSION';

$VERSION = '0.01';

my %modes = ( on => '', off => '' );

sub buildtree {
  my $self = shift;
  
  my $cnt = 0;
  my ($groupscnt, @groups);
  my @tree;
  
  while (my $node = $self->next) {

    $node->id($cnt++);
    $tree[-1]->next($node) if @tree;  

    if ($node->type =~ /capture|group/) {
	push @groups, $node;
	$node->{ALTS} = [];
	$node->{COUNT} = $groupscnt++;
	}	
	
    if ($node->type eq "alt") 	 {
	push (@{$groups[-1]->{ALTS}}, $node);
	my $groupnode = $groups[-1];
	$node->{GROUP} = $groupnode;
	
	push @{$groupnode->{ALTS}}, $node;  
	}

    if ($node->type eq "close"){
	my $groupnode = pop @groups;
	$groupnode->{CLOSED} = $node;
	$node->{GROUP} = $groupnode;
	for my $alt (@{$groupnode->{ALTS}}) {
	#Alt nodes get their ID replaced by the closing node's ID, so 
	#that when their antecessors call ->next->id they get the good one.
	#This is probably one of the worst ways to do that.
	$alt->{ID} = $node->{ID};
	}
    }
    push (@tree, $node);  
  }

  return @tree;  
   
}

sub cry {
  if (@_[1]) {
  	my $label = shift;
  	my $opcode = shift;
  
  	my $spc = " " x (4 - length($label));
  	print $label . ":" . $spc . $opcode . "\n";
  }
  else {
  	my $opcode = shift;
  	print "     $opcode\n";
  }

}


sub pasm {
  my ($self, $string) = @_;  
  my @tree = $self->buildtree;   
  
  cry "INIT", "initbrstack FAIL";
  cry "set I1, 0";
  cry "set S1, \"$string\"";

  for my $node (@tree

RE: Revamping the build system

2001-10-22 Thread Angel Faus

Hi,


 From: Dan Sugalski [mailto:[EMAIL PROTECTED]]
 There's nothing really past what make does. The reason for having our own
is:
 *) Make isn't everywhere (like windows)
 *) Make on various platforms has different syntax (VMS, Windows, and Unix
 are all different)
 *) Not speaking for anyone else, but I find make's gotten rather creaky
 a round the edges--after 20+ years presumably we can make things a bit
better
 *) Having the full power of perl in the build tool should give us a big
 boost in utility. (Consider the difference between C macros and Perl
source
 filters)
 *) It'll be really unfamiliar to everyone, which'll stop folks from
falling
 into old, non-portable habits.


If there is going to be a new build tool for perl6, I would suggest using
something similar to Ant (http://jakarta.apache.org/ant/).

Ant is not suitable for parrot of course (it requires Java), but its design
is quite good imho.

From its webpage:

 Ant is different. Instead of a model where it is extended with shell-based
 commands, it is extended using Java classes. Instead of writing shell
 commands, the configuration files are XML-based, calling out a target tree
 where various tasks get executed. Each task is run by an object which
 implements a particular Task interface.

It tries to avoid executing shell commands (which is good if you want to be
portable to places like Windows) and instead it comes with a predefined set
of tasks (remove files, compile, etc.) that can be extended by programming
your own Task classes.

This article: http://www.onjava.com/pub/a/onjava/2001/02/22/open_source.html
does a very good job of giving you a feeling for how it works.
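[Editorial note: as a rough illustration of that design, here is a tiny
Python sketch of build steps as objects behind a common Task interface,
grouped into named targets. All class names are hypothetical; this is not
Ant's actual API.]

```python
class Task:
    # Each build step is an object with a common interface,
    # not a shell command string.
    def execute(self):
        raise NotImplementedError

class Delete(Task):
    def __init__(self, *files):
        self.files = files
    def execute(self):
        return "delete: " + ", ".join(self.files)

class Compile(Task):
    def __init__(self, source):
        self.source = source
    def execute(self):
        return "compile: " + self.source

class Target:
    # A target is a named list of tasks, as in Ant's XML target tree.
    def __init__(self, name, tasks):
        self.name, self.tasks = name, tasks
    def run(self):
        return [task.execute() for task in self.tasks]

build = Target("build", [Compile("interpreter.c")])
clean = Target("clean", [Delete("parrot.o", "test_main.o")])
print(clean.run())   # ['delete: parrot.o, test_main.o']
```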

In my limited experience, this is something very similar to what we would
need for parrot/perl6.

Just my half a cent,

Angel Faus
[EMAIL PROTECTED]
vLex.com





RE: Strings db (comments requested)

2001-10-03 Thread Angel Faus

Hi Grant,

Just as a suggestion, I would use the PO format (already used by tools other
than gettext, like KDE) so we get all the catalog manager tools for free
(like KBabel, which is very nice, by the way).

And maybe error-code output could be just another target language. So:

fprintf(stderr, i18n("Error #43: Can't stat file\n"));

prints just "ERR43\n" when the language var is set to, say, CODES.
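[Editorial note: a sketch of what that could look like. The function,
dictionary, and environment variable names here are all made up; the point
is only that the "CODES" pseudo-locale short-circuits the catalog lookup
and returns the bare error code.]

```python
import os

# Hypothetical catalog mapping error codes to messages.
MESSAGES = {"ERR43": "Error #43: Can't stat file"}

def i18n(code, lang=None):
    # When the language is the pseudo-locale CODES, return the bare
    # error code; otherwise look up the translated message.
    lang = lang or os.environ.get("PARROT_LANG", "en")
    if lang == "CODES":
        return code
    return MESSAGES.get(code, code)

print(i18n("ERR43", lang="CODES"))   # ERR43
print(i18n("ERR43", lang="en"))      # Error #43: Can't stat file
```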

--

Angel Faus
Technical Director
vLex.com
-
[EMAIL PROTECTED]