Re: IMCC Reentrancy

2006-07-18 Thread Audrey Tang


On 2006/7/18, at 1:54 AM, Audrey Tang wrote:
If you have a way to make IMCC reentrant that involves upgrading  
to a more recent version of flex and passing one additional  
parameter, go for it! Send us a patch and if it passes all the  
tests, we'll apply it.


As flex 2.5.30+ is not API compatible with the current flex IMCC is  
using, I wonder how it is different from re2c or regel, in  
particular that shoehorning an additional YYLEX parameter to make  
it work with bison will also involve overhauls beyond the original  
bison interface.


I guess my question is: If I send two patches, of equal size, one  
uses re2c and is much cleaner and faster; another uses a kluged-up  
flex with its new, backward-incompatible reentrant API, would you  
reject one and apply the other?  If you are willing to let  
alternative scanners go in, I'd much rather work on that instead  
of trying to work around the bison/flex interface.


Code is easier for me to write than English.  Hence:

09:22 @audreyt imcc scanner is now reentrant.
09:22 @audreyt I think it wouldn't take more than another hour to  
get it based on re2c

09:22 @audreyt but I'm willing to take what is felt more comfortable.

:-)

Audrey



Re: IMCC Reentrancy

2006-07-18 Thread Allison Randal

Audrey Tang wrote:


Indeed, and I'd like to apologize publicly for the snipping.


Accepted and forgiven.

However, the re2c or regel-based scanner refactoring isn't different 
from a flex upgrade patch, as it (by definition) can't affect IMCC's 
public API at all.  An additional advantage is that they will let us get 
rid of the flaky API situation with flex.  In any case it wouldn't take 6 
months.


In vsoni's original words:


a. Remove flex and implement re2c
b. Remove static and global variables


The full quote in context is:


Since flex is not generating reentrant code, this option will get rid of
flex altogether and replace it with re2c. This would require significant
reworking on the code. So the plan of action would be as follows:
a. Remove flex and implement re2c
b. Remove static and global variables

Apart from this we also need to refactor the code to replace the arrays
used for macros with a hash table implementation.

All in all, this would mean overhauling a lot of code.



And you answered:

The cost/benefit balance on this solution is not good. A lot of people 
are depending on IMCC now, and a refactor of that magnitude will throw 
several important projects on Parrot into a dead stall.


Yup. Always take the estimate of the developer and multiply it by at 
least 3. If the developer thinks it will require significant 
reworking, it's likely to be a massive overhaul.


It will involve overhauls, but again, the public interface -- at bison 
level and above -- cannot break.  So the dead stall ruling -- 
effectively dismissing re2c and other scanner alternatives instantly -- 
strikes me as extremely surprising.


It's not the definition of the interface I'm concerned about, it's the 
behavior behind the interface. Can you guarantee that you can substitute 
re2c for flex without changing any behavior of IMCC? If you say Yes, 
I'll still be suspicious the answer will turn out to be No.


I'm also not convinced that re2c is a significant improvement over flex. 
I'd rather spend that developer time on things that are significant 
improvements.


I am convinced that we need to avoid yanking working systems out from 
under developers whenever possible.


Allison


Re: IMCC Reentrancy

2006-07-17 Thread Allison Randal

Vishal Soni wrote:


The current implementation is implemented using Flex and YACC.  Flex
implementation has limitations in C mode.  The C lexer generated by flex
cannot be reentrant/threadsafe. Flex generates thread-safe parsers only in
C++ mode. This limitation of flex will defeat the whole effort of removing
global variables from IMCC. In my opinion if we cannot get global variable
free code from flex there is no sense in proceeding with cleaning up the
other global variables.


This is unfortunate, but not entirely surprising.
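
To make the global-state problem concrete, here is a minimal hand-written
sketch in C. It is not flex output and not IMCC code; the names Scanner,
scanner_init and scanner_next are hypothetical. It only illustrates the
general pattern under discussion: the state that a non-reentrant lexer
keeps in file-scope variables moves into a struct that the caller passes
in as an extra parameter.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical scanner state: everything a non-reentrant lexer would keep
 * in file-scope variables lives in this struct instead. */
typedef struct Scanner {
    const char *cursor;   /* next unread character */
    int         line;     /* current line number, for diagnostics */
} Scanner;

typedef enum { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_OTHER } TokenKind;

typedef struct Token {
    TokenKind   kind;
    const char *start;    /* first character of the token */
    size_t      length;   /* number of characters in the token */
} Token;

void scanner_init(Scanner *s, const char *input)
{
    assert(s != NULL && input != NULL);
    s->cursor = input;
    s->line   = 1;
}

/* The scanner state is the one additional parameter. No globals are read
 * or written, so independent Scanner values never interfere. */
Token scanner_next(Scanner *s)
{
    Token t;

    /* Skip whitespace, counting newlines as we go. */
    while (isspace((unsigned char)*s->cursor)) {
        if (*s->cursor == '\n')
            s->line++;
        s->cursor++;
    }

    t.start = s->cursor;
    if (*s->cursor == '\0') {
        t.kind = TOK_EOF;
    } else if (isalpha((unsigned char)*s->cursor) || *s->cursor == '_') {
        while (isalnum((unsigned char)*s->cursor) || *s->cursor == '_')
            s->cursor++;
        t.kind = TOK_IDENT;
    } else if (isdigit((unsigned char)*s->cursor)) {
        while (isdigit((unsigned char)*s->cursor))
            s->cursor++;
        t.kind = TOK_NUMBER;
    } else {
        s->cursor++;          /* any other single character */
        t.kind = TOK_OTHER;
    }
    t.length = (size_t)(s->cursor - t.start);
    return t;
}
```

Two Scanner values can be advanced in any interleaving, from one thread or
several, because neither touches shared state. A generated scanner would
need the same property for the rest of the global-variable cleanup to pay
off.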


1st Option: Hack it and patch it to death !!!
---
Since flex is not generating reentrant code, this option will get rid of
flex altogether and replace it with re2c. This would require significant
reworking on the code. So the plan of action would be as follows:
a. Remove flex and implement re2c
b. Remove static and global variables

Apart from this we also need to refactor the code to replace the arrays
used for macros with a hash table implementation.

All in all, this would mean overhauling a lot of code.
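
As a rough illustration of the macro refactoring mentioned in the quote,
the sketch below stores macros in a small chained hash table owned by the
caller rather than in a fixed-size global array. Everything here
(MacroTable, macro_define, macro_lookup, the bucket count) is a
hypothetical stand-in, not IMCC's actual data structure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MACRO_BUCKETS 64   /* arbitrary bucket count for this sketch */

typedef struct Macro {
    char         *name;
    char         *body;
    struct Macro *next;    /* next entry in the same bucket */
} Macro;

/* Per-compilation table; no file-scope state. */
typedef struct MacroTable {
    Macro *buckets[MACRO_BUCKETS];
} MacroTable;

static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p)
        memcpy(p, s, n);
    return p;
}

/* djb2-style string hash, reduced to a bucket index. */
static unsigned long hash_name(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % MACRO_BUCKETS;
}

void macro_table_init(MacroTable *t)
{
    size_t i;
    for (i = 0; i < MACRO_BUCKETS; i++)
        t->buckets[i] = NULL;
}

/* Define or redefine a macro. Returns 0 on success, -1 if out of memory. */
int macro_define(MacroTable *t, const char *name, const char *body)
{
    unsigned long i;
    Macro *m;
    char *new_body;

    assert(t != NULL && name != NULL && body != NULL);
    i = hash_name(name);
    new_body = copy_string(body);
    if (!new_body)
        return -1;

    for (m = t->buckets[i]; m; m = m->next) {
        if (strcmp(m->name, name) == 0) {   /* redefinition: swap the body */
            free(m->body);
            m->body = new_body;
            return 0;
        }
    }

    m = malloc(sizeof *m);
    if (!m) {
        free(new_body);
        return -1;
    }
    m->name = copy_string(name);
    if (!m->name) {
        free(new_body);
        free(m);
        return -1;
    }
    m->body = new_body;
    m->next = t->buckets[i];                /* push onto the bucket chain */
    t->buckets[i] = m;
    return 0;
}

/* Returns the macro body, or NULL if the name is not defined. */
const char *macro_lookup(const MacroTable *t, const char *name)
{
    const Macro *m;
    for (m = t->buckets[hash_name(name)]; m; m = m->next)
        if (strcmp(m->name, name) == 0)
            return m->body;
    return NULL;
}

void macro_table_free(MacroTable *t)
{
    size_t i;
    for (i = 0; i < MACRO_BUCKETS; i++) {
        Macro *m = t->buckets[i];
        while (m) {
            Macro *next = m->next;
            free(m->name);
            free(m->body);
            free(m);
            m = next;
        }
        t->buckets[i] = NULL;
    }
}
```

Because the table is an ordinary value, each compilation can own its own
copy, which is the same property the scanner needs.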


The cost/benefit balance on this solution is not good. A lot of people 
are depending on IMCC now, and a refactor of that magnitude will throw 
several important projects on Parrot into a dead stall.


So, my answer is: No.


2nd: Inaction is the best action !!!
---
Let's not do anything and leave the code as it is. Just say IMCC is not
re-entrant/thread-safe and leave it there. We will address this issue in
the future. I highly doubt this is the route we want to take.


For the short-term, this is the route we want to take. A new PIR/PASM 
compiler isn't absolutely necessary for a 1.0 release. IMCC doesn't 
really need to be reentrant, it just needs to produce bytecode.


So, my answer is: Yes, but...


3rd Option: Back to drawing board !!!


This option would require a complete rewrite of IMCC (we could possibly
call it PIRC).  The con of this approach is that we will have to
re-implement the whole of IMCC. The programming languages will have to live
with IMCC's limitations until the new version is ready.

The pros of this approach are
  a. A clean implementation rather than a prototypish implementation
  b. Make PIR compiler production release ready. The way the compiler sits
right now it is not a good release candidate.
  c. Structure the code in a way that is easy to maintain and extend.

The 3rd option is a lot of work but might be a good option in the long run.


IMCC was originally implemented as a separate compiler. After a while, 
we found it to be so much better than the existing assembler that we 
made it the primary way of producing bytecode. It's okay to repeat the 
cycle by experimenting with a new compiler that produces bytecode, and 
later decide if we want to replace IMCC with it. This doesn't interfere 
with IMCC's development.


So, my answer is: Yes, but...

re2c and lemon aren't enough of an improvement over flex and bison to be 
worth the pain of rewriting IMCC from scratch. If we do create a new way 
of producing bytecode (and it's a safe bet that we will at some point), 
I would lean toward using our own tools.


- Patrick is already looking into implementing a version of PGE in C. 
This will be an infinitely better parser than any existing alternatives, 
so it's worth waiting for.


- We already want an OST(opcode syntax tree)-to-bytecode compiler that 
bypasses PIR for the compiler tools. That same compiler could be used to 
implement PIR (combined with a lightweight version of TGE in C).


- IMCC is not a straight translator, it also performs optimizations. 
These should be implemented in a modular way, with a standard interface, 
so that developers can swap in new and improved optimizers as we go 
along. The best place to hook them is probably off the OST-to-bytecode 
compiler.
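
One way to picture the "modular, standard interface" idea for optimizers is
a table of passes that all share one function signature. The sketch below
is only an illustration under that assumption: the instruction set
(OP_NOOP, OP_SET, OP_ADD), the Unit type and both passes are hypothetical
and are not Parrot or IMCC structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical three-opcode instruction set, for illustration only. */
typedef enum { OP_NOOP, OP_SET, OP_ADD } OpCode;

typedef struct Instr {
    OpCode op;
    int    dest;    /* destination register number */
    int    value;   /* immediate operand */
} Instr;

typedef struct Unit {
    Instr *code;
    size_t count;
} Unit;

/* The standard interface: every optimizer transforms a Unit in place and
 * returns how many changes it made. */
typedef struct OptPass {
    const char *name;
    int (*run)(Unit *unit);
} OptPass;

/* Example pass: "add X, 0" has no effect, so turn it into a no-op. */
int pass_add_zero(Unit *u)
{
    size_t i;
    int changed = 0;
    for (i = 0; i < u->count; i++) {
        if (u->code[i].op == OP_ADD && u->code[i].value == 0) {
            u->code[i].op = OP_NOOP;
            changed++;
        }
    }
    return changed;
}

/* Example pass: compact the instruction array by dropping no-ops. */
int pass_strip_noops(Unit *u)
{
    size_t read, write = 0;
    int removed = 0;
    for (read = 0; read < u->count; read++) {
        if (u->code[read].op == OP_NOOP) {
            removed++;
            continue;
        }
        u->code[write++] = u->code[read];
    }
    u->count = write;
    return removed;
}

/* Run each pass in order; the driver knows nothing about what they do. */
int run_passes(Unit *u, const OptPass *passes, size_t n)
{
    size_t i;
    int total = 0;
    assert(u != NULL && passes != NULL);
    for (i = 0; i < n; i++)
        total += passes[i].run(u);
    return total;
}
```

Swapping in a new optimizer then means adding or replacing an entry in the
array handed to run_passes, with no change to the driver itself.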



This approach does mean that the tools to start an IMCC rewrite aren't 
available yet. It's a long-term solution (possibly post-1.0), so we can 
afford to take a long-term view.


Allison


Re: IMCC Reentrancy

2006-07-17 Thread Vishal Soni
On Mon, 2006-07-17 at 14:49 -0700, Allison Randal wrote:

 re2c and lemon aren't enough of an improvement over flex and bison to be 
 worth the pain of rewriting IMCC from scratch. If we do create a new way 
 of producing bytecode (and it's a safe bet that we will at some point), 
 I would lean toward using our own tools.

 - Patrick is already looking into implementing a version of PGE in C. 
 This will be an infinitely better parser than any existing alternatives, 
 so it's worth waiting for.
 
 - We already want an OST(opcode syntax tree)-to-bytecode compiler that 
 bypasses PIR for the compiler tools. That same compiler could be used to 
 implement PIR (combined with a lightweight version of TGE in C).
 
 - IMCC is not a straight translator, it also performs optimizations. 
 These should be implemented in a modular way, with a standard interface, 
 so that developers can swap in new and improved optimizers as we go 
 along. The best place to hook them is probably off the OST-to-bytecode 
 compiler.

Allison, given that we need an API for bytecode generation that supports
plug-and-play optimizers, would it make sense to start implementing this
layer? This API could be used for OST-to-bytecode generation. Later, when
Patrick's PGE-to-C parser generator is ready, we could use his code to
implement the PIR compiler and just use the APIs that we write for
bytecode generation.  Initially, for prototyping purposes, we might just
use the existing flex/yacc or re2c/lemon.

Allison, should this development wait or can we start working on it? Will
we need a PDD before we can commence working on this API? Let me know
your thoughts.

It might not hurt to start working on a prototype API and see how it
fits with the OST-to-bytecode compiler.

 This approach does mean that the tools to start an IMCC rewrite aren't 
 available yet. It's a long-term solution (possibly post-1.0), so we can 
 afford to take a long-term view.
 
 Allison



Re: IMCC Reentrancy

2006-07-17 Thread Allison Randal

Vishal Soni wrote:


Allison, given that we need an API for bytecode generation that supports
plug-and-play optimizers, would it make sense to start implementing this
layer? This API could be used for OST-to-bytecode generation. Later, when
Patrick's PGE-to-C parser generator is ready, we could use his code to
implement the PIR compiler and just use the APIs that we write for
bytecode generation.


Yes, this will be valuable.
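
For readers unfamiliar with the idea, a tree-to-bytecode layer can be
sketched in a few lines of C. This is a toy under heavy assumptions: the
node kinds, the stack-style opcodes (BC_PUSH, BC_ADD, BC_MUL) and the
fixed-size buffer are all hypothetical, and none of it reflects the real
OST node format or Parrot's register-based bytecode.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical syntax-tree node kinds. */
typedef enum { NODE_CONST, NODE_ADD, NODE_MUL } NodeKind;

typedef struct Node {
    NodeKind           kind;
    int                value;   /* used by NODE_CONST */
    const struct Node *left;    /* used by NODE_ADD and NODE_MUL */
    const struct Node *right;
} Node;

/* Hypothetical stack-machine opcodes, chosen only to keep the sketch short. */
typedef enum { BC_PUSH, BC_ADD, BC_MUL } ByteOp;

#define BC_CAPACITY 64   /* arbitrary fixed size for this sketch */

typedef struct ByteBuf {
    int    words[BC_CAPACITY];
    size_t used;
} ByteBuf;

/* Append one word; returns 0 on success, -1 if the buffer is full. */
static int emit(ByteBuf *b, int word)
{
    if (b->used >= BC_CAPACITY)
        return -1;
    b->words[b->used++] = word;
    return 0;
}

/* Post-order walk: operands first, then the operator.
 * Returns 0 on success, -1 on overflow or an unknown node kind. */
int compile_node(const Node *n, ByteBuf *out)
{
    assert(n != NULL && out != NULL);
    switch (n->kind) {
    case NODE_CONST:
        if (emit(out, BC_PUSH) != 0)
            return -1;
        return emit(out, n->value);
    case NODE_ADD:
    case NODE_MUL:
        if (compile_node(n->left, out) != 0)
            return -1;
        if (compile_node(n->right, out) != 0)
            return -1;
        return emit(out, n->kind == NODE_ADD ? BC_ADD : BC_MUL);
    }
    return -1;
}
```

The caller owns both the tree and the output buffer, so nothing here
depends on a running interpreter. An optimizer hook would sit between the
tree walk and the final emission.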


Initially for prototyping
purposes we might just use the existing flex/yacc or re2c/lemon.


The current PGE implementation is the best prototyping substitute: a) 
the output from it will be nearly identical to the output from the C 
version, and b) we also want to be able to use the OST-to-bytecode 
compiler from language-compilers that use the PIR versions of PGE/TGE, 
so it makes sense to build it that way from the start.


Ultimately we'll want to remove the PIR-PGE-PIR dependency loop, but 
this is a good start.



Allison, should this development wait or can we start working on it? Will
we need a PDD before we can commence working on this API? Let me know
your thoughts.

It might not hurt to start working on a prototype API and see how it
fits with the OST-to-bytecode compiler.


Let's go for an agile, iterative approach to the spec. Write up some 
initial thoughts on the shape of the API and post them to 
parrot-porters. The group can do sanity-checking/brainstorming, and then 
you can start a prototype based on the result. After we've played with 
the prototype a bit (and probably after you've modified it a few times 
based on feedback from the group), I'll write a PDD to flesh out the 
spec, fill in any holes, and address any problems encountered along the way.


Thanks,
Allison


Re: IMCC Reentrancy

2006-07-17 Thread Vishal Soni

 Let's go for an agile, iterative approach to the spec. Write up some 
 initial thoughts on the shape of the API and post them to 
 parrot-porters. The group can do sanity-checking/brainstorming, and then 
 you can start a prototype based on the result. After we've played with 
 the prototype a bit (and probably after you've modified it a few times 
 based on feedback from the group), I'll write a PDD to flesh out the 
 spec, fill in any holes, and address any problems encountered along the way.

Allison, this sounds great. To get started I will need some reference to
the OST format. Can you please point me in the right direction? (Some
documentation or sample code will do.)

I will assume the Byte Code Generation/Optimization API will be
implemented in C (TGE could use loadlib or some PMC mechanism to call
it). Let me know if my assumption is correct or whether this API needs to
be in PIR.


 Thanks,
 Allison



Re: IMCC Reentrancy

2006-07-17 Thread Allison Randal

Vishal Soni wrote:


Allison, this sounds great. To get started I will need some reference to
the OST format. Can you please point me in the right direction? (Some
documentation or sample code will do.)


Start with languages/punie/lib/POST/ and 
languages/punie/lib/PIRGrammar.tg. This is the most developed existing 
prototype implementation of OST nodes and an OST-to-PIR translator, 
which should give you a general idea of what we'll be looking for.



I will assume the Byte Code Generation/Optimization API will be
implemented in C (TGE could use loadlib or some PMC mechanism to call
it). Let me know if my assumption is correct or whether this API needs to
be in PIR.


Yes, C is the right way to go.

Allison


Re: IMCC Reentrancy

2006-07-17 Thread Allison Randal

Audrey Tang wrote:


As I'm writing this, I noticed that Allison has ruled that we go with
PIR/PGE and eventually C-based libpge instead -- since a lexer
refactoring that doesn't affect the IMCC API will somehow throw
important projects on Parrot into a dead stall, and thread safety for
PIR compilation is not a 1.0 goal anyway -- I'll abandon working on
this, and focus on helping get a C-based libpge started instead. :-)


LOL :) Audrey, I love you dear, but you always have an interesting way 
of interpreting what I say. :)


Yes, I'm not willing to start a 6+ month project to gut IMCC. The cost 
is too great and the benefit isn't great enough.


If you have a way to make IMCC reentrant that involves upgrading to a 
more recent version of flex and passing one additional parameter, go for 
it! Send us a patch and if it passes all the tests, we'll apply it.



It's still true that:

- We need an OST-to-bytecode compiler for the compiler tools. (I suspect 
it will solve some of your problems too, as you'll no longer need to 
embed Parrot to generate Parrot bytecode. You'll be able to generate it 
from a C library instead and just run the bytecode on Parrot.)


- A PIR parser written in PGE is a good idea (and will be dead simple 
anyway, as PIR is a simple language).


- A version of PGE written in C is a good idea, because it will spread 
Perl 6 regexes/grammars far and wide. (It will be difficult, because of 
all the Parrot features that will have to be reimplemented in a 
standalone PGE. But, it is possible.)


- If those things combine to produce a cleaner, more maintainable 
alternative to IMCC, it's good for Parrot. If not, then the separate 
components are still good for Parrot.



There's more than one way to do it,
Sometimes you should do both,
Allison


Re: IMCC Reentrancy

2006-07-17 Thread Audrey Tang


On 2006/7/18, at 1:21 AM, Allison Randal wrote:
LOL :) Audrey, I love you dear, but you always have an interesting  
way of interpreting what I say. :)


Yes, I'm not willing to start a 6+ month project to gut IMCC. The  
cost is too great and the benefit isn't great enough.


Indeed, and I'd like to apologize publicly for the snipping.

However, the re2c or regel-based scanner refactoring isn't different  
from a flex upgrade patch, as it (by definition) can't affect  
IMCC's public API at all.  An additional advantage is that they will  
let us get rid of the flaky API situation with flex.  In any case it  
wouldn't take 6 months.


In vsoni's original words:


a. Remove flex and implement re2c
b. Remove static and global variables


And you answered:

The cost/benefit balance on this solution is not good. A lot of  
people are depending on IMCC now, and a refactor of that magnitude  
will throw several important projects on Parrot into a dead stall.


So, my answer is: No.


It will involve overhauls, but again, the public interface -- at  
bison level and above -- cannot break.  So the dead stall ruling --  
effectively dismissing re2c and other scanner alternatives instantly  
-- strikes me as extremely surprising.


If you have a way to make IMCC reentrant that involves upgrading to  
a more recent version of flex and passing one additional parameter,  
go for it! Send us a patch and if it passes all the tests, we'll  
apply it.


As flex 2.5.30+ is not API compatible with the current flex IMCC is  
using, I wonder how it is different from re2c or regel, in particular  
that shoehorning an additional YYLEX parameter to make it work with  
bison will also involve overhauls beyond the original bison interface.


I guess my question is: If I send two patches, of equal size, one  
uses re2c and is much cleaner and faster; another uses a kluged-up  
flex with its new, backward-incompatible reentrant API, would you  
reject one and apply the other?  If you are willing to let  
alternative scanners go in, I'd much rather work on that instead  
of trying to work around the bison/flex interface.


- A version of PGE written in C is a good idea, because it will  
spread Perl 6 regexes/grammars far and wide. (It will be difficult,  
because of all the Parrot features that will have to be  
reimplemented in a standalone PGE. But, it is possible.)


Well, as discussed in #parrot, an offline parser (i.e. one that does  
not permit changes to the grammar during parsing) with rule syntax  
can be much more easily generated as a C-emitter backend from either  
PIR/PGE or Perl5/PCR.  I'm looking into it with vsoni right now.


Audrey



