Re: IMCC Reentrancy
On 2006/7/18, at 1:54 AM, Audrey Tang wrote:

>> If you have a way to make IMCC reentrant that involves upgrading to a more recent version of flex and passing one additional parameter, go for it! Send us a patch and if it passes all the tests, we'll apply it.
>
> As flex 2.5.30+ is not API compatible with the flex version IMCC currently uses, I wonder how it is different from re2c or regel, particularly since shoehorning an additional YYLEX parameter to make it work with bison will also involve overhauls beyond the original bison interface.
>
> I guess my question is: If I send two patches of equal size, one that uses re2c and is much cleaner and faster, and another that uses a kluged-up flex with its new, backward-incompatible reentrant API, would you reject one and apply the other?
>
> If you are willing to let alternative scanners go in, I'd much rather work on that instead of trying to work around the bison/flex interface.

Code is easier for me to write than English. Hence:

09:22 @audreyt imcc scanner is now reentrant.
09:22 @audreyt I think it wouldn't take more than another hour to get it based on re2c
09:22 @audreyt but I'm willing to take what is felt more comfortable. :-)

Audrey
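For readers following the "one additional parameter" point: the sketch below is a hand-written illustration of what reentrancy means for a scanner. It is not IMCC code and not the output of flex or re2c. All names (`scan_state`, `scan_init`, `scan_next`, the `TOK_*` kinds) are hypothetical. The idea it shows is only that every piece of scanner state lives in a struct the caller passes in, rather than in file-level globals, so two scans can be interleaved safely.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds, for illustration only. */
enum tok_kind { TOK_EOF, TOK_IDENT, TOK_NUMBER, TOK_OTHER };

/* All scanner state lives here instead of in global variables. */
struct scan_state {
    const char *cursor;     /* next unread character */
    const char *tok_start;  /* start of the most recent token */
    size_t      tok_len;    /* length of the most recent token */
};

void scan_init(struct scan_state *s, const char *input)
{
    assert(s != NULL && input != NULL);
    s->cursor = input;
    s->tok_start = input;
    s->tok_len = 0;
}

/* The extra parameter `s` is what makes this function reentrant. */
enum tok_kind scan_next(struct scan_state *s)
{
    const unsigned char *p = (const unsigned char *)s->cursor;
    const unsigned char *start;
    enum tok_kind kind;

    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;                            /* skip whitespace */
    start = p;

    if (*p == '\0') {
        kind = TOK_EOF;
    } else if (isalpha(*p) || *p == '_') {
        while (isalnum(*p) || *p == '_')
            p++;
        kind = TOK_IDENT;
    } else if (isdigit(*p)) {
        while (isdigit(*p))
            p++;
        kind = TOK_NUMBER;
    } else {
        p++;                            /* any other single character */
        kind = TOK_OTHER;
    }

    s->tok_start = (const char *)start;
    s->tok_len = (size_t)(p - start);
    s->cursor = (const char *)p;
    return kind;
}
```

Two `scan_state` values initialized on different inputs can be advanced alternately without disturbing each other, which is the property a scanner built on global variables cannot offer.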
Re: IMCC Reentrancy
Audrey Tang wrote:

> Indeed, and I'd like to apologize publicly for the snipping.

Accepted and forgiven.

> However, the re2c or regel-based scanner refactoring isn't different from a flex upgrade patch, as it (by definition) can't affect IMCC's public API at all. An additional advantage is that they will let us get rid of the flaky API situation with flex. In any case it wouldn't take 6 months. In vsoni's original words:
>
>>> a. Remove flex and implement re2c
>>> b. Remove static and global variables

The full quote in context is:

>>> Since flex is not generating reentrant code, this option will get rid of flex altogether and replace it with re2c. This would require significant reworking on the code. So the plan of action would be as follows:
>>> a. Remove flex and implement re2c
>>> b. Remove static and global variables
>>> Apart from this we also need to refactor the code to replace the arrays with a hash table implementation for macros. All in all, this would mean overhauling a lot of code.

> And you answered:
>
>> The cost/benefit balance on this solution is not good. A lot of people are depending on IMCC now, and a refactor of that magnitude will throw several important projects on Parrot into a dead stall.

Yup. Always take the estimate of the developer and multiply it by at least 3. If the developer thinks it will require significant reworking, it's likely to be a massive overhaul.

> It will involve overhauls, but again, the public interface -- at bison level and above -- cannot break. So the dead stall ruling -- effectively dismissing re2c and other scanner alternatives instantly -- strikes me as extremely surprising.

It's not the definition of the interface I'm concerned about, it's the behavior behind the interface. Can you guarantee that you can substitute re2c for flex without changing any behavior of IMCC? If you say Yes, I'll still be suspicious the answer will turn out to be No.

I'm also not convinced that re2c is a significant improvement over flex. I'd rather spend that developer time on things that are significant improvements. I am convinced that we need to avoid yanking working systems out from under developers whenever possible.

Allison
Re: IMCC Reentrancy
Vishal Soni wrote:

> The current implementation uses Flex and YACC. The Flex implementation has limitations in C mode. The C lexer generated by flex cannot be reentrant/thread-safe. Flex generates thread-safe parsers only in C++ mode. This limitation of flex will defeat the whole effort of removing global variables from IMCC. In my opinion, if we cannot get global-variable-free code from flex, there is no sense in proceeding with cleaning up the other global variables.

This is unfortunate, but not entirely surprising.

> 1st Option: Hack it and patch it to death !!!
> ---
> Since flex is not generating reentrant code, this option will get rid of flex altogether and replace it with re2c. This would require significant reworking on the code. So the plan of action would be as follows:
> a. Remove flex and implement re2c
> b. Remove static and global variables
> Apart from this we also need to refactor the code to replace the arrays with a hash table implementation for macros. All in all, this would mean overhauling a lot of code.

The cost/benefit balance on this solution is not good. A lot of people are depending on IMCC now, and a refactor of that magnitude will throw several important projects on Parrot into a dead stall. So, my answer is: No.

> 2nd: Inaction is the best action !!!
> ---
> Let's not do anything and leave the code as it is. Just say IMCC is not reentrant/thread-safe and leave it there. We will address this issue in the future. I highly doubt this is the route we want to take.

For the short term, this is the route we want to take. A new PIR/PASM compiler isn't absolutely necessary for a 1.0 release. IMCC doesn't really need to be reentrant, it just needs to produce bytecode. So, my answer is: Yes, but...

> 3rd Option: Back to drawing board !!!
> This option would require a complete rewrite of IMCC (we could possibly call it PIRC). The con of this approach is that we will have to re-implement the whole of IMCC. The programming languages will have to live with IMCC's limitations until the new version is ready. The pros of this approach are:
> a. A clean implementation rather than a prototypish implementation
> b. Make the PIR compiler production-release ready. The way the compiler sits right now, it is not a good release candidate.
> c. Structure the code in a way that is easy to maintain and extend.
> The 3rd option is a lot of work but might be a good option in the long run.

IMCC was originally implemented as a separate compiler. After a while, we found it to be so much better than the existing assembler that we made it the primary way of producing bytecode. It's okay to repeat the cycle by experimenting with a new compiler that produces bytecode, and later decide if we want to replace IMCC with it. This doesn't interfere with IMCC's development. So, my answer is: Yes, but...

re2c and lemon aren't enough of an improvement over flex and bison to be worth the pain of rewriting IMCC from scratch. If we do create a new way of producing bytecode (and it's a safe bet that we will at some point), I would lean toward using our own tools.

- Patrick is already looking into implementing a version of PGE in C. This will be an infinitely better parser than any existing alternatives, so it's worth waiting for.
- We already want an OST (opcode syntax tree)-to-bytecode compiler that bypasses PIR for the compiler tools. That same compiler could be used to implement PIR (combined with a lightweight version of TGE in C).
- IMCC is not a straight translator, it also performs optimizations. These should be implemented in a modular way, with a standard interface, so that developers can swap in new and improved optimizers as we go along. The best place to hook them is probably off the OST-to-bytecode compiler.

This approach does mean that the tools to start an IMCC rewrite aren't available yet. It's a long-term solution (possibly post-1.0), so we can afford to take a long-term view.

Allison
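As an illustration of the refactoring named in the 1st option (macros moved from arrays into a hash table, with no static or global state), here is a minimal sketch. It makes no claim about how IMCC actually stores macros; the thread only says that arrays are used. Every name here (`macro_table`, `macro_define`, `macro_lookup`, `MACRO_BUCKETS`) is hypothetical, and a macro is simplified to a name and a body string.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MACRO_BUCKETS 64            /* arbitrary size for this sketch */

struct macro {
    char *name;
    char *body;
    struct macro *next;             /* chain of macros sharing a bucket */
};

/* One table per compilation; the caller owns it, so nothing is global. */
struct macro_table {
    struct macro *buckets[MACRO_BUCKETS];
};

static size_t hash_name(const char *s)
{
    unsigned long h = 5381;         /* djb2 string hash */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return (size_t)(h % MACRO_BUCKETS);
}

static char *copy_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p)
        memcpy(p, s, n);
    return p;
}

void macro_table_init(struct macro_table *t)
{
    size_t i;
    assert(t != NULL);
    for (i = 0; i < MACRO_BUCKETS; i++)
        t->buckets[i] = NULL;
}

/* Define or redefine a macro. Returns 0 on success, -1 if out of memory. */
int macro_define(struct macro_table *t, const char *name, const char *body)
{
    size_t i = hash_name(name);
    struct macro *m;
    char *new_body = copy_string(body);

    if (!new_body)
        return -1;
    for (m = t->buckets[i]; m; m = m->next) {
        if (strcmp(m->name, name) == 0) {
            free(m->body);          /* redefinition replaces the old body */
            m->body = new_body;
            return 0;
        }
    }
    m = malloc(sizeof *m);
    if (m)
        m->name = copy_string(name);
    if (!m || !m->name) {
        free(m);
        free(new_body);
        return -1;
    }
    m->body = new_body;
    m->next = t->buckets[i];        /* push onto the bucket's chain */
    t->buckets[i] = m;
    return 0;
}

/* Returns the macro body, or NULL if the name is not defined. */
const char *macro_lookup(const struct macro_table *t, const char *name)
{
    const struct macro *m;
    for (m = t->buckets[hash_name(name)]; m; m = m->next)
        if (strcmp(m->name, name) == 0)
            return m->body;
    return NULL;
}

void macro_table_free(struct macro_table *t)
{
    size_t i;
    for (i = 0; i < MACRO_BUCKETS; i++) {
        struct macro *m = t->buckets[i];
        while (m) {
            struct macro *next = m->next;
            free(m->name);
            free(m->body);
            free(m);
            m = next;
        }
        t->buckets[i] = NULL;
    }
}
```

Because the table is passed explicitly, two compilations can each hold their own macro set. This is the same design choice that removing static and global variables would require throughout the rest of the code.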
Re: IMCC Reentrancy
On Mon, 2006-07-17 at 14:49 -0700, Allison Randal wrote:

> re2c and lemon aren't enough of an improvement over flex and bison to be worth the pain of rewriting IMCC from scratch. If we do create a new way of producing bytecode (and it's a safe bet that we will at some point), I would lean toward using our own tools.
>
> - Patrick is already looking into implementing a version of PGE in C. This will be an infinitely better parser than any existing alternatives, so it's worth waiting for.
> - We already want an OST (opcode syntax tree)-to-bytecode compiler that bypasses PIR for the compiler tools. That same compiler could be used to implement PIR (combined with a lightweight version of TGE in C).
> - IMCC is not a straight translator, it also performs optimizations. These should be implemented in a modular way, with a standard interface, so that developers can swap in new and improved optimizers as we go along. The best place to hook them is probably off the OST-to-bytecode compiler.

Allison, having said that we need an API for bytecode generation that supports plug-and-play optimizers, would it make sense to start implementing this layer? This API could be used for OST-to-bytecode generation. Later, when Patrick's PGE-to-C parser generator is ready, we could use his code to implement the PIR compiler and just use the APIs that we write for bytecode generation. Initially, for prototyping purposes, we might just use the existing flex/yacc or re2c/lemon.

Allison, should this development wait, or can we start working on it? Will we need a PDD before we can commence working on this API? Let me know your thoughts. It might not hurt to start working on a prototype API and see how it fits with the OST-to-bytecode compiler.

> This approach does mean that the tools to start an IMCC rewrite aren't available yet. It's a long-term solution (possibly post-1.0), so we can afford to take a long-term view.
>
> Allison
Re: IMCC Reentrancy
Vishal Soni wrote:

> Allison, having said that we need an API for bytecode generation that supports plug-and-play optimizers, would it make sense to start implementing this layer? This API could be used for OST-to-bytecode generation. Later, when Patrick's PGE-to-C parser generator is ready, we could use his code to implement the PIR compiler and just use the APIs that we write for bytecode generation.

Yes, this will be valuable.

> Initially, for prototyping purposes, we might just use the existing flex/yacc or re2c/lemon.

The current PGE implementation is the best prototyping substitute: a) the output from it will be nearly identical to the output from the C version, and b) we also want to be able to use the OST-to-bytecode compiler from language compilers that use the PIR versions of PGE/TGE, so it makes sense to build it that way from the start. Ultimately we'll want to remove the PIR-PGE-PIR dependency loop, but this is a good start.

> Allison, should this development wait, or can we start working on it? Will we need a PDD before we can commence working on this API? Let me know your thoughts. It might not hurt to start working on a prototype API and see how it fits with the OST-to-bytecode compiler.

Let's go for an agile, iterative approach to the spec. Write up some initial thoughts on the shape of the API and post them to parrot-porters. The group can do sanity-checking/brainstorming, and then you can start a prototype based on the result. After we've played with the prototype a bit (and probably after you've modified it a few times based on feedback from the group), I'll write a PDD to flesh out the spec, fill in any holes, and address any problems encountered along the way.

Thanks,
Allison
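To make the "standard interface for swappable optimizers" idea concrete, here is one possible shape, sketched under heavy assumptions. It is not a proposed Parrot API: the instruction type is a toy stand-in for real OST nodes, and every name (`op_list`, `optimizer_fn`, `pipeline_add`, `pipeline_run`, `remove_noops`) is hypothetical. It shows only the core of the design: each optimizer is a function with one agreed signature, and the bytecode generator runs whichever passes have been registered.

```c
#include <assert.h>
#include <stddef.h>

/* Toy instruction set; real OST nodes and Parrot ops are far richer. */
enum op_code { OP_NOOP, OP_SET, OP_ADD, OP_PRINT };

struct op {
    enum op_code code;
    int arg;
};

struct op_list {
    struct op *ops;
    size_t count;
};

/* The standard interface: a pass rewrites the list in place and
 * returns how many changes it made. */
typedef size_t (*optimizer_fn)(struct op_list *list);

#define MAX_PASSES 8                /* arbitrary limit for this sketch */

struct pipeline {
    optimizer_fn passes[MAX_PASSES];
    size_t count;
};

void pipeline_init(struct pipeline *p)
{
    assert(p != NULL);
    p->count = 0;
}

/* Register a pass. Returns 0 on success, -1 if the pipeline is full. */
int pipeline_add(struct pipeline *p, optimizer_fn fn)
{
    assert(p != NULL && fn != NULL);
    if (p->count == MAX_PASSES)
        return -1;
    p->passes[p->count++] = fn;
    return 0;
}

/* Run every registered pass once, in registration order. */
size_t pipeline_run(const struct pipeline *p, struct op_list *list)
{
    size_t i, changes = 0;
    for (i = 0; i < p->count; i++)
        changes += p->passes[i](list);
    return changes;
}

/* Example pass: drop OP_NOOP entries by compacting the list in place. */
size_t remove_noops(struct op_list *list)
{
    size_t in, out = 0, removed;
    for (in = 0; in < list->count; in++)
        if (list->ops[in].code != OP_NOOP)
            list->ops[out++] = list->ops[in];
    removed = list->count - out;
    list->count = out;
    return removed;
}
```

A new optimizer plugs in by writing another function of type `optimizer_fn` and passing it to `pipeline_add`; nothing in `pipeline_run` needs to change. Whether passes should operate on OST nodes, on emitted ops, or on both is exactly the kind of question the prototype discussion on parrot-porters would have to settle.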
Re: IMCC Reentrancy
> Let's go for an agile, iterative approach to the spec. Write up some initial thoughts on the shape of the API and post them to parrot-porters. The group can do sanity-checking/brainstorming, and then you can start a prototype based on the result. After we've played with the prototype a bit (and probably after you've modified it a few times based on feedback from the group), I'll write a PDD to flesh out the spec, fill in any holes, and address any problems encountered along the way.

Allison, this sounds great. To get started I will need some reference to the OST format. Can you please point me in the right direction (some documentation or sample code will do)?

I will assume the Byte Code Generation/Optimization API will be implemented in C (TGE could use loadlib or some PMC mechanism to call it). Let me know if my assumption is correct, or whether this API needs to be in PIR.

Thanks, Allison
Re: IMCC Reentrancy
Vishal Soni wrote:

> Allison, this sounds great. To get started I will need some reference to the OST format. Can you please point me in the right direction (some documentation or sample code will do)?

Start with languages/punie/lib/POST/ and languages/punie/lib/PIRGrammar.tg. This is the most developed existing prototype implementation of OST nodes and an OST-to-PIR translator, which should give you a general idea of what we'll be looking for.

> I will assume the Byte Code Generation/Optimization API will be implemented in C (TGE could use loadlib or some PMC mechanism to call it). Let me know if my assumption is correct, or whether this API needs to be in PIR.

Yes, C is the right way to go.

Allison
Re: IMCC Reentrancy
Audrey Tang wrote:

> As I'm writing this, I noticed that Allison has ruled that we go with PIR/PGE and eventually C-based libpge instead -- since a lexer refactoring that doesn't affect the IMCC API will somehow throw important projects on Parrot into a dead stall, and thread safety for PIR compilation is not a 1.0 goal anyway -- I'll abandon working on this, and focus on helping get a C-based libpge started instead. :-)

LOL :) Audrey, I love you dear, but you always have an interesting way of interpreting what I say. :)

Yes, I'm not willing to start a 6+ month project to gut IMCC. The cost is too great and the benefit isn't great enough.

If you have a way to make IMCC reentrant that involves upgrading to a more recent version of flex and passing one additional parameter, go for it! Send us a patch and if it passes all the tests, we'll apply it.

It's still true that:

- We need an OST-to-bytecode compiler for the compiler tools. (I suspect it will solve some of your problems too, as you'll no longer need to embed Parrot to generate Parrot bytecode. You'll be able to generate it from a C library instead and just run the bytecode on Parrot.)
- A PIR parser written in PGE is a good idea (and will be dead simple anyway, as PIR is a simple language).
- A version of PGE written in C is a good idea, because it will spread Perl 6 regexes/grammars far and wide. (It will be difficult, because of all the Parrot features that will have to be reimplemented in a standalone PGE. But, it is possible.)
- If those things combine to produce a cleaner, more maintainable alternative to IMCC, it's good for Parrot. If not, then the separate components are still good for Parrot.

There's more than one way to do it,
Sometimes you should do both,
Allison
Re: IMCC Reentrancy
On 2006/7/18, at 1:21 AM, Allison Randal wrote:

> LOL :) Audrey, I love you dear, but you always have an interesting way of interpreting what I say. :) Yes, I'm not willing to start a 6+ month project to gut IMCC. The cost is too great and the benefit isn't great enough.

Indeed, and I'd like to apologize publicly for the snipping.

However, the re2c or regel-based scanner refactoring isn't different from a flex upgrade patch, as it (by definition) can't affect IMCC's public API at all. An additional advantage is that they will let us get rid of the flaky API situation with flex. In any case it wouldn't take 6 months. In vsoni's original words:

>> a. Remove flex and implement re2c
>> b. Remove static and global variables

And you answered:

> The cost/benefit balance on this solution is not good. A lot of people are depending on IMCC now, and a refactor of that magnitude will throw several important projects on Parrot into a dead stall. So, my answer is: No.

It will involve overhauls, but again, the public interface -- at bison level and above -- cannot break. So the dead stall ruling -- effectively dismissing re2c and other scanner alternatives instantly -- strikes me as extremely surprising.

> If you have a way to make IMCC reentrant that involves upgrading to a more recent version of flex and passing one additional parameter, go for it! Send us a patch and if it passes all the tests, we'll apply it.

As flex 2.5.30+ is not API compatible with the flex version IMCC currently uses, I wonder how it is different from re2c or regel, particularly since shoehorning an additional YYLEX parameter to make it work with bison will also involve overhauls beyond the original bison interface.

I guess my question is: If I send two patches of equal size, one that uses re2c and is much cleaner and faster, and another that uses a kluged-up flex with its new, backward-incompatible reentrant API, would you reject one and apply the other?

If you are willing to let alternative scanners go in, I'd much rather work on that instead of trying to work around the bison/flex interface.

> - A version of PGE written in C is a good idea, because it will spread Perl 6 regexes/grammars far and wide. (It will be difficult, because of all the Parrot features that will have to be reimplemented in a standalone PGE. But, it is possible.)

Well, as discussed in #parrot, an offline parser (i.e. one that does not permit changes to the grammar during parsing) with rule syntax can be much more easily generated as a C-emitter backend from either PIR/PGE or Perl5/PCR. I'm looking into it with vsoni right now.

Audrey