[Mono-dev] Questions on Cecil
Hi,

I have some (more) questions related to Cecil.

1. Suppose I can make Cecil throw an exception from somewhere deep down in its core. An example would be setting a public field to null and then getting a NullReferenceException when some internal Cecil method tries to call methods on that reference. Is that a Cecil bug that I should report, or is it intended behavior? I.e., does Cecil guarantee good behavior in the face of being used inappropriately?

2. How do I get Cecil to tell me the difference between the following two C# methods? (Currently my program turns the first into the second.)

    void method1() { // this method prints 13
        try {
            try {
                throw new System.NullReferenceException ();
            } catch (System.Exception) {
                System.Console.WriteLine(1);
                throw;
            } catch {
                System.Console.WriteLine(2);
            }
        } catch {
            System.Console.WriteLine(3);
        }
    }

    void method2() { // this method prints 12
        try {
            try {
                try {
                    throw new System.NullReferenceException ();
                } catch (System.Exception) {
                    System.Console.WriteLine(1);
                    throw;
                }
            } catch {
                System.Console.WriteLine(2);
            }
        } catch {
            System.Console.WriteLine(3);
        }
    }

Actually, I looked at the ECMA spec, and I can't find anything in there that explains how something like the situation in method1 can even be represented (see the sketch following this message). E.g. the explanation of the rethrow bytecode is silent on the matter. So are the explanations related to exceptions that I have been able to find in the spec, or perhaps I am misreading them.

Regards
Bjarke Roune

___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list
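One way the method1 shape can be encoded at all: ECMA-335 allows several entries in a method's exception-handler table to protect the same region, and a rethrow inside the first handler happens outside that region, so only an enclosing handler can catch it. Below is a rough sketch of that layout through Cecil's object model; the instruction variables (tryStart, printOne, printTwo, afterHandlers) are placeholders, and the API names follow current Mono.Cecil rather than the 0.4 series, so treat the exact calls as assumptions rather than a recipe.

    // Sketch: method1 as two entries in the exception-handler table, both
    // protecting the same try region. The rethrow inside h1's handler is
    // outside that region, so the sibling handler h2 never sees it.
    var h1 = new ExceptionHandler (ExceptionHandlerType.Catch);
    h1.TryStart     = tryStart;      // first instruction of the try block
    h1.TryEnd       = printOne;      // one past the protected region
    h1.HandlerStart = printOne;      // WriteLine(1); rethrow
    h1.HandlerEnd   = printTwo;
    h1.CatchType    = module.ImportReference (typeof (System.Exception));

    var h2 = new ExceptionHandler (ExceptionHandlerType.Catch);
    h2.TryStart     = tryStart;      // same protected region as h1
    h2.TryEnd       = printOne;
    h2.HandlerStart = printTwo;      // WriteLine(2); the rethrow skips this
    h2.HandlerEnd   = afterHandlers;
    h2.CatchType    = module.TypeSystem.Object;

    body.ExceptionHandlers.Add (h1);
    body.ExceptionHandlers.Add (h2);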
Re: [Mono-dev] Cecil bug report and a question
Hi,

Jb Evain wrote:
> Hi,
> Bjarke Hammersholt Roune wrote:
>> I have found the following bug in Cecil. Here is how to trigger it: make an assembly with a method where the last instruction in that method is also the last instruction in a handler block. Use Cecil to load the assembly and then save it. This will result in a NullReferenceException being thrown from Mono.Cecil.Cil.CodeWriter.IsRangeFat. I have attached an example assembly that contains such a method.
> This seems to be already fixed. Please use the very fresh 0.4.3 release or, even better, an always up-to-date SVN checkout.

Thank you for the prompt and informative reply! I thought I was using the newest version, but I did not think to check whether a new version had been released. Sorry about that.

I have taken a look at 0.4.3, and you are correct that it can read and then write the bug-triggering assembly from my previous report without problems. The issue has not been entirely fixed, however. I have attached an assembly that makes Cecil output an invalid assembly. This is with Cecil 0.4.3 - I have not checked the version of Cecil in SVN. Specifically, MS peverify gives the following complaint when I run it on the output:

    [IL]: Error: [E:\net\o.exe : Bug::Main][offset 0x0007] Handler starts in the middle of an instruction.

As far as I can tell, the problem is the handler end rather than the start, but maybe I am wrong (I can't inspect it manually, as ildasm does not display what the actual ranges are, probably because the handler is invalid in this case).

I believe I have tracked the problem down to the following method in CodeWriter.cs:

    int GetLength (Instruction start, Instruction end, Instruction last)
    {
        return (end == null ? last.Offset + last.OpCode.Size : end.Offset) - start.Offset;
    }

As far as I have been able to figure out, OpCode.Size is only the size of the opcode itself - it does not include any operands that follow the opcode. Thus the ending offset is miscalculated when the last instruction in the method takes operands and is also the last instruction in an exception handler (a sketch of a possible fix follows at the end of this message). Adding a dummy return instruction at the end of the instruction stream also works as a temporary workaround for this bug, as return instructions take no operands.

>> It would probably be a good idea to document how ranges are represented in Cecil in the FAQ, as there are several reasonable ways it could work, and it is not possible to tell from the Cecil interface which one it is.
> Good idea! Feel free to add another entry to http://www.mono-project.com/Cecil:FAQ

The thing is, I am not sure how ranges are handled in Cecil. I think it works like this: all ranges of instructions are represented as half-open intervals [start, end), where start and end are references to actual Instructions, and they can be null if they would be past the last instruction. Is this correct?

>> I also have a question: I have an example of an assembly output by the MS C# compiler that shrinks from 3,072 bytes to 2,048 bytes simply by loading and then saving it using Cecil. How does that happen?
> Cecil does not preserve the .rsrc (aka Win32 resources) section of the PE file for the moment. So it saves some space.

What are the consequences of this other than smaller assemblies?

Regards
Bjarke Roune

bug2
Description: Binary data

___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list
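For illustration, a minimal sketch of the fix the analysis above points toward: when the range runs to the end of the method, the end offset has to account for the whole final instruction, opcode plus operand. ComputeSize is a hypothetical helper standing in for whatever Cecil's writer actually uses to measure an instruction; the point is only the shape of the calculation.

    int GetLength (Instruction start, Instruction end, Instruction last)
    {
        // ComputeSize (hypothetical) = opcode bytes + operand bytes,
        // unlike last.OpCode.Size, which counts the opcode alone.
        int pastEnd = end == null
            ? last.Offset + ComputeSize (last)
            : end.Offset;
        return pastEnd - start.Offset;
    }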
[Mono-dev] Cecil bug report and a question
Hi,

I have found the following bug in Cecil. Here is how to trigger it: make an assembly with a method where the last instruction in that method is also the last instruction in a handler block. Use Cecil to load the assembly and then save it. This will result in a NullReferenceException being thrown from Mono.Cecil.Cil.CodeWriter.IsRangeFat. I have attached an example assembly that contains such a method.

The problem is that Cecil represents a range of consecutive instructions as a reference to the first instruction in the range and a reference to the instruction just past the end of the range. This design breaks down if the end of the range is the last instruction, so that there are no instructions past it. When Cecil reads a range of that kind, it will set the end pointer of the range to null, but when it writes the range, it assumes that the end pointer is not null.

If you are using Cecil, you can work around this bug by adding a dummy return instruction to the end of methods that have this problem. The dummy return instruction can then be set as the HandlerEnd of the block, thus avoiding the problem (see the sketch following this message).

It would probably be a good idea to document in the FAQ how ranges are represented in Cecil, as there are several reasonable ways it could work, and it is not possible to tell from the Cecil interface which one it is.

I also have a question: I have an example of an assembly output by the MS C# compiler that shrinks from 3,072 bytes to 2,048 bytes simply by loading and then saving it using Cecil. How does that happen?

Regards
Bjarke Roune

bug
Description: Binary data

___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list
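A sketch of the workaround described above, in rough Cecil terms: append a dummy ret, then point any null HandlerEnd at it, so that every range end references a real instruction. GetILProcessor/ILProcessor are the current Cecil names (the 0.4-era API differed), so treat the exact calls as assumptions.

    // Append a dummy ret so handler ranges that ran to the end of the
    // method get a real "one past the end" instruction to point at.
    var il = method.Body.GetILProcessor ();
    var dummyRet = il.Create (OpCodes.Ret);
    il.Append (dummyRet);

    foreach (ExceptionHandler handler in method.Body.ExceptionHandlers)
        if (handler.HandlerEnd == null)   // range ended at the old last instruction
            handler.HandlerEnd = dummyRet;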
[Mono-dev] Random CIL generator
Can anyone point me to a random CIL generator? As far as I can tell from Google, no one has written one yet. I ask because I don't want to start thinking about how to write one if one already exists. It would be useful for testing CIL-based tools, as described at http://testingsoftware.blogspot.com/2005/10/testing-for-zero-bugs_31.html

Regards
Bjarke Roune

___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list
Re: [Mono-dev] CIL to CIL optimizer
> On Mon, 2006-08-14 at 19:39 +0200, Bjarke Hammersholt Roune wrote:
>> 3. A CCO can decrease the size of an assembly by such optimizations as dead code/function/class elimination (the last two only in the absence of reflection), better use of the stack, better instruction selection, redundancy elimination, better ordering of basic blocks and so on. This decreases start-up time.
> I'd like to understand better what you mean by class elimination (I guess function elimination means inlining...)? And also, what does instruction selection mean at the CIL level?

By function elimination I mean removing functions that never get called and that are not externally visible. This situation could come about as a result of inlining, yes. By a dead class I mean a class that is not referenced anywhere and that is not externally visible. I don't know how prevalent those things are. It would be reasonable to have a command-line switch to inform the CCO that the only externally visible thing that needs to be preserved is the main function.

Instruction selection is choosing which CIL instructions to emit. This can have an impact on code size, such as by choosing the short version of some instructions. It could also mean constructing the float 1.0 by taking an integer 1 and casting it to float, which is shorter than loading it directly and which the JIT should be able to optimize away. Choosing local variable numbers in an instruction-selection-aware way can also help: the one-byte forms of the local load and store instructions only exist for the first 4 locals, so it is better to let those 4 be the most used ones (see the sketch at the end of this message).

> I give you some examples of things that a CCO cannot do as effectively as a JIT, so the JIT must (well, should...) do them anyway.
> [1] Dead code elimination: some code is generated internally by the JIT, so the CCO cannot even see it. This is particularly true with inlining (unless you also do inlining in the CCO itself, which is tricky because it is likely to break things like tracing, debugging and AOP frameworks).

It would be possible to do some inlining, but I agree that the JIT can do a much, much better job here. For one thing, a CCO has no chance to inline methods from referenced assemblies that it doesn't have access to, or that might be different on the machines where the code is to be run. I can see how inlining might generate dead code. Btw, does Mono internally generate dead code?

> [2] Redundancy elimination: again, some code sequences are totally invisible in CIL (array element and field address computations, or locals initializations, and, if we get smart, vtable accesses...). Moreover, redundancy elimination is really tricky to get right, and it can easily turn into an arch-dependent issue. I have been badly burnt by the SSAPRE experience in the JIT about this: on paper, PRE is always a win, and my SSAPRE implementation does nothing else than removing redundancies, yet it can *easily* make the code *worse* :-( The problem is that sometimes storage (especially registers!) is more valuable than computational CPU cycles, so PRE is a tradeoff, and you must get it right on each architecture...

Does SSAPRE have issues other than increased register pressure? Do you think support for rematerialization in the register allocator would solve most of the issues with SSAPRE?

> Anyway, in general I'm skeptical about omnicomprehensive approaches like the one you describe, not because they are not feasible, but because I have the impression that they are not worth the effort.
> I mean, when the control flow reaches the catch/finally, you do not know the specific path it took to get there, so even in the perfect case you will have very large phi functions where you cannot know which of the arguments is the right one (because statically none of them is right), so you cannot make inferences on the phi value. In the end, having a dummy store statement instead of the phi is the same thing IMHO, but cheaper.

I have no experience with this issue, so I don't know how much difference it makes. It is easy to conjure up examples where it does make a difference, but that does not really prove anything. Path-sensitive analyses do need the CFG to express all the possible paths, though. My rationale for making the CFG as expressive as possible is that it reduces the number of special cases in optimizations, that I suspect it will make a difference in some situations, and that I think it is necessary to do very precise analysis in order to do something like findbugs for Java. Also, a CCO needs to be quite good at tracking exceptions, as they can really inhibit the opportunities for optimization. Another nice thing about an expressive CFG is that the only barrier to code motion within a basic block is that data needs to be computed before it can be used. As a practical matter, I also think
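To make the instruction-selection point above concrete, here is a sketch of local-load selection; the helper itself is invented, while the opcode names are from Mono.Cecil.Cil.OpCodes. ldloc.0 through ldloc.3 encode in 1 byte, ldloc.s in 2 bytes (index up to 255), and the general ldloc in 4 bytes, which is why putting the hottest locals in slots 0-3 pays off.

    // Pick the smallest encoding available for a local-variable load.
    static Instruction LoadLocal (ILProcessor il, VariableDefinition v)
    {
        switch (v.Index) {
        case 0: return il.Create (OpCodes.Ldloc_0); // 1-byte forms
        case 1: return il.Create (OpCodes.Ldloc_1);
        case 2: return il.Create (OpCodes.Ldloc_2);
        case 3: return il.Create (OpCodes.Ldloc_3);
        default:
            return v.Index <= byte.MaxValue
                ? il.Create (OpCodes.Ldloc_S, v)    // 2-byte form
                : il.Create (OpCodes.Ldloc, v);     // 4-byte form
        }
    }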
Re: [Mono-dev] CIL to CIL optimizer
Massimiliano Mantione wrote:
> First things first: it's nice that you'll do the CIL-CIL optimizer, but there are plans to do this in the JIT anyway, where these things belong. I say "where these things belong" not to discourage you, but because the JIT generally has more visibility into how the code will really look (it can inline methods, knows that the address of certain things may already be a constant, and in the future will also have visibility of the method prologue/epilogue), so things like SCCP and PRE should be performed by the JIT anyway if we want to take full advantage of them. Not to mention the issues involved in tuning redundancy elimination, which can easily get architecture-dependent... That said, obviously until the JIT gets there, having an external optimizer can be useful :-)

I think optimization of something like CIL belongs in BOTH the JIT and a CIL-CIL optimizer (I abbreviate this as CCO in the rest of the email). There are several reasons I think this.

You have already given some good reasons that optimization belongs in the JIT even in the face of an excellent CCO. The CCO also has to take extra care to preserve the constraints of the CIL format, which can be burdensome. E.g. CIL concepts cannot be eliminated by translating them to something simpler, at least not without being able to translate them back later when the optimized assembly is to be output. There are reasons that a CCO is desirable too, though:

1. The JIT cannot afford to do expensive whole-program optimizations and a CCO can, so the CCO can optimize the program in ways that the JIT cannot. In the same vein, the JIT will only aggressively optimize hot parts of the code, while a CCO can take its time and optimize everything.

2. It takes time before the JIT has optimized the running program, while the effect of the CCO is available at start-up.

3. A CCO can decrease the size of an assembly by such optimizations as dead code/function/class elimination (the last two only in the absence of reflection), better use of the stack, better instruction selection, redundancy elimination, better ordering of basic blocks and so on. This decreases start-up time.

4. Some JITs do not optimize very aggressively. CCO optimizations are available to all JITs, and different CCOs can be run in sequence.

5. Some optimizations, like dead code elimination, can be done in a CCO, and then the JIT does not have to do them (or does not have to do them as much). The CCO could embed some metadata in the assembly to indicate which optimizations have already been performed (a sketch of this idea follows at the end of this message). The same trick applies to the results of expensive analyses, though relying on that information without checking it could be unsafe.

6. A CCO can output data that enables the JIT to start up faster and optimize better. An example is SafeTSA, which is a Java-equivalent bytecode format where the code is already in SSA form and where some array bounds checks and some null checks can be safely eliminated. On top of this, the SafeTSA format actually takes up less space than the normal Java bytecode format. Another example is a paper I read that explained how to annotate Java bytecode so that the JIT could do fast, high-quality register allocation in little time. This scheme is, surprisingly, independent of the number of registers on the target machine.

The last two points have the drawback that they require respectively slight and extensive cooperation between the CCO and the JIT. I don't plan to implement point 6, btw.
As a practical matter, I also think that it will take less effort to implement optimizations in ML than in C. On a personal note, I have been programming a Groebner basis computation program in C++ for quite a while, and I am looking forward to programming in a functional-style language again.

I cannot foresee how problematic the natural barrier to optimization in a CCO that Miguel mentions will be. It is certainly there, in the sense that a JIT can do more optimization than a CCO even if the CCO had all the time in the world. I guess I will have to write a CCO and see what happens to find out. In any case, my CCO would certainly serve the purpose that Miguel mentions of cleaning up the output of mcs.

> Then, about the CFG shape and issues when try[/catch]/finally clauses are present, this discussion already reached mono-devel-list:
> http://marc.theaimsgroup.com/?l=mono-devel-list&m=111336503228737&w=2
> I'd really like you to comment on that thread.

I think that discussion is a good example of how the different constraints on JITs and CCOs play out. You are discussing whether to let the CFG reflect the possibility that an instruction could throw an exception. On one hand this would improve the precision of analyses, but it would also make the CFG much larger, which could increase memory consumption and decrease performance of the JIT optimizer. It sounded like you ended up agreeing on the less precise
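The metadata idea in point 5 could look something like the sketch below: the CCO records which passes it has already run, so a cooperating JIT could skip them. The attribute and its name are entirely hypothetical; nothing like it is standardized, and a JIT that trusted it without verification would do so at its own risk.

    // Hypothetical assembly-level marker a CCO might stamp on its output.
    [System.AttributeUsage (System.AttributeTargets.Assembly)]
    public class CcoOptimizedAttribute : System.Attribute
    {
        public readonly string Passes;

        public CcoOptimizedAttribute (string passes)
        {
            Passes = passes; // e.g. "deadcode,sccp,block-layout"
        }
    }

    // The CCO would then emit something like:
    // [assembly: CcoOptimized ("deadcode,sccp,block-layout")]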
Re: [Mono-dev] CIL to CIL optimizer
Miguel de Icaza wrote:
> Hello,
>> First things first: it's nice you'll do the CIL-CIL optimizer, but there are plans to do this in the JIT anyway, where these things belong.
> I think there is a natural barrier imposed by the CIL language on the kinds of optimizations that should be performed at each stage.

Are there, in your mind, optimizations that could profitably be performed at the CIL level, beyond fixing things that the JIT could figure out by itself?

/Bjarke

___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list
[Mono-dev] Questions on Cecil
I've begun writing the interface layer to Cecil for my CIL optimizer, and I have some questions on using Cecil that I have not been able to answer on my own. Any help would be appreciated.

0. I assume Cecil is sufficiently related to Mono for questions about it to be on-topic on this list, especially as I don't see any Cecil-specific mailing list. Am I correct? (if not: sorry! :-) )

1. I'm using ildasm and the largely uncommented Cecil source as documentation. Is there some actual documentation other than the FAQ?

2. I need to know the types of the things on the stack. Can Cecil help with that? If not, is there some nice way to get .NET to handle the operations on the type lattice?

3. When emitting a Br (branch) instruction, Cecil insists on getting the target instruction immediately. The trouble is that when I am writing out CIL to Cecil from the CIL representation internal to my program, it is often the case that the target of the branch has not been emitted yet. Currently I get around this by passing a dummy Nop instruction and then correcting the target in a second pass (see the sketch at the end of this message). Is there some way to avoid constructing a dummy instruction, such as by getting Cecil to accept null as the (preliminary) target? (When I try to do that the obvious way, I get an exception.)

4. Cecil requires that the type of a local variable be specified as a TypeReference, so I need to get those. Is there some way to get a TypeReference for built-in types like System.Int32 without loading an assembly? The way I am doing it now is getting it as assembly.MainModule.TypeReferences ["System.Int32"].

5. I would like to emit the short form of the Br instruction when this is possible. My initial idea is to always emit the short form and then expand to the long form where necessary. That makes it necessary to account for the possibility of a cascade, where expanding one instruction increases the distance to the target of another Br instruction, which then also needs to be expanded to the long form, and so on. I would like to avoid this trouble, so is there some way to get Cecil to do this for me?

/Bjarke Roune

___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list
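A sketch of the two-pass fix-up mentioned in question 3: branches are emitted against one shared dummy Nop, the intended label is remembered, and Instruction.Operand is patched once all targets exist. ILProcessor and OpCodes are current Cecil names; `il`, the int label ids and the two dictionaries are invented bookkeeping assumed to be in scope.

    var fixups = new Dictionary<Instruction, int> ();   // branch -> label id
    var resolved = new Dictionary<int, Instruction> (); // label id -> target
    var placeholder = il.Create (OpCodes.Nop);          // shared dummy target

    // First pass, when a branch to a not-yet-emitted label is needed:
    var br = il.Create (OpCodes.Br, placeholder);
    il.Append (br);
    fixups [br] = labelId;

    // ... and when the instruction carrying labelId is finally emitted:
    // resolved [labelId] = thatInstruction;

    // Second pass, after the whole body has been written:
    foreach (var pair in fixups)
        pair.Key.Operand = resolved [pair.Value];       // swap in the real target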
[Mono-dev] CIL to CIL optimizer
Hi,

I am considering implementing a CIL to CIL optimizer as suggested at http://www.mono-project.com/Cecil. I have some questions and some ideas up for criticism.

Having previously implemented an optimizing compiler in ML in a course, I think that SML.NET (http://www.cl.cam.ac.uk/Research/TSG/SMLNET/) would be better suited to the task than C#. F# would be nice too, but it doesn't have an open source compiler as far as I can tell.

The proof-of-concept I have in mind would do the following steps for a subset of the full CIL. This subset would include a conditional branch opcode, the addition opcode and full support of exception handling, including (the dreaded) try...finally.

1. Read in an assembly using Cecil
2. Construct the control flow graph while converting to a quadruple format (i.e. eliminate the stack in favor of registers)
3. Convert to SSA form using the algorithm from http://www.hipersoft.rice.edu/grads/publications/dom14.pdf
4. Do intra-procedural conditional constant propagation as described on page 447 of Modern Compiler Implementation in ML (http://portal.acm.org/citation.cfm?id=103136)
5. Eliminate phi-nodes using moves and convert to CIL (i.e. reintroduce the stack)
6. Output the assembly using Cecil

The next step would be to support at least the subset of CIL that is output by the Mono C# compiler. Then it would be nice to extend the optimization to be inter-procedural. Then I would focus on outputting better CIL code, like eliminating some of those moves coming from SSA. I have plenty of ideas for what to do after that, but I'll have to see how much time I have.

Having taken a look at the ECMA CIL spec, I don't see anything in there that should present really big problems other than exceptions and the finally construct (which is a problem even without exceptions). Exceptions are annoying because of the way they complicate analyzing control flow, but after having thought and read about how to handle them for some time, I think they will be manageable.

The try...finally construct, on the other hand, now that just seems painful to me. As in red-hot-pincers-jammed-into-the-eyes-while-being-electrocuted-naked-in-a-snow-storm painful. Having scoured the web to the best of my ability, I have found PLENTY of material on how to deal with try...catch in Java bytecode and CIL, but somehow people usually never bother to explain how they handle try...finally. I guess how to efficiently construct a precise CFG and compute SSA form in the face of try...finally is just obvious to everyone but me.

A solution to the problem involving higher-order functions is described at http://flint.cs.yale.edu/flint/publications/lamjvm.pdf, though that solution seems unsuitable for an imperative representation. A different solution, used in SafeTSA, is described at http://www.cs.utsa.edu/~vonronne/pubs/jvr-dissertation.pdf and consists of making several copies of the code in the finally clause. The potential for code size explosion is huge, but on the other hand finally clauses are usually small... Still, it is not a satisfying solution.

An obvious alternative would be to represent finally blocks as what they are and then have every single pass of the optimizer deal with them in some ad-hoc fashion. I don't like that either, because all the details just seem horrendous to deal with as far as I can tell.
Another idea is putting the finally code in a function call with plenty of ref parameters (a sketch of this idea follows at the end of this message). This should work nicely together with inter-procedural optimizations. It ought to be possible to reinsert these calls as a finally block in most cases when outputting the CIL. I'm not sure this will work well, though. So far the best solution I can think of is to copy the finally code if it is small and to insert function calls if it is large. Always inserting function calls and having a general inliner pass might work well too.

If anyone has any ideas or experience they would like to share, I would love to hear it. Especially on the topic of try...finally!

Regards
Bjarke Roune

___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list
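A C# rendering of the function-call idea above, with every name invented: the finally body becomes a helper taking ref parameters for the locals it touches, and the helper is called on each path leaving the protected region. Note that the catch-and-rethrow on the exceptional path is only an approximation of real finally/fault semantics, but it shows the shape of the transformation.

    // Stands in for an arbitrary finally body; ref parameters carry the
    // locals the body reads or writes.
    static void FinallyBody (ref int x)
    {
        System.Console.WriteLine (x);
    }

    static int Before (bool a)
    {
        int x = 0;
        try { x = a ? 2 : 1; }
        finally { System.Console.WriteLine (x); }
        return x;
    }

    static int After (bool a)
    {
        int x = 0;
        try {
            x = a ? 2 : 1;
        } catch {
            FinallyBody (ref x); // exceptional path: run the body, rethrow
            throw;
        }
        FinallyBody (ref x);     // normal path: explicit call replaces finally
        return x;
    }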
Re: [Mono-dev] CIL to CIL optimizer
Zoltan Varga wrote:
> Hi,
> What is the problem with try-finally? There are two cases:
> - if there is an exception, the runtime will call it, just as with a catch clause
> - if there is no exception, control flow just falls from the end of the try block to the beginning of the finally block. This is just a normal control flow edge.

The problem is that what happens after the finally block is done depends on the context that caused it to run. Here is a simple example in pseudo code:

    try {
        if (a) {
    A:      x = 2;
            goto E;
        }
    B:  x = 1;
    } finally {
    C:  print "hello world!";
    }
    D: print x;
    E: return;

Imagine a control flow graph of this code with the labels naming the nodes. Then there will be edges A-C, B-C, C-D, C-E and D-E. The definition of x that reaches D will thus have to go through C, and then C will need a phi-function to get the correct definition of x. This will make it look like the definition of x from A is needed at C, even though obviously it is not. The basic problem is that the control flow graph makes it look like the path A-C-D is possible (the small program at the end of this message illustrates this). In this example it means that we cannot discover that x = 2 is dead code.

Suppose the x = 2 at A was instead something throwing an exception, and we had a catch handler on the innermost try. Then C would need an outgoing edge to this handler, making it look like falling through at B could cause an exception! So if we don't do something special for finally, we can get preposterous information. I should perhaps say that I am interested in precise analysis in order to do good optimization, but also because I would like the program to have capabilities similar to the Java findbugs tool, though that would not be a priority at first.

I am sure this CAN be fixed for each analysis by doing the equivalent of path-sensitive analysis restricted to finally. Basically this is the same thing as making the analysis act as if the finally block were inlined everywhere it could be called, but without actually inlining it. I am not sure how that should interact with SSA form.

Even with the naive approach, the control flow graph will need to know that removing an edge TO a finally block might or might not necessitate removing an edge FROM it, depending on whether or not a different branch from inside the corresponding try block needs to go to the same outer destination. If that does not make sense, consider removing the last branch to a label outside a finally-protected try block, as compared to removing one of several branches that all go to the same label outside the try block. I expect to discover more of these kinds of differences from normal control flow graph behavior as implementation proceeds, since finally directs control flow in a different way from anything else.

I hope I have convincingly argued that finally needs special treatment, and that that treatment is non-trivial if we need reasonably precise analysis of program behavior.

/Bjarke

___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list
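A tiny check of the claim above: with the naive finally edges (A-C, B-C, C-D, C-E, D-E), plain graph reachability says A reaches D, even though no real execution takes the path A-C-D. The node names are from the pseudocode example in the message; the program itself is just an illustration.

    using System;
    using System.Collections.Generic;

    class NaiveFinallyCfg
    {
        // The naive CFG from the example: the finally node C merges all
        // incoming paths, then fans out to every possible continuation.
        static readonly Dictionary<char, char[]> Edges = new Dictionary<char, char[]> {
            ['A'] = new[] { 'C' },
            ['B'] = new[] { 'C' },
            ['C'] = new[] { 'D', 'E' },
            ['D'] = new[] { 'E' },
            ['E'] = new char[0],
        };

        static bool Reaches (char from, char to)
        {
            var seen = new HashSet<char> ();
            var work = new Stack<char> ();
            work.Push (from);
            while (work.Count > 0) {
                char n = work.Pop ();
                if (n == to)
                    return true;
                if (seen.Add (n))
                    foreach (char s in Edges [n])
                        work.Push (s);
            }
            return false;
        }

        static void Main ()
        {
            // Prints True: the naive CFG admits the impossible path A-C-D,
            // which is exactly what makes x = 2 at A look live at D.
            Console.WriteLine (Reaches ('A', 'D'));
        }
    }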
Re: [Mono-dev] CIL to CIL optimizer
Rafael Teixeira wrote:
> Have you looked into the implementation of the SSA optimizations our JIT already does? You may find useful insights in it, even if it optimizes not CIL but another intermediate representation, and targets native instructions. :)

I did, but in ssa.c it says:

    void
    mono_ssa_compute (MonoCompile *cfg)
    {
        int i, idx;
        MonoBitSet *set;
        MonoMethodVar *vinfo = g_new0 (MonoMethodVar, cfg->num_varinfo);
        MonoInst *inst, *store, **stack;

        g_assert (!(cfg->comp_done & MONO_COMP_SSA));

        /* we dont support methods containing exception clauses */
        g_assert (mono_method_get_header (cfg->method)->num_clauses == 0);
        g_assert (!cfg->disable_ssa);

        [...]
    }

I assume this means that methods with catch, filter, finally or fault clauses do not get converted to or optimized in SSA form. That is when I stopped looking, but perhaps I have misunderstood the comment?

Bjarke

___
Mono-devel-list mailing list
Mono-devel-list@lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list