RE: GSoC proposal: Provide optimizations feedback through post-compilation messages
Quite lengthy but very interesting mail! It took me a while to formulate a proper reply :) Feedback can be scarce, but don't let that stop you from submitting a proposal. Either way, can you keep me informed about any progress? I might wish to help though that would probably be later in the cycle (got a lot queued up for the coming months). Submitted :) The reviews are not too positive yet; my biggest efforts go into making my plan clear. If there is any progress, help will be much appreciated indeed. Great, that's exactly what I'm aiming at :) It's not just presenting the results of static analysis in real-time, as I actually dislike most kinds of it like finding memory leaks, to me that seems like an attempt to make the computer do what it's really bad at (understanding the code). I just want to give the programmer the fullest picture of the situation but at the same time make it so it doesn't become noise that interferes. More or less you can say the goal is to provide feedback that allows the user to extend his understanding of the program. That mostly means giving access to all the information that can be unambiguously concluded from the code by the computer. To what degree we carry it and how much the compiler is involved is only a question of practicality and performance. I quite agree for the most part, but there is a subtle nuance I want to argue: Do we really help the programmer by offering all the valuable information that it is possible to infer? Ten years from now, would he/she be a better programmer if we had not let him/her strive to simulate the program in mind, or code a portion in assembly and finally learn about machine architecture? My point is to avoid creating an interface that assists or helps the programmer, as he/she might become dependent on it. This is just helping in the short term, and the only person who ever learns something is the one who actually creates the compiler.
If a statement could sum up my view, it would be that the user improves through his/her use of the interface (here the feedback messages). How does it make a difference in practice? I want to minimize the information given :) The reason I want to introduce feedback messages is that this particular information (the inner workings of compilers) is very hard to find in practice. I want to give a slight help to put the user on the rails, nothing more. Perfect! However, how to do that so that it actually works seems a bit complex. The first (practically unsolvable) issue is what actually constitutes better code, as given two pieces one may be faster in some circumstances and the other in different ones. But as I understand that's not really what we're trying to tell the user, rather we want him to explore for himself what's possible and what are the results and why they are the way they are? I'm guessing this will unfortunately (or fortunately) require him to actually see and understand the intermediate code, see how it changes after different optimizations, and see the output assembly. Personally I really need/want that ;) Though my end target is a bit more to broaden the abstraction when programming (both up and down), so not to just show what's happening with the code but also allow the programmer to interact with it on that lower level. LLVM seems like the perfect fit for that but I've got some gripes with it, and that is still far away in the future. Excellent! Letting the user explore by himself sounds great, and seeing the output assembly/IR alongside is indeed a must. I like the idea that compilation is a cooperation between programmer and machine (as far as the programmer is inclined to help of course). It would also be nice to see compilation be split at value range propagation, as one could verify it is properly computed before proceeding into optimizations.
Unfortunately I only saw 36m of it as it broke and seeking doesn't work on vimeo for me, so I'll watch the rest later. To me it touches on some of the right issues/concepts but in slightly the wrong way, and it completely ignores some issues. Agreed. (Only the first half of the video is relevant for the programming prototype) Thibault
Re: GSoC proposal: Provide optimizations feedback through post-compilation messages
On Mon, 2 Apr 2012 19:57:20 + Thibault Raffaillac t...@kth.se wrote: Bump! Let me renew my interest in contributing through GSoC with post-compilation feedback (This was not an early April joke). Do you think it could lead to an acceptable GSoC proposal? (mentor interested?) Feedback can be scarce, but don't let that stop you from submitting a proposal. Either way, can you keep me informed about any progress? I might wish to help though that would probably be later in the cycle (got a lot queued up for the coming months). @Tomasz: On the interaction side I totally agree that communication between compiler and programmer is scarce (and there is room for improvement). Focusing too soon on the editor would overlook the wide range of users' needs though, as:
_ some users do not use an IDE (and will kindly refuse);
_ some users do not need more communication, as they already know what GCC can and cannot do;
_ some users do not want more communication, as they have other business to focus on;
Sure, I'm one of the people who don't use an IDE as it causes more issues than it solves for me. This isn't meant for everyone the same way anything else isn't, it just can't ;p Still looking at it, other languages, different IDEs, I'd say my way of tackling the issues is more usable and useful than most others, and could easily see wider adoption. Btw my experience is mostly in low-level kernel/driver programming, 2D/3D graphics, games. I think the editor being split from the compiler is a good thing. There still exist tools to expose static analysis data from the compiler (and choose the editor to visualize it with), but fundamentally they are assisting him/her rather than helping him/her improve. Instead of gathering loads of data on the optimizations/analysis performed, and filtering it for visualization by the user, we could relate the optimization technique used so that the user truly knows what GCC is capable of (instead of guessing by observation).
Great, that's exactly what I'm aiming at :) It's not just presenting the results of static analysis in real-time, as I actually dislike most kinds of it like finding memory leaks, to me that seems like an attempt to make the computer do what it's really bad at (understanding the code). I just want to give the programmer the fullest picture of the situation but at the same time make it so it doesn't become noise that interferes. More or less you can say the goal is to provide feedback that allows the user to extend his understanding of the program. That mostly means giving access to all the information that can be unambiguously concluded from the code by the computer. To what degree we carry it and how much the compiler is involved is only a question of practicality and performance. My proposal is thus not to be confused with a static analysis visualization: the programmer learns what techniques are implemented in GCC (or in compilers in general), how to write code that is more easily compiled, and can further browse the Internet for detailed theory on the techniques involved. Perfect! However, how to do that so that it actually works seems a bit complex. The first (practically unsolvable) issue is what actually constitutes better code, as given two pieces one may be faster in some circumstances and the other in different ones. But as I understand that's not really what we're trying to tell the user, rather we want him to explore for himself what's possible and what are the results and why they are the way they are? I'm guessing this will unfortunately (or fortunately) require him to actually see and understand the intermediate code, see how it changes after different optimizations, and see the output assembly. Personally I really need/want that ;) Though my end target is a bit more to broaden the abstraction when programming (both up and down), so not to just show what's happening with the code but also allow the programmer to interact with it on that lower level.
LLVM seems like the perfect fit for that but I've got some gripes with it, and that is still far away in the future. The point on the possible-optimizations-which-could-be-enabled-if-specific-constraint-is-lifted is particularly interesting, but is also extremely risky if the compiler makes a stupid remark on a constraint which can obviously (for the programmer) not be lifted. If ever, I would introduce it with a LOT of care. Yes and no. First of all I don't necessarily mean for the compiler/editor to suggest anything to the programmer, rather if the programmer asks just say what's physically possible, and not what's right, since if the compiler could do that it would just perform the optimization. Furthermore the situation with my source code is that I can probably make all this in such a form that it is actually usable and useful which seems to me close to impossible with normal languages. I can also with almost no effort store within the source code the dialogue between
Re: GSoC proposal: Provide optimizations feedback through post-compilation messages
Bump! Let me renew my interest in contributing through GSoC with post-compilation feedback (This was not an early April joke). Do you think it could lead to an acceptable GSoC proposal? (mentor interested?)

@Tomasz: On the interaction side I totally agree that communication between compiler and programmer is scarce (and there is room for improvement). Focusing too soon on the editor would overlook the wide range of users' needs though, as:
_ some users do not use an IDE (and will kindly refuse);
_ some users do not need more communication, as they already know what GCC can and cannot do;
_ some users do not want more communication, as they have other business to focus on;

I think the editor being split from the compiler is a good thing. There still exist tools to expose static analysis data from the compiler (and choose the editor to visualize it with), but fundamentally they are assisting him/her rather than helping him/her improve. Instead of gathering loads of data on the optimizations/analysis performed, and filtering it for visualization by the user, we could relate the optimization technique used so that the user truly knows what GCC is capable of (instead of guessing by observation). My proposal is thus not to be confused with a static analysis visualization: the programmer learns what techniques are implemented in GCC (or in compilers in general), how to write code that is more easily compiled, and can further browse the Internet for detailed theory on the techniques involved.

The point on the possible-optimizations-which-could-be-enabled-if-specific-constraint-is-lifted is particularly interesting, but is also extremely risky if the compiler makes a stupid remark on a constraint which can obviously (for the programmer) not be lifted. If ever, I would introduce it with a LOT of care.
Thibault

ps: As for an editor with real-time feedback on static analysis and more, I am 100% with you :) (and there are some promising prototypes, like in this talk: http://vimeo.com/36579366)

Hello all,

My name is Thibault Raffaillac, CS degree student at Kungliga Tekniska Högskolan, Stockholm, Sweden (in double-degree partnership with Ecole Centrale Marseille, France).

GCC currently provides no concise way to inform the user whether it applied an expected optimization (i.e., it understood the code). As a result, some will do premature optimizations when they do not trust the compiler, and some others will create overly convoluted code with blind belief in the compiler. This is especially relevant for users not initiated into the internals of GCC.

The project I would like to propose is feedback on the optimizations performed by GCC. To avoid binding users to the compiler, I would focus on some very standard optimizations across vendors, or for some specific yet nice features I would indicate their specificity to GCC/an architecture. The feedback would be triggered when compilation is successful, and display a couple of different messages each time it is run:

gcc --feedback test.c
test.c:xx:x: info: All operands being constant, constant folding was applied to assign '2560' to 'a'
test.c:xx:x: info: GCC could not fold constants here because...
test.c:xx:x: info: As integers are stored in binary format, strength reduction was applied to replace '* 8' by '<< 3'
test.c:xx:x: info: Basic block vectorization was applied to pack the 3 independent additions into a single SIMD instruction
test.c:xx:x: info: GCC implements unordered_map as open-addressed hash tables, with double hashing probing

Unlike the internal verbose messages, these would form a set, and the system would remember those already displayed and decrease their frequency of occurrence between compilations.
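For illustration only, here is a hypothetical test.c fragment of the kind that could plausibly trigger the constant folding and strength reduction messages in the example output. The function names and file contents are assumptions, not part of the proposal, and whether GCC actually applies each transformation depends on the target and the optimization level.

```c
#include <assert.h>

/* Hypothetical test.c: every name below is an illustrative assumption. */

/* All operands are constant, so a compiler may fold 10 * 256 into 2560. */
int folded_constant(void)
{
    int a = 10 * 256;
    return a;
}

/* Multiplication by a power of two: a compiler may replace '* 8' by '<< 3'. */
unsigned scale_by_eight(unsigned x)
{
    return x * 8;
}

/* The hand-written shift, present only so the two forms can be compared. */
unsigned scale_by_shift(unsigned x)
{
    return x << 3;
}

/* Checks on a small range that both forms compute the same value. */
void check_equivalence(void)
{
    unsigned x;

    for (x = 0; x < 1000; x++)
        assert(scale_by_eight(x) == scale_by_shift(x));
}
```

Compiling such a file with `gcc -O2 -S` and reading the generated assembly is the manual way to find out today what the proposed messages would state directly.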
All messages would explain what triggered them, cite the optimization name, and describe the consequence.

As for the work plan, it would consist of:
_ Enumerating all possible messages in the message set.
_ Implementing a function receiving feedback from each optimization unit and choosing whether to display it: info_printf(enum INFO_INDEX, const char*, ...);
_ Writing a formatting guide for adding messages to the set.

My academic background includes compiler construction, C programming and Human-Computer Interaction. I am very much interested in the usability of compilers (on which I am currently writing my degree thesis - http://www.csc.kth.se/~traf/traf-sketch.pdf) and thus would be glad to contribute to GCC. If this can be of interest, suggestions are welcome!

Best regards,
Thibault (http://www.csc.kth.se/~traf/)
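A minimal sketch of how the info_printf function from the work plan might behave, under stated assumptions: the enumerator names, the int return value, the output to stderr, and the "show on the 1st, 2nd, 4th, 8th... request" policy are all hypothetical choices made for illustration. The proposal only fixes the parameter list, and a real version would have to persist its counters between compilations, which this sketch does not do.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical message kinds; the proposal does not list the enumerators. */
enum INFO_INDEX {
    INFO_CONSTANT_FOLDING,
    INFO_STRENGTH_REDUCTION,
    INFO_COUNT /* number of message kinds, not a message itself */
};

/* How often each kind was requested. Kept in memory only in this sketch. */
static unsigned long request_count[INFO_COUNT];

/*
 * Prints the message only when the request number for this kind is a power
 * of two, so a given kind of message appears less and less often.
 * Returns 1 if the message was printed, 0 if it was suppressed.
 */
int info_printf(enum INFO_INDEX idx, const char *fmt, ...)
{
    va_list ap;
    unsigned long n;

    assert((unsigned)idx < INFO_COUNT);

    n = ++request_count[idx];
    if ((n & (n - 1)) != 0)
        return 0; /* not a power of two: stay quiet this time */

    fputs("info: ", stderr);
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    return 1;
}
```

An optimization pass would then call something like `info_printf(INFO_CONSTANT_FOLDING, "constant folding was applied to assign '%d' to '%s'", 2560, "a");` and leave the decision whether to display it to the function.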
Re: GSoC proposal: Provide optimizations feedback through post-compilation messages
On Tue, 27 Mar 2012 22:33:39 + Thibault Raffaillac t...@kth.se wrote:

Hello all,

My name is Thibault Raffaillac, CS degree student at Kungliga Tekniska Högskolan, Stockholm, Sweden (in double-degree partnership with Ecole Centrale Marseille, France).

GCC currently provides no concise way to inform the user whether it applied an expected optimization (i.e., it understood the code). As a result, some will do premature optimizations when they do not trust the compiler, and some others will create overly convoluted code with blind belief in the compiler. This is especially relevant for users not initiated into the internals of GCC.

The project I would like to propose is feedback on the optimizations performed by GCC. To avoid binding users to the compiler, I would focus on some very standard optimizations across vendors, or for some specific yet nice features I would indicate their specificity to GCC/an architecture. The feedback would be triggered when compilation is successful, and display a couple of different messages each time it is run:

gcc --feedback test.c
test.c:xx:x: info: All operands being constant, constant folding was applied to assign '2560' to 'a'
test.c:xx:x: info: GCC could not fold constants here because...
test.c:xx:x: info: As integers are stored in binary format, strength reduction was applied to replace '* 8' by '<< 3'
test.c:xx:x: info: Basic block vectorization was applied to pack the 3 independent additions into a single SIMD instruction
test.c:xx:x: info: GCC implements unordered_map as open-addressed hash tables, with double hashing probing

Unlike the internal verbose messages, these would form a set, and the system would remember those already displayed and decrease their frequency of occurrence between compilations. All messages would explain what triggered them, cite the optimization name, and describe the consequence.

As for the work plan, it would consist of:
_ Enumerating all possible messages in the message set.
_ Implementing a function receiving feedback from each optimization unit and choosing whether to display it: info_printf(enum INFO_INDEX, const char*, ...);
_ Writing a formatting guide for adding messages to the set.

My academic background includes compiler construction, C programming and Human-Computer Interaction. I am very much interested in the usability of compilers (on which I am currently writing my degree thesis - http://www.csc.kth.se/~traf/traf-sketch.pdf) and thus would be glad to contribute to GCC. If this can be of interest, suggestions are welcome!

Best regards,
Thibault (http://www.csc.kth.se/~traf/)

Hi Thibault,

I completely agree, and it's actually a part of what I'm targeting in the long term, so I think we might be able to join forces. I'm also thinking of a GSoC project though in different areas (there's an email in the list about them on 19.03), so maybe we could do separate parts that combine into something even more awesome ;)

I think a huge part of the issue is in the medium of communication between the programmer and compiler. I'm targeting an environment where the source code editor practically becomes the compiler's front-end. My project allows extremely dynamic presentation of the source code, so I can e.g.
- easily inform the programmer about anything in an unobtrusive manner within the code,
- give him different perspectives of the same code,
- allow him to give precise and detailed information to the compiler about possible code optimizations without making the code unreadable.

The first two points may seem already solved by Eclipse, Xcode or whatever other gigantic IDE, but I'm talking about a much larger scale of feedback presented instantly like: ex/implicit and inferred typing info, constant folds, dead code, unfolded loops, data flow, vector operations, tree view of expressions.
The first issue is that for any non-trivial amount of code you'll end up with thousands of messages, 90% of which are probably not very interesting (similarly to warnings in a certain style of objective programming in C). As long as the output is not interleaved with the code at the right place and the delay from writing to getting feedback is too long, the feature will lose much of its usefulness. Though don't misunderstand me, I think it's still better to have the info in any form than not. The last point is probably the most important, as there is often a large number of optimizations that cannot be done due to, for example, pointer aliasing rules, even though the programmer knows that the optimization is safe. I can easily add literally hundreds of markers like "this expression is volatile", "the result of this function call will not change within this loop", "these two pointers don't alias", and it wouldn't obfuscate the code as much as with normal languages. Furthermore my editor can easily list only the meaningful options for a given