(Is there a reason we're not having this conversation on m5-dev? I moved it back there for posterity, and in case anyone else wants to chime in. Note that we're discussing better processing of the C++ code snippets in the ISA description language.)
On Tue, Aug 25, 2009 at 1:08 AM, Gabe Black <gbl...@eecs.umich.edu> wrote:
> nathan binkert wrote:
>>> One other possible answer is to try and build a reasonable approximate
>>> C++ parser that's good enough for the snippets that we use. That
>>> would be non-trivial, but since (1) we don't have to generate code, so
>>> we only have to be more accurate than the current regex system and (2)
>>> it's not the end of the world if there are tricky things the parser
>>> can't handle and we're forced to rewrite a few code snippets, I don't
>>> see it as an impossible task. I think there are multiple
>>> public-domain C++ grammars we could start with too.
>>>
>>
>> I have a couple of comments here. First, C++ can't be parsed with
>> ply. I'm not sure which parts of the language are the problem, but
>> the language is ambiguous and not context free. That said, what
>> exactly are the problems with what we have? I can try to see if I can
>> improve things (or teach gabe enough about ply so he can do it.)
>>
>> Nate
>>
>
> I actually try to avoid the problem areas so I can't list them
> exhaustively, but basically the way things work follows this basic rule
> (I think). If the name of the operand appears in the text of the code
> with or without an optional type modifier, it's an operand. If it's in
> front of an equal sign, it's a destination, if not, it's a source. Even
> though that's pretty simple it works remarkably well. Unfortunately it's
> confused by things like pass by reference function arguments, using it
> as a temporary without actually meaning to access it's original value
> (ie. reading it to compute flag bits), setting it conditionally, and
> maybe a few other things. It would be really hard to get those things
> right without understanding the syntax of C++, and even then, without
> knowing how functions are defined, etc., perfectly parsing the C++ won't
> give you all the information you might need. That's what makes making
> g++ figure it out attractive since it necessarily figures out all those
> things at some point. The hard/impossible part is tricking it into using
> that information to set up the operand index arrays in the static inst,
> set up the reading and writing code, etc. I think templates kind of,
> sort of might do the trick, but I just don't think you can get it to
> automatically fill in the members of a class at construction time based
> on the code in its member functions.

Yes, doing a full parse is impossible for a number of reasons, not just the fact that C++ is context sensitive, but that in the case of the code snippets you don't even have all the context (and I think trying to generate the full context as Gabe is suggesting is probably impractical, as that would require sucking in lots of header files for each snippet and only lengthen compile times even further).

That's why I said "approximate". My (half-baked) thought was to build a parser that at least understood the basics of C++ expression syntax and could parse the snippets by making some charitable assumptions about what was a type and what was not (or perhaps we could require the use of typename declarations... I'd hope not too much, but it could be a fallback for resolving ambiguities). Note that we already effectively restrict these snippets to a subset of C++ to avoid confusing the regexes, so I'm sure whatever we do would enable a larger subset than what's currently supported.

I think this would solve most of Gabe's issues, since it could tell when the only read of an operand occurs after a write, not get confused by operand mentions in comments, robustly distinguish RHS from LHS of assignments, etc.
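To make those failure modes concrete, here is a minimal Python sketch of the scanning rule as Gabe states it in the quoted text. It is not the actual ISA parser code: the function name `classify_operands`, the operand names, and the `.uq`-style type suffix are all assumptions chosen for illustration.

```python
import re


def classify_operands(snippet, operand_names):
    """Naive scan (hypothetical, for illustration only).

    An operand mention directly followed by a single '=' is treated as a
    destination; every other mention is treated as a source.
    """
    sources, dests = set(), set()
    for name in operand_names:
        # Operand name, an optional type suffix such as ".uq" (assumed
        # syntax), then optionally a lone '=' that is not part of '=='.
        pattern = re.compile(
            r'\b' + re.escape(name) + r'(?:\.\w+)?\b\s*(=(?!=))?'
        )
        for match in pattern.finditer(snippet):
            if match.group(1):
                dests.add(name)
            else:
                sources.add(name)
    return sources, dests


# Well-behaved snippet: the rule gets it right.
src, dst = classify_operands("Rc = Ra + Rb;", ["Ra", "Rb", "Rc"])
print(sorted(src), sorted(dst))  # ['Ra', 'Rb'] ['Rc']

# Operand mentioned only in a comment: Rb is wrongly reported as a source.
src, dst = classify_operands("Rc = Ra; // Rb is unused here", ["Ra", "Rb", "Rc"])
print(sorted(src), sorted(dst))  # ['Ra', 'Rb'] ['Rc']

# Read that occurs only after a write: Rc is reported as a source even
# though its original value is never used.
src, dst = classify_operands("Rc = Ra; flags = (Rc == 0);", ["Ra", "Rc"])
print(sorted(src), sorted(dst))  # ['Ra', 'Rc'] ['Rc']
```

The last two calls return plausible-looking operand lists with no warning at all, and a flat text scan has no way to tell them apart from the first.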
More importantly, it would solve the biggest problem with the status quo, which is that right now there's no indication that the regex scan is getting confused because you've strayed out of the supported subset and encountered any of Gabe's issues; you have to look at the instruction object definition and notice that the operand list is not what you expected. A key potential capability of a real parser would be for it to robustly determine when it can't figure out what's going on, so at least we could avoid these silent errors.

Note that some of Gabe's issues aren't related to the parser and are more fundamental. In particular:

- It's not clear what to do about conditional updates. They can't really be handled properly in hardware in the face of register renaming, so my inclination is that if the parser could recognize situations where an update only occurs on one branch of an if statement then it should flag the snippet as an error. I'm not sure what Gabe has in mind. There's no support in any of our models for indicating a conditional output anyway.

- Pass by reference operands should also just be flagged as errors, since there's no way to know if the operand is read, written, or both.

Steve
_______________________________________________
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev