"Zach the Mystic" <reachbutminusthisz...@googlymail.com> wrote in message news:oxcqgprnwnsuzngfi...@forum.dlang.org... > > I would like to play devil's advocate myself, at least on 0 -> Loc(0). > > I found that in the source, the vast, vast majority of Loc instances were > named, of course, 'loc'. Of the few other ones, only 'endloc' was ever > assigned to 0. The token matcher could substitute: > > 'loc = 0' -> 'loc = Loc(0)' > 'endloc = 0' -> 'endloc = Loc(0)' >
This is fairly rare.

> As long as it had a list of D's AST classes, a pretty conservative
> attempt to knock out a huge number of additional cases is:
> 'new DmdClassName(0' -> 'new DmdClassName(Loc(0)'
>

Yes, this mostly works, and is exactly what I did in a previous attempt.

> The core principle with the naive approach is to take advantage of
> specific per-project conventions such as always giving the Loc first. The
> more uniformity with which the project has been implemented, the more
> likely this approach will work.
>
> A lot of those other implicit conversions I do agree seem daunting. The
> naive approach would require two features. The first is a basic way of
> tracking a variable's type. For example, it could have a list of known
> 'killer' types which cause problems. When it sees one it records the next
> identifier it finds and associates it with that type for the rest of the
> function. It may then be slightly better able to recognize known patterns
> where conversion is desirable. The second feature would be a brute force
> way of saying, "You meet pattern ZZZ: if in function XXX::YYY, replace it
> with WWW, else replace with UUU." This is clearly the point of diminishing
> returns for the naive approach, at which point I could only hope that a
> good abstraction could make up a lot of ground when found necessary.
>

My experience was that you don't need to explicitly track which function you
are in; keeping track of the file and matching a longer pattern is enough.
Here is one of the files of patterns I made:
http://dpaste.dzfl.pl/3c9be703
Obviously this could be shorter with a DSL, and towards the end I started
using a less verbose SM + DumpOut approach.

> The point of diminishing returns for the whole naive approach is reached
> when for every abstraction you add, you end up breaking as much code as
> you fix. Then you're stuck with the grunt work of adding special case
> after special case, and you might as well try something else at that
> point...
>

Yeah...

> My current situation is that my coding skills will lag behind my ability
> to have ideas, so I don't have anything regarding my approach up and
> running for comparison, but I want the conversation to be productive, so
> I'll give you the ideas I've had since yesterday.
>
> I would start by creating a program which converts the source by class,
> one class at a time, and one file for each. It has a list of classes to
> convert, and a list of data, methods, and overrides for each class - it
> will only include what's on the list, so you can add classes and functions
> one step at a time. For each method or override, a file to find it in, and
> maybe a hint as to about where the function begins in said file.
>

That is waaaay too much information to gather manually. There are a LOT of
classes and functions in dmd.

> You may have already thought of these, but just to say them out loud, some
> more token replacements I was thinking of:
>
> 'SameName::SameName(...ABC...) : DifferentName(...XYZ...) {'
> ->
> 'this(...ABC...)
> {
>     super(...XYZ...);'
>
> Standard reference semantics:
> 'DTreeClass *' -> 'DTreeClass'
>
> Combined, they look like this:
> 'OrOrExp::OrOrExp(Loc loc, Expression *e1, Expression *e2)
>     : BinExp(loc, TOKoror, sizeof(OrOrExp), e1, e2)
> {'
> ->
> 'this(Loc loc, Expression e1, Expression e2)
> {
>     super(loc, TOKoror, sizeof(OrOrExp), e1, e2);'
>

Like I said, I went down this path before, and made some progress. It
resulted in a huge list of cases.

My second attempt was to 'parse' C++, recognising preprocessor constructs as
regular ones. The frequent use of #ifdef blocks that cut through expressions
makes this very, very difficult.

So my current approach is to filter out the preprocessor conditionals first,
before parsing. #defines and #pragmas survive to parsing.

In short, doing this at the token level works, but because you're
transforming syntax, not text, it's better to work on a syntax tree.
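
For anyone who wants to see how far the naive text-level rewrites quoted above get, here is a rough Python sketch. It is not the tool from either attempt described in this thread: the class list AST_CLASSES and the function name naive_rewrite are made up for illustration, the regexes cover only the four patterns quoted above, and the 4-space indent before super(...) is just a formatting choice.

```python
import re

# Hypothetical, hand-maintained list of AST class names. This is an
# assumption for the sketch; the real dmd source has far more classes.
AST_CLASSES = ["Expression", "BinExp", "OrOrExp", "IntegerExp"]
CLASS_ALT = "|".join(AST_CLASSES)

# 'SameName::SameName(...ABC...) : DifferentName(...XYZ...) {'
# The base-class arguments may contain nested parentheses such as
# sizeof(OrOrExp), so they are matched lazily up to the ')' before '{'.
CTOR = re.compile(
    r"\b(\w+)::\1\s*\(([^)]*)\)\s*:\s*\w+\s*\(([^{};]*?)\)\s*\{"
)


def naive_rewrite(src: str) -> str:
    """Apply the text-level substitutions quoted in the thread."""
    # 'loc = 0' -> 'loc = Loc(0)' and 'endloc = 0' -> 'endloc = Loc(0)'
    src = re.sub(r"\b(loc|endloc)\s*=\s*0\b", r"\1 = Loc(0)", src)
    # 'new DmdClassName(0' -> 'new DmdClassName(Loc(0)'
    src = re.sub(rf"\bnew\s+({CLASS_ALT})\s*\(\s*0\b", r"new \1(Loc(0)", src)
    # Constructor with base initializer -> 'this(...) { super(...);'
    src = CTOR.sub(r"this(\2)\n{\n    super(\3);", src)
    # 'DTreeClass *' -> 'DTreeClass' (reference semantics in D)
    src = re.sub(rf"\b({CLASS_ALT})\s*\*\s*", r"\1 ", src)
    return src


if __name__ == "__main__":
    cpp = (
        "OrOrExp::OrOrExp(Loc loc, Expression *e1, Expression *e2)\n"
        "        : BinExp(loc, TOKoror, sizeof(OrOrExp), e1, e2)\n"
        "{\n"
    )
    print(naive_rewrite(cpp))
```

Each rule is a bet on a project convention (Loc always comes first, AST classes are always handled through pointers), and anything that breaks the convention needs its own special case. The pointer rule, for instance, would also mangle a cast like '(Expression *)e'. That is the growth in cases that makes a syntax tree the better place to do the transformation.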