On Tue, Aug 20, 2013 at 02:48:27AM +0200, Dicebot wrote: > On Tuesday, 20 August 2013 at 00:34:38 UTC, Ramon wrote: > >Well, I'm afraid that's what templates are. One (or the compiler) > >fills them in and that's it. > > > >In other words: Templates are compile time while (real) generics > >are run time. This basically comes down to have some way of > >designating classes as, for instance, comparable and then either > >running along the object chain comparing all built in objects > >(with built in compare functionality) or having a compare > >implemented (Of course, there is also arithmetic functions, etc.).
In that case, the "generics" you're talking about amounts basically to OO polymorphism. You have the same machine code that can process diverse types by using indirection to abstract away the implementation details. This is no doubt useful, as OO itself proves, but it does come with a cost: using indirection incurs a (small, but nevertheless non-zero) performance hit. Inside inner loops, this can be a performance killer. At the machine code level, it's actually not possible to use the same code for, e.g., comparing two ints vs. comparing two floats. You need different machine instructions for them, so there is no single piece of code that can be reused for both types. You have to somehow switch between them at runtime depending on what types are being passed in. (It gets even hairier if you're comparing, say, ints and floats, in which case additional instructions must be used for promoting one type to another so that they are comparable.) So, say you call your function "less". In order to actually run, you need one version for comparing ints, another for comparing floats, etc.. This is what templates do. Alternatively, you use indirection: int and float can be associated with some static data structure that describes each type, say it has a function pointer that, for ints, point to the function that compares int, and for floats, point to the function that compares floats. Then the caller doesn't actually directly call the low-level int/float-specific functions, but they always look up the function pointer. This is, in essence, how OO polymorphism works: each class has a vtable with function pointers to the specific implementation of each overloaded method. Then at runtime, you call whatever function the vtable points to, thus achieving runtime genericity. The problem with indirection is that it's expensive: given two objects to be compared, you need to dereference the pointer to those objects, then dereference the pointer to their respective vtables, then dereference the function pointer in the vtables in order to call the function. Templates, OTOH, are compile-time bound: the compiler ascertains at compile-time that your particular piece of code is comparing two ints, so it bypasses all of the expensive runtime dereferencing, and directly calls the function for comparing ints. The resulting code is inflexible, in the sense that you can't change, at runtime, the arguments to floats -- it would fail horribly -- but it avoids 3 pointer dereferences. When inside an inner loop, this can mean the difference between smooth animation and unusably jerky animation. The cost, of course, is that if you need the same piece of code for comparing both ints and floats, then the compiler has to generate two copies of the code, one to handle ints, and the other to handle floats. The saving grace, as Dicebot points out, is that if these copies of code are small enough, they will be inlined, so you can even save on the cost of a function call. This somewhat reduces the resulting code size -- you save on function call instructions and stack push/pops, etc., but you're still paying for the duplicated code. This is the current state of the art. Now, my dream is that one day, perhaps compilers will get smart enough that you don't even need to worry about the distinction between templates and runtime polymorphism anymore -- you specify what you want to get done, and the compiler determines, based on characteristics of the target machine and how the program as a whole will use that particular piece of code, whether to use indirection or template expansion. Or maybe even a mix of both, depending on the situation. [...] > Reminds me: how hard is writing own linker is again? :) Honestly, I think it's about time linker technology is rethought and developed further. Possible developements are automatic elision of unused code sections (already implemented in some linkers), automatic merging of identical sections (not sure if implemented yet -- may require language support), link-time inlining, reordering of symbols to increase code locality during execution (optimize for CPU caches), etc.. Or more ambitiously, better integration with compiler so that the linker has access to compile-time structures to help it make decisions about optimization. Present-day object file formats are too far along the process to their executable form to permit many optimizations that could in theory be performed by the linker, requiring instead hacks like weak symbols, unused section GC, etc.. On the more mundane side, we need better algorithms for improving linker performance. Current symbol resolution algorithms don't scale very well when your object files are large or have large numbers of symbols. Surely there are ways of improving the asymptotic complexity of these things! T -- It is impossible to make anything foolproof because fools are so ingenious. -- Sammy