On Sun, Mar 6, 2011 at 9:03 PM, Marvin Humphrey <[email protected]> wrote:
> On Fri, Mar 04, 2011 at 11:27:59PM -0800, Nathan Kurz wrote:
>> On Wed, Mar 2, 2011 at 10:30 PM, Marvin Humphrey <[email protected]> wrote:
>> > Arguably, we don't even need the "final" keyword. We should benchmark to
>> > confirm my recollection about the performance implications, but I'll bet we
>> > could remove it with no immediate impact on Lucy.
>>
>> I'd suggest this as the cleanest solution. Intuitively, I'd think
>> that the benefit of 'final' would be very small, such that if one
>> really cares about performance one should inline the function call
>> completely and not worry about saving a single dereference.
>
> Sounds good -- I'll work up a patch. We'll leave the "inline" keyword in
> Clownfish, but drop the "final" keyword.

Great. I think you could get away with dropping 'inline' as well. My
point was not that Clownfish needs to inline things, but that if you
really want to squeeze out the last drop of performance by avoiding
function calls, you'll have to take control of the compilation
yourself, likely by rewriting the entire core class in some unreadable
and unmodifiable fashion.

It would be interesting, though, to someday benchmark the potential
advantage here. Once you're generating code, it wouldn't be hard to
test-generate a monolithic 'final' library as well, with inter-class
inlining. But if one were to take this route for anything production,
I think it would make more sense as an overall compile-time option
rather than a method-by-method keyword: -O lock-it-all-down.

> Looking forward, we'll need to think about how to design our classes and
> interfaces so that time-critical functionality can be inlined whenever
> possible.

While there is some small gain to be had here, I don't think it should
be a priority. I can be as cycle-count-conscious as anyone, but once
you're memory bound there isn't much advantage in optimizing cycles
any further. I think we can get far by keeping the base architecture
fast (as it currently is) and concentrating on data layout.

To the extent that one does worry about cycles, it's not the function
calls that need to be avoided, but rather the mispredicted branches.
So long as you take the same convoluted path every time, modern
processors are monstrously efficient.

Let's make the Northbridge scream for mercy.

Nathan Kurz
[email protected]
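
For anyone following the thread, the "single dereference" at issue can be sketched in plain C roughly as below. This is only an illustration under assumptions: the names (Counter, CounterVTable, Counter_next, Counter_next_final, demo) are hypothetical and are not taken from Clownfish's generated code, and the layout is just the common "object holds a pointer to a method table" pattern.

```c
#include <assert.h>

/* Hypothetical names throughout; NOT Clownfish's actual generated code. */

typedef struct Counter Counter;

/* Per-class method table: one function pointer per overridable method. */
typedef struct CounterVTable {
    int (*next)(Counter *self);
} CounterVTable;

struct Counter {
    const CounterVTable *vtable;  /* every object points at its class's table */
    int value;
};

/* The concrete implementation of the method. */
int Counter_next_impl(Counter *self) {
    return ++self->value;
}

const CounterVTable COUNTER_VTABLE = { Counter_next_impl };

/* Overridable method: load the function pointer from the vtable, then call
 * through it. The pointer load is the extra dereference. */
int Counter_next(Counter *self) {
    return self->vtable->next(self);
}

/* "Final" method: the target is fixed at compile time, so the vtable load is
 * skipped and the compiler is free to inline the call. */
int Counter_next_final(Counter *self) {
    return Counter_next_impl(self);
}

/* Exercise both paths on one object; returns the final counter value. */
int demo(void) {
    Counter c = { &COUNTER_VTABLE, 0 };
    int a = Counter_next(&c);        /* dispatched through the vtable */
    int b = Counter_next_final(&c);  /* direct call */
    assert(a == 1 && b == 2);
    return c.value;
}
```

Both calls do the same work; the only difference is the one pointer load before the indirect call, which is why the gain from 'final' is expected to be small unless the call is inlined outright.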
