John Nagle wrote:

> Modifying "at a distance" is exactly what I'm getting at. That's the
> killer from an optimizing compiler standpoint. The compiler, or a
> maintenance programmer, looks at a block of code, and there doesn't seem
> to be anything unusual going on. But, if in some other section of
> code, something does a "setattr" to mess with the first block of code,
> something unusual can be happening. This is tough on both optimizing
> compilers and maintenance programmers.
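A minimal sketch of the effect being described (the module and class names here are invented for the example):

```python
# Code that looks entirely ordinary at its definition site.
class Account:
    def __init__(self, balance):
        self.balance = balance

    def deposit(self, amount):
        self.balance += amount


# Far-away code can silently rewire the class at runtime.
def logged_deposit(self, amount):
    print("deposit of", amount)
    self.balance += amount

setattr(Account, "deposit", logged_deposit)

# Every existing and future Account instance now runs the replacement
# method, even though nothing at the definition site hints at this.
acct = Account(100)
acct.deposit(50)              # prints "deposit of 50"
assert acct.balance == 150
```

Neither the compiler nor a maintenance programmer reading only the class definition can tell that `deposit` has been replaced.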
Agreed. It's tempting to look at some code and say which types one thinks are being manipulated, but the actual program behaviour can be quite different. Adding explicit type declarations "anchors" the names to a very restrictive set of types - typically those permitted through interfaces or inheritance in many object-oriented languages - and the challenge, as we've already discussed, is to "anchor" the names when they are in a sense "floating free", without any explicit annotations from the programmer.

Python also presents other challenges. Unlike systems programming languages, almost nothing is a local operation: each Python function is mostly a product of function calls and the occasional conditional or looping construct, themselves wired up using yet more function calls. Whilst the expectation is that most of these will return sensible results (e.g. a call to iter() on a sequence returns an iterator), there isn't any guarantee that this will be the case, and even then the precise iterator involved can vary. Some might claim that the class of operations which could be done locally are precisely the ones which would benefit from identification and optimisation, namely those involving primitive types, yet some pretty good solutions for such cases already exist: Pyrex, the various numeric packages, and so on.

> Python has that capability mostly because it's free in an
> "everything is a dictionary" implementation. ("When all you have
> is a hash, everything looks like a dictionary".) But that limits
> implementation performance. Most of the time, nobody is using
> "setattr" to mess with the internals of a function, class, or
> module from far, far away. But the cost for that flexibility is
> being paid, unnecessarily.
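The point about iter() is easy to demonstrate in CPython: even when a call behaves sensibly, the concrete type it returns varies with the argument, so a compiler cannot pin it down locally - and nothing forces a class to behave sensibly at all:

```python
# iter() on different built-in sequences yields different concrete
# iterator types, though all satisfy the iterator protocol.
print(type(iter([1, 2, 3])))    # <class 'list_iterator'>
print(type(iter((1, 2, 3))))    # <class 'tuple_iterator'>
print(type(iter("abc")))        # <class 'str_iterator'>

# A user-defined class can return anything from __iter__; here the
# "sensible result" expectation is simply violated.
class Odd:
    def __iter__(self):
        return 42               # not an iterator: iter(Odd()) raises TypeError
```

So the result type of even a trivial-looking call is only known once the argument's type is known, which is exactly the anchoring problem described above.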
I've heard claims that virtual machines are evolving to make dictionary-based access more performant, and I guess Microsoft's DLR and any JVM improvements might point in that direction, but Shed Skin shows the way by avoiding such things entirely.

> I'm suggesting that the potential for "action at a distance" somehow
> has to be made more visible.
>
> One option might be a class "simpleobject", from which other classes
> can inherit. ("object" would become a subclass of "simpleobject").
> "simpleobject" classes would have the following restrictions:
>
> - New fields and functions cannot be introduced from outside
> the class. Every field and function name must explicitly appear
> at least once in the class definition. Subclassing is still
> allowed.

I suppose one could mandate that only methods defined inside class blocks belong to the class and its subclasses, and that various kinds of optimisations could then be applied to the code in those methods such that the self parameter's type is constrained in a way probably already imposed by CPython. This would forbid the adding of functions to classes and instances after the fact, but in my fairly conservative world of programming, I hardly ever do such things. I do add attributes to instances from outside those instances (by the above definition) quite often, though.

> - Unless the class itself uses "getattr" or "setattr" on itself,
> no external code can do so. This lets the compiler eliminate the
> object's dictionary unless the class itself needs it.

I think this is a natural progression from determining the access needs of particular classes and their objects.

> This lets the compiler see all the field names and assign them fixed slots
> in a fixed sized object representation. Basically, this means simple objects
> have a C/C++ like internal representation, with the performance that comes
> with that representation.

Yes, I agree that this would be desirable.
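For what it's worth, CPython's existing __slots__ mechanism already gives a taste of the proposed restrictions: declaring the field names up front eliminates the per-instance dictionary and stores attributes in fixed slots. A small sketch (it is weaker than "simpleobject", since the class object itself can still be modified with setattr):

```python
class Point:
    # Declaring __slots__ removes the per-instance __dict__; the
    # attribute names are fixed at class definition time and each
    # one is assigned a fixed slot in the instance layout.
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
p.x = 10                      # declared attributes work as usual

try:
    p.z = 3                   # undeclared attribute: no dictionary to put it in
except AttributeError as exc:
    print("rejected:", exc)

assert not hasattr(p, "__dict__")
```

This gives exactly the fixed-slot representation described in the quoted text, though only for instance attributes, not for the class's methods.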
> With this, plus the "Shed Skin" restrictions, plus the array features of
> "numarray", it should be possible to get computationally intensive code
> written in Python up to C/C++ levels of performance. Yet all the dynamic
> machinery of Python remains available if needed.
>
> All that's necessary is not to surprise the compiler.

One thing I had cause to think about again today was the cost of function invocations. Python not only has pervasive dynamic dispatch, but there are also such things as *args and **kwargs and the cost of dealing with them. However, such things are very useful in integrating components and extending class hierarchies, so I wouldn't want to see them vanish in their entirety.

Paul

-- 
http://mail.python.org/mailman/listinfo/python-list