On 2009-11-13, at 18:02, Robert Brown wrote:
> Common Lisp and Scheme were designed by people who wanted to write complicated
> systems on machines with a tiny fraction of the horsepower of current
> workstations. They were carefully designed to be compiled efficiently, which
> is not the case with Python. There really is a difference here. Python the
> language has features that make fast implementations extremely difficult.
Not true. Common Lisp was designed primarily by throwing together all of the features in every Lisp implementation the design committee was interested in. Although the committee members were familiar with high-performance compilation, the primary impetus was to achieve a standardized language that would be acceptable to the Lisp community. At the time that Common Lisp was started, there was still some sentiment that Lisp machines were the way to go for performance.
As for Scheme, it was designed primarily to satisfy an aesthetic of minimalism. Even though Guy Steele's thesis project, Rabbit, was a Scheme compiler, the point there was that relatively simple compilation techniques could produce moderately reasonable object programs. Chez Scheme was indeed first run on machines that we would nowadays consider tiny, but so too was C++. Oh, wait, so was Python!
I would agree that features such as exec and eval hurt the speed of Python programs, but those same features exact the same cost in CL and in Scheme. There is a mystique about method dispatch, but again, the Smalltalk literature dealt with this issue long ago, with techniques such as inline caches.
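To make that concrete, here is a toy sketch, in ordinary Python with invented names, of the monomorphic inline cache described in the Smalltalk implementation literature: each call site remembers the last receiver type and the method it resolved to, and repeats the full lookup only when the type at that site changes. A real VM would do this per call site, below the language level; this is only an illustration of the idea.

class InlineCache:
    """Toy monomorphic inline cache for one call site (illustrative only)."""
    def __init__(self, name):
        self.name = name             # method name invoked at this call site
        self.cached_type = None      # receiver type seen on the last call
        self.cached_method = None    # function resolved for that type

    def call(self, receiver, *args):
        t = type(receiver)
        if t is not self.cached_type:               # miss: full (slow) lookup
            self.cached_method = getattr(t, self.name)
            self.cached_type = t
        return self.cached_method(receiver, *args)  # hit: direct call, no lookup

class Circle:
    def __init__(self, r):
        self.r = r
    def area(self):
        return 3.14159 * self.r * self.r

site = InlineCache('area')     # hypothetical helper, standing in for a VM's call site
print(site.call(Circle(2.0)))  # first call resolves area(); later calls bypass the lookup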
Using Python 3 annotations, one can imagine a Python compiler that does the appropriate thing (shown in the comments) with the following code.
import my_module                   # static linking
__private_functions__ = ['my_fn']  # my_fn doesn't appear in the module dictionary.

def my_fn(x: python.int32):        # Keeps x in a register.
    def inner(z):                  # Lambda-lifts the function: no nonlocal vars,
        return z // 2              #   so it does not construct a closure.
    y = x + 17                     # Via flow analysis, concludes that y can be registerized;
    return inner(2 * y)            #   uses inline integer arithmetic instructions.

def blarf(a: python.int32):
    return my_fn(a // 2)           # Because my_fn isn't exported, it can be inlined.
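Note that in today's CPython such annotations are inert: they are recorded in the function's __annotations__ mapping and never enforced, which is precisely the latitude that would let an implementation exploit them. A quick check, using plain int since the python.int32 above is imaginary:

def half(z: int) -> int:
    return z // 2

print(half.__annotations__)   # {'z': <class 'int'>, 'return': <class 'int'>}
print(half(7.0))              # 3.0 -- the annotation is stored, not enforced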
A new pragma statement (which I am EXPLICITLY not proposing; I respect and support the moratorium) might be useful in telling the implementation that you don't mind integer overflow.
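To be concrete about what "don't mind integer overflow" would mean semantically, here is the wraparound behavior such a pragma would license, simulated in plain Python (the pragma itself remains imaginary):

def wrap32(n):
    """Two's-complement 32-bit wraparound: what an overflow-tolerant
    implementation could do instead of promoting to a big integer."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n

print(wrap32(2**31 - 1 + 1))  # -2147483648: wraps around
print(2**31 - 1 + 1)          #  2147483648: today's Python promotes silently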
Similarly, new library classes might be created to hold arrays of int32s or doubles.
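In fact, the standard library already gestures in this direction: the array module stores homogeneous C ints or doubles unboxed (and NumPy goes much further), though without any of the compiler support imagined above.

from array import array

ints = array('i', range(10))   # packed C ints, not boxed Python objects
dbls = array('d', [0.0] * 10)  # packed C doubles

ints[3] += 5                          # ordinary indexing; storage stays unboxed
print(ints.itemsize, dbls.itemsize)   # typically 4 and 8 bytes per element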
Obviously, no Python system does any of these things today. But there really is nothing stopping a Python system from doing any of them, and the technology is well understood in implementations of other languages.
I am not claiming that this is _better_ than JIT. I like JIT and other runtime techniques such as method caches better than these, because you don't have to know very much about the implementation in order to take advantage of them. But there may be some benefit in allowing programmers concerned with speed to relax some of Python's dynamism without ruining it for the people who need a truly dynamic language.
If I want to think about scalability seriously, I'm more concerned about problems that Python shares with almost every modern language: if you have lots of processors accessing a large shared memory, there is a real GC efficiency problem as the number of processors goes up. On the other hand, if you have a lot of processors with some degree of private memory sharing a common bus (think the Cell processor), how do we build an efficient implementation of ANY language for that kind of environment?
Somehow, the issues of Python seem very orthogonal to performance scalability.
-- v