Hello,

On Sat, 20 Mar 2021 10:54:10 -0500
Skip Montanaro <skip.montan...@gmail.com> wrote:

> Back in the late 90s (!) I worked on a reimagining of the Python
> virtual machine as a register-based VM based on 1.5.2. I got part of
> the way with that, but never completed it. In the early 2010s, Victor
> Stinner got much further using 3.4 as a base. The idea (and dormant
> code) has been laying around in my mind (and computers) these past
> couple decades, so I took another swing at it starting in late 2019
> after retirement, mostly as a way to keep my head in the game. While I
> got a fair bit of the way, it stalled. I've picked it up and put it
> down a number of times in the past year, often needing to resolve
> conflicts because of churn in the current Python virtual machine.

I guess it would be a good idea to state the scope of this project: is
it a research project or a "production" one? If it's a research one,
why be concerned with the churn of the very latest CPython versions?
Wouldn't it be better to use a scalable, incremental implementation
that could be forward-ported to a newer version, if it ever comes to
that?

Otherwise, if it's "production", who's the "customer" and how do they
"compensate" you for doing work (chasing the moving target) which is
clearly of little interest to you and conflicts with the goal of the
project?

[]

> I started on what could only very generously be called a PEP which you
> can read here. It includes some of the history of this work as well as
> details about what I've managed to do so far:
> 
> https://github.com/smontanaro/cpython/blob/register2/pep-9999.rst
> 
> If you think any of this is remotely interesting (whether or not you
> think you'd like to help), please have a look at the "PEP".

Some comments on it:

1. I find it to be rather weak on the motivational part. It starts with
a phrase like:

> This PEP proposes the addition of register-based instructions to the
> existing Python virtual machine, with the intent that they eventually
> replace the existing stack-based opcodes.

Sorry, what? The purpose of register-based instructions is just to
replace stack-based instructions? That's not what I'd like to hear as
the intro phrase. You probably want to replace one with the other
because register-based ones offer some benefit, faster execution
perhaps? That's what I'd like to hear, instead of "deciphering" it
between the lines.

> They [2 instruction sets] are almost completely distinct.

That doesn't correspond to the mental image I would have. In my view,
the 2 sets would be exactly the same, except that stack-based
instructions encode argument locations implicitly, while register-based
ones encode them explicitly. It would be interesting to read (in the
following "pep" sections) what makes them "almost completely distinct".

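To make the implicit/explicit distinction concrete, here is a minimal
sketch of both encodings for "a = b + c". The mini-VM, the tuple
encoding, and the slot numbering are assumptions made up for
illustration; they are not CPython's bytecode format or the "pep"'s.

```python
# Hypothetical mini-VM: opcode names mirror the ones discussed in this
# thread, but the tuple encoding is invented for illustration only.

def run_stack(code, slots):
    """Stack form: operand locations are implicit (top of the stack)."""
    stack = []
    for op, arg in code:
        if op == "LOAD_FAST":
            stack.append(slots[arg])
        elif op == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "STORE_FAST":
            slots[arg] = stack.pop()
        else:
            raise ValueError(f"unknown opcode {op}")
    return slots


def run_register(code, slots):
    """Register form: every instruction names its slots explicitly."""
    for op, dst, src1, src2 in code:
        if op == "BINARY_ADD_R":
            slots[dst] = slots[src1] + slots[src2]
        else:
            raise ValueError(f"unknown opcode {op}")
    return slots


# a = b + c, with slots a=0, b=1, c=2
stack_code = [
    ("LOAD_FAST", 1),
    ("LOAD_FAST", 2),
    ("BINARY_ADD", None),
    ("STORE_FAST", 0),
]
register_code = [("BINARY_ADD_R", 0, 1, 2)]

stack_result = run_stack(stack_code, [None, 10, 32])
register_result = run_register(register_code, [None, 10, 32])
```

In this toy the operation itself is identical in both forms; only the
way the operands are located differs.
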
> Within a single function only one set of opcodes or the other will
> be used at any one time.

That would be the opposite of the "scalable, incremental" development
approach mentioned above. Why not allow the 2 sets to freely co-exist,
and migrate code generation/implement code translation gradually?

> ## Motivation

I'm not sure the content of the section corresponds much to its title.
It jumps from a background survey of the different Python VM
optimizations to (some) implementation details of the register VM,
leaving the "motivation" somewhere "between the lines".

> Despite all that effort, opcodes which do nothing more than move data
> onto or off of the stack (LOAD_FAST, LOAD_GLOBAL, etc) still account
> for nearly half of all opcodes executed.

... And - you intend to change that with a register VM? In which way and
how? As an example, LOAD_GLOBAL isn't going anywhere - it loads a
variable by *symbolic* name into a register.

> Running Pyperformance using a development version of Python 3.9
> showed that the five most frequently executed pure stack opcodes
> (LOAD_FAST, STORE_FAST, POP_TOP, DUP_TOP and ROT_TWO) accounted for
> 35% of all executed instructions.

And you intend to change that with a register VM? How?

A quick Google search leads to
https://www.strchr.com/x86_machine_code_statistics (yeah, that's not a
VM, it's an RM (real machine); stats over different VMs would
definitely be welcome):

> The most popular instruction is MOV (35% of all instructions). 

So, is the plan to replace 35% of "five most frequently executed pure
stack opcodes" with 35% of register-register move instructions? If not,
why would it be different and how would you achieve that?
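
For anyone who wants to look at opcode shares themselves, a small
sketch using the standard `dis` module follows. Note the assumptions:
this counts instructions statically in one compiled function, whereas
the figures quoted above are for executed instructions across
Pyperformance, and opcode names differ between CPython versions. The
`sample` function is just a placeholder.

```python
import dis
from collections import Counter


def opcode_histogram(func):
    """Count opcode names in func's bytecode (static, not executed)."""
    return Counter(instr.opname for instr in dis.get_instructions(func))


def sample(b, c):
    # Placeholder function to inspect; substitute any function of interest.
    a = b + c
    return a


histogram = opcode_histogram(sample)
total = sum(histogram.values())
for name, count in histogram.most_common():
    print(f"{name:<24} {count:>3} {count / total:6.1%}")
```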

> They are low-cost instructions (compared with CALL_FUNCTION for
> example), but still eat up time and space in the virtual machine

But that's the problem of any VM - it's slow by definition. There can
be less slow and more slow VMs, but VMs can't be fast. So, what's the
top-level motivation - is it "making CPython fast" or "making CPython a
little bit less slow"? By how much?

> Consider the layout of the data section of a Frame object:
> All those LOAD_FAST and STORE_FAST instructions just copy pointers
> between chunks of RAM which are just a few bytes away from each other
> in memory.

Ok, but LOAD_DEREF and STORE_DEREF instructions also just copy pointers
(with extra dereferencing, but that's a detail). It's unclear why you
ignore them ("cell" registers) and put "locals" and "stack" registers
ahead of them. An actual implementation of register instructions would
just treat any frame slot as a register with contiguous numbering,
allowing access to all of the locals, cells, and stack locations in the
same way. In that regard, trying to rearrange the 3 groups at this stage
seems like unneeded implementation complexity with no clear motivation.
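
A minimal sketch of that uniform numbering, under stated assumptions:
the `Frame` class, its field names, and `move_r` are hypothetical, and
the extra dereferencing that real cell variables need is deliberately
left out.

```python
# Hypothetical frame layout: locals, cells, and stack temporaries share
# one slot array, so a register number is just an index into it.

class Frame:
    def __init__(self, n_locals, n_cells, n_stack):
        self.cell_base = n_locals              # first cell slot
        self.stack_base = n_locals + n_cells   # first stack/temp slot
        self.slots = [None] * (self.stack_base + n_stack)


def move_r(frame, dst, src):
    """Copy between any two slots, regardless of which group they are in."""
    frame.slots[dst] = frame.slots[src]


frame = Frame(n_locals=2, n_cells=1, n_stack=2)
frame.slots[0] = "local value"
frame.slots[frame.cell_base] = "cell value"

move_r(frame, frame.stack_base, 0)                    # local -> temp
move_r(frame, frame.stack_base + 1, frame.cell_base)  # cell  -> temp
```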

> Instead, registers should be cleared upon last reference.

It's worth discussing how to handle that. Apparently, only an approach
with an explicit DECREF instruction would scale, but that shows that
a register-based VM not only decreases the # of generated instructions,
but also increases it in other areas. The overall tally may not be
what's expected.
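
To see how explicit clearing adds instructions back, here is a sketch
that finds the last instruction touching each temporary register and
appends a clear after it. `CLEAR_R` is a made-up opcode name standing
in for an explicit DECREF, and the pass assumes straight-line code with
no branches or loops.

```python
# Hypothetical pass over register code of the form (op, dst, src1, src2).

def insert_clears(code, temporaries):
    """Append ("CLEAR_R", reg) after the last use of each temporary."""
    last_use = {}
    for index, (op, *regs) in enumerate(code):
        for reg in regs:
            if reg in temporaries:
                last_use[reg] = index   # later uses overwrite earlier ones

    out = []
    for index, instr in enumerate(code):
        out.append(instr)
        for reg in sorted(r for r, i in last_use.items() if i == index):
            out.append(("CLEAR_R", reg))
    return out


# d = (a + b) * c with slots a=0, b=1, c=2, d=3 and temporary r4
register_code = [
    ("BINARY_ADD_R", 4, 0, 1),
    ("BINARY_MULTIPLY_R", 3, 4, 2),
]
with_clears = insert_clears(register_code, temporaries={4})
```

In this toy case two instructions become three, so part of the saving
over the stack form is given back.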

> Implemented ... some CALL_FUNCTION instructions

One of the significant omissions in the "pep" is the lack of discussion
of a "register based" calling convention.

> most container-related BUILD instructions

Other "arbitrary number of arguments" instructions beyond
CALL_FUNCTION* are the next hard case worth discussing.
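
One possible encoding for such variable-arity instructions, sketched
below, is to name a contiguous window of registers by its first
register and a count (Lua's register VM, for instance, passes call
arguments in consecutive registers). The opcode names `BUILD_LIST_R`
and `CALL_FUNCTION_R` and the operand layout are assumptions for
illustration, not what the "pep" specifies.

```python
# Hypothetical encoding: (op, dst, first, count) where the operands live
# in registers first .. first + count - 1.

def run(code, regs):
    for op, dst, first, count in code:
        window = regs[first:first + count]
        if op == "BUILD_LIST_R":
            regs[dst] = list(window)
        elif op == "CALL_FUNCTION_R":
            # The first register of the window holds the callable, the
            # remaining ones hold its positional arguments.
            func, *args = window
            regs[dst] = func(*args)
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs


regs = [None, None, max, 3, 7, 5]
code = [
    ("BUILD_LIST_R", 0, 3, 3),     # r0 = [r3, r4, r5]
    ("CALL_FUNCTION_R", 1, 2, 4),  # r1 = r2(r3, r4, r5)
]
result = run(code, regs)
```

The cost of this scheme is that the code generator has to arrange for
the operands to sit in adjacent registers, which can itself require
extra moves.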

> OTOH, maybe RVM opcode names should look more like traditional
> assembler instructions. (The author is getting on in years and finds
> something which looks more like assembler attractive, given his
> initial experience programming computers in the dark ages.) Instead
> of BINARY_ADD_REG, you might call it BAR. 

IMHO that's as retrograde as it can get. I'd suggest re-evaluating the
reasons why "traditional assemblers" were made as they were. The
reasons might be: a) the desire of one vendor not to fall foul of the
"intellectual property" claims of another vendor; b) secondary to the
previous, the desire to lock users in to a vendor. Those are the reasons
why vendors went out of their way to obfuscate their instruction names
and introduce as much variability as possible for simple things like
"move" or "add".

Most modern platform-independent assemblers (IRs, though really ILs)
follow the syntax of (mostly) normal programming languages (of course
just with the flat, instead of structured, syntax). E.g. LLVM IR or
Mypyc IR:
https://github.com/python/mypy/blob/master/mypyc/test-data/irbuild-basic.test#L117

Back to the actual topic, I'd guess just suffixing existing instruction
names with "_R" should be enough, e.g. BINARY_ADD -> BINARY_ADD_R. And
of course, having LHS first. E.g. "BINARY_ADD_R a, b, c" means "a = b +
c".
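
A tiny sketch of the two renderings, assuming a hypothetical operator
table and helper names: the same LHS-first instruction can be printed
either in mnemonic form or in the programming-language-like form that
IRs such as the ones linked above tend to use.

```python
# Hypothetical formatter; the opcode table is illustrative only.

OPERATORS = {
    "BINARY_ADD_R": "+",
    "BINARY_SUBTRACT_R": "-",
    "BINARY_MULTIPLY_R": "*",
}


def to_mnemonic(op, dst, lhs, rhs):
    """Render as 'OPCODE dst, lhs, rhs' (destination first)."""
    return f"{op} {dst}, {lhs}, {rhs}"


def to_il(op, dst, lhs, rhs):
    """Render as 'dst = lhs <operator> rhs'."""
    return f"{dst} = {lhs} {OPERATORS[op]} {rhs}"


instruction = ("BINARY_ADD_R", "a", "b", "c")
mnemonic_text = to_mnemonic(*instruction)
il_text = to_il(*instruction)
```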

> Because
> this covers a fair bit of the CPython implementation, chances to
> contribute in a number of areas exist, even if you have never delved
> into Python's internals. Questions/comments/pull requests welcome.
> 
> Skip Montanaro

[]

-- 
Best regards,
 Paul                          mailto:pmis...@gmail.com
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/VXJ6MEX5EXHXUNB45ODP4VT2KUDAYNTE/
Code of Conduct: http://python.org/psf/codeofconduct/
