Hello,

On Sat, 20 Mar 2021 10:54:10 -0500
Skip Montanaro <skip.montan...@gmail.com> wrote:

> Back in the late 90s (!) I worked on a reimagining of the Python
> virtual machine as a register-based VM based on 1.5.2. I got part of
> the way with that, but never completed it. In the early 2010s, Victor
> Stinner got much further using 3.4 as a base. The idea (and dormant
> code) has been laying around in my mind (and computers) these past
> couple decades, so I took another swing at it starting in late 2019
> after retirement, mostly as a way to keep my head in the game. While I
> got a fair bit of the way, it stalled. I've picked it up and put it
> down a number of times in the past year, often needing to resolve
> conflicts because of churn in the current Python virtual machine.

I guess it would be a good idea to state the scope of this project: is
it a research project or a "production" one? If it's a research one, why
be concerned with the churn of the very latest CPython versions?
Wouldn't it be better to just use some scalable, incremental
implementation which would allow forward-porting it to a newer version,
if it ever comes to that?

Otherwise, if it's "production", who's the "customer" and how do they
"compensate" you for doing work (chasing the moving target) which is
clearly of little interest to you and conflicts with the goal of the
project?

[]

> I started on what could only very generously be called a PEP which you
> can read here. It includes some of the history of this work as well as
> details about what I've managed to do so far:
>
> https://github.com/smontanaro/cpython/blob/register2/pep-9999.rst
>
> If you think any of this is remotely interesting (whether or not you
> think you'd like to help), please have a look at the "PEP".

Some comments on it:

1. I find it to be rather weak on the motivational part. It starts with
a phrase like:

> This PEP proposes the addition of register-based instructions to the
> existing Python virtual machine, with the intent that they eventually
> replace the existing stack-based opcodes.

Sorry, what?
The purpose of register-based instructions is just to replace
stack-based instructions? That's not what I'd like to hear as the intro
phrase. You probably want to replace one with the other because
register-based ones offer some benefit, faster execution perhaps? That's
what I'd like to hear, instead of having to "decipher" it between the
lines.

> They [2 instruction sets] are almost completely distinct.

That doesn't correspond to the mental image I would have. In my view,
the 2 sets would be exactly the same, except that stack-based
instructions encode argument locations implicitly, while register-based
ones encode them explicitly. It would be interesting to read (in the
following "pep" sections) what makes them "almost completely distinct".

> Within a single function only one set of opcodes or the other will
> be used at any one time.

That would be the opposite of the "scalable, incremental" development
approach mentioned above. Why not allow the 2 sets to freely co-exist,
and migrate code generation and implement code translation gradually?

> ## Motivation

I'm not sure the content of the section corresponds much to its title.
It jumps from a background survey of the different Python VM
optimizations to (some) implementation details of the register VM,
leaving the "motivation" somewhere "between the lines".

> Despite all that effort, opcodes which do nothing more than move data
> onto or off of the stack (LOAD_FAST, LOAD_GLOBAL, etc) still account
> for nearly half of all opcodes executed.

... And you intend to change that with a register VM? In which way and
how? As an example, LOAD_GLOBAL isn't going anywhere: it loads a
variable by *symbolic* name into a register.

> Running Pyperformance using a development version of Python 3.9
> showed that the five most frequently executed pure stack opcodes
> (LOAD_FAST, STORE_FAST, POP_TOP, DUP_TOP and ROT_TWO) accounted for
> 35% of all executed instructions.

And you intend to change that with a register VM? How?
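
As a point of reference for the two questions above, here is a toy
sketch (not CPython's interpreter, and not the code from the register2
branch) of how "a = b + c" might be encoded in each style. The frame
layout, the tuple encoding, and the BINARY_ADD_R opcode name are all
assumptions made for illustration.

```python
# Toy model only: neither function is CPython's eval loop. BINARY_ADD_R is a
# hypothetical opcode name, and the tuple encoding is an assumption.

def run_stack(code, frame):
    """Execute stack-style instructions; operand locations are implicit."""
    stack = []
    for op, arg in code:
        if op == "LOAD_FAST":
            stack.append(frame[arg])      # copy a local onto the stack
        elif op == "STORE_FAST":
            frame[arg] = stack.pop()      # copy the top of stack to a local
        elif op == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown opcode {op}")
    return frame


def run_register(code, frame):
    """Execute register-style instructions; every operand slot is explicit."""
    for op, dst, src1, src2 in code:
        if op == "BINARY_ADD_R":
            frame[dst] = frame[src1] + frame[src2]
        else:
            raise ValueError(f"unknown opcode {op}")
    return frame


# Frame slots 0, 1, 2 hold the locals a, b, c.
A, B, C = 0, 1, 2

stack_code = [
    ("LOAD_FAST", B),
    ("LOAD_FAST", C),
    ("BINARY_ADD", None),
    ("STORE_FAST", A),
]
register_code = [("BINARY_ADD_R", A, B, C)]

stack_frame = run_stack(stack_code, [None, 2, 3])
register_frame = run_register(register_code, [None, 2, 3])

# Count the stack-style instructions that only move data around.
pure_moves = sum(op in ("LOAD_FAST", "STORE_FAST") for op, _ in stack_code)
print(stack_frame, register_frame, pure_moves, len(register_code))
```

In this toy model three of the four stack instructions only move
pointers, and the register form needs none of them. A plain assignment
such as "a = b" would still need a move in either style, so the real
ratio is something to measure rather than assume.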
A quick Google search leads to
https://www.strchr.com/x86_machine_code_statistics (yeah, that's not a
VM, it's an RM (real machine); stats over different VMs would
definitely be welcome):

> The most popular instruction is MOV (35% of all instructions).

So, is the plan to replace the 35% of "five most frequently executed
pure stack opcodes" with 35% of register-register move instructions? If
not, why would it be different and how would you achieve that?

> They are low-cost instructions (compared with CALL_FUNCTION for
> example), but still eat up time and space in the virtual machine

But that's the problem of any VM - it's slow by definition. There can
be less slow and more slow VMs, but VMs can't be fast. So, what's the
top-level motivation: is it "making CPython fast" or "making CPython a
little bit less slow"? By how much?

> Consider the layout of the data section of a Frame object:
> All those LOAD_FAST and STORE_FAST instructions just copy pointers
> between chunks of RAM which are just a few bytes away from each other
> in memory.

Ok, but LOAD_DEREF and STORE_DEREF instructions also just copy pointers
(with extra dereferencing, but that's a detail). It's unclear why you
ignore them ("cell" registers) and put the "locals" and "stack"
registers ahead of them. An actual implementation of register
instructions would just treat any frame slot as a register with
contiguous numbering, allowing access to all of the locals, cells, and
stack locations in the same way. In that regard, trying to rearrange the
3 groups at this stage seems like rather unneeded implementation
complexity with no clear motivation.

> Instead, registers should be cleared upon last reference.

Worth discussing how to handle that. Apparently, only an approach with
an explicit DECREF instruction would scale, but that shows that a
register-based VM not only decreases the number of generated
instructions, but also increases it in other areas. The overall tally
may not be what's expected.

> Implemented ...
> some CALL_FUNCTION instructions

One of the significant omissions in the "pep" is the lack of discussion
of a "register-based" calling convention.

> most container-related BUILD instructions

Other "arbitrary number of arguments" instructions beyond
CALL_FUNCTION* are the next hard case which is worth discussing.

> OTOH, maybe RVM opcode names should look more like traditional
> assembler instructions. (The author is getting on in years and finds
> something which looks more like assembler attractive, given his
> initial experience programming computers in the dark ages.) Instead
> of BINARY_ADD_REG, you might call it BAR.

IMHO that's as retrograde as it can get. I'd suggest re-evaluating the
reasons why "traditional assemblers" were made as they were. The
reasons might be:

a) the desire of one vendor not to fall foul of the "intellectual
property" claims of another vendor;
b) secondary to the previous one, the desire to lock users in to a
vendor.

Those are the reasons why vendors went out of their way to obfuscate
their instruction names and introduce as much variation as possible for
simple things like "move" or "add".

Most modern platform-independent assemblers (IRs, though really ILs)
follow the syntax of (mostly) normal programming languages (of course
just with flat, instead of structured, syntax). E.g. LLVM IR or Mypyc
IR:
https://github.com/python/mypy/blob/master/mypyc/test-data/irbuild-basic.test#L117

Back to the actual topic, I'd guess just suffixing existing instruction
names with "_R" should be enough, e.g. BINARY_ADD -> BINARY_ADD_R. And
of course, having the LHS first. E.g. "BINARY_ADD_R a, b, c" means
"a = b + c".

> Because
> this covers a fair bit of the CPython implementation, chances to
> contribute in a number of areas exist, even if you have never delved
> into Python's internals. Questions/comments/pull requests welcome.
>
> Skip Montanaro

[]

--
Best regards,
 Paul                          mailto:pmis...@gmail.com

Python-ideas mailing list -- python-ideas@python.org
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/VXJ6MEX5EXHXUNB45ODP4VT2KUDAYNTE/