[Python-Dev] Re: PEP proposal to limit various aspects of a Python program to one million.

David Cuthbert via Python-Dev Wed, 04 Dec 2019 01:09:29 -0800

Oddly, I did not get Mark's original e-mail, but I am seeing replies here. 
Piggybacking off of Rhodri's email here...

    On 03/12/2019 16:15, Mark Shannon wrote:
    > Hi Everyone,
    > 
    > I am proposing a new PEP, still in draft form, to impose a limit of one 
    > million on various aspects of Python programs, such as the lines of code 
    > per module.

My main concern about this PEP is that it doesn't specify the behavior when a 
given limit is exceeded. Whether you choose 10 lines or 10 billion lines as the 
rule, someone annoying (like me) is going to want to know what happens when the 
rule is broken.

Non-exhaustively, you could:
1. Say the behavior is implementation defined
2. Physically prohibit the limit from being exceeded (limited by 
construction/physics)
3. Generate a warning
4. Raise an exception early (during parse/analysis/bytecode generation)
5. Raise an exception during runtime

The first two will keep people who hate limits happy, but essentially give the 
limit no teeth. The last three are meaningful but will upset people when a 
previously valid program breaks.

1. The C and C++ standards are littered with limits (many of which you have to 
violate to create a real-world program) that ultimately specify that the 
resulting behavior is "implementation defined." Most general-purpose compilers 
have reasonable implementations (e.g. I can actually end my file without a 
newline and not have it call abort() or execve("/usr/bin/nethack"), behaviors 
both allowed by the C99 standard). You could go this route, but the end result 
isn't much better than not having done the PEP in the first place (beyond 
having an Ivory Tower to sit upon and taunt the unwashed masses, "I told you 
so," when you do decide to break their code).

Don't go this route unless absolutely necessary. Of course, the C and C++ 
standards aren't written for a single implementation; this PEP has the luxury 
of addressing a single implementation (CPython).

2. Many of Java's limits are by construction. You can't exceed 2**16 bytes of 
bytecode for a method because they only allocated a uint16_t (u2 in the 
classfile spec) for the program counter in various places. (Bizarrely, the size 
of the method itself is stored as a uint32_t/u4.) I believe these limits are 
less useful because you'll never hit them in a running program; you simply 
can't create an invalid program. This would be like saying the size of Python 
bytecode is limited to the number of particles in the universe (~10**80). You 
don't have to specify the consequences because physics won't let you violate 
them.

This is more useful for documenting format limits, but probably doesn't achieve 
what you're trying to achieve.
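
To illustrate what "limited by construction" means, here is a minimal sketch 
using Python's struct module. It assumes a format field declared as a 
big-endian unsigned 16-bit integer, which is how a u2 is laid out; pack_u2 is a 
hypothetical helper, not something from the JVM or CPython.

```python
import struct


def pack_u2(value):
    """Pack value as a big-endian unsigned 16-bit field (hypothetical helper)."""
    return struct.pack(">H", value)


# The largest value a 16-bit field can hold still fits.
largest = pack_u2(2**16 - 1)

# One past that cannot be written at all, so no "invalid" file can exist.
try:
    pack_u2(2**16)
    overflow_rejected = False
except struct.error:
    overflow_rejected = True
```

Nobody has to specify what happens when the limit is exceeded, because the 
format has no way to express an out-of-range value.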

3. Realistically, generating a warning is probably what you'd have to do in the 
first version of the PEP, so that people who don't read python-dev@ have time 
to get ready. But, again, it doesn't achieve what you're setting out to do. 
We'd still accept programs that exceed these limits, and whatever optimizations 
depend on these limits being in place wouldn't work.
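
A rough sketch of the warning-only approach, assuming a line-count limit. 
LimitWarning and warn_if_too_long are hypothetical names made up for this 
example; nothing like them exists in CPython or in the draft PEP.

```python
import warnings

LINE_LIMIT = 1_000_000  # the figure from the draft PEP


class LimitWarning(UserWarning):
    """Hypothetical warning category; not part of CPython."""


def warn_if_too_long(source, name="<module>", limit=LINE_LIMIT):
    """Count the lines in source and warn, without failing, if over limit."""
    n_lines = len(source.splitlines())
    if n_lines > limit:
        warnings.warn(
            f"{name} has {n_lines} lines (limit {limit})",
            LimitWarning,
            stacklevel=2,
        )
    return n_lines


# Use a tiny limit so the warning path is easy to exercise.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    n_lines = warn_if_too_long("a = 1\nb = 2\nc = 3\n", limit=2)
```

The over-limit source is still accepted after the warning, which is exactly why 
nothing downstream can rely on the limit holding.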

Which brings us to the real meat: options 4 and 5.

Some limits don't really distinguish between these cases. Exceeding the total 
bytecode size for a module, for example, would have to fail at bytecode 
generation time (ignoring truly irrational behavior like silently truncating 
the bytecode). But others aren't so cut-and-dried. Consider, for example, a 
module that is compliant except for a single function that contains too many 
local variables. Whether you do 4 or 5 isn't so obvious:

Pros of choosing 4 (exception at load):
* I'm alerted to errors early, before I start a 90-hour compute job, only to 
have it crash in the write_output() function.
* You don't have to keep a poisoned function that your optimizers have to 
special-case.

Pros of choosing 5 (exception at runtime):
* If I never call that function (maybe it's something in a library I don't 
use), I don't get penalized.
* In line with other Python (mis-)behaviors, e.g. raising NameError() at 
runtime if you typo a variable name.
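
The difference between 4 and 5 for the too-many-locals case can be sketched in 
plain Python. This is only a stand-in for what a compiler would do: MAX_LOCALS, 
LimitExceeded, check_at_load and check_at_call are all hypothetical names, and 
the limit is set absurdly low so the example is easy to run.

```python
MAX_LOCALS = 3  # hypothetical limit, kept tiny for the demonstration


class LimitExceeded(Exception):
    """Hypothetical error type; the draft PEP does not name one."""


def _too_many_locals(func):
    """Return an error message if func exceeds MAX_LOCALS, else None."""
    n = func.__code__.co_nlocals  # counts parameters plus local variables
    if n > MAX_LOCALS:
        return f"{func.__name__} has {n} locals (limit {MAX_LOCALS})"
    return None


def check_at_load(func):
    """Option 4: reject the function as soon as it is defined."""
    message = _too_many_locals(func)
    if message:
        raise LimitExceeded(message)
    return func


def check_at_call(func):
    """Option 5: accept the definition, fail only if the function is called."""
    message = _too_many_locals(func)
    if message is None:
        return func

    def poisoned(*args, **kwargs):
        raise LimitExceeded(message)

    return poisoned


def write_output(a):
    b = a + 1
    c = b + 1
    d = c + 1
    return d


# Option 5: wrapping succeeds; the error waits until someone calls it.
lazy = check_at_call(write_output)
try:
    lazy(1)
    failed_at_call = False
except LimitExceeded:
    failed_at_call = True

# Option 4: the same function is rejected immediately.
try:
    check_at_load(write_output)
    failed_at_load = False
except LimitExceeded:
    failed_at_load = True
```

Note that option 5 needs the "poisoned" wrapper to exist at all, which is the 
special case the optimizers would have to know about.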

On Tue 12/03/19, 10:05 AM, "Rhodri James" wrote:
    On 03/12/2019 16:15, Mark Shannon wrote:
    > Isn't this "640K ought to be enough for anybody" again?
    > -------------------------------------------------------
    > 
    > The infamous 640K memory limit was a limit on machine usable resources.
    > The proposed one million limit is a limit on human generated code.
    > 
    > While it is possible that generated code could exceed the limit,
    > it is easy for a code generator to modify its output to conform.
    > The author has hit the 64K limit in the JVM on at least two occasions 
    > when generating Java code.
    > The workarounds were relatively straightforward and
    > probably wouldn't have been necessary with a limit of one million 
    > bytecodes or lines of code.

    I can absolutely guarantee that this will come back and bite you. 
    Someone out there will be doing something more complicated than you 
    think is plausible, and eventually someone will hit your limits.  It may 
    not take as long as you think, either.

I'm in between Rhodri and Mark here.

I've also been bitten by the 64k JVM bytecode limit when generating code, but I 
did *not* find it so easy to work around. What was a dumb translator suddenly 
had to get a lot more smarts.

Having predictable behavior *is* important, though, and having limits with 
specified behavior when those limits are exceeded helps. Keep in mind that I'm 
going to be annoyed when I hit those limits, so having an engineering 
justification for why the limit was set to a certain value will go a long way 
toward buying you credibility. One million does not feel credible -- that's 
"we're setting a limit because we couldn't be bothered to figure out what the 
limit should be." OTOH, 16,777,215 (2**24-1) does feel credible -- that's "no 
processor is capable of holding this many TLB entries in the level 2 cache with 
retpolines active without introducing extreme swapping on write-limited SSDs, 
but you can get around it if you're willing to adjust this constant and 
recompile." Or whatever. (Ok, don't BS us like I just did, but you get the 
idea. :-) )
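
One real way to ground a limit is in how many bits a field needs to hold it. A 
quick sketch of that arithmetic for the two figures above (the dictionary 
labels are just for this example, and this is not the PEP's own justification):

```python
candidates = {
    "one million": 1_000_000,
    "2**24 - 1": 2**24 - 1,
}

# Number of bits needed to represent each candidate limit.
bits_needed = {label: value.bit_length() for label, value in candidates.items()}
```

A power-of-two-minus-one value fills its field exactly, while one million 
leaves part of a 20-bit field's range unused.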

Dave

