Oddly, I did not get Mark's original e-mail, but am seeing replies here.
Piggybacking off of James' email here...
On 03/12/2019 16:15, Mark Shannon wrote:
> Hi Everyone,
>
> I am proposing a new PEP, still in draft form, to impose a limit of one
> million on various aspects of Python programs, such as the lines of code
> per module.
My main concern about this PEP is that it doesn't specify the behavior when a given
limit is exceeded. Whether you choose 10 lines or 10 billion lines as the rule,
someone annoying (like me) is going to want to know what's going to happen if I
break the rule.
Non-exhaustively, you could:
1. Say the behavior is implementation defined
2. Physically prohibit the limit from being exceeded (limited by
construction/physics)
3. Generate a warning
4. Raise an exception early (during parse/analysis/bytecode generation)
5. Raise an exception during runtime
The first two will keep people who hate limits happy, but essentially give the
limit no teeth. The last three are meaningful but will upset people when a
previously valid program breaks.
1. The C and C++ standards are littered with limits (many of which you have to
violate to create a real-world program) that ultimately leave the resulting
behavior "implementation defined" or outright undefined. Most general-purpose
compilers have reasonable implementations (e.g. I can actually end my file
without a newline and not have it call abort() or execve("/usr/bin/nethack"),
behaviors both allowed by the C99 standard). You could go this route, but the
end result
isn't much better than not having done the PEP in the first place (beyond
having an Ivory Tower to sit upon and taunt the unwashed masses, "I told you
so," when you do decide to break their code).
Don't go this route unless absolutely necessary. Of course, the C/C++ standards
aren't written for a single implementation; this PEP has the luxury of
addressing a single implementation (CPython).
2. Many of Java's limits are by construction. You can't exceed 2**16 bytes of
bytecode for a method because they only allocated a uint16_t (u2 in the
classfile spec) for the program counter in various places. (Bizarrely, the size
of the method's code itself is stored as a uint32_t/u4.) I believe these limits
are
less useful because you'll never hit them in a running program; you simply
can't create an invalid program. This would be like saying the size of Python
bytecode is limited to the number of particles in the universe (~10**80). You
don't have to specify the consequences because physics won't let you violate
them.
This is more useful for documenting format limits, but probably doesn't achieve
what you're trying to achieve.
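As a small illustration of a limit by construction, here is a sketch in Python
(not Java) using the standard struct module. The u2 field width is the one from
the classfile spec mentioned above; the helper name pack_u2 is made up for this
example. The point is only that a value past the limit cannot be encoded at
all, so no separate "what happens" rule is needed.

```python
import struct

# A u2 field is an unsigned 16-bit integer, so this is its largest value.
U2_MAX = 2**16 - 1


def pack_u2(value):
    """Pack value into a big-endian unsigned 16-bit field (hypothetical helper)."""
    return struct.pack(">H", value)


encoded = pack_u2(U2_MAX)  # the largest representable value still fits

try:
    pack_u2(U2_MAX + 1)    # one past the limit has no encoding at all
    overflowed = False
except struct.error:
    overflowed = True
```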
3. Realistically, this is probably what you'd have to do in the first release
that adopts the PEP, so that non-readers of python-dev@ have time to get ready,
but, again, it doesn't achieve what you're setting out to do. We'd still accept
programs that exceed these limits, and whatever optimizations depend on these
limits being in place wouldn't work.
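For reference, option 3 might look roughly like the sketch below. The
LimitWarning category and the check_line_count() helper are hypothetical, not
anything in the draft PEP or in CPython; only the one-million figure comes from
the draft. Note that the over-limit source is still accepted, which is exactly
the weakness of this option.

```python
import warnings

LINE_LIMIT = 1_000_000  # the figure from the draft PEP


class LimitWarning(UserWarning):
    """Hypothetical warning category for exceeded limits."""


def check_line_count(source, limit=LINE_LIMIT):
    """Warn, but do not fail, when source has more lines than limit."""
    n_lines = len(source.splitlines())
    if n_lines > limit:
        warnings.warn(
            f"module has {n_lines} lines; the limit is {limit}",
            LimitWarning,
            stacklevel=2,
        )
    return n_lines


# Use a tiny limit here so the warning is easy to trigger.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    accepted = check_line_count("a = 1\nb = 2\nc = 3\n", limit=2)
```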
Which brings us to the real meat, 4&5.
Some limits don't really distinguish between these cases. Exceeding the total
bytecode size for a module, for example, would have to fail at bytecode
generation time (ignoring truly irrational behavior like silently truncating
the bytecode). But others aren't so cut-and-dried. Take, for example, a module
that is compliant except for a single function that contains too many local
variables. Whether you do 4 or 5 there isn't so obvious:
Pros of choosing 4 (exception at load):
* I'm alerted to errors early, before I start a 90-hour compute job, only to
have it crash in the write_output() function.
* You don't have to keep a poisoned function around that your optimizers have
to special-case.
Pros of choosing 5 (exception at runtime):
* If I never call that function (maybe it's something in a library I don't
use), I don't get penalized.
* In line with other Python (mis-)behaviors, e.g. raising NameError() at
runtime if you typo a variable name.
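To make the 4-versus-5 trade-off concrete, here is a minimal sketch. Everything
in it is hypothetical: the tiny MAX_LOCALS value, the LimitExceeded exception,
and the load_early()/load_lazy() helpers stand in for whatever CPython would
actually do, and a plain dict of functions stands in for a module.

```python
MAX_LOCALS = 3  # deliberately tiny, hypothetical limit


class LimitExceeded(Exception):
    """Hypothetical error for a function that breaks MAX_LOCALS."""


def too_many_locals(func):
    # co_nlocals counts the function's local variables, arguments included.
    return func.__code__.co_nlocals > MAX_LOCALS


def load_early(funcs):
    """Option 4: reject the whole module as soon as it is loaded."""
    for name, func in funcs.items():
        if too_many_locals(func):
            raise LimitExceeded(f"{name} has too many local variables")
    return funcs


def load_lazy(funcs):
    """Option 5: accept the module, but poison the offending functions."""
    def poison(name):
        def poisoned(*args, **kwargs):
            raise LimitExceeded(f"{name} has too many local variables")
        return poisoned

    return {
        name: poison(name) if too_many_locals(func) else func
        for name, func in funcs.items()
    }


def small():
    a = 1
    return a


def big():
    a, b, c, d = 1, 2, 3, 4
    return a + b + c + d


module = {"small": small, "big": big}
```

With load_early(module) the error shows up immediately, even if big() is never
called. With load_lazy(module), small() keeps working and the error only
appears when big() is actually invoked, which is the "poisoned function" an
optimizer would have to special-case.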
On Tue 12/03/19, 10:05 AM, "Rhodri James" <[email protected]> wrote:
> On 03/12/2019 16:15, Mark Shannon wrote:
>> Isn't this "640K ought to be enough for anybody" again?
>> -------------------------------------------------------
>>
>> The infamous 640K memory limit was a limit on machine usable resources.
>> The proposed one million limit is a limit on human generated code.
>>
>> While it is possible that generated code could exceed the limit,
>> it is easy for a code generator to modify its output to conform.
>> The author has hit the 64K limit in the JVM on at least two occasions
>> when generating Java code.
>> The workarounds were relatively straightforward and
>> probably wouldn't have been necessary with a limit of one million
>> bytecodes or lines of code.
> I can absolutely guarantee that this will come back and bite you.
> Someone out there will be doing something more complicated than you
> think is plausible, and eventually someone will hit your limits. It may
> not take as long as you think, either.
I'm in between Rhodri and Mark here.
I've also been bitten by the 64k JVM bytecode limit when generating code, but I
did *not* find it so easy to work around. What was a dumb translator suddenly
had to get a lot more smarts.
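For what it's worth, the easy case of such a workaround looks something like
the sketch below: a hypothetical generator helper, emit_chunked(), that splits
a flat list of generated statements into helper functions of bounded size. It
only works because every statement here goes through a shared state dict; a
translator that emits ordinary local variables would first have to hoist them
into shared state, and that is where the extra smarts come in.

```python
def emit_chunked(statements, max_per_function):
    """Emit Python source that runs statements via bounded-size helpers.

    Each statement must read and write only the shared `state` dict.
    """
    if max_per_function < 1:
        raise ValueError("max_per_function must be at least 1")

    # Slice the flat statement list into groups of at most max_per_function.
    chunks = [
        statements[i:i + max_per_function]
        for i in range(0, len(statements), max_per_function)
    ]

    lines = []
    for idx, chunk in enumerate(chunks):
        lines.append(f"def _chunk_{idx}(state):")
        lines.extend(f"    {stmt}" for stmt in chunk)

    # A small driver calls the helpers in their original order.
    lines.append("def run(state):")
    lines.extend(f"    _chunk_{idx}(state)" for idx in range(len(chunks)))
    lines.append("    return state")
    return "\n".join(lines) + "\n"


# Five generated statements, at most two per helper function.
generated = [f"state['x{i}'] = {i}" for i in range(5)]
source = emit_chunked(generated, max_per_function=2)

namespace = {}
exec(source, namespace)
result = namespace["run"]({})
```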
Having predictable behavior *is* important, though, and having limits with
specified behavior when those limits are exceeded helps. Keep in mind that I'm
going to be annoyed when I hit those limits, so having an engineering
justification for why the limit was set to a certain value will go a long way
toward buying you credibility. One million does not feel credible -- that's
"we're setting a limit because we couldn't be bothered to figure out what the
limit should be." OTOH, 16,777,215 (2**24-1) does feel credible -- that's "no
processor is capable of holding this many TLB entries in the level 2 cache with
retpolines active without introducing extreme swapping on write-limited SSDs,
but you can get around it if you're willing to adjust this constant and
recompile." Or whatever. (Ok, don't BS us like I just did, but you get the
idea. :-) )
Dave
_______________________________________________
Python-Dev mailing list -- [email protected]
Message archived at
https://mail.python.org/archives/list/[email protected]/message/SF3RM6B7FF63F7OTTDEY2GH4C5RG6DCX/