[Python-ideas] Re: Suggestion for behaviour change import mechanics

Richard Vogel Tue, 29 Oct 2019 15:08:05 -0700

Although I suspect the actual reasoning is just that, because these
are both rare use cases, nobody bothered to design the behavior around
either of them; instead, they just went with the simplest
implementation that handles the non-rare cases as intended, and then
just documented what it does in the rare cases as the behavior. And
then, in the years since then (especially at 3.0 and when the new
import system was implemented a few versions later) nobody had a
compelling reason to change the rules.

That's probably what it mostly is: a mixture of there being no
straightforward, always-correct way, and history :)

Would you really want two different modules to be the same just
because they happened to have the same contents, or happened to get
those contents by executing the same source? You definitely wouldn’t
want them to be identical (imagine if a.py and b.py were empty, you
did `import a; import b; a.spam=2; b.spam=3`, and a.spam was now 3.)
And I don’t think you’d want two non-identical modules to be equal.


I don't think so; that's a good point. So it's not so straightforward
to define what's identical.

(As a side note, modules don’t always have an absolute path. The most
common way to get a module is as a .py file in a directory on
sys.path, but there are other ways: cached .pyc files delivered without
the .py file, extension modules, modules inside zip archives, even
modules that use arbitrary custom finders to pull them off a web
server or out of a database, or simulate hierarchy on top of a flat
filesystem, or whatever. Some of these have a confusing path, some
have no path at all. They do always have a loader spec, which could
theoretically serve the same purpose. But modules don’t remember the
loader spec used to load them. Plus, loader specs, unlike names, are a
low-level detail that most Python developers probably never learn.)

That's a lot of information and a lot of corner cases, some of which I
can't even understand. I guess for that one needs to understand the
concept of a loader spec.

Nevertheless, could Python cache that spec, along with a checksum of
whatever comes out after resolving the loader?
Python could then recognize the case where you import the same thing
again through a different loader spec (for example, when something on
the path is imported both fully qualified and not fully qualified).
In that case Python could emit a warning (also in your circular-import
case, for example) and otherwise keep doing what it does now.
An extension of the import keyword such as *import twin ~~~* would then
explicitly request the current behaviour (giving you the same thing as a
separate realization, so basically just suppressing the warning), while
*import union ~~~* would return the same realization, again without the
warning. This would force users to know what they are doing, and give
them the chance to notice that such a thing happened and to start
thinking about what they actually wanted in the first place.

Would that be something that, at least in terms of feasibility, could be
done?
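
To make the detection half of this idea concrete, here is a rough sketch
of how it could be approximated today, after the fact, by scanning
sys.modules. It is only an illustration: find_twin_modules, twinpkg and
twinmod are hypothetical names, nothing like this is built into Python,
and it compares source file paths rather than loader specs or checksums.
The *import twin* / *import union* syntax is not sketched at all.

```python
import importlib
import os
import sys
import tempfile
from collections import defaultdict


def find_twin_modules():
    """Return {real source path: [names]} for files loaded more than once.

    Hypothetical helper. Aliases that point at one module object (such as
    os.path and posixpath) are not reported, only distinct module objects.
    """
    by_path = defaultdict(dict)  # real path -> {id(module): first name seen}
    for name, module in list(sys.modules.items()):
        file = getattr(module, "__file__", None)
        if not isinstance(file, str):
            continue  # built-in modules and some others have no file
        by_path[os.path.realpath(file)].setdefault(id(module), name)
    return {
        path: sorted(names.values())
        for path, names in by_path.items()
        if len(names) > 1
    }


# Reproduce the situation: one file reachable under two qualified names.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "twinpkg")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
open(os.path.join(pkg_dir, "twinmod.py"), "w").close()
sys.path[:0] = [root, pkg_dir]
importlib.invalidate_caches()

importlib.import_module("twinpkg.twinmod")
importlib.import_module("twinmod")

twins = find_twin_modules()
for path, names in twins.items():
    print(path, "is loaded as", ", ".join(names))
```

A real implementation inside the import system would have to deal with
modules that have no file at all, which this sketch simply skips.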

On 29.10.2019 at 20:54, Andrew Barnert wrote:

On Oct 29, 2019, at 11:45, Richard Vogel <mer...@gmx.net
<mailto:mer...@gmx.net>> wrote:

What happens if you break this rule is exactly what you’re seeing.
The way Python ensures that doing `import spam` twice results in the
same spam module object (so your globals don’t all get duplicated)
is by storing modules by qualified name in sys.modules. So if the
same file has two different qualified names, it’s two separate modules.
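
A minimal sketch of that situation, assuming a throwaway package
directory created just for the demonstration (the names spam and eggs
are only the usual placeholders):

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout: <root>/spam/__init__.py and <root>/spam/eggs.py
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "spam")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "eggs.py"), "w") as f:
    f.write("counter = 0\n")

# Breaking the rule: both the parent directory and the package directory
# itself are on sys.path, so eggs.py is reachable under two names.
sys.path[:0] = [root, pkg_dir]
importlib.invalidate_caches()

qualified = importlib.import_module("spam.eggs")
unqualified = importlib.import_module("eggs")

# Same file on disk, but two entries in sys.modules and two objects.
print(os.path.samefile(qualified.__file__, unqualified.__file__))  # True
print(qualified is unqualified)                                    # False

# Module-level state is duplicated, not shared.
qualified.counter += 1
print(qualified.counter, unqualified.counter)                      # 1 0
```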

I got that. That explains the behaviour I got, where Enum entries
suddenly were unequal to the "same" Enum entries, causing a crash.
That was actually better than having this happen unnoticed and ending
up with the supposedly same thing multiple times, which would have
resulted in two different EventQueues. I cannot imagine how much time I
would have spent before realizing that I suddenly had two of them ;)

Is there a rationale for this behavior in Python?

I suspect this goes so far back into the mists of time that there’s no
mailing list discussion or anything. But I can take a guess.

First, the real issue here is that it’s confusing to have the same
module exist under two different names. Normally, the module spam.eggs
and the module eggs shouldn’t be the same thing. Especially given
that, unlike most objects in Python, modules know their qualified
names, and actually _need_ to know them for things like pickle and
multiprocessing to work. It would be misleading if you looked at the
qualified name of spam.eggs and got back something other than
"spam.eggs", and not just to human readers, but to code.

While there are rare occasions when you might want spam.eggs and eggs
to be the same thing, there are also rare occasions when you might
want the same source to import as two separate objects. And both are
less common than doing it mistakenly. Since these are both rare, the
One Obvious Way To Do each one ought to be something obviously unusual
(manipulating sys.modules, or manually using importlib) that signals
your unusual intention to your readers.
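
As an illustration of what those two deliberately unusual spellings
might look like, here is a small sketch. The module names alias_demo,
alias_demo_alt and alias_demo_copy are made up for the example; the
importlib calls are the standard ones.

```python
import importlib
import importlib.util
import os
import sys
import tempfile

# A throwaway module file for the demonstration.
root = tempfile.mkdtemp()
path = os.path.join(root, "alias_demo.py")
with open(path, "w") as f:
    f.write("items = []\n")
sys.path.insert(0, root)
importlib.invalidate_caches()

original = importlib.import_module("alias_demo")

# Case 1: one module object on purpose available under a second name.
sys.modules["alias_demo_alt"] = original
alias = importlib.import_module("alias_demo_alt")
print(alias is original)   # True
print(alias.__name__)      # alias_demo (the module keeps its own name)

# Case 2: the same source on purpose loaded as a separate object.
spec = importlib.util.spec_from_file_location("alias_demo_copy", path)
copy = importlib.util.module_from_spec(spec)
spec.loader.exec_module(copy)
print(copy is original)               # False
print(copy.items is original.items)   # False
```

Either way, a reader of the code can see that something out of the
ordinary is being done intentionally.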

Although I suspect the actual reasoning is just that, because these
are both rare use cases, nobody bothered to design the behavior around
either of them; instead, they just went with the simplest
implementation that handles the non-rare cases as intended, and then
just documented what it does in the rare cases as the behavior. And
then, in the years since then (especially at 3.0 and when the new
import system was implemented a few versions later) nobody had a
compelling reason to change the rules.

The biggest consequence is the case where script.py is a runnable
script but also a module, and thanks to a circular import somewhere it
ends up getting imported indirectly by itself, so you have modules
named "__main__" and "script" built from the same source. That one
actually comes up, because you don’t really have to break any rules
for it to happen (circular imports are legal, and work fine in some
cases, even if they’re confusing in general and don’t work in other
cases and should usually be avoided), but it’s effectively the same
problem you’re running into. People have actually made proposals to
fix that in some way (whether to make it a detectable error, or to
special-case things so sys.modules['script'] = sys.modules['__main__']
from the start, or something else), but I don’t think anyone’s come up
with a proposal that everyone else liked. If you want to know more
about how people think about this whole wider issue, maybe search for
the proposals on that narrower one.
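
The __main__ versus script case can be reproduced with a small sketch
like the following. It writes two hypothetical files, script.py and
helper.py, into a temporary directory and runs script.py in a child
interpreter, so the current process is not affected.

```python
import os
import subprocess
import sys
import tempfile

SCRIPT = '''\
print("executing as", __name__)
import helper
'''

HELPER = '''\
import sys
import script  # circular: pulls in script.py again, now named "script"
print("same object:", script is sys.modules["__main__"])
'''

root = tempfile.mkdtemp()
for filename, source in (("script.py", SCRIPT), ("helper.py", HELPER)):
    with open(os.path.join(root, filename), "w") as f:
        f.write(source)

# Running the file directly makes it the __main__ module; its directory
# becomes the first sys.path entry, so "import script" finds it again.
result = subprocess.run(
    [sys.executable, os.path.join(root, "script.py")],
    capture_output=True,
    text=True,
    check=True,
)
print(result.stdout)
```

The top-level code of script.py runs twice, once for each name, and the
two resulting module objects are distinct.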


Why isn't a thing equal when it's physically the same thing, meaning
the checksum is the same, or it's the same absolute path, or ...?

Some things in Python act like “values”, where there’s a notion of
equality based on equal contents—int, str, tuple, namedtuple and
dataclass types, etc. But most other things act like “objects”, where
no object is equal to anything but itself. Sometimes it’s about
implicit (especially mutable) state—two different file objects are
never equal, even if they represent the same disk file with the same
position. Sometimes it’s about needing to be able to create distinct
things—two Enum members from different classes are never equal even if
they have the same name and value, and two Enum classes are never
equal even if they have exactly the same members. (Imagine if you had
code that did different things with ForegroundColor and
BackgroundColor objects, and then they magically became the same type
when you added bright background colors.)
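
A short sketch of the difference, using the two color enums from the
example above (their members are made up here):

```python
from enum import Enum


class ForegroundColor(Enum):
    RED = 1
    GREEN = 2


class BackgroundColor(Enum):
    RED = 1
    GREEN = 2


# Value-like objects compare by content.
print((1, "a") == (1, "a"))                                    # True

# Enum members and Enum classes compare by identity only.
print(ForegroundColor.RED == BackgroundColor.RED)              # False
print(ForegroundColor.RED.value == BackgroundColor.RED.value)  # True
print(ForegroundColor == BackgroundColor)                      # False
```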

Would you really want two different modules to be the same just
because they happened to have the same contents, or happened to get
those contents by executing the same source? You definitely wouldn’t
want them to be identical (imagine if a.py and b.py were empty, you
did `import a; import b; a.spam=2; b.spam=3`, and a.spam was now 3.)
And I don’t think you’d want two non-identical modules to be equal.

But of course their contents wouldn’t be equal, because even a module
built from an empty .py file has some default attributes, including
__name__, which will be different between a and b.
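
This can be checked with two empty files. The sketch below uses the
hypothetical names demo_a and demo_b instead of a and b, to avoid
clashing with anything already installed.

```python
import importlib
import os
import sys
import tempfile

# Two empty source files in a throwaway directory.
root = tempfile.mkdtemp()
for name in ("demo_a", "demo_b"):
    open(os.path.join(root, name + ".py"), "w").close()
sys.path.insert(0, root)
importlib.invalidate_caches()

a = importlib.import_module("demo_a")
b = importlib.import_module("demo_b")

# Separate objects: setting an attribute on one does not touch the other.
a.spam = 2
b.spam = 3
print(a.spam, b.spam)            # 2 3

# Modules compare by identity, and their default attributes differ anyway.
print(a == b)                    # False
print(a.__name__, b.__name__)    # demo_a demo_b

# The default attributes an "empty" module still carries.
print(sorted(k for k in vars(a) if k.startswith("__")))
```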

(As a side note, modules don’t always have an absolute path. The most
common way to get a module is as a .py file in a directory on
sys.path, but there are other ways: cached .pyc files delivered without
the .py file, extension modules, modules inside zip archives, even
modules that use arbitrary custom finders to pull them off a web
server or out of a database, or simulate hierarchy on top of a flat
filesystem, or whatever. Some of these have a confusing path, some
have no path at all. They do always have a loader spec, which could
theoretically serve the same purpose. But modules don’t remember the
loader spec used to load them. Plus, loader specs, unlike names, are a
low-level detail that most Python developers probably never learn.)
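
As a small illustration of modules without a path, assuming CPython
(where sys is compiled into the interpreter), the following sketch
checks for the __file__ attribute. The module name scratch is made up.

```python
import json
import sys
import types

# A built-in module has no source file to point at.
print(hasattr(sys, "__file__"))      # False on CPython

# Neither does a module object created by hand.
scratch = types.ModuleType("scratch")
print(hasattr(scratch, "__file__"))  # False

# An ordinary file-based module usually reports where it came from;
# the exact value depends on the installation.
print(getattr(json, "__file__", None))
```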

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/T4XF6JW2ILVPQOYH2OHP6VPYW4KGQOB4/
Code of Conduct: http://python.org/psf/codeofconduct/
