Eric Snow <ericsnowcurren...@gmail.com> added the comment:

On Wed, Oct 20, 2021 at 6:11 PM Barry A. Warsaw <rep...@bugs.python.org> wrote:
> I guess a question to answer then is whether we philosophically want the
> module attributes to be equivalent to the spec attributes.  And by
> equivalent, I mean enforced to be exactly so, and thus a proxy.  To me, the
> duplication is a wart that we should migrate away from so there’s only one
> place for these attributes, and that should be the spec.
>
> Here is the mapping we currently describe in the docs:
>
> mod.__name__ === __spec__.name
> mod.__package__ === __spec__.parent
> mod.__loader__ === __spec__.loader
> mod.__file__ === __spec__.origin
> mod.__path__ === __spec__.submodule_search_locations
> mod.__cached__ === __spec__.cached
>
> But right now, they don’t have to stay in sync, and I don’t think it’s
> reasonable to put the onus on the user to keep them in sync, because it’s
> unclear what code uses which attribute.  Okay, so you can just set them both
> to be safe, but then you can’t do that with __spec__.parent/__package__
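
For reference, the quoted mapping can be checked mechanically. This is only a rough sketch: ATTR_MAP and out_of_sync() are names made up for illustration (they are not importlib APIs), and the synthetic "pkg.demo" module stands in for a real import.

```python
import importlib.machinery
import importlib.util

# Module attribute -> spec attribute, following the mapping quoted above.
ATTR_MAP = {
    "__name__": "name",
    "__package__": "parent",
    "__loader__": "loader",
    "__file__": "origin",
    "__path__": "submodule_search_locations",
    "__cached__": "cached",
}

_MISSING = object()


def out_of_sync(module):
    """Return {module_attr: (module_value, spec_value)} for each mismatch.

    Hypothetical helper; attributes that are not set on the module are skipped.
    """
    spec = getattr(module, "__spec__", None)
    if spec is None:
        return {}
    diffs = {}
    for mod_attr, spec_attr in ATTR_MAP.items():
        mod_val = getattr(module, mod_attr, _MISSING)
        if mod_val is _MISSING:
            continue
        spec_val = getattr(spec, spec_attr, None)
        if mod_val != spec_val:
            diffs[mod_attr] = (mod_val, spec_val)
    return diffs


# Build a module from a spec without touching the filesystem.
spec = importlib.machinery.ModuleSpec("pkg.demo", loader=None)
demo = importlib.util.module_from_spec(spec)
print(out_of_sync(demo))   # nothing reported right after creation

# Nothing stops code from changing the module attr alone.
demo.__package__ = "other"
print(out_of_sync(demo))   # __package__ now disagrees with spec.parent
```

Note that spec.parent is computed from spec.name and cannot be assigned, so the mismatch above can only be repaired from the module side.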

Currently any of the module attrs can be different than the spec.  In
two cases they can legitimately be different: __name__ (with __main__)
and __file__ (with frozen stdlib modules).  For the rest, they should
be in sync.

Treating the spec as the single source of truth makes sense.  My only
concern has been that you can no longer determine how a module was
originally imported once the spec is changed.  However, I just realized
that you can always run importlib.util.find_spec() to reproduce that
info (with some minor caveats).  So now I'm less concerned about that. :)

Notably, users have forever(?) been able to modify all of the module
attrs, with impact on the import system: __package__ and __path__
affecting later imports, and the rest affecting reload.

FWIW, an "advantage" of the module attrs is that they can be set in the
module code.  The same is true for the corresponding spec attrs but
with just enough indirection to require more intent.

Regardless, the idea of post-import modifications to modules/specs has
always made me uncomfortable.  As a user I'd expect an alternative that
feels less like a (non-obvious) low-level hack.

====

To me here are the important questions:

1. when does code ever modify the module attrs (or spec) and why?
2. should we distinguish the roles of the module attrs and spec
   (how-module-was-loaded vs. how-module-will-reload vs.
   how-module-impacts-other-imports)?
3. would it make sense to store spec modifications separately from the
   spec (e.g. on the module)?
4. which attrs should be deprecated?
5. should any module attrs (the ones that don't get eventually removed)
   be read-only?  What about spec attrs?
6. would it be better to provide importlib.util.* helpers to address
   those needs, instead of having folks modify the module/spec directly?

My take:

1. that would be nice to know :)

2. that depends on what matters in practice.
My gut says the distinctions aren't important enough to do anything
about it, except where there are legitimate differences between the
module and spec.

Currently the module attrs cover all three roles.  The spec only covers
how-module-was-loaded (but is used as a fallback for the other two
roles in *most* cases).

Those two special cases, with __name__ and __file__ being out of sync,
are meaningful only for introspection, rather than affecting the import
machinery.  (In the case of frozen modules that have __file__ set, note
that spec.has_location is False.)  I'm not sure how these fit in with
the different roles.

Advantages to keeping the spec exclusively how-module-was-imported:

* it's what I'd expect; having to call importlib.util.find_spec() isn't
  the obvious thing
* the loader can modify the spec, so importlib.util.find_spec() won't
  necessarily match

Neither of those appears important enough to warrant keeping the status
quo.  The disadvantages seem heavier (maintenance costs and user
confusion from (unnecessarily) having multiple sources of truth).

3. probably not, though it depends on (2)

However, if all those module attrs become read-only then we would need
to figure out where to store __name__ and __file__ in those special
cases.

4. everything except __name__ and __file__ (and probably __path__)

5. for modules, yes; for the spec, only if we stick with the one role

On modules I'd expect all of them to become properties regardless, with
most of them becoming read-only eventually:

getter:
* proxy the corresponding spec attr
* a deprecation warning if it isn't an attr that needs to stay

setter:
* proxy the corresponding spec attr
* a deprecation warning for now on all attrs
* a deprecation error later on all attrs
* an AttributeError even later (do not make it a data-only descriptor)

A bonus advantage of properties is that they would reduce clutter in
the module __dict__.

What about __path__?
We'll probably keep it as a traditional indicator that the module is a
package.  However, do we make it a read-only proxy of
spec.submodule_search_locations?  (We already use a __path__ proxy for
namespace packages.)

6. it probably isn't worth it.

Due to the extra indirection, modifying the spec seems like a more
deliberate (non-accidental or confused) action than changing the module
attrs.  That's probably enough "help".  However, in cases where
multiple attrs together have specific meaning, such helpers might be
useful for users.

====

Regarding __file__ being different from spec.origin, it might be worth
revisiting the question of "origin" vs. "location" on the spec.  Note
that, in the case of frozen stdlib modules, spec.has_location is False
even though __file__ is set.  That smells fishy to me.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue45540>
_______________________________________
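
To make the property idea from item 5 a bit more concrete, here is a minimal sketch of the "for now" stage: the getter proxies the spec, and the setter proxies the spec while emitting a DeprecationWarning. ProxyModule and _spec_proxy() are hypothetical names chosen for illustration, not importlib APIs, and only __loader__ and __cached__ are wired up to keep it short.

```python
import types
import warnings
from importlib.machinery import ModuleSpec


def _spec_proxy(spec_attr):
    """Build a property that reads and writes one attribute on __spec__."""

    def getter(module):
        return getattr(module.__spec__, spec_attr)

    def setter(module, value):
        # Stage one of the plan: warn, but still apply the change to the spec.
        warnings.warn(
            f"setting this module attribute is deprecated; "
            f"set __spec__.{spec_attr} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        setattr(module.__spec__, spec_attr, value)

    return property(getter, setter)


class ProxyModule(types.ModuleType):
    """Hypothetical module type; the properties shadow the __dict__ entries."""

    __loader__ = _spec_proxy("loader")
    __cached__ = _spec_proxy("cached")


mod = ProxyModule("demo")
mod.__spec__ = ModuleSpec("demo", loader=None)

marker = object()  # stand-in for a real loader
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    mod.__loader__ = marker  # goes through the property, lands on the spec

print(mod.__spec__.loader is marker)
print([w.category.__name__ for w in caught])
```

Because a property is a data descriptor on the type, it takes precedence over any same-named entry that ModuleType.__init__() left in the module's __dict__. The later stages (error, then AttributeError) would only change the body of setter().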