On 2/16/06, Nick Coghlan <[EMAIL PROTECTED]> wrote:
> Guido van Rossum wrote:
> > Do you have unit tests for everything? I believe I fixed a bug in the
> > code that reads a bytecode file (it wasn't skipping the timestamp).
[Hey, I thought I sent that just to you. Is python-dev really interested
in this?]

> I haven't worked the filesystem based tests into the unit tests yet,
> and even the manual tests I was using managed to leave out compiled
> bytecode files (as you noticed). I'll fix that.
>
> Given I do my testing on Linux, I probably still would have forgotten
> the 'rb' mode definitions on the relevant calls to open() though. . .

But running the unit tests on Windows would have revealed the problem.

> > +++ pep-0338.txt (working copy)
> > - The optional argument ``init_globals`` may be used to pre-populate
> > + The optional argument ``init_globals`` may be a dictionary used to
> > + pre-populate
> >   the globals dictionary before the code is executed. The supplied
> >   dictionary will not be modified.
>
> I just realised that anything that's a legal argument to "dict.update"
> will work. I'll fix the function description in the PEP (and the docs
> patch as well).

I'm not sure that's a good idea -- you'll never be able to switch to a
different implementation then.

> > --- runpy.py Wed Feb 15 15:56:07 2006
> >     def get_data(self, pathname):
> > !       # XXX Unfortunately PEP 302 assumes text data :-(
> > !       return open(pathname).read()
>
> Hmm.
>
> The PEP itself requests that a string be returned from get_data(), but
> doesn't require that the file be opened in text mode. Perhaps the PEP
> 302 emulation should use binary mode here? Otherwise there could be
> strange data corruption bugs on Windows.

But PEP 302 shows as its only example reading from a file with a .txt
extension. Adding spurious \r characters is also data corruption.

We should probably post to python-dev a request for clarification of
PEP 302, but in the mean time I vote for text mode.
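[The binary-vs-text question above boils down to whether get_data()
should see the file's bytes untranslated. A minimal sketch of the
binary-mode variant Nick is proposing -- the function name follows the
PEP 302 protocol, but this standalone form is illustrative, not the
actual runpy code:]

```python
def get_data(pathname):
    # Open in binary mode so Windows text-mode newline translation
    # ("\r\n" <-> "\n") cannot silently alter the bytes read back;
    # this matters when the data is marshal output or a checksum input.
    with open(pathname, "rb") as f:
        return f.read()
```

[Callers that genuinely want text can decode and split lines themselves;
the file's raw bytes survive either way, which is the point of the
'rb' argument.]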
> > --- 337,349 ----
> >
> >       # This helper is needed as both the PEP 302 emulation and the
> >       # main file execution functions want to read compiled files
> > +     # XXX marshal can also raise EOFError; perhaps that should be
> > +     # turned into ValueError? Some callers expect ValueError.
> >       def _read_compiled_file(compiled_file):
> >           magic = compiled_file.read(4)
> >           if magic != imp.get_magic():
> >               raise ValueError("File not compiled for this Python version")
> > +         compiled_file.read(4)  # Throw away timestamp
> >           return marshal.load(compiled_file)
>
> I'm happy to convert EOFError to ValueError here if you'd prefer (using
> the string representation of the EOFError as the message in the
> ValueError).
>
> Or did you mean changing the behaviour in marshal itself?

No -- the alternative is to catch EOFError in _read_compiled_file()'s
caller, but that seems worse. You should check marshal.c if it can raise
any *other* errors (perhaps OverflowError?).

Also, *perhaps* it makes more sense to return None instead of raising
ValueError? Since you're always catching it? (Or are you?)

> > --- 392,407 ----
> >       loader = _get_loader(mod_name)
> >       if loader is None:
> >           raise ImportError("No module named " + mod_name)
> > +     # XXX get_code() is an *optional* loader feature. Is that okay?
> >       code = loader.get_code(mod_name)
>
> If the loader doesn't provide access to the code object or the source
> code, then runpy can't really do anything useful with that module (e.g.
> if it's a C builtin module). Given that PEP 302 states that if you
> provide get_source() you should also provide get_code(), this check
> should be sufficient to let runpy.run_module get to everything it can
> handle.

OK. But a loader could return None from get_code() -- do you check for
that? (I don't have the source handy here.)

> A case could be made for converting the attribute error to an
> ImportError, I guess. . .

I'm generally not keen on that; leave it.
> >       filename = _get_filename(loader, mod_name)
> >       if run_name is None:
> >           run_name = mod_name
> > +     # else:
> > +     #     XXX Should we also set
> > +     #         sys.modules[run_name] = sys.modules[mod_name]?
> > +     #     I know of code that does "import __main__". It should
> > +     #     probably get the substitute __main__ rather than the
> > +     #     original __main__, if run_name != mod_name
> >       return run_module_code(code, init_globals, run_name,
> >                              filename, loader, as_script)
>
> Hmm, good point. How about a different solution, though: in
> run_module_code, I could create a new module object and put it
> temporarily in sys.modules, and then remove it when done (restoring the
> original module if necessary).

That might work too. What happens when you execute "foo.py" as __main__
and then (perhaps indirectly) something does "import foo"? Does a second
copy of foo.py get loaded by the regular loader?

> That would mean any module with code that looks up
> "sys.modules[__name__]" would still work when run via runpy.run_module
> or runpy.run_file.

Yeah, except if they do that, they're likely to also *assign* to that.
Well, maybe that would just work, too...

> I also realised that sys.argv[0] should be restored to its original
> value, too.

Yup.

> I'd then change the "as_script" flag to "alter_sys" and have it affect
> both of the above operations (and grab the import lock to avoid other
> import or run_module_code operations seeing the altered version of
> sys.modules).

Makes sense.

I do wonder if runpy.py isn't getting a bit over-engineered -- it seems
a lot of the functionality isn't actually necessary to implement
-m foo.bar, and the usefulness in other circumstances is as yet
unproven. What do you think of taking a dose of YAGNI here? (Especially
since I notice that most of the public APIs are very thin layers over
exec or execfile -- people can just use those directly.)
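[Nick's "temporary module in sys.modules" idea, combined with his point
about restoring sys.argv[0], can be sketched as follows. The function
name matches the one under discussion, but the signature and details are
illustrative, not runpy's actual API; the try/finally discipline shown
here is the same one Guido urges later for the import lock:]

```python
import sys
import types

def run_module_code(code, run_name, filename, init_globals=None):
    # Create a fresh module and install it under run_name so that code
    # doing sys.modules[__name__] (or "import __main__") sees the
    # substitute module rather than the original.
    temp_module = types.ModuleType(run_name)
    if init_globals:
        temp_module.__dict__.update(init_globals)
    temp_module.__file__ = filename
    saved_module = sys.modules.get(run_name)
    saved_argv0 = sys.argv[0]
    sys.modules[run_name] = temp_module
    sys.argv[0] = filename
    try:
        exec(code, temp_module.__dict__)
    finally:
        # try/finally guarantees sys.argv and sys.modules are restored
        # even if the executed code raises.
        sys.argv[0] = saved_argv0
        if saved_module is None:
            del sys.modules[run_name]
        else:
            sys.modules[run_name] = saved_module
    return temp_module.__dict__.copy()
```

[Note this sketch does not answer Guido's question about a later
"import foo" reloading the file through the regular loader, nor does it
take the import lock; it only shows the save/execute/restore shape.]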
> > --- 439,457 ----
> >
> >       Returns the resulting top level namespace dictionary
> >       First tries to run as a compiled file, then as a source file
> > +     XXX That's not the same algorithm as used by regular import;
> > +     if the timestamp in the compiled file is not equal to the
> > +     source file's mtime, the compiled file is ignored
> > +     (unless there is no source file -- then the timestamp
> > +     is ignored)
>
> They're doing different things though - the import process uses that
> algorithm to decide which filename to use (.pyo, .pyc or .py). This
> code in run_file is trying to decide whether the supplied filename
> points to a compiled file or a source file without a tight coupling to
> the specific file extension used (e.g. so it works for Unix Python
> scripts that rely on the shebang line to identify which interpreter to
> use to run them).
>
> I'll add a comment to that effect.

Ah, good point. So you never go from foo.py to foo.pyc, right?

> Another problem that occurred to me is that the module isn't thread
> safe at the moment. The PEP 302 emulation isn't protected by the import
> lock, and the changes to sys.argv in run_module_code will be visible
> across threads (and may clobber each other or the original if multiple
> threads invoke the function).

Another reason to consider cutting it down to only what's needed by -m;
-m doesn't need thread-safety (I think).

> On that front, I'll change _get_path_loader to acquire and release the
> import lock, and the same for run_module_code when "alter_sys" is set
> to True.

OK, just be very, very careful. The import lock is not a regular mutex
and if you don't release it you're stuck forever. Just use
try/finally...

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com