Eric Snow added the comment:

For interpreter startup, builtin and frozen modules involve no stats at all[1]. Stats come from imports that traverse sys.path (a.k.a. PathFinder). Most of them happen in FileFinder.find_loader; the remainder are for source (.py) files (a.k.a. SourceFileLoader).
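A quick way to see which bucket a given top-level module falls into is to ask the import system for its spec. This is a minimal sketch that relies only on `importlib.util.find_spec()` and on the `origin` strings the builtin and frozen importers report (`'built-in'` and `'frozen'`). The `classify()` helper and its return labels are made up for illustration. Because `find_spec()` imports the parent packages of a dotted name, the sketch is meant for top-level names only.

```python
import importlib.util


def classify(name):
    """Hypothetical helper: report how a top-level module would be located.

    'builtin' / 'frozen' -> no file-system stats needed
    'path-based'         -> found via PathFinder (and so FileFinder stats)
    'namespace'          -> namespace package (origin is None on 3.7+)
    None                 -> not importable at all
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    if spec.origin == 'built-in':
        return 'builtin'
    if spec.origin == 'frozen':
        return 'frozen'
    if spec.origin is None:
        return 'namespace'
    return 'path-based'


for mod in ['sys', 'marshal', 'json']:
    print(mod, classify(mod))
```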
Here's a rough sketch of what typically happens currently during the import of a path-based module[2], as it relates to stats (and other FS access). Lines with FS access start with *:

    def load_module(fullname):
        suffixes = ['.cpython-34m.so', '.abi3.so', '.so', '.py', '.pyc']
        tailname = fullname.rpartition('.')[2]
        for entry in sys.path:
    *       mtime = os.stat(entry).st_mtime
            if mtime != cached_mtime:
    *           cached_listdir = os.listdir(entry)
            if tailname in cached_listdir:
                basename = entry/tailname
    *           if os.stat(basename).st_mode implies directory:  # superfluous?
                    # package?
                    for suffix in suffixes:
                        full_path = basename/'__init__' + suffix
    *                   if os.stat(full_path).st_mode implies file:
                            if is_extension:
    *                           <dlopen>(full_path)
                            elif is_sourceless:
    *                           open(full_path).read()
                            else:
                                load_from_source(full_path)
                            return
            # ...non-package module?
            for suffix in suffixes:
                full_path = entry/tailname + suffix
                if tailname + suffix in cached_listdir:
    *               if os.stat(full_path).st_mode implies file:  # superfluous?
                        if is_extension:
    *                       <dlopen>(full_path)
                        elif is_sourceless:
    *                       open(full_path).read()
                        else:
                            load_from_source(full_path)
                        return

    def load_from_source(sourcepath):
    *   st = os.stat(sourcepath)
        if st:  # cached bytecode still valid?
    *       open(bytecodepath).read()
        else:
    *       open(sourcepath).read()
    *       os.stat(sourcepath).st_mode
            for parent in ancestor_dirs(sourcepath):
    *           os.stat(parent).st_mode -> missing_parents
            for parent in missing_parents:
    *           os.mkdir(parent)
    *       open(tempname).write()
    *       os.replace(tempname, bytecodepath)

Obviously there are some unix-isms in there. Windows ends up not that different, though.
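A runnable model of the load_module half of the sketch may make the per-entry counts easier to check. This is a sketch under stated assumptions. `FakeEntry` and `find_in_entry` are hypothetical names. One in-memory object stands in for one sys.path directory, a "stat" is just a counter increment, and the loading itself (dlopen, reads, load_from_source) is left out. Like the sketch, the model only stats suffixed names that appear in the cached listing.

```python
# Suffix order copied from the sketch above (Python 3.4, Linux).
SUFFIXES = ['.cpython-34m.so', '.abi3.so', '.so', '.py', '.pyc']


class FakeEntry:
    """Hypothetical in-memory sys.path entry that counts stat()/listdir() calls."""

    def __init__(self, files=(), dirs=()):
        self.files = set(files)   # relative paths of regular files
        self.dirs = set(dirs)     # relative paths of directories
        self.stats = 0
        self.listdirs = 0

    def stat_mode(self, relpath):
        """Return 'file', 'dir' or None, counting one stat per call."""
        self.stats += 1
        if relpath in self.files:
            return 'file'
        if relpath in self.dirs:
            return 'dir'
        return None

    def listdir(self):
        """Top-level names only, like os.listdir() on the entry."""
        self.listdirs += 1
        return {p.split('/')[0] for p in self.files | self.dirs}


def find_in_entry(entry, tailname, cached_listdir=None):
    """Model one iteration of the sys.path loop; return the path found or None."""
    entry.stats += 1                            # os.stat(entry) for the mtime check
    if cached_listdir is None:                  # listing cache missing or stale
        cached_listdir = entry.listdir()
    if tailname in cached_listdir:
        if entry.stat_mode(tailname) == 'dir':  # the possibly superfluous stat
            for suffix in SUFFIXES:             # package: look for __init__
                init = f'{tailname}/__init__{suffix}'
                if entry.stat_mode(init) == 'file':
                    return init
    for suffix in SUFFIXES:                     # non-package module
        name = tailname + suffix
        if name in cached_listdir and entry.stat_mode(name) == 'file':
            return name
    return None


e = FakeEntry(dirs={'pkg'}, files={'pkg/__init__.pyc'})
print(find_in_entry(e, 'pkg'), e.stats, e.listdirs)  # pkg/__init__.pyc 7 1
```

Passing `cached_listdir` models an entry whose mtime has not changed, so no listdir is counted.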
stat/FS count
-------------

A total written as X/Y-Z+ means X when the module is loaded with a single read (extension or sourceless), and Y-Z+ when it goes through load_from_source.

    load_module (*per path entry*):
      (add 1 listdir to each if the cache is stale)
      not found:                         1          (1 stat)
      non-package dir:                   7          (num_suffixes + 2 stats)
      package (best):                    4/5-9+     (3 stats, 1 read or load_from_source)
      package (worst):                   8/9-13+    (num_suffixes + 2 stats, 1 read or load_from_source)
      non-package module (best):         3/4-8+     (2 stats, 1 read or load_from_source)
      non-package module (worst):        7/8-12+    (num_suffixes + 1 stats, 1 read or load_from_source)
      non-package module + dir (best):   10/11-15+  (num_suffixes + 4 stats, 1 read or load_from_source)
      non-package module + dir (worst):  14/15-19+  (num_suffixes * 2 + 3 stats, 1 read or load_from_source)

    load_from_source:
      cached:                        2   (1 stat, 1 read)
      uncached, no parents:          4   (2 stats, 1 write, 1 replace)
      uncached, no missing parents:  5+  (num_parents + 2 stats, 1 write, 1 replace)
      uncached, missing parents:     6+  (num_parents + 2 stats, num_missing mkdirs, 1 write, 1 replace)

Highlights:

* the common case is not fast (for the sake of the slight possibility that files may change between imports); this matters less during interpreter startup.
* up to 5 different suffixes, with a separate stat for each (extension module suffixes are tried first).
* the size and ordering of sys.path have a decided impact on the number of stats.
* if a module's bytecode is cached, a lot less FS access happens.
* the more deeply nested a module, the more FS accesses happen.
* namespace packages don't have much impact on performance.

Possible improvements:

* provide an internal mechanism to turn on/off caching of all stats (without worrying about staleness), and maybe expose it via a context manager/API (not unlike what Christian put in his patch).
* at least do some temporally local caching where the risk of staleness is particularly small.
* move .py ahead of extension modules (or just behind .cpython-34m.so)?
* non-packages are more common than packages (?), so look for those first (hard to make effective without breaking key import semantics).
* remove the 2 possibly superfluous stats (marked "superfluous?" in the sketch)?
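The first two suggested improvements could look roughly like the following. This is a minimal sketch, assuming a hypothetical `StatCache` class with a `caching()` context manager. It is not an importlib API and does not hook into the import system: FileFinder and SourceFileLoader would have to be changed to call it. It also does not cache failed stats (missing files), which simply raise as `os.stat()` does.

```python
import contextlib
import os
import time


class StatCache:
    """Hypothetical opt-in cache of os.stat() results, keyed by path.

    While enabled, a path is stat()ed at most once per `max_age` seconds;
    while disabled, every call goes straight to os.stat().
    """

    def __init__(self, max_age=float('inf'), clock=time.monotonic):
        self.max_age = max_age
        self.clock = clock
        self.enabled = False
        self._entries = {}   # path -> (timestamp, stat_result)
        self.misses = 0      # number of real os.stat() calls made

    def stat(self, path):
        if self.enabled:
            hit = self._entries.get(path)
            if hit is not None and self.clock() - hit[0] < self.max_age:
                return hit[1]
        self.misses += 1
        result = os.stat(path)
        if self.enabled:
            self._entries[path] = (self.clock(), result)
        return result

    @contextlib.contextmanager
    def caching(self):
        """Enable caching for the duration of a with-block, then drop the cache."""
        previous = self.enabled
        self.enabled = True
        try:
            yield self
        finally:
            self.enabled = previous
            if not previous:
                self._entries.clear()


cache = StatCache()
with cache.caching():
    for _ in range(3):
        cache.stat('.')
print(cache.misses)  # 1
cache.stat('.')      # caching is off again, so this really stats
print(cache.misses)  # 2
```

Leaving `max_age` infinite inside a with-block corresponds to the "don't worry about staleness" mode. A small `max_age` gives the temporally local variant.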
[1] Maybe we should freeze the stdlib. <0.5 wink>
[2] Importing a module usually involves importing the module's parent, its parent, and so forth. Each of those incurs the same stat hits all over again (though packages usually have only 1 path entry to traverse). The stdlib is pretty flat (particularly among the modules involved during startup), so this is less of an issue for this ticket.

Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19216>