Re: [Python-ideas] importlib: making FileFinder easier to extend
Basically what you're after is a way to extend the default finder with a new file type. Historically you didn't want this because of the performance hit of the extra stat call to check for that new file extension (this has been greatly alleviated in Python 3 through the caching of directory contents). But I would still argue that you don't necessarily want this for, e.g., the stdlib or any other random project which might just happen to have a file with the same file extension as the one you want to have special support for.

I also don't think we want a class attribute to contain the default loaders, since not everyone will want those default semantics in all cases either. Since we're diving into deep levels of customization, I would eschew anything that makes assumptions about what you want.

I think the best we could consider is making importlib.machinery._get_supported_file_loaders() a public API. That way you can easily construct a finder with the default loaders plus your custom ones. After that you can then provide a custom sys.path_hooks entry that recognizes the directories which contain your custom file type.

If that seems reasonable then feel free to open an enhancement request at bugs.python.org to discuss the API, and then we can discuss how to implement a PR for it.

On Wed, 7 Feb 2018 at 07:04 Erik Bray wrote:
> Hello,
>
> Brief problem statement: Let's say I have a custom file type (say,
> with extension .foo) and these .foo files are included in a package
> (along with other Python modules with standard extensions like .py and
> .so), and I want to make these .foo files importable like any other
> module.
> [...]
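A minimal sketch of the approach suggested above: the default loaders plus a custom one, behind a path hook that only claims directories containing the custom file type. It rests on some assumptions. Because _get_supported_file_loaders() is private, the default (loader, suffixes) list is rebuilt here from public importlib.machinery names. As far as I know, that mirrors what the private helper returns in CPython. FooSourceLoader, foo_path_hook and install are hypothetical names for illustration only. The loader simply treats .foo files as ordinary Python source.

```python
import importlib.machinery
import os
import sys
import tempfile

# Assumption: this mirrors the private _get_supported_file_loaders() helper,
# but is built only from public importlib.machinery names.
DEFAULT_FILE_LOADERS = [
    (importlib.machinery.ExtensionFileLoader, importlib.machinery.EXTENSION_SUFFIXES),
    (importlib.machinery.SourceFileLoader, importlib.machinery.SOURCE_SUFFIXES),
    (importlib.machinery.SourcelessFileLoader, importlib.machinery.BYTECODE_SUFFIXES),
]

FOO_SUFFIX = '.foo'


class FooSourceLoader(importlib.machinery.SourceFileLoader):
    """Hypothetical loader for .foo files.

    For illustration it treats .foo files as plain Python source; a real
    loader would override source_to_code() to translate its own syntax.
    """


def foo_path_hook(path):
    """Hypothetical sys.path_hooks entry that only claims .foo directories.

    Raising ImportError for every other sys.path entry makes PathFinder move
    on to the next hook (normally the stdlib's default FileFinder hook).
    """
    directory = path or os.getcwd()  # '' on sys.path means the cwd
    if not os.path.isdir(directory):
        raise ImportError('not a directory', path=path)
    try:
        names = os.listdir(directory)
    except OSError:
        raise ImportError('cannot list directory', path=path) from None
    if not any(name.endswith(FOO_SUFFIX) for name in names):
        raise ImportError('no .foo files', path=path)
    # Default loaders come first, so mod.py still wins over mod.foo.
    return importlib.machinery.FileFinder(
        path, *DEFAULT_FILE_LOADERS, (FooSourceLoader, [FOO_SUFFIX]))


def install():
    """Put the hook ahead of the defaults and drop finders cached earlier."""
    if foo_path_hook not in sys.path_hooks:
        sys.path_hooks.insert(0, foo_path_hook)
    sys.path_importer_cache.clear()


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, 'demo_plain.py'), 'w') as f:
            f.write('KIND = "py"\n')
        with open(os.path.join(tmp, 'demo_custom.foo'), 'w') as f:
            f.write('KIND = "foo"\n')
        install()
        sys.path.insert(0, tmp)
        import demo_plain, demo_custom
        print(demo_plain.KIND, demo_custom.KIND)  # prints: py foo
        sys.path.remove(tmp)
```

The hook raises ImportError for any directory without a .foo file. PathFinder therefore falls through to the next sys.path_hooks entry, and the stdlib and unrelated projects keep their default finder.

There are two caveats with this design:
- The per-directory decision is cached in sys.path_importer_cache. A directory that gains .foo files later needs that cache cleared again.
- Because the sketch subclasses SourceFileLoader, mod.foo caches its bytecode under the same __pycache__ name that a mod.py in the same directory would use.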
[Python-ideas] importlib: making FileFinder easier to extend
Hello,

Brief problem statement: Let's say I have a custom file type (say, with extension .foo) and these .foo files are included in a package (along with other Python modules with standard extensions like .py and .so), and I want to make these .foo files importable like any other module.

On its face, importlib.machinery.FileFinder makes this easy. I make a loader for my custom file type (say, FooSourceLoader), and I can use the FileFinder.path_hook helper like:

    sys.path_hooks.insert(0, FileFinder.path_hook((FooSourceLoader, ['.foo'])))
    sys.path_importer_cache.clear()

Great--now I can import my .foo modules like any other Python module. However, standard Python modules can no longer be imported. The way the PathFinder sys.meta_path hook works, sys.path_hooks entries are first-come-first-served, and furthermore FileFinder.path_hook is very promiscuous--it will take over module loading for *any* directory on sys.path, regardless of what file extensions are in that directory. So although this mechanism is provided by the stdlib, it can't really be used for this purpose without breaking imports of normal modules (and maybe it's not intended for that purpose, but the documentation is unclear).

There are a number of different ways one could get around this. One might be to pass FileFinder.path_hook loader/extension pairs for all the basic file types known by the Python interpreter. Unfortunately there's no great way to get that information. *I* know that I want to support .py, .pyc, .so etc. files, and I know which loaders to use for them. But that's really information that should belong to the Python interpreter, not something that should be reverse-engineered. In fact, there is such a mapping provided by importlib.machinery._get_supported_file_loaders(), but this is not a publicly documented function.

One could probably think of other workarounds. For example, you could implement a custom sys.meta_path hook.
But I think it shouldn't be necessary to go to higher levels of abstraction in order to do this--the default sys.path handler should be able to handle this use case.

To add support for new file types to sys.path_hooks, I ended up implementing the following hack:

    import os
    import sys

    from importlib.abc import PathEntryFinder


    @PathEntryFinder.register
    class MetaFileFinder:
        """
        A 'middleware', if you will, between the PathFinder sys.meta_path hook,
        and sys.path_hooks hooks--particularly FileFinder.

        The hook returned by FileFinder.path_hook is rather 'promiscuous' in
        that it will handle *any* directory.  So if one wants to insert another
        FileFinder.path_hook into sys.path_hooks, that will totally take over
        importing for any directory, and previous path hooks will be ignored.

        This class provides its own sys.path_hooks hook as follows: it should
        be inserted early on sys.path_hooks so that it can supersede anything
        else.  Its find_spec method then calls each hook on sys.path_hooks
        after itself and, for each hook that can handle the given sys.path
        entry, it calls the hook to create a finder, and calls that finder's
        find_spec.  So each sys.path_hooks entry is tried until a spec is
        found or all finders are exhausted.
        """

        def __init__(self, path):
            if not os.path.isdir(path):
                raise ImportError('only directories are supported', path=path)

            self.path = path
            self._finder_cache = {}

        def __repr__(self):
            return '{}({!r})'.format(self.__class__.__name__, self.path)

        def find_spec(self, fullname, target=None):
            if not sys.path_hooks:
                return None

            for hook in sys.path_hooks:
                if hook is self.__class__:
                    continue

                finder = None
                try:
                    if hook in self._finder_cache:
                        finder = self._finder_cache[hook]
                        if finder is None:
                            # We've tried this finder before and got an ImportError
                            continue
                except TypeError:
                    # The hook is unhashable
                    pass

                if finder is None:
                    try:
                        finder = hook(self.path)
                    except ImportError:
                        pass

                    try:
                        self._finder_cache[hook] = finder
                    except TypeError:
                        # The hook is unhashable for some reason so we don't
                        # bother caching it
                        pass

                if finder is not None:
                    spec = finder.find_spec(fullname, target)
                    if spec is not None:
                        return spec

            # Module spec not found through any of the finders
            return None

        def invalidate_caches(self):
            for finder in self._finder_cache.values():
                # None entries record hooks that raised ImportError; skip them
                if finder is not None:
                    finder.invalidate_caches()

        @classmet