Re: [Python-ideas] importlib: making FileFinder easier to extend

2018-02-20 Thread Brett Cannon
Basically what you're after is a way to extend the default finder with a
new file type. Historically you didn't want this because of the performance
hit of the extra stat call to check that new file extension (this has been
greatly alleviated in Python 3 through the caching of directory contents).
But I would still argue that you don't necessarily want this for e.g. the
stdlib or any other random project which might just happen to have a file
with the same file extension as the one you want to have special support
for.

I also don't think we want a class attribute to contain the default
loaders, since not everyone will want those default semantics in all cases
either. Since we're diving into deep levels of customization, I would eschew
anything that makes assumptions about what you want.

I think the best we could consider is making
importlib.machinery._get_supported_file_loaders() a public API. That way
you can easily construct a finder with the default loaders plus your custom
ones.
After that you can then provide a custom sys.path_hooks entry that
recognizes the directories which contain your custom file type.
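
A minimal sketch of that idea, under some loud assumptions: FooSourceLoader
and the '.foo' suffix are hypothetical placeholders, and the default loader
list is taken from the private helper in importlib._bootstrap_external
(where CPython currently keeps it), which is exactly the function that is
not a public API yet and may change without notice.

```python
import os
import sys
from importlib.machinery import FileFinder, SourceFileLoader
# Private helper; assumed to stay at this location, which is not guaranteed.
from importlib._bootstrap_external import _get_supported_file_loaders

FOO_SUFFIX = '.foo'  # hypothetical custom extension


class FooSourceLoader(SourceFileLoader):
    """Hypothetical stand-in for a real .foo loader."""


def foo_path_hook(path):
    """Return a finder only for directories that contain a .foo file."""
    try:
        # An empty sys.path entry means the current working directory.
        names = os.listdir(path or os.getcwd())
    except OSError:
        raise ImportError('not a readable directory', path=path)
    if not any(name.endswith(FOO_SUFFIX) for name in names):
        # Declining lets PathFinder move on to the next sys.path_hooks entry.
        raise ImportError('no .foo files in this directory', path=path)
    # Custom loader first, then whatever the interpreter supports by default.
    details = [(FooSourceLoader, [FOO_SUFFIX])]
    details += list(_get_supported_file_loaders())
    return FileFinder(path, *details)


def install():
    """Put the hook ahead of the default one and drop stale finders."""
    sys.path_hooks.insert(0, foo_path_hook)
    sys.path_importer_cache.clear()
```

One caveat with this design: the decision is made once per sys.path entry
and cached in sys.path_importer_cache, so a directory that only gains .foo
files later keeps its default finder until that cache entry is cleared.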

If that seems reasonable then feel free to open an enhancement request at
bugs.python.org to discuss the API and then we can discuss how to implement
a PR for it.

On Wed, 7 Feb 2018 at 07:04 Erik Bray wrote:

> Hello,
>
> Brief problem statement: Let's say I have a custom file type (say,
> with extension .foo) and these .foo files are included in a package
> (along with other Python modules with standard extensions like .py and
> .so), and I want to make these .foo files importable like any other
> module.
>
> On its face, importlib.machinery.FileFinder makes this easy.  I make a
> loader for my custom file type (say, FooSourceLoader), and I can use
> the FileFinder.path_hook helper like:
>
> sys.path_hooks.insert(0, FileFinder.path_hook((FooSourceLoader, ['.foo'])))
> sys.path_importer_cache.clear()
>
> Great--now I can import my .foo modules like any other Python module.
> However, standard Python modules can no longer be imported.  The way
> the PathFinder sys.meta_path hook works, sys.path_hooks entries are
> first-come, first-served, and furthermore FileFinder.path_hook is very
> promiscuous--it will take over module loading for *any* directory on
> sys.path, regardless of what the file extensions are in that directory.
> So although this mechanism is provided by the stdlib, it can't really
> be used for this purpose without breaking imports of normal modules
> (and maybe it's not intended for that purpose, but the documentation
> is unclear).
>
> There are a number of different ways one could get around this.  One
> might be to pass FileFinder.path_hook loaders/extension pairs for all
> the basic file types known by the Python interpreter.  Unfortunately
> there's no great way to get that information.  *I* know that I want to
> support .py, .pyc, .so etc. files, and I know which loaders to use for
> them.  But that's really information that should belong to the Python
> interpreter, and not something that should be reverse-engineered.  In
> fact, there is such a mapping provided by
> importlib.machinery._get_supported_file_loaders(), but this is not a
> publicly documented function.
>
> One could probably think of other workarounds.  For example you could
> implement a custom sys.meta_path hook.  But I think it shouldn't be
> necessary to go to higher levels of abstraction in order to do
> this--the default sys.path handler should be able to handle this use
> case.
>
> To support adding new file types via sys.path_hooks, I ended up
> implementing the following hack:
>
> #
> import os
> import sys
>
> from importlib.abc import PathEntryFinder
>
>
> @PathEntryFinder.register
> class MetaFileFinder:
>     """
>     A 'middleware', if you will, between the PathFinder sys.meta_path hook,
>     and sys.path_hooks hooks--particularly FileFinder.
>
>     The hook returned by FileFinder.path_hook is rather 'promiscuous' in that
>     it will handle *any* directory.  So if one wants to insert another
>     FileFinder.path_hook into sys.path_hooks, that will totally take over
>     importing for any directory, and previous path hooks will be ignored.
>
>     This class provides its own sys.path_hooks hook, which should be
>     inserted early on sys.path_hooks so that it can supersede anything
>     else.  Its find_spec method then calls each of the other hooks on
>     sys.path_hooks and, for each hook that can handle the given sys.path
>     entry, it calls the hook to create a finder, and calls that finder's
>     find_spec.  So each sys.path_hooks entry is tried until a spec is
>     found or all finders are exhausted.
>     """
>
>     def __init__(self, path):
>         if not os.path.isdir(path):
>             raise ImportError('only directories are supported', path=path)
>
>         self.path = path
>  

[Python-ideas] importlib: making FileFinder easier to extend

2018-02-07 Thread Erik Bray
Hello,

Brief problem statement: Let's say I have a custom file type (say,
with extension .foo) and these .foo files are included in a package
(along with other Python modules with standard extensions like .py and
.so), and I want to make these .foo files importable like any other
module.

On its face, importlib.machinery.FileFinder makes this easy.  I make a
loader for my custom file type (say, FooSourceLoader), and I can use
the FileFinder.path_hook helper like:

sys.path_hooks.insert(0, FileFinder.path_hook((FooSourceLoader, ['.foo'])))
sys.path_importer_cache.clear()

Great--now I can import my .foo modules like any other Python module.
However, standard Python modules can no longer be imported.  The way
the PathFinder sys.meta_path hook works, sys.path_hooks entries are
first-come, first-served, and furthermore FileFinder.path_hook is very
promiscuous--it will take over module loading for *any* directory on
sys.path, regardless of what the file extensions are in that directory.
So although this mechanism is provided by the stdlib, it can't really
be used for this purpose without breaking imports of normal modules
(and maybe it's not intended for that purpose, but the documentation
is unclear).
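
To make that concrete, here is a small sketch of the behaviour (with
FooSourceLoader as a hypothetical placeholder loader): the hook built by
FileFinder.path_hook hands back a finder for any existing directory, even
one with no .foo files in it, and only declines paths that are not
directories.

```python
import os
import tempfile
from importlib.machinery import FileFinder, SourceFileLoader


class FooSourceLoader(SourceFileLoader):
    """Hypothetical stand-in for a real .foo loader."""


# The hook only knows about the .foo suffix...
foo_hook = FileFinder.path_hook((FooSourceLoader, ['.foo']))

with tempfile.TemporaryDirectory() as tmp:
    # ...yet it returns a finder for an empty directory without complaint.
    finder = foo_hook(tmp)
    claimed = isinstance(finder, FileFinder)

    # That finder knows nothing about .py files, so it finds no ordinary
    # modules.  PathFinder caches it for this path entry and never asks
    # the later sys.path_hooks entries.
    spec = finder.find_spec('os')

    try:
        foo_hook(os.path.join(tmp, 'missing'))
        rejected = False
    except ImportError:
        # Only non-directories are turned away.
        rejected = True
```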

There are a number of different ways one could get around this.  One
might be to pass FileFinder.path_hook loaders/extension pairs for all
the basic file types known by the Python interpreter.  Unfortunately
there's no great way to get that information.  *I* know that I want to
support .py, .pyc, .so etc. files, and I know which loaders to use for
them.  But that's really information that should belong to the Python
interpreter, and not something that should be reverse-engineered.  In
fact, there is such a mapping provided by
importlib.machinery._get_supported_file_loaders(), but this is not a
publicly documented function.
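
For illustration, the hand-rolled version of that workaround might look
like the sketch below.  It rebuilds the default pairs from the public
constants in importlib.machinery; the choice and order of pairs is my own
approximation of what the interpreter does, and FooSourceLoader is again a
hypothetical placeholder.

```python
import sys
from importlib import machinery


class FooSourceLoader(machinery.SourceFileLoader):
    """Hypothetical stand-in for a real .foo loader."""


def default_loader_details():
    """Hand-built guess at the interpreter's default (loader, suffixes) pairs."""
    return [
        (machinery.ExtensionFileLoader, machinery.EXTENSION_SUFFIXES),
        (machinery.SourceFileLoader, machinery.SOURCE_SUFFIXES),
        (machinery.SourcelessFileLoader, machinery.BYTECODE_SUFFIXES),
    ]


def install_foo_hook():
    """Install one FileFinder hook that handles .foo plus the usual types."""
    details = [(FooSourceLoader, ['.foo'])] + default_loader_details()
    sys.path_hooks.insert(0, machinery.FileFinder.path_hook(*details))
    sys.path_importer_cache.clear()
```

This keeps normal imports working, but the list has to be kept in sync by
hand with whatever a given Python version supports, which is the
reverse-engineering I would like to avoid.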

One could probably think of other workarounds.  For example you could
implement a custom sys.meta_path hook.  But I think it shouldn't be
necessary to go to higher levels of abstraction in order to do
this--the default sys.path handler should be able to handle this use
case.

To support adding new file types via sys.path_hooks, I ended up
implementing the following hack:

#
import os
import sys

from importlib.abc import PathEntryFinder


@PathEntryFinder.register
class MetaFileFinder:
    """
    A 'middleware', if you will, between the PathFinder sys.meta_path hook,
    and sys.path_hooks hooks--particularly FileFinder.

    The hook returned by FileFinder.path_hook is rather 'promiscuous' in that
    it will handle *any* directory.  So if one wants to insert another
    FileFinder.path_hook into sys.path_hooks, that will totally take over
    importing for any directory, and previous path hooks will be ignored.

    This class provides its own sys.path_hooks hook, which should be
    inserted early on sys.path_hooks so that it can supersede anything
    else.  Its find_spec method then calls each of the other hooks on
    sys.path_hooks and, for each hook that can handle the given sys.path
    entry, it calls the hook to create a finder, and calls that finder's
    find_spec.  So each sys.path_hooks entry is tried until a spec is
    found or all finders are exhausted.
    """

    def __init__(self, path):
        if not os.path.isdir(path):
            raise ImportError('only directories are supported', path=path)

        self.path = path
        self._finder_cache = {}

    def __repr__(self):
        return '{}({!r})'.format(self.__class__.__name__, self.path)

    def find_spec(self, fullname, target=None):
        if not sys.path_hooks:
            return None

        for hook in sys.path_hooks:
            if hook is self.__class__:
                continue

            finder = None
            try:
                if hook in self._finder_cache:
                    finder = self._finder_cache[hook]
                    if finder is None:
                        # We've tried this finder before and got an ImportError
                        continue
            except TypeError:
                # The hook is unhashable
                pass

            if finder is None:
                try:
                    finder = hook(self.path)
                except ImportError:
                    pass

                try:
                    self._finder_cache[hook] = finder
                except TypeError:
                    # The hook is unhashable for some reason so we don't bother
                    # caching it
                    pass

            if finder is not None:
                spec = finder.find_spec(fullname, target)
                if spec is not None:
                    return spec

        # Module spec not found through any of the finders
        return None

    def invalidate_caches(self):
        for finder in self._finder_cache.values():
            # Hooks that raised ImportError are cached as None; skip those
            if finder is not None:
                finder.invalidate_caches()

    @classmet